욱이의 냉철한 공부
[NLP Concepts] An Easy-to-Understand Taxonomy
* Word Representation perspective (Word Embedding)
1. Discrete Representation : Local Representation
1) One-hot Vector
- One-hot Vector
2) Count Based
- Bag of Words (BoW)
- Document-Term Matrix (DTM)
- Term-Document Matrix (TDM)
- Term Frequency-Inverse Document Frequency (TF-IDF)
- N-gram Language Model (N-gram)
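The count-based representations above (BoW, DTM, TF-IDF) can be sketched in plain Python. The three-document corpus and the unsmoothed log(N/df) IDF are illustrative choices; libraries such as scikit-learn use smoothed variants:

```python
import math
from collections import Counter

# Toy corpus: each document is a list of tokens (illustrative data).
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "saw", "the", "dog"],
]

# Vocabulary in a fixed order -> the columns of the Document-Term Matrix.
vocab = sorted({w for d in docs for w in d})

# Bag of Words / DTM: each row counts term occurrences in one document.
dtm = [[Counter(d)[w] for w in vocab] for d in docs]

# TF-IDF: term frequency scaled by inverse document frequency.
N = len(docs)
df = {w: sum(1 for d in docs if w in d) for w in vocab}
idf = {w: math.log(N / df[w]) for w in vocab}
tfidf = [[Counter(d)[w] * idf[w] for w in vocab] for d in docs]
```

Note how a word like "the" that appears in every document gets IDF 0, so TF-IDF suppresses it even though BoW counts it heavily.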
2. Continuous Representation
1) Prediction Based (Distributed Representation)
- Neural Network Language Model (NNLM) or Neural Probabilistic Language Model (NPLM)
- Word2Vec
- FastText
- Embedding from Language Model (ELMo) (uses a Bidirectional Language Model (biLM))
2) Count Based (Full Document)
- Latent Semantic Analysis (LSA) <- DTM
3) Prediction Based and Count Based (Windows)
- GloVe
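GloVe's "Windows" label refers to the statistic it trains on: co-occurrence counts gathered within a fixed window around each word. A minimal sketch of building those counts (window size and sentence are illustrative; real GloVe also applies distance weighting and then fits embeddings to the log counts):

```python
from collections import defaultdict

def cooccurrence(tokens, window=2):
    """Symmetric windowed co-occurrence counts -- the statistic
    GloVe factorizes (distance weighting omitted in this sketch)."""
    counts = defaultdict(float)
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(center, tokens[j])] += 1.0
    return counts

counts = cooccurrence(["the", "cat", "sat", "on", "the", "mat"])
```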
*
A Discrete Representation expresses each value as itself: a discrete representation in integers.
A Continuous Representation embeds relational and semantic attributes: a continuous representation in real numbers.
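The contrast can be seen directly: one-hot (discrete) vectors are mutually orthogonal, so the cosine similarity between any two distinct words is always zero, while dense (continuous) vectors can place related words close together. The dense vectors below are hypothetical values chosen purely for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vocab = ["cat", "dog", "pizza"]
one_hot = {w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}

# Any two distinct one-hot vectors are orthogonal: similarity is always 0.
assert cosine(one_hot["cat"], one_hot["dog"]) == 0.0

# Hypothetical dense (continuous) embeddings: related words end up close.
dense = {"cat": [0.9, 0.8], "dog": [0.85, 0.75], "pizza": [-0.7, 0.9]}
```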
* Language Model perspective
1. Statistical Language Model
1) Prediction Based
- N-gram Language Model (N-gram)
- Naive Bayes Classifier
2) Topic Modeling
- Latent Semantic Analysis (LSA)
- Latent Dirichlet Allocation (LDA)
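An N-gram model in the statistical family reduces to counting: a maximum-likelihood bigram model estimates P(w | prev) as count(prev, w) / count(prev). A minimal sketch on a toy corpus (no smoothing, which a real model would need for unseen bigrams):

```python
from collections import Counter

# Toy corpus with sentence boundary markers (illustrative data).
corpus = [["<s>", "i", "like", "nlp", "</s>"],
          ["<s>", "i", "like", "cats", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))

def p_bigram(w, prev):
    """Maximum-likelihood bigram probability P(w | prev)."""
    return bigrams[(prev, w)] / unigrams[prev]
```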
2. Neural Network Based Language Model
1) Prediction Based
- MultiLayer Perceptron (MLP)
- Neural Network Language Model (NNLM) or Neural Probabilistic Language Model (NPLM)
- Recurrent Neural Network Language Model (RNNLM)
- Char Recurrent Neural Network Language Model (Char RNNLM)
- Bidirectional Language Model (biLM)
=> Evaluation metric : Perplexity (PPL)
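Perplexity is the exponential of the average negative log-likelihood a language model assigns to a held-out sequence; lower is better. A minimal sketch, taking the per-token probabilities as given:

```python
import math

def perplexity(probs):
    """Perplexity from the per-token probabilities a language model
    assigns to a sequence: exp of the average negative log-likelihood."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

# A model that assigns probability 0.25 to every token has PPL 4:
# on average it is as uncertain as a uniform choice among 4 tokens.
```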
* Task perspective
1. Text Classification
- Naive Bayes Classifier
- Recurrent Neural Network (RNN) : many to one
- Long Short-Term Memory (LSTM) : many to one
=> Evaluation metric : F1 score
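The F1 score balances precision and recall as their harmonic mean, which matters when classes are imbalanced and raw accuracy is misleading. A minimal binary-classification sketch (the labels in the test are illustrative):

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```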
2. Part-of-Speech Tagging (POS Tagging), Named Entity Recognition (NER)
- Bidirectional Long Short-Term Memory (Bi-LSTM) : many to many
- Bidirectional Long Short-Term Memory (Bi-LSTM) + Conditional Random Field (CRF) : many to many
=> Evaluation metric : F1 score
3. Machine Translation, Chatbot, Text Summarization, Speech to Text
- Long Short-Term Memory (LSTM) : sequence to sequence (seq2seq)
- Bidirectional Long Short-Term Memory (Bi-LSTM) + Attention Mechanism : sequence to sequence (seq2seq)
- Transformer
=> Evaluation metric : Bilingual Evaluation Understudy (BLEU) Score
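BLEU compares a system output against a reference via modified (clipped) n-gram precision and a brevity penalty. A simplified sketch up to bigrams; real BLEU uses up to 4-grams, multiple references, and smoothing:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of modified n-gram precisions
    (n = 1..max_n) times a brevity penalty for short candidates."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)
```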
4. Image Captioning