[NLP Concept Summary] An Easy-to-Understand Taxonomy
* Word Representation perspective (Word Embedding)
1. Discrete Representation : Local Representation
1) One-hot Vector
- One-hot Vector
2) Count Based
- Bag of Words (BoW)
- Document-Term Matrix (DTM)
- Term-Document Matrix (TDM)
- Term Frequency-Inverse Document Frequency (TF-IDF)
- N-gram Language Model (N-gram)
2. Continuous Representation
1) Prediction Based (Distributed Representation)
- Neural Network Language Model (NNLM) or Neural Probabilistic Language Model (NPLM)
- Word2Vec
- FastText
- Embeddings from Language Models (ELMo) (uses a Bidirectional Language Model (biLM))
2) Count Based (Full Document)
- Latent Semantic Analysis (LSA) <- DTM
3) Prediction Based and Count Based (Window)
- GloVe
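The count-based entries above (BoW, DTM, TDM, TF-IDF) and the "LSA <- DTM" relationship can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the toy corpus is made up, whitespace tokenization is assumed, and the unsmoothed `idf = log(N / df)` form is only one of several common TF-IDF variants.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
index = {w: i for i, w in enumerate(vocab)}

# Document-Term Matrix: one row per document, one column per term (BoW counts).
dtm = np.zeros((len(docs), len(vocab)), dtype=int)
for row, doc in enumerate(tokenized):
    for w in doc:
        dtm[row, index[w]] += 1

# The Term-Document Matrix is the transpose of the DTM.
tdm = dtm.T

# TF-IDF with idf = log(N / df); libraries often add smoothing terms.
n_docs = len(docs)
df = (dtm > 0).sum(axis=0)      # number of documents containing each term
idf = np.log(n_docs / df)
tfidf = dtm * idf

# LSA: a truncated SVD of the DTM keeps the top-k latent dimensions.
k = 2
u, s, vt = np.linalg.svd(dtm.astype(float), full_matrices=False)
doc_vectors = u[:, :k] * s[:k]  # dense k-dimensional document vectors
```

The DTM and TF-IDF matrices stay sparse and vocabulary-sized, whereas the LSA step turns the same counts into short dense vectors, which is why LSA is listed under Continuous Representation.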
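* Note
Discrete Representation expresses the value itself: a discrete representation made of integers.
Continuous Representation encodes relationships and semantic attributes: a continuous representation made of real numbers.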
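The difference between the two kinds of representation can be made concrete with cosine similarity. In the sketch below the dense vectors are hand-picked toy values, not trained embeddings; they are assumptions chosen only to show that a continuous representation can encode similarity while one-hot vectors cannot.

```python
import numpy as np

vocab = ["king", "queen", "apple"]

# Discrete (local) representation: one-hot vectors built from integer indices.
one_hot = np.eye(len(vocab), dtype=int)

# Continuous (distributed) representation: hand-picked toy values,
# NOT trained embeddings; they only illustrate the idea.
dense = np.array([
    [0.90, 0.80, 0.10],   # king
    [0.85, 0.90, 0.15],   # queen
    [0.10, 0.05, 0.95],   # apple
])

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Distinct one-hot vectors are always orthogonal, so similarity is 0.
sim_one_hot = cosine(one_hot[0], one_hot[1])

# Dense vectors can place related words closer together than unrelated ones.
sim_king_queen = cosine(dense[0], dense[1])
sim_king_apple = cosine(dense[0], dense[2])
```

With these toy values, `sim_king_queen` is higher than `sim_king_apple`, while every pair of different one-hot vectors scores exactly 0.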
* Language Model perspective
1. Statistical Language Model
1) Prediction Based
- N-gram Language Model (N-gram)
- Naive Bayes Classifier
2) Topic Modeling
- Latent Semantic Analysis (LSA)
- Latent Dirichlet Allocation (LDA)
2. Neural Network Based Language Model
1) Prediction Based
- MultiLayer Perceptron (MLP)
- Neural Network Language Model (NNLM) or Neural Probabilistic Language Model (NPLM)
- Recurrent Neural Network Language Model (RNNLM)
- Char Recurrent Neural Network Language Model (Char RNNLM)
- Bidirectional Language Model (biLM)
=> Evaluation metric : Perplexity (PPL)
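As an illustration of how a statistical language model is scored with perplexity, here is a minimal bigram model. The training sentences are made up, and add-one (Laplace) smoothing is an assumption chosen for simplicity; real systems typically use stronger smoothing or neural models.

```python
import math
from collections import Counter

# Toy training corpus (hypothetical); <s> and </s> mark sentence boundaries.
train = [
    "<s> i like nlp </s>",
    "<s> i like deep learning </s>",
    "<s> i enjoy nlp </s>",
]

tokens = [s.split() for s in train]
vocab = {w for sent in tokens for w in sent}
# Count each word as a context, i.e. every token except the sentence-final one.
contexts = Counter(w for sent in tokens for w in sent[:-1])
bigrams = Counter((a, b) for sent in tokens for a, b in zip(sent, sent[1:]))

def bigram_prob(prev, word):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def perplexity(sentence):
    """PPL = exp(-(1/N) * sum(log P(w_i | w_{i-1}))) over the N predicted tokens."""
    words = sentence.split()
    log_prob = sum(math.log(bigram_prob(a, b)) for a, b in zip(words, words[1:]))
    return math.exp(-log_prob / (len(words) - 1))

seen = perplexity("<s> i like nlp </s>")      # word order observed in training
unseen = perplexity("<s> nlp like i </s>")    # word order never observed
```

A lower perplexity means the model finds the sentence less surprising, so `seen` comes out lower than `unseen` here.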
* Task perspective
1. Text Classification
- Naive Bayes Classifier
- Recurrent Neural Network (RNN) : many to one
- Long Short-Term Memory (LSTM) : many to one
=> Evaluation metric : consider F1 score
2. Part-of-speech Tagging (POS Tagging), Named Entity Recognition
- Bidirectional Long Short-Term Memory (Bi-LSTM) : many to many
- Bidirectional Long Short-Term Memory (Bi-LSTM) + Conditional Random Field (CRF) : many to many
=> Evaluation metric : consider F1 score
3. Machine Translation, Chatbot, Text Summarization, Speech to Text
- Long Short-Term Memory (LSTM) : sequence to sequence (seq2seq)
- Bidirectional Long Short-Term Memory (Bi-LSTM) + Attention Mechanism : sequence to sequence (seq2seq)
- Transformer
=> Evaluation metric : Bilingual Evaluation Understudy Score (BLEU Score)
4. Image Captioning
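The F1 score named as the evaluation metric for the classification and tagging tasks above is the harmonic mean of precision and recall. The sketch below computes it for a binary case; the function name `f1_score`, the labels, and the predictions are all hypothetical, and multi-class or entity-level F1 (as commonly used for NER) needs additional averaging or span-matching logic not shown here.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
```

F1 is often preferred over plain accuracy when classes are imbalanced, since it penalizes both missed positives and false alarms.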