욱이의 냉철한 공부
[NLP Concept Summary] An Easy-to-Understand Taxonomy
* Word Representation perspective (Word Embedding)
1. Discrete Representation : Local Representation
1) One-hot Vector
- One-hot Vector
2) Count Based
- Bag of Words (BoW)
- Document-Term Matrix (DTM)
- Term-Document Matrix (TDM)
- Term Frequency-Inverse Document Frequency (TF-IDF)
- N-gram Language Model (N-gram)
2. Continuous Representation
1) Prediction Based (Distributed Representation)
- Neural Network Language Model (NNLM) or Neural Probabilistic Language Model (NPLM)
- Word2Vec
- FastText
- Embeddings from Language Models (ELMo) (uses a Bidirectional Language Model (biLM))
2) Count Based (Full Document)
- Latent Semantic Analysis (LSA) <- DTM
3) Prediction Based and Count Based (Windows)
- GloVe
*
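Discrete Representation expresses the value itself: a discrete representation written with integers.
Continuous Representation encodes relationships and attribute meaning: a continuous representation written with real numbers.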
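To make the discrete vs. continuous distinction concrete, here is a minimal sketch in plain Python. The toy corpus, the helper names (`one_hot`, `bag_of_words`, `cosine`) and the 2-dimensional dense vectors are all assumptions made up for illustration; real dense vectors would be learned by a model such as Word2Vec or GloVe.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, for illustration only)
docs = ["the cat sat", "the cat ate the fish"]

# Vocabulary: every distinct word gets an integer index
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Discrete / local: a single 1 at the word's index, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

def bag_of_words(doc):
    """Count based: integer frequency of each vocabulary word in the document."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

# Document-Term Matrix (DTM): one bag-of-words row per document
dtm = [bag_of_words(d) for d in docs]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up dense vectors standing in for learned embeddings (assumption)
dense = {"cat": [0.8, 0.1], "fish": [0.7, 0.3], "sat": [-0.2, 0.9]}

# One-hot vectors of two different words are always orthogonal
sim_discrete = cosine(one_hot("cat"), one_hot("fish"))
# Dense vectors can express graded similarity between words
sim_dense_close = cosine(dense["cat"], dense["fish"])
sim_dense_far = cosine(dense["cat"], dense["sat"])
```

With one-hot vectors every pair of distinct words has similarity 0, so no relationship between words is captured. The real-valued vectors can place some words closer together than others, which is the sense in which a continuous representation carries relational meaning.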
* Language Model perspective
1. Statistical Language Model
1) Prediction Based
- N-gram Language Model (N-gram)
- Naive Bayes Classifier
2) Topic Modeling
- Latent Semantic Analysis (LSA)
- Latent Dirichlet Allocation (LDA)
2. Neural Network Based Language Model
1) Prediction Based
- MultiLayer Perceptron (MLP)
- Neural Network Language Model (NNLM) or Neural Probabilistic Language Model (NPLM)
- Recurrent Neural Network Language Model (RNNLM)
- Char Recurrent Neural Network Language Model (Char RNNLM)
- Bidirectional Language Model (biLM)
=> Evaluation metric : Perplexity (PPL)
* Task perspective
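As a rough illustration of the Perplexity metric named above: PPL is the exponential of the average negative log-probability that a language model assigns to each token of a sequence, so lower is better. The function name `perplexity` and the probability lists below are assumptions for the sketch, not the output of any real model.

```python
import math

def perplexity(token_probs):
    """PPL = exp(-(1/N) * sum(log p_i)) over the N per-token probabilities."""
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)

# Hypothetical per-token probabilities from two imaginary models
uniform = [0.25] * 4                 # model guessing uniformly among 4 words
confident = [0.9, 0.8, 0.95, 0.7]    # model that predicts each token well

ppl_uniform = perplexity(uniform)
ppl_confident = perplexity(confident)
```

A model that picks uniformly among 4 candidates at every step has a perplexity of about 4, which is why PPL is often read as the effective number of choices the model is hesitating between.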
1. Text Classification
- Naive Bayes Classifier
- Recurrent Neural Network (RNN) : many to one
- Long Short-Term Memory (LSTM) : many to one
=> Evaluation metric : consider F1 score
2. Part-of-speech Tagging(POS Tagging), Named Entity Recognition
- Bidirectional Long Short-Term Memory (Bi-LSTM) : many to many
- Bidirectional Long Short-Term Memory (Bi-LSTM) + Conditional Random Field (CRF) : many to many
=> Evaluation metric : consider F1 score
3. Machine Translation, Chatbot, Text Summarization, Speech to Text
- Long Short-Term Memory (LSTM) : sequence to sequence (seq2seq)
- Bidirectional Long Short-Term Memory (Bi-LSTM) + Attention Mechanism : sequence to sequence (seq2seq)
- Transformer
=> Evaluation metric : Bilingual Evaluation Understudy Score (BLEU Score)
4. Image Captioning
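The two evaluation metrics named in the task list above, F1 score and BLEU, can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the function names and toy data are made up, the F1 is the binary version for a single positive class, and the BLEU variant uses one reference, no smoothing, and n-grams only up to 2 (the commonly reported score uses up to 4-grams).

```python
import math
from collections import Counter

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU: clipped n-gram precision x brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: punish candidates shorter than the reference
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)

# Toy labels and sentences (hypothetical)
f1 = f1_score([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
reference = "the cat sat on the mat".split()
candidate = "the cat sat on mat".split()
score_same = bleu(reference, reference)
score_partial = bleu(candidate, reference)
```

F1 suits classification and tagging tasks where class imbalance makes plain accuracy misleading. BLEU suits generation tasks where the output is compared with a reference sentence by n-gram overlap.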