[NLP Concept Notes] An Easy-to-Understand Taxonomy

냉철한 욱 2020. 4. 16. 19:23

* Word Representation perspective (Word Embedding)

1. Discrete Representation : Local Representation

1) One-hot Vector

- One-hot Vector

2) Count Based

- Bag of Words (BoW)

- Document-Term Matrix (DTM)

- Term-Document Matrix (TDM)

- Term Frequency-Inverse Document Frequency (TF-IDF)

- N-gram Language Model (N-gram)
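
The count-based items above can be sketched in a few lines. This is a minimal illustration only: the toy corpus is made up, whitespace splitting stands in for a real tokenizer, and the IDF formula is one common smoothed variant (several others are in use).

```python
import math
from collections import Counter

# Toy corpus (hypothetical example data, not from the original post).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# Whitespace tokenization is an assumption made for brevity.
tokenized = [doc.split() for doc in docs]

# Bag of Words: per-document token counts; word order is discarded.
bows = [Counter(tokens) for tokens in tokenized]

# Document-Term Matrix: rows are documents, columns are vocabulary terms.
vocab = sorted({tok for tokens in tokenized for tok in tokens})
dtm = [[bow[term] for term in vocab] for bow in bows]

# Term-Document Matrix: the transpose of the DTM.
tdm = [list(col) for col in zip(*dtm)]

def idf(term):
    # Assumed variant: log(N / (1 + df)), where df is the document frequency.
    df = sum(1 for bow in bows if term in bow)
    return math.log(len(docs) / (1 + df))

# TF-IDF: raw term frequency weighted by inverse document frequency.
tfidf = [[tf * idf(term) for tf, term in zip(row, vocab)] for row in dtm]
```

With this IDF variant, a word that appears in most documents (such as "the" here) gets a weight at or near zero, while rarer words keep a positive weight.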

2. Continuous Representation

1) Prediction Based (Distributed Representation)

- Neural Network Language Model (NNLM) or Neural Probabilistic Language Model (NPLM)

- Word2Vec

- FastText

- Embeddings from Language Models (ELMo) (uses a Bidirectional Language Model (biLM))

2) Count Based (Full Document)

- Latent Semantic Analysis (LSA) <- DTM

3) Prediction Based and Count Based (Window)

- GloVe

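A Discrete Representation expresses the value itself: a discrete representation written with integers.

A Continuous Representation encodes relationships and attribute meaning: a continuous representation written with real numbers.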
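
The difference between the two kinds of representation can be seen by comparing similarities. The sketch below is illustrative only: the three-word vocabulary is made up, and the dense vectors are hand-picked numbers, not embeddings learned by Word2Vec, GloVe, or any other model.

```python
import math

# Hypothetical three-word vocabulary.
vocab = ["king", "queen", "apple"]

def one_hot(word):
    # Discrete: a single 1 at the word's index, 0 everywhere else.
    return [1 if w == word else 0 for w in vocab]

def cosine(u, v):
    # Cosine similarity between two vectors of equal length.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Continuous: hand-picked real-valued vectors (an assumption for illustration,
# NOT learned embeddings).
dense = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.05, 0.95],
}

sim_one_hot = cosine(one_hot("king"), one_hot("queen"))   # always 0 for distinct words
sim_dense_close = cosine(dense["king"], dense["queen"])  # high for related words
sim_dense_far = cosine(dense["king"], dense["apple"])    # low for unrelated words
```

Any two distinct one-hot vectors are orthogonal, so they carry no notion of similarity. Dense vectors can place related words close together, which is the "relationships and attribute meaning" mentioned above.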

* Language Model perspective

1. Statistical Language Model

1) Prediction Based

- N-gram Language Model (N-gram)

- Naive Bayes Classifier

2) Topic Modeling

- Latent Semantic Analysis (LSA)

- Latent Dirichlet Allocation (LDA)

2. Neural Network Based Language Model

1) Prediction Based

- MultiLayer Perceptron (MLP)

- Neural Network Language Model (NNLM) or Neural Probabilistic Language Model (NPLM)

- Recurrent Neural Network Language Model (RNNLM)

- Char Recurrent Neural Network Language Model (Char RNNLM)

- Bidirectional Language Model (biLM)

=> Evaluation metric : Perplexity (PPL)
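
As a rough sketch of how perplexity is computed: given the probability a language model assigned to each of the N tokens in a test sequence, PPL is the exponential of the average negative log-probability. The probability lists below are made-up numbers, not the output of any real model.

```python
import math

def perplexity(token_probs):
    """PPL = exp(-(1/N) * sum(log p_i)) over the N per-token probabilities."""
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)

# Hypothetical per-token probabilities for two imaginary models.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])    # guesses among 4 options
confident_ppl = perplexity([0.9, 0.8, 0.95, 0.85])    # mostly sure of each token
```

Lower is better: a model that assigns 0.25 to every token has a perplexity of about 4, as if it were choosing uniformly among four candidates at each step.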

* Task perspective

1. Text Classification

- Naive Bayes Classifier

- Recurrent Neural Network (RNN) : many to one

- Long Short-Term Memory (LSTM) : many to one

=> Evaluation metric : consider the F1 score
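
For reference, a minimal sketch of the F1 score for the binary case, the harmonic mean of precision and recall. The labels below are made up, and the function name and `positive` parameter are illustrative; multi-class averaging (macro, micro) is left out.

```python
def f1_score(y_true, y_pred, positive=1):
    # Count true positives, false positives, and false negatives.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold labels and predictions for an imbalanced binary task.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f1_score(y_true, y_pred)
```

F1 is worth considering when classes are imbalanced, since plain accuracy can look good even when the minority class is mostly missed.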

2. Part-of-Speech Tagging (POS Tagging), Named Entity Recognition

- Bidirectional Long Short-Term Memory (Bi-LSTM) : many to many

- Bidirectional Long Short-Term Memory (Bi-LSTM) + Conditional Random Field (CRF) : many to many

=> Evaluation metric : consider the F1 score

3. Machine Translation, Chatbot, Text Summarization, Speech to Text

- Long Short-Term Memory (LSTM) : sequence to sequence (seq2seq)

- Bidirectional Long Short-Term Memory (Bi-LSTM) + Attention Mechanism : sequence to sequence (seq2seq)

- Transformer

=> Evaluation metric : Bilingual Evaluation Understudy Score (BLEU Score)
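
A simplified sketch of the idea behind BLEU: clipped n-gram precision combined by a geometric mean, multiplied by a brevity penalty. This version makes several assumptions for brevity: a single reference, n-grams only up to 2 (the usual setting is 4), no smoothing, and sentence-level rather than corpus-level scoring. The function names and example sentences are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Count every contiguous n-gram in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clip each candidate n-gram count by its count in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total

def simple_bleu(candidate, reference, max_n=2):
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    # Geometric mean of the n-gram precisions with uniform weights.
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_mean)

# Hypothetical reference translation and system output.
reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
score = simple_bleu(candidate, reference)
```

Here 5 of 6 unigrams and 3 of 5 bigrams match the reference, so the score is the geometric mean of those two precisions; an exact copy of the reference scores 1.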

4. Image Captioning