
[NLP Concept Summary] An Easy-to-Understand Taxonomy

냉철한 욱 2020. 4. 16. 19:23

* Word Representation Perspective (Word Embedding)

1. Discrete Representation : Local Representation

    1) One-hot Vector

        - One-hot Vector

    2) Count Based

       - Bag of Words (BoW)

       - Document-Term Matrix (DTM)

       - Term-Document Matrix (TDM)

       - Term Frequency-Inverse Document Frequency (TF-IDF)

       - N-gram Language Model (N-gram)

2. Continuous Representation

    1) Prediction Based (Distributed Representation)

        - Neural Network Language Model (NNLM) or Neural Probabilistic Language Model (NPLM)

        - Word2Vec

        - FastText

        - Embedding from Language Model (ELMo) (uses a Bidirectional Language Model (biLM))

    2) Count Based (Full Document)

        - Latent Semantic Analysis (LSA) <- built from the DTM

    3) Prediction Based and Count Based (Window-Based)

        - GloVe

*

A Discrete Representation expresses the value itself: a discrete representation encoded as integers.

A Continuous Representation expresses a word by encoding its relationships and attribute semantics: a continuous representation encoded as real numbers (the two are contrasted in the sketch below).
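To make the discrete vs. continuous contrast concrete, here is a minimal sketch (assuming scikit-learn is installed; the three documents are made up for illustration). The Document-Term Matrix and TF-IDF are sparse, count-based local representations, while LSA (a truncated SVD of the DTM, as listed under Count Based (Full Document) above) produces dense, real-valued document vectors.

```python
# Minimal sketch: discrete, count-based representations (DTM, TF-IDF) vs. a
# continuous one (LSA over the DTM). Documents are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are animals",
]

# Discrete / count-based: Document-Term Matrix of integer counts.
dtm = CountVectorizer().fit_transform(docs)
print(dtm.toarray())        # integer counts, one row per document

# TF-IDF re-weights the same counts (still a sparse, local representation here).
tfidf = TfidfVectorizer().fit_transform(docs)
print(tfidf.shape)          # same documents x vocabulary shape

# Continuous, count-based (full document): LSA = truncated SVD of the DTM,
# giving each document a dense real-valued vector in a latent semantic space.
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(dtm)
print(lsa)
```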

 


* Language Model Perspective

1. Statistical Language Model

    1) Prediction Based

      - N-gram Language Model (N-gram)

      - Naive Bayes Classifier

   2) Topic Modeling

     - Latent Semantic Analysis (LSA)

     - Latent Dirichlet Allocation (LDA)

2. Neural Network Based Language Model

   1) Prediction Based

      - Multilayer Perceptron (MLP)

      - Neural Network Language Model (NNLM) or Neural Probabilistic Language Model (NPLM)

      - Recurrent Neural Network Language Model (RNNLM)

      - Character-level Recurrent Neural Network Language Model (Char RNNLM)

      - Bidirectional Language Model (biLM)

   => Evaluation metric: Perplexity (PPL); see the bigram sketch below
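As a concrete example of the evaluation metric, here is a minimal sketch (pure Python; the toy corpus is made up for illustration) of a bigram statistical language model with add-one smoothing, scored by perplexity: PPL = exp(-(1/N) Σ log p(w_i | w_{i-1})), where lower is better.

```python
# Minimal sketch: bigram language model with add-one (Laplace) smoothing,
# evaluated by perplexity on a held-out sentence. Toy data for illustration.
import math
from collections import Counter

train = "the cat sat on the mat . the dog sat on the log .".split()
test  = "the cat sat on the log .".split()

unigrams = Counter(train)
bigrams  = Counter(zip(train, train[1:]))
V = len(unigrams)                     # vocabulary size used for smoothing

def bigram_prob(prev, word):
    # Add-one smoothing gives unseen bigrams a small nonzero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

log_prob = sum(math.log(bigram_prob(p, w)) for p, w in zip(test, test[1:]))
N = len(test) - 1                     # number of predicted tokens
ppl = math.exp(-log_prob / N)
print(f"perplexity = {ppl:.2f}")      # lower perplexity = better language model
```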

 


* Task Perspective

1. Text Classification

    - Naive Bayes Classifier

    - Recurrent Neural Network (RNN) : many to one

    - Long Short-Term Memory (LSTM) : many to one

=> Evaluation metric: consider the F1 score; see the Naive Bayes sketch below
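Here is a minimal sketch of the Naive Bayes route listed above (assuming scikit-learn; the tiny labeled set is made up for illustration), evaluated with the F1 score.

```python
# Minimal sketch: Bag-of-Words features + Multinomial Naive Bayes for text
# classification, evaluated with the F1 score. Toy data for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

texts  = ["great movie", "loved this film", "terrible acting", "boring and bad"]
labels = [1, 1, 0, 0]                      # 1 = positive, 0 = negative

vec = CountVectorizer()
X = vec.fit_transform(texts)               # Bag of Words (BoW) features
clf = MultinomialNB().fit(X, labels)

X_test = vec.transform(["bad acting", "loved this movie"])
pred = clf.predict(X_test)
print(pred, f1_score([0, 1], pred))        # F1 balances precision and recall
```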

 

2. Part-of-Speech Tagging (POS Tagging), Named Entity Recognition (NER)

    - Bidirectional Long Short-Term Memory (Bi-LSTM) : many to many

    - Bidirectional Long Short-Term Memory (Bi-LSTM) + Conditional Random Field (CRF) : many to many

=> Evaluation metric: consider the F1 score; see the Bi-LSTM tagger sketch below
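For the many-to-many setup above, here is a minimal sketch of a Bi-LSTM tagger (assuming PyTorch; the vocabulary and tag-set sizes are arbitrary placeholders). A CRF layer, if added, would replace the per-token scores with a globally normalized sequence score.

```python
# Minimal sketch: many-to-many Bi-LSTM tagger for POS tagging / NER,
# emitting one tag score vector per input token. Sizes are placeholders.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=1000, tagset_size=10, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, tagset_size)   # forward + backward states

    def forward(self, token_ids):                       # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))           # (batch, seq_len, 2*hidden)
        return self.out(h)                              # (batch, seq_len, tagset_size)

tagger = BiLSTMTagger()
scores = tagger(torch.randint(0, 1000, (2, 7)))         # 2 sentences of 7 tokens
print(scores.shape)                                     # torch.Size([2, 7, 10]): many to many
```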

3. Machine Translation, Chatbot, Text Summarization, Speech to Text

   - Long Short-Term Memory (LSTM) : sequence to sequence (seq2seq)

   - Bidirectional Long Short-Term Memory (Bi-LSTM) + Attention Mechanism : sequence to sequence (seq2seq)

   - Transformer

=> Evaluation metric: Bilingual Evaluation Understudy Score (BLEU Score); see the BLEU sketch below
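As a concrete example of the evaluation metric, here is a minimal sketch of BLEU scoring (assuming NLTK; the reference and hypothesis sentences are made up). Smoothing avoids zero scores when some higher-order n-grams of a short sentence have no match.

```python
# Minimal sketch: BLEU score of a translation hypothesis against one reference,
# as used to evaluate seq2seq / Transformer models. Sentences are made up.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference  = [["the", "cat", "is", "on", "the", "mat"]]   # list of reference token lists
hypothesis = ["the", "cat", "sat", "on", "the", "mat"]

bleu = sentence_bleu(reference, hypothesis,
                     smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {bleu:.3f}")                               # closer to 1.0 is better
```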

4. Image Captioning