This is a draft.
As of 2020, NLP is the field that tries to understand human languages with computers. My common sense says it is pragmatically impossible to “understand” our languages in numbers.
Yes, of course, scientism-minded people could say “our world consists of quantum mechanical particles, and if we have enough machine power we can simulate a human. So language is understandable with numbers.” But think for a moment about how many resources you would need to compute a human brain. This is why I said “pragmatically impossible.”
But this challenge is very, very interesting to me! I started to learn NLP because I want to see how this challenge develops. I’m not a specialist in this field at all, but at the very first step of learning I recognized that the most important part of the challenge lies in this word-to-number translation.
As of April 2020, I’ve seen on the Internet that Word2Vec is a common way to do that, so I started to investigate it.
The idea is very simple, so I thought it had been invented more than 20 years ago, but according to the Wikipedia page, the idea was published in 2013.
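To make the "word-to-number translation" a bit more concrete, here is a minimal sketch of the very first step as I understand it: give every word an integer id, then collect (center word, context word) pairs from a sliding window. This follows the skip-gram flavor of Word2Vec. The toy sentence, the function names `build_vocab` and `skipgram_pairs`, and the window size of 1 are my own assumptions for illustration, not anything taken from a library.

```python
def build_vocab(tokens):
    """Map each distinct word to an integer id, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def skipgram_pairs(tokens, vocab, window=1):
    """Return (center_id, context_id) pairs for words within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((vocab[center], vocab[tokens[j]]))
    return pairs


# Toy sentence, assumed for illustration only.
tokens = "the cat sat on the mat".split()
vocab = build_vocab(tokens)
pairs = skipgram_pairs(tokens, vocab, window=1)
print(vocab)
print(pairs)
```

These pairs are only the training data. The actual word vectors come from training a small network on them.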
I could get an overview of Word2Vec from this video while lying in bed. Awesome.
https://www.youtube.com/watch?v=64qSgA66P-8
A fun hands-on practice.
https://www.youtube.com/watch?v=zFScws0mb7M
His code.
https://github.com/SmokinCaterpillar/doc2vec_user_comments
I want to learn Doc2Vec, and the following page is pretty awesome.
https://shuzhanfan.github.io/2018/08/understanding-word2vec-and-doc2vec/
About negative sampling.
https://aegis4048.github.io/optimize_computational_efficiency_of_skip-gram_with_negative_sampling
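As a note to myself, here is a rough sketch of how the negative words can be drawn. The Word2Vec paper samples them from the word-frequency distribution raised to the power 0.75, which gives rare words a slightly better chance than their raw frequency would. The toy corpus, the function names, and the choice to exclude both the center and the context word are my own assumptions, not the reference implementation.

```python
import random
from collections import Counter


def negative_sampling_probs(tokens, power=0.75):
    """Unigram distribution raised to `power`, then renormalized to sum to 1."""
    counts = Counter(tokens)
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def sample_negatives(probs, exclude, k, rng):
    """Draw k negative words, skipping the words of the positive pair."""
    candidates = [w for w in probs if w not in exclude]
    weights = [probs[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=k)


# Toy corpus, assumed for illustration only.
tokens = "the cat sat on the mat the end".split()
probs = negative_sampling_probs(tokens)
rng = random.Random(0)  # fixed seed so the run is repeatable
negatives = sample_negatives(probs, exclude={"cat", "sat"}, k=3, rng=rng)
print(probs)
print(negatives)
```

During training, only the positive pair and these few negatives are updated, instead of computing a softmax over the whole vocabulary.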
RNN for POS tagging https://www.youtube.com/watch?v=2AuMgtw-z6s
(To be added.)
Word2Vec is a neural network with a single hidden layer.
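To check my understanding of that network, here is a minimal forward pass in plain Python. It is a sketch with random, untrained weights: the input word selects one row of the input matrix (that row is the word vector), and the output matrix plus a softmax turn it into a probability for every word in the vocabulary. The sizes, the names `w_in` and `w_out`, and the use of a full softmax instead of negative sampling are my own assumptions for illustration.

```python
import math
import random


def init_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(center_id, w_in, w_out):
    """One-hot input times w_in is just a row lookup; there is no nonlinearity."""
    hidden = w_in[center_id]
    scores = [sum(h * o for h, o in zip(hidden, out_row)) for out_row in w_out]
    return softmax(scores)


vocab_size, dim = 5, 3  # toy sizes, assumed for illustration
rng = random.Random(0)
w_in = init_matrix(vocab_size, dim, rng)   # rows are the word vectors
w_out = init_matrix(vocab_size, dim, rng)  # rows score each context word
probs = forward(0, w_in, w_out)
print(probs)
```

After training, the rows of `w_in` are what people usually call the word embeddings.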
Original paper on Doc2Vec.
https://arxiv.org/abs/1405.4053
spaCy + Doc2Vec
https://www.shanelynn.ie/word-embeddings-in-python-with-spacy-and-gensim/
Attention
https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
https://explosion.ai/blog/sense2vec-with-spacy
https://www.youtube.com/watch?v=64qSgA66P-8
These still need to be checked.