Ideas to keep in mind

  1. Use Word2Vec to get word vectors, possibly retraining it on domain text so that technical terms and domain-specific vocabulary are covered.
  2. Either capture attention (HAN style, i.e. Hierarchical Attention Networks) or take a weighted average of the word vectors using TF-IDF weights.
  3. Apply PCA to the vectors to reduce their dimensionality.
  4. Try convolutional layers with flat (1D) kernels.
  5. Word embeddings with BERT or XLNet.
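For item 2, a minimal sketch of the TF-IDF-weighted average, assuming each document is already tokenized and each token maps to a pretrained vector. The `EMBEDDINGS` dictionary and the toy corpus are hypothetical stand-ins for Word2Vec output, and the smoothed IDF formula is one common choice (the form scikit-learn uses with `smooth_idf=True`), not the only one.

```python
import math
from collections import Counter

import numpy as np

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
EMBEDDINGS = {
    "the": np.array([0.0, 0.0, 1.0]),
    "pump": np.array([1.0, 0.0, 0.0]),
    "valve": np.array([0.0, 1.0, 0.0]),
    "leaks": np.array([1.0, 1.0, 0.0]),
}


def idf(corpus):
    """Smoothed IDF, log((1 + n_docs) / (1 + doc_freq)) + 1, for every corpus token."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}


def tfidf_average(doc, idf_weights, embeddings):
    """Average the vectors of `doc`'s tokens, each weighted by its TF-IDF score."""
    dim = len(next(iter(embeddings.values())))
    counts = Counter(tok for tok in doc if tok in embeddings)
    # Weight = term frequency in this doc * IDF; tokens unseen in the corpus get IDF 0.
    weights = {tok: tf * idf_weights.get(tok, 0.0) for tok, tf in counts.items()}
    total = sum(weights.values())
    if total == 0:
        return np.zeros(dim)  # no usable tokens: fall back to the zero vector
    return sum(w * embeddings[tok] for tok, w in weights.items()) / total


corpus = [["the", "pump", "leaks"], ["the", "valve"], ["the", "pump"]]
weights = idf(corpus)
doc_vec = tfidf_average(corpus[0], weights, EMBEDDINGS)
```

Because "the" appears in every document it gets the smallest weight, so it pulls the averaged vector less than the rarer, more informative words.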
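The attention option in item 2 can be sketched as the word-level attention layer of a Hierarchical Attention Network: each hidden state h_t is projected to u_t = tanh(W h_t + b), scored against a context vector u_w, normalized with a softmax, and the resulting weights average the hidden states. This is a forward-pass sketch only; the parameters below are random and untrained, purely to show the shapes. In a real model `W`, `b` and `u_w` would be learned jointly with the classifier.

```python
import numpy as np


def attention_pool(h, W, b, u_w):
    """HAN-style word attention: score each hidden state, softmax, weighted sum.

    h:   (T, d) hidden states (word vectors or RNN outputs)
    W:   (a, d) projection, b: (a,) bias, u_w: (a,) context vector
    """
    u = np.tanh(h @ W.T + b)                        # (T, a) per-word representation
    scores = u @ u_w                                # (T,) similarity to the context vector
    scores = scores - scores.max()                  # shift for a numerically stable softmax
    alpha = np.exp(scores) / np.exp(scores).sum()   # (T,) attention weights, sum to 1
    return alpha @ h, alpha                         # (d,) sentence vector, plus the weights


# Random, untrained parameters used only to illustrate shapes.
rng = np.random.default_rng(0)
T, d, a = 5, 8, 4
h = rng.normal(size=(T, d))
W = rng.normal(size=(a, d))
b = np.zeros(a)
u_w = rng.normal(size=a)
sentence_vec, alpha = attention_pool(h, W, b, u_w)
```

Unlike the fixed TF-IDF weights, these weights depend on the context and are trained for the task.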
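For item 3, a minimal PCA sketch in NumPy using the SVD of the mean-centered matrix. The random 100 x 300 matrix is a stand-in for 100 word vectors of dimension 300, and `k = 50` is an arbitrary illustrative choice; with real embeddings you would pick `k` by looking at how much variance the leading components explain.

```python
import numpy as np


def pca(X, k):
    """Project rows of X (n_samples, n_features) onto the top-k principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                          # (k, n_features)
    explained_var = S[:k] ** 2 / (len(X) - 1)    # variance captured by each component
    return X_centered @ components.T, components, explained_var


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))  # stand-in for 100 embedding vectors of dimension 300
reduced, comps, var = pca(X, 50)
```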
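Item 4 refers to convolutions whose kernels span the whole embedding dimension, so they slide only along the token axis. Below is a minimal NumPy sketch of the forward pass, followed by ReLU and max-over-time pooling; the pooling step is a common design for CNN text classifiers and is an assumption here, not something the notes specify. Kernel weights are random stand-ins for learned ones.

```python
import numpy as np


def conv1d_flat(embeddings, kernels, bias):
    """Convolve 'flat' kernels over a sequence of word embeddings.

    embeddings: (T, d) one row per token
    kernels:    (n_filters, width, d); each kernel covers the full embedding
                dimension, so it only moves along the token axis (1D).
    bias:       (n_filters,)
    Returns ReLU feature maps of shape (T - width + 1, n_filters).
    """
    T, d = embeddings.shape
    n_filters, width, kd = kernels.shape
    if kd != d:
        raise ValueError("kernel depth must equal the embedding dimension")
    out = np.empty((T - width + 1, n_filters))
    for t in range(T - width + 1):
        window = embeddings[t:t + width]  # (width, d) slice of consecutive tokens
        # Dot each kernel with the window over both the width and embedding axes.
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU


rng = np.random.default_rng(1)
emb = rng.normal(size=(10, 16))         # 10 tokens, 16-dim embeddings
kernels = rng.normal(size=(3, 3, 16))   # 3 filters, each 3 tokens wide
bias = np.zeros(3)
feature_maps = conv1d_flat(emb, kernels, bias)   # (8, 3)
sentence_features = feature_maps.max(axis=0)     # (3,) max-over-time pooling
```

Max-over-time pooling turns a variable-length sentence into a fixed-size vector with one feature per filter, which can then feed a dense classifier.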

Lots of NLP notebooks: check out https://github.com/nlptown/nlp-notebooks

https://blog.insightdatascience.com/how-to-solve-90-of-nlp-problems-a-step-by-step-guide-fda605278e4e