Ideas to keep in mind
- Use Word2Vec to get word vectors, possibly retraining it on the technical terms and vocabulary specific to the domain.
- Combine the word vectors either with attention (HAN style) or with a TF-IDF-weighted average.
- Use PCA on the vectors to reduce the dimensionality.
- Try a convolutional layer with flat (1D) kernels.
- Word embeddings with BERT, XLNet.
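
The TF-IDF weighted average idea above could look roughly like the sketch below. It is only an illustration in plain Python: the toy corpus, the 2-dimensional vectors and the helper names `idf_weights` and `tfidf_weighted_average` are all made up here, and the smoothed IDF formula is one common choice, not necessarily the one to settle on. In practice the vectors would come from the (re)trained Word2Vec model.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(docs)
    # Count in how many documents each token appears (not how often).
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf_weighted_average(doc, vectors, idf):
    """Average the word vectors of `doc`, weighting each token by TF * IDF."""
    # Tokens without a vector or an IDF value are skipped.
    counts = Counter(tok for tok in doc if tok in vectors and tok in idf)
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in counts.items():
        weight = (count / len(doc)) * idf[tok]
        weight_sum += weight
        for i, value in enumerate(vectors[tok]):
            total[i] += weight * value
    if weight_sum == 0.0:
        return total  # no known tokens: fall back to the zero vector
    return [value / weight_sum for value in total]


# Hypothetical toy data standing in for a tokenized corpus and Word2Vec output.
docs = [
    ["pump", "failure", "valve"],
    ["valve", "leak"],
    ["pump", "pressure", "valve"],
]
vectors = {
    "pump": [1.0, 0.0],
    "failure": [0.0, 1.0],
    "valve": [0.5, 0.5],
    "leak": [0.0, 2.0],
    "pressure": [2.0, 0.0],
}

idf = idf_weights(docs)
doc_vector = tfidf_weighted_average(docs[0], vectors, idf)
print(doc_vector)
```

Rare tokens such as "failure" pull the document vector toward themselves more than tokens that appear in every document, such as "valve".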
Lots of NLP notebooks: check out https://github.com/nlptown/nlp-notebooks