This is a sentiment classification project using NLP for my course. The project uses a simple feedforward neural network (FNN) to classify sentences into a negative (0) or positive (1) class.
- train.txt: used for training; each line follows the pattern
[label] [sentence]
- dev.txt: used for evaluation (testing); each line follows the pattern
[label] [sentence]
- glove file: contains a pre-trained embedding for each word; each line follows the pattern
[word] [embeddings]
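For illustration, each line of these files can be parsed by splitting on whitespace; the helper names below are hypothetical and the exact delimiter handling is an assumption:

```python
# Minimal sketch of parsing one line from each file format.

def parse_labeled_line(line):
    """'[label] [sentence]' -> (int label, str sentence)."""
    label, sentence = line.strip().split(" ", 1)
    return int(label), sentence

def parse_glove_line(line):
    """'[word] [embeddings]' -> (str word, list of float values)."""
    parts = line.strip().split()
    return parts[0], [float(x) for x in parts[1:]]

label, sentence = parse_labeled_line("1 this movie was great")
word, vector = parse_glove_line("great 0.013 -0.24 0.51")
```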
- Data is prepared by first loading the GloVe file into a dictionary that maps each word to its embedding.
- The training & dev files are split into lines, normalized to lower case, and stripped of all special symbols.
- The lines are then tokenized using WhiteSpaceTokenizer() (a sketch of these steps follows below).
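A minimal sketch of this preparation step, assuming the GloVe file is plain text and that WhiteSpaceTokenizer() refers to NLTK's WhitespaceTokenizer (plain whitespace splitting); the function names are illustrative:

```python
import re
from nltk.tokenize import WhitespaceTokenizer

_tokenizer = WhitespaceTokenizer()

def load_glove(path):
    # Build the word -> embedding dictionary used to map words later.
    glove = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            glove[parts[0]] = [float(x) for x in parts[1:]]
    return glove

def normalize_and_tokenize(sentence):
    # Lower-case, drop special symbols, then split on whitespace.
    cleaned = re.sub(r"[^a-z0-9\s]", "", sentence.lower())
    return _tokenizer.tokenize(cleaned)
```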
A custom dataset was created using torch.utils.data.Dataset. It implements the __len__ and __getitem__ functions; __getitem__ uses the GloVe dictionary to map the input tokens to embeddings and returns torch tensors of both the input features and the labels. A DataLoader is then used to load the train and dev data for training and evaluation.
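A sketch of the dataset and loaders based on the description above; averaging the word embeddings into one fixed-size vector (so the FNN receives a fixed-length input), the 50-dimensional embedding size, and the batch size of 1 are assumptions, as are the variable names train_examples, dev_examples, and glove carried over from the preparation step:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SentimentDataset(Dataset):
    # examples: list of (label, tokens) pairs from the preprocessing step;
    # glove: the word -> embedding dictionary.
    def __init__(self, examples, glove, dim=50):
        self.examples = examples
        self.glove = glove
        self.dim = dim  # embedding size (assumed 50-dimensional GloVe)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        label, tokens = self.examples[idx]
        # Map each token to its GloVe vector; words not in GloVe are skipped.
        vectors = [self.glove[t] for t in tokens if t in self.glove]
        if vectors:
            features = torch.tensor(vectors).mean(dim=0)  # average into one vector
        else:
            features = torch.zeros(self.dim)
        return features, torch.tensor(label, dtype=torch.float32)

train_loader = DataLoader(SentimentDataset(train_examples, glove), batch_size=1)
dev_loader = DataLoader(SentimentDataset(dev_examples, glove), batch_size=1)
```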
The model has a simple architecture: the inputs are first passed through a Linear layer, which transforms them via a matrix multiplication, followed by a ReLU activation. A second Linear layer then transforms the hidden-layer outputs, and a sigmoid function maps the result to a probability for the positive class.
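A minimal sketch of that architecture; the 50-dimensional input (matching the assumed GloVe size) and the hidden size of 64 are assumptions:

```python
import torch.nn as nn

class FNN(nn.Module):
    def __init__(self, input_dim=50, hidden_dim=64):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)  # matrix transform of the input
        self.relu = nn.ReLU()                        # activation on the hidden layer
        self.fc2 = nn.Linear(hidden_dim, 1)          # transform the hidden outputs
        self.sigmoid = nn.Sigmoid()                  # map to a value between 0 and 1

    def forward(self, x):
        return self.sigmoid(self.fc2(self.relu(self.fc1(x))))
```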
The model is trained with the Adam optimizer using a learning rate of 1e-4.
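A sketch of the training loop; only Adam and the 1e-4 learning rate come from the description above, while the BCELoss criterion (a natural pairing with the sigmoid output) and the epoch count are assumptions:

```python
import torch

# FNN and train_loader come from the sketches above.
model = FNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.BCELoss()  # binary cross-entropy; this pairing is an assumption

for epoch in range(10):  # number of epochs is an assumption
    model.train()
    for features, labels in train_loader:
        optimizer.zero_grad()
        preds = model(features).squeeze(1)  # (batch, 1) -> (batch,)
        loss = criterion(preds, labels)
        loss.backward()
        optimizer.step()
```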
Possible improvements:
- A collate function could be used to pad each batch so that all sentence lengths are covered (see the sketch after this list).
- Further text preprocessing to remove stop words and recognize named entities.
- Loading the training data in batches (batch size > 1).
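For the first item, a padding collate function might look like the sketch below; it assumes __getitem__ would be changed to return a variable-length [sentence_length x embedding_dim] tensor of word embeddings instead of a single pooled vector:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_collate(batch):
    # batch: list of (embedding sequence tensor [len x dim], label) pairs
    # of varying length, per the assumption above.
    sequences, labels = zip(*batch)
    padded = pad_sequence(sequences, batch_first=True)  # pad to the longest sentence
    lengths = torch.tensor([len(s) for s in sequences])
    return padded, lengths, torch.stack(labels)

# Example usage (batch_size and collate_fn would replace the loaders above):
# loader = DataLoader(dataset, batch_size=32, collate_fn=pad_collate)
```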