- In a modern world that demands a fast, robust, and simple development process for Machine Learning, AI, and Deep Learning, countless projects involve Natural Language Processing (NLP) classification problems such as Commodity Classification, Company Type Classification, and Food Type Classification.
More and more people want to train, test, and deploy NLP classification models without needing an advanced background in programming or AI.
- This framework allows anyone to train, test, save, and load their own model and deploy it wherever they want with a few simple lines of code.
- It is recommended to create a virtual environment for your project when using CatMod, as it downloads and installs packages and dependencies that might conflict with those already on your machine.
- If you don't mind which versions of the libraries listed in requirements.txt get installed, you can leave your environment as it is.
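If you are unsure whether a virtual environment is active, the following minimal sketch checks it from inside Python. It uses only the standard library; the helper name `in_virtualenv` is hypothetical and not part of CatMod, and the check assumes an environment created with the standard `venv` module (other environment managers, such as conda, are not detected this way).

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    # Inside an environment created with `python -m venv`, sys.prefix points at
    # the environment, while sys.base_prefix points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```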
- You can use pip to install the package on your computer.
pip install cat-mod
- Import CatMod in your Python file.
from cat_mod import CatMod
- Download a GloVe Embedded Vectors File to the desired folder.
- Instantiate a new instance with the GloVe Embedded Vectors File.
cat = CatMod('[your_GloVe_file_path]')
e.g.
# Windows
file_path = 'C:/User/Desktop/glove.6B.50d.txt'
cat = CatMod(glove_file = file_path)
# macOS
file_path = '/Users/yourName/Desktop/glove.6B.50d.txt'
cat = CatMod(glove_file = file_path)
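Because the file paths above differ between operating systems, it can help to build the path with `pathlib` and confirm that the file exists before creating the instance. The sketch below is only an illustration: the helper `glove_path` and its default folder (the Desktop of the current user) are assumptions, not part of CatMod.

```python
from pathlib import Path


def glove_path(filename="glove.6B.50d.txt", folder=None):
    """Return the GloVe file path as a string, or raise if the file is missing.

    `folder` defaults to the Desktop of the current user (an assumption made
    for this example only).
    """
    base = Path(folder) if folder is not None else Path.home() / "Desktop"
    path = base / filename
    if not path.is_file():
        raise FileNotFoundError(f"GloVe file not found: {path}")
    return str(path)


# Hypothetical usage, mirroring the examples above:
# cat = CatMod(glove_file = glove_path())
```

The same call works on Windows and macOS because `pathlib` picks the right path separator for the platform.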
This framework accepts a .csv file with any number of columns, but you have to specify the 2 columns that correspond to the values (X) and the targets (Y).
Let's say you have a csv file product.csv with columns that look like this:
company name | product name | category |
---|---|---|
... | ... | ... |
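Since the X and Y columns are referred to by name, a typo in a column name is an easy mistake to make. The following sketch, which uses only the standard `csv` module, checks that both names appear in the header row before the file is handed to CatMod. The helper name `check_columns` is hypothetical, and the sketch assumes a comma-separated file whose first row is the header.

```python
import csv


def check_columns(csv_path, x_column, y_column):
    """Return the header row if both columns are present, otherwise raise KeyError."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # The first row of the file is assumed to be the header.
        header = next(csv.reader(f), [])
    missing = [name for name in (x_column, y_column) if name not in header]
    if missing:
        raise KeyError(f"Columns not found in {csv_path}: {missing}")
    return header


# Hypothetical usage with the example file described above:
# check_columns('product.csv', 'product name', 'category')
```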
- There are 2 ways to load the csv file and the pre-defined model into the instance.
cat.load_csv('[your_csv_file_path]', '[X_column_name]', '[Y_column_name]')
cat.load_model()
e.g.
cat.load_csv('product.csv', 'product name', 'category')
cat.load_model()
Or, the RECOMMENDED way:
cat.load_model('[your_csv_file_path]', '[X_column_name]', '[Y_column_name]')
e.g.
cat.load_model('product.csv', 'product name', 'category')
You can also specify how many LSTM layers you want by adding the corresponding parameter.
cat.load_model('product.csv', 'product name', 'category', num_of_LSTM = 4)
Then train the model with one more simple step:
cat.train([number_of_iterations])
e.g.
cat.train(10)
If the number of iterations is not specified, it defaults to 50. e.g.
cat.train() # 50 iterations
After training, you can save your model on your local machine by using the .save_weights([name]) method. (No file name suffix is needed.)
cat.save_weights('my_model')
If the model is saved successfully, a folder with that name appears in your project folder:
ProjectFolder
|---main.py
|---my_model
| |---...
| |
|
...
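To confirm from code that the save step produced the folder shown above, a small check along these lines may be useful. It is a sketch under assumptions: the helper `saved_model_exists` is hypothetical, and it only verifies that a non-empty folder with the given name exists, not that its contents form a valid model.

```python
from pathlib import Path


def saved_model_exists(name, project_dir="."):
    """Return True if `project_dir/name` is a folder containing at least one entry."""
    folder = Path(project_dir) / name
    # An empty or missing folder is treated as "not saved".
    return folder.is_dir() and any(folder.iterdir())


# Hypothetical usage after cat.save_weights('my_model'):
# saved_model_exists('my_model')
```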
Once the trained model has been saved, you can reuse it in the future by loading it back into a new instance.
There are 2 ways of doing it.
The RECOMMENDED way:
from cat_mod import CatMod
new_cat = CatMod(load_mode = True, load_file = 'my_model')
The other way:
from cat_mod import CatMod
new_cat = CatMod(glove_file_path = [the_GloVe_file_path_but_it_must_have_the_same_dimension_with_the_pre_trained_model])
new_cat.load_weights('my_model')
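The second way requires a GloVe file with the same dimension as the one used for training. GloVe text files store one token per line followed by its vector components, so the dimension can be read off the first line. The sketch below does this with the standard library only; the helper name `glove_dimension` is hypothetical and not part of CatMod.

```python
def glove_dimension(path):
    """Return the vector dimension of a GloVe text file by inspecting its first line."""
    with open(path, encoding="utf-8") as f:
        parts = f.readline().split()
    if len(parts) < 2:
        raise ValueError(f"{path} does not look like a GloVe text file")
    # Everything after the token must parse as a float; float() raises
    # ValueError if the line is not in the expected format.
    for value in parts[1:]:
        float(value)
    return len(parts) - 1


# Hypothetical usage: a file such as glove.6B.50d.txt should report 50.
# glove_dimension('C:/User/Desktop/glove.6B.50d.txt')
```

Comparing this number for the file used during training and the file used when loading is a quick way to catch a mismatch before calling `load_weights`.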
Prediction is the easiest step, and it offers several options so that you can export the prediction results in .pd, .csv, or .xlsx format as needed. e.g.
# df is assumed to be a pandas DataFrame whose column 'X' holds the texts to classify
X = df['X']
new_cat.predict(X, to_csv = True, file_name = 'my_prediction')
The exported csv file contains both the X and Y columns together.