- This repository is adapted for Python 3; the original repository targeted Python 2.
This code implements the model described in the paper "Deep Learning-Based Document Modeling for Personality Detection from Text" for detecting the Big Five personality traits, namely:
- Extroversion
- Neuroticism
- Agreeableness
- Conscientiousness
- Openness
Requirements:
- Ubuntu 16.04, 64-bit (tested)
- Python 3 (tested)
- Theano 1.0.4 (tested)
- Pandas 0.24.2 (tested)
- Pre-trained GoogleNews word2vec vectors (if you are using SSH, try this)
process_data.py prepares the data for training. It requires three command-line arguments:
- Path to the Google word2vec file (GoogleNews-vectors-negative300.bin)
- Path to the essays.csv file containing the annotated dataset
- Path to mairesse.csv, containing the Mairesse features for each sample/essay
This code generates a pickle file, essays_mairesse.p.
Example:
python process_data.py ./GoogleNews-vectors-negative300.bin ./essays.csv ./mairesse.csv
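Because process_data.py takes its three paths positionally, a small wrapper can catch a missing input file before the slow word2vec load starts. This is a minimal sketch: the helper name build_process_data_cmd and the pre-flight existence check are hypothetical additions, not part of the repository; the helper only assembles the documented command in the documented argument order.

```python
import os
import sys


def build_process_data_cmd(w2v_path, essays_path, mairesse_path, check_files=True):
    """Hypothetical helper: assemble the documented process_data.py call.

    Argument order follows this README: word2vec binary, essays.csv, mairesse.csv.
    """
    paths = [w2v_path, essays_path, mairesse_path]
    if check_files:
        # Fail early instead of partway through preprocessing.
        missing = [p for p in paths if not os.path.isfile(p)]
        if missing:
            raise FileNotFoundError("missing input file(s): " + ", ".join(missing))
    # Use the current interpreter so the Python 3 environment is reused.
    return [sys.executable, "process_data.py"] + paths
```

Usage: pass the result to subprocess.run(cmd, check=True); on success the script writes essays_mairesse.p as described above.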
A. Running on CPU
- Configure ~/.theanorc:
[global]
floatX=float64
OMP_NUM_THREADS=20
openmp=True
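If you prefer to generate the configuration file rather than edit it by hand, the CPU settings above can be rendered with Python's standard configparser module. This is a sketch under stated assumptions: render_theanorc is a hypothetical helper, and writing its output to ~/.theanorc overwrites any existing file.

```python
import configparser
import io

# Settings copied verbatim from the CPU ~/.theanorc example above.
CPU_THEANORC = {
    "global": {
        "floatX": "float64",
        "OMP_NUM_THREADS": "20",
        "openmp": "True",
    }
}


def render_theanorc(sections):
    """Hypothetical helper: render {section: {key: value}} as INI text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case (e.g. floatX) instead of lowercasing
    parser.read_dict(sections)
    buf = io.StringIO()
    parser.write(buf)  # writes "key = value" lines under each [section]
    return buf.getvalue()
```

Usage: write render_theanorc(CPU_THEANORC) to os.path.expanduser("~/.theanorc").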
B. Running on GPU
- Install libgpuarray
- Install cuDNN for faster training
- Add the CUDA paths to ~/.bashrc:
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=${CUDA_HOME}/bin:${PATH}
- Configure ~/.theanorc:
[cuda]
root=/usr/local/cuda
[global]
device=cuda
floatX = float32
OMP_NUM_THREADS=20
openmp=True
[nvcc]
fastmath=True
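After updating ~/.bashrc and opening a new shell, a quick check can confirm that the CUDA directories actually ended up on the search paths. This is a minimal sketch: cuda_env_problems is a hypothetical helper, it assumes the /usr/local/cuda layout used above, and it matches path entries literally (a trailing slash would count as a mismatch).

```python
import os


def cuda_env_problems(env, cuda_home_default="/usr/local/cuda"):
    """Hypothetical helper: list problems with the CUDA variables in `env`.

    `env` is a mapping such as os.environ; an empty list means CUDA_HOME is
    set and its bin/ and lib64/ directories are on PATH and LD_LIBRARY_PATH.
    """
    problems = []
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
        cuda_home = cuda_home_default  # still check the expected directories
    bin_dir = os.path.join(cuda_home, "bin")
    lib_dir = os.path.join(cuda_home, "lib64")
    if bin_dir not in env.get("PATH", "").split(os.pathsep):
        problems.append(bin_dir + " is not on PATH")
    if lib_dir not in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        problems.append(lib_dir + " is not on LD_LIBRARY_PATH")
    return problems
```

Usage: print(cuda_env_problems(os.environ)) should print an empty list once the exports are in effect.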
Note: Before these changes, each epoch took about 5 hours to complete. With them, an epoch took less than an hour on CPU and about 45 s on GPU (the actual speedup depends on your hardware).
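Theano also reads settings from the THEANO_FLAGS environment variable (comma-separated key=value pairs, with non-global options written as section.key), which makes it possible to try the GPU settings for a single run without editing ~/.theanorc. This is a sketch under stated assumptions: theano_flags and training_env are hypothetical helpers, and because OpenMP reads OMP_NUM_THREADS from the process environment, the sketch sets it as a plain environment variable rather than as a Theano flag.

```python
import os

# GPU settings mirrored from the ~/.theanorc example, as (section, key, value).
GPU_FLAGS = [
    ("cuda", "root", "/usr/local/cuda"),
    ("global", "device", "cuda"),
    ("global", "floatX", "float32"),
    ("global", "openmp", "True"),
    ("nvcc", "fastmath", "True"),
]


def theano_flags(flags):
    """Hypothetical helper: build a THEANO_FLAGS string, writing options
    outside [global] as section.key."""
    parts = []
    for section, key, value in flags:
        name = key if section == "global" else section + "." + key
        parts.append(name + "=" + value)
    return ",".join(parts)


def training_env(flags, omp_threads=20, base=None):
    """Hypothetical helper: copy `base` (default os.environ) and add
    THEANO_FLAGS and OMP_NUM_THREADS for one training run."""
    env = dict(os.environ if base is None else base)
    env["THEANO_FLAGS"] = theano_flags(flags)
    env["OMP_NUM_THREADS"] = str(omp_threads)
    return env
```

Usage: subprocess.run([sys.executable, "conv_net_train_gpu.py", "-static", "-word2vec", "2"], env=training_env(GPU_FLAGS), check=True).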
A. Running on GPU
conv_net_train_gpu.py trains and tests the model on the GPU. (Alternatively, run run.sh to train models for all traits with word2vec embeddings in one go.)
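The all-traits run can also be driven from Python. This sketch is not the contents of run.sh (which is not reproduced here); all_trait_commands and train_all are hypothetical helpers, and the mode is a parameter because this README does not say which mode run.sh uses. Trait indices 0-4 follow the Personality Trait argument list further down.

```python
import subprocess
import sys

TRAITS = ["Extroversion", "Neuroticism", "Agreeableness", "Conscientiousness", "Openness"]


def all_trait_commands(mode="-static", script="conv_net_train_gpu.py"):
    """Hypothetical helper: one word2vec training command per trait index 0-4."""
    return [[sys.executable, script, mode, "-word2vec", str(i)] for i in range(len(TRAITS))]


def train_all(mode="-static", dry_run=True):
    """Run the per-trait commands one after another (or only print them)."""
    for cmd in all_trait_commands(mode):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing trait
```

Running the traits sequentially keeps a single model on the GPU at a time; call train_all(dry_run=False) once the printed commands look right.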
B. Running on CPU
conv_net_train.py trains and tests the model on the CPU.
Both scripts require three command-line arguments:
- Mode:
  - -static: word embeddings remain fixed
  - -nonstatic: word embeddings are trained
- Word embedding type:
  - -rand: randomly initialized word embeddings (300-dimensional by default; the dimension is hard-coded and can be changed by modifying the default value of k in line 111 of process_data.py)
  - -word2vec: 300-dimensional Google pre-trained word embeddings
- Personality trait:
  - 0: Extroversion
  - 1: Neuroticism
  - 2: Agreeableness
  - 3: Conscientiousness
  - 4: Openness
Example:
python conv_net_train.py -static -word2vec 2
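Before launching a long training run, the three arguments can be checked against the options listed above. This is a minimal sketch: check_train_args is a hypothetical pre-check and does not reflect how conv_net_train.py or conv_net_train_gpu.py parse their own arguments.

```python
# Option values taken from the argument list above.
MODES = {"-static": False, "-nonstatic": True}  # value: are embeddings trained?
EMBEDDINGS = ("-rand", "-word2vec")
TRAITS = ("Extroversion", "Neuroticism", "Agreeableness", "Conscientiousness", "Openness")


def check_train_args(argv):
    """Hypothetical helper: validate [mode, embedding, trait] and return a
    readable summary; raise ValueError for anything not documented."""
    if len(argv) != 3:
        raise ValueError("expected 3 arguments: mode, embedding type, trait index")
    mode, embedding, trait = argv
    if mode not in MODES:
        raise ValueError("mode must be -static or -nonstatic, got %r" % mode)
    if embedding not in EMBEDDINGS:
        raise ValueError("embedding must be -rand or -word2vec, got %r" % embedding)
    if trait not in {str(i) for i in range(len(TRAITS))}:
        raise ValueError("trait must be an integer 0-4, got %r" % trait)
    return {
        "train_embeddings": MODES[mode],
        "embedding": embedding.lstrip("-"),
        "trait": TRAITS[int(trait)],
    }
```

For the example command above, check_train_args(["-static", "-word2vec", "2"]) reports fixed word2vec embeddings and the Agreeableness trait.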
If you use this code in your work, please cite the paper "Deep Learning-Based Document Modeling for Personality Detection from Text" with the following:
@ARTICLE{7887639,
author={N. Majumder and S. Poria and A. Gelbukh and E. Cambria},
journal={IEEE Intelligent Systems},
title={{Deep} Learning-Based Document Modeling for Personality Detection from Text},
year={2017},
volume={32},
number={2},
pages={74-79},
keywords={feedforward neural nets;information filtering;learning (artificial intelligence);pattern classification;text analysis;Big Five traits;author personality type;author psychological profile;binary classifier training;deep convolutional neural network;deep learning based method;deep learning-based document modeling;document vector;document-level Mairesse features;emotionally neutral input sentence filtering;identical architecture;personality detection;text;Artificial intelligence;Computational modeling;Emotion recognition;Feature extraction;Neural networks;Pragmatics;Semantics;artificial intelligence;convolutional neural network;distributional semantics;intelligent systems;natural language processing;neural-based document modeling;personality},
doi={10.1109/MIS.2017.23},
ISSN={1541-1672},
month={Mar},}