Personal-ITY is a corpus of YouTube comments annotated with MBTI personality labels.
Language: Italian
- 1048 users
- 92 average comments/user
- 115 average tokens/comment
Each line in the files describes one author. The fields, in order and separated by a tab ("\t"), are:
- YouTube username
- MBTI label
- list of comments
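A minimal sketch of reading one file with the three tab-separated fields listed above. The names `AuthorRecord`, `parse_line` and `read_corpus`, the UTF-8 encoding, and the choice to keep the third field as a raw string are assumptions made for illustration; how the list of comments is encoded inside that field is not specified here.

```python
from typing import Iterator, NamedTuple


class AuthorRecord(NamedTuple):
    """One line of the corpus (hypothetical container, not part of the release)."""
    username: str
    mbti: str
    comments: str  # raw third field, left unparsed


def parse_line(line: str) -> AuthorRecord:
    # Split on the first two tabs only, so any tab inside the comments
    # field is preserved rather than producing extra columns.
    username, mbti, comments = line.rstrip("\n").split("\t", 2)
    return AuthorRecord(username, mbti, comments)


def read_corpus(path: str) -> Iterator[AuthorRecord]:
    # UTF-8 is an assumption; adjust if the files use another encoding.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield parse_line(line)
```

With this sketch, `parse_line("user1\tINTJ\tciao a tutti\n")` yields a record whose `mbti` field is `"INTJ"`.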
In the pre-processed file:
- the punctuation is spaced
- URLs, hashtags, usernames and emojis are replaced with four distinctive labels
- comments are concatenated
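The three steps above could be approximated as follows. This is only an illustrative sketch, not the procedure used to build the corpus: the label strings `URL`, `HASHTAG`, `USERNAME` and `EMOJI`, the regular expressions, and the function name `preprocess` are all assumptions, and the emoji pattern covers only a rough subset of Unicode ranges.

```python
import re
from typing import List

# Hypothetical placeholder labels; the four labels actually used
# in the pre-processed file are not listed in this description.
URL_LABEL, HASHTAG_LABEL, USER_LABEL, EMOJI_LABEL = "URL", "HASHTAG", "USERNAME", "EMOJI"

# Simplified patterns (assumptions). The URL pattern also swallows
# punctuation that is glued to the end of a link.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
USER_RE = re.compile(r"@\w+")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]+")
PUNCT_RE = re.compile(r'([.,;:!?()"])')


def preprocess(comments: List[str]) -> str:
    cleaned = []
    for text in comments:
        # Replace URLs first so "#" or "@" inside a link is not matched later.
        text = URL_RE.sub(URL_LABEL, text)
        text = HASHTAG_RE.sub(HASHTAG_LABEL, text)
        text = USER_RE.sub(USER_LABEL, text)
        text = EMOJI_RE.sub(f" {EMOJI_LABEL} ", text)
        # Put spaces around punctuation, then collapse repeated whitespace.
        text = PUNCT_RE.sub(r" \1 ", text)
        cleaned.append(" ".join(text.split()))
    # Concatenate all comments of one author into a single string.
    return " ".join(cleaned)
```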
MBTI traits:
- Extravert - Introvert
- Sensing - iNtuition
- Thinking - Feeling
- Judging - Perceiving
MBTI personality types:
- INTJ Architect
- INTP Logician
- ENTJ Commander
- ENTP Debater
- INFJ Advocate
- INFP Mediator
- ENFJ Protagonist
- ENFP Campaigner
- ISTJ Logistician
- ISFJ Defender
- ESTJ Executive
- ESFJ Consul
- ISTP Virtuoso
- ISFP Adventurer
- ESTP Entrepreneur
- ESFP Entertainer
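Each four-letter type above takes one letter from each of the four traits, in the order E/I, S/N, T/F, J/P. For per-trait (binary) experiments it can be handy to split a label into its four letters; the helper below is a sketch of that idea, and the names `TRAIT_AXES` and `split_mbti` are hypothetical.

```python
from typing import Dict

# One pair of letters per trait, in the order they appear in an MBTI label.
TRAIT_AXES = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))


def split_mbti(label: str) -> Dict[str, str]:
    """Map a four-letter MBTI label to one letter per trait axis."""
    label = label.strip().upper()
    if len(label) != 4:
        raise ValueError(f"expected a four-letter MBTI label, got {label!r}")
    traits = {}
    for letter, (first, second) in zip(label, TRAIT_AXES):
        if letter not in (first, second):
            raise ValueError(f"unexpected letter {letter!r} in {label!r}")
        traits[first + second] = letter
    return traits
```

For example, `split_mbti("INTJ")` returns `{"EI": "I", "SN": "N", "TF": "T", "JP": "J"}`.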
Personal-ITY is described in:
- Elisa Bassignana, Malvina Nissim, Viviana Patti. Personal-ITY: A Novel YouTube-based Corpus for Personality Prediction in Italian. In Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)
- Elisa Bassignana, Malvina Nissim, Viviana Patti. Matching Theory and Data with Personal-ITY: What a Corpus of Italian YouTube Comments Reveals About Personality. In Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media (PEOPLES 2020)