This project is to find what people are talking about Dr. Robert Malone on twitter recently. Who is Dr. Robert Malone? He is a virologist and immunologist. I never heard about him until his recent remarks about warning parents to be prudent to decide if to give their children Covid_19 shots. And later one of his interviews went virus, which caused his ban from Twitter. I would like to know that when people mention Robert Malone and the ban, what their main topics are? Positive or Negative?
I use Twitter API Tweepy help pull related tweets from Twitter, with search phrase "Robert Malone" and "Dr Malone". I am able to obtain over 20k tweets, each includes info of twitter username, ID, user description, tweet text, retweet number and etc. Different NLP techniques will be used, like NLP preprocessing, sentiment analysis, vectorization, dimensionality reductions, top modeling and clustering.
Tweets has been pulled and transformed into data frame. During preprocessing, I removed all url link and tags. Sentiment analysis is used for finding out people's attitude towards Dr. Robert Malone.
I also created a few baseline models using TfidfVectorizer, with an LSA topic modeler ( topic number is 10)
Most people hold a neutral attitude when their tweets revolves around Dr. Robert Malone, according to compound score proved by SentimentIntensityAnalyzer.
The Baseline top modeling shoes a initial topics like the following:
- Try use spacy package do better text preprocessing.
- Try do analysis for people hold different attitude by using their location and author description.
- Try to use different vectorization and topic modeling algorithm/ packages with tuning hyper parameters to find topics more interpretable.
- Try DeepMoji for Mojies in the text.
- Using clustering techniques to find users with similar opinion/attitude towards Dr. Robert Malone.