Machine learning model for bot detection #8

andreaschandra · 2020-05-17T14:48:03Z

Given a topic or hashtag, we want to see whether its tweets are more likely to come from buzzers flooding the topic or from organic users

or

given a buzzer account, we want to see the major topics it is buzzing about
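To make the two framings concrete, here is a minimal sketch. It assumes each tweet has already been tagged with a per-account prediction; the record layout (`username`, `hashtags`, `is_buzzer`) and the sample rows are hypothetical and only for illustration.

```python
from collections import Counter


def buzzer_share(tweets, hashtag):
    """Fraction of tweets using `hashtag` that come from predicted buzzer accounts."""
    tagged = [t for t in tweets if hashtag in t["hashtags"]]
    if not tagged:
        return 0.0
    return sum(t["is_buzzer"] for t in tagged) / len(tagged)


def top_hashtags(tweets, username, k=3):
    """Most frequent hashtags used by one account, most common first."""
    counts = Counter(
        tag for t in tweets if t["username"] == username for tag in t["hashtags"]
    )
    return [tag for tag, _ in counts.most_common(k)]


# Toy records (hypothetical), not project data.
tweets = [
    {"username": "acc_a", "hashtags": ["#pilkada"], "is_buzzer": True},
    {"username": "acc_a", "hashtags": ["#pilkada", "#debat"], "is_buzzer": True},
    {"username": "acc_b", "hashtags": ["#pilkada"], "is_buzzer": False},
    {"username": "acc_c", "hashtags": ["#debat"], "is_buzzer": False},
]

print(buzzer_share(tweets, "#pilkada"))
print(top_hashtags(tweets, "acc_a"))
```

The first function answers the topic-level question, the second the account-level one; both depend on a classifier that supplies `is_buzzer`.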

This task includes:

- feature engineering (requires text cleansing and preprocessing)
- baseline model
- early fine-tuning
- evaluation
- define feature set
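A minimal sketch of the text cleansing step, assuming a simple rule set (lowercase, drop URLs and mentions, keep hashtags). These rules are an assumption for illustration; the thread does not specify the actual preprocessing.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Anything that is not a word character, whitespace, or '#'.
PUNCT_RE = re.compile(r"[^\w\s#]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, and punctuation (hashtags are kept)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # remove links before mentions so '@' inside URLs is gone
    text = MENTION_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return " ".join(text.split())     # collapse repeated whitespace


print(clean_tweet("RT @user: Dukung #Pilkada2020!! https://t.co/abc"))
```

Hashtags are kept here because they are part of the proposed feature set; whether retweet markers such as "rt" should also be dropped is left open.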

rubentea16 · 2020-10-17T13:40:42Z

Prepare Social Politics Word Dictionary (SPWD)

Proposed feature set:

- username
- name
- is_name_social_political
- desc
- tweets
- n_tweet
- quoted_tweets
- hashtag
- n_tweet_use_hashtag
- ratio_tweets_use_hashtag
- n_photo
- n_video
- content_url

Feature Engineering :
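As a rough sketch of how a few of the proposed features could be derived for one account: the word set below is a hypothetical stand-in for the SPWD, and the definitions (name matches any dictionary word; ratio = tweets with at least one hashtag / all tweets) are assumptions, not confirmed in this thread.

```python
import re

# Hypothetical stand-in for the Social Politics Word Dictionary (SPWD).
SPWD = {"politik", "presiden", "pilkada", "partai"}
HASHTAG_RE = re.compile(r"#\w+")


def account_features(name, tweets):
    """Derive count-based features for one account from its display name and tweet texts."""
    n_tweet = len(tweets)
    n_with_tag = sum(1 for t in tweets if HASHTAG_RE.search(t))
    name_tokens = set(re.findall(r"\w+", name.lower()))
    return {
        # 1 if any word of the display name appears in the dictionary (assumed definition)
        "is_name_social_political": int(bool(name_tokens & SPWD)),
        "n_tweet": n_tweet,
        "n_tweet_use_hashtag": n_with_tag,
        # guarded against accounts with no tweets
        "ratio_tweets_use_hashtag": n_with_tag / n_tweet if n_tweet else 0.0,
    }


features = account_features(
    "Relawan Pilkada",
    ["dukung #pilkada", "selamat pagi", "#debat malam ini", "ayo"],
)
print(features)
```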

andreaschandra · 2020-11-29T04:40:19Z

@rubentea16 if you have tried a variety of techniques and the score is still poor, the labeling may be inconsistent or there may not be enough labeled data

andreaschandra · 2020-12-05T03:40:09Z

Baseline model result @rubentea16

BernoulliNB
accuracy: 0.78 | precision: 0.60 | recall: 0.21 | f score: 0.32

Linear SVM
accuracy: 0.85 | precision: 0.74 | recall: 0.57 | f score: 0.64

Random Forest
accuracy: 0.82 | precision: 0.74 | recall: 0.43 | f score: 0.54

Gradient Boosting
accuracy: 0.84 | precision: 0.73 | recall: 0.55 | f score: 0.63

AdaBoost
accuracy: 0.81 | precision: 0.63 | recall: 0.58 | f score: 0.60
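For reading these numbers, it may help to recompute the F score from precision and recall. This sketch assumes "f score" means F1, the harmonic mean of the two: $F_1 = 2PR / (P + R)$.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the baseline results above.
baseline = {
    "BernoulliNB": (0.60, 0.21),
    "Linear SVM": (0.74, 0.57),
    "Random Forest": (0.74, 0.43),
    "Gradient Boosting": (0.73, 0.55),
    "AdaBoost": (0.63, 0.58),
}

recomputed = {name: round(f1_score(p, r), 2) for name, (p, r) in baseline.items()}
for name, score in recomputed.items():
    print(f"{name}: {score:.2f}")
```

Because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit (BernoulliNB comes out as 0.31 here against the reported 0.32). The pattern of high accuracy with low recall suggests the buzzer class is the minority, so F1 is probably the more informative column to compare.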

rubentea16 · 2020-12-05T12:50:08Z

Which features does this use?

andreaschandra · 2020-12-06T10:11:32Z

@rubentea16 only tweets; see https://github.com/jakartaresearch/adi-buzzer/blob/dev/notebook/40_buzzer_classifier.ipynb

rubentea16 · 2021-01-17T08:29:46Z

Performance Benchmark

Notes:

- multiple_feat = tweets, user_desc, is_name_social_political, ratio_tweets_use_hashtag, n_tweet, n_photo, n_video
- single_feat = tweets
- RFC = Random Forest Classifier (n_estimators=400)

| Model | Description | Features | Text Representation | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|---|
| RFC | - | multiple_feat | TF-IDF | 0.84 | 0.75 | 0.33 | 0.45 |
| RFC | - | single_feat | TF-IDF | 0.84 | 0.72 | 0.35 | 0.47 |
| SMOTE + RFC | Oversampling of training data (minority class) | multiple_feat | TF-IDF (desc = 3K dim, tweet = 50K dim) | 0.86 | 0.66 | 0.62 | 0.64 |
| SMOTE + RFC | Oversampling of training data (minority class) | single_feat | BPE (tweet = 300 dim) | 0.86 | 0.68 | 0.57 | 0.62 |
| SMOTE + SVC (default) | Oversampling of training data (minority class) | single_feat | BPE (tweet = 300 dim) | 0.84 | 0.59 | 0.73 | 0.65 |
| SMOTE + XGBoost (default) | Oversampling of training data (minority class) | single_feat | BPE (tweet = 300 dim) | 0.86 | 0.66 | 0.62 | 0.64 |
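For readers unfamiliar with SMOTE, here is a dependency-free sketch of the core idea: each synthetic point is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. This is an illustration only; the thread does not say which implementation was used (the imbalanced-learn library provides one), and the function name, parameters, and toy points below are assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours.

    `minority` is a list of equal-length feature vectors with at least two entries.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (itself excluded), by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy minority-class vectors (hypothetical), taken from the training split only.
X_train_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(X_train_minority, n_new=6, k=2)
print(len(new_points))
```

As the "Description" column notes, oversampling is applied to the training data only; the test split should stay untouched so the reported metrics reflect the real class balance.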

andreaschandra · 2021-01-24T12:16:33Z

> 0.64

Interesting.

andreaschandra · 2021-03-27T03:57:16Z

Results after label QA

| Algorithm | Accuracy | Precision | Recall | F-score |
|---|---|---|---|---|
| Bernoulli NB | 0.78 | 0.75 | 0.21 | 0.33 |
| SVM | 0.85 | 0.75 | 0.60 | 0.67 |
| Random Forest | 0.81 | 0.77 | 0.34 | 0.47 |
| Gradient Boosting | 0.84 | 0.78 | 0.53 | 0.63 |
| AdaBoost | 0.82 | 0.67 | 0.56 | 0.61 |

andreaschandra · 2021-04-03T09:14:46Z

| Algorithm | Accuracy | Precision | Recall | F-score |
|---|---|---|---|---|
| Bernoulli NB | 0.82 | 0.54 | 0.69 | 0.61 |
| SVM | 0.87 | 0.69 | 0.65 | 0.67 |
| Random Forest | 0.85 | 0.74 | 0.43 | 0.54 |
| Gradient Boosting | 0.87 | 0.72 | 0.54 | 0.62 |
| AdaBoost | 0.84 | 0.60 | 0.56 | 0.58 |

andreaschandra added the labels "pipeline: model" (the data need to be modeled), "priority: soon", "type: feature", and "work: complex" on Oct 16, 2020

andreaschandra assigned rubentea16 on Oct 16, 2020

andreaschandra changed the title from "Machine learning model for buzzer detection" to "Machine learning model for bot detection" on Oct 17, 2020
