Module for classifiers and interaction analysis #6
Comments
I can take this issue up. Can someone tell me what the inputs and outputs of the model are?
Inputs to the model are the tweet text, the username of the tweeter, and the number of existing retweets and likes. The output label is binary (1 or 0), indicating whether or not the tweet is interesting to our user. These labels have been assigned in the train set based on whether the user has interacted with (liked or retweeted) a tweet or not.
Of course, we could generate some of our own features from these too and/or introduce more inputs (if available) to the model. That's up to you.
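For reference, the input/output setup described above could look roughly like this. It's only a sketch: the `Tweet` class, its field names, and the derived features (`log_engagement`, `n_hashtags`, `has_link`) are made up for illustration and are not taken from the project's actual data format.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical record layout; the real train set may use different names.
    text: str
    username: str
    retweets: int
    likes: int
    user_liked: bool = False      # did our user like this tweet?
    user_retweeted: bool = False  # did our user retweet this tweet?


def label(tweet: Tweet) -> int:
    """Binary target: 1 if our user liked or retweeted the tweet, else 0."""
    return int(tweet.user_liked or tweet.user_retweeted)


def features(tweet: Tweet) -> dict:
    """Raw inputs plus a few illustrative derived features."""
    return {
        "text": tweet.text,
        "username": tweet.username,
        "retweets": tweet.retweets,
        "likes": tweet.likes,
        # Derived features (assumptions, just examples of what could be added).
        "log_engagement": math.log1p(tweet.retweets + tweet.likes),
        "n_hashtags": tweet.text.count("#"),
        "has_link": int("http" in tweet.text),
    }


tweets = [
    Tweet("New #python release is out https://example.com", "alice", 10, 25, user_liked=True),
    Tweet("Just had lunch", "bob", 0, 1),
]
rows = [(features(t), label(t)) for t in tweets]
```

Each entry in `rows` is a `(feature_dict, label)` pair that can later be turned into whatever matrix format the chosen classifier expects.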
@adiah80 had suggested collaborative ranking/clustering, which is used in recommendation systems. This paper will probably be useful to check out first.
Yeah, cool. But I was thinking the model output would be the topic the user is interested in, like a multiclass classification problem.
What do you think would be good outputs for our model?
Hmm, but if the output is a topic then labelling will be hard, I guess.
Oh, you meant "topic interested in" as an output? Yeah, only trends are assigned topics on Twitter, I think.
But I think topics can be a feature in the train set. For example, we can cluster the tweets in the train set and assign each tweet's cluster number as a feature. How does this sound?
Hmm, yeah, we could try that.
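A rough sketch of the cluster-as-feature idea above, assuming scikit-learn is available. TF-IDF plus k-means is just one possible choice here; the example texts, `n_clusters=2`, and the `cluster_feature` helper are placeholders for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for the tweet texts in the train set.
train_texts = [
    "new gpu benchmarks for deep learning",
    "training neural networks on a gpu",
    "football transfer news and match results",
    "match highlights from the football league",
]

# Turn each tweet into a TF-IDF vector.
vectorizer = TfidfVectorizer(stop_words="english")
X_text = vectorizer.fit_transform(train_texts)

# Cluster the train tweets; the number of clusters is a tunable assumption.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_text)  # one cluster number per train tweet


def cluster_feature(texts):
    """Assign unseen tweets to the clusters learned on the train set."""
    return kmeans.predict(vectorizer.transform(texts))
```

`cluster_ids` can then be added as an extra categorical column next to the other inputs, and `cluster_feature` gives the same feature for new tweets at prediction time.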
Building upon issues #4 and #5, we need a general module that explores several classification models.
Some libraries that can be explored for classification models are scikit-learn, xgboost, and catboost among others.
Other NLP-based methods can also be explored for analyzing the user's preferences. These would use the tweet text and user interaction metrics.
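As a starting point, a module that tries several classifiers on text plus interaction counts might look something like the sketch below. It assumes scikit-learn, NumPy, and SciPy; the toy data, the `models` dict, and the choice of accuracy with 2-fold cross-validation are illustrative only. Classifiers from xgboost or catboost that follow the scikit-learn estimator interface could be added to the same dict.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data: tweet text, [retweets, likes], and the binary "interesting" label.
texts = [
    "new gpu benchmarks for deep learning",
    "training neural networks on a gpu",
    "open source machine learning library released",
    "tips for faster model training",
    "football transfer news and match results",
    "match highlights from the football league",
    "celebrity gossip of the week",
    "what I had for lunch today",
]
counts = np.array([[12, 40], [3, 9], [50, 120], [7, 15],
                   [30, 80], [2, 5], [100, 300], [0, 1]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Combine TF-IDF text features with log-scaled interaction counts.
# Note: fitting TF-IDF on all rows before cross-validation leaks vocabulary
# statistics across folds; a Pipeline would avoid that in a real evaluation.
X_text = TfidfVectorizer().fit_transform(texts)
X = hstack([X_text, csr_matrix(np.log1p(counts))]).tocsr()

# Candidate models to compare; more can be added here.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean cross-validated accuracy per model.
scores = {
    name: cross_val_score(model, X, y, cv=2, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Keeping the models in a dict makes it easy to swap candidates in and out while the feature construction and evaluation stay the same.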