This repository contains code for a project that detects buy and sell signals in tweets related to cryptocurrencies. The project is divided into two main phases: an academic and research phase, in which various models and preprocessing methods are explored, and a product phase, in which a signal detection pipeline is implemented using the best-performing approach.
The repository is organized as follows:
- crawl data/: Contains the code used to crawl tweets and detect influencers.
- models/: Includes the pre-trained BERTweet model and the other models used in the project, divided into three files: classic models, sequential models, and transfer learning.
- crypto_social_media_main/: Source code for the signal detection pipeline and related utilities developed during the product phase.
- README.md: This file, which provides an overview of the project and instructions for use.
During the academic and research phase, various models and preprocessing methods were explored to identify the most effective approach for signal detection from tweets. Jupyter notebooks in the models/ directory document the experimentation process and results.
In the product phase, a signal detection pipeline was developed using the best-performing approach identified during the academic and research phase. The pipeline, implemented in the src/ directory, takes a tweet as input and predicts whether it contains a signal to buy or sell cryptocurrency.
The main component of the project is the signal detection pipeline, which includes the following steps:
- Preprocessing: The tweet text is preprocessed to remove noise, tokenize, and prepare it for input into the model.
- BERTweet Model: The pre-trained BERTweet model is used to encode the preprocessed tweet text and extract relevant features.
- Classification: The encoded tweet features are passed through a classification layer to predict the signal (buy or sell).
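The three steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code. The helper names (`normalize_tweet`, `predict_signal`), the label order, and the choice of the public `vinai/bertweet-base` checkpoint are assumptions. The `@USER` and `HTTPURL` placeholders follow the normalization convention published with BERTweet, and the project's own preprocessing may differ.

```python
import re

# Placeholder tokens follow the normalization convention published with BERTweet.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")

# Assumed label order; the repository's actual mapping may differ.
LABELS = ("sell", "buy")


def normalize_tweet(text: str) -> str:
    """Replace URLs and user mentions with placeholders and collapse whitespace."""
    text = URL_RE.sub("HTTPURL", text)
    text = MENTION_RE.sub("@USER", text)
    return " ".join(text.split())


def predict_signal(text: str, model_name: str = "vinai/bertweet-base") -> str:
    """Encode a tweet with BERTweet and classify it as buy or sell (hypothetical helper)."""
    # Imported lazily so the preprocessing helper works without PyTorch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Loading the base checkpoint attaches a randomly initialized classification
    # head; meaningful predictions require fine-tuned weights instead.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(LABELS)
    )
    model.eval()

    inputs = tokenizer(
        normalize_tweet(text), return_tensors="pt", truncation=True, max_length=128
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]
```

Calling `predict_signal` downloads the checkpoint on first use, so only `normalize_tweet` runs offline. To get useful output, `model_name` would need to point at weights fine-tuned on labeled tweets.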
To use the signal detection pipeline, follow these steps:
- This project requires Python 3.X.X, which can be found here.
- Clone the repository to your local machine.
- Install the required dependencies listed in requirement.
- Place your tweet data in the data/ directory or use the provided sample data.
- Run the signal detection pipeline script, specifying the input tweet data.
- Review the output to see the predicted signals for each tweet.
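As a rough illustration of the last two steps, the sketch below runs a predictor over a batch of tweets and prints one signal per tweet. It does not reproduce the repository's script. The `text` column name, the `run_pipeline` helper, and the keyword-based `keyword_stub` predictor are all hypothetical stand-ins, and the sample rows are made up for the demonstration.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, Tuple


def run_pipeline(
    rows: Iterable[dict],
    predict: Callable[[str], str],
    text_field: str = "text",  # assumed column name
) -> Iterator[Tuple[str, str]]:
    """Yield (tweet, signal) pairs, skipping rows with no tweet text."""
    for row in rows:
        tweet = (row.get(text_field) or "").strip()
        if tweet:
            yield tweet, predict(tweet)


def keyword_stub(tweet: str) -> str:
    """Stand-in predictor for demonstration only; not the project's model."""
    return "sell" if "sell" in tweet.lower() else "buy"


# In-memory CSV used in place of a real file in data/.
sample = io.StringIO("text\nTime to sell ETH\nAccumulating BTC here\n")
results = list(run_pipeline(csv.DictReader(sample), keyword_stub))
for tweet, signal in results:
    print(f"{signal}\t{tweet}")
```

In practice the stub would be replaced by the trained model's prediction function, and the rows would come from a file opened with `csv.DictReader`.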
The project relies on the following dependencies:
- Python 3.X.X
- PyTorch 1.11.0
- Transformers 4.20.1
- Other standard Python libraries
For a full list of dependencies and their versions, refer to requirement.
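One way to confirm that an environment matches the pinned versions is a small check like the one below. It is a sketch under stated assumptions: the `find_mismatches` and `installed_version` helpers are hypothetical, and `torch` and `transformers` are assumed to be the pip distribution names for PyTorch and Transformers.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict

# Versions taken from the dependency list above; distribution names are assumed.
REQUIRED = {"torch": "1.11.0", "transformers": "4.20.1"}


def installed_version(package: str) -> str:
    """Return the installed version of a package, or an empty string if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return ""


def find_mismatches(
    required: Dict[str, str],
    lookup: Callable[[str], str] = installed_version,
) -> Dict[str, str]:
    """Map each package whose version differs from the pinned one to what was found."""
    mismatches = {}
    for package, wanted in required.items():
        found = lookup(package)
        # Local build tags such as "+cu113" are ignored when comparing.
        if found.split("+")[0] != wanted:
            mismatches[package] = found or "not installed"
    return mismatches


if __name__ == "__main__":
    for package, found in find_mismatches(REQUIRED).items():
        print(f"{package}: expected {REQUIRED[package]}, found {found}")
```

The script prints nothing when both packages match, and one line per package otherwise.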
This project was inspired by the need for effective signal detection in cryptocurrency trading. Special thanks to the developers of BERTweet and other NLP tools used in this project.
For any inquiries or feedback, please contact the project maintainer:
- Name: Darya Zare
- Email: [email protected]
We welcome contributions and suggestions to improve this project!