An automated system for collecting and analyzing job postings from Telegram channels using Airflow.
./
├── poetry.lock
├── _production
│   ├── airflow
│   │   ├── dags
│   │   │   └── main_dag.py
│   │   └── plugins
│   │       ├── production
│   │       │   └── email_notifications.py
│   │       ├── raw
│   │       │   └── data_collection.py
│   │       └── staging
│   │           └── data_cleaning.py
│   ├── config
│   │   ├── config_db.py
│   │   ├── config.json
│   │   └── config.py
│   ├── __init__.py
│   └── utils
│       ├── common.py
│       ├── email.py
│       ├── exceptions.py
│       ├── llm.py
│       ├── prompts.py
│       ├── sql.py
│       ├── text.py
│       └── tg.py
├── pyproject.toml
└── README.md
- Automated data collection from Telegram channels
- Text processing and data cleaning
- LLM-based analysis and classification
- SQL database integration
- Email notifications system
- Comprehensive test coverage
- Airflow DAGs: Orchestration of the data pipeline (`main_dag.py`)
- Airflow Plugins:
  - `raw/`: Data collection from Telegram
  - `staging/`: Data cleaning and preprocessing
  - `production/`: Email notification system
- Utils:
  - `common.py`: Shared utility functions
  - `email.py`: Email handling
  - `llm.py`: LLM integration
  - `sql.py`: Database operations
  - `text.py`: Text processing
  - `tg.py`: Telegram API interactions
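
The three plugin stages run in order: raw collection, staging cleanup, then production notification. The sketch below illustrates that ordering in plain Python, without Airflow. All names in it (`Posting`, `collect`, `clean`, `notify`, `run_pipeline`) are hypothetical stand-ins and are not taken from the project's plugin code.

```python
from dataclasses import dataclass


@dataclass
class Posting:
    channel: str
    text: str


def collect(channels):
    # Stand-in for the raw stage. A real version would call the Telegram API;
    # here we fabricate one untidy message per channel for illustration.
    return [Posting(channel=c, text=f"  Hiring:   Python dev  at {c} ") for c in channels]


def clean(postings):
    # Stand-in for the staging stage: collapse whitespace and drop empty texts.
    cleaned = []
    for p in postings:
        text = " ".join(p.text.split())
        if text:
            cleaned.append(Posting(channel=p.channel, text=text))
    return cleaned


def notify(postings):
    # Stand-in for the production stage: build an email body instead of sending it.
    return "\n".join(f"[{p.channel}] {p.text}" for p in postings)


def run_pipeline(channels):
    # Each stage consumes the previous stage's output, mirroring the DAG order.
    return notify(clean(collect(channels)))
```

In an Airflow DAG, each of these functions would typically become its own task so that a failure in one stage can be retried without rerunning the others.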
- Install dependencies using Poetry: `poetry install`
- Configure the application:
  - Copy `.env.example` to `.env`
  - Update configuration with your credentials:
    - Telegram API credentials
    - Database connection details
    - Email settings
    - LLM API keys
    - Other relevant settings
  - Copy `production/config/config.json.example` to `production/config/config.json`
  - Update `production/config/config.json` with your Telegram channel names and other settings
- Start Airflow: `airflow standalone`
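
Before starting Airflow, it can help to confirm that every credential from the configuration step is actually set. The following is a minimal sketch of such a check. The variable names in `REQUIRED` are hypothetical placeholders (use the names from your own `.env.example`), and the sketch assumes the values from `.env` have already been exported into the process environment.

```python
import os


def missing_settings(required, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Hypothetical names for illustration only; replace with the keys in .env.example.
REQUIRED = ["TG_API_ID", "TG_API_HASH", "DB_URL", "SMTP_HOST", "LLM_API_KEY"]
```

Calling `missing_settings(REQUIRED)` returns an empty list when everything is present, so a startup script could abort with a clear message whenever the list is non-empty.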
- Python 3.12+
- Poetry for dependency management
- Follow PEP 8 style guide
Key configuration files:
- `config/config.py`: Base configuration setup
- `config/config_db.py`: Database configuration
- `config/config.json`: Runtime configuration (not tracked in git)
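
Because `config.json` is not tracked in git, code that reads it should fail clearly when the file is missing. The sketch below shows one way to load it and overlay it on defaults. The function name `load_runtime_config` and the keys in the usage note are assumptions for illustration, not the project's actual schema or the contents of `config.py`.

```python
import json
from pathlib import Path


def load_runtime_config(path, defaults=None):
    """Read a JSON config file and overlay its values on optional defaults."""
    config = dict(defaults or {})
    config_path = Path(path)
    if not config_path.exists():
        # The file is untracked, so point the user at the example template.
        raise FileNotFoundError(
            f"{config_path} not found; copy config.json.example to create it"
        )
    with config_path.open(encoding="utf-8") as fh:
        loaded = json.load(fh)
    if not isinstance(loaded, dict):
        raise ValueError("config.json must contain a JSON object at the top level")
    # Values from the file take precedence over the defaults.
    config.update(loaded)
    return config
```

For example, `load_runtime_config("config.json", defaults={"channels": []})` would return the file's settings with `channels` falling back to an empty list if the file does not define it (the `channels` key is hypothetical).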