The Astro Data Migration Project uses Docker, Airflow, Python, MSSQL, and Snowflake to run Extract, Load, and Transform (ELT) pipelines that migrate data from MSSQL to Snowflake.
```
Astro/
|-- .astro
|-- dags/
|   |-- dag.py
|   |-- full_load.py
|   |-- mssql_to_snowflake.py
|-- tests/
|-- .airflowignore
|-- .dockerignore
|-- .gitignore
|-- Dockerfile
|-- README.md
|-- Stop-Process
|-- packages.txt
|-- requirements.txt
```
- `dags/`: Contains the Airflow DAGs for data migration.
  - `dag.py`: Main DAG definition.
  - `full_load.py`: DAG for the full data load.
  - `mssql_to_snowflake.py`: DAG for migrating data from MSSQL to Snowflake.
- `tests/`: Contains test scripts for the project.
- `Dockerfile`: Configuration for building the Docker image.
- `packages.txt`: Lists packages to be installed in the Docker image.
- `requirements.txt`: Python dependencies for the project.
The `full_load.py` DAG orchestrates the full load of data from MSSQL to Snowflake. It runs daily and includes the following tasks:
- Sensor Task (`wait_for_tables`): Waits until tables are available in the MSSQL database.
- Python Operator (`transfer_data`): Transfers data from MSSQL to Snowflake. This task dynamically fetches tables from MSSQL, extracts data, and loads it into Snowflake using temporary CSV files.
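The CSV staging step inside `transfer_data` can be sketched in plain Python. This is a minimal illustration, not the project's code: the function name and row format are assumptions, and in the real DAG the rows would come from an MSSQL query and the file would then be loaded into Snowflake.

```python
import csv
import tempfile

def stage_rows_to_csv(rows, column_names):
    """Write extracted rows to a temporary CSV file and return its path.

    Sketch of the staging step: the real task would fill `rows` from an
    MSSQL query and hand the resulting file to a Snowflake load step.
    """
    # delete=False so the file survives for the later load step
    with tempfile.NamedTemporaryFile(
        mode="w", newline="", suffix=".csv", delete=False
    ) as f:
        writer = csv.writer(f)
        writer.writerow(column_names)   # header row
        writer.writerows(rows)          # data rows
        return f.name

# Usage with made-up rows
path = stage_rows_to_csv([(1, "alice"), (2, "bob")], ["id", "name"])
```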
The `mssql_to_snowflake.py` DAG orchestrates the migration of specific tables from MSSQL to Snowflake. For each table in the `tables` list, the DAG dynamically creates tasks to:

- Extract data from MSSQL using the `MsSqlOperator`.
- Create the corresponding table in Snowflake using the `SnowflakeOperator`.
- Load the data into Snowflake using the `SnowflakeOperator`.
This DAG extracts data from the MSSQL database, creates corresponding tables in Snowflake, and loads the data into Snowflake. The data extraction, table creation, and data loading are executed in sequence for each table.
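The per-table fan-out described above can be sketched as follows. To keep the example self-contained, the operator tasks are replaced by plain SQL-string builders; the table names, column specs, and stage name are illustrative assumptions, not values from the project.

```python
# Illustrative table specs; the real DAG reads table metadata from MSSQL.
TABLES = {
    "customers": [("id", "INT"), ("name", "VARCHAR(100)")],
    "orders": [("id", "INT"), ("customer_id", "INT"), ("total", "FLOAT")],
}

def sql_for_table(name, columns):
    """Build the extract / create / load statements for one table.

    In the project these run as three chained tasks (MsSqlOperator for the
    extract, SnowflakeOperator for create and load); here they are plain
    strings so the per-table sequencing is easy to see. The @staging stage
    is a made-up name.
    """
    col_defs = ", ".join(f"{c} {t}" for c, t in columns)
    col_list = ", ".join(c for c, _ in columns)
    return {
        "extract": f"SELECT {col_list} FROM dbo.{name};",
        "create": f"CREATE TABLE IF NOT EXISTS {name.upper()} ({col_defs});",
        "load": f"COPY INTO {name.upper()} FROM @staging/{name}.csv;",
    }

statements = {name: sql_for_table(name, cols) for name, cols in TABLES.items()}
```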
To use the Astro Data Migration Project for migrating data from MSSQL to Snowflake, follow these steps:
1. Set up Docker: Ensure Docker is installed on your system.
2. Install the Astro CLI: Install the Astro CLI for container orchestration.
3. Set up MSSQL on Docker: Install and configure MSSQL on Docker, and import your database from your local PC into the MSSQL container.
4. Create a Docker Network: Create a Docker network so the Astro CLI containers (triggerer, webserver, scheduler, postgres) can communicate with the MSSQL container. Run `docker network create astro_network` to create a network named `astro_network`.
5. Clone the Repository: Clone the Astro repository to your local machine.
6. Initialize the Astro Development Environment: Optionally, run `astro dev init`.
7. Start the Astro Development Environment: Run `astro dev start --wait=5m` to start the Airflow environment using the Astro CLI.
8. Configure Airflow Connections: In the Airflow UI, configure the `snowflake_default` and `mssql_default` connections to match your Snowflake and MSSQL configurations.
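As an alternative to the UI, the two connections can be supplied as environment variables using Airflow's `AIRFLOW_CONN_<CONN_ID>` URI convention. All account names, hosts, and credentials below are placeholders, not values from the project:

```shell
# Placeholders only -- substitute your own account, credentials, and database.
# Airflow parses any AIRFLOW_CONN_<CONN_ID> variable as a connection URI.
export AIRFLOW_CONN_SNOWFLAKE_DEFAULT='snowflake://MY_USER:MY_PASSWORD@/?account=MY_ACCOUNT&warehouse=MY_WAREHOUSE&database=MY_DATABASE'
export AIRFLOW_CONN_MSSQL_DEFAULT='mssql://sa:MY_PASSWORD@mssql-container:1433/MY_DATABASE'
```

These variables must be visible to the Airflow containers, for example via the project's `.env` file or Docker environment settings.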
Contributions to the Astro Data Migration Project are welcome! If you have any suggestions, find issues, or want to contribute code, please open an issue or submit a pull request.