Important: These instructions assume you have access to StreamSets Data Collector (v3.15+) and have performed all the prerequisites for MySQL and Snowflake
- For help installing StreamSets Data Collector, see StreamSets Data Collector Installation.
- Your SQL Server database is enabled for Change Data Capture (CDC). For help with enabling CDC, see About CDC - SQL Server.
- For help with Snowflake prerequisites, see Snowflake.
This pipeline demonstrates how to read change data capture (CDC) data from a SQL Server database and replicate the changes to Snowflake. The pipeline assumes the following:
- SQL Server database is enabled for Change Data Capture.
- All tables in the database will be tracked for ingesting changes (Inserts, Updates and Deletes). If you need to track only certain tables, configure the Capture Instance Name accordingly in the origin.
- Each source table will be mapped to it's corresponding table in Snowflake. If the table doesn't exist in Snowflake, it will be auto-created.
- The SQL Server CDC client generates records for multiple transaction types. The transaction type is captured in the record header as an attribute called sdc.operation.type:
- 1 for Insert
- 2 for Delete
- 3 for Update, including updates captured after the update operation
- 5 for unsupported operations, including updates captured before the update operation.
The Snowflake destination handles all but one operation type, which is sdc.operation.type=5. The Stream Selector processor in the pipeline routes and discards all records with sdc.operation.type=5.
Disclaimer: This pipeline is meant to serve as a template for performing SQL Server CDC to Snowflake. Some of the parameters, tables and fields may be different for your environment and may need additional customizations. Please consult the StreamSets documentation for full information on configuration of each stage used below.
Click Here to download the pipeline and save it to your drive.
Click the down arrow next to the "Create New Pipeline" and select "Import Pipeline".
Click "Browse" and locate the pipeline file you just downloaded, then click "Import"
Click on the pipeline you just imported to open it and click on the "Parameters" tab and fill in the appropriate information for your environment.
Important: The pipeline template uses the most common default settings for things like the Snowflake region, staging location, etc. All of these are configurable and if you need to change those, you can opt to not use the built-in parameters and choose the appropriate settings yourself. Please refer to the documentation listed in this document for all the available options.
The following parameters are set up for this pipeline:
sqlserver_jdbc_url
|
JDBC URL to connect to the SQL Server database. |
sqlserver_username
|
MySQL username. |
sqlserver_password
|
MySQL password. Tip: To secure sensitive information such as user tokens and passwords, you can use runtime resources or credential stores. |
snowflake_account
|
Snowflake account name. |
snowflake_user
|
Snowflake user name. |
snowflake_pwd
|
Snowflake password. |
snowflake_wh
|
Snowflake warehouse. |
snowflake_db
|
Snowflake database. |
snowflake_schema
|
Snowflake schema. |
snowflake_stage
|
Name of the Snowflake stage used to stage the data.
Unless using a Snowflake internal user stage, you create this stage as part of the Snowflake prerequisite tasks. To use a
Snowflake internal user stage, enter a tilde
( |
Configure each table's Key Column in the Snowflake destination. For CDC data, the MERGE command is applied on Snowflake requiring columns for matching each table. See details here
Click the "START" button to run the pipeline.