These scripts were created to simplify running and testing the models. Their primary purpose is to provide a means of doing so during development without risking a credentials leak, and to avoid needing to run sql-runner once per playbook.
The `run_playbooks`, `run_config`, `e2e` and `pr_check` scripts require sql-runner. They also require that the relevant template database target in the `templates/` directory is populated. One can include the password in this template, or leave it as `PASSWORD_PLACEHOLDER` and authenticate by other means - see the authentication section below.
The `run_config`, `e2e`, and `pr_check` scripts require jq.
The `run_test`, `e2e`, and `pr_check` scripts require python3. To install dependencies:

```bash
cd data-models/.test
pip3 install -r requirements.txt
```
These scripts also require setting up a great_expectations datasource. To do so, run `great_expectations datasource new` from the `.test/` directory. This will create a configuration in `.test/great_expectations/config/config_variables.yml`. The entry for `password` can be replaced with `${REDSHIFT_PASSWORD}` to use script input or environment variables to authenticate.
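After running the command, the generated entry might look roughly like the following. The field names and values here are an illustrative sketch of a typical great_expectations datasource entry, not copied from a real generated config - check your own file for the actual structure:

```yaml
# Illustrative sketch only - your generated entry may differ.
my_redshift_db:
  drivername: postgresql+psycopg2
  host: example.redshift.amazonaws.com
  port: 5439
  username: example_user
  password: ${REDSHIFT_PASSWORD}  # resolved from script input or the environment
  database: dev
```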
These scripts allow for handling passwords via two means, which reduce the risk of committing credentials to source control:

- Set the `REDSHIFT_PASSWORD` or `SNOWFLAKE_PASSWORD` environment variable for Redshift or Snowflake respectively, or for BigQuery set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your JSON service account key file.
- Pass the relevant credential to the relevant argument of the script in question.
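For the environment-variable route, a typical shell session might look like the following. The values shown are placeholders for illustration, not real credentials:

```shell
# Placeholder values for illustration only - substitute your own credentials.
export REDSHIFT_PASSWORD='example-redshift-password'
export SNOWFLAKE_PASSWORD='example-snowflake-password'
# For BigQuery, point to your service account key file instead:
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/example-service-account.json"
```

With these set, the scripts can then be invoked without passing a credential argument.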
Any script which uses sql-runner will leverage the templates in `.scripts/templates/` to manage the database target connection details. Passwords should be left set to `PASSWORD_PLACEHOLDER`, but all other details should be hardcoded. It's best to avoid committing these to source control - however, doing so is a less severe risk than leaking a password.
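Conceptually, the substitution amounts to a find-and-replace on the template at runtime, so the real password never needs to be written to disk. A minimal sketch - the template line and variable here are illustrative, not the scripts' actual implementation:

```shell
# Illustrative only: a one-line stand-in for a target template.
TEMPLATE_LINE='password: PASSWORD_PLACEHOLDER'

# Hypothetical password, as would be sourced from an environment variable.
REDSHIFT_PASSWORD='example-secret'

# Substitute the placeholder at runtime, keeping the real value out of the repo.
FILLED_LINE=$(printf '%s' "$TEMPLATE_LINE" | sed "s/PASSWORD_PLACEHOLDER/${REDSHIFT_PASSWORD}/")
printf '%s\n' "$FILLED_LINE"
```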
`run_config.sh` runs a config JSON file (examples can be found in the `configs` folder of each model), which specifies a list of playbooks to run. Note that this script does not enforce dependencies; rather, it runs the playbooks in order of appearance. Snowplow BDP customers can take advantage of dependency resolution when running jobs on our Orchestration services.
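The exact schema is best taken from the examples in each model's `configs` folder, but the idea - an ordered list of playbooks - can be sketched as below. The key names and playbook paths here are assumptions for illustration, not guaranteed to match the real schema:

```shell
# Write a hypothetical minimal config (key names and paths are illustrative).
cat > /tmp/example_datamodeling.json <<'EOF'
{
  "playbooks": [
    "standard/00-setup/00-setup",
    "standard/01-base/01-base-main"
  ]
}
EOF

# Validate that the file is well-formed JSON (python3 is already a dependency).
python3 -m json.tool /tmp/example_datamodeling.json > /dev/null && echo "valid JSON"
```

The script would then run each listed playbook in turn, top to bottom.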
Arguments:

```
-b (binary) path to sql-runner binary [required]
-c (config) path to config [required]
-a (auth) optional credentials for database target
-p (print SQL) use sql-runner fillTemplates to print pure SQL
-d (dryRun) use sql-runner dry run
-o (output path) path to store the output of sql-runner to a SQL file (to be used in conjunction with -p)
-t (target template) path to target template to use (minimizes risk of credential leak)
-v (variable template) path to variable template. Any variables in this template will override any corresponding variables within each playbook for the run.
```
Examples:

```bash
bash .scripts/run_config.sh -b ~/pathTo/sql-runner -c web/v1/bigquery/sql-runner/configs/datamodeling.json;
# Runs the standard BigQuery web model end to end.

bash .scripts/run_config.sh -b ~/pathTo/sql-runner -c web/v1/bigquery/sql-runner/configs/datamodeling.json -d;
# Dry-runs the standard BigQuery web model end to end.

bash .scripts/run_config.sh -b ~/pathTo/sql-runner -c web/v1/bigquery/sql-runner/configs/example_with_custom.json -p -o tmp/sql;
# Prints pure SQL for the BigQuery model and example custom steps to files in `tmp/sql` - with all templates filled in.
```
`run_test.sh` runs a great_expectations suite. The configuration for the tests can be found in the `expectations` directory.
We recommend using a virtual environment for Python, e.g. `pyenv` or `virtualenv`. For example, using the latter:

```bash
virtualenv ~/myenv
source ~/myenv/bin/activate
```

Before running, make sure to install the Python requirements (python3 required):

```bash
cd data-models/.test
pip3 install -r requirements.txt
```
Arguments:

```
-d (database) target database for expectations [required]
-c (config) expectation config name [required]
-a (auth) optional credentials for database target
-m (model) target model to run, i.e. web or mobile [required]
```

Examples:

```bash
bash .scripts/run_test.sh -d bigquery -c perm_tables -m web;
# Runs the perm_tables validation config against BigQuery
```
`e2e.sh` runs a single end-to-end run of a standard model and the great_expectations tests.
We recommend using a virtual environment for Python, e.g. `pyenv` or `virtualenv`. For example, using the latter:

```bash
virtualenv ~/myenv
source ~/myenv/bin/activate
```

Before running, make sure to install the Python requirements (python3 required):

```bash
cd data-models/.test
pip3 install -r requirements.txt
```
Arguments:

```
-b (binary) path to sql-runner binary [required]
-d (database) target database for expectations [required]
-a (auth) optional credentials for database target
-m (model) target model to run, i.e. web or mobile [required]
```

Examples:

```bash
bash .scripts/e2e.sh -b ~/pathTo/sql-runner -d bigquery -m web;
# Runs the end-to-end testing script against BigQuery
```
`pr_check.sh` runs ten end-to-end runs of a standard model and tests, exiting on failure.
We recommend using a virtual environment for Python, e.g. `pyenv` or `virtualenv`. For example, using the latter:

```bash
virtualenv ~/myenv
source ~/myenv/bin/activate
```

Before running, make sure to install the Python requirements (python3 required):

```bash
cd data-models/.test
pip3 install -r requirements.txt
```
Arguments:

```
-b (binary) path to sql-runner binary [required]
-d (database) target database for expectations [required]
-a (auth) optional credentials for database target
-m (model) target model to run, i.e. web or mobile [required]
```

Examples:

```bash
bash .scripts/pr_check.sh -b ~/pathTo/sql-runner -d bigquery -m web;
# Runs the PR check testing script against BigQuery
```
`integration_test.sh` runs four end-to-end runs of the standard model in one-day increments, using the integration test dataset. The actual derived tables are then checked against the expected derived tables, and the standard tests are also performed on the derived tables.
We recommend using a virtual environment for Python, e.g. `pyenv` or `virtualenv`. For example, using the latter:

```bash
virtualenv ~/myenv
source ~/myenv/bin/activate
```

Before running, make sure to install the Python requirements (python3 required):

```bash
cd data-models/.test
pip3 install -r requirements.txt
```
Arguments:

```
-b (binary) path to sql-runner binary [required]
-d (database) target database for expectations [required]
-a (auth) optional credentials for database target
-m (model) target model to run, i.e. web or mobile [required]
```

Examples:

```bash
bash .scripts/integration_test.sh -b ~/pathTo/sql-runner -d bigquery -m web
# Runs the integration testing script against BigQuery
```
Deprecated - `run_config.sh` provides a simpler instrumentation of this functionality.

`run_playbooks.sh` runs a list of playbooks in sequence, using sql-runner.
Arguments:

```bash
bash run_playbooks.sh {path_to_sql_runner} {database} {major version} '{list_of_playbooks_no_extension},{comma_separated}' {credentials (optional)}
# {path_to_sql_runner} - Path to your local instance of sql-runner
# {database} - Database to run (`redshift`, `snowflake` or `bigquery` - note that only redshift is currently implemented)
# {major version} - Version of the model to run (according to the directory that houses it - e.g. `v0` or `v1`)
# '{list_of_playbooks_no_extension},{comma_separated}' - A string containing a list of playbook paths, from the `playbooks` folder, with no file extension (e.g. `standard/00-setup/00-setup-metadata,standard/01-base/01-base-main`).
# {credentials (optional)} - Credentials for the database (optional; this can also be provided by env var)
```

Examples:

```bash
bash .scripts/run_playbooks.sh ~/sql-runner redshift v1 'standard/01-base/01-base-main,standard/02-page-views/01-page-views-main,standard/03-sessions/01-sessions-main,standard/04-users/01-users-main';
# Runs the base, page views, sessions and users main playbooks for redshift
```