Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Edit DVC deps to include cmd run files #45

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 43 additions & 0 deletions dvc.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@ stages:
cmd: Rscript pipeline/00-ingest.R
desc: >
Ingest training and assessment data from Athena + generate condo strata
deps:
- pipeline/00-ingest.R
params:
- assessment
- input
Expand All @@ -20,6 +22,8 @@ stages:
Train a LightGBM model with cross-validation. Generate model objects,
data recipes, and predictions on the test set (most recent 10% of sales)
deps:
- pipeline/01-train.R
- input/training_data.parquet
- input/training_data.parquet
params:
- cv
Expand Down Expand Up @@ -55,6 +59,13 @@ stages:
County. Also generate flags, calculate land values, and make any
post-modeling changes
deps:
- pipeline/02-assess.R
- input/assessment_data.parquet
- input/condo_strata_data.parquet
- input/land_nbhd_rate_data.parquet
- input/training_data.parquet
- output/workflow/fit/model_workflow_fit.zip
- output/workflow/recipe/model_workflow_recipe.rds
- input/assessment_data.parquet
- input/condo_strata_data.parquet
- input/land_nbhd_rate_data.parquet
Expand Down Expand Up @@ -82,6 +93,9 @@ stages:
2. An assessor-specific ratio study comparing estimated assessments to
the previous year's sales
deps:
- pipeline/03-evaluate.R
- output/assessment_pin/model_assessment_pin.parquet
- output/test_card/model_test_card.parquet
- output/assessment_pin/model_assessment_pin.parquet
- output/test_card/model_test_card.parquet
params:
Expand All @@ -105,6 +119,10 @@ stages:
Generate SHAP values for each card and feature as well as feature
importance metrics for each feature
deps:
- pipeline/04-interpret.R
- input/assessment_data.parquet
- output/workflow/fit/model_workflow_fit.zip
- output/workflow/recipe/model_workflow_recipe.rds
- input/assessment_data.parquet
- output/workflow/fit/model_workflow_fit.zip
- output/workflow/recipe/model_workflow_recipe.rds
Expand All @@ -125,6 +143,11 @@ stages:
Save run timings and run metadata to disk and render a performance report
using Quarto.
deps:
- pipeline/05-finalize.R
- output/intermediate/timing/model_timing_train.parquet
- output/intermediate/timing/model_timing_assess.parquet
- output/intermediate/timing/model_timing_evaluate.parquet
- output/intermediate/timing/model_timing_interpret.parquet
- output/intermediate/timing/model_timing_train.parquet
- output/intermediate/timing/model_timing_assess.parquet
- output/intermediate/timing/model_timing_evaluate.parquet
Expand Down Expand Up @@ -155,6 +178,24 @@ stages:
outputs prior to upload and attach a unique run ID. This step requires
access to the CCAO Data AWS account, and so is assumed to be internal-only
deps:
- pipeline/06-upload.R
- output/parameter_final/model_parameter_final.parquet
- output/parameter_range/model_parameter_range.parquet
- output/parameter_search/model_parameter_search.parquet
- output/workflow/fit/model_workflow_fit.zip
- output/workflow/recipe/model_workflow_recipe.rds
- output/test_card/model_test_card.parquet
- output/assessment_card/model_assessment_card.parquet
- output/assessment_pin/model_assessment_pin.parquet
- output/performance/model_performance_test.parquet
- output/performance_quantile/model_performance_quantile_test.parquet
- output/performance/model_performance_assessment.parquet
- output/performance_quantile/model_performance_quantile_assessment.parquet
- output/shap/model_shap.parquet
- output/feature_importance/model_feature_importance.parquet
- output/metadata/model_metadata.parquet
- output/timing/model_timing.parquet
- reports/performance/performance.html
- output/parameter_final/model_parameter_final.parquet
- output/parameter_range/model_parameter_range.parquet
- output/parameter_search/model_parameter_search.parquet
Expand All @@ -179,6 +220,8 @@ stages:
Generate Desk Review spreadsheets and iasWorld upload CSVs from a finished
run. NOT automatically run since it is typically only run once. Manually
run once a model is selected
deps:
- pipeline/07-export.R
params:
- assessment.year
- input.min_sale_year
Expand Down
Loading