-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update dvc.lock file to reflect 2024-assessment-year
tag data
#57
Conversation
@@ -49,12 +49,13 @@ stages: | |||
deps: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that nothing in the 00-ingest
stage has changed, only the downstream hashes of that same data (recorded as inputs for subsequent stages).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks! One small question, but it doesn't need to block merging.
upload: | ||
cmd: Rscript pipeline/06-upload.R | ||
deps: | ||
- path: output/assessment_card/model_assessment_card.parquet | ||
hash: md5 | ||
md5: 3442b0b0fb25364caba810a507213109 | ||
size: 38822670 | ||
- path: output/assessment_pin/model_assessment_pin.parquet | ||
hash: md5 | ||
md5: ae6242ed4427ccd87acab2d87435ab8f | ||
size: 41641680 | ||
- path: output/feature_importance/model_feature_importance.parquet | ||
hash: md5 | ||
md5: 61db6f11d2ea7aa53d6990445b5d9cd2 | ||
size: 8582 | ||
- path: output/metadata/model_metadata.parquet | ||
hash: md5 | ||
md5: 5bfe8e50f50463253a3f8f4fa3164bb8 | ||
size: 29757 | ||
- path: output/parameter_final/model_parameter_final.parquet | ||
hash: md5 | ||
md5: b234a91486b487642e8738306f87c25c | ||
size: 8857 | ||
- path: output/parameter_range/model_parameter_range.parquet | ||
hash: md5 | ||
md5: 150000269b5873fa1b3eaeeff7887ce2 | ||
size: 501 | ||
- path: output/parameter_search/model_parameter_search.parquet | ||
hash: md5 | ||
md5: 150000269b5873fa1b3eaeeff7887ce2 | ||
size: 501 | ||
- path: output/performance/model_performance_assessment.parquet | ||
hash: md5 | ||
md5: 6c43dfc44d5e8186f037b5c6d7bbd8b1 | ||
size: 573773 | ||
- path: output/performance/model_performance_test.parquet | ||
hash: md5 | ||
md5: 9867d9222eb5ff618f69b185ffc7452c | ||
size: 1060602 | ||
- path: output/performance_quantile/model_performance_quantile_assessment.parquet | ||
hash: md5 | ||
md5: 8fb50ba32609879ad5fc9b196e07bdae | ||
size: 461742 | ||
- path: output/performance_quantile/model_performance_quantile_test.parquet | ||
hash: md5 | ||
md5: 5d5b3e0c69fab782974f89c4bbbf75fb | ||
size: 1055715 | ||
- path: output/shap/model_shap.parquet | ||
hash: md5 | ||
md5: 150000269b5873fa1b3eaeeff7887ce2 | ||
size: 501 | ||
- path: output/test_card/model_test_card.parquet | ||
hash: md5 | ||
md5: e95956454d04a68669f04f5355af3b5e | ||
size: 1342825 | ||
- path: output/timing/model_timing.parquet | ||
hash: md5 | ||
md5: 736810f7363817b6023d98b1e74d05af | ||
size: 6032 | ||
- path: output/workflow/fit/model_workflow_fit.zip | ||
hash: md5 | ||
md5: 5a607521588c3aca5761150390082127 | ||
size: 15244546 | ||
- path: output/workflow/recipe/model_workflow_recipe.rds | ||
hash: md5 | ||
md5: c672f98b0b68e5a16adb0b687b43adca | ||
size: 4199953 | ||
- path: reports/performance/performance.html | ||
hash: md5 | ||
md5: 004b653e50e9513fc04ad1fc1d5ca544 | ||
size: 80 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Question, non-blocking] Looks like we first added this stage in December. Any idea why DVC is only now adding it to the lockfile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we basically just didn't run the condo model locally to update the dvc.lock
file, it only ran on Batch.
This PR updates our
dvc.lock
file so that stages after00-ingest
accurately reflect the results of a fulldvc repro
command.The basic issue here is that after wrapping the 2024 condo model, I committed the final data to only the
00-ingest
stage without propagating that data to any of the downstream stages (by running a fulldvc repro
). This resulted in DVC creating largedvc.lock
diffs for any model run, regardless of whether input parameters or data were changed (compared to the final 2024 model).The updates to
dvc.lock
should accurately reflect the state of the pipeline had we run it immediately after finalizing the 2024 model (minus any weird effects caused by KNN ala #54).@wagnerlmichael
Unblocks #52.