Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update dvc.lock file to reflect 2024-assessment-year tag data #57

Merged
merged 1 commit into from
Dec 4, 2024

Conversation

dfsnow
Copy link
Member

@dfsnow dfsnow commented Dec 4, 2024

This PR updates our dvc.lock file so that stages after 00-ingest accurately reflect the results of a full dvc repro command.

The basic issue here is that after wrapping the 2024 condo model, I committed the final data to only the 00-ingest stage without propagating that data to any of the downstream stages (by running a full dvc repro). This resulted in DVC creating large dvc.lock diffs for any model run, regardless of whether input parameters or data were changed (compared to the final 2024 model).

The updates to dvc.lock should accurately reflect the state of the pipeline had we run it immediately after finalizing the 2024 model (minus any weird effects caused by KNN ala #54).

@wagnerlmichael

Unblocks #52.

@dfsnow dfsnow requested a review from jeancochrane December 4, 2024 17:47
@dfsnow dfsnow marked this pull request as ready for review December 4, 2024 17:47
@dfsnow dfsnow requested a review from wrridgeway as a code owner December 4, 2024 17:47
@@ -49,12 +49,13 @@ stages:
deps:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that nothing in the 00-ingest stage has changed, only the downstream hashes of that same data (recorded as inputs for subsequent stages).

Copy link
Contributor

@jeancochrane jeancochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks! One small question, but it doesn't need to block merging.

Comment on lines +930 to +1000
upload:
cmd: Rscript pipeline/06-upload.R
deps:
- path: output/assessment_card/model_assessment_card.parquet
hash: md5
md5: 3442b0b0fb25364caba810a507213109
size: 38822670
- path: output/assessment_pin/model_assessment_pin.parquet
hash: md5
md5: ae6242ed4427ccd87acab2d87435ab8f
size: 41641680
- path: output/feature_importance/model_feature_importance.parquet
hash: md5
md5: 61db6f11d2ea7aa53d6990445b5d9cd2
size: 8582
- path: output/metadata/model_metadata.parquet
hash: md5
md5: 5bfe8e50f50463253a3f8f4fa3164bb8
size: 29757
- path: output/parameter_final/model_parameter_final.parquet
hash: md5
md5: b234a91486b487642e8738306f87c25c
size: 8857
- path: output/parameter_range/model_parameter_range.parquet
hash: md5
md5: 150000269b5873fa1b3eaeeff7887ce2
size: 501
- path: output/parameter_search/model_parameter_search.parquet
hash: md5
md5: 150000269b5873fa1b3eaeeff7887ce2
size: 501
- path: output/performance/model_performance_assessment.parquet
hash: md5
md5: 6c43dfc44d5e8186f037b5c6d7bbd8b1
size: 573773
- path: output/performance/model_performance_test.parquet
hash: md5
md5: 9867d9222eb5ff618f69b185ffc7452c
size: 1060602
- path: output/performance_quantile/model_performance_quantile_assessment.parquet
hash: md5
md5: 8fb50ba32609879ad5fc9b196e07bdae
size: 461742
- path: output/performance_quantile/model_performance_quantile_test.parquet
hash: md5
md5: 5d5b3e0c69fab782974f89c4bbbf75fb
size: 1055715
- path: output/shap/model_shap.parquet
hash: md5
md5: 150000269b5873fa1b3eaeeff7887ce2
size: 501
- path: output/test_card/model_test_card.parquet
hash: md5
md5: e95956454d04a68669f04f5355af3b5e
size: 1342825
- path: output/timing/model_timing.parquet
hash: md5
md5: 736810f7363817b6023d98b1e74d05af
size: 6032
- path: output/workflow/fit/model_workflow_fit.zip
hash: md5
md5: 5a607521588c3aca5761150390082127
size: 15244546
- path: output/workflow/recipe/model_workflow_recipe.rds
hash: md5
md5: c672f98b0b68e5a16adb0b687b43adca
size: 4199953
- path: reports/performance/performance.html
hash: md5
md5: 004b653e50e9513fc04ad1fc1d5ca544
size: 80
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question, non-blocking] Looks like we first added this stage in December. Any idea why DVC is only now adding it to the lockfile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we basically just didn't run the condo model locally to update the dvc.lock file, it only ran on Batch.

@dfsnow dfsnow merged commit 1a22271 into master Dec 4, 2024
4 checks passed
@dfsnow dfsnow deleted the dfsnow/fix-dvc-lock branch December 4, 2024 22:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants