In the data directory, we collect issues and pull requests. Initially, I pulled all data from 2019-2023 into a single CSV. The idea was to keep a separate file for each year after that, which keeps the files small and avoids re-pulling data from previous years that we already have.
Right now, we have:
- 2019-2023
- 2024_
- 2025
However, when I look at the files, I see issues and PRs from previous years in the current files. We should document what each file contains and then check the scripts to make sure we are collecting data properly.
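One way to confirm the problem before touching the scripts is a quick check that flags rows whose opening year falls outside the range a file is supposed to cover. This is a minimal sketch, assuming each file is a CSV with a `date_opened` column holding an ISO 8601 timestamp (the column proposed below; the current files may not have it yet). The function name `find_misfiled_rows` and the file names in the usage comment are hypothetical.

```python
import csv
import io
from datetime import datetime


def opened_year(row):
    """Year from the (assumed) date_opened column, e.g. '2024-03-05T12:00:00Z'."""
    return datetime.strptime(row["date_opened"][:10], "%Y-%m-%d").year


def find_misfiled_rows(csv_file, years):
    """Return rows whose opening year is NOT in `years`.

    `csv_file` is any open text file or file-like object; `years` is a
    container such as {2025} or range(2019, 2024) for the combined file.
    """
    reader = csv.DictReader(csv_file)
    return [row for row in reader if opened_year(row) not in years]


if __name__ == "__main__":
    # Hypothetical usage against a real file:
    #   with open("data/2025.csv", newline="", encoding="utf-8") as f:
    #       print(len(find_misfiled_rows(f, {2025})))
    sample = io.StringIO(
        "number,date_opened,date_closed\n"
        "1,2025-01-02T00:00:00Z,\n"
        "2,2024-11-20T00:00:00Z,2025-01-05T00:00:00Z\n"
    )
    for row in find_misfiled_rows(sample, {2025}):
        print("misfiled:", row["number"], row["date_opened"])
```

Passing `range(2019, 2024)` lets the same check cover the combined 2019-2023 file.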
We should also have `date_opened` and `date_closed` columns on each row.
The data should be organized so that the 2024 file contains all issues and PRs OPENED in that year, even if they were closed in 2025. The challenge here will be CI: there will be issues opened in 2019 that were closed in 2022 because we got funding and more work got done, or issues opened in late 2024 that were resolved in 2025.
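A rough sketch of what "file by year opened" could look like, assuming the scripts pull from GitHub's REST issues endpoint. That endpoint returns both issues and PRs, marks PRs with a `pull_request` key, and provides `created_at` and `closed_at` (null while open), which map naturally onto `date_opened` and `date_closed`. The column list, the `data/<year>.csv` naming, and all function names here are hypothetical and not taken from the current scripts.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Hypothetical column layout for each per-year CSV.
FIELDS = ["number", "kind", "title", "date_opened", "date_closed"]


def to_row(item):
    """Flatten one item from the GitHub issues endpoint into a CSV row."""
    return {
        "number": item["number"],
        # The issues endpoint also returns PRs; they carry a "pull_request" key.
        "kind": "pr" if "pull_request" in item else "issue",
        "title": item["title"],
        "date_opened": item["created_at"],
        # closed_at is null (None) while the item is still open.
        "date_closed": item.get("closed_at") or "",
    }


def partition_by_year_opened(items):
    """Group rows by the year they were OPENED, regardless of when they closed."""
    by_year = defaultdict(list)
    for item in items:
        row = to_row(item)
        by_year[int(row["date_opened"][:4])].append(row)
    return dict(by_year)


def write_year_files(by_year, data_dir):
    """Write one CSV per opening year, e.g. data/2024.csv (hypothetical naming).

    Each file is overwritten, so pass the complete set of rows for that year.
    """
    out = Path(data_dir)
    out.mkdir(parents=True, exist_ok=True)
    for year, rows in sorted(by_year.items()):
        with open(out / f"{year}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)
```

Because the year comes only from `date_opened`, an issue opened in December 2024 and closed in January 2025 stays in the 2024 file. An incremental bi-weekly run would need to merge new rows into the existing file (for example, keyed on `number`) rather than overwrite it.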
So we might want a cron job that goes back and adds `date_closed` to issues and PRs opened in previous years. It could run monthly and parse all the data, separately from the bi-weekly updates.
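The core of that monthly job could be a small backfill step: fetch recently closed items, then update `date_closed` in the older files wherever it is missing or out of date. This sketch only shows the update logic. It assumes the fetch has already produced a dict mapping issue/PR number (as a string, since CSV values are read as strings) to its `closed_at` timestamp; the GitHub issues endpoint's `state=closed` and `since` parameters are one possible way to get that list. The function name and the cron schedule in the comment are illustrative assumptions.

```python
def backfill_date_closed(rows, closed_at_by_number):
    """Fill in or correct date_closed for rows read from an older year's CSV.

    rows: list of dicts (e.g. from csv.DictReader) with "number" and "date_closed".
    closed_at_by_number: {"123": "2022-03-10T12:00:00Z", ...} from a fresh fetch.
    Returns how many rows changed, so the job can skip rewriting untouched files.

    Hypothetical schedule: a monthly cron entry such as "0 3 1 * *"
    (03:00 on the 1st of each month), separate from the bi-weekly pull.
    """
    updated = 0
    for row in rows:
        closed_at = closed_at_by_number.get(str(row["number"]))
        # Only overwrite when we have a real timestamp that differs,
        # e.g. an issue opened in 2019 that was finally closed in 2022.
        if closed_at and row.get("date_closed") != closed_at:
            row["date_closed"] = closed_at
            updated += 1
    return updated
```

Returning the count keeps the job cheap: a file only needs to be rewritten when at least one of its rows changed.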