Validation and hyperparameter tuning setup #16

Open · 4 tasks
cczhu opened this issue Nov 5, 2019 · 2 comments
Labels: countmatch (Pythonization of PRTCS into CountMatch module)

Comments

cczhu commented Nov 5, 2019

We need to determine whether our fit errors are similar to Arman's reported results and to those of Bagheri et al. 2013. We also want to investigate whether we can significantly relax the criteria for data to be included in validation, and whether we can isolate a portion of that data as a holdout set. We don't strictly need to restrict ourselves to permanent stations; any station with sufficient data across multiple years will do. A rough sketch of this selection follows.
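As an illustration only (the `station_counts` layout and every name below are assumptions, not actual CountMatch interfaces), the relaxed criterion plus holdout split could look something like:

```python
import random


def select_validation_stations(station_counts, min_years=3, holdout_frac=0.2,
                               seed=0):
    """Relax the permanent-station criterion and carve out a holdout set.

    `station_counts` is assumed to map station ID -> pandas DataFrame of
    daily counts indexed by a DatetimeIndex; this layout is hypothetical.
    """
    # Keep any station observed in at least `min_years` distinct years.
    eligible = [sid for sid, df in station_counts.items()
                if df.index.year.nunique() >= min_years]
    # Shuffle reproducibly, then peel off a fraction as the holdout set.
    rng = random.Random(seed)
    rng.shuffle(eligible)
    n_holdout = int(round(holdout_frac * len(eligible)))
    return eligible[n_holdout:], eligible[:n_holdout]  # (fit, holdout)
```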

Actual testing may take an extended amount of time (and will involve comparing our results to those of a Gaussian Process Regression), so the goal of this issue is to merge the preliminary CountMatch into master, then create a new sandbox branch ecosystem for validation testing and hyperparameter tuning.

  • Merge countmatch branch with master (don't delete countmatch).
  • Rebase sandbox onto master.
  • Set up a new notebook for validation testing. Create a CountMatch model mini-pipeline that lets us vary hyperparameters such as the number of neighbours considered or the minimum data requirements for a permanent count station (see the sketch after this list).
  • Perform preliminary experiments for validation.
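A minimal sketch of what that mini-pipeline could look like, assuming a hypothetical `run_countmatch` fitter and `mean_absolute_pct_error` metric (neither exists yet; the names, the `data` object, and the default hyperparameter values are all placeholders):

```python
from itertools import product


def run_experiment(data, n_neighbours=5, min_count_days=274):
    """Fit CountMatch under one hyperparameter setting and score it.

    `run_countmatch` and `mean_absolute_pct_error` are hypothetical
    stand-ins for the eventual CountMatch fitter and validation error
    metric, and `data` for whatever the pipeline's input object becomes.
    """
    predictions = run_countmatch(data, n_neighbours=n_neighbours,
                                 min_count_days=min_count_days)
    return mean_absolute_pct_error(predictions, data.holdout_aadt)


# Preliminary grid over the two hyperparameters named above.
grid = list(product([3, 5, 10], [180, 274, 365]))
results = {setting: run_experiment(data, *setting) for setting in grid}
```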
@cczhu cczhu added the countmatch Pythonization of PRTCS into CountMatch module label Nov 5, 2019
@cczhu cczhu added this to the CountMatch MVP milestone Nov 5, 2019
cczhu commented Dec 6, 2019

A decent amount of discussion about this is happening in #14 right now, and this may be implicitly solved once we create an MVP CountMatch fitter.

cczhu commented Dec 22, 2019

Old Charles is correct - this is now partly solved by the latest commits in #14, but even better - it's in function form rather than notebook form.

Preliminary results suggest that the annual growth factor is by far the most sensitive parameter governing the AADT predictions, so we'll have to think more about #26 before embarking on a full hyperparameter estimation journey.

(Hyperparameter estimation also takes many hours to run a single experiment. Since our work is embarrassingly parallel, we should consider multiprocessing solutions to speed it up. The simple thing to do is to run the hyperparameter tuning experiments in parallel, as sketched below; the more involved, and more lucrative, solution is to parallelize the CountMatch estimator itself. We may also finally want to spin up a cluster to do some of this work...

See #7)
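For reference, the "simple thing" could be as little as farming independent experiments out to worker processes, reusing the hypothetical `run_experiment` from the task-list sketch above (all names here remain assumptions):

```python
from functools import partial
from itertools import product
from multiprocessing import Pool


def tune_parallel(data, neighbour_grid=(3, 5, 10),
                  min_day_grid=(180, 274, 365), n_workers=4):
    """Run independent hyperparameter experiments in worker processes.

    Each experiment is independent of the others, so this is
    embarrassingly parallel. `run_experiment` is the hypothetical
    single-setting runner sketched earlier in this issue.
    """
    settings = list(product(neighbour_grid, min_day_grid))
    with Pool(n_workers) as pool:
        # starmap unpacks each (n_neighbours, min_count_days) pair as
        # positional arguments after the fixed `data` argument.
        errors = pool.starmap(partial(run_experiment, data), settings)
    return dict(zip(settings, errors))
```

Note that `run_experiment` and `data` would need to be picklable (defined at module level) for `multiprocessing` to ship them to worker processes.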

@cczhu cczhu removed this from the CountMatch MVP milestone Mar 15, 2020