Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional sampling tests #15

Closed
wants to merge 19 commits into from
Closed

Additional sampling tests #15

wants to merge 19 commits into from

Conversation

xjing76
Copy link
Contributor

@xjing76 xjing76 commented Jun 23, 2020

Toy series with added time series component.

So far I can only run thru line 180 with no issue.
After 180 when I started to actually sampling the series, there would be errors saying derivatives of the seasonal components prams are zeros.
ValueError: Mass matrix contains zeros on the diagonal. The derivative of RV beta_s_log__.ravel()[0] is zero.

Originally I also attempt with multiple Fourier series and I was able to run thru it. Yet, I think it would make more sense to use fixed effect based on our current data set up.

@xjing76 xjing76 requested a review from brandonwillard June 23, 2020 17:17
Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new files are empty; looks like there might've been an issue copying them. Also, there's a typo in one of the file names.

Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's squash these two commits before we continue.

@brandonwillard brandonwillard changed the title add in sampling test script Additional sampling tests Jun 23, 2020
Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you need to run black on this source (e.g. make black from the project root). That should take care of the formatting issues.

tests/test_sampling.py Outdated Show resolved Hide resolved
tests/test_sampling.py Outdated Show resolved Hide resolved
tests/test_sampling.py Outdated Show resolved Hide resolved
tests/test_sampling.py Outdated Show resolved Hide resolved
tests/test_sampling.py Outdated Show resolved Hide resolved
tests/test_sampling_seasonality.py Outdated Show resolved Hide resolved
tests/utils.py Outdated Show resolved Hide resolved
tests/utils.py Outdated Show resolved Hide resolved
tests/utils.py Outdated Show resolved Hide resolved
tests/utils.py Outdated Show resolved Hide resolved
@xjing76 xjing76 force-pushed the test_sampling branch 3 times, most recently from 29e60ce to 1f8bf05 Compare June 25, 2020 11:35
@xjing76
Copy link
Contributor Author

xjing76 commented Jun 25, 2020

So far, I convert the test to test_func format, but only test_sampling is working. test_sampling_with_seasonality would not be able to actually sample and would throw pymc3.exceptions.SamplingError: Bad initial energy. Additionally, since we changed the simulate_poiszero_hmm to the SwitchingProcess, additional adjustment need to be made. Also, I will still need to reformat so that it would fit format requirement.

@brandonwillard
Copy link
Contributor

Don't forget to resolve the review comments when you commit/rebase relevant changes!

@xjing76 xjing76 force-pushed the test_sampling branch 3 times, most recently from f761419 to 5155fce Compare June 25, 2020 21:11
@brandonwillard
Copy link
Contributor

To fix the ValueErrors, try uncommenting this line and changing the string value to "warn". That should create the missing mu[0].tag.test_value.

@brandonwillard
Copy link
Contributor

brandonwillard commented Jun 25, 2020

With that ValueError out of the way, I can see the SamplingError. This occurs within the HMC sampler when the sampled point value is

{'p_0_stickbreaking__': array([0., 0.]),
 'p_1_stickbreaking__': array([0., 0.]),
 'p_2_stickbreaking__': array([0., 0.]),
 'S_t': array([1, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 2, 2, 2, 2, 2, 0, 0, 2, 2, 1, 1,
        1, 1, 1, 1, 2, 1, 2, 2, 1, 1, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2,
        2, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 1, 1, 1,
        1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 2,
        2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 2,
        2, 2, 2, 2, 0, 2, 2, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0,
        0, 0, 0, 2, 0, 2, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 0,
        0, 1]),
 'mu_1_log__': array(8.00636757),
 'mu_2_log__': array(nan),
 'beta_s_log__': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])}

Of course, the mu_2_log__ value (i.e. nan) is a problem. Inspecting mu_2_rv.tag.test_value in the parent frames show that the nan value was present from the start, and it's likely cause is the undefined pm.Gamma prior with negative E_2_mu and Var_2_mu parameters (i.e. -2000 and -1800, respectively).

In summary, it looks like there might be a sign error such that (mu_2 - mu_1) at line 31 should be reversed or the absolute value should be used.

@xjing76
Copy link
Contributor Author

xjing76 commented Jun 26, 2020

Adjusted according to recommendations from @brandonwillard, solved both ValueError and SamplingError. However, with current fixed effect setting for seasonality, posterior prediction is a bit far away from simulation. Thus the failure in Assertion

@xjing76
Copy link
Contributor Author

xjing76 commented Jun 29, 2020

Sampling for seasonality with fixed effect would generate positive Poisson prediction with MAPE around 30%. Percentage of true value falls into predicted 95% CI is also about 30%. Also moved metric checking function into utility.

@brandonwillard brandonwillard marked this pull request as draft June 29, 2020 16:54
tests/utils.py Outdated Show resolved Hide resolved
tests/utils.py Outdated Show resolved Hide resolved
tests/utils.py Outdated Show resolved Hide resolved
tests/utils.py Outdated Show resolved Hide resolved
tests/test_sampling_seasonality.py Outdated Show resolved Hide resolved
Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what's causing the FloatingPointError that's making the tests fail?

tests/test_sampling_seasonality.py Outdated Show resolved Hide resolved
tests/utils.py Outdated Show resolved Hide resolved
tests/utils.py Outdated Show resolved Hide resolved
tests/utils.py Outdated Show resolved Hide resolved
@xjing76 xjing76 force-pushed the test_sampling branch 2 times, most recently from b7f6472 to 78ff675 Compare July 7, 2020 14:34
@xjing76
Copy link
Contributor Author

xjing76 commented Jul 7, 2020

changed up some formatting issue and added in descriptions for metrics checking function. A not working version of ZeroTruncatedPoission. It is not working at this point and it would throw error with bad initial energy.

Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the test errors, it seems like the sampled mu values might be too large for the exp in your logp implementation.

tests/test_ztp.py Outdated Show resolved Hide resolved
@xjing76
Copy link
Contributor Author

xjing76 commented Jul 8, 2020

Just adjust logp as requested. and both seasonality and seasonality_ztp is able to pass individual test. but not when call make test. It seems like i was failing at a function perform(self, node, inputs, output_storage): and the only place I can find it across the project is testing-report.html

@xjing76 xjing76 requested a review from brandonwillard July 8, 2020 19:44
Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the output, the perform you're referring to is the Theano Op.perform for the exp function. When Theano's C-compilation is turned off/not implemented for an operator, Op.perform for Theano's exp operator essentially calls np.exp on some arguments. The code you're seeing in Op.perform is doing just that, since self.ufunc is basically/literally np.exp.

The actual error is—again—a FloatingPointError, and Theano's debug output shows you the inputs used by Op.perform. They're still quite large for exp.

Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try wrapping the pm.sample in a with np.errstate(over='warn'):.

Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Success!

Unfortunately, it says the tests are taking 28 minutes to run! We need to trim that down by a lot.

If pm.sample_posterior_predictive is the cause, we can remove the steps that rely on it and focus on posterior sample checks alone.

@brandonwillard brandonwillard marked this pull request as ready for review July 10, 2020 18:08
@xjing76
Copy link
Contributor Author

xjing76 commented Jul 13, 2020

I switched the metrics from posterior estimation to measuring the distance from the real parameters of the distributions.

In addition I added in [ Conway-Maxwell-Poission distribution]
(https://gist.github.com/dadaromeo/33e581d9e3bcbad83531b4a91a87509f). However, this method would fail on both states estimation and the parameter estimation.

Additionally, we are able to reduce the time to run the script by removing sampling posterior only to an extent. For tests with seasonality we would still need reasonable amount of time just to train the model.

@brandonwillard
Copy link
Contributor

In addition I added in [ Conway-Maxwell-Poission distribution]
(https://gist.github.com/dadaromeo/33e581d9e3bcbad83531b4a91a87509f). However, this method would fail on both states estimation and the parameter estimation.

This should be in a separate PR.

Comment on lines +114 to +122
post_pred_imps_hpd_df = az.hdi(
az_post_trace, hdi_prob=ci_conf, group="posterior_predictive", var_names=["Y_t"]
).to_dataframe()

post_pred_imps_hpd_df = post_pred_imps_hpd_df.unstack(level="hdi")
post_pred_imps_hpd_df.columns = post_pred_imps_hpd_df.columns.set_levels(
["upper", "lower"], level="hdi"
)
pred_range = post_pred_imps_hpd_df[positive_index]["Y_t"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use the imp naming scheme—or references to anything like that—in this repository.

@brandonwillard
Copy link
Contributor

Looks like we'll have to resubmit this to the main branch instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Basic model assumptions and sampler tests
2 participants