Implement proper truncation for prior distributions #335
Currently, when sampled startpoints are outside the bounds, their value is set to the upper/lower bounds. This may put too much probability mass on the bounds. With these changes, we properly sample from the respective truncated distributions. Closes PEtab-dev#330.
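The difference between the two behaviors can be illustrated with a minimal sketch. It assumes a normal prior and uses only the standard library; `sample_clipped` and `sample_truncated` are hypothetical helper names for illustration, not functions from this PR.

```python
import random
from statistics import NormalDist


def sample_clipped(dist, lb, ub, n, rng):
    """Old behavior: draw from the untruncated prior, then clip to the bounds."""
    return [min(max(rng.gauss(dist.mean, dist.stdev), lb), ub) for _ in range(n)]


def sample_truncated(dist, lb, ub, n, rng):
    """New behavior (sketch): inverse-CDF sampling restricted to [lb, ub]."""
    cdf_lb, cdf_ub = dist.cdf(lb), dist.cdf(ub)
    # Map a uniform draw onto [cdf_lb, cdf_ub) and invert the CDF.
    return [dist.inv_cdf(cdf_lb + rng.random() * (cdf_ub - cdf_lb)) for _ in range(n)]


rng = random.Random(0)
prior = NormalDist(mu=0.0, sigma=1.0)
clipped = sample_clipped(prior, -1.0, 1.0, 10_000, rng)
truncated = sample_truncated(prior, -1.0, 1.0, 10_000, rng)
# Fraction of samples that sit exactly on one of the bounds.
frac_clipped = sum(x in (-1.0, 1.0) for x in clipped) / len(clipped)
frac_truncated = sum(x in (-1.0, 1.0) for x in truncated) / len(truncated)
print("clipped on bounds:", frac_clipped)
print("truncated on bounds:", frac_truncated)
```

For a standard normal with bounds at ±1, roughly 32% of the untruncated mass lies outside the bounds, so about that share of the clipped samples should land exactly on a bound, whereas the truncated samples spread over the interval.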
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           develop     #335      +/-   ##
===========================================
- Coverage    74.66%   74.25%   -0.42%
===========================================
  Files           56       56
  Lines         5573     5647      +74
  Branches       976      990      +14
===========================================
+ Hits          4161     4193      +32
- Misses        1040     1084      +44
+ Partials       372      370       -2
☔ View full report in Codecov by Sentry.
petab/v1/priors.py
Outdated
:param bounds_truncate: Whether the generated prior will be truncated
at the bounds.
If ``True``, the probability density will be rescaled
accordingly and the sample is generated from the truncated
distribution.
If ``False``, the probability density will not account for the
bounds, but any parameter samples outside the bounds will be set to
the value of the closest bound. In this case, the PDF might not match
the sample.
True: new behavior
False: old behavior
PEtab specs are ambiguous there (https://github.com/PEtab-dev/PEtab/blob/b9e141dd75798d179c17262f085ed6cef8555b3e/doc/v1/documentation_data_format.rst?plain=1#L527-L529):
"Sampled points are clipped to lie inside the parameter boundaries specified by `lowerBound` and `upperBound`."
While I think the new behavior is more correct, I will wait another while before merging this.
I agree, but I would also be in favor of removing the old behavior entirely. Or "fix" it by resampling out-of-bounds samples.
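The resampling idea mentioned here could look roughly like the following sketch. `sample_by_rejection` and its `max_tries` parameter are hypothetical names chosen for illustration; nothing in this thread says the PR implements resampling this way.

```python
import random


def sample_by_rejection(draw, lb, ub, max_tries=10_000):
    """Redraw until the sample lies in [lb, ub].

    `draw` is any zero-argument callable returning one sample from the
    untruncated distribution.
    """
    for _ in range(max_tries):
        x = draw()
        if lb <= x <= ub:
            return x
    raise RuntimeError("no sample within bounds; acceptance probability may be too low")


rng = random.Random(1)
samples = [sample_by_rejection(lambda: rng.gauss(0.0, 1.0), -1.0, 1.0) for _ in range(500)]
```

Accepted samples follow the truncated distribution, but the approach becomes inefficient when the bounds cover only a small part of the untruncated probability mass, which is why the sketch caps the number of tries.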
Agreed that we should get rid of that. Happy to remove this option completely.
I will wait for some feedback to PEtab-dev/PEtab#591 before proceeding.
Move this to some v1 subfolder? Now or later is fine. But I think priors will change a lot in v2
I was thinking about moving it to https://github.com/PEtab-dev/PEtab/ at some point. It might also be helpful for non-python petab users.
Sounds good!
@@ -151,15 +156,18 @@
 {
 "metadata": {},
 "cell_type": "markdown",
-"source": "To prevent the sampled parameters from exceeding the bounds, the sampled parameters are clipped to the bounds. The bounds are defined in the parameter table. Note that the current implementation does not support sampling from a truncated distribution. Instead, the samples are clipped to the bounds. This may introduce unwanted bias, and thus, should only be used with caution (i.e., the bounds should be chosen wide enough):",
+"source": "The given distributions are truncated at the bounds defined in the parameter table:",
Add something like "This results in a constant rescaling of the probability density, compared to the non-truncated version (https://en.wikipedia.org/wiki/Truncated_distribution), such that the probability density still integrates to 1."
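As a sanity check of this statement, here is a minimal sketch assuming a normal prior. Truncation divides the density by the probability mass inside the bounds, which is a constant factor (equivalently, a constant shift of the log-density). `truncated_pdf` and `integrate` are hypothetical helpers, not part of the PR.

```python
from statistics import NormalDist


def truncated_pdf(dist, x, lb, ub):
    """PDF of `dist` truncated to [lb, ub]: original PDF divided by the mass inside the bounds."""
    if not lb <= x <= ub:
        return 0.0
    return dist.pdf(x) / (dist.cdf(ub) - dist.cdf(lb))


def integrate(f, a, b, n=10_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


prior = NormalDist(mu=0.0, sigma=1.0)
lb, ub = -1.0, 2.0
total = integrate(lambda x: truncated_pdf(prior, x, lb, ub), lb, ub)
# Ratio of truncated to untruncated density at two points inside the bounds.
ratio_a = truncated_pdf(prior, 0.3, lb, ub) / prior.pdf(0.3)
ratio_b = truncated_pdf(prior, 1.5, lb, ub) / prior.pdf(1.5)
print(total, ratio_a, ratio_b)
```

The integral over the bounds should come out numerically close to 1, and the two ratios should agree, since the rescaling factor does not depend on `x`.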
petab/v1/distributions.py
Outdated
-def _undo_log(self, x: np.ndarray | float) -> np.ndarray | float:
-    """Undo the log transformation.
+def _exp(self, x: np.ndarray | float) -> np.ndarray | float:
+    """Exponentiate / undo the log transformation according.
`_undo_log` made sense to me, since the point is to take the inverse of the log, but fine to change too
-"""Exponentiate / undo the log transformation according.
+"""Exponentiate / undo the log transformation if applicable.
I found it too complicated, as `exp` is well understood, I think.
petab/v1/distributions.py
Outdated
:param x: The value at which to evaluate the CDF.
:return: The value of the CDF at ``x``.
"""
return self._cdf_transformed_untruncated(x) - self._cd_low
Hm, shouldn't the CDF "grow" faster when the PDF is truncated? e.g. for a normal distribution, the CDF reaches 1 at +infty. For a truncated normal distribution, the CDF reaches 1 in a finite interval... so is it enough to just subtract the lower bound CDF value? Could you add a test/sanity check that the CDF is 0 at the lower bound (trivially correct here), and 1 at the upper bound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I missed the normalization.
Thanks, fixed.
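The normalization discussed above can be sketched as follows, assuming a normal prior. `truncated_cdf` is a hypothetical stand-alone function for illustration; the PR's actual fix inside the distribution class is not shown in this thread.

```python
from statistics import NormalDist


def truncated_cdf(dist, x, lb, ub):
    """CDF of `dist` truncated to [lb, ub].

    Subtracting the CDF at the lower bound alone is not enough; the result
    must also be divided by the probability mass inside the bounds.
    """
    cdf_lb, cdf_ub = dist.cdf(lb), dist.cdf(ub)
    x = min(max(x, lb), ub)  # the CDF is constant outside the bounds
    return (dist.cdf(x) - cdf_lb) / (cdf_ub - cdf_lb)


prior = NormalDist(mu=0.0, sigma=1.0)
lb, ub = -1.0, 2.0
# Without the division, the value at the upper bound stays below 1.
unnormalized_at_ub = prior.cdf(ub) - prior.cdf(lb)
print(truncated_cdf(prior, lb, lb, ub), truncated_cdf(prior, ub, lb, ub), unnormalized_at_ub)
```

This covers the requested sanity check: the normalized CDF is 0 at the lower bound and 1 at the upper bound, while the unnormalized difference only reaches the mass inside the bounds.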
…v#341) Previously, missing `*PriorParameters` would have resulted in a KeyError.
Looks good to me, thank you! 👍
def _cdf_untransformed_untruncated(self, x) -> np.ndarray | float:
    """Cumulative distribution function of the underlying
    (untransformed, untruncated) distribution at x.

    :param x: The value at which to evaluate the CDF.
    :return: The value of the CDF at ``x``.
    """
    raise NotImplementedError
Is there a reason to make this raise an Error, but the `_pdf_untransformed_untruncated` above is an abstract method?
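For context, the practical difference between the two approaches can be shown in a small sketch. The classes `SketchDistribution` and `PdfOnly` are hypothetical stand-ins; only the two method names are taken from the diff above, and the sketch does not claim to explain the author's choice.

```python
from abc import ABC, abstractmethod


class SketchDistribution(ABC):
    """Hypothetical base class mixing both styles."""

    @abstractmethod
    def _pdf_untransformed_untruncated(self, x):
        """Subclasses must override this, or they cannot be instantiated."""

    def _cdf_untransformed_untruncated(self, x):
        """Optional to override; fails only when actually called."""
        raise NotImplementedError


class PdfOnly(SketchDistribution):
    """Hypothetical subclass that provides a PDF but no CDF."""

    def _pdf_untransformed_untruncated(self, x):
        return 1.0


dist = PdfOnly()  # works: the abstract method is implemented
print(dist._pdf_untransformed_untruncated(0.5))
```

With `@abstractmethod`, a missing override is reported at instantiation time as a `TypeError`; with `raise NotImplementedError`, a subclass without a CDF can still be created and used until the CDF is requested.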
``None`` if the distribution is not truncated. The truncation limits
are the truncation limits of the transformed distribution.
Below, for `:param log:` it reads like the opposite is true:
If a transformation is applied, the location and scale parameters
and the truncation limits are the location, scale and truncation limits
of the underlying normal distribution.
Same for the other distributions.
@@ -131,18 +165,20 @@
 {
 "metadata": {},
 "cell_type": "markdown",
-"source": "Prior distributions can also be defined on the parameter scale by using the types `parameterScaleUniform`, `parameterScaleNormal` or `parameterScaleLaplace`. In these cases, 1) the distribution parameter are interpreted on the transformed parameter scale, and 2) a sample from the given distribution is used directly, without applying any transformation according to `parameterScale` (this implies, that for `parameterScale=lin`, there is no difference between `parameterScaleUniform` and `uniform`):",
+"source": "Prior distributions can also be defined on the scaled parameters (i.e., transformed according to `parameterScale`) by using the types `parameterScaleUniform`, `parameterScaleNormal` or `parameterScaleLaplace`. In these cases, the distribution parameter are interpreted on the transformed parameter scale (but not the parameter bounds, see below). This implies, that for `parameterScale=lin`, there is no difference between `parameterScaleUniform` and `uniform`.",
-"source": "Prior distributions can also be defined on the scaled parameters (i.e., transformed according to `parameterScale`) by using the types `parameterScaleUniform`, `parameterScaleNormal` or `parameterScaleLaplace`. In these cases, the distribution parameter are interpreted on the transformed parameter scale (but not the parameter bounds, see below). This implies, that for `parameterScale=lin`, there is no difference between `parameterScaleUniform` and `uniform`.",
+"source": "Prior distributions can also be defined on the scaled parameters (i.e., transformed according to `parameterScale`) by using the types `parameterScaleUniform`, `parameterScaleNormal` or `parameterScaleLaplace`. In these cases, the distribution parameters are interpreted on the transformed parameter scale (but not the parameter bounds, see below). This implies, that for `parameterScale=lin`, there is no difference between `parameterScaleUniform` and `uniform`.",
@@ -94,7 +128,7 @@
 {
 "metadata": {},
 "cell_type": "markdown",
-"source": "If a parameter scale is specified (`parameterScale=lin|log|log10` not a `parameterScale*`-type distribution), the sample is transformed accordingly (but not the distribution parameters):\n",
+"source": "If a parameter scale is specified (`parameterScale=lin|log|log10`) and the chosen distribution is not a `parameterScale*`-type distribution, then the distribution parameters are taken as is, i.e., the `parameterScale` is not applied to the distribution parameters. In the context of PEtab prior distributions, `parameterScale` will only be used for the start point sampling for optimization, where the sample will be transformed accordingly. This is demonstrated below. The left plot always shows the prior distribution for unscaled parameter values, and the right plot shows the prior distribution for scaled parameter values. Note that in the objective function, the prior is always on the unscaled parameters.\n",
-"source": "If a parameter scale is specified (`parameterScale=lin|log|log10`) and the chosen distribution is not a `parameterScale*`-type distribution, then the distribution parameters are taken as is, i.e., the `parameterScale` is not applied to the distribution parameters. In the context of PEtab prior distributions, `parameterScale` will only be used for the start point sampling for optimization, where the sample will be transformed accordingly. This is demonstrated below. The left plot always shows the prior distribution for unscaled parameter values, and the right plot shows the prior distribution for scaled parameter values. Note that in the objective function, the prior is always on the unscaled parameters.\n",
+"source": "If a parameter scale is specified (`parameterScale=lin|log|log10`), the distribution parameters are used as is without applying the `parameterScale` to them. The exception are the `parameterScale*`-type distributions, as explained below. In the context of PEtab prior distributions, `parameterScale` will only be used for the start point sampling for optimization, where the sample will be transformed accordingly. This is demonstrated below. The left plot always shows the prior distribution for unscaled parameter values, and the right plot shows the prior distribution for scaled parameter values. Note that in the objective function, the prior is always on the unscaled parameters.\n",
Just a suggestion to make it easier to follow, I also found this easier to understand once I understood that `parameterScale*` is explained afterwards.
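To make the distinction in the notebook text concrete, here is a simplified sketch for uniform priors. `sample_startpoint` and `SCALE` are hypothetical names; bounds and truncation are deliberately left out, and this is not the library's sampling code.

```python
import math
import random

# Transformations from the unscaled parameter to the parameter scale.
SCALE = {"lin": lambda x: x, "log": math.log, "log10": math.log10}


def sample_startpoint(prior_type, a, b, parameter_scale, rng):
    """Draw one start point on the parameter scale for a uniform-type prior on [a, b]."""
    if prior_type == "parameterScaleUniform":
        # Distribution parameters refer to the scaled parameter:
        # the draw is used directly.
        return rng.uniform(a, b)
    if prior_type == "uniform":
        # Distribution parameters refer to the unscaled parameter:
        # the draw is transformed according to parameterScale.
        return SCALE[parameter_scale](rng.uniform(a, b))
    raise ValueError(f"unsupported prior type: {prior_type}")


# Same seed for both calls, so the underlying uniform draw is identical.
lin_plain = sample_startpoint("uniform", 1.0, 100.0, "lin", random.Random(3))
lin_scaled = sample_startpoint("parameterScaleUniform", 1.0, 100.0, "lin", random.Random(3))
log_plain = sample_startpoint("uniform", 1.0, 100.0, "log10", random.Random(3))
log_scaled = sample_startpoint("parameterScaleUniform", 0.0, 2.0, "log10", random.Random(3))
print(lin_plain, lin_scaled, log_plain, log_scaled)
```

With `parameterScale=lin` the two prior types give the same start point for the same random draw, matching the statement in the notebook text. With `log10`, both start points lie in [0, 2], but they are distributed differently within that interval.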
Currently, when sampled startpoints are outside the bounds, their value is set to the upper/lower bounds. This may put too much probability mass on the bounds.
With these changes, we properly sample from the respective truncated distributions.
Closes #330.
This also evaluates all priors on the model parameter scale (instead of `parameterScale` scale, see PEtab-dev/PEtab#402).
👀 https://petab--335.org.readthedocs.build/projects/libpetab-python/en/335/example/distributions.html