Implement proper truncation for prior distributions #335

dweindl · 2024-12-07T14:23:36Z

Currently, when sampled startpoints are outside the bounds, their value is set to the upper/lower bounds. This may put too much probability mass on the bounds.

With these changes, we properly sample from the respective truncated distributions.

Closes #330.
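A minimal sketch of the difference, using only the Python standard library (`statistics.NormalDist`) rather than the petab classes; both function names are hypothetical. Clipping moves all out-of-bounds probability mass onto the bounds, whereas inverse-CDF sampling of the truncated distribution maps uniform draws onto `[F(lb), F(ub)]` before inverting the CDF:

```python
import random
from statistics import NormalDist


def sample_clipped(dist: NormalDist, lb: float, ub: float, n: int, seed: int) -> list:
    """Old behavior (hypothetical helper): sample the untruncated distribution, then clip."""
    return [min(max(x, lb), ub) for x in dist.samples(n, seed=seed)]


def sample_truncated(dist: NormalDist, lb: float, ub: float, n: int, seed: int) -> list:
    """New behavior (hypothetical helper): inverse-CDF sampling of the truncated distribution."""
    rng = random.Random(seed)
    cdf_lb, cdf_ub = dist.cdf(lb), dist.cdf(ub)
    # Map U(0, 1) onto [F(lb), F(ub)], then invert the CDF of the untruncated distribution.
    return [dist.inv_cdf(cdf_lb + rng.random() * (cdf_ub - cdf_lb)) for _ in range(n)]


dist, lb, ub = NormalDist(0.0, 1.0), -0.5, 2.0
clipped = sample_clipped(dist, lb, ub, 10_000, seed=1)
truncated = sample_truncated(dist, lb, ub, 10_000, seed=1)

# With clipping, roughly F(-0.5), i.e. about 31% of all samples, sit exactly on the lower bound.
print(sum(x == lb for x in clipped) / len(clipped))
# With truncation, the samples are spread over the interval instead.
print(sum(x == lb for x in truncated) / len(truncated))
```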

This also evaluates all priors on the model parameter scale (instead of the `parameterScale` scale; see PEtab-dev/PEtab#402).
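The scale on which a prior density is evaluated matters because densities transform with the Jacobian of the change of variables. A small standard-library sketch (not petab code; the bounds 1 and 100 are arbitrary) of a `uniform(1, 100)` prior re-expressed as a density over `y = log10(x)`:

```python
import math


def pdf_linear(x: float, lb: float = 1.0, ub: float = 100.0) -> float:
    """Density of uniform(lb, ub) on the linear (model) parameter scale."""
    return 1.0 / (ub - lb) if lb <= x <= ub else 0.0


def pdf_log10(y: float, lb: float = 1.0, ub: float = 100.0) -> float:
    """Density of the same prior over y = log10(x), via change of variables."""
    x = 10.0 ** y
    return pdf_linear(x, lb, ub) * x * math.log(10.0)  # |dx/dy| = x * ln(10)


def trapezoid(f, a: float, b: float, n: int = 10_000) -> float:
    """Composite trapezoidal rule for a quick numerical check."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


# Both densities integrate to 1 over their respective domains ...
print(trapezoid(pdf_linear, 1.0, 100.0), trapezoid(pdf_log10, 0.0, 2.0))
# ... but a prior that is flat in x is far from flat in log10(x).
print(pdf_log10(2.0) / pdf_log10(0.0))
```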

👀 https://petab--335.org.readthedocs.build/projects/libpetab-python/en/335/example/distributions.html

codecov-commenter · 2024-12-11T18:09:04Z

Codecov Report

Attention: Patch coverage is 86.50794% with 17 lines in your changes missing coverage. Please review.

Project coverage is 74.25%. Comparing base (6a433e0) to head (6f005b8).

Files with missing lines	Patch %	Lines
petab/v1/distributions.py	85.05%	11 Missing and 2 partials ⚠️
petab/v1/priors.py	89.47%	3 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #335      +/-   ##
===========================================
- Coverage    74.66%   74.25%   -0.42%     
===========================================
  Files           56       56              
  Lines         5573     5647      +74     
  Branches       976      990      +14     
===========================================
+ Hits          4161     4193      +32     
- Misses        1040     1084      +44     
+ Partials       372      370       -2

dweindl · 2024-12-11T18:49:04Z

petab/v1/priors.py

+    :param bounds_truncate: Whether the generated prior will be truncated
+        at the bounds.
+        If ``True``, the probability density will be rescaled
+        accordingly and the sample is generated from the truncated
+        distribution.
+        If ``False``, the probability density will not account for the
+        bounds, but any parameter samples outside the bounds will be set to
+        the value of the closest bound. In this case, the PDF might not match
+        the sample.


True: new behavior
False: old behavior
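To illustrate what `True` and `False` mean for the density, a hypothetical helper built on `statistics.NormalDist` (not the petab implementation): with truncation, the density is divided by the probability mass inside the bounds; without it, the untruncated density is used, and after clipping the missing mass ends up as point masses on the bounds:

```python
from statistics import NormalDist


def prior_pdf(dist: NormalDist, x: float, lb: float, ub: float, bounds_truncate: bool) -> float:
    """Hypothetical prior density mirroring the two ``bounds_truncate`` options."""
    if not bounds_truncate:
        # Old behavior: the density ignores the bounds.
        return dist.pdf(x)
    if not lb <= x <= ub:
        return 0.0
    # New behavior: rescale by the probability mass inside [lb, ub].
    return dist.pdf(x) / (dist.cdf(ub) - dist.cdf(lb))


def trapezoid(f, a: float, b: float, n: int = 10_000) -> float:
    """Composite trapezoidal rule."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


dist, lb, ub = NormalDist(0.0, 1.0), -0.5, 2.0
mass_truncated = trapezoid(lambda x: prior_pdf(dist, x, lb, ub, True), lb, ub)
mass_untruncated = trapezoid(lambda x: prior_pdf(dist, x, lb, ub, False), lb, ub)
print(round(mass_truncated, 4))    # 1.0
print(round(mass_untruncated, 4))  # 0.6687 = F(2) - F(-0.5); the rest is clipped onto the bounds
```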

PEtab specs are ambiguous there (https://github.com/PEtab-dev/PEtab/blob/b9e141dd75798d179c17262f085ed6cef8555b3e/doc/v1/documentation_data_format.rst?plain=1#L527-L529):

Sampled points are clipped to lie inside the parameter boundaries specified by lowerBound and upperBound.

While I think the new behavior is more correct, I will wait a while longer before merging this.

I agree, but I would also be in favor of removing the old behavior entirely. Or "fix" it by resampling out-of-bounds samples.
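Resampling out-of-bounds values (rejection sampling) would indeed yield samples from the truncated distribution, at the cost of an expected `1 / (F(ub) - F(lb))` draws per accepted sample. A hypothetical standard-library sketch, not the approach taken in this PR:

```python
import random
from statistics import NormalDist


def sample_by_rejection(dist: NormalDist, lb: float, ub: float, n: int, seed: int,
                        max_draws: int = 1_000_000) -> list:
    """Draw from ``dist`` and discard out-of-bounds samples until ``n`` are accepted."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_draws):
        x = rng.gauss(dist.mean, dist.stdev)
        if lb <= x <= ub:
            accepted.append(x)
            if len(accepted) == n:
                return accepted
    # Bounds far in the tails carry little mass and may need too many draws.
    raise RuntimeError("too many rejected samples")


dist, lb, ub = NormalDist(0.0, 1.0), -0.5, 2.0
samples = sample_by_rejection(dist, lb, ub, 5_000, seed=0)
# Analytic mean of the truncated *standard* normal used here, for comparison.
truncated_mean = (dist.pdf(lb) - dist.pdf(ub)) / (dist.cdf(ub) - dist.cdf(lb))
print(sum(samples) / len(samples), truncated_mean)
```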

Agreed that we should get rid of that. Happy to remove this option completely.

I will wait for some feedback to PEtab-dev/PEtab#591 before proceeding.

dilpath · 2024-12-11T19:09:33Z

doc/example/distributions.ipynb

Move this to some v1 subfolder? Now or later is fine, but I think priors will change a lot in v2.

I was thinking about moving it to https://github.com/PEtab-dev/PEtab/ at some point. It might also be helpful for non-python petab users.

Sounds good!

dilpath · 2024-12-11T19:17:47Z

doc/example/distributions.ipynb

@@ -151,15 +156,18 @@
  {
   "metadata": {},
   "cell_type": "markdown",
-   "source": "To prevent the sampled parameters from exceeding the bounds, the sampled parameters are clipped to the bounds. The bounds are defined in the parameter table. Note that the current implementation does not support sampling from a truncated distribution. Instead, the samples are clipped to the bounds. This may introduce unwanted bias, and thus, should only be used with caution (i.e., the bounds should be chosen wide enough):",
+   "source": "The given distributions are truncated at the bounds defined in the parameter table:",


Add something like: "This results in a constant rescaling of the probability density (equivalently, a constant shift of the log-density) compared to the non-truncated version (https://en.wikipedia.org/wiki/Truncated_distribution), such that the probability density still integrates to 1."
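For reference, the standard definition behind this suggestion, with f and F the PDF and CDF of the untruncated distribution and [a, b] the bounds: the density is divided by the constant F(b) - F(a), which amounts to a constant additive shift of the log-density inside the bounds.

```latex
f_{[a,b]}(x) = \frac{f(x)}{F(b) - F(a)}\,\mathbb{1}_{[a,b]}(x),
\qquad
\log f_{[a,b]}(x) = \log f(x) - \log\bigl(F(b) - F(a)\bigr) \quad (a \le x \le b),
\qquad
\int_a^b f_{[a,b]}(x)\,\mathrm{d}x = \frac{F(b) - F(a)}{F(b) - F(a)} = 1.
```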

dilpath · 2024-12-11T19:24:28Z

petab/v1/distributions.py

-    def _undo_log(self, x: np.ndarray | float) -> np.ndarray | float:
-        """Undo the log transformation.
+    def _exp(self, x: np.ndarray | float) -> np.ndarray | float:
+        """Exponentiate / undo the log transformation according.


`_undo_log` made sense to me, since the point is to take the inverse of the log, but it's fine to change it, too.

Suggested change

"""Exponentiate / undo the log transformation according.

"""Exponentiate / undo the log transformation if applicable.

I found it too complicated, as exp is well understood, I think.
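For illustration, what such a helper might look like if the log transformation is given either as a flag (natural log) or as a base. The `log` convention and the name `exp_transform` are assumptions here, not petab's signature:

```python
import math
from typing import Union


def exp_transform(x: float, log: Union[bool, float]) -> float:
    """Undo a log transformation (hypothetical helper).

    ``log=False``: no transformation; ``log=True``: natural log;
    a float: logarithm with that base.
    """
    if log is False:
        return x
    if log is True:
        return math.exp(x)
    return log ** x


print(exp_transform(2.0, False), exp_transform(1.0, True), exp_transform(2.0, 10))
```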

dilpath · 2024-12-11T19:32:40Z

petab/v1/distributions.py

+        :param x: The value at which to evaluate the CDF.
+        :return: The value of the CDF at ``x``.
+        """
+        return self._cdf_transformed_untruncated(x) - self._cd_low


Hm, shouldn't the CDF "grow" faster when the PDF is truncated? e.g. for a normal distribution, the CDF reaches 1 at +infty. For a truncated normal distribution, the CDF reaches 1 in a finite interval... so is it enough to just subtract the lower bound CDF value? Could you add a test/sanity check that the CDF is 0 at the lower bound (trivially correct here), and 1 at the upper bound?

You are right, I missed the normalization.

Thanks, fixed.
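A sketch of the normalized truncated CDF together with the requested sanity check (CDF 0 at the lower bound, 1 at the upper bound). It uses `statistics.NormalDist` as the untruncated distribution; `_cd_low` mirrors the attribute in the diff above, while the class itself is hypothetical and not petab's implementation:

```python
from statistics import NormalDist


class TruncatedNormalSketch:
    """Normal distribution truncated to [lb, ub] (illustration only)."""

    def __init__(self, loc: float, scale: float, lb: float, ub: float):
        self._dist = NormalDist(loc, scale)
        self.lb, self.ub = lb, ub
        self._cd_low = self._dist.cdf(lb)
        # Probability mass of the untruncated distribution inside the bounds.
        self._mass = self._dist.cdf(ub) - self._cd_low

    def cdf(self, x: float) -> float:
        # Clamp x so that the CDF is 0 below lb and 1 above ub,
        # then normalize: (F(x) - F(lb)) / (F(ub) - F(lb)).
        x = min(max(x, self.lb), self.ub)
        return (self._dist.cdf(x) - self._cd_low) / self._mass

    def pdf(self, x: float) -> float:
        if not self.lb <= x <= self.ub:
            return 0.0
        return self._dist.pdf(x) / self._mass


d = TruncatedNormalSketch(0.0, 1.0, -0.5, 2.0)
assert d.cdf(d.lb) == 0.0
assert abs(d.cdf(d.ub) - 1.0) < 1e-12
assert d.cdf(-10.0) == 0.0 and d.cdf(10.0) == d.cdf(d.ub)
```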

dweindl · 2024-12-11T21:28:51Z

~~This does not yet address the issue of whether all priors are defined in the linear scale or the parameter scale PEtab-dev/PEtab#402.~~

m-philipps

Looks good to me, thank you! 👍

m-philipps · 2025-01-12T21:56:25Z

petab/v1/distributions.py

+    def _cdf_untransformed_untruncated(self, x) -> np.ndarray | float:
+        """Cumulative distribution function of the underlying
+        (untransformed, untruncated) distribution at x.
+
+        :param x: The value at which to evaluate the CDF.
+        :return: The value of the CDF at ``x``.
+        """
+        raise NotImplementedError


Is there a reason to make this raise a `NotImplementedError`, while `_pdf_untransformed_untruncated` above is an abstract method?
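The practical difference between the two options, sketched with hypothetical classes (only the method names are taken from the diff): an `abc.abstractmethod` must be overridden before a subclass can be instantiated, whereas a method that raises `NotImplementedError` is optional and only fails when it is called:

```python
import abc


class DistributionSketch(abc.ABC):
    @abc.abstractmethod
    def _pdf_untransformed_untruncated(self, x: float) -> float:
        """Mandatory: subclasses without it cannot be instantiated."""

    def _cdf_untransformed_untruncated(self, x: float) -> float:
        """Optional: fails lazily, only when actually called."""
        raise NotImplementedError


class PdfOnly(DistributionSketch):
    def _pdf_untransformed_untruncated(self, x: float) -> float:
        return 1.0 if 0.0 <= x <= 1.0 else 0.0


class Incomplete(DistributionSketch):
    pass


dist = PdfOnly()  # fine, despite the missing CDF
try:
    dist._cdf_untransformed_untruncated(0.5)
except NotImplementedError:
    print("CDF not available")
try:
    Incomplete()
except TypeError as err:
    print("TypeError:", err)
```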

m-philipps · 2025-01-12T22:02:24Z

petab/v1/distributions.py

+        ``None`` if the distribution is not truncated. The truncation limits
+        are the truncation limits of the transformed distribution.


Below, for :param log: it reads like the opposite is true:

If a transformation is applied, the location and scale parameters and the truncation limits are the location, scale and truncation limits of the underlying normal distribution.

Same for the other distributions.
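Because the log transformation is monotonic, the two readings can be converted into each other: truncating `X = exp(Y)` to `[a, b]` (with `a > 0`) is the same as truncating the underlying normal `Y` to `[log(a), log(b)]`. A hypothetical sketch that takes the limits on the transformed (log-normal) scale and maps them to the underlying normal:

```python
import math
import random
from statistics import NormalDist


def sample_truncated_lognormal(mu: float, sigma: float, lb: float, ub: float,
                               n: int, seed: int) -> list:
    """Sample X = exp(Y), Y ~ N(mu, sigma), truncated to lb <= X <= ub (lb > 0).

    The limits refer to X; they are mapped to log(lb), log(ub) for Y.
    """
    underlying = NormalDist(mu, sigma)
    cdf_lo = underlying.cdf(math.log(lb))
    cdf_hi = underlying.cdf(math.log(ub))
    rng = random.Random(seed)
    return [math.exp(underlying.inv_cdf(cdf_lo + rng.random() * (cdf_hi - cdf_lo)))
            for _ in range(n)]


samples = sample_truncated_lognormal(0.0, 1.0, 0.5, 3.0, 1_000, seed=0)
print(min(samples), max(samples))
```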

m-philipps · 2025-01-13T17:29:59Z

doc/example/distributions.ipynb

@@ -131,18 +165,20 @@
  {
   "metadata": {},
   "cell_type": "markdown",
-   "source": "Prior distributions can also be defined on the parameter scale by using the types `parameterScaleUniform`, `parameterScaleNormal` or `parameterScaleLaplace`. In these cases, 1) the distribution parameter are interpreted on the transformed parameter scale, and 2) a sample from the given distribution is used directly, without applying any transformation according to `parameterScale` (this implies, that for `parameterScale=lin`, there is no difference between `parameterScaleUniform` and `uniform`):",
+   "source": "Prior distributions can also be defined on the scaled parameters (i.e., transformed according to `parameterScale`) by using the types `parameterScaleUniform`, `parameterScaleNormal` or `parameterScaleLaplace`. In these cases, the distribution parameter are interpreted on the transformed parameter scale (but not the parameter bounds, see below). This implies, that for `parameterScale=lin`, there is no difference between `parameterScaleUniform` and `uniform`.",


Suggested change

"source": "Prior distributions can also be defined on the scaled parameters (i.e., transformed according to `parameterScale`) by using the types `parameterScaleUniform`, `parameterScaleNormal` or `parameterScaleLaplace`. In these cases, the distribution parameter are interpreted on the transformed parameter scale (but not the parameter bounds, see below). This implies, that for `parameterScale=lin`, there is no difference between `parameterScaleUniform` and `uniform`.",

"source": "Prior distributions can also be defined on the scaled parameters (i.e., transformed according to `parameterScale`) by using the types `parameterScaleUniform`, `parameterScaleNormal` or `parameterScaleLaplace`. In these cases, the distribution parameters are interpreted on the transformed parameter scale (but not the parameter bounds, see below). This implies, that for `parameterScale=lin`, there is no difference between `parameterScaleUniform` and `uniform`.",

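A hypothetical sketch of how a start point on the scaled axis could be obtained for the two prior types; the function names and string values are illustrative, not petab API. For `parameterScaleUniform` the draw is already on the scaled axis, while for `uniform` it is drawn on the linear axis and then transformed, so the two coincide only for `lin`:

```python
import math
import random


def scale(x: float, parameter_scale: str) -> float:
    """Apply parameterScale to a linear-scale value."""
    if parameter_scale == "lin":
        return x
    if parameter_scale == "log":
        return math.log(x)
    if parameter_scale == "log10":
        return math.log10(x)
    raise ValueError(f"Unknown parameterScale: {parameter_scale}")


def sample_startpoint(prior_type: str, lo: float, hi: float,
                      parameter_scale: str, rng: random.Random) -> float:
    """Return a start point on the scaled axis (hypothetical helper)."""
    if prior_type == "parameterScaleUniform":
        # Distribution parameters are on the scaled axis; use the sample as is.
        return rng.uniform(lo, hi)
    if prior_type == "uniform":
        # Distribution parameters are on the linear axis; transform the sample.
        return scale(rng.uniform(lo, hi), parameter_scale)
    raise ValueError(f"Unsupported prior type: {prior_type}")


# For parameterScale=lin, both prior types give identical draws.
a = sample_startpoint("parameterScaleUniform", 1.0, 100.0, "lin", random.Random(0))
b = sample_startpoint("uniform", 1.0, 100.0, "lin", random.Random(0))
print(a == b)
```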
m-philipps · 2025-01-13T18:10:50Z

doc/example/distributions.ipynb

@@ -94,7 +128,7 @@
  {
   "metadata": {},
   "cell_type": "markdown",
-   "source": "If a parameter scale is specified (`parameterScale=lin|log|log10` not a `parameterScale*`-type distribution), the sample is transformed accordingly (but not the distribution parameters):\n",
+   "source": "If a parameter scale is specified (`parameterScale=lin|log|log10`) and the chosen distribution is not a `parameterScale*`-type distribution, then the distribution parameters are taken as is, i.e., the `parameterScale` is not applied to the distribution parameters. In the context of PEtab prior distributions, `parameterScale` will only be used for the start point sampling for optimization, where the sample will be transformed accordingly. This is demonstrated below. The left plot always shows the prior distribution for unscaled parameter values, and the right plot shows the prior distribution for scaled parameter values. Note that in the objective function, the prior is always on the unscaled parameters.\n",


Suggested change

"source": "If a parameter scale is specified (`parameterScale=lin|log|log10`) and the chosen distribution is not a `parameterScale*`-type distribution, then the distribution parameters are taken as is, i.e., the `parameterScale` is not applied to the distribution parameters. In the context of PEtab prior distributions, `parameterScale` will only be used for the start point sampling for optimization, where the sample will be transformed accordingly. This is demonstrated below. The left plot always shows the prior distribution for unscaled parameter values, and the right plot shows the prior distribution for scaled parameter values. Note that in the objective function, the prior is always on the unscaled parameters.\n",

"source": "If a parameter scale is specified (`parameterScale=lin|log|log10`), the distribution parameters are used as is without applying the `parameterScale` to them. The exception are the `parameterScale*`-type distributions, as explained below. In the context of PEtab prior distributions, `parameterScale` will only be used for the start point sampling for optimization, where the sample will be transformed accordingly. This is demonstrated below. The left plot always shows the prior distribution for unscaled parameter values, and the right plot shows the prior distribution for scaled parameter values. Note that in the objective function, the prior is always on the unscaled parameters.\n",

Just a suggestion to make it easier to follow; I also found this easier to understand once I realized that `parameterScale*` is explained afterwards.
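To make the last sentence of the suggestion concrete: if an optimizer works on `y = log10(x)`, the prior contribution to the objective is still evaluated at the unscaled value `x = 10**y`. A hypothetical sketch with a `normal(1, 0.5)` prior on the linear scale (names and numbers are illustrative):

```python
import math
from statistics import NormalDist


def unscale(y: float, parameter_scale: str) -> float:
    """Invert parameterScale: map a scaled value back to the linear scale."""
    if parameter_scale == "lin":
        return y
    if parameter_scale == "log":
        return math.exp(y)
    if parameter_scale == "log10":
        return 10.0 ** y
    raise ValueError(f"Unknown parameterScale: {parameter_scale}")


def neg_log_prior(y: float, prior: NormalDist, parameter_scale: str) -> float:
    """Negative log prior of a parameter given on the scaled axis (hypothetical)."""
    # The prior is defined on, and evaluated at, the unscaled parameter value.
    return -math.log(prior.pdf(unscale(y, parameter_scale)))


prior = NormalDist(1.0, 0.5)
# y = 0 on the log10 axis corresponds to x = 1, the prior mode.
print(neg_log_prior(0.0, prior, "log10"), -math.log(prior.pdf(1.0)))
```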

dweindl self-assigned this Dec 7, 2024

dweindl force-pushed the 330_truncated branch 3 times, most recently from 90946f3 to f4b5153 Compare December 11, 2024 15:53

dweindl force-pushed the 330_truncated branch from f4b5153 to 2fd6da8 Compare December 11, 2024 16:21

optional

65ef80f

dweindl force-pushed the 330_truncated branch from e716487 to 65ef80f Compare December 11, 2024 18:03

dweindl marked this pull request as ready for review December 11, 2024 18:42

dweindl requested review from m-philipps and a team as code owners December 11, 2024 18:42

dweindl commented Dec 11, 2024

dilpath reviewed Dec 11, 2024

dweindl and others added 3 commits December 11, 2024 21:40

fix cdf normalization

c01f2fb

review

d3b4e7f

Merge branch 'develop' into 330_truncated

057457f

dweindl linked an issue Dec 11, 2024 that may be closed by this pull request

Proper handling of parameter bounds in prior / startpoint sampling #330

Open

dweindl and others added 5 commits December 11, 2024 23:54

fix cdf/pdf outside bounds / <0

1425d9c

Always sample correctly, but optionally use unscaled pdf for neglogprior

155853f

prior always on linear

2484a7f

Fix Prior.from_par_dict for missing priorParameters columns (PEtab-de…

a17aa62

…v#341) Previously, missing `*PriorParameters` would have resulted in a KeyError.

Merge branch 'develop' into 330_truncated

6f005b8

m-philipps approved these changes Jan 13, 2025
