Implement specialized Hurdle distribution #7810

Merged
merged 1 commit into from
Jun 16, 2025

ricardoV94
Member

@ricardoV94 ricardoV94 commented Jun 4, 2025

It indirectly addresses the issue reported in pymc-devs/nutpie#163.

The new objects have a logp that handles the discrete + continuous process correctly, without requiring the arbitrary truncation of the latter at epsilon. This provides a cheaper and more stable logp / logcdf.
For discrete variables we keep using a truncation.
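
A minimal sketch of the idea, not the PyMC implementation: it compares a hurdle logp that treats the point mass at zero and the continuous part directly against one that truncates the continuous part at epsilon and renormalizes. The function names, the LogNormal choice for the continuous component, and the use of float64 machine epsilon as the truncation point are all assumptions made for illustration.

```python
import math

# Assumption: float64 machine epsilon stands in for the truncation point.
EPS = 2.0 ** -52


def lognormal_logpdf(x, mu, sigma):
    """Log-density of LogNormal(mu, sigma) at x > 0."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x) - math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


def lognormal_log_sf(x, mu, sigma):
    """log(1 - CDF(x)) of LogNormal(mu, sigma) at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2)))


def hurdle_logp_direct(x, psi, mu, sigma):
    """Point mass at 0 with probability 1 - psi, continuous density on x > 0."""
    if x < 0:
        return -math.inf
    if x == 0:
        return math.log1p(-psi)
    return math.log(psi) + lognormal_logpdf(x, mu, sigma)


def hurdle_logp_truncated(x, psi, mu, sigma, eps=EPS):
    """Same model, but the continuous part is truncated at eps and renormalized."""
    if x == 0:
        return math.log1p(-psi)
    if x < eps:
        return -math.inf
    # The extra survival-function term is the cost of the truncation.
    return math.log(psi) + lognormal_logpdf(x, mu, sigma) - lognormal_log_sf(eps, mu, sigma)


direct = hurdle_logp_direct(2.0, psi=0.7, mu=0.0, sigma=1.0)
truncated = hurdle_logp_truncated(2.0, psi=0.7, mu=0.0, sigma=1.0)
```

For ordinary positive values the two agree to floating-point precision, but the truncated version needs an extra logcdf-style term on every evaluation and assigns `-inf` to positive values below `eps`, which the direct version does not.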

Also added special logic to truncate a Hurdle distribution, which would solve bambinos/bambi#768. Update: this is not the desired behavior, so I reverted it.

CC @zwelitunyiswa


📚 Documentation preview 📚: https://pymc--7810.org.readthedocs.build/en/7810/

@ricardoV94 ricardoV94 requested a review from tomicapretto June 4, 2025 12:16
@ricardoV94 ricardoV94 force-pushed the hurdle_mixtures branch 2 times, most recently from 77ed668 to 9c65ca1 Compare June 4, 2025 14:02
@ricardoV94 ricardoV94 changed the title Implement specialized Hurdle distribution Implement specialized Hurdle distribution and allow truncating it Jun 4, 2025
@ricardoV94 ricardoV94 force-pushed the hurdle_mixtures branch 3 times, most recently from 3d5772c to bdb3f12 Compare June 4, 2025 14:15
@ricardoV94 ricardoV94 requested a review from lucianopaz June 4, 2025 14:15
@ricardoV94 ricardoV94 force-pushed the hurdle_mixtures branch 2 times, most recently from ac73b55 to 9a65487 Compare June 4, 2025 14:52
@pymc-devs pymc-devs deleted a comment from review-notebook-app bot Jun 4, 2025
dist
for dist in dists
if (
getattr(dist, "rv_type", None) is not None
Member Author

This was too restrictive: a subclass also inherits the dispatch function and need not be in the registry explicitly.
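
To illustrate the point with the standard library only, here is a small sketch using `functools.singledispatch`. The class names and the `support_point` function are hypothetical stand-ins, not PyMC objects; the sketch only shows that a subclass resolves to its parent's implementation even though it never appears in the registry.

```python
from functools import singledispatch


class BaseRV:
    """Hypothetical base random-variable type."""


class HurdleRV(BaseRV):
    """Hypothetical subclass that registers nothing itself."""


@singledispatch
def support_point(op):
    raise NotImplementedError(f"No support point for {type(op).__name__}")


@support_point.register(BaseRV)
def _(op):
    return "base support point"


# The registry only lists types that were registered explicitly...
explicitly_registered = HurdleRV in support_point.registry

# ...but dispatch walks the MRO, so the subclass still finds an implementation.
inherited_result = support_point(HurdleRV())
```

A check of the form `type in registry` would therefore reject `HurdleRV`, while calling the dispatched function on it works fine.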

codecov bot commented Jun 4, 2025

Codecov Report

Attention: Patch coverage is 91.42857% with 6 lines in your changes missing coverage. Please review.

Project coverage is 92.88%. Comparing base (3b62f82) to head (1e38719).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
pymc/distributions/mixture.py 91.17% 6 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #7810   +/-   ##
=======================================
  Coverage   92.88%   92.88%           
=======================================
  Files         107      107           
  Lines       18377    18389   +12     
=======================================
+ Hits        17069    17081   +12     
  Misses       1308     1308           
Files with missing lines Coverage Δ
pymc/distributions/moments/means.py 100.00% <100.00%> (ø)
pymc/distributions/mixture.py 95.25% <91.17%> (+0.23%) ⬆️

@ricardoV94 ricardoV94 marked this pull request as draft June 4, 2025 15:59
@zwelitunyiswa

@ricardoV94 This is amazing. Thank you so much for this!

@ricardoV94 ricardoV94 changed the title Implement specialized Hurdle distribution and allow truncating it Implement specialized Hurdle distribution Jun 5, 2025
@ricardoV94 ricardoV94 marked this pull request as ready for review June 5, 2025 12:34
Contributor

@zaxtax zaxtax left a comment

LGTM! We should probably add a test similar to the one that motivated this PR in the first place.

@ricardoV94
Member Author

ricardoV94 commented Jun 5, 2025

We have the pre-existing hurdle tests; in a sense this is just a refactor/optimization. Can't think of anything reasonably obvious to test here?

@zaxtax
Contributor

zaxtax commented Jun 5, 2025

What caused the error originally reported in pymc-devs/nutpie#163? Does that have a test already?

@ricardoV94
Member Author

That was fixed some time ago in PyTensor: pymc-devs/pytensor#1137

The performance question when running under Numba is addressed by pymc-devs/pytensor#1445

Neither is PyMC-specific.

@zaxtax
Contributor

zaxtax commented Jun 6, 2025

Feel free to merge whenever you feel comfortable! I think it's good to go

Member

@lucianopaz lucianopaz left a comment

Looks good to me. I just left two very minor comments.

)

return mix_logp
mix_support_point = pt.sum(weights * support_point_components, axis=mix_axis)
Member

Why not use the logsumexp and have log scale weights here? Is it because the weights are already in the 0-1 range and taking the log won’t help with precision?

Member Author

We're not computing any log quantities nor starting with any log quantities so I don't think it would help. Also the initial point is not so critical?
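
For reference, a small standalone sketch of the two routes discussed in this thread, using plain Python rather than PyTensor. The weights and component support points are made-up values. It computes the weighted sum directly and then again through a logsumexp over log-scale terms; note that the log route only works here because the illustrative support points are strictly positive.

```python
import math

# Hypothetical mixture weights (already in the 0-1 range) and component support points.
weights = [0.2, 0.5, 0.3]
support_points = [1.0, 4.0, 10.0]

# Direct route: a plain weighted sum, as in the line under review.
linear = sum(w * s for w, s in zip(weights, support_points))

# Log route: move to log scale, apply logsumexp, and exponentiate back.
log_terms = [math.log(w) + math.log(s) for w, s in zip(weights, support_points)]
max_term = max(log_terms)
log_sum = max_term + math.log(sum(math.exp(t - max_term) for t in log_terms))
via_logsumexp = math.exp(log_sum)
```

Both routes give the same value up to rounding (about 5.2 here), so when neither the inputs nor the output live on the log scale the extra `log`/`exp` round trip adds work without an obvious precision benefit.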

This does not require the arbitrary truncation of continuous distribution in the logp/logcdf
@ricardoV94 ricardoV94 merged commit 0f1bfa9 into pymc-devs:main Jun 16, 2025
40 of 42 checks passed
5 participants