Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uniform integer/enum generation performance #363

Open
0xd34df00d opened this issue Dec 16, 2024 · 3 comments
Open

Uniform integer/enum generation performance #363

0xd34df00d opened this issue Dec 16, 2024 · 3 comments

Comments

@0xd34df00d
Copy link
Contributor

Is your feature request related to a problem? Please describe.
My model needs to generate 10⁸ uniform integers in a given range [0; max) (think max ~ 1000). The obvious way to use MonadDistribution API seems to be something like floor <$> uniform 0 max. This works in roughly 7.7 s.
I'm using SamplerST directly so I have the luxury to try out SamplerT $ ReaderT $ uniformRM (0, max - 1) instead, and this works in roughly 0.3 s — more than 20x improvement!
Moreover, in this case I don't even need to think whether I care about uniform ever returning max (which has probability 0 in ℝ but a non-zero, albeit small, probability in computer floats), which is an extra bonus.

Describe the solution you'd like
Having a randomInt (and/or randomIntegral, randomEnum and so on) in MonadDistribution API that has a default implementation via uniform but can be overriden (like in Sampler's case above) would perhaps be the most straightforward approach.

I'd be more than happy to hack on this and open a PR if it's something that aligns with your vision, and I'm also open to any suggestions on the approach!

Describe alternatives you've considered

  • Generalize the current uniform to operate on any UniformRange a instead of just Doubles. This requires no changes to the Sampler instance which already uses uniformRM under the hood, but I can't think off the top of my head how the default implementation (currently done via uniformDistr) would need to be changed for an arbitrary UniformRange.
  • Keep the status quo and delegate this to the users of the library.

Additional context
As a side note, using random directly (as in floor . (* fromIntegral max) <$> random) runs in about 0.75 s — about 10x faster than uniform, so I could also look at trying to improve that as well. Maybe it's just a few INLINEABLEs missing.

@reubenharry
Copy link
Contributor

This strikes me as a good addition (although I haven't thought about it carefully). Maybe @idontgetoutmuch or @turion would have opinions?

@turion
Copy link
Collaborator

turion commented Dec 17, 2024

Thanks for reporting this!

  • Generalize the current uniform to operate on any UniformRange a instead of just Doubles

In principle that would be a good approach. But for backwards compatibility it should be a new class method, maybe uniformR? The disadvantage is that we would lose the desirable property of MonadDistribution having only one required method (random), so this would still break for implementors of the class.

Having a randomInt (and/or randomIntegral, randomEnum and so on) in MonadDistribution API that has a default implementation via uniform but can be overriden

Yes, that would definitely be great!

Another possibility might be a new class:

class MonadDistribution m => MonadUniformRange m where
  uniformR :: UniformRange a => a -> a -> m a

Then this could be implemented for SamplerT, and lifted (because it's first order in the effect) for all the other transformers.

I'd welcome a PR for the second or the third variant!

@turion
Copy link
Collaborator

turion commented Dec 17, 2024

As a side note, using random directly (as in floor . (* fromIntegral max) <$> random) runs in about 0.75 s — about 10x faster than uniform, so I could also look at trying to improve that as well. Maybe it's just a few INLINEABLEs missing.

That sounds alarming and interesting 😅 if you can find the cause that would be very useful. How are you calling the final sampling effect? Are you using sampleIO or sampleSTFixed really only once, or maybe once per sample (which is expected to be slower)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants