-
-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sample
function to List module, add log
and exp
functions to Float module
#772
Conversation
I would say the only concern is that replacing a value in the reservoir takes O(n) since its random replacement of a linked list. I also have an implementation using a dict instead if that is preferred? |
For the dict version I do this: let reservoir =
reservoir
|> map2(range(0, k - 1), _, fn(a, b) { #(a, b) })
|> dict.from_list
let w = float.exp(log_random() /. int.to_float(k))
do_sample(list, reservoir, k, k, w)
|> dict.fold([], fn(acc, _, v) { [v, ..acc] }) which feels expensive but I also I think you'll pay more for the random replacement cost of |
I did a simple comparison for keeping reservoir as a I did 10_000 iterations of sampling 500 samples from a list of 1_000
I just used the builtin Linux |
Another change is converting the reservoir to a dict before checking length The most common case is likely to be taking less samples than the input length, so we can make that the "fast" path The current implementation checks the length of the reservoir before we convert it to a dict This is so that we don't pay the cost of turning the list to a dict when the sample size is larger than the input size But since this is the "slow" path, we can make that more expensive to cheapen the length check cost for the "fast" path |
I tested it with samples of length 500_000 and it made no difference lol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This will be super useful. I've left some small notes inline 🙏
src/gleam/list.gleam
Outdated
} | ||
|
||
fn log_random() -> Float { | ||
let assert Ok(random) = float.log(float.random() +. 2.220446049250313e-16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the significance of this number float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need the result from float.random to be non-zero for log
. I kind of forgot why that specific value, something related to float error. Should I use something like this value instead: rust f64::MIN_POSITIVE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched it to use the MIN_POSITIVE value
} | ||
} | ||
|
||
fn log_random() -> Float { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this function do? The name doesn't help me understand, so I'm not sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Algo L takes the log of a uniform random value but float.random is inclusive of 0. I need to add a small value to it as log(0) is undefined (a Result(Nil)
) to assert its not an error. I can inline it if you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use it in 3 places across sample
and sample_loop
so I thought it would be less noisy to extract it
I updated per your feedback @lpil. Let me know if there are any other changes you'd like! |
Would you like me to squash the commits? Any other changes you would like? @lpil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful work! Thank you very much!!
I implemented the sample function based on the discussion in issue #683
This PR contains an implementation of Algo L
I also tested Algo R but it was slower
Algo L requires
log
andexp
, hence their addition to the float modulet