Add `sample` function to List module, add `log` and `exp` functions to Float module #772

ethanthoma · 2024-12-18T05:29:55Z

I implemented the sample function based on the discussion in issue #683

This PR contains an implementation of Algo L

I also tested Algo R but it was slower

Algo L requires log and exp, hence their addition to the float module
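For readers who have not met Algo L, here is a minimal sketch of reservoir sampling in that style. It is written in Python rather than Gleam and is not the code from this PR. The names `sample_algo_l` and `_log_uniform`, the mutable list used as the reservoir, and the clamping constants are all assumptions made for illustration. The point is to show where log and exp enter: the skip length and the weight update.

```python
import math
import random
import sys
from itertools import islice

_MISSING = object()  # sentinel meaning "the input ran out"
_MAX_W = 1.0 - 2.0 ** -53  # largest float strictly below 1.0


def _log_uniform(rng):
    # rng.random() is in [0.0, 1.0), so 1.0 - rng.random() is in (0.0, 1.0]
    # and the logarithm is always defined.
    return math.log(1.0 - rng.random())


def sample_algo_l(items, k, rng=random):
    """Return up to k items chosen uniformly at random, in a single pass."""
    if k <= 0:
        return []
    it = iter(items)
    reservoir = list(islice(it, k))
    if len(reservoir) < k:
        return reservoir  # fewer than k items: keep them all

    # Keep w strictly below 1.0 so that log1p(-w) below is finite.
    w = min(math.exp(_log_uniform(rng) / k), _MAX_W)
    while True:
        # How many items to pass over before the next replacement.
        skip = math.floor(_log_uniform(rng) / math.log1p(-w))
        skip = min(skip, sys.maxsize - 1)  # islice rejects larger values
        candidate = next(islice(it, skip, skip + 1), _MISSING)
        if candidate is _MISSING:
            return reservoir
        # Overwrite a uniformly chosen slot, then shrink the weight.
        reservoir[rng.randrange(k)] = candidate
        w *= math.exp(_log_uniform(rng) / k)
```

A call such as `sample_algo_l(range(1_000), 10)` returns ten distinct values from the range. Because whole runs of items are skipped at once, the number of random draws grows with the number of replacements rather than with the input length.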


ethanthoma · 2024-12-18T08:11:39Z

I would say the only concern is that replacing a value in the reservoir takes O(n), since it's a random replacement in a linked list. I also have an implementation using a dict instead, if that is preferred?

ethanthoma · 2024-12-18T08:13:34Z

For the dict version I do this:

    let reservoir =
      reservoir
      |> map2(range(0, k - 1), _, fn(a, b) { #(a, b) })
      |> dict.from_list

    let w = float.exp(log_random() /. int.to_float(k))

    do_sample(list, reservoir, k, k, w)
    |> dict.fold([], fn(acc, _, v) { [v, ..acc] })

which feels expensive, but I think you'd pay more for the random replacement cost of a List
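The trade-off described here can be sketched in Python, with loud caveats: a Python tuple stands in for Gleam's immutable linked list, and a Python dict is a mutable hash table rather than Gleam's persistent dict, so only the shape of the cost carries over. The function names are hypothetical.

```python
def index_reservoir(first_k):
    """Key each reservoir item by its slot number: {0: a, 1: b, ...}."""
    return dict(enumerate(first_k))


def replace_in_tuple(reservoir, slot, value):
    # Immutable sequence: a new tuple is built and every element is copied,
    # so each replacement costs time proportional to the reservoir size.
    return reservoir[:slot] + (value,) + reservoir[slot + 1:]


def replace_in_dict(reservoir, slot, value):
    # Keyed update: only the entry for `slot` is touched.
    reservoir[slot] = value
    return reservoir
```

Indexing the first k items costs one pass up front and reading the values back costs another pass at the end, but every replacement in between avoids walking the reservoir. For example, `replace_in_dict(index_reservoir(["a", "b", "c"]), 1, "z")` followed by `list(...values())` yields `["a", "z", "c"]`.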

ethanthoma · 2024-12-18T08:20:07Z

I did a simple comparison for keeping reservoir as a List or changing it to a dict

I did 10_000 iterations of sampling 500 samples from a list of 1_000

List: 1:48 minutes
Dict: 0:08 minutes

I just used the built-in Linux `time`, which I know is not benchmark-precise, but I think the difference is large enough for that not to matter
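If a finer-grained measurement than the shell's `time` were wanted, an in-process timer is one option. The following Python sketch is an assumption about how such a harness could look, not what was used for the numbers above. `time_sampler` is a hypothetical helper.

```python
import random
import time


def time_sampler(sampler, population_size, k, iterations):
    """Return the seconds taken to call sampler(population, k) `iterations` times."""
    population = list(range(population_size))  # built once, outside the timed loop
    start = time.perf_counter()
    for _ in range(iterations):
        sampler(population, k)
    return time.perf_counter() - start
```

A run shaped like the comparison above would be `time_sampler(random.sample, 1_000, 500, 10_000)`, where `random.sample` is only a stand-in for the implementation under test. Timing inside the process excludes start-up cost, which matters little for a gap as large as the one reported.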

ethanthoma · 2024-12-18T08:30:01Z

Another possible change is converting the reservoir to a dict before checking its length

The most common case is likely to be taking fewer samples than the input length, so we can make that the "fast" path

The current implementation checks the length of the reservoir before we convert it to a dict

This is so that we don't pay the cost of turning the list to a dict when the sample size is larger than the input size

But since this is the "slow" path, we can make that more expensive to cheapen the length check cost for the "fast" path

ethanthoma · 2024-12-18T08:47:37Z

I tested it with samples of length 500_000 and it made no difference lol

lpil

Thank you! This will be super useful. I've left some small notes inline 🙏

CHANGELOG.md

src/gleam/float.gleam

lpil · 2024-12-20T14:50:12Z

src/gleam/list.gleam

+}
+
+fn log_random() -> Float {
+  let assert Ok(random) = float.log(float.random() +. 2.220446049250313e-16)


What's the significance of this float?

ethanthoma · 2024-12-20

I need the result from float.random to be non-zero for log. I kind of forgot why I picked that specific value, something related to float error. Should I use something like Rust's f64::MIN_POSITIVE instead?

ethanthoma · 2024-12-20

I switched it to use the MIN_POSITIVE value
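For context on the two constants, the literal in the diff equals the double-precision machine epsilon (2^-52), and Rust's f64::MIN_POSITIVE is the smallest positive normal double. The Python sketch below compares them, assuming Python floats are IEEE 754 doubles, which holds on common platforms. `shifted_log` and `changes_value` are hypothetical helpers.

```python
import math
import sys

# 2.220446049250313e-16: the gap between 1.0 and the next float (the diff's literal).
EPSILON = sys.float_info.epsilon
# 2.2250738585072014e-308: smallest positive normal float, same value as f64::MIN_POSITIVE.
MIN_POSITIVE = sys.float_info.min


def shifted_log(u, offset):
    """log(u + offset), the shape of the expression inside log_random."""
    return math.log(u + offset)


def changes_value(u, offset):
    """True if adding offset alters u at double precision."""
    return u + offset != u
```

With a draw of exactly 0.0, `shifted_log(0.0, EPSILON)` is about -36.04 and `shifted_log(0.0, MIN_POSITIVE)` is about -708.40, so both avoid log(0). The difference is in every other draw: `changes_value(0.5, EPSILON)` is true while `changes_value(0.5, MIN_POSITIVE)` is false, so the smaller offset leaves typical draws untouched.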

src/gleam/list.gleam

lpil · 2024-12-20T14:53:30Z

src/gleam/list.gleam

+  }
+}
+
+fn log_random() -> Float {


What does this function do? The name doesn't help me understand, so I'm not sure!

ethanthoma · 2024-12-20

Algo L takes the log of a uniform random value, but float.random is inclusive of 0. log(0) is undefined (float.log returns an error for it), so I add a small value to the random number and can then assert that the result is not an error. I can inline it if you prefer?

ethanthoma · 2024-12-20

I use it in 3 places across sample and sample_loop, so I thought it would be less noisy to extract it
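The role of the helper can be illustrated in Python, where `math.log(0.0)` raises `ValueError` instead of returning an error value. This is a sketch of the idea only. The default offset and the `AlwaysZero` stub are assumptions for illustration, not part of the PR.

```python
import math
import random
import sys


def log_random(rng=random, offset=sys.float_info.min):
    """Log of a uniform draw, shifted so that a draw of exactly 0.0 is safe."""
    return math.log(rng.random() + offset)


class AlwaysZero:
    """Hypothetical stub generator that always draws 0.0, the problematic case."""

    def random(self):
        return 0.0
```

Calling `log_random(AlwaysZero())` returns a finite negative number, whereas the unshifted `math.log(0.0)` fails. Extracting the helper keeps that guard in one place for each call site that needs a log of a random draw.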

ethanthoma · 2024-12-20T18:55:50Z

I updated per your feedback @lpil. Let me know if there are any other changes you'd like!

ethanthoma · 2024-12-21T20:23:56Z

Would you like me to squash the commits? Any other changes you would like? @lpil

lpil

Beautiful work! Thank you very much!!

ethanthoma added 2 commits December 17, 2024 21:29

feat: Add sample function to List module, add log and exp functions to Float module

876ad6b

tidy: use split instead of take and drop

f791f61

feat: dict for reservoir

2ef7aa2

tidy: use dict.values for list.sample instead

55f7742

ethanthoma added 2 commits December 19, 2024 13:46

Merge branch 'main' into main

7768408

Update CHANGELOG.md

e528e75

lpil reviewed Dec 20, 2024


tidy: update sample function per feedback

79dc570

lpil approved these changes Dec 21, 2024


lpil merged commit 9d76bea into gleam-lang:main Dec 21, 2024
5 of 7 checks passed
