Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/previous next values #424

Merged
merged 24 commits into from
Jun 11, 2024

Conversation

bearrito
Copy link
Contributor

This will address - #380

There a few edge cases I'm not sure about. Namely ones where the the next or previous values would lie completely outside.

@lemire lemire requested review from Oppen and neena May 25, 2024 04:55
Copy link
Contributor

@Oppen Oppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good for the most part. I have a few suggestions and I'd like for the docs to be more clear about the meaning. I had to do some guess work to understand what the functionality achieves, ideally users shouldn't have to do that.

setutil.go Outdated Show resolved Hide resolved
setutil.go Outdated Show resolved Hide resolved
setutil.go Outdated Show resolved Hide resolved
setutil_test.go Outdated
}, 100, false, -1},
{"missing alternating", func() []uint16 {
return []uint16{0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 22, 24, 26}
}, 20, false, 9},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder for myself: find a test case for the bugs I pointed out before. The conditions are that:

  1. target is within bounds;
  2. target is not in array;
  3. array exhausts the loop and falls to the default return point.

setutil_test.go Outdated Show resolved Hide resolved
setutil_test.go Outdated Show resolved Hide resolved
arraycontainer.go Outdated Show resolved Hide resolved
bitmapcontainer.go Show resolved Hide resolved
bearrito and others added 4 commits May 31, 2024 14:05
Co-authored-by: Mario Rugiero <[email protected]>
Co-authored-by: Mario Rugiero <[email protected]>
Co-authored-by: Mario Rugiero <[email protected]>
@lemire
Copy link
Member

lemire commented May 31, 2024

running tests

@bearrito bearrito changed the title [DRAFT]: Feature/previous next values Feature/previous next values Jun 2, 2024
@Oppen
Copy link
Contributor

Oppen commented Jun 2, 2024

Running tests.

@bearrito
Copy link
Contributor Author

bearrito commented Jun 2, 2024

Looks like these passed. I still need to implement previousAbsentValue for array containers. That should be straightforward.

@lemire
Copy link
Member

lemire commented Jun 3, 2024

Running tests.

@bearrito
Copy link
Contributor Author

bearrito commented Jun 3, 2024

I'll look at the failures. They are on my end.

@Oppen
Copy link
Contributor

Oppen commented Jun 3, 2024

Running again.

@bearrito
Copy link
Contributor Author

bearrito commented Jun 5, 2024

Tests pass. I don't think I can think of how to apply any more test pressure.

I did have a question about the overall design. I more or less referred to the Java implementation. I had a question about what was done there and how users use this feature. Why do we not for a given container, especially for absent value return values that are in the keyspace of a given key.

Example: For a run container with HOB X, we can calculate the range of of values that could be under X, call that the image of X. Today if the query/target is less than min(Union(ranges)) we return -1 meaning out of bound. Why would we not min(Union(ranges))-1 if that value was in the image of X.

What are the users query patterns that make that not useful?

@lemire
Copy link
Member

lemire commented Jun 5, 2024

For a run container with HOB X, we can calculate the range of of values that could be under X, call that the image of X. Today if the query/target is less than min(Union(ranges)) we return -1 meaning out of bound. Why would we not min(Union(ranges))-1 if that value was in the image of X.

Can you elaborate?

What code do you propose changing?

@bearrito bearrito closed this Jun 7, 2024
@lemire lemire reopened this Jun 7, 2024
@lemire
Copy link
Member

lemire commented Jun 7, 2024

I have reopened the issue as I think it was closed by mistake.

Let me go back to the Java functionality...

  /**
   * Returns the first value equal to or larger than the provided value
   * (interpreted as an unsigned integer). If no such
   * bit exists then {@code -1} is returned. It is not necessarily a
   * computationally effective way to iterate through the values.
   *
   * @param  fromValue the lower bound (inclusive)
   * @return the smallest value larger than or equal to the specified value,
   *       or {@code -1} if there is no such value
   */
  long nextValue(int fromValue);
  
  /**
   * Returns the first value less than or equal to the provided value
   * (interpreted as an unsigned integer). If no such
   * bit exists then {@code -1} is returned. It is not an efficient
   * way to iterate through the values backwards.
   *
   * @param  fromValue the upper bound (inclusive)
   * @return the largest value less than or equal to the specified value,
   *       or {@code -1} if there is no such value
   */
  long previousValue(int fromValue);

The current PR only addresses the underlying containers but does not yet exposes the functionality to the users.

In Java, we return a -1 (long) when no value is present.

@bearrito
Copy link
Contributor Author

bearrito commented Jun 7, 2024

@lemire yeah I closed it on purpose, so it would not accidentally be merged.

I noticed that as well. I'm implementing that functionality.

I had a few other edge-case bugs. What I did for those was to port all the java tests over and confirm that we match the java tests.

@lemire
Copy link
Member

lemire commented Jun 7, 2024

@bearrito Your work prompted me to do some work in Javaland:

RoaringBitmap/RoaringBitmap#720

... as it was ill-defined in some edge cases.

@bearrito
Copy link
Contributor Author

bearrito commented Jun 8, 2024

@lemire

As mentioned I ported the Java tests line for line over for runs and arrays. I didn't really understand the Java fluent data builder used by bitmaps was doing so I couldn't port for bitmaps. I'm happy with the port, that's a good safety net. I picked that last release and included doc links to the corresponding java tests.

roaring.go Outdated Show resolved Hide resolved
roaring.go Outdated Show resolved Hide resolved
roaring.go Outdated Show resolved Hide resolved
roaring.go Outdated Show resolved Hide resolved
roaring.go Outdated Show resolved Hide resolved
@bearrito
Copy link
Contributor Author

@lemire
My fuzz tests are failing. I think I have an overflow issue. But they are passing on my machine

--- PASS: TestNextAndPreviousValue (0.05s)
    --- PASS: TestNextAndPreviousValue/randomized (0.05s)

Is this an architecture issue? I'll look at it more.

roaring.go Outdated
container := rb.highlowcontainer.getContainer(containerKey)
// if containerKey > orginalKey then we are past the container which mapped to the orignal key
// in that case we can just return the minimum from that container
var responseBit int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var responseBit int
var responseBit int64

roaring.go Outdated
if err == nil {
responseBit = -1
}
responseBit = int(bit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
responseBit = int(bit)
responseBit = int64(bit)

roaring.go Outdated
if responseBit == -1 {
nextValue = -1
} else {
nextValue = int(combineLoHi32(uint32(responseBit), uint32(containerKey)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nextValue = int(combineLoHi32(uint32(responseBit), uint32(containerKey)))
nextValue = int64(combineLoHi32(uint32(responseBit), uint32(containerKey)))

@lemire
Copy link
Member

lemire commented Jun 11, 2024

@bearrito I have marked some suggested changes. The problem is that there is a hidden assumption that int spans 64 bits, but it does not. It is either 32 bits or 64 bits. So you should only use int when it is ok if int == int32.

Generally, it is safer to use int64 than int.

@lemire
Copy link
Member

lemire commented Jun 11, 2024

Is this an architecture issue? I'll look at it more.

Yes. It is. On a 32-bit platform they would fail.

Try setting GOARCH to 386 or other 32-bit system.

@bearrito
Copy link
Contributor Author

Thanks. I'm generally poor at low-level bit-thinking/machine sympathy (partly why I wanted to contribute to this project)
I'll correct it.

@lemire
Copy link
Member

lemire commented Jun 11, 2024

Thanks. I'm generally poor at low-level bit-thinking/machine sympathy (partly why I wanted to contribute to this project)
I'll correct it.

It is not at all trivial that int has a different size depending on the system. :-)

@bearrito
Copy link
Contributor Author

Updated. Tested locally by toggling GOARCH

@lemire
Copy link
Member

lemire commented Jun 11, 2024

Looks good to me. If you recommend merging, we will do so.

@bearrito
Copy link
Contributor Author

I'm satisfied, please merge.

@lemire lemire merged commit 612b5f6 into RoaringBitmap:master Jun 11, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants