Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Implement insert_many #111

Closed
wants to merge 3 commits into from

Conversation

Jujumba
Copy link
Contributor

@Jujumba Jujumba commented Jul 17, 2024

@Jujumba Jujumba changed the title Initial implementation of insert_many Implement insert_many Jul 17, 2024
@pczarn
Copy link
Contributor

pczarn commented Jul 17, 2024

Thanks. I may check within two days

@Jujumba Jujumba changed the title Implement insert_many WIP: Implement insert_many Jul 17, 2024
@pczarn
Copy link
Contributor

pczarn commented Jul 17, 2024

@Jujumba by the way, fn insert got released with 0.8

@Jujumba Jujumba force-pushed the insert_many_impl branch from 3cb1937 to 149417c Compare July 17, 2024 20:35
@pczarn
Copy link
Contributor

pczarn commented Jul 19, 2024

Seems to allow up to 32 bits inserted, and we'd like to insert any number

@Jujumba
Copy link
Contributor Author

Jujumba commented Jul 19, 2024

Seems to allow up to 32 bits inserted, and we'd like to insert any number

Oh, you are right, missed this edge case

@pczarn
Copy link
Contributor

pczarn commented Jul 20, 2024

@Jujumba I would prefer the algorithm from ferrilabs that I explained in one of the comments, though if you make it half as fast like here then I can merge as a first proof of concept because the function is not so likely to be used a lot and can be improved later.

@Jujumba
Copy link
Contributor Author

Jujumba commented Jul 20, 2024

I've been thinking about it for a while, taking into account all edge-cases the problem becomes genuinely hard

I'll try to come up with something reasonable soon

@Jujumba
Copy link
Contributor Author

Jujumba commented Jul 20, 2024

Perhaps we could rotate block several times (if k elements should be inserted, and k > B::bits()), with each rotation <= B::bits()

This way it's possible to insert any number of bits crossing any number of blocks

@pczarn
Copy link
Contributor

pczarn commented Jul 20, 2024

Yes, though it's basically the same as a copy

overlapping / non-overlapping source and destination - these are important cases

@pczarn
Copy link
Contributor

pczarn commented Jul 20, 2024

The algorithm based on ferrilab's I described in the issue would be best, I believe

@pczarn
Copy link
Contributor

pczarn commented Jul 20, 2024

Load a block N, load block N+1, shift / rotate, store block, it's as simple as that

@Jujumba
Copy link
Contributor Author

Jujumba commented Jul 21, 2024

Load a block N, load block N+1, shift / rotate, store block, it's as simple as that

Then I think the problem comes with next blocks, as now we have a carry which spans two blocks, so we must consider two blocks at a time when shifting

(Or I am wrong in the first place and just don't see the efficient algorithm)

But I think I have worked this out, though I algorithm I come up with a large space for improvements

Will push it soon

@pczarn

@Jujumba Jujumba marked this pull request as draft July 21, 2024 19:31
@Jujumba Jujumba force-pushed the insert_many_impl branch 2 times, most recently from d928fa8 to c143a52 Compare July 21, 2024 19:34
@pczarn
Copy link
Contributor

pczarn commented Jul 21, 2024

Then I think the problem comes with next blocks, as now we have a carry which spans two blocks, so we must consider two blocks at a time when shifting

Nope, copying block X to block X+Y means you do Y*32 bit shift for free

@Jujumba
Copy link
Contributor Author

Jujumba commented Jul 21, 2024

Then I think the problem comes with next blocks, as now we have a carry which spans two blocks, so we must consider two blocks at a time when shifting

Nope, copying block X to block X+Y means you do Y*32 bit shift for free

Hmmmm... But if we are inserted, lets say 65 bits at the middle of a block, then it's not so trivial, isn't it?

And also, could you watch at a current implementation?

@pczarn

@Jujumba Jujumba force-pushed the insert_many_impl branch from c143a52 to c30cf8a Compare July 21, 2024 21:22
@pczarn
Copy link
Contributor

pczarn commented Jul 22, 2024

The time complexity is exponential: O(n^2). I absolutely can't accept this.

@pczarn
Copy link
Contributor

pczarn commented Jul 22, 2024

then it's not so trivial

And yet it's the only right way.

@pczarn
Copy link
Contributor

pczarn commented Jul 22, 2024

Your code will do close to 100,000 shift operations when inserting 1,000 elements before 1,000 blocks.

@Jujumba
Copy link
Contributor Author

Jujumba commented Jul 22, 2024

then it's not so trivial

And yet it's the only right way.

Then could you provide some clues in code? I don't get your written explanation of algorithm such as "assign state to the first block" 😔

@pczarn

@pczarn
Copy link
Contributor

pczarn commented Jul 22, 2024

Ok, I'll be back within a few days. If you feel like it, you may start any of the other basic issues.

I appreciate your questions and during our discussions I came up with the idea of using of SmallVec in BitVec.

@Jujumba
Copy link
Contributor Author

Jujumba commented Jul 23, 2024

Okay, so I have come up with a better algorithm, which does only one rotation :)

@pczarn


for i in ((block_at + 1)..=(block_at + block_offset)).rev() {
let block = core::mem::replace(&mut self.storage[i], B::zero());
self.storage[i + block_offset] = block;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very close to the algorithm we've been looking for. You may try to do this copy + the rotation in one go in one loop over the blocks.

Copy link
Contributor Author

@Jujumba Jujumba Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean something like this?

let offset = bits.len() % B::bits();
for i in ((block_at + 1)..=(block_at + block_offset)).rev() {
    let block = core::mem::replace(&mut self.storage[i], B::zero());
    let carry = block >> (B::bits() - offset);
    self.storage[i + block_offset] = block >> offset;
    self.storage[i + block_offset + 1] = self.storage[i + block_offset + 1] | carry;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost. You can get carry from the previous iteration, not the current iteration. Then the change to self.storage[i + block_offset + 1] will wait until another iteration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to self.storage[i + block_offset + 1] is not really needed

Copy link
Contributor Author

@Jujumba Jujumba Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost. You can get carry from the previous iteration, not the current iteration. Then the change to self.storage[i + block_offset + 1] will wait until another iteration.

And then I have to start from block_at + 2, right?

I'm intentionally starting from the next block, because the bits after bit_at in the first block are tricky, and may get to different blocks, shifting them is not so trivial (though there is a decent chance I'm wrong)

@pczarn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going in reverse is a good idea. And, modern processors handle reverse iteration as good as normal forward iteration.

I will try to dig deeper into your code within a few days.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I'm a bit busy at my job and have some other stuff going on. I didn't lost interest in closing this and will eventually return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, got down to it again, and there are so many corner cases which makes the implementation of this function extremely hard :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jujumba Okay, I've been thinking whether better give advice or finish myself. Seems like some complex stuff, I should get it working within a few days.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jujumba With algorithms like these, often there are tricks we may come up with to deal with edge cases effectively and make our work simpler.

@pczarn
Copy link
Contributor

pczarn commented Jul 23, 2024

I very much recommend drawing bitvecs and operations of your code on paper. You may draw a bitvec with 4 bits per block instead of 32.

@Jujumba
Copy link
Contributor Author

Jujumba commented Jul 23, 2024

I very much recommend drawing bitvecs and operations of your code on paper. You may draw a bitvec with 4 bits per block instead of 32.

That's what I've been doing actually :)

.map(|bit| if bit { B::one() } else { B::zero() })
.enumerate()
{
self.storage[(at + index) / B::bits()] = self.storage[(at + index) / B::bits()] | bit;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving bit-by-bit is slow, in fact, 32 times slower.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should eventually accept BitSlice right here, once that is done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving bit-by-bit is slow, in fact, 32 times slower.

There is no other way to do it, at least now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jujumba Exposing it in the public API and docs makes us kinda commit to this style of taking in bits. We should probably wait until I have enough time on my hands to do a BitSlice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jujumba Exposing it in the public API and docs makes us kinda commit to this style of taking in bits. We should probably wait until I have enough time on my hands to do a BitSlice.

Okay, so should I close this PR?

And could you, please, review another PR I submitted (when you have some spare time, indeed) 🥺

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

I think best for this is bits: impl Iterator<Item = bool> and grab 64 bits at a time

@Jujumba Jujumba closed this Nov 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants