WIP: Implement insert_many
#111
Thanks. I may check within two days |
@Jujumba by the way, |
3cb1937
to
149417c
Compare
Seems to allow up to 32 bits inserted, and we'd like to insert any number |
Oh, you are right, missed this edge case |
@Jujumba I would prefer the algorithm from ferrilab that I explained in one of the comments, though if you make it half as fast, like here, then I can merge it as a first proof of concept, because the function is not likely to be used a lot and can be improved later. |
I've been thinking about it for a while. Taking all the edge cases into account, the problem becomes genuinely hard. I'll try to come up with something reasonable soon. |
Perhaps we could rotate block several times (if This way it's possible to insert any number of bits crossing any number of blocks |
Yes, though it's basically the same as a copy overlapping / non-overlapping source and destination - these are important cases |
The algorithm based on ferrilab's I described in the issue would be best, I believe |
Load a block N, load block N+1, shift / rotate, store block, it's as simple as that |
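A minimal sketch of the load / shift / store idea for a shift smaller than one block. It assumes plain `u32` blocks and LSB-first bit order instead of the crate's generic `B`, and `shift_up` is a hypothetical helper name, not part of the PR:

```rust
/// Shifts every bit in `blocks` toward higher indices by `k` bits (0 < k < 32),
/// assuming bit i lives at bit position i % 32 of block i / 32.
/// Returns the bits pushed out of the last block.
fn shift_up(blocks: &mut [u32], k: u32) -> u32 {
    assert!(k > 0 && k < 32);
    let mut carry = 0u32;
    for block in blocks.iter_mut() {
        // The top k bits of this block move into the bottom of the next one.
        let next_carry = *block >> (32 - k);
        // Shift the block and pull in the bits carried from the block before it.
        *block = (*block << k) | carry;
        carry = next_carry;
    }
    carry
}
```

Each block is loaded and stored exactly once, so the work is linear in the number of blocks regardless of `k`.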
Then I think the problem comes with the next blocks, as now we have a carry which spans two blocks, so we must consider two blocks at a time when shifting. (Or I am wrong in the first place and just don't see the efficient algorithm.) But I think I have worked this out, though the algorithm I came up with leaves a lot of room for improvement. Will push it soon. |
d928fa8
to
c143a52
Compare
Nope, copying block X to block X+Y means you do Y*32 bit shift for free |
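To illustrate the "free" whole-block shift, here is a small sketch assuming `Vec<u32>` storage; `shift_blocks_up` is a hypothetical name used only for this example:

```rust
/// Moves every block from index `from` onward up by `y` blocks, which shifts
/// those bits by `y * 32` positions without any bit arithmetic.
fn shift_blocks_up(storage: &mut Vec<u32>, from: usize, y: usize) {
    let old_len = storage.len();
    assert!(from <= old_len);
    // Make room for the blocks that move past the old end.
    storage.resize(old_len + y, 0);
    // `copy_within` handles overlapping source and destination ranges.
    storage.copy_within(from..old_len, from + y);
    // Clear the gap that was opened up.
    storage[from..from + y].fill(0);
}
```

Only the remainder of the insertion length modulo the block size then needs real shifting.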
Hmmmm... But if we are inserting, let's say, 65 bits in the middle of a block, then it's not so trivial, is it? Also, could you take a look at the current implementation? |
c143a52
to
c30cf8a
Compare
The time complexity is quadratic: O(n^2). I absolutely can't accept this. |
And yet it's the only right way. |
Your code will do close to 1,000,000 shift operations when inserting 1,000 elements before 1,000 blocks. |
Then could you provide some clues in code? I don't get your written explanation of algorithm such as "assign state to the first block" 😔 |
Ok, I'll be back within a few days. If you feel like it, you may start any of the other basic issues. I appreciate your questions, and during our discussions I came up with the idea of using SmallVec in BitVec. |
Okay, so I have come up with a better algorithm, which does only one rotation :) |
for i in ((block_at + 1)..=(block_at + block_offset)).rev() {
    let block = core::mem::replace(&mut self.storage[i], B::zero());
    self.storage[i + block_offset] = block;
That's very close to the algorithm we've been looking for. You may try to do this copy + the rotation in one go in one loop over the blocks.
Do you mean something like this?
let offset = bits.len() % B::bits();
for i in ((block_at + 1)..=(block_at + block_offset)).rev() {
let block = core::mem::replace(&mut self.storage[i], B::zero());
let carry = block >> (B::bits() - offset);
self.storage[i + block_offset] = block >> offset;
self.storage[i + block_offset + 1] = self.storage[i + block_offset + 1] | carry;
}
Almost. You can get carry from the previous iteration, not the current iteration. Then the change to self.storage[i + block_offset + 1] will wait until another iteration.
The change to self.storage[i + block_offset + 1] is not really needed
Almost. You can get carry from the previous iteration, not the current iteration. Then the change to self.storage[i + block_offset + 1] will wait until another iteration.
And then I have to start from block_at + 2, right?
I'm intentionally starting from the next block, because the bits after bit_at in the first block are tricky and may end up in different blocks; shifting them is not so trivial (though there is a decent chance I'm wrong)
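One way the copy and the rotation could be fused into a single reverse loop, taking the carry from the previous iteration so that every destination block is written once. This is only a sketch under assumptions: `u32` blocks, LSB-first bit order, a gap that starts on a block boundary (the partial first block is left out), and a hypothetical helper name `open_gap`:

```rust
/// Shifts all blocks from index `from` onward toward higher indices by
/// `n_bits`, leaving zeros behind. One spare block is always appended for
/// bits spilling out of the last block; a real implementation would trim
/// the storage according to the bit length.
fn open_gap(storage: &mut Vec<u32>, from: usize, n_bits: usize) {
    const BITS: usize = 32;
    let old_len = storage.len();
    assert!(from <= old_len);
    let block_offset = n_bits / BITS;
    let k = (n_bits % BITS) as u32;
    storage.resize(old_len + block_offset + 1, 0);

    // Part of the previously visited (higher) source block that belongs in
    // the destination block completed by the current iteration.
    let mut high = 0u32;
    for i in (from..old_len).rev() {
        let block = std::mem::replace(&mut storage[i], 0);
        // Top k bits spill into the next destination block; `checked_shr`
        // yields None for a shift of 32 (k == 0), meaning nothing spills.
        let spill = block.checked_shr(BITS as u32 - k).unwrap_or(0);
        storage[i + block_offset + 1] = high | spill;
        high = block << k;
    }
    // The lowest destination block receives only the shifted low part.
    storage[from + block_offset] = high;
}
```

Iterating in reverse keeps the in-place move safe, because every destination index is above every source block that has not been read yet.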
Going in reverse is a good idea. And modern processors handle reverse iteration as well as normal forward iteration.
I will try to dig deeper into your code within a few days.
Hey, I'm a bit busy at my job and have some other stuff going on. I haven't lost interest in closing this and will eventually return.
Well, I got down to it again, and there are so many corner cases that make the implementation of this function extremely hard :(
@Jujumba Okay, I've been thinking about whether it's better to give advice or to finish it myself. It seems like complex stuff; I should get it working within a few days.
@Jujumba With algorithms like these, often there are tricks we may come up with to deal with edge cases effectively and make our work simpler.
I very much recommend drawing bitvecs and operations of your code on paper. You may draw a bitvec with 4 bits per block instead of 32. |
That's what I've been doing actually :) |
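The paper drawings can also be reproduced in a test with a tiny rendering helper. This sketch assumes toy blocks stored in `u8` with only the low bits in use and LSB-first order; `draw` is a hypothetical name:

```rust
/// Renders the low `bits_per_block` bits of each block, lowest bit first,
/// with blocks separated by '|'.
fn draw(blocks: &[u8], bits_per_block: u32) -> String {
    assert!(bits_per_block >= 1 && bits_per_block <= 8);
    blocks
        .iter()
        .map(|b| {
            (0..bits_per_block)
                .map(|i| if (*b >> i) & 1 == 1 { '1' } else { '0' })
                .collect::<String>()
        })
        .collect::<Vec<_>>()
        .join("|")
}
```

Printing the storage before and after each step with 4 bits per block makes carries across block boundaries easy to spot.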
.map(|bit| if bit { B::one() } else { B::zero() })
.enumerate()
{
    self.storage[(at + index) / B::bits()] = self.storage[(at + index) / B::bits()] | bit;
Moving bit-by-bit is slow, in fact, 32 times slower.
We should eventually accept BitSlice right here, once that is done.
Moving bit-by-bit is slow, in fact, 32 times slower.
There is no other way to do it, at least for now
@Jujumba Exposing it in the public API and docs makes us kinda commit to this style of taking in bits. We should probably wait until I have enough time on my hands to do a BitSlice.
@Jujumba Exposing it in the public API and docs makes us kinda commit to this style of taking in bits. We should probably wait until I have enough time on my hands to do a BitSlice.
Okay, so should I close this PR?
And could you please review another PR I submitted (when you have some spare time, of course) 🥺
Sure.
I think the best fit for this is bits: impl Iterator<Item = bool>, grabbing 64 bits at a time
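A sketch of what grabbing 64 bits at a time from such an iterator could look like. The helper name `next_chunk64` and the LSB-first packing are assumptions made for illustration, not part of the crate's API:

```rust
/// Pulls up to 64 bits from `bits` and packs them LSB-first into a `u64`.
/// Returns the packed word and the number of bits taken, or `None` once the
/// iterator is exhausted.
fn next_chunk64(bits: &mut impl Iterator<Item = bool>) -> Option<(u64, u32)> {
    let mut word = 0u64;
    let mut taken = 0u32;
    // `by_ref` lets `take` borrow the iterator instead of consuming it.
    for bit in bits.by_ref().take(64) {
        word |= u64::from(bit) << taken;
        taken += 1;
    }
    if taken == 0 {
        None
    } else {
        Some((word, taken))
    }
}
```

The caller can then merge each packed word into the storage with block-level shifts instead of setting bits one by one.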
@pczarn