Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended balance_root to handle root underflow case #621

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

krishvishal
Copy link
Contributor

Balance root now handles root underflow case:

  • Root becomes underflow when its empty. Its a special case.
  • We try and copy root's child into root.
  • If successful we delete the child and update pointers.
  • If unsuccessful because the child can't fit inside the root, we early return and handle the child overflow in balance_leaf

@krishvishal krishvishal marked this pull request as ready for review January 5, 2025 14:38
… trying to convert overflow pages to PageType variant

I have changed places which use `page_type` function to use `maybe_page_type`
@krishvishal
Copy link
Contributor Author

I'm not sure why the test is failing only on windows. I have fixed the same error on my ubuntu machine. This is the only check that's failing.

core/storage/btree.rs Outdated Show resolved Hide resolved
core/storage/btree.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@pereman2 pereman2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we handle root underflow? Balance root tries to map 1:1 to balance_deeper implementation which is an overflow balancing of root. Underflow should be managed generically for all cases. Furthermore, can a root page underflow even? Isn't it possible to create a btree page that once you do first insertion it will see itself as underfull?

@krishvishal
Copy link
Contributor Author

@pereman2 I've looked at both the references below and thought it is better to handle root underflow case in the balance_root itself.

I think the comment from [1] describes root underflow case.

    /* The root page of the b-tree now contains no cells. The only sibling
    ** page is the right-child of the parent. Copy the contents of the
    ** child page into the parent, decreasing the overall height of the
    ** b-tree structure by one. This is described as the "balance-shallower"
    ** sub-algorithm in some documentation.
    **
    ** If this is an auto-vacuum database, the call to copyNodeContent()
    ** sets all pointer-map entries corresponding to database image pages
    ** for which the pointer is stored within the content being copied.
    **
    ** It is critical that the child page be defragmented before being
    ** copied into the parent, because if the parent is page 1 then it will
    ** by smaller than the child due to the database header, and so all the
    ** free space needs to be up front.


[1] https://github.com/sqlite/sqlite/blob/master/src/btree.c#L8867-L8881
[2]https://www.sqlite.org/btreemodule.html#:~:text=4.2.5.2%20Balance%20Shallower

core/storage/btree.rs Outdated Show resolved Hide resolved
core/storage/btree.rs Outdated Show resolved Hide resolved
core/storage/btree.rs Outdated Show resolved Hide resolved
core/storage/btree.rs Outdated Show resolved Hide resolved
@@ -1285,8 +1400,7 @@ impl BTreeCursor {

// setup overflow page
let contents = page.get().contents.as_mut().unwrap();
let buf = contents.as_ptr();
buf.fill(0);
contents.write_u32(0, 0); // Initialize next overflow page pointer to 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 0, can we use one of the constants? This 0 means PAGE_HEADER_OFFSET_PAGE_TYPE, are you sure this is what this 0 write means?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what I did is wrong. Will fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pereman2 can you please tell me what's happening here?: https://github.com/sqlite/sqlite/blob/c2e400af042bdd7d21e159a41fcf34c05398044c/src/btree.c#L7166-L7172

This is my understanding:

// Writes the page number of the new overflow page (pgnoOvfl) to pPrior. For the first overflow page, pPrior points to the end of the local cell content.
put4byte(pPrior, pgnoOvfl);

// ignore this
releasePage(pToRelease);
pToRelease = pOvfl;

// Points pPrior to the start of the new overflow page's data area
pPrior = pOvfl->aData;

// Writes 0 at the start of the overflow page, which shows its the last overflow page
put4byte(pPrior, 0);

// Sets pPayload to point past the 4-byte pointer. This is where the actual data will be stored in the overflow page
pPayload = &pOvfl->aData[4];

//Calculates remaining space for data in this overflow page. Usable space minus 4 bytes used for the next-page pointer
spaceLeft = pBt->usableSize - 4;

Where are we doing put4byte(pPrior, 0); equivalent in limbo?

1. Added an unreachable!() to the if-else under/overflow checker.
2. Directly inserted to cells using `insert_into_cell()` instead of cloning vecs into an external vector
3. Propagated errors using `?` instead of `except` and handled the case where child page is not loaded/locked.
4. Restored `allocate_overflow_page()` function to previous state.
@krishvishal
Copy link
Contributor Author

@pereman2 @jussisaurio I've added the following changes, can you please take a look?

  1. Added an unreachable!() to the if-else under/overflow checker.
  2. Directly inserted to cells using insert_into_cell() instead of cloning vecs into an external vector
  3. Propagated errors using ? instead of except and handled the case where child page is not loaded/locked.
  4. Restored allocate_overflow_page() function to previous state.

…h its overflow cells to dst

2. Added `zero_page` function which zeros's page properly.
3. Fixed the bug on windows which was due to bad offset handling and the test_sequential_overflow_page passes now.
4. Now `balance_root` mirror's closely what sqlite is doing.
@krishvishal
Copy link
Contributor Author

I think this is ready for review.

  1. Added copy_node_content function which copies a page's along with its overflow cells to dst

  2. Added zero_page function which zeros's page properly.

  3. Fixed the bug on windows which was due to bad offset handling and the test_sequential_overflow_page passes now on Windows.

  4. Now balance_root mirror's closely what sqlite does.


I will add tests for root underflow case when delete is functional.

Comment on lines 541 to 549
match self.maybe_page_type() {
Some(page_type) => match page_type {
PageType::IndexInterior => Some(self.read_u32(8)),
PageType::TableInterior => Some(self.read_u32(8)),
PageType::IndexLeaf => None,
PageType::TableLeaf => None,
},
None => None, // Handle overflow pages by returning None
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to return None? Wouldn't it be better to crash if we call any of these methods if we are on an overflow page?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing these changes. I have made them for compute_used_space which is wrong and not needed for root balancing.

Comment on lines 1265 to 1266
fn zero_page(&self, page: &PageRef, page_type: PageType, is_leaf: bool) {
let contents = page.get().contents.as_mut().unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if page is page 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding, since write_u16 and other write functions add page header size offsets when writing data to a location we need not handle page1 specially. Is my understanding correct?

Comment on lines 1255 to 1260
let header_and_pointers_size =
src_contents.header_size() + src_contents.cell_pointer_array_size();
dst_buf[..header_and_pointers_size].copy_from_slice(&src_buf[..header_and_pointers_size]);

// Copy overflow cells
dst_contents.overflow_cells = src_contents.overflow_cells.clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this handle page 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we would need to add 100 as offset if page_1, since we are directly copying. I will make the necessary edits.

Comment on lines 1215 to 1217
for overflow_cell in &page.overflow_cells {
total_used += overflow_cell.payload.len();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do overflow cells count? I mean, no overflow cell is there inside the page. If there is an overflow cell, regardless of the used space we have to balance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This calculation is not needed for now and the calculation is wrong. I'm using this function inside in is_underflow. I've simplified both is_underflow and is_overflow to handle only balance_root case for. I will extend them in the future.

I will remove this function. This can be resolved.

…contents.

2. Refactored `zero_page` to only zero offsets. And added documentation on where and how to use it.
3. Removed erroneous `compute_used_space` function and changed `is_underflow` `is_overflow` to currently only check inside balance_root.
4. Removed changes to `maybe_page_type` function inside sqlite3_ondisk.
fn zero_page(&self, page: &PageRef) {
let contents = page.get().contents.as_mut().unwrap();

contents.write_u32(PAGE_HEADER_OFFSET_FIRST_FREEBLOCK, 0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pereman2 I've written write_u32 here instead of write_u16 because SQLite does memset to 4 bytes: https://github.com/sqlite/sqlite/blob/c2e400af042bdd7d21e159a41fcf34c05398044c/src/btree.c#L2277

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants