Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libsql: add durable_frame_num caching and metadata file #1830

Merged
merged 12 commits into from
Nov 22, 2024

Conversation

LucioFranco
Copy link
Contributor

This PR adds both a metadata file that is used to cache the durable_frame_num to allow us to only send the frames the server needs. This also adds some basic logging to allow us to track what frames and generations are being pushed to the server.

Closes #1818

❯ RUST_LOG=libsql::sync=debug cargo run --example offline_writes
    Blocking waiting for file lock on build directory
   Compiling libsql v0.6.0 (/home/lucio/code/libsql/libsql)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 10.14s
     Running `target/debug/examples/offline_writes`
2024-11-19T19:57:20.054954Z DEBUG libsql::sync: no metadata info file found
Please write your entry to the guestbook:
hello
You entered: hello

Guest book entries:
  cargo run --example offline_writeshello

  hello

  hello

  hello

  hello

Syncing database to remote...
2024-11-19T19:57:23.677365Z DEBUG push_one_frame{generation=1 frame_no=1}: libsql::sync: pushing frame
2024-11-19T19:57:23.781639Z DEBUG push_one_frame{generation=1 frame_no=1}: libsql::sync: frame successfully pushed durable_frame_num=6
2024-11-19T19:57:23.781975Z DEBUG push_one_frame{generation=1 frame_no=7}: libsql::sync: pushing frame
2024-11-19T19:57:23.837472Z DEBUG push_one_frame{generation=1 frame_no=7}: libsql::sync: frame successfully pushed durable_frame_num=6
Done!
libsql on  lucio/cache-server-frame [$!?] via 🦀 v1.81.0
❯ RUST_LOG=libsql::sync=debug cargo run --example offline_writes
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
     Running `target/debug/examples/offline_writes`
Please write your entry to the guestbook:
bye
You entered: bye

Guest book entries:
  cargo run --example offline_writeshello

  hello

  hello

  hello

  hello

  bye

Syncing database to remote...
2024-11-19T19:57:42.074752Z DEBUG push_one_frame{generation=1 frame_no=7}: libsql::sync: pushing frame
2024-11-19T19:57:42.118522Z DEBUG push_one_frame{generation=1 frame_no=7}: libsql::sync: frame successfully pushed durable_frame_num=7
2024-11-19T19:57:42.118936Z DEBUG push_one_frame{generation=1 frame_no=8}: libsql::sync: pushing frame
2024-11-19T19:57:42.162425Z DEBUG push_one_frame{generation=1 frame_no=8}: libsql::sync: frame successfully pushed durable_frame_num=7
Done!

@LucioFranco
Copy link
Contributor Author

@penberg in the logs above I see it is re-sending frame 7 because the server on the previous call returned max_frame_no 6 when we sent frame_no 7. Is there an off by one on the server side or is that to be expected with the protocol?

libsql/src/sync.rs Outdated Show resolved Hide resolved
// Update our last known max_frame_no from the server.
self.durable_frame_num = durable_frame_num;

self.write_metadata().await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to write metadata on every frame? Maybe we could do that after each batch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being fixed in commit de24d82

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I made a mistake it wasn't done in that commit but its done here 26ac07e

@penberg
Copy link
Collaborator

penberg commented Nov 20, 2024

@LucioFranco What happens if the metadata file is corrupted? The code should recover from that by trying to push the whole log.

This adds a new `<database>-info` file that is a json file containing
metadata for syncing databases. At this point it just contains the
`max_frame_no` from the server to optimize sending wal frames that
already exist on the server.

Closes #1818
@LucioFranco LucioFranco force-pushed the lucio/cache-server-frame branch from 4a8460f to 203fc49 Compare November 20, 2024 16:44
libsql/src/sync.rs Outdated Show resolved Hide resolved
libsql/src/sync.rs Outdated Show resolved Hide resolved
@LucioFranco LucioFranco requested a review from haaawk November 20, 2024 18:58
@penberg
Copy link
Collaborator

penberg commented Nov 21, 2024

@LucioFranco I corrupted the info file as follows:

{"hash":3168545583"version":0,"durable_frame_num":120,"generation":1}

and now I see this error:

Running `/Users/penberg/src/tursodatabase/libsql/target/debug/examples/offline_writes`
thread 'main' panicked at /Users/penberg/src/tursodatabase/libsql/libsql/src/sync.rs:161:78:
called `Result::unwrap()` on an `Err` value: Error("expected `,` or `}`", line: 1, column: 19)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

As I said before, the info file is for caching purposes. If we fail to read it, we should just discard it and regenerate it after chatting with the server.

@LucioFranco
Copy link
Contributor Author

@penberg that has been fixed/implemented and there are a bunch of tests including a corrupted metadata file.

@LucioFranco LucioFranco force-pushed the lucio/cache-server-frame branch from 7baa9a0 to 11cce47 Compare November 21, 2024 19:13
@@ -402,7 +402,7 @@ impl Database {

let max_frame_no = conn.wal_frame_count();

let generation = 1; // TODO: Probe from WAL.
let generation = sync_ctx.generation(); // TODO: Probe from WAL.
let start_frame_no = sync_ctx.durable_frame_num() + 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we cache durable frame number, we need to be prepared for the scenario where remote database got deleted. IOW. the durable frame number we have here might be higher than what is actually on server side.

@LucioFranco LucioFranco added this pull request to the merge queue Nov 22, 2024
Merged via the queue into main with commit 4a5f373 Nov 22, 2024
18 checks passed
@LucioFranco LucioFranco deleted the lucio/cache-server-frame branch November 22, 2024 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache server-side frame number in client
3 participants