chore: added wal/snapshot doc #25856
Looking good, I just had a couple of comments.
```
│                ┌────────────┐
└───────────────►│clear buffer│ (whatever snapshotted is removed)
                 └────────────┘
```
A useful addition to this diagram would be to show, via an arrow, the entry point for writes from the user, i.e., where do writes from the user go (the wal buffer?). Otherwise, the order of operations is not clear. If you could connect the numbers from the steps described below to locations / arrows on the diagram, that would be helpful.
Good point - I'll try to link the steps to the diagram and add the incoming writes as well.
If going ahead with force snapshotting, pick all the wal periods in the tracker and find the max time from the most recent wal period. This will be
used as the `end_time_marker` to evict data from the query buffer. Because forcing a snapshot can be triggered when the wal buffer is empty (even though
the queryable buffer is full), we need to add a `Noop` (a no-op WalOp) to the wal file to hold the snapshot details.
Add a header on this diagram, e.g.,
##### Forced snapshot
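The forced-snapshot rule quoted above can be sketched roughly as follows. This is only an illustration under assumptions, not the project's implementation: the `WalPeriod` shape, the function names, and the assumption that the tracker keeps periods ordered oldest to newest are all hypothetical, and `WalOp` here is a stand-in with just enough variants to show the `Noop` case.

```rust
/// Min/max timestamps covered by one wal file (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
struct WalPeriod {
    wal_file_number: u64,
    min_time: i64,
    max_time: i64,
}

/// Stand-in for the real wal operation type; only what the sketch needs.
#[derive(Debug, PartialEq)]
enum WalOp {
    Write(String),
    Noop,
}

/// Forced snapshot: take every tracked period and use the max time of the
/// most recent one as the `end_time_marker`.
/// Assumes `periods` is ordered oldest to newest; returns `None` if it is empty.
fn force_snapshot(periods: &mut Vec<WalPeriod>) -> Option<(Vec<WalPeriod>, i64)> {
    let end_time_marker = periods.last()?.max_time;
    // `take` empties the tracker and hands back everything it held.
    Some((std::mem::take(periods), end_time_marker))
}

/// If the wal buffer has no ops, add a `Noop` so that the wal file written
/// for this flush can still carry the snapshot details.
fn ops_for_snapshot_file(mut buffered: Vec<WalOp>) -> Vec<WalOp> {
    if buffered.is_empty() {
        buffered.push(WalOp::Noop);
    }
    buffered
}
```

With two tracked periods whose max times are 100 and 250, `force_snapshot` would hand back both periods with an `end_time_marker` of 250 and leave the tracker empty.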
1. When _writes_ come in, they go into a write batch in the wal buffer. These batches are held per database and each batch keeps track of the min
and max times within it. These batches further hold per-table chunks. A chunk is created by taking incoming rows and pinning
them to a period. It is done by `t - (t % gen_1_duration)`. If `gen_1_duration` is 10 mins, then all of the rows will be divided into
10 min chunks. As an example, if there are rows for 10.29 and 10.35 then they go into 2 separate chunks (10.20 and 10.30). And these
What are 10.29 and 10.35? Are those meant to be timestamps, like time 10:29 and time 10:35?
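To make the pinning rule `t - (t % gen_1_duration)` concrete, here is a minimal sketch. It reads 10.29 and 10.35 as the clock times 10:29 and 10:35, which is an assumption, and the helper names `chunk_start` and `hm` plus the choice of seconds since midnight as the unit are hypothetical.

```rust
/// Floors a timestamp to the start of its gen1 chunk: `t - (t % gen_1_duration)`.
/// Hypothetical helper; both arguments use the same unit (seconds here) and
/// `t` is assumed to be non-negative.
fn chunk_start(t: i64, gen_1_duration: i64) -> i64 {
    t - (t % gen_1_duration)
}

/// Converts hours and minutes to seconds since midnight (for illustration only).
fn hm(hours: i64, minutes: i64) -> i64 {
    hours * 3600 + minutes * 60
}
```

With `gen_1_duration = 600` (10 minutes), `chunk_start(hm(10, 29), 600)` equals `hm(10, 20)` and `chunk_start(hm(10, 35), 600)` equals `hm(10, 30)`, so the two rows land in separate chunks.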
them to a period. It is done by `t - (t % gen_1_duration)`. If `gen_1_duration` is 10 mins, then all of the rows will be divided into
10 min chunks. As an example, if there are rows for 10.29 and 10.35 then they go into 2 separate chunks (10.20 and 10.30). And these
10.20 and 10.30 values are used later as the keys in the queryable buffer.
2. Every flush interval, the wal buffer is flushed and all batches are written to the wal file (converts to wal content and gets min/max
You should also mention that the write request that came in had a oneshot channel created for it. That channel gets called back after the flush and the placement of that data into the queryable buffer, which then returns a success to the client.
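The callback flow described in this comment could look roughly like the sketch below. It is illustrative only: the standard library has no oneshot channel, so a `std::sync::mpsc` channel that is used exactly once stands in for it, and `PendingWrite`, `WalBuffer`, and the `Vec<String>` queryable buffer are hypothetical simplifications rather than the project's types.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// A write waiting in the wal buffer (hypothetical type).
struct PendingWrite {
    rows: Vec<String>,
    /// Used exactly once to report the outcome back to the waiting request.
    notify: Sender<Result<(), String>>,
}

/// Hypothetical wal buffer: holds writes until the next flush.
#[derive(Default)]
struct WalBuffer {
    pending: Vec<PendingWrite>,
}

impl WalBuffer {
    /// Buffers the rows and returns the receiver the request handler waits on.
    fn write(&mut self, rows: Vec<String>) -> Receiver<Result<(), String>> {
        let (tx, rx) = channel();
        self.pending.push(PendingWrite { rows, notify: tx });
        rx
    }

    /// Flushes all pending writes. Persisting to the wal file is elided here;
    /// the rows are placed in the queryable buffer and only then is each
    /// waiting request notified.
    fn flush(&mut self, queryable_buffer: &mut Vec<String>) {
        for write in self.pending.drain(..) {
            queryable_buffer.extend(write.rows);
            // The receiver may already be gone (client disconnected); ignore that.
            let _ = write.notify.send(Ok(()));
        }
    }
}
```

The request handler would block (or await, in an async runtime) on the returned receiver and respond to the client only once `flush` has sent `Ok(())`, so a success response implies the data is already queryable.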
If it is a normal snapshot, then leave one wal period (`3` in the example below) and pick the last one before it (`2` in the example below), whose max time is used as the `end_time_marker`
This isn't strictly true. What the WAL periods are looking for is that data written into the oldest WAL files has timestamps that fall into chunks that are no longer receiving writes. So if we have the default 10m gen1 chunks and we're always writing data with a time of "now", then we would only snapshot after the 10m chunk time goes cold. If we have lagged collection, by say 1m, we won't snapshot until after that 10m of wall clock time has passed + 1m, but we would only snapshot the wal files from before that time. So we'd likely leave behind 60 wal files, which is by design.
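For reference, the simpler rule as worded in the quoted doc text (leave the newest wal period, use the max time of the one before it) can be sketched as below. This models only that wording and not the chunk-going-cold condition described in the comment above; the `WalPeriod` shape, the function name, and the oldest-to-newest ordering are assumptions.

```rust
/// Max timestamp covered by one wal file (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
struct WalPeriod {
    wal_file_number: u64,
    max_time: i64,
}

/// Normal snapshot as worded in the quoted text: keep the newest period in
/// the tracker, snapshot the older ones, and use the max time of the newest
/// snapshotted period as the `end_time_marker`.
/// Assumes `periods` is ordered oldest to newest.
fn normal_snapshot(periods: &mut Vec<WalPeriod>) -> Option<(Vec<WalPeriod>, i64)> {
    if periods.len() < 2 {
        // With one period left behind there would be nothing to snapshot.
        return None;
    }
    let keep_from = periods.len() - 1;
    let to_snapshot: Vec<WalPeriod> = periods.drain(..keep_from).collect();
    let end_time_marker = to_snapshot.last()?.max_time;
    Some((to_snapshot, end_time_marker))
}
```

With periods `1`, `2`, and `3` tracked, this snapshots `1` and `2`, takes the max time of `2` as the `end_time_marker`, and leaves `3` in the tracker.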
```
│
├───10.20───────────►┌────────────────────────────┐
```
Same question as above: is 10.20 meant to be a timestamp like 10:20? If so, it might make more sense to have 2025-01-24T10:20 so that it's clearer.
```
│
├───10.20───────────►┌────────────────────────────┐
│                    │ chunk 10.20 - 10.29        │
```
If these are times, it's better expressed as [10:20 - 10:30), which indicates that the 10:30 end is exclusive.
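The half-open notation maps directly onto Rust's `start..end` ranges, which exclude the end bound. A small sketch, assuming times are expressed as minutes since midnight and using a hypothetical `chunk_range` helper:

```rust
use std::ops::Range;

/// Half-open chunk range `[start, start + gen_1_duration)`.
/// Hypothetical helper; both values are in minutes since midnight.
fn chunk_range(start: u32, gen_1_duration: u32) -> Range<u32> {
    start..start + gen_1_duration
}
```

For the chunk starting at 10:20 with a 10 minute duration, `chunk_range(10 * 60 + 20, 10)` contains 10:29 but not 10:30, which belongs to the next chunk.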
Moving some of my work notes into a doc (might be handy to understand the wal/snapshotting process)