Head tracking batch #417

Scooletz · 2024-10-23T16:00:05Z

This PR introduces the notion of an IMultiHeadChain. The multi-head approach enables creating multiple heads of the blockchain that properly track the recent state. If no reorganization happens, the same IHead is reused over and over for a prolonged period of time, while all reads are served at the speed of reading the raw underlying database. This is done by applying COW (copy-on-write) not only at the level of the underlying database but also in the head, with the help of a PageTable. With this we have:

Reusability of a batch instance
Removal of ancestors' notion
No filters and direct db queries
Easier move from block N to N+1

Reusability is addressed by reusing the head tracking batch across multiple blocks. Every time a block is committed, its state is applied to the internal state of HeadTrackingBatch in a specific manner that allows blocks to be tracked.

HeadTrackingBatch has no notion of ancestors, as it keeps all the modified pages in a simple lookup that provides a squashed view of the world.

No BitFilter is required, as data are always queried as they would be from the database. The only difference is that some pages are represented as modified in-memory copies. These copies are eventually applied to the disk but, for the sake of lookups, are treated as if they had been applied already.
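The overlay idea described above can be illustrated with a minimal sketch. It is not the Paprika implementation: `Database`, `Head`, the tiny `PAGE_SIZE` and the integer addresses are all hypothetical stand-ins, assumed only for illustration. It shows a read path that is the same as a database read, except that pages modified in memory shadow their on-disk versions.

```python
from typing import Dict

PAGE_SIZE = 16  # assumption: tiny pages keep the example readable


class Database:
    """Hypothetical stand-in for the on-disk page store (address -> page bytes)."""

    def __init__(self, page_count: int) -> None:
        self.pages = [bytes(PAGE_SIZE) for _ in range(page_count)]

    def read(self, address: int) -> bytes:
        return self.pages[address]


class Head:
    """Hypothetical head: in-memory copies of modified pages shadow the database."""

    def __init__(self, db: Database) -> None:
        self.db = db
        # Squashed view: at most one (latest) copy per address, no ancestor chain.
        self.modified: Dict[int, bytearray] = {}

    def read(self, address: int) -> bytes:
        # Same lookup as a database read; a modified copy wins if one exists.
        page = self.modified.get(address)
        return bytes(page) if page is not None else self.db.read(address)

    def write(self, address: int, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("write does not fit in the page")
        page = self.modified.get(address)
        if page is None:
            # Copy on first write; later writes reuse the same in-memory copy.
            page = bytearray(self.db.read(address))
            self.modified[address] = page
        page[offset:offset + len(data)] = data
```

In this sketch the database is never touched by `write`; only the in-memory copy changes, so the same head object can keep serving reads across many blocks.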

IHead

The idea is implemented by introducing one more level of abstraction over a batch. Before this PR, only a single PageDb.Batch could exist. This PR introduces a new type of batch, called IMultiHeadChain. From the chain you can get an IHead that accumulates changes in memory, meaning that, at a given batch number, multiple propositions can exist before they are committed or discarded. To make this work, the head introduces two dictionaries that resolve the Page <-> DbAddress mapping in both directions. They are used to track pages that were already overwritten. A given page may be overwritten multiple times, so the mappings keep only the latest version. The dictionary-based mapping turned out to be costly (it showed up in profiling). To address this, a simple array-based lookup is placed in front of it.
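A minimal sketch of such a two-way mapping with an array in front of the dictionary might look as follows. The class name `PageTable`, its methods, the integer page ids, and the direct-mapped slot choice (`address % cache_size`) are assumptions made for illustration; the PR does not spell out these details.

```python
from typing import Dict, List, Optional, Tuple


class PageTable:
    """Hypothetical two-way mapping between db addresses and in-memory page ids."""

    def __init__(self, cache_size: int = 8) -> None:
        self.by_address: Dict[int, int] = {}  # DbAddress -> page id
        self.by_page: Dict[int, int] = {}     # page id -> DbAddress
        # Array-based lookup in front of the dictionary (assumed direct-mapped).
        self.cache: List[Optional[Tuple[int, int]]] = [None] * cache_size

    def set(self, address: int, page_id: int) -> None:
        previous = self.by_address.get(address)
        if previous is not None:
            del self.by_page[previous]  # keep only the latest version
        self.by_address[address] = page_id
        self.by_page[page_id] = address
        self.cache[address % len(self.cache)] = (address, page_id)

    def page_of(self, address: int) -> Optional[int]:
        slot = self.cache[address % len(self.cache)]
        if slot is not None and slot[0] == address:
            return slot[1]  # fast path: no dictionary access
        page_id = self.by_address.get(address)
        if page_id is not None:
            self.cache[address % len(self.cache)] = (address, page_id)
        return page_id

    def address_of(self, page_id: int) -> Optional[int]:
        return self.by_page.get(page_id)
```

Because `set` always refreshes the slot for the address it changes, a cache hit can never return a stale page id in this sketch; colliding addresses simply fall back to the dictionary.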

ProposedBatch

A simple construct of a ProposedBatch is used to track:

a list of (DbAddress,Page) tuples
a RootPage
a position of the root page to apply to (calculated from the batchId in the root page)

These proposed batches are created every time HeadTrackingBatch is committed, by selecting the pages that were updated during the given batch. Once finality is reached, proposed batches with block numbers lower than the finalized one are applied to the database. The application should be almost instantaneous, as it only copies payloads to the proper pages, updates the RootPage and decreases the counter on the ProposedBatch (counter-based lifetime management, similar to blockchain).
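The finalization step can be sketched roughly as below. This is an illustration under several assumptions, none of which come from the PR: one proposal per batch id (the real chain allows several competing proposals), a root slot computed as `batch_id % root_slots`, the finalized batch itself being applied along with the earlier ones, and a single reference held by the chain. The names `Chain`, `propose`, `finalize` and `released` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ProposedBatch:
    batch_id: int
    pages: List[Tuple[int, bytes]]  # (DbAddress, Page payload) tuples
    root: bytes                     # stand-in for the RootPage
    ref_count: int = 1              # assumption: one reference held by the chain


class Chain:
    """Hypothetical chain that applies proposed batches once they are finalized."""

    def __init__(self, page_count: int, root_slots: int) -> None:
        self.db_pages: List[bytes] = [b""] * page_count
        self.roots: List[bytes] = [b""] * root_slots
        self.proposed: Dict[int, ProposedBatch] = {}
        self.released: List[int] = []  # batch ids whose counter reached zero

    def propose(self, batch: ProposedBatch) -> None:
        self.proposed[batch.batch_id] = batch

    def finalize(self, finalized_batch_id: int) -> int:
        applied = 0
        for batch_id in sorted(self.proposed):  # oldest first
            if batch_id > finalized_batch_id:
                break
            batch = self.proposed.pop(batch_id)
            for address, payload in batch.pages:
                self.db_pages[address] = payload  # copy payload to its page
            # Assumed rule: root position derived from the batch id.
            self.roots[batch_id % len(self.roots)] = batch.root
            batch.ref_count -= 1
            if batch.ref_count == 0:
                self.released.append(batch_id)
            applied += 1
        return applied
```

Applying batches in ascending order means a page overwritten in several batches ends up with the payload of the newest finalized one.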

Readers

Readers manage the read access. They are created whenever a block is committed/finalized and are just retrieved by the client code. No readers should be created in the hot paths!

Benchmarks

Some areas of interest

BeforeCommit that represents Merkleization
CommitImpl that applies the changes to the in-memory representation. This could potentially be offloaded
The actual Commit that registers the proposal

src/Paprika/Store/PagedDb.cs

Scooletz · 2024-10-28T11:00:14Z

@dipkakwani Maybe I should have been more explicit about it. This is a sketch of the idea. I wanted to gather some input while running the benchmarks with the current setup.

src/Paprika/Store/PagedDb.cs

src/Paprika/Store/PagedDb.MultiHeadChain.cs

github-advanced-security

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

github-actions · 2025-02-07T15:53:18Z

Package	Line Rate	Branch Rate	Health
Paprika	80%	76%	➖
Summary	80% (5001 / 6231)	76% (1698 / 2230)	➖

Minimum allowed line rate is 75%

sketch of the page table

b0bbeca

dipkakwani reviewed Oct 28, 2024

src/Paprika/Store/PagedDb.cs (outdated, resolved)

src/Paprika/Store/PagedDb.cs (outdated, resolved)

Scooletz mentioned this pull request Nov 6, 2024

BitFilter made faster #433

Merged

Scooletz added 4 commits November 12, 2024 12:42

renaming and simplifying HeadBatch

a69330a

renaming

0817c05

revere mapping

922b5a6

more extraction

d1c7675

Scooletz changed the title ~~Page table & linked batches.~~ Head tracking batch Nov 12, 2024

Scooletz added 21 commits November 13, 2024 12:15

Merge branch 'main' into page-table

b3c1469

moving more members, making base context less dependent

2bc911f

stats and build

6409faf

head tracking batch

c218587

separation of apis

96e7c33

towards implementation of the tracking

30edee7

refactored root finding

4dea197

towards testable MultiHeadChain

b199b61

reading from multihead and proper disposal

8e37291

green test with reads

3fb6d8a

GetAtWriting to ensure that MultiHead can track pages properly with a…

164a8ad

…bandoned

clearing

2797d33

multi block

d1d17f6

moving metadata to Commit of the head

6862627

ref counting proposed batches

4bc3872

fix doubled root creation

fee1ed1

new api to support not mmaped writes

86f6675

flusher in MultiHead

989e726

into the channel

83f5882

async disposable and finalization mechanism

33e088a

finality and block lock

fa2f90a

Scooletz added 8 commits November 29, 2024 18:14

throw on not found

365fb3f

MultiHead allows to create non-committable world state

494e783

pageTable cache in front of a dictionary

00397e5

atomic clean up

c9916cb

update cache only on mapping changes

9e48831

override removal made a bit faster

cb1990d

close prefetcher after commit

a790db3

Merge branch 'main' into page-table

bef584a

Scooletz force-pushed the page-table branch from f3f3f32 to bef584a Compare December 9, 2024 10:18

prefetching disposal got right plus test fixes

a686b6c

dipkakwani reviewed Dec 9, 2024

Update src/Paprika/Store/PagedDb.cs

06f783e

dipkakwani reviewed Dec 10, 2024

src/Paprika/Store/PagedDb.MultiHeadChain.cs (resolved)

Scooletz mentioned this pull request Dec 10, 2024

RlpMemo should work on compressed data #445

Closed

Merge branch 'main' into page-table

97da87c

github-advanced-security bot found potential problems Feb 5, 2025

Scooletz added 11 commits February 6, 2025 08:54

automatic finality fix

70e6cf7

format

5085efa

clear id cache

ddcabe0

formatting

9529b56

reopening prefetcher

dd6a5af

create page table properly sized

e896560

comments

5a914dd

more comments

74a7cc2

more comments

714d139

reader caches dictionary statically to do not create a lot of allocat…

3c8a022

…ions

last reader reporter

2bf3a1a

Scooletz force-pushed the page-table branch from 9462c73 to 2bf3a1a Compare February 7, 2025 15:02

use capacity

2a7edd1
