Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KVStore: Record decoded memory of each Region #9780

Open
wants to merge 34 commits into
base: master
Choose a base branch
from

Conversation

CalvinNeo
Copy link
Member

@CalvinNeo CalvinNeo commented Jan 10, 2025

What problem does this PR solve?

Issue Number: ref #9722

Problem Summary:

This is part 1 of the change to record table-wise memory usage in KVStore.

Record payload and decoded memory in RegionData:

  • payload memory is the data of the 3 cfs that flows from proxy into TiFlash.
  • decoded memory is mainly the decoded lock value cached to reduce the cost of parsing lock when reading.

We now record the decoded memory of large txns because it could consume significant amount of memory if there are lots of locks in the lock cf.

We also simplify the codes in RegionData.

What is changed and how it works?


Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Signed-off-by: CalvinNeo <[email protected]>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 10, 2025
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 13, 2025
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 14, 2025
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
@CalvinNeo CalvinNeo changed the title KVStore: Record table-wise memory KVStore: Record table-wise memory part 1 Jan 17, 2025
@CalvinNeo CalvinNeo force-pushed the record-table-memory-usage branch from c42c6b1 to 92abc36 Compare January 17, 2025 10:07
@CalvinNeo CalvinNeo changed the title KVStore: Record table-wise memory part 1 KVStore: Record decoded memory of each Region Jan 20, 2025
@CalvinNeo
Copy link
Member Author

/retest

Copy link
Contributor

ti-chi-bot bot commented Jan 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 24, 2025
Copy link
Contributor

ti-chi-bot bot commented Jan 24, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-01-24 06:11:11.155925855 +0000 UTC m=+420398.486845259: ☑️ agreed by Lloyd-Pottiger.

@ti-chi-bot ti-chi-bot bot added the approved label Jan 24, 2025
Copy link
Contributor

@JaySon-Huang JaySon-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is kind of confusing because RegionData::reportAlloc/RegionData::reportDealloc and RegionData::recordMemChange implement similar functionalities. It is difficult to determine when to call which function.

dbms/src/Storages/KVStore/MultiRaft/RegionData.cpp Outdated Show resolved Hide resolved
Comment on lines 53 to 65
void RegionData::recordMemChange(const RegionDataMemDiff & delta)
{
cf_data_size += delta.payload;
decoded_data_size += delta.decoded;
if (delta.payload > 0)
{
root_of_kvstore_mem_trackers->alloc(delta.payload, false);
}
else if (delta.payload < 0)
{
root_of_kvstore_mem_trackers->free(-delta.payload);
}
}
Copy link
Contributor

@JaySon-Huang JaySon-Huang Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decoded_data_size also reflect the occupied memory, why not add it to root_of_kvstore_mem_trackers? Or do you want to do it in next PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is actually a dilemma here.

Since if I add it to root_of_kvstore_mem_trackers, it breaks the behavior. So when we want to check the metric on grafana board, we have to firstly make sure what this metric actually represents according to the TiFlash version. We could do this, but it takes some modifications and I want to leave them in some later PR.

However, I think we can savage out previous misunderstanding in a more elegant way, at least I think it so. The idea is that it is hard to get how the memory usage of payload, because it is a nested structure. However, it is quite easy to get the memory usage of the decoded lock, which has fixed size. So in

GET_METRIC(tiflash_raft_classes_count, type_fully_decoded_lockcf).Increment(1);
, I have recorded the count of DecodedLockCFValue::Inner, by which we could easily compute the cached size.

Thus, we can introduce another series in the "KVStore memory" panel. And it is quite easy to distinguish it from the payload series.

@JaySon-Huang
Copy link
Contributor

... and some functions in RegionCFDataBase is not necessary. We can merge them and reduce the code complexity. Checkout these changes pls https://github.com/CalvinNeo/tiflash/pull/13/files

* Remove unnecessary functions

* Update dbms/src/Storages/KVStore/MultiRaft/RegionCFDataBase.cpp

* Update dbms/src/Storages/KVStore/MultiRaft/RegionData.h

---------

Co-authored-by: Calvin Neo <[email protected]>
@CalvinNeo
Copy link
Member Author

... and some functions in RegionCFDataBase is not necessary. We can merge them and reduce the code complexity. Checkout these changes pls https://github.com/CalvinNeo/tiflash/pull/13/files

Yes, you are right, the PR makes the code even clearer.

Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Copy link
Contributor

ti-chi-bot bot commented Jan 24, 2025

@CalvinNeo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-unit-test 1580de6 link true /test pull-unit-test
pull-integration-test 1580de6 link true /test pull-integration-test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +126 to +128
cf_data_size += delta.payload;
decoded_data_size += delta.decoded;
recordMemChange(delta);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cf_data_size += delta.payload;
decoded_data_size += delta.decoded;
recordMemChange(delta);
recordMemChange(delta);

cf_data_size and decoded_data_size has been changed in recordMemChange(delta)

write_cf = std::move(new_region_data.write_cf);
lock_cf = std::move(new_region_data.lock_cf);
orphan_keys_info = std::move(new_region_data.orphan_keys_info);
recordMemChange(cf_data_size.negative());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
recordMemChange(cf_data_size.negative());
recordMemChange(RegionDataMemDiff{-cf_data_size, -decoded_data_size});

Comment on lines +309 to +311
region_data.cf_data_size = size_changed.payload;
region_data.decoded_data_size = size_changed.decoded;
recordMemChange(size_changed.payload);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
region_data.cf_data_size = size_changed.payload;
region_data.decoded_data_size = size_changed.decoded;
recordMemChange(size_changed.payload);
region_data.recordMemChange(size_changed);

region_data.cf_data_size and region_data.decoded_data_size will be changed in region_data.recordMemChange

Comment on lines +355 to +357
recordMemChange(cf_data_size.negative());
cf_data_size = 0;
decoded_data_size = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
recordMemChange(cf_data_size.negative());
cf_data_size = 0;
decoded_data_size = 0;
recordMemChange(RegionDataMemDiff{-cf_data_size, -decoded_data_size});

Comment on lines +370 to +395
for (auto write_map_it = write_map.begin(); write_map_it != write_map.end();)
{
const auto & decoded_val = std::get<2>(write_map_it->second);
const auto & [pk, ts] = write_map_it->first;

if (decoded_val.write_type == RecordKVFormat::CFModifyFlag::PutFlag)
{
if (!decoded_val.short_value)
{
if (auto data_it = default_map.find({pk, decoded_val.prewrite_ts}); data_it == default_map.end())
{
// if key-val in write cf can not find matched data in default cf and its commit-ts < gc-safe-point, we can clean it safely.
if (ts < safe_point)
{
del_write += 1;
cf_data_size -= RegionWriteCFData::calcTotalKVSize(write_map_it->second).payload;
write_map_it = write_map.erase(write_map_it);
continue;
}
}
}
}
++write_map_it;
}
// No need to check default cf. Because tikv will gc default cf before write cf.
return del_write;
Copy link
Contributor

@JaySon-Huang JaySon-Huang Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that the free cf_data_size in this function does not change root_of_kvstore_mem_trackers. We should handle it as following

Suggested change
for (auto write_map_it = write_map.begin(); write_map_it != write_map.end();)
{
const auto & decoded_val = std::get<2>(write_map_it->second);
const auto & [pk, ts] = write_map_it->first;
if (decoded_val.write_type == RecordKVFormat::CFModifyFlag::PutFlag)
{
if (!decoded_val.short_value)
{
if (auto data_it = default_map.find({pk, decoded_val.prewrite_ts}); data_it == default_map.end())
{
// if key-val in write cf can not find matched data in default cf and its commit-ts < gc-safe-point, we can clean it safely.
if (ts < safe_point)
{
del_write += 1;
cf_data_size -= RegionWriteCFData::calcTotalKVSize(write_map_it->second).payload;
write_map_it = write_map.erase(write_map_it);
continue;
}
}
}
}
++write_map_it;
}
// No need to check default cf. Because tikv will gc default cf before write cf.
return del_write;
RegionDataMemDiff delta;
for (auto write_map_it = write_map.begin(); write_map_it != write_map.end();)
{
const auto & decoded_val = std::get<2>(write_map_it->second);
const auto & [pk, ts] = write_map_it->first;
if (decoded_val.write_type == RecordKVFormat::CFModifyFlag::PutFlag)
{
if (!decoded_val.short_value)
{
if (auto data_it = default_map.find({pk, decoded_val.prewrite_ts}); data_it == default_map.end())
{
// if key-val in write cf can not find matched data in default cf and its commit-ts < gc-safe-point, we can clean it safely.
if (ts < safe_point)
{
del_write += 1;
delta.sub(RegionWriteCFData::calcTotalKVSize(write_map_it->second));
write_map_it = write_map.erase(write_map_it);
continue;
}
}
}
}
++write_map_it;
}
// No need to check default cf. Because tikv will gc default cf before write cf.
recordMemChange(delta);
return del_write;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants