KVStore: Record decoded memory of each Region #9780
Signed-off-by: CalvinNeo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
c42c6b1
to
92abc36
Compare
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
… into record-table-memory-usage
Co-authored-by: JaySon <[email protected]>
Co-authored-by: JaySon <[email protected]>
Co-authored-by: JaySon <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
… into record-table-memory-usage
Signed-off-by: Calvin Neo <[email protected]>
/retest |
Signed-off-by: Calvin Neo <[email protected]>
… into record-table-memory-usage
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Lloyd-Pottiger The full list of commands accepted by this bot can be found here. The pull request process is described here
It is kind of confusing because RegionData::reportAlloc/RegionData::reportDealloc and RegionData::recordMemChange implement similar functionality. It is difficult to determine when to call which function.
void RegionData::recordMemChange(const RegionDataMemDiff & delta)
{
    cf_data_size += delta.payload;
    decoded_data_size += delta.decoded;
    if (delta.payload > 0)
    {
        root_of_kvstore_mem_trackers->alloc(delta.payload, false);
    }
    else if (delta.payload < 0)
    {
        root_of_kvstore_mem_trackers->free(-delta.payload);
    }
}
decoded_data_size also reflects occupied memory, so why not add it to root_of_kvstore_mem_trackers? Or do you want to do it in the next PR?
Yes, it is actually a dilemma.
If I add it to root_of_kvstore_mem_trackers, it changes the existing behavior: when checking the metric on the Grafana board, we would first have to work out what the metric represents for the given TiFlash version. We could do this, but it takes some modifications, and I want to leave them for a later PR.
However, I think we can salvage the previous misunderstanding in a more elegant way. The memory usage of the payload is hard to compute because it is a nested structure, but the memory usage of the decoded lock is easy to get because it has a fixed size. The metric updated in
    GET_METRIC(tiflash_raft_classes_count, type_fully_decoded_lockcf).Increment(1);
counts DecodedLockCFValue::Inner instances, from which we can easily compute the cached size.
Thus, we can introduce another series in the "KVStore memory" panel, and it is easy to distinguish from the payload series.
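The estimate described in this reply can be sketched roughly as follows. This is a minimal illustration, not TiFlash code: `DecodedLockInner` is a hypothetical stand-in for DecodedLockCFValue::Inner with made-up fields, and `estimateDecodedLockBytes` is an assumed helper name. It only shows that a fixed-size decoded struct lets the cached size be derived from an instance count.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for DecodedLockCFValue::Inner; the real fields differ.
struct DecodedLockInner
{
    std::uint64_t lock_ts;
    std::uint64_t lock_ttl;
    std::uint64_t min_commit_ts;
    std::uint8_t lock_type;
};

// Because the decoded struct has a fixed size, the memory held by N decoded
// locks is N * sizeof(struct). Heap data owned by the raw payload is not
// included; that part is what makes the payload hard to measure.
constexpr std::size_t estimateDecodedLockBytes(std::size_t fully_decoded_count)
{
    return fully_decoded_count * sizeof(DecodedLockInner);
}
```

With a counter such as the one behind type_fully_decoded_lockcf, the new Grafana series would then be the counter value multiplied by the struct size.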
... and some functions in |
* Remove unnecessary functions * Update dbms/src/Storages/KVStore/MultiRaft/RegionCFDataBase.cpp * Update dbms/src/Storages/KVStore/MultiRaft/RegionData.h --------- Co-authored-by: Calvin Neo <[email protected]>
Yes, you are right, the PR makes the code even clearer.
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
    cf_data_size += delta.payload;
    decoded_data_size += delta.decoded;
    recordMemChange(delta);
-    cf_data_size += delta.payload;
-    decoded_data_size += delta.decoded;
-    recordMemChange(delta);
+    recordMemChange(delta);
cf_data_size and decoded_data_size are already updated inside recordMemChange(delta).
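To make the double counting concrete, here is a small self-contained sketch. All names (`MemDiff`, `MockTracker`, `MockRegionData`, `insertDoubleCounted`, `insert`) are hypothetical stand-ins modelled on the snippets in this review, not the actual TiFlash classes.

```cpp
#include <cstdint>

// Hypothetical signed diff, modelled on RegionDataMemDiff.
struct MemDiff
{
    std::int64_t payload = 0;
    std::int64_t decoded = 0;
};

// Hypothetical stand-in for root_of_kvstore_mem_trackers.
struct MockTracker
{
    std::int64_t used = 0;
    void alloc(std::int64_t n) { used += n; }
    void free(std::int64_t n) { used -= n; }
};

struct MockRegionData
{
    std::int64_t cf_data_size = 0;
    std::int64_t decoded_data_size = 0;
    MockTracker * tracker = nullptr;

    // The single place that updates both counters and the tracker.
    void recordMemChange(const MemDiff & delta)
    {
        cf_data_size += delta.payload;
        decoded_data_size += delta.decoded;
        if (delta.payload > 0)
            tracker->alloc(delta.payload);
        else if (delta.payload < 0)
            tracker->free(-delta.payload);
    }

    // Pre-suggestion shape: the counters are bumped here and again inside
    // recordMemChange, so each counter grows by twice the delta.
    void insertDoubleCounted(const MemDiff & delta)
    {
        cf_data_size += delta.payload;
        decoded_data_size += delta.decoded;
        recordMemChange(delta);
    }

    // Suggested shape: delegate everything to recordMemChange.
    void insert(const MemDiff & delta) { recordMemChange(delta); }
};
```

In this sketch the tracker stays correct in both variants; only the per-region counters drift, which is why the bug would be easy to miss when looking at the global metric alone.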
    write_cf = std::move(new_region_data.write_cf);
    lock_cf = std::move(new_region_data.lock_cf);
    orphan_keys_info = std::move(new_region_data.orphan_keys_info);
    recordMemChange(cf_data_size.negative());
-    recordMemChange(cf_data_size.negative());
+    recordMemChange(RegionDataMemDiff{-cf_data_size, -decoded_data_size});
    region_data.cf_data_size = size_changed.payload;
    region_data.decoded_data_size = size_changed.decoded;
    recordMemChange(size_changed.payload);
-    region_data.cf_data_size = size_changed.payload;
-    region_data.decoded_data_size = size_changed.decoded;
-    recordMemChange(size_changed.payload);
+    region_data.recordMemChange(size_changed);
region_data.cf_data_size and region_data.decoded_data_size are updated inside region_data.recordMemChange.
    recordMemChange(cf_data_size.negative());
    cf_data_size = 0;
    decoded_data_size = 0;
-    recordMemChange(cf_data_size.negative());
-    cf_data_size = 0;
-    decoded_data_size = 0;
+    recordMemChange(RegionDataMemDiff{-cf_data_size, -decoded_data_size});
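A rough sketch of why the one-line form suffices, assuming the counters are signed integers (the real member types may differ) and using hypothetical names throughout: a single negative diff covering both fields returns the counters and the tracked total to zero, so no explicit reset is needed afterwards.

```cpp
#include <cstdint>

// Hypothetical signed diff, modelled on RegionDataMemDiff.
struct MemDiff
{
    std::int64_t payload = 0;
    std::int64_t decoded = 0;
};

struct MockRegionData
{
    std::int64_t cf_data_size = 0;
    std::int64_t decoded_data_size = 0;
    // Stand-in for the global tracker: net payload bytes reported so far.
    std::int64_t tracked_payload = 0;

    void recordMemChange(const MemDiff & delta)
    {
        cf_data_size += delta.payload;
        decoded_data_size += delta.decoded;
        tracked_payload += delta.payload; // alloc when positive, free when negative
    }

    // Clearing is expressed as one diff that negates both counters.
    // The diff is built from the current values before they are modified.
    void clear()
    {
        recordMemChange(MemDiff{-cf_data_size, -decoded_data_size});
    }
};
```

If the real counters were unsigned, the negation would need an explicit conversion to a signed type first; that detail is outside what this sketch covers.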
for (auto write_map_it = write_map.begin(); write_map_it != write_map.end();)
{
    const auto & decoded_val = std::get<2>(write_map_it->second);
    const auto & [pk, ts] = write_map_it->first;

    if (decoded_val.write_type == RecordKVFormat::CFModifyFlag::PutFlag)
    {
        if (!decoded_val.short_value)
        {
            if (auto data_it = default_map.find({pk, decoded_val.prewrite_ts}); data_it == default_map.end())
            {
                // if key-val in write cf can not find matched data in default cf and its commit-ts < gc-safe-point, we can clean it safely.
                if (ts < safe_point)
                {
                    del_write += 1;
                    cf_data_size -= RegionWriteCFData::calcTotalKVSize(write_map_it->second).payload;
                    write_map_it = write_map.erase(write_map_it);
                    continue;
                }
            }
        }
    }
    ++write_map_it;
}
// No need to check default cf. Because tikv will gc default cf before write cf.
return del_write;
It seems that the cf_data_size freed in this function is not reflected in root_of_kvstore_mem_trackers. We should handle it as follows:
-for (auto write_map_it = write_map.begin(); write_map_it != write_map.end();)
-{
-    const auto & decoded_val = std::get<2>(write_map_it->second);
-    const auto & [pk, ts] = write_map_it->first;
-    if (decoded_val.write_type == RecordKVFormat::CFModifyFlag::PutFlag)
-    {
-        if (!decoded_val.short_value)
-        {
-            if (auto data_it = default_map.find({pk, decoded_val.prewrite_ts}); data_it == default_map.end())
-            {
-                // if key-val in write cf can not find matched data in default cf and its commit-ts < gc-safe-point, we can clean it safely.
-                if (ts < safe_point)
-                {
-                    del_write += 1;
-                    cf_data_size -= RegionWriteCFData::calcTotalKVSize(write_map_it->second).payload;
-                    write_map_it = write_map.erase(write_map_it);
-                    continue;
-                }
-            }
-        }
-    }
-    ++write_map_it;
-}
-// No need to check default cf. Because tikv will gc default cf before write cf.
-return del_write;
+RegionDataMemDiff delta;
+for (auto write_map_it = write_map.begin(); write_map_it != write_map.end();)
+{
+    const auto & decoded_val = std::get<2>(write_map_it->second);
+    const auto & [pk, ts] = write_map_it->first;
+    if (decoded_val.write_type == RecordKVFormat::CFModifyFlag::PutFlag)
+    {
+        if (!decoded_val.short_value)
+        {
+            if (auto data_it = default_map.find({pk, decoded_val.prewrite_ts}); data_it == default_map.end())
+            {
+                // if key-val in write cf can not find matched data in default cf and its commit-ts < gc-safe-point, we can clean it safely.
+                if (ts < safe_point)
+                {
+                    del_write += 1;
+                    delta.sub(RegionWriteCFData::calcTotalKVSize(write_map_it->second));
+                    write_map_it = write_map.erase(write_map_it);
+                    continue;
+                }
+            }
+        }
+    }
+    ++write_map_it;
+}
+// No need to check default cf. Because tikv will gc default cf before write cf.
+recordMemChange(delta);
+return del_write;
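The accumulate-then-report pattern in this suggestion can be tried out in isolation with a simplified model. Everything here is an assumption made for illustration: `MemDiff`, `MockWriteCF`, `insert` and `gcBelow` are hypothetical names, the map stores a precomputed size per entry instead of real key-values, and the PutFlag, short_value and default cf checks of the real loop are omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical signed diff, modelled on RegionDataMemDiff.
struct MemDiff
{
    std::int64_t payload = 0;
    std::int64_t decoded = 0;
    void sub(const MemDiff & other)
    {
        payload -= other.payload;
        decoded -= other.decoded;
    }
};

struct MockWriteCF
{
    // Key: (pk, commit ts). Value: size of that entry (stand-in for calcTotalKVSize).
    std::map<std::pair<std::int64_t, std::uint64_t>, MemDiff> write_map;
    std::int64_t cf_data_size = 0;
    std::int64_t decoded_data_size = 0;
    std::int64_t tracked_payload = 0; // stand-in for the global tracker

    void recordMemChange(const MemDiff & delta)
    {
        cf_data_size += delta.payload;
        decoded_data_size += delta.decoded;
        tracked_payload += delta.payload;
    }

    void insert(std::int64_t pk, std::uint64_t ts, const MemDiff & size)
    {
        write_map[{pk, ts}] = size;
        recordMemChange(size);
    }

    // Erase every entry whose commit ts is below safe_point, accumulate the
    // freed sizes in one diff, and report it once after the loop.
    std::size_t gcBelow(std::uint64_t safe_point)
    {
        std::size_t del_write = 0;
        MemDiff delta;
        for (auto it = write_map.begin(); it != write_map.end();)
        {
            if (it->first.second < safe_point)
            {
                ++del_write;
                delta.sub(it->second);
                it = write_map.erase(it);
                continue;
            }
            ++it;
        }
        recordMemChange(delta);
        return del_write;
    }
};
```

Reporting once after the loop keeps the counters and the tracker in step and avoids one tracker call per erased entry.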
What problem does this PR solve?
Issue Number: ref #9722
Problem Summary:
This is part 1 of the change to record table-wise memory usage in KVStore.
Record payload and decoded memory in RegionData:
We now record the decoded memory of large txns because it can consume a significant amount of memory when there are many locks in the lock cf.
We also simplify the code in RegionData.
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note