CEP-37 on Trunk #3598

Open: wants to merge 132 commits into base: trunk

Conversation

jaydeepkumar1984 (Contributor) commented Oct 3, 2024

@jaydeepkumar1984 force-pushed the trunk_cep_37 branch 2 times, most recently from ee6b375 to 2f42d8c on October 4, 2024
@jaydeepkumar1984 changed the title from [Draft] CEP-37 on Trunk to CEP-37 on Trunk on Oct 8, 2024
michaelsembwever (Member) commented Oct 19, 2024

@masokol , @emolsson , @itskarlsson , @tommystendahl , @etedpet , @jwaeab , @VictorCavichioli , @SajidRiaz138 , @ch1bbe , @ArcturusMengsk , @DanielwEriksson , @manmagic3

as contributors to https://github.com/Ericsson/ecchronos, we would very much appreciate any last-minute pre-CEP-vote review of this PR for Cassandra's new repair solution. (More technical/code review will continue post-CEP vote.)

CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution

Design doc:
https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit?tab=t.0#heading=h.r112r46toau0

masokol (Contributor) commented Oct 21, 2024

Hi,

I think this looks promising! I have a few points:

  • Combine ranges instead of splitting: in ecChronos we saw huge improvements in some scenarios (low-data or empty tables) compared to repairing one vnode at a time. The improvement scaled with the number of vnodes, although it might have been related to the overhead of running repairs through JMX. Not sure, but it might be worth investigating.
  • Major versions: during major version upgrades like 3 -> 4 we weren't supposed to run repairs. If Cassandra plans to keep this restriction, it would be nice for repairs to automatically pause during major version upgrades.
  • Observability: I saw there are metrics, but it would also be nice to see repair status with nodetool.
  • Repair priority per table, not per node.

jaydeepkumar1984 (Contributor, Author) commented Oct 21, 2024

Thanks, @masokol, for the review! Please find my response here:

  • Combine ranges instead of splitting: in ecChronos we saw huge improvements in some scenarios (low-data or empty tables) compared to repairing one vnode at a time. The improvement scaled with the number of vnodes, although it might have been related to the overhead of running repairs through JMX.
  1. Generally, empty tables or tables with a small amount of data run through quickly, in seconds, so this is not a major issue for smaller/empty tables.
  2. The current framework already supports combining ranges through a setting (src/java/org/apache/cassandra/repair/autorepair/AutoRepair.java:266).
  • Major versions: during major version upgrades like 3 -> 4 we weren't supposed to run repairs. If Cassandra plans to keep this restriction, it would be nice for repairs to automatically pause during major version upgrades.

This is a great suggestion, and the framework can be enhanced easily. I just filed a new sub-ticket, CASSANDRA-20013, to track this.

  • Observability: I saw there are metrics, but it would also be nice to see repair status with nodetool.

There is already a new nodetool command that prints the current status. Please take a look at it here.

  • Repair priority per table, not per node.

Currently, it repairs the tables in random order, but it can be enhanced to add a priority as a CQL table property that an end user can configure; that, too, can be done easily. I have added this enhancement to the ticket mentioned above.
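The per-table priority idea above could be sketched roughly as follows. This is a hypothetical illustration, not code from the patch: the class name, the priority map, and the "lower value repairs earlier, unprioritized tables keep a random order" convention are all assumptions.

```java
import java.util.*;

// Hypothetical sketch of per-table repair priority, assuming a priority
// value could be read from a CQL table property (the enhancement tracked
// alongside CASSANDRA-20013). Not part of the current patch.
public class RepairPrioritySketch {
    // tables: all candidate tables; priorities: explicit priorities for some
    // of them (lower value = repaired earlier). Tables without an entry keep
    // an effectively random order at the back, matching current behaviour.
    public static List<String> orderByPriority(List<String> tables, Map<String, Integer> priorities) {
        List<String> ordered = new ArrayList<>(tables);
        Collections.shuffle(ordered); // current behaviour: random order
        // Stable sort: prioritized tables move to the front, ties and
        // unprioritized tables keep the shuffled order.
        ordered.sort(Comparator.comparingInt(t -> priorities.getOrDefault(t, Integer.MAX_VALUE)));
        return ordered;
    }
}
```

A table tagged with priority 1 would then always be scheduled ahead of untagged tables, regardless of the shuffle.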

@jaydeepkumar1984 force-pushed the trunk_cep_37 branch 2 times, most recently from 5730aa2 to 126684a on November 6, 2024
@jaydeepkumar1984 force-pushed the trunk_cep_37 branch 2 times, most recently from ec99162 to f9f6971 on November 21, 2024
jaydeep1984 and others added 29 commits January 28, 2025 17:29
Summary of impacting changes:

* RepairTokenRangeSplitter is now the default in favor of
  FixedSplitTokenRangeSplitter.

* number_of_subranges moved from a repair_type_override/global config
  to a property of FixedSplitTokenRangeSplitter.

* [get|set]autorepairconfig usability changes

Detailed breakdown of changes:

0. IAutoRepairTokenRangeSplitter changes:

  * Move RepairType from a parameter in getRepairAssignments to
    a parameter in the constructor for implementations.  This was done
    because the RepairType is always the same for a given splitter instance.

  * Add setParameter and getParameter methods which are used by
    setautorepairconfig and getautorepairconfig to dynamically update
    splitter configuration.

1. [get|set]autorepairconfig changes:

  * getautorepairconfig output now shows property names instead of
    human-readable names (e.g. repair_check_interval instead of
    "repair eligibility check interval"). This was done to make it more
    intuitive to know which properties to use with setautorepairconfig.

  * getautorepairconfig and setautorepairconfig now support viewing
    and changing splitter properties, e.g.:

    setautorepairconfig token_range_splitter.max_bytes_per_schedule 500GiB -t full

2. RepairTokenRangeSplitter changes:

  * Renames RepairRangeSplitter to RepairTokenRangeSplitter and makes it
    the default implementation.

  * Establishes sensible defaults for each repair type.

  * Improve javadocs detailing the primary goal of the splitter, its
    configuration and defaults, and the justifications for them.

  * Rename variables to be consistent with their setting names.

3. FixedSplitTokenRangeSplitter changes:

  * Renames DefaultAutoRepairTokenSplitter to
    FixedSplitTokenRangeSplitter as it is no longer the default.

  * Move number_of_subranges from a global config to a property for
    this splitter.

4. RepairAssignmentIterator

  * Refactored common code from both splitter implementations into
    RepairAssignmentIterator, with the aim of reducing the boilerplate
    that custom splitter implementations need to write.

5. Test changes

  * Fix AutoRepairParameterizedTest to use fixed splitter so we get a
    deterministic repair plan.

  * Allow the splitter to be changed programmatically; this is only
    expected to be used in tests.

  * Rename CassandraSreamReceiverTest and fix it

    Whether cdc/mv writes were streamed into commitlogs previously
    depended on system properties; update the test to account for the
    new yaml properties.

  * Fix dtest after CASSANDRA-20160

    The introduction of repair_task_min_duration causes repairs to take
    cumulatively longer for a node than 2 minutes.  To resolve this,
    set that to 0s, and also enable repair_by_keyspace and set subranges
    to 1 to reduce the overall number of repairs.

Patch by Andy Tolbert; reviewed by ___ for CASSANDRA-20179
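The setParameter/getParameter mechanism described in item 0 above can be sketched minimally as follows. This is an illustration of the idea only: the class name, the backing map, the parameter name, and the default value are assumptions, not the patch's actual implementation.

```java
import java.util.*;

// Minimal sketch of a splitter exposing its tunables through
// setParameter/getParameter so that the setautorepairconfig and
// getautorepairconfig nodetool commands can read and update them at
// runtime. Names and defaults here are illustrative.
public class SplitterParamsSketch {
    private final Map<String, String> params = new HashMap<>();

    public SplitterParamsSketch() {
        // illustrative default, not the patch's real value
        params.put("max_bytes_per_schedule", "200GiB");
    }

    // Reject unknown keys so a typo in setautorepairconfig fails loudly
    public void setParameter(String key, String value) {
        if (!params.containsKey(key))
            throw new IllegalArgumentException("Unknown parameter: " + key);
        params.put(key, value);
    }

    public String getParameter(String key) {
        return params.get(key);
    }
}
```

Under this shape, the example from the commit message, `setautorepairconfig token_range_splitter.max_bytes_per_schedule 500GiB -t full`, would route the key/value pair to the splitter's setParameter.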
* Move curly brackets to new line
* Remove unused config declaration in constructor
Also remove redundant AutoRepairConfig.RepairType for RepairType
in AutoRepairConfig.
* Promote node not being present in gossip to warn
* Clean up NUMBER_OF_SUBRANGES doc
* Simplify default map parameter parsing in both splitters
* Doc cleanup and make RepairAssignmentIterator fully public
* Remove unnecessary getters and setters in Splitters
* Handle the case where the ColumnFamilyStore is not retrievable; in this
  case return no assignments, as we can assume the table was deleted.
* Always return a RepairAssignment for a table, even if empty, in
  RepairTokenRangeSplitter as node may have missed writes
* Add set|getParameters tests
* Update AutoRepairParameterizedTest to use RepairTokenRangeSplitter
* Move no-split specific test to FixedSplitTokenRangeSplitterTest
Updates partitions_per_assignment so it is no longer based on
repair_session_max_tree_depth, which is deprecated.  Instead, just
use 2^20.

Also updates documentation around partitions_per_assignment and cleans
up some warnings in RepairTokenRangeSplitter.

patch by Andy Tolbert; reviewed by Jaydeep Chovatia for CASSANDRA-20231
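The arithmetic behind the fixed budget above is straightforward: 2^20 is 1,048,576 partitions per assignment, and the number of splits for a range is the ceiling of estimated partitions over that budget. The split-count formula below is an assumed illustration of how such a budget would be applied, not the patch's code.

```java
// Illustrative use of the fixed 2^20 partition budget described above.
// The splitsFor formula is an assumption for illustration only.
public class PartitionBudgetSketch {
    public static final long PARTITIONS_PER_ASSIGNMENT = 1L << 20; // 2^20 = 1,048,576

    // ceil(estimatedPartitions / budget), with a minimum of one split
    public static long splitsFor(long estimatedPartitions) {
        return Math.max(1, (estimatedPartitions + PARTITIONS_PER_ASSIGNMENT - 1)
                           / PARTITIONS_PER_ASSIGNMENT);
    }
}
```

So a range estimated at exactly 2^20 partitions needs one assignment, and one partition more pushes it to two.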
Creating repair sessions per table can create an overwhelming number of
repairs, especially with vnodes.

If a repair assignment is too big (more than 64 tables by default, or more
than 200GB for full / 50GB for incremental repair), RepairTokenRangeSplitter
will already split it into multiple repair assignments.

Adjusts repair_by_keyspace to default to true.

patch by Andy Tolbert; reviewed by Jaydeep Chovatia for CASSANDRA-20232
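The table-count side of the threshold behaviour above (split a keyspace-level assignment once it exceeds 64 tables) can be sketched as simple batching. This is a simplified illustration; the real splitter also weighs assignments by bytes (200GB full / 50GB incremental), which is omitted here.

```java
import java.util.*;

// Sketch of splitting a keyspace's tables into repair assignments once
// the per-assignment table limit is exceeded. Byte-size limits from the
// commit message are intentionally left out of this illustration.
public class KeyspaceBatchSketch {
    public static List<List<String>> batchTables(List<String> tables, int maxTablesPerAssignment) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < tables.size(); i += maxTablesPerAssignment)
            batches.add(tables.subList(i, Math.min(i + maxTablesPerAssignment, tables.size())));
        return batches;
    }
}
```

With repair_by_keyspace defaulting to true, a 130-table keyspace and the default limit of 64 would yield three assignments (64 + 64 + 2) instead of 130 per-table repairs.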
Patch by Francisco Guerrero; reviewed by TBD for CASSANDRA-20185
…ation in the AutoRepairServiceMBean definition
8 participants