CEP-37 on Trunk #3598
base: trunk
@masokol, @emolsson, @itskarlsson, @tommystendahl, @etedpet, @jwaeab, @VictorCavichioli, @SajidRiaz138, @ch1bbe, @ArcturusMengsk, @DanielwEriksson, @manmagic3: as contributors to https://github.com/Ericsson/ecchronos, we would very much appreciate any last-minute, pre-CEP-vote review of this PR for Cassandra's new repair solution. (More technical/code review will continue after the CEP vote.)
Hi, I think this looks promising! I have a few points:
Thanks, @masokol, for the review! Please find my responses below:
This is a great suggestion, and the framework can be easily enhanced; I just filed a new sub-ticket, CASSANDRA-20013, to track this.
There is already a new nodetool command that prints the current status. Please take a look at it here.
Currently, tables are repaired in random order, but this could be enhanced by adding a priority as a CQL table property that end users can configure. I just added this enhancement to the ticket mentioned above.
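The priority enhancement proposed above could, for example, order tables by a per-table priority and keep the current random order among tables of equal priority. This is only a sketch under assumptions: the integer priority property and the `TablePriorityOrder` class are hypothetical and are not part of this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch: orders tables for auto-repair by an assumed,
 * user-configured priority (higher first). Tables with equal priority
 * stay in random order, matching today's behavior.
 */
public class TablePriorityOrder
{
    /** Hypothetical pairing of a table name with its assumed priority property. */
    record Table(String name, int priority) {}

    static List<Table> order(List<Table> tables, Random random)
    {
        List<Table> result = new ArrayList<>(tables);
        // Shuffle first so that ties are broken randomly...
        Collections.shuffle(result, random);
        // ...then sort by descending priority. List.sort is stable, so the
        // shuffled order is preserved among tables with equal priority.
        result.sort(Comparator.comparingInt(Table::priority).reversed());
        return result;
    }

    public static void main(String[] args)
    {
        List<Table> tables = List.of(new Table("ks.events", 0),
                                     new Table("ks.users", 10),
                                     new Table("ks.audit", 0));
        // ks.users is always first; the two priority-0 tables follow in random order.
        System.out.println(order(tables, new Random(42)));
    }
}
```

Shuffling before a stable sort is a simple way to keep random tie-breaking without a custom comparator.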
…s for auto-repair
…yspace is provided in args
Summary of impacting changes:
* RepairTokenRangeSplitter is now the default, instead of FixedSplitTokenRangeSplitter.
* number_of_subranges moved from a repair_type_override/global config to a property of FixedSplitTokenRangeSplitter.
* [get|set]autorepairconfig usability changes.

Detailed breakdown of changes:

0. IAutoRepairTokenRangeSplitter changes:
* Move RepairType from a parameter in getRepairAssignments to a parameter in the constructor of implementations. This was done because the RepairType is always the same for a splitter instance.
* Add setParameter and getParameter methods, which setautorepairconfig and getautorepairconfig use to update splitter configuration dynamically.

1. [get|set]autorepairconfig changes:
* getautorepairconfig output now shows property names instead of human-readable names (e.g. repair_check_interval instead of "repair eligibility check interval"). This makes it more intuitive to know which properties to pass to setautorepairconfig.
* getautorepairconfig and setautorepairconfig now support viewing and changing splitter properties, e.g.:
  setautorepairconfig token_range_splitter.max_bytes_per_schedule 500GiB -t full

2. RepairTokenRangeSplitter changes:
* Renames RepairRangeSplitter to RepairTokenRangeSplitter and makes it the default implementation.
* Establishes sensible defaults for each repair type.
* Improves the javadocs detailing the primary goal of the splitter, its configuration, its defaults, and the justifications for them.
* Renames variables to be consistent with their setting names.

3. FixedSplitTokenRangeSplitter changes:
* Renames DefaultAutoRepairTokenSplitter to FixedSplitTokenRangeSplitter, as it is no longer the default.
* Moves number_of_subranges from a global config to a property of this splitter.

4. RepairAssignmentIterator:
* Refactored code common to both splitter implementations into RepairAssignmentIterator, to reduce the amount of boilerplate that custom splitter implementations need.

5. Test changes:
* Fix AutoRepairParameterizedTest to use the fixed splitter so that it gets a deterministic repair plan.
* Allow the splitter to be changed programmatically; this is only expected to be used in tests.
* Rename CassandraSreamReceiverTest and fix it: whether streaming cdc/mvs into commitlogs was enabled previously depended on system properties; update the test to account for the new yaml properties.
* Fix dtest after CASSANDRA-20160: the introduction of repair_task_min_duration causes a node's repairs to take cumulatively longer than 2 minutes. To resolve this, set it to 0s, enable repair_by_keyspace, and set subranges to 1 to reduce the overall number of repairs.

Patch by Andy Tolbert; reviewed by ___ for CASSANDRA-20179
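A minimal sketch of the constructor and parameter changes in items 0 and 1 above, using simplified stand-in types: `TokenRangeSplitter`, `applyConfig`, and the default value are illustrative assumptions, not the actual `IAutoRepairTokenRangeSplitter` signatures. It shows the RepairType fixed at construction, plus a `setParameter`/`getParameter` pair to which a key such as `token_range_splitter.max_bytes_per_schedule` could be routed after stripping its prefix.

```java
import java.util.Map;
import java.util.TreeMap;

public class SplitterParameterSketch
{
    // Simplified stand-in for the repair types named in this PR.
    enum RepairType { FULL, INCREMENTAL }

    /** Hypothetical, simplified splitter: the RepairType is fixed per instance. */
    static class TokenRangeSplitter
    {
        private final RepairType repairType;
        private final Map<String, String> parameters = new TreeMap<>();

        TokenRangeSplitter(RepairType repairType, Map<String, String> defaults)
        {
            this.repairType = repairType;
            this.parameters.putAll(defaults);
        }

        RepairType repairType() { return repairType; }

        /** Updates a known parameter; rejects names this splitter does not define. */
        void setParameter(String key, String value)
        {
            if (!parameters.containsKey(key))
                throw new IllegalArgumentException("Unknown splitter parameter: " + key);
            parameters.put(key, value);
        }

        String getParameter(String key) { return parameters.get(key); }
    }

    private static final String PREFIX = "token_range_splitter.";

    /** Hypothetical router: sends a "token_range_splitter.<name>" key to the splitter. */
    static void applyConfig(TokenRangeSplitter splitter, String key, String value)
    {
        if (!key.startsWith(PREFIX))
            throw new IllegalArgumentException("Not a splitter property: " + key);
        splitter.setParameter(key.substring(PREFIX.length()), value);
    }

    public static void main(String[] args)
    {
        // "100GiB" is a placeholder default for illustration, not the real default.
        TokenRangeSplitter full = new TokenRangeSplitter(RepairType.FULL,
                                                         Map.of("max_bytes_per_schedule", "100GiB"));
        applyConfig(full, "token_range_splitter.max_bytes_per_schedule", "500GiB");
        System.out.println(full.repairType() + " " + full.getParameter("max_bytes_per_schedule"));
    }
}
```

Because each instance serves exactly one repair type, `-t full` in the command above would select which splitter instance receives the update.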
* Move curly brackets to new line
* Remove unused config declaration in constructor
Also replace the redundant AutoRepairConfig.RepairType qualifier with RepairType inside AutoRepairConfig.
* Promote the "node not present in gossip" log message to WARN
* Clean up NUMBER_OF_SUBRANGES doc
* Simplify default map parameter parsing in both splitters
* Doc cleanup and make RepairAssignmentIterator fully public
* Remove unnecessary getters and setters in splitters
* Handle the case where the ColumnFamilyStore is not retrievable: return no assignments, since the table can be assumed deleted
* Always return a RepairAssignment for a table in RepairTokenRangeSplitter, even if empty, as the node may have missed writes
* Add set|getParameters tests
* Update AutoRepairParameterizedTest to use RepairTokenRangeSplitter
* Move no-split specific test to FixedSplitTokenRangeSplitterTest
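The two table-handling rules in the list above can be sketched as follows. This assumes a hypothetical `lookupTable` function in place of the real ColumnFamilyStore lookup and a simplified `RepairAssignment` record: a table that cannot be resolved yields no assignments (it is presumed dropped), while a resolvable table always yields an assignment, even with a zero size estimate.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class AssignmentForTableSketch
{
    /** Simplified stand-in for a repair assignment covering one table. */
    record RepairAssignment(String keyspace, String table, long estimatedBytes) {}

    /** Simplified stand-in for the per-table state a splitter would read. */
    record TableInfo(long estimatedBytes) {}

    static List<RepairAssignment> assignmentsFor(String keyspace, String table,
                                                 Function<String, Optional<TableInfo>> lookupTable)
    {
        Optional<TableInfo> info = lookupTable.apply(keyspace + '.' + table);
        // Table not retrievable: assume it was dropped and plan nothing.
        if (info.isEmpty())
            return List.of();
        // Always plan an assignment, even when the size estimate is 0, since
        // this replica may have missed writes that other replicas hold.
        return List.of(new RepairAssignment(keyspace, table, info.get().estimatedBytes()));
    }

    public static void main(String[] args)
    {
        Function<String, Optional<TableInfo>> lookup =
            name -> name.equals("ks.live") ? Optional.of(new TableInfo(0)) : Optional.empty();
        System.out.println(assignmentsFor("ks", "live", lookup));    // one assignment
        System.out.println(assignmentsFor("ks", "dropped", lookup)); // []
    }
}
```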
Updates partitions_per_assignment so that it is no longer based on repair_session_max_tree_depth, which is deprecated; instead, it simply uses 2^20. Also updates the documentation around partitions_per_assignment and cleans up some warnings in RepairTokenRangeSplitter.
patch by Andy Tolbert; reviewed by Jaydeep Chovatia for CASSANDRA-20231
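For scale, 2^20 = 1,048,576 partitions. The sketch below is a simplified model of the partition cap alone. It assumes, for illustration only, that a token range is divided into equal subranges so that none exceeds partitions_per_assignment estimated partitions. The real RepairTokenRangeSplitter also weighs byte sizes, and the class and method names here are hypothetical.

```java
public class PartitionsPerAssignmentSketch
{
    // 2^20 = 1,048,576, the fixed value described in the commit above.
    static final long PARTITIONS_PER_ASSIGNMENT = 1L << 20;

    /** Equal subranges needed so each holds at most the cap (ceiling division). */
    static long subrangesNeeded(long estimatedPartitions, long partitionsPerAssignment)
    {
        // Assumption: plan one subrange even when the estimate is empty.
        if (estimatedPartitions <= 0)
            return 1;
        return (estimatedPartitions + partitionsPerAssignment - 1) / partitionsPerAssignment;
    }

    public static void main(String[] args)
    {
        System.out.println(PARTITIONS_PER_ASSIGNMENT);                              // 1048576
        System.out.println(subrangesNeeded(3_000_000L, PARTITIONS_PER_ASSIGNMENT)); // 3
    }
}
```

With 3,000,000 estimated partitions, 3,000,000 / 1,048,576 ≈ 2.86, which rounds up to 3 subranges.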
Creating repair sessions by table can create an overwhelming number of repairs, especially with vnodes. If a repair assignment is too big (> 64 tables by default, or > 200GB for full/50GB for incremental), RepairTokenRangeSplitter already splits it into multiple repair assignments. Adjusts repair_by_keyspace to default to true.
patch by Andy Tolbert; reviewed by Jaydeep Chovatia for CASSANDRA-20232
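With repair_by_keyspace enabled, the splitting rule described above can be pictured as greedily packing a keyspace's tables into assignments that stay within a table-count cap (64 by default) and a byte cap (200GB for full, 50GB for incremental). This is a simplified sketch under those assumptions, not RepairTokenRangeSplitter's actual algorithm. The class and method names are hypothetical, and GB is treated as GiB here.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyspaceAssignmentSketch
{
    static final int MAX_TABLES_PER_ASSIGNMENT = 64;
    static final long GIB = 1L << 30;
    static final long MAX_BYTES_FULL = 200 * GIB;
    static final long MAX_BYTES_INCREMENTAL = 50 * GIB;

    record Table(String name, long bytes) {}

    /**
     * Groups tables in order, starting a new assignment when adding the next
     * table would exceed either cap. A single table larger than maxBytes gets
     * its own assignment here; the real splitter may split its ranges instead.
     */
    static List<List<Table>> group(List<Table> tables, long maxBytes)
    {
        List<List<Table>> assignments = new ArrayList<>();
        List<Table> current = new ArrayList<>();
        long currentBytes = 0;
        for (Table t : tables)
        {
            boolean full = current.size() >= MAX_TABLES_PER_ASSIGNMENT
                           || (!current.isEmpty() && currentBytes + t.bytes() > maxBytes);
            if (full)
            {
                assignments.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(t);
            currentBytes += t.bytes();
        }
        if (!current.isEmpty())
            assignments.add(current);
        return assignments;
    }

    public static void main(String[] args)
    {
        List<Table> tables = new ArrayList<>();
        for (int i = 0; i < 120; i++)
            tables.add(new Table("t" + i, GIB)); // 120 tables of 1 GiB each
        // Full: 120 GiB fits the byte cap, so only the 64-table cap applies (64 + 56).
        System.out.println(group(tables, MAX_BYTES_FULL).size());        // 2
        // Incremental: the 50 GiB cap splits first (50 + 50 + 20).
        System.out.println(group(tables, MAX_BYTES_INCREMENTAL).size()); // 3
    }
}
```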
Patch by Francisco Guerrero; reviewed by TBD for CASSANDRA-20185
…ation in the AutoRepairServiceMBean definition
CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution
Design doc:
https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit?tab=t.0#heading=h.r112r46toau0
This PR adds two new table properties; hence, the dtests must also be updated alongside this PR: apache/cassandra-dtest#270