CEP-37 on Trunk #3598

Open: wants to merge 132 commits into base: trunk

Conversation

jaydeepkumar1984 (Contributor) commented Oct 3, 2024

@jaydeepkumar1984 force-pushed the trunk_cep_37 branch 2 times, most recently from ee6b375 to 2f42d8c on October 4, 2024
@jaydeepkumar1984 changed the title from [Draft] CEP-37 on Trunk to CEP-37 on Trunk on Oct 8, 2024
michaelsembwever (Member) commented Oct 19, 2024

@masokol , @emolsson , @itskarlsson , @tommystendahl , @etedpet , @jwaeab , @VictorCavichioli , @SajidRiaz138 , @ch1bbe , @ArcturusMengsk , @DanielwEriksson , @manmagic3

as contributors to https://github.com/Ericsson/ecchronos, we would very much appreciate any last-minute pre-CEP-vote review of this PR for Cassandra's new repair solution. (More technical/code review will continue post-CEP vote.)

CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution

Design doc:
https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit?tab=t.0#heading=h.r112r46toau0

masokol (Contributor) commented Oct 21, 2024

Hi,

I think this looks promising! I have a few points:

  • Combine ranges instead of splitting: in ecChronos we saw huge improvements in some scenarios (low-data or empty tables) compared to repairing one vnode at a time. The improvement scaled with the number of vnodes, although it might have been related to the overhead of running repairs through JMX. Not sure, but it might be worth investigating.
  • Major versions: during major version upgrades like 3 -> 4 we weren't supposed to run repairs. If Cassandra plans to keep this restriction, it would be nice for repairs to automatically pause during major version upgrades.
  • Observability: I saw there are metrics, but it would also be nice to see repair status with nodetool.
  • Repair priority per table, not per node.

jaydeepkumar1984 (Contributor, Author) commented Oct 21, 2024

Thanks, @masokol, for the review! Please find my response here:

  • Combine ranges instead of splitting: in ecChronos we saw huge improvements in some scenarios (low-data or empty tables) compared to repairing one vnode at a time. The improvement scaled with the number of vnodes, although it might have been related to the overhead of running repairs through JMX.
  1. Generally, empty tables or tables with a small amount of data run through quickly, in seconds, so this is not a major issue for smaller/empty tables.
  2. The current framework already supports combining ranges through a setting (src/java/org/apache/cassandra/repair/autorepair/AutoRepair.java:266).
  • Major versions: during major version upgrades like 3 -> 4 we weren't supposed to run repairs. If Cassandra plans to keep this restriction, it would be nice for repairs to automatically pause during major version upgrades.

This is a great suggestion, and the framework can be enhanced easily. I just filed a new sub-ticket, CASSANDRA-20013, to track this.

  • Observability: I saw there are metrics, but it would also be nice to see repair status with nodetool.

There is already a new nodetool command that prints the current status. Please take a look at it here.

  • Repair priority per table, not per node.

Currently, it repairs the tables in random order, but it can be enhanced to add a priority as a CQL table property that an end user can configure; that, too, can be done easily. I have added this enhancement to the ticket mentioned above.
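The per-table priority idea above could be sketched roughly as follows. This is a hypothetical illustration, not code from the patch: the class name, the priority map, and the "lower value repairs earlier, unprioritized tables keep a random order" convention are all assumptions.

```java
import java.util.*;

// Hypothetical sketch of per-table repair priority, assuming a priority
// value could be read from a CQL table property (the enhancement tracked
// alongside CASSANDRA-20013). Not part of the current patch.
public class RepairPrioritySketch {
    // tables: all candidate tables; priorities: explicit priorities for some
    // of them (lower value = repaired earlier). Tables without an entry keep
    // an effectively random order at the back, matching current behaviour.
    public static List<String> orderByPriority(List<String> tables, Map<String, Integer> priorities) {
        List<String> ordered = new ArrayList<>(tables);
        Collections.shuffle(ordered); // current behaviour: random order
        // Stable sort: prioritized tables move to the front, ties and
        // unprioritized tables keep the shuffled order.
        ordered.sort(Comparator.comparingInt(t -> priorities.getOrDefault(t, Integer.MAX_VALUE)));
        return ordered;
    }
}
```

A table tagged with priority 1 would then always be scheduled ahead of untagged tables, regardless of the shuffle.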

@jaydeepkumar1984 force-pushed the trunk_cep_37 branch 2 times, most recently from 5730aa2 to 126684a on November 6, 2024
@jaydeepkumar1984 force-pushed the trunk_cep_37 branch 2 times, most recently from ec99162 to f9f6971 on November 21, 2024
jaydeep1984 and others added 29 commits January 28, 2025 17:29
Summary of impacting changes:

* RepairTokenRangeSplitter is now the default in favor of
  FixedSplitTokenRangeSplitter.

* number_of_subranges moved from a repair_type_override/global config
  to a property of FixedSplitTokenRangeSplitter.

* [get|set]autorepairconfig usability changes

Detailed breakdown of changes:

0. IAutoRepairTokenRangeSplitter changes:

  * Move RepairType from a parameter in getRepairAssignments to
    a parameter in the constructor for implementations.  This was done
    because the RepairType is always the same for a given splitter instance.

  * Add setParameter and getParameter methods which are used by
    setautorepairconfig and getautorepairconfig to dynamically update
    splitter configuration.

1. [get|set]autorepairconfig changes:

  * getautorepairconfig output now shows property names instead of
    human-readable names (e.g. repair_check_interval instead of
    "repair eligibility check interval"). This was done to make it more
    intuitive to know which properties to use with setautorepairconfig.

  * getautorepairconfig and setautorepairconfig now support viewing
    and changing splitter properties, e.g.:

    setautorepairconfig token_range_splitter.max_bytes_per_schedule 500GiB -t full

2. RepairTokenRangeSplitter changes:

  * Renames RepairRangeSplitter to RepairTokenRangeSplitter and makes it
    the default implementation.

  * Establishes sensible defaults for each repair type.

  * Improve javadocs detailing the primary goal of the splitter, its
    configuration and defaults, and the justifications for them.

  * Rename variables to be consistent with their setting names.

3. FixedSplitTokenRangeSplitter changes:

  * Renames DefaultAutoRepairTokenSplitter to
    FixedSplitTokenRangeSplitter as it is no longer the default.

  * Move number_of_subranges from a global config to a property for
    this splitter.

4. RepairAssignmentIterator

  * Refactored common code from both splitter implementations into
    RepairAssignmentIterator, with the aim of reducing the boilerplate
    that custom splitter implementations need to write.

5. Test changes

  * Fix AutoRepairParameterizedTest to use fixed splitter so we get a
    deterministic repair plan.

  * Allow the splitter to be changed programmatically; this is only
    expected to be used in tests.

  * Rename CassandraSreamReceiverTest and fix it

    Whether cdc/mv writes were streamed into commitlogs previously
    depended on system properties; update the test to account for the
    new yaml properties.

  * Fix dtest after CASSANDRA-20160

    The introduction of repair_task_min_duration causes repairs to take
    cumulatively longer for a node than 2 minutes.  To resolve this,
    set that to 0s, and also enable repair_by_keyspace and set subranges
    to 1 to reduce the overall number of repairs.

Patch by Andy Tolbert; reviewed by ___ for CASSANDRA-20179
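The setParameter/getParameter mechanism described in item 0 above can be sketched minimally as follows. This is an illustration of the idea only: the class name, the backing map, the parameter name, and the default value are assumptions, not the patch's actual implementation.

```java
import java.util.*;

// Minimal sketch of a splitter exposing its tunables through
// setParameter/getParameter so that the setautorepairconfig and
// getautorepairconfig nodetool commands can read and update them at
// runtime. Names and defaults here are illustrative.
public class SplitterParamsSketch {
    private final Map<String, String> params = new HashMap<>();

    public SplitterParamsSketch() {
        // illustrative default, not the patch's real value
        params.put("max_bytes_per_schedule", "200GiB");
    }

    // Reject unknown keys so a typo in setautorepairconfig fails loudly
    public void setParameter(String key, String value) {
        if (!params.containsKey(key))
            throw new IllegalArgumentException("Unknown parameter: " + key);
        params.put(key, value);
    }

    public String getParameter(String key) {
        return params.get(key);
    }
}
```

Under this shape, the example from the commit message, `setautorepairconfig token_range_splitter.max_bytes_per_schedule 500GiB -t full`, would route the key/value pair to the splitter's setParameter.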
* Move curly brackets to new line
* Remove unused config declaration in constructor
Also remove redundant AutoRepairConfig.RepairType for RepairType
in AutoRepairConfig.
* Promote node not being present in gossip to warn
* Clean up NUMBER_OF_SUBRANGES doc
* Simplify default map parameter parsing in both splitters
* Doc cleanup and make RepairAssignmentIterator fully public
* Remove unnecessary getters and setters in Splitters
* Handle the case where the ColumnFamilyStore is not retrievable; in this
  case return no assignments, as we can assume the table was deleted.
* Always return a RepairAssignment for a table, even if empty, in
  RepairTokenRangeSplitter as node may have missed writes
* Add set|getParameters tests
* Update AutoRepairParameterizedTest to use RepairTokenRangeSplitter
* Move no-split specific test to FixedSplitTokenRangeSplitterTest
Updates partitions_per_assignment so it is no longer based on
repair_session_max_tree_depth, which is deprecated.  Instead, just
use 2^20.

Also updates documentation around partitions_per_assignment and cleans
up some warnings in RepairTokenRangeSplitter.

patch by Andy Tolbert; reviewed by Jaydeep Chovatia for CASSANDRA-20231
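The arithmetic behind the fixed budget above is straightforward: 2^20 is 1,048,576 partitions per assignment, and the number of splits for a range is the ceiling of estimated partitions over that budget. The split-count formula below is an assumed illustration of how such a budget would be applied, not the patch's code.

```java
// Illustrative use of the fixed 2^20 partition budget described above.
// The splitsFor formula is an assumption for illustration only.
public class PartitionBudgetSketch {
    public static final long PARTITIONS_PER_ASSIGNMENT = 1L << 20; // 2^20 = 1,048,576

    // ceil(estimatedPartitions / budget), with a minimum of one split
    public static long splitsFor(long estimatedPartitions) {
        return Math.max(1, (estimatedPartitions + PARTITIONS_PER_ASSIGNMENT - 1)
                           / PARTITIONS_PER_ASSIGNMENT);
    }
}
```

So a range estimated at exactly 2^20 partitions needs one assignment, and one partition more pushes it to two.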
Creating repair sessions per table can create an overwhelming number of
repairs, especially with vnodes.

If a repair assignment is too big (more than 64 tables by default, or more
than 200GB for full / 50GB for incremental repair), RepairTokenRangeSplitter
will already split it into multiple repair assignments.

Adjusts repair_by_keyspace to default to true.

patch by Andy Tolbert; reviewed by Jaydeep Chovatia for CASSANDRA-20232
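The table-count side of the threshold behaviour above (split a keyspace-level assignment once it exceeds 64 tables) can be sketched as simple batching. This is a simplified illustration; the real splitter also weighs assignments by bytes (200GB full / 50GB incremental), which is omitted here.

```java
import java.util.*;

// Sketch of splitting a keyspace's tables into repair assignments once
// the per-assignment table limit is exceeded. Byte-size limits from the
// commit message are intentionally left out of this illustration.
public class KeyspaceBatchSketch {
    public static List<List<String>> batchTables(List<String> tables, int maxTablesPerAssignment) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < tables.size(); i += maxTablesPerAssignment)
            batches.add(tables.subList(i, Math.min(i + maxTablesPerAssignment, tables.size())));
        return batches;
    }
}
```

With repair_by_keyspace defaulting to true, a 130-table keyspace and the default limit of 64 would yield three assignments (64 + 64 + 2) instead of 130 per-table repairs.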
Patch by Francisco Guerrero; reviewed by TBD for CASSANDRA-20185
…ation in the AutoRepairServiceMBean definition
8 participants