Add a gauge for the effective machine version in ra_counters #426
Merged
the-mikedavis force-pushed the effective-machine-version-gauge branch from 6c1cfee to f0d0ef6 on March 12, 2024 16:24
kjnilsson requested changes on Mar 28, 2024
One comment to look into. Otherwise fine!
This allows callers to cheaply query a server's current effective machine version.
the-mikedavis force-pushed the effective-machine-version-gauge branch from f0d0ef6 to a66eac3 on March 28, 2024 13:35
kjnilsson approved these changes on Mar 28, 2024
dumbbell added a commit to rabbitmq/khepri that referenced this pull request on Mar 28, 2024
[Why]
The "client" side of `khepri_machine`, implemented in `process_sync_command/3`, has a retry mechanism if `ra:process_command/3` returns an error such as `noproc`, `nodedown` or `shutdown`.

However, this retry mechanism can't tell if the state machine already received the command and just couldn't reply, for instance because a node is stopping or the leadership is changing. Therefore, it's possible that the same command is submitted twice and thus processed twice. That's ok for idempotent commands, but it may not be alright for all transactions, for example.

That's why we need a deduplication mechanism that ensures the same command is not applied multiple times.

[How]
Two new commands are introduced to implement the deduplication system:
* `#dedup{}`, which is used to wrap the command to protect and assign a unique reference to it
* `#dedup_ack{}`, which is used at the end of the retry loop to let the state machine know that the "client" side received the reply

When the state machine receives a command wrapped in a `#dedup{}` command, it remembers the reply from the initial processing of that command. For any subsequent copies of the same `#dedup{}` (based on the unique reference), the state machine does not apply the wrapped command and simply returns the reply it remembered from the first application.

Later, when the state machine receives a `#dedup_ack{}`, it drops the cached reply for that reference.

Just in case the client never sends a `#dedup_ack{}`, the state machine drops any expired cached entries. The expiration time is based on the command timeout. If it's infinity, it defaults to 15 minutes.

This whole deduplication mechanism can be enabled or disabled through the new `protect_against_dups` command option, which takes a boolean. This option is off by default, except for R/W transactions. Thus, if the caller knows the transaction is idempotent, it can decide to turn the dedup mechanism off.

Because the state machine's state grows with a new field and handles two new commands, we bump the machine version from 0 to 1.

V2: We now use the `effective_machine_version` counter provided by `ra_counters:counters/2` if it is available, as it is faster than querying the Ra server. If the counter is unavailable, we fall back to the query. The new counter is added by rabbitmq/ra#426 and will be used once a Ra release contains this change.
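To make the deduplication scheme in the commit message above easier to follow, here is a minimal Python model of it. This is only a sketch under stated assumptions, not Khepri's Erlang implementation: the class and field names (`Dedup`, `DedupAck`, `StateMachine`, `apply_fn`, `clock`) are hypothetical, and the clock is injected so the example is deterministic (a replicated state machine would need a timestamp that every replica agrees on).

```python
import math
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Expiry used when the command timeout is infinity (15 minutes, per the text).
DEFAULT_EXPIRY_SECONDS = 15 * 60


@dataclass
class Dedup:
    """Hypothetical stand-in for #dedup{}: wraps a command with a unique ref."""
    ref: str
    command: Any
    timeout: float = math.inf  # command timeout, in seconds


@dataclass
class DedupAck:
    """Hypothetical stand-in for #dedup_ack{}: the client got the reply."""
    ref: str


class StateMachine:
    def __init__(self, apply_fn: Callable[[Any], Any],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._apply_fn = apply_fn  # applies an unwrapped command
        self._clock = clock        # injected so tests can control time
        self._cache: Dict[str, Tuple[Any, float]] = {}  # ref -> (reply, expiry)

    def _drop_expired(self, now: float) -> None:
        # Covers clients that never send an ack.
        self._cache = {r: v for r, v in self._cache.items() if v[1] > now}

    def apply(self, command: Any) -> Any:
        now = self._clock()
        self._drop_expired(now)

        if isinstance(command, DedupAck):
            # The client has the reply; forget it.
            self._cache.pop(command.ref, None)
            return "ok"

        if isinstance(command, Dedup):
            if command.ref in self._cache:
                # Duplicate submission: return the remembered reply
                # without applying the wrapped command again.
                return self._cache[command.ref][0]
            reply = self._apply_fn(command.command)
            ttl = (DEFAULT_EXPIRY_SECONDS if math.isinf(command.timeout)
                   else command.timeout)
            self._cache[command.ref] = (reply, now + ttl)
            return reply

        # Unprotected command: applied every time it is received.
        return self._apply_fn(command)
```

In this model, a retried `Dedup` with the same `ref` returns the cached reply, while an unwrapped command that is submitted twice is applied twice, which is the situation the commit message sets out to avoid for non-idempotent transactions.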
dumbbell added a commit to rabbitmq/khepri that referenced this pull request on May 15, 2024
This allows callers to cheaply query a server's current effective machine version.
Closes #424
This should be useful for rabbitmq/khepri#250
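
The way a caller might use such a gauge, reading the cheap counter first and falling back to a server query when it is missing, can be sketched as follows. This is an illustrative Python model only: `read_counters` and `query_server` are hypothetical callables standing in for a counter lookup (such as `ra_counters:counters/2`) and a query to the Ra server, and the real Erlang signatures and return values are not reproduced here.

```python
from typing import Callable, Mapping, Optional

COUNTER_NAME = "effective_machine_version"


def effective_machine_version(
    read_counters: Callable[[], Optional[Mapping[str, int]]],
    query_server: Callable[[], int],
) -> int:
    """Return the effective machine version, preferring the cheap counter.

    Both callables are assumptions made for this sketch:
    - read_counters returns a mapping of counter names to values, or None
      if no counters are registered for the server.
    - query_server asks the server directly and is assumed to be slower.
    """
    counters = read_counters()
    if counters is not None and COUNTER_NAME in counters:
        return counters[COUNTER_NAME]
    # Counter unavailable (for example, a release without the gauge):
    # fall back to the slower query.
    return query_server()
```

Checking for the key rather than assuming it exists keeps the caller working against releases that do not yet expose the gauge.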