Add a gauge for the effective machine version in ra_counters #426

Merged
merged 1 commit into from
Mar 28, 2024


the-mikedavis
Member

This allows callers to cheaply query a server's current effective machine version.

Closes #424
This should be useful for rabbitmq/khepri#250

@the-mikedavis the-mikedavis requested a review from kjnilsson March 12, 2024 16:22
@the-mikedavis the-mikedavis self-assigned this Mar 12, 2024
@the-mikedavis the-mikedavis force-pushed the effective-machine-version-gauge branch from 6c1cfee to f0d0ef6 Compare March 12, 2024 16:24
@the-mikedavis the-mikedavis marked this pull request as draft March 13, 2024 16:37
@the-mikedavis the-mikedavis marked this pull request as ready for review March 14, 2024 17:47
Contributor

@kjnilsson kjnilsson left a comment

One comment to look into, otherwise fine!

Review comment on src/ra_server.erl (outdated, resolved)
@the-mikedavis the-mikedavis force-pushed the effective-machine-version-gauge branch from f0d0ef6 to a66eac3 Compare March 28, 2024 13:35
@kjnilsson kjnilsson merged commit 0c4deea into main Mar 28, 2024
6 of 7 checks passed
@the-mikedavis the-mikedavis deleted the effective-machine-version-gauge branch March 28, 2024 13:48
dumbbell added a commit to rabbitmq/khepri that referenced this pull request Mar 28, 2024
[Why]
The "client" side of `khepri_machine` implemented in
`process_sync_command/3` has a retry mechanism if
`ra:process_command/3` returns an error such as `noproc`, `nodedown` or
`shutdown`.

However, this retry mechanism can't tell if the state machine already
received the command and just couldn't reply, for instance because there
is a node stopping or a change of leadership.

Therefore, it's possible that the same command is submitted twice and
thus processed twice.

That's ok for idempotent commands, but it may not be alright for all
transactions for example. That's why we need a deduplication mechanism
that ensures the same command is not applied multiple times.

[How]
Two new commands are introduced to implement the deduplication system:
* #dedup{} which is used to wrap the command to protect and assign a
  unique reference to it
* #dedup_ack{} which is used at the end of the retry loop to let the
  state machine know that the "client" side received the reply
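A minimal sketch of the client-side flow these two commands enable, written as a language-neutral Python model rather than Khepri's actual Erlang code. `FlakyServer`, `submit_with_retry`, and the tuple message shapes are hypothetical names introduced only for illustration; the server applies the first submission but loses its reply, which forces a retry with the same reference.

```python
import uuid


class FlakyServer:
    """Hypothetical stand-in for the state machine: it applies the first
    submission but fails to deliver that reply to the caller."""

    def __init__(self):
        self.counter = 0        # state mutated by the (non-idempotent) command
        self.replies = {}       # reference -> remembered reply
        self.fail_next = True   # simulate one lost reply

    def submit(self, msg):
        kind, ref, *rest = msg
        if kind == "dedup_ack":
            # The client confirmed it got the reply: forget it.
            self.replies.pop(ref, None)
            return "ok"
        if ref not in self.replies:
            # First time this reference is seen: really apply the command.
            self.counter += 1
            self.replies[ref] = self.counter
        if self.fail_next:
            self.fail_next = False
            raise ConnectionError("shutdown")  # reply is lost on the way back
        return self.replies[ref]


def submit_with_retry(server, command, retries=3):
    """Wrap the command with a unique reference, retry on error, then ack."""
    ref = uuid.uuid4().hex  # the same reference is reused for every retry
    for _ in range(retries):
        try:
            reply = server.submit(("dedup", ref, command))
        except ConnectionError:
            continue  # cannot tell whether the command was applied: retry
        server.submit(("dedup_ack", ref))  # end of the retry loop
        return reply
    raise RuntimeError("command failed after retries")


server = FlakyServer()
result = submit_with_retry(server, "increment")
```

In this model the command is submitted twice but applied once, and the acknowledgement leaves no cached reply behind.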

When the state machine receives a command wrapped into a #dedup{}
command, it will remember the reply for the initial processing of that
command. For any subsequent copies of the same #dedup{} (based on the
unique reference), the state machine will not apply the wrapped command
and will simply return the reply it remembered from the first
application.

Later when the state machine receives a #dedup_ack{}, it will drop the
cached reply for that reference.
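The state machine side described above can be modelled roughly as follows. This is a Python sketch under stated assumptions, not the `khepri_machine` implementation: `Dedup`, `DedupAck`, `Machine`, and `increment` are hypothetical names that only mirror the #dedup{} and #dedup_ack{} records.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Dedup:
    """Hypothetical counterpart of #dedup{}: a reference plus the wrapped command."""
    ref: str
    command: Callable[[dict], Any]


@dataclass(frozen=True)
class DedupAck:
    """Hypothetical counterpart of #dedup_ack{}."""
    ref: str


@dataclass
class Machine:
    data: dict = field(default_factory=dict)
    replies: Dict[str, Any] = field(default_factory=dict)  # ref -> cached reply

    def apply(self, cmd):
        if isinstance(cmd, Dedup):
            if cmd.ref in self.replies:
                # Duplicate submission: return the remembered reply and
                # do not apply the wrapped command again.
                return self.replies[cmd.ref]
            reply = cmd.command(self.data)
            self.replies[cmd.ref] = reply
            return reply
        if isinstance(cmd, DedupAck):
            # The client has the reply: drop the cached entry.
            self.replies.pop(cmd.ref, None)
            return "ok"
        # Unprotected command: applied every time it is received.
        return cmd(self.data)


def increment(data):
    """A deliberately non-idempotent command."""
    data["n"] = data.get("n", 0) + 1
    return data["n"]


machine = Machine()
wrapped = Dedup(ref="ref-1", command=increment)
first = machine.apply(wrapped)   # applied
retry = machine.apply(wrapped)   # served from the cache
machine.apply(DedupAck(ref="ref-1"))
```

Both calls return the same reply while the counter is incremented only once; after the acknowledgement the cache is empty again.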

Just in case the client never sends a #dedup_ack{}, the state machine
will drop any expired cached entries. The expiration time is based on
the command timeout. If it's infinity, it defaults to 15 minutes.
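A small sketch of that expiry rule, again as a Python model. It assumes each cached entry stores an absolute expiry computed as "current time plus command timeout"; the commit message only says the expiration is based on the timeout, so the exact formula, `expiry_for`, and `drop_expired` are assumptions made for illustration.

```python
import math

DEFAULT_TTL_S = 15 * 60  # fallback when the command timeout is infinity


def expiry_for(now_s, timeout_s):
    """Absolute expiry for a cached reply (assumed formula: now + timeout)."""
    ttl = DEFAULT_TTL_S if math.isinf(timeout_s) else timeout_s
    return now_s + ttl


def drop_expired(cache, now_s):
    """Return a copy of the cache without entries whose expiry has passed."""
    return {ref: (reply, exp) for ref, (reply, exp) in cache.items() if exp > now_s}


cache = {
    "a": ("ok", expiry_for(0, 5)),         # 5 second command timeout
    "b": ("ok", expiry_for(0, math.inf)),  # infinity -> 15 minutes
}
after_10s = drop_expired(cache, 10)
after_15min = drop_expired(cache, 15 * 60)
```

After 10 seconds only the entry with the infinite timeout remains; after 15 minutes it is dropped as well, even though no acknowledgement ever arrived.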

This whole deduplication mechanism can be enabled or disabled through
the new `protect_against_dups` command option which takes a boolean.
This option is off by default, except for R/W transactions.

Thus if the caller knows the transaction is idempotent, it can decide to
turn the dedup mechanism off.

Because the state machine's state grows with a new field and handles two
new commands, we bump the machine version from 0 to 1.

V2: We now use the `effective_machine_version` counter provided by
    `ra_counters:counters/2` if it is available as it is faster than
    querying the Ra server. If the counter is unavailable, we fall back
    to the query. The new counter is added by rabbitmq/ra#426 and will
    be used once a Ra release contains this change.
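The "counter first, query as fallback" pattern from the V2 note can be sketched like this. It is a Python model, not the Ra API: `read_counters` and `query_server` are hypothetical callables standing in for the cheap counter read and the slower server query, and returning `None` for a missing counter set is an assumption.

```python
def effective_machine_version(read_counters, query_server):
    """Prefer the cheap gauge; fall back to querying the server."""
    counters = read_counters()
    if counters is not None and "effective_machine_version" in counters:
        return counters["effective_machine_version"]
    # Older release without the gauge (or no counters at all): ask the server.
    return query_server()


# A release that exposes the gauge: the query is never issued.
with_gauge = effective_machine_version(
    read_counters=lambda: {"effective_machine_version": 1},
    query_server=lambda: 99,  # sentinel that must not be returned here
)

# A release without the gauge: the fallback query is used.
without_gauge = effective_machine_version(
    read_counters=lambda: {"commit_index": 42},
    query_server=lambda: 0,
)
```

Keeping the fallback means the caller behaves the same on both old and new releases and only gains speed when the gauge is present.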
dumbbell added a commit to rabbitmq/khepri that referenced this pull request May 15, 2024
Successfully merging this pull request may close these issues.

Add effective_machine_version to ra_counters as a guage