Upgrade go-msgpack to 2.1.1 #577
This adds an option to specify which version of the `time.Time` encoding format is desired. The default is false, which preserves compatibility with the current dependency on `hashicorp/go-msgpack 0.5.5` in go.mod, but users of this library will probably want to override the setting. While we're here, upgrade the go.mod file. I tested that this appears to work with Nomad. This needs to be updated and merged after hashicorp/raft#577
And set the `time.Time` option to use the go-msgpack-1.1.5-compatible encoding in all the places, since that is the (now previous) version in `go.mod`. v2 2.1.1 was specifically designed to honor backwards compatibility with 1.1.5 and 0.5.5, and to clean up the code base to be more maintainable. There may be some performance lost with the 1.1.5 to 2.1.1 migration since the fastpath code was removed, but the increased safety is probably worth it. See [the release notes for go-msgpack 2.1.0](https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0) for more details.
I tested this by running this code and booting up a cluster with a node also running Vault 1.15.0 (before the upgrade). Before I made the changes to set the right `time.Time` option, the previous-version node would throw a bunch of time-decoding errors. After fixing the option, the node came up smoothly, even after changing leadership between them.
This relies on
- [ ] hashicorp/raft-boltdb#38
- [ ] hashicorp/raft#577
and will need to be updated after they are merged to get the `go.mod` fixes removed.
Thanks @swenson.
I thought through whether there is anything else that would help us validate that this doesn't break things, but I don't think there is. I had one suggestion inline that I don't expect to change anything but that would give maybe a tiny bit of extra confidence.
// Do a snapshot
NoErr(WaitFuture(raft.raft.Snapshot()), b)
}
}
Was this benchmark used to check before and after performance of the msgpack encoding? What were the results?
Yes. Differences were basically non-existent, as I suspect the disk I/O / sync was the dominant cost.
t.Errorf("Expected time %s to encode as %+v but got %+v", stamp, expected, buf.Bytes())
}
}
Would it be worth having a similar test that encodes an entire Log populated with some arbitrary data? My thinking is that that struct is really the main thing msgpack is used for, other than some minor RPC framing.
As far as I know, the only part of the encoding that caused issues due to changes upstream was the time.Time in the AppendedAt field, which this covers, but it might provide added confidence that there are no other unknown incompatibilities too?
Not essential, just a thought.
So, there are some tests like that in go-msgpack already: https://github.com/hashicorp/go-msgpack/blob/main/codec/raft_test.go, which links to a bunch of data I captured from running Raft with various versions of go-msgpack: https://github.com/hashicorp/go-msgpack/tree/main/codec/internal/testdata
🤔 It took me a while to parse this and think through the ramifications. Is this correct @swenson ?
I'm not sure who this advice would be aimed at, though. Is that if you import raft but are also a library with downstream consumers?
Exactly!
This isn't strictly necessary. But, I assume there might be non-HashiCorp libraries or servers using this library? And if so, the next release breaking compatibility unless you specify a certain flag, depending on what version you were using before and what version of go-msgpack you might have in your (With HashiCorp stuff, I think we have it mostly handled.)
(If we are doing a release, we could also put the upgrade notes in the release notes)
And set the `time.Time` option to use the go-msgpack-1.1.5-compatible encoding in all the places, since that is the (now previous) version in `go.mod`. v2 2.1.1 was specifically designed to honor backwards compatibility with 1.1.5 and 0.5.5, and to clean up the code base to be more maintainable. There may be some performance lost with the 1.1.5 to 2.1.1 migration since the fastpath code was removed, but the increased safety is probably worth it. See [the release notes for go-msgpack 2.1.0](https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0) for more details.
I tested this by running this code and booting up a cluster with a node also running the prior version of Nomad (before the upgrade). Before I made the changes to set the right `time.Time` option, the previous-version node would throw a bunch of time-decoding errors. After fixing the option, the node came up smoothly, even after changing leadership between them.
This relies on
- [ ] hashicorp/serf#705
- [ ] hashicorp/raft-boltdb#38
- [ ] hashicorp/raft#577
- [ ] hashicorp/memberlist#292
and maybe
- [ ] hashicorp/net-rpc-msgpackrpc#12
Note: This might handle #473
@banks pinging this again
(Oops, I missed that you approved this @banks. Thanks! I'll wait to merge in case there is something else you wanted to bring up.)
@swenson thanks. Still OK with merging this, my only pause is about how and where we document the nuances I tried to spell out above - do you have a plan for that? I guess in the raft library changelog
@banks will do. I'll draft something in a comment later today.
@banks how about:
Hi developers,
If you aren't a developer who uses the https://github.com/hashicorp/raft library, you can stop reading now.
tl;dr The next Raft release (1.6.0) will switch the underlying hashicorp/go-msgpack library from v0.5.5 to v2.1.1. On its own, v2.1.1 should have the same default encoding as v0.5.5. However, some users of the library have overridden the go-msgpack library with other versions that may use an incompatible binary encoding. It is possible in go-msgpack v2.1.1 and Raft 1.6.0 to specify which binary encoding is preferred at runtime.
We have PRs out for Consul, Nomad, and Vault to facilitate these upgrades and set the options appropriately:
More details are available in PR #577 and in the release notes (TBD). If you have further questions, feel free to reach out in #tech-raft.
Thanks,
Release notes for 1.6.0
Upgrade hashicorp/go-msgpack to v2, with go.mod upgraded from v0.5.5 to v2.1.1. go-msgpack v2.1.1 is by default binary compatible with v0.5.5 ("non-builtin" encoding of However, if users of this library overrode the version of go-msgpack (especially to v1), this could break compatibility if raft nodes are running a mix of versions. This compatibility can be configured at runtime in Raft using
Those look awesome @swenson thanks. I think we can maybe duplicate the Release Notes on both the Releases page in GH and in CHANGELOG.md - it seems we normally update that by hand, on main. If you remember to do that before cutting the release so it's also in the tag, then even better, but I don't think we are very consistent about that!
We add a parameter, `NetworkTransportConfig.MsgpackUseNewTimeFormat`, that allows the behavior of the time encoding to be changed. The default value is `false`, which keeps the current time encoding, compatible with `go-msgpack` v0.5.5 (the version in `go.mod` prior to this change).
However, users who upgrade to this version of Raft will no longer have their own version of pre-v2 `go-msgpack` honored, which could cause unexpected binary compatibility problems. This will be a problem for HashiCorp products that might blindly upgrade:
* [Vault](https://github.com/hashicorp/vault/blob/f2f532ec802d356346c2302e9c38734f2a594f3f/go.mod#L98) (`go-msgpack` v1.1.5)
* [Nomad](https://github.com/hashicorp/nomad/blob/cb2363f2fbbb771af9c8a62ed09a60250df647a2/go.mod#L64) (`go-msgpack` v1.1.5).
I don't believe this will be a problem for [Consul](https://github.com/hashicorp/consul/blob/8eb074e7c17a0a2077436637c06f521b036407f5/go.mod#L199) (`go-msgpack` v0.5.5).
For the sake of non-HashiCorp products using this library, I'd recommend merging this and making a release with a full `v2` so that people are not surprised. I'll follow up this PR and release with PRs to upgrade Vault, Consul, and Nomad.
I tested this branch with Vault using `NetworkTransportConfig.MsgpackUseNewTimeFormat` set to `true` and verified it was able to replicate to Vault 1.15.0 (which uses `go-msgpack` v1.1.5). (I also tested that setting it to `false` would fail to replicate to Vault 1.15.0 with a time decoding error, as expected.)
I also added a quick benchmark in `bench_test.go`, and did not notice any significant performance differences before and after this change (which is not surprising, since I don't expect msgpack to be a significant cost for snapshots).
Thanks!
And set the `time.Time` option to use the go-msgpack-1.1.5-compatible encoding in all the places, since that is the (now previous) version in `go.mod`. v2 2.1.1 was specifically designed to honor backwards compatibility with 1.1.5 and 0.5.5, and to clean up the code base to be more maintainable. There may be some performance lost with the 1.1.5 to 2.1.1 migration since the fastpath code was removed, but the increased safety is probably worth it. See [the release notes for go-msgpack 2.1.0](https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0) for more details.
I tested this by running this code and booting up a cluster with a node also running Vault 1.15.0 (before the upgrade). Before I made the changes to set the right `time.Time` option, the previous-version node would throw a bunch of time-decoding errors. After fixing the option, the node came up smoothly, even after changing leadership between them.
This relies on
- hashicorp/raft-boltdb#38
- hashicorp/raft#577
I did a simple pair of benchmarks (one with a final sync, one without) and ran them before and after on both my Mac (M2 Max) laptop and my Linux (AMD Threadripper 3970X) desktop. tl;dr There was no performance difference for this benchmark.
```
goos: darwin
goarch: arm64
pkg: github.com/hashicorp/vault/physical/raft
                   │    a.txt    │               b.txt               │
                   │   sec/op    │   sec/op     vs base              │
RaftWithNetwork-10   58.65m ± 2%   58.62m ± 2%  ~ (p=0.937 n=6)
```
```
goos: linux
goarch: amd64
pkg: github.com/hashicorp/vault/physical/raft
cpu: AMD Ryzen Threadripper 3970X 32-Core Processor
                   │    c.txt    │               d.txt               │
                   │   sec/op    │   sec/op     vs base              │
RaftWithNetwork-64   5.861m ± 1%   5.837m ± 0%  ~ (p=0.240 n=6)
```