Mixed cluster in CI #255
Comments
Previous updates were given in the umbrella issue: |
Legacy cluster bringup trouble
We are trying to bring up a familiar
There are two problems, in two different contexts:
Nodes not creating blocks, in a single-machine-multi-node context
Logs: local-cluster.txt
This setup has a fair number of divergences from the well-established road, mainly in the deployment part:
Configuration key: input-output-hk/cardano-sl@a8c04d5
Note the fixed
Branches in related repos:
Nodes failing to start, rejecting their (freshly-made) databases as malformed.
Logs: c-a-1.log
This setup is fairly traditional as it relies on regular AWS deployment, except it also includes the
Of note is that the
Genesis & configuration: input-output-hk/cardano-sl@108464c
Branches in related repos: |
There is some evidence that the error is caused by DB state loss, so it is really a node misbehavior:
I.e. we query the
So the next step is to find the mutator that does the damage.
Btw, an additional piece of context is that this action happens in the consolidation code. |
After adding a tracepoint that @intricate suggested (at input-output-hk/cardano-sl@016b38c#diff-a9e07ec6470d6fa7c4708aada702011eR185):
..i.e. the Genesis point is erased from the DB!
|
https://github.com/input-output-hk/cardano-sl/blob/master/lib/src/Pos/Worker/Block.hs#L220 is the caller that drops the genesis point:
|
So, at least the naive, first approach at interpretation looks like this -- |
Fix for |
206: cardano-lib: mainnet CI genesis & configuration r=deepfire a=deepfire
Configuration and genesis for IntersectMBO/cardano-node#255
Co-authored-by: Kosyrev Serge <[email protected]>
4247: Single-machine multi-node mixed cluster CI prerequisites r=deepfire a=deepfire
This supplies the necessary changes for a mixed-cluster integration test, as per IntersectMBO/cardano-node#255 :
1. `mainnet_ci_full` genesis & configuration, starting in OBFT node
2. fix for an OBFT EBB rollback issue, which was trying to erase EBB even if the chain was started in OBFT mode, leading to IntersectMBO/cardano-node#255 (comment)
3. change in `network-transport-tcp` to be more lenient regarding remote address claims: deepfire/network-transport-tcp@44f84a8. This is necessary to avoid problems when starting multiple nodes on the same machine.
4. small improvements in genesis generation
Additionally, this resets the protocol version for the `shelley_staging_short_full` configuration to 0 -- a prerequisite for its respin.
*NOTE*: perhaps this PR should be split. But then, this repository sees very little activity, so perhaps the separation wouldn't have much benefit. I don't have a strong opinion myself.
Co-authored-by: Kosyrev Serge <[email protected]>
The final deliverable is in: #269 |
Documentation for running the CI cluster locally is at: https://github.com/input-output-hk/cardano-node/blob/serge/mainnet-ci/scripts/README.org#ci-cluster |
Seeing #302 in |
input-output-hk/cardano-sl#4251 is a conditional blocker for merging of the |
input-output-hk/iohk-nix#237 -- a nasty caching bug that prevents local runs of the CI cluster (post-rebase). |
269: Mixed cluster CI r=deepfire a=deepfire
_One PR to bring them all_.. ..or, the final deliverable of #255
Dependencies:
- [x] input-output-hk/cardano-sl#4252
- [x] input-output-hk/cardano-byron-proxy#70
- [x] #302
Co-authored-by: Kosyrev Serge <[email protected]>
Co-authored-by: Marcin Szamotulski <[email protected]>
#269 was merged. |
Goal
This is a subgoal of #211
We want an integration test for the Legacy/OBFT to Shelley/OBFT transition phase.
This means that we want to run a cluster configuration with two segments -- nodes running `cardano-sl` and nodes running `cardano-node` -- with a number of `cardano-byron-proxy` instances connecting those.
Implementation
The entire cluster is supposed to run in a NixOS test on a single machine, to lighten the load on CI & make the test faster.
The cluster has to share the same genesis, for obvious reasons, so `cardano-sl` nodes must use external genesis. Preferably we should use a genesis with configuration as close to `mainnet` as possible (to make the testing maximally relevant).
Deliverable
The final PR that enables this functionality is #269
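For illustration only, the cluster layout described under Goal and Implementation can be sketched as a small consistency check. This is not taken from #269 or from any of the repositories involved: the `Node` type, the `check_cluster` helper and the node names are hypothetical, and using `mainnet_ci_full` as a genesis label is an assumption based on the configuration name mentioned in the comments.

```python
from dataclasses import dataclass

# Node kinds named in the issue text.
LEGACY = "cardano-sl"
SHELLEY = "cardano-node"
PROXY = "cardano-byron-proxy"


@dataclass(frozen=True)
class Node:
    name: str     # hypothetical node name, for illustration only
    kind: str     # one of LEGACY, SHELLEY, PROXY
    genesis: str  # label of the genesis the node is configured with (assumed)


def check_cluster(nodes):
    """Return a list of layout problems; an empty list means none were found."""
    problems = []
    kinds = {n.kind for n in nodes}
    # Both segments and at least one connecting proxy must be present.
    for required in (LEGACY, SHELLEY, PROXY):
        if required not in kinds:
            problems.append(f"no {required} node in the cluster")
    # Every node has to be configured with the same genesis.
    if len({n.genesis for n in nodes}) > 1:
        problems.append("nodes do not share the same genesis")
    return problems


# Hypothetical layout: two segments bridged by one proxy, all on one genesis.
cluster = [
    Node("legacy-0", LEGACY, "mainnet_ci_full"),
    Node("legacy-1", LEGACY, "mainnet_ci_full"),
    Node("proxy-0", PROXY, "mainnet_ci_full"),
    Node("node-0", SHELLEY, "mainnet_ci_full"),
]

print(check_cluster(cluster))
```

The check only covers the two constraints stated above (both segments plus a proxy, and a shared genesis); it says nothing about how the nodes are actually deployed in the NixOS test.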