fix: prevent panic on chain replay #1197
f05ad1d to c181afa
@LePremierHomme could you have another look please?
@karlem A quick question, how will the cache be recovered after crashes? Seems not queried when booting?
Yeah, good point. We would lose it after the crash. Do you think saving it to the app state might be the way to go?
@karlem The rabbit hole gets deeper. But I think it's possible to read the state during boot up. The validators and gas limits are all stored in the contract or actor, so technically we can call the getters from the store directly.
@cryptoAtwill that's a good point; it would also take us one tiny step closer to our desired end state of having as much logic as possible in on-chain actors.
@cryptoAtwill Yeah, I think working with the contract makes more sense than this. I would ditch this in favor of using the contracts, as they are part of the app state, which is much more solid than this.
@cryptoAtwill, I believe the issue is that validators are only stored in the actor after top-down finality has been finalized, whereas we need them available earlier. We have two potential approaches to resolve this:
1. Store the initial set of validators from genesis in the top-down finality actor/contract.
2. Create a new actor specifically for storing these validators before top-down finality is achieved.
What are your thoughts on this? Also looping in @raulk for input.
Related: #1166
77d6fe3 to 7b48df2
@cryptoAtwill Changed the implementation to rely on app and exec states instead.
I'm wrapping my head around the problem and the proposed solution to help push things forward. Here are some notes.
If I'm following correctly, this attempts to fix an edge case whereby, during CometBFT startup, CometBFT may attempt to perform a chain replay, either from the WAL or from the network, by feeding blocks to the ABCI app. The issue arises because we recently introduced logic to fetch the block proposer's public key so we could credit gas premiums to their on-chain account. This happens in
I think all of this can be greatly simplified if, instead of querying CometBFT, we query our power table from the gateway actor and match on the block proposer's identity. The gateway already has public keys. Such a call would be entirely inside the state tree, so it has no dependencies on CometBFT and would break the problematic cycle. In fact, we already do this in
In a nutshell, I don't think we need one more cache here, nor to manage its lifecycle, nor anything like that. Assuming I'm right, we can kill the
@cryptoAtwill does this sound right to you?
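To make the suggestion above concrete, here is a minimal sketch of matching the block proposer against a power table that is already held in state. Everything named here (`Validator`, `ConsensusAddress`, `find_proposer`, `toy_address`) is hypothetical and not taken from this PR or the codebase. The address derivation is passed in as a closure because the real scheme depends on the key type, and `toy_address` is a stand-in for demonstration only, not the CometBFT derivation.

```rust
/// A validator entry as it might appear in the gateway's power table.
/// Hypothetical shape, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Validator {
    /// Serialized public key bytes.
    pub public_key: Vec<u8>,
    /// Voting power of the validator.
    pub power: u64,
}

/// 20-byte consensus address identifying the block proposer.
pub type ConsensusAddress = [u8; 20];

/// Return the validator whose derived address equals the proposer's address.
///
/// The lookup only touches data passed in by the caller (assumed to come
/// from the state tree), so it needs no RPC call to the consensus engine.
pub fn find_proposer<'a, F>(
    power_table: &'a [Validator],
    proposer: &ConsensusAddress,
    derive_address: F,
) -> Option<&'a Validator>
where
    F: Fn(&[u8]) -> ConsensusAddress,
{
    power_table
        .iter()
        .find(|v| derive_address(&v.public_key) == *proposer)
}

/// Toy derivation used only to exercise the sketch: copies up to the first
/// 20 bytes of the key and zero-pads the rest. This is NOT the real scheme;
/// a real implementation would hash the public key.
pub fn toy_address(public_key: &[u8]) -> ConsensusAddress {
    let mut addr = [0u8; 20];
    let n = public_key.len().min(20);
    addr[..n].copy_from_slice(&public_key[..n]);
    addr
}
```

A caller would load the power table from state once per block, call `find_proposer(&table, &proposer_address, derive)` with the real derivation function, and treat `None` as "proposer not in the current power table" rather than panicking.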
@cryptoAtwill and @raulk After the last review and team decision, I have updated the PR again. The following changes are included:
Hopefully, this aligns closely with what we want.
d029337 to b2a588a
5be7efc to 750ce1e
db2f393 to 030d5d6
A few nits; other than that, LGTM. Glad we're landing this after the several twists and bends this has taken.
@@ -158,7 +159,7 @@ pub struct GenesisOutput {

 pub struct GenesisBuilder {
     /// Hardhat like util to deploy ipc contracts
-    hardhat: Option<Hardhat>,
+    hardhat: Hardhat,
Good cleanup here!
Close #1196
Removing the dependency on the CometBFT client in favor of caching. When CometBFT is catching up—replaying from the beginning of the chain to synchronize with the ABCI app—it does not start the RPC API. Unfortunately, our ABCI app relied on the API during consensus events, which made it impossible to replay the chain.
Solved by relying on the exec and app states instead of making calls to CometBFT.