-
Notifications
You must be signed in to change notification settings - Fork 88
Logbook 2024 H1
-
To execute
schnorrkel
on L2 I had to play with themaxTxExUnits
since I would get budget overspent errors. -
If I alter
genesis-alonzo.json
file I can get past this error but then init tx is never re-observed (it seems like it was tried to post but there is no error and the test just hangs) -
I was expecting to have to change
Evaluate.hs
module since that one contains the same definitions for values on L2 ledger but altering the values there (just slapping one more zero toexUnitsMem
andexUnitsSteps
) didn't resolve the issue with the budget. -
Altering the forementioned values also increases the fees needed for other Hydra transactions, perhaps I also need to play with the wallet code in order to get everything running but it seems like the plutus function really needs to run withing the budget limits.
-
Maybe the ledger changes are not correctly implemented when it comes to the this new
schnorkkel
function? -
Ok, even if I don't use
schnorkkel
function in our dummy validator I still get the overspent budget error:
The plutus evaluation error is: CekError An error has occurred:\\nThe machine terminated part way through evaluation due to overspending the budget.\\nThe budget when the machine terminated was:\\n({cpu: -100\\n| mem: -100})\\nNegative numbers indicate the overspent budget; note that this only indicates the budget that was needed for the next step, not to run the program to completion.\
so I think there is something off with the plutus fork we are using.
-
We tried running a head between two parties and noticed that deposit/increment mainly doesn't work for whatever reason. I saw it also working but in majority of cases the deposit suceeds but increment is either not observed or didn't run at all.
-
As the first step let's figure out if increment actually ran or not so we can go deeper.
-
Ok, it doesn't seem like increment ran at all!
-
We have this condition in code:
case find (\(_, depositUTxO) -> Just depositUTxO == utxoToCommit) (Map.assocs pendingDeposits) of
and in case this find
returned Nothing
we don't issue any error message at
all. Perhaps the recorded deposit is not matching with what we have in the
snapshot to commit?
-
I added a
CommitIgnored
message just so we are able to figure out what is happening. -
Re-running the local demo with this change in place since I think this is what's happening and the reason we don't do any increments.
-
One thing confusing me is this one:
{"timestamp":"2024-12-03T11:16:01.589868581Z","threadId":539,"namespace":"HydraNode-\"2\"","message":{"api":{"reason":"Network.Socket.sendBuf: invalid argument (Bad file descriptor)","tag":"APIConnectionError"},"tag":"APIServer"}}
-
Also it seems like my two nodes are not connected - not able to exchange messages for some reason. Perhaps the port is busy? I'll re-run them with different ports.
-
I was having wrongly configured nodes duh... Once they are properly connected I managed to see
H4
which is aHeadValueNotPreserved
on increment! -
Very weirl, I'll beef up the local e2e test to spin two nodes and try to deposit from a non-leader node.
-
The test turns out to work and I think part of the problem is that one of the node always sees a failing tx in case it was too late to submit it and the other node managed to get ahead first.
-
Testing different scenarios proved to work. I want to perhaps add a condition that you need to wait on decommit to finish before doing any deposits and also that you can't decrement in case there is a pending deposit.
-
I talked to FT about tx trace testing and we might change our approach but let's see. Seems impossible to get the coverage we expect if we constrain everything in the precondition/validFailingAction
-
I discovered bugs when trying to do close/fanout after the increment in our e2e tests. This is the outcome of the redeemer changes for close/contest and also the way we handle snapshots.
-
HeadLogic needed to not include utxoToDecommit together with the utxo field - this was causing duplicate entries in utxo field. After altering this I found out that we need to specify
CloseUsedInc
also when calculating the deltaUTxO so that I am able to use deltaUTxO to provide a valid sig on-chain. -
This change means deltaUTxO is not tied anymore just to decommits so now I am stuck in a problem between close/fanout. I can make close work by adding utxoToDecommit hash as deltaUTxO. This makes the close valid but fanout expects that any utxoToCommit hash is present in the close datum. This makes the close invalid but would make fanout valid. So I need to find a way to satisfy both.
-
One option would be to pass to the fanout tx utxoToCommit too and then use it like so
Head.Fanout
{ numberOfFanoutOutputs = fromIntegral $ length utxo
, numberOfDecommitOutputs = fromIntegral $ maybe 0 length utxoToDecommit + maybe 0 length utxoToCommit
}
and also make changes in the CloseUsedInc
redeemer to include the utxoToDecommit hash and not rely on deltaUTxOHash.
-
Unfortunately this leads to some weird plutus error I can't reason about from the error message. What could cause this?
-
Plutus debugging is really shit when it comes to situations like this one.
-
I am trying to view the tx and try to figure out from that what can cause this failure.
-
It was the case that fanout was missing any outputs
-
To solve things with more explicit approach I decided to add
alphaUTxOHash
to keep information about any utxo to commit on-chain -
This solved our bugs with close/fanout and hopefully we are more explicit but also more clear in what we are trying to do.
-
I don't think we ever successfully do increments. Currently trying to get the increment tx to find the deposit txid.
-
I altered the AppM type to hold information about the deposit txid too since I needed to record it upon successfull deposit so I can use it to construct the increment tx.
-
This made the deposits work but now I see some problems with contests. H29 - TooOldSnapshot
-
I worked a lot on the close/contest and it seems like the current situtation does not allow proper contest modelling. If I have a snapshot with a snapshot number higher than the closed snapshot one how can I successfully contest?
-
Ok, I think it is time to wrap things up with these types of tests. I should also remember to add close and fanout to some e2e tests that exercise de/commits just so we are testing the close/contest logic some more.
-
After some changes in decrement to make sure only to do it if there is nothing pending I am seeing
CannotFindDepositOutputInIncrement
. I think I should investigate if the depositing works withadjustUTxO
function. -
I spent the whole day helping Dan with his branch
-
I have everything green right now but the coverage for fanout with empty utxo is not making the tests pass.
-
Here and there I see I generate something to decommit but to get something to commit it seems like there is something preventing deposits to appear more often.
-
In fact it seems like we never generate something to commit!
-
Perhaps including some utxo with Deposit was a mistake? Perhaps I should use the NewSnapshot for that too? -> No. We need to invoke a deposit tx in order for commit to workout nicely.
-
In order to test out the benefits of using a simple db like sqlite and in general, help out the hydra-doom project to squeeze better performance we decided to do this experiment
-
Adding a new module with the similar interface (handle) proved to be easy.
-
Spent some time with the tests but all green now
-
Replacing the file persistence with this new module everywhere
-
Some tests need adjusting since we don't use temp files anymore (perhaps I could make it work). Slashes don't play nice with the db name so that's why I don't use tmp folders anymore but maybe I could find a workaround so we don't get db files all over after running the tests. There is a dropDb command I added but then I need to make sure to use it everywhere.
-
So the current problem is that we are trying to close with snapshot 0 and something to decommit and we get a invalid sig error. We rely on a fact that snapshot number 0 is tied to initial state and there is the problem we have since the state is different between the input/output datum which causes sig failure. For now I'll just make sure to close only if closedSnapshotNumber is > 0 in the precondition.
-
After this change we go further and now the decommit is failing because of
VersionNotIncremented
. -
I've progressed further and now constantly trying to solve the problem where close sig is invalid.
-
The problem is that all looks good and aligned in the local state, snapshot and versions so it is not obvious why the signature fails.
-
This is how the actions look like:
Assertion failed (after 18 tests):
do action $ NewSnapshot {newSnapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
action $ Close {actor = Bob, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
failingAction $ Contest {actor = Bob, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
failingAction $ Contest {actor = Alice, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
failingAction $ Contest {actor = Carol, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
failingAction $ Contest {actor = Alice, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
action $ Fanout {utxo = [A,C], deltaUTxO = [B]}
pure ()
Model {headState = Open, knownSnapshots = [ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}], currentVersion = 0, closedSnapshotNumber = 0, alreadyContested = [], utxoInHead = [A,C], pendingDeposit = [], pendingDecommit = [B]}
Close {actor = Bob, snapshot = ModelSnapshot {version = 0, number = 1, inHead = [A,C], toCommit = [], toDecommit = [B]}}
Expected to pass validation
-
So immediately after the
NewSnapshot
we are trying to close. This leads me to believe that the input datum contains different information than what we set in the output datum thus the signature failure. -
Visually I see the datums are different but I should try to parse and see exactly what is not matching in there.
-
After parsing both input and output datums I see that the utxoHash field is different.
-
In my opinion this hash should stay the same if we try to close without any actions before and then we experience failure.
-
I made sure to produce the same utxoHash as it is in the input datum by not taking out the utxo to decommit beforehand but after we see the decrement tx succeed.
-
Now I can confirm the two hashes are the same but we still get invalid signature error and I think I know what's at play here. In the TxTraceSpec we sign the snapshot right before calling the offchain tx building code. We say in the
close
that in case open and snapshot versions are the same and you still have someting to decommit in the snapshot that all you need to do is provide a already present signature but on-chain we still haddeltaUTxOHash
parameter for the decommit hash! The tests actually found a bug! -
On top of this the tests are failing because the redeemer contains the wrong hash it seems. This is happening because we have a signature for a snapshot that claims it already decommitted something (since versions are the same) but in fact we produced a signature that contains something to decommit - wrong! We should bump the version when generating snapshots I think.
-
This makes this seed pass so I can move on to the next error in contest
TooOldSnapshot
. Yay!
-
I've made sure to properly generate snapshots that contain valid utxo to de/commit. Currently experiencing some issues with contest and
H12
but in general I think I fixed a lot of corner cases we had in there. Current tactics is to generate bunch of different snapshots and than control them in precondition and validFailingAction to filter out the ones that should work. -
One note the de/crement tx bumps the snapshot version by itself so we should not bump before some de/crement action is performed.
-
Having problems writing up this logbook entry since I change so many details in the testing quick and then forget to write about it.
-
Continuing work on modelling the deposit/increment in the TxTrace tests.
-
Focusing on snapshot generation in presence of de/commits and pending deposits.
-
Having problems getting the expected coverage for fanout (with some pending utxo to de/commit)
-
Made everything green and then started with some minor refactoring for the close/contest redeemers. Still want to revisit this code since it needs to be bulletproof.
-
I fixed the TxTraceSpecs by making sure we don't produce invalid snapshots - ones that don't have matching versions but still don't have anything to de/commit in the snapshot.
-
Proceeded to fix one failing mithril test and then stubbed out the Deposit and Increment in the TxTraceSpec. Monday will be all about modelling Deposit/Increment in the trace spec.
- I basically wrote an e2e test to see how persistence behaves and it worked without any issues. I even tried parsing the state file to inspect the last tag we have recorded so I think the issue is somewhere in hydra-doom.
-
I think now that everything is green again I will finally implement contest redeemer changes.
-
I am looking at the spec and pretty much doing what I did with the close redeemers already but it seems we have a corner case - if the versions do not match which close/contest redeemer you want to set? We don't know if the increment or decrement bumped the version so we can't know which redeemer to set?
-
This requires us to think how to handle this situation both in code and the spec.
-
Our local and snapshot versions are the source of truth here. The problems are not that big as I initially thought and we should probably refactor some code I wrote in the end but the main thing is that the version determines everything when it comes to the knowledge of incremental de/commits.
-
Continuing from yesterday to try and figure out why hydra-scripts-tx-id parser is not working.
-
After staring at the code for a long time I figured out what makes the hydra-node executable halt and not respond when trying to run it. If you specify a
value
in your options parser and this value somehow is not matching with the type you are trying to parse everything halts for whatever reason! omg... -
And now the publishing fits into the reasonable tx sizes so we're back on track baby!
-
I want to package all changes nicely into a single commit so we can get rid of it fast in case we get the scripts to be smaller in the end. This solution works and I would go for it but just leaving room for some improvements.
-
I think now is finally the time to pay attention to close/contest offchain and make necessary changes to the redeemers which means we are wrapping up the feature. I don't see what else is missing a part from the spec changes?
-
Before doing that I decided to make all tests green again since spec was failing on fanout for some reason.
-
I got all StateSpec green and will do the same for some logging tests and similar stuff.
-
Since we merged the initial aiken validator work + plutus V3 to master now is the time to rebase it on top of incremental commit work and see how are script sizes.
-
Had to recompile aiken scripts and fix some stuff I screwed up during rebase but now I see that the publishing tx is
17401
bytes big so I think the next step is to try and useasData
plutus trick to cut down on the head script size. -
The other, completely opposite, approach is to use
Plutarch
and try to replace the individual Head script checks incrementally. Perhaps I should opt in to do this sooner rather than later since I have heard great things about Plutarch and the produced script sizes. -
Third option would be the simplest one and that is to use list of tx-ids as the argument to the hydra-node. Perhaps instead of wasting a lot of time to optimize a Head script easiest would be to just make the publishing work with multiple transactions instead of one. There really are no cons for this approach besides making the users give in a list of tx-ids instead of just one. On the command like it would look really similar so I don't think this is an issue. But on the other hand it is kinda lame to have to change the current hydra-node behavior so I am not really sure now...
-
I noticed we don't optimize our minting policy so I added
conservative-optimizations
andoptimize
pragmas which brought dramatic reduction (from11111
to5555
bytes) but since this is not the script we publish I still need to cut down on a Head script size. -
Ok. Let's try with the
asData
plutus trick and see where does that lead us. -
As expected there are a lot of changes since
asData
requires us to disambiguate all record fields and disableDuplicateRecordFields
. -
For some reason in the end I am getting:
hydra-node: Error: Unsupported feature: Type constructor: GHC.Prim.Addr#
Context: Compiling expr at "hydra-plutus-0.20.0-inplace:Hydra.Contract.Head:(761,5)-(761,48)"
and the line that fails is the one compiling the typed validator:
compiledValidator :: CompiledCode ValidatorType
compiledValidator =
$$(PlutusTx.compile [||wrap headValidator||])
where
wrap = wrapValidator @DatumType @RedeemerType
- When re-compiling with removed
Strict
andno-specialise
pragma in theHeadState
I see:
hydra-node: Error: Reference to a name which is not a local, a builtin, or an external INLINABLE function: Variable Hydra.Contract.HeadState.$mInitial_go
No unfolding
Context: Compiling definition of: Hydra.Contract.HeadState.$mInitial
Context: Compiling definition of: Hydra.Contract.HeadTokens.validateTokensMinting
Context: Compiling definition of: Hydra.Contract.HeadTokens.validate
Context: Compiling expr at "hydra-plutus-0.20.0-inplace:Hydra.Contract.HeadTokens:(187,5)-(187,100)"
so it seems this has something to do with the minting policy???
-
I couldn't get a proper answer in the plutus channel a part from the fact that there has to be some type in the
asData
that is not parseable by plutus for some reason. -
I moved on to implement the script publishing as three separate transactions and got pretty far with it. It is not a problem to not have the tx-ids in order since we compare the script hashes to figure out which output is which. Just need to get the parser working nicely.
-
Continuing to debug the initial validator. Assumption is we run into a validator branch that is not properly covered by some trace.
-
Even returning True from the
ViaCommit
redeemer yields the sameL82;5
error. Omg.... -
Enabling verbose tracing in aiken and trying to figure out if we get the same commit validator hash in the initial argument as we expect. This yields:
The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.) ["redeemer: Redeemer"]))]
Again wat? I am trying to trace the commit validator hash here:
ViaCommit(committed_refs) ->
trace @"commit_validator": commit_validator
True
- Let's try returning true even before pattern matching on a redeemer:
expect Some(head_id) = datum
trace @"commit_validator": commit_validator
True
-
Still the same BUT! I got an idea! What if this happens because we are using plutus
V2
to assemble the redeemer while initial expectV3
version? And then the parsing of the datatype fails with this misterious error? Let's give it a shot! -
This makes sense to me but the problem is that it seems like the plutus version is baked into the ledger codebase so you can't easily mix them.
-
I think our best bet is to make the branch with plutus V3 work by trying to decrease the size of the initial validator so that it can fit in the publishing transaction and then rebase the incremental-commits branch on top of that one.
-
FT and me managed to cut down on the initial script size by around 300 bytes so now we can publish scripts again.
-
Need to fix two failing mutations related to commit transaction.
-
Commit is not healthy and I see
I12
because commit datum is empty for whatever reason. This is weird, I am tracing the commit script hash since commit always has a single output. -
Seems like we were using wrong validator hash in
aiken blueprint apply...
For some reason aiken can't use the hash I have available in plutus.json. It is happy to take in whole validator cbor on the other hand but that is just wrong. I saw a mention of calling acbor.serialize
function but the value is already a valid cbor?
-
Now that we rewrote the initial validator in aiken I can try to see if it fits in the publish transaction.
-
It does! woo-hoo! We still get some mutation errors and need to deal with golden plutus tests since now we need to automate the
blueprint apply
command but overall I am very happy since we are able to wrap things up in terms of script sizes. -
I need to be able to do
blueprint apply
manually in order to automate it later on but for some reason it always reports malformed cbor string. If I use interactive aiken prompt everything works. I am trying to go back to the version1.14
and see if this works there. (no it doesn't)
➜ aiken blueprint apply cf8b5028b108
Analyzing blueprint
Parsing inputs
Error aiken::blueprint::parse::parameter
× I failed to convert some input into a valid parameter
help: Invalid Plutus data; malformed CBOR encoding: decode error: unknown tag for plutus data tag
-
Whatever I try I can't make the aiken command take the cbor string directly. I'll leave it off for the time being but we need to do this.
-
Continuing to look at Commit/is healthy tests I see weird error:
Redeemer report: fromList [(ScriptWitnessIndexTxIn 1,Left (ScriptErrorEvaluationFailed (CekError An error has occurred:
The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.) ["L82;5"]))]
-
Wat? Why can't I have nice things in life.... :)
-
Returning True in
ViaCommit
path in the initial redeemer didn't help. -
At this point I am not sure how to proceed. I wanted to print the increment tx and then run the tests and try to detect what is outstanding from looking at the commit tx from the tests.
-
At least one good thing is that basically all of the commit tests fail with
L82;5
-
But how to tackle this error?
-
Ok, one thing that is a step forward - if you try to commit using blueprint tx that one also fails with the same error!!! So this has something to do with the blueprint tx somehow.
-
I vaguely remember I had to actually make sure that
FullCommitRequest
works with commits but not really sure right now.
-
I think the main stream of work today is to make sure FT's branch work since that one contains the initial aiken validator and bump to plutus V3 which allows smaller script sizes for the work on incremental commits.
-
Right now we get a nasty failure that looks like this:
Redeemer report: fromList [(ScriptWitnessIndexTxIn 1,Left (ScriptErrorEvaluationFailed (CekError An error has occurred:
The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.
Caused by: unBData
(Constr 0
[ Constr 0
[ List
[ Constr 0
[ Constr 0
[ B #7db6c8edf4227f62e1233880981eb1d4d89c14c3c92b63b2e130ede21c128c61
, I 21 ]
, Constr 0
[ Constr 0
[ Constr 0
[ B #b0e9c25d9abdfc5867b9c0879b66aa60abbc7722ed56f833a3e2ad94 ]
, Constr 1 [] ]
, Map [(B #, Map [(B #, I 231)])]
, Constr 0 []
, Constr 1 [] ] ]
, Constr 0
[ Constr 0
...
-
So it looks like decoding of redeemer fails here? Initially I though it was the
ScriptHash
parameter for the increment transaction. -
I am trying to figure out exactly what fails here: parameter, redeemer or the datum?
-
From what I see the initial and commit script hashes look correct.
-
In the end we figured out we actually need to
apply
a validator argument usingaiken blueprint apply
... duh... -
Now we need to fix the initial validator since it still throws errors on mutations for commit and collect.
-
Seems like tracing gives fast results compared to looking at the logic to figure it out.
-
Ok, I am not going to try and get rid of
Option
in a list and fail early but leave it until the end to pattern match. -
Got everything green but one
I03
mutation -
Seems like we are checking if the value is
>=
in the validator but in the mutation we definitely produce larger values so this is expected to fail. Negating values worked. -
Everything is green but the script sizes are not satisfactory. We need to gain like 300 kb somehow if we want to get this merged. Seems like the head script got bigger by updating to V3.
- Ok, just updating the mithril to the latest version fixed the problem
- We got the user complaint on discord that this request :
{
"blueprintTx": {
"cborHex": "84a600d9010281825820fa8a263c3b2b4de3dca8c6f9abd6e6da1dc28c8ff3e0025796ed0d717d8b37a9000dd9010281825820fa8a263c3b2b4de3dca8c6f9abd6e6da1dc28c8ff3e0025796ed0d717d8b37a901018002000ed9010281581c9f55c6328203ab96b079ca0347194836aa8013532b68a8889afa6b840b5820e4522c9439f91001ae113ecba30044593de966600f64a0d170be818703defa2fa207d901028158c358c1010100332323232323223225333004323232323253330093370e900118051baa001132323232330010013758602260246024602460246024602460246024601e6ea8020894ccc044004528099299980799b8f375c602600401a29444cc00c00c004c04c004c03cc040008c038004c02cdd50008b1806180680118058009805801180480098031baa00114984d958dd7000ab9a5573aaae7955cfaba1574498011e581c9f55c6328203ab96b079ca0347194836aa8013532b68a8889afa6b84000105a18200008201821a000186a01a000f4240f5f6",
"description": "",
"type": "Tx ConwayEra"
},
"utxo": {
"fa8a263c3b2b4de3dca8c6f9abd6e6da1dc28c8ff3e0025796ed0d717d8b37a9#0": {
"address": "addr_test1wze4wd4nrv5wra70glypjlk7ff2n46ty29qf2dqv53q33us7q0lpp",
"datum": null,
"datumhash": null,
"inlineDatum": "d87980",
"referenceScript": null,
"value": {
"lovelace": 2000000
}
}
}
}
is yielding:
Error in $.blueprintTx: \nunexpected \"l\"\nexpecting hexadecimal digit\nIncorrect transaction id format: Expected Base16-encoded bytestring, but got b; invalid bytestring size
-
Since I need to sync the preprod and everything I suggested they try the request with the specified
txId
which you can calculate using cardano-cli and specify the tx type asUnwitnessed
to see if that fixes the problem before I can take a look. -
We went back and forth to figure what is the problem and finally it was just that the inline datum should be encoded as json value not the hash
-
Trying to help FT on this one. We see problems related to data formats and I think it is related to how we pass the commit validator to initial one.
-
The plutus error I get almost certainly points to some large structure (like script) but we can't mix and match Plutus V2 and V3 when providing argument to the initial script (the commit validator). Therefore we need to first bump to
PlutusV3
then check if FT work is ok and the initial validator works. If all of that is good finally I can use it on my branch to see if we are able to publish scripts with reduced script sizes.
- Continuing with the final step for the aiken deposit validator, I am missing to check if all recover outputs are matching the deposit outputs we recorded in the datum. Might be that the aiken custom type I use on-chain is somehow not aligning with the haskell version we put in the datum. What worries me is that when I trace what I get on chain it looks like this:
The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.) ["depositOutputs: [_ 121([_ 121([_ 121([_ h'099828E6353CCC63F28FCB11FF71E92A48BE3DECCF7739A564CA3992']), 122([])]), {_ h'': {_ h'': 19 } }, 121([]), 122([])])]","deposited: [_ 121([_ 121([_ h'0000010000000001010000000100010000000000000001010001000101010000']), 79])]","preSerializedCommits: h'4BAECAAF54C46AC8C2CDF7CC256CEBE547806810608AE8B86B5B5AA78AF49FC7'","hashOfOutputs: h'916F843120D16BDE9D98F9D50E7A7708D5A1EABA1CA261BD3A710F7B60AAC271'","D05","Validator returned false"]))]
-
So deposit outputs are not matching what was deposited so the hashes of course don't match. I need to see exactly same values on both ends before I can compare them.
-
I made sure to follow exactly what I did in the haskell implementation of a deposit validator and BAAM! I got it working. What I needed to do is exactly sort by the
TxOutRef
and then use thepreSerializedOutput
field for hashing. Nice! -
Now I'd like to use a deposit script as a reference input to further reduce the tx size. Running
tx-cost
to see the current sizes and then use the deposit reference. -
I am realizing I didn't even add the increment transaction to
tx-cost
. Doing that now. -
This is how the results look:
## Cost of Increment Transaction
| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| :------ | ------: | --------: | --------: | --------: |
| 1| 1842 | 20.01 | 9.06 | 0.46 |
| 2| 2079 | 22.29 | 11.03 | 0.50 |
| 3| 2124 | 22.83 | 11.93 | 0.51 |
| 5| 2444 | 26.45 | 15.26 | 0.58 |
| 10| 3227 | 34.65 | 23.19 | 0.74 |
| 50| 9151 | 99.48 | 85.75 | 1.97 |
-
Now let's use the deposit as a reference to see the outcome.
-
I see
MissingScript
errors but don't want to deal with that right now so I'll move on to do needed changes in contest instead. -
First I refactored and added the missing
ContestUsedDec
mutations. We were missing to test out a bit the case when the decommit was not used. -
Now moving on to the new
ContestUnusedDec
. All these redeemers should have appropriate mutations.
-
Today we need to take a look at fixing the two functions that should hash the serialized
TxOutRef
in the deposit validator. -
It seems like the produced aiken datum is not in the correct shape as the plutus one.
Caused by: unListData
(Constr 0
[ Constr 0
[ B #3f4610dea46ef10fb5be4396c992dd657a4f5852019c8010b56aff32
, I 1730196006889
, List
[ Constr 0
[ Constr 0
[ Constr 0
[ B #0101010101010100000101010100000101010001000001010100010000000100 ]
, I 19 ]
, B #d8799fd8799fd8799f581ce0ae7da7d247a68e71b40a81b04b1d3f4dc809a1eeef582f35cd22b8ffd87a80ffa140a14003d87980d87a80ff ] ] ] ])) ["Expected the List constructor but got a different one"])),(ScriptWitnessIndexTxIn 1,R
-
Currently digging through internet to see if somebody fall into the same trap.
-
I also asked the hydra-doom guys since they are working a lot with aiken.
-
Trying to use plutusV3 to construct the datum and see if that works -> NO
-
Trying to use type alias for datum in deposit, the same way we do for commit.ak -> NO, but I think I am on the right path. Looking at how we did things when rewriting the commit validator should provide a solution.
-
BAM! Success! I was missing to properly encode datum and redeemer to plutus data and I used type alias for the datum just like we do in commit aiken validator.
-
Now the final piece of the puzzle is how to implement check that outputs match what was hashed in the datum?
-
I tried using both list of
TxOutRef
and already hashed data to compare but didn't get anywhere. -
Tracing to see what values I get on-chain and it seems that list of deposited outputs is a cbor string? There is a nice debugging function
cbor.diagnostic
which I am using to inspect the datum values. In general Aiken feels great even for me with a lot of years looking at haskell. Syntax throws me off of course but I could get used to it and I should if we plan to use aiken for the head script rewrite.
-
Today should be the time to deal with close tx and changes in the redeemer. The code looks a bit unwieldy but once it is correct I can try to improve upon it.
-
I ended up starting from scratch. Feels like there is a need for a new redeemer to signal that we are closing, there is nothing to inc/decrement but we are not using initial snapshot. This is not covered by the new redeemer types and we need to also add this into the spec.
-
This change makes the e2e tests green again.
-
Next I'd like to test the changes to the deposit aiken validator since I cherry picked the commit FT made.
-
If all works I would use the deposit as a reference input to increment and recover.
-
Had to propagate this change across the code-base and make sure I always use
PlutusScriptV3
for the deposit script.
-
Hydra doom people had a question related to one of our checks in the initial validator https://github.com/cardano-scaling/hydra/blob/master/hydra-plutus/src/Hydra/Contract/Initial.hs#L125C9-L126C37 . They have the need to change the values distributed on L2 without using an extra tx on L2 hopefully. The increment validator is making this impossible since it expects that whatever we put in the commit datum to be matching exactly with what we consume in the commit tx and the same needs to be present in the initial redeemer (
ViaCommit
) . What they would like is to keep the UTxO value check but don't check if the exact inputs are also locked so that they could distribute the UTxO on L2 as they please. Question is: does this affect any security guarantees of a Hydra Head protocol? -
I wrote up an issue about this and agreed to arrange a meeting with Matthias to go through the implications of this change.
-
I wanted to first fix the
StateSpec
tests I added yesterday. These are propertly tests so perhaps they find some interesting corner cases in our initial validator. Together with mutation tests these should give us the clear picture if our validator was implemented correctly or not. -
I realized most of the generated types depend on what we have in the deposit generator since increment is spending this output so I re-used
OpenState
andHydraContext
and improved couple of things (like deposit deadline re-use) so the tests are green now. -
Looking briefly at FT e2e test that faile (related to recover) I realize that
threadDelay
doesn't behave the same on MacOS and Linux so we added extra second in order to get this test green on FT machine as well. -
Happy that this worked out nicely I am looking at the e2e tests that fail. On close we try to use
CloseInitial
but that expects bothversion
andsnapshotNumber
to be zero (which feels wrong). We could have many snapshots but stay in the same version in case we never did in/decrement. -
If I revert all changes in
Close
module to master then the test is green. -
I am trying to refactor close off-chain logic but I think I better do it with a clear head on my shoulders on Monday since I am tired rn.
-
I need to time the time it takes for the hydra-node to catch up and see an open head if you use
--start-chain-from
in the past. This is needed for hydra-doom of course since they want to pre-open many heads and then make them catch-up with the chain. -
Lots of problems with making a linux command to look at websocket messages and stop once it sees
HeadIsOpen
and of course measure the time. -
I even pinged my friend who is systems guy and he helped use some python script since all linux commands we used were not stopping once we see the mentioned hydra-node message.
-
I tried doing this on AWS a week ago by my cardano-node kept restarting for some reason.
-
Locally I can't get to a open head state for some reason. I restart the hydra-node using start-chain-from which is one block before but still I don't see the head open.
-
Ok got some results at least. So the Head got opened here https://cexplorer.io/tx/cabd84249d9d4ba37c58801f7f0d695c2bb17569ffd23b29b0ccba45aa6d0a75 at
138313510.cabd84249d9d4ba37c58801f7f0d695c2bb17569ffd23b29b0ccba45aa6d0a75
-
If we go back one block to be able to observe Head opening and want to sync to
{
"block": 11013347,
"epoch": 517,
"era": "Conway",
"hash": "8a26f38d9118f8c7b51ab823e4c3eb8919c51ad9bea343a9c30332c25a43437f",
"slot": 138390441,
"slotInEpoch": 409641,
"slotsToEpochEnd": 22359,
"syncProgress": "100.00"
}
which is currently 21h31m ago so less than a day it takes 2m35.174s
- Now if I choose some slot from the previous epoch
https://cexplorer.io/block/f066dda7d6870b51bba99e5a2df538eb953f068858e0a23511748ef520c6ed4a
which is currently
9d17h59m
ago then to sync to a block
"block": 11013414,
"epoch": 517,
"era": "Conway",
"hash": "e69b2bb04568837690b7e810125608373fa722a2170a3a04fdbe6d36025cf2ef",
"slot": 138391676,
"slotInEpoch": 410876,
"slotsToEpochEnd": 21124,
"syncProgress": "100.00"
}%
takes 5m21.054s
- Curios to see
FeeTooSmallUTxO
when trying a new tx using 1,2 and 3 ada. Is the head actually calculating correctly fees on mainnet?
-
Went through some questions with SN, all seems good so far just minor clarifications. We should rewrite our initial contract in aiken to be able to publish transactions.
-
Continued with the blueprint script commit test in the e2e and FT and me had some luck to find the problem. The only thing is I'd like to know why the
utxoToCommit
is not the same as the deposit out value? -
Found out that the value mismatch is happening because when constructing the increment tx we try to find the deposit UTxO in the
spendableUTxO
and for some reason the UTxO inside has slightly different value. I am investigating further. -
Phew... After some serious debugging I found out that the deposit output gets balanced and if the incoming value is too small (just like in my case) the output gets like around 700 ada more after balancing. So the observed value and the value in the UTxO is not matching. If I produce high enough lovelace output then the test is green. How to deal with this problem? I think since I try to commit a script input with just above 1 ada this get's autobalanced in the wallet when posting deposit and then when we observe the value is not matched with what we expect. There should be a note somewhere (in the docs most probably) that users need to be careful to provide correct amount of ada at each deposit they send.
-
Happy to see this working end-to-end so we now move forward knowing the current situation is working fine.
-
Yesterday I solved two major bugs. Now I want to make sure everything works (as in the spec) and tests are green before diving back to the deposit aiken validator.
-
We found another major bug which we fixed. We were assuming the head output is always the single one but there is also a change output to the internal wallet which we neglected.
-
Now increment validator is behaving nicely and I am looking at commit blueprint tx spec in the e2e since that one also fails with
H4
. -
Ok, time to gain back some sanity. Ok, time to gain back some sanity. Why would the full commit request (using blueprint tx) fail when we check that deposit value + whatever value the head contained is equal to the head output?
-
This is the corresponding check:
mustIncreaseValue =
traceIfFalse $(errorCode HeadValueIsNotPreserved) $
headInValue <> depositValue == headOutValue
- I will print the corresponding increment tx that failed in wallet to spot what is wrong.
-
I wanted to see in the end if I use deposit (aiken) script as a reference how will the increment tx sizes look like? Right now I am getting a missing script error for some reason and need to investigate since I already made sure to publish the script beforehand.
-
I think I altered all possible places to mention the new aiken deposit script but still get
MissingScript
errors in e2e. I decided to step back and try to find out the original cause for the plutus script error I got since it is not gone yet. Yesterday I saw all tests green when I added big enough contestation period value but now I just think I am crazy. I still see the same phantom script error so want to go back and deal with it once and for all. -
I am having problems matching the proper version needed for
plutus-debug
executable to work. -
Went back two commits behind since I realized that the mutations are flaky when it comes to checking snapshot signature and
utxoToCommit
field. Need to make these bullet-proof before continuing further. -
The problem was that I was accessing tx inputs by index in the increment validator! Now the increment tx is valid and all of the mutations. BUT there is still a problem in the e2e tests that I need to fix before continuing further. Tx fails on submission and the CEK machine error is not pointing to anything meaningful.
-
I'll figure out exactly which check in the validator is doing this by removing all of them and then re-enabling. Ok so it was unexpectedly the check that the head output needs to be head input + whatever we deposited! Getting the deposited value on-chain proved to be the problem so I put it in the redeemer (for now) and the check is working again.
-
A lot of problems on every step but we are moving forward at some pace I don't really like but that's the reality of it. After fixing these two major bugs I was not aware of I think we are in good position to move forward and try to reduce script size using the new deposit validator in aiken.
-
Continuing to debug and figure out why increment tx is causing weird plutus errors. I commented out the suspicious check where I compare txOutRefs and left alone all other checks.
-
I feel like this has something to do with the lower validity bounds for increment. What if the tx is valid but the time passes between tx going to the mempool and then getting out of the mempool. I see that tx was rejected in cardano-node logs but there is no reason.
-
As soon as I increased the contestation deadline in the test everything started working again since there was enough time to post increment before the deadline. Woot-woot!
-
Since rewriting the Head script to Aiken will take some sweet time I could reduce the size of increment tx by rewriting deposit in aiken. Since increment tx consumes the deposit input and needs to attach deposit script this should lower down the total tx size.
-
Ok, turns out working with Aiken is a breeze even if you never saw it before! The script size for deposit is now 571 bytes and it was 3096 before! Sadly this is still not enough so I'll try to use it as the input reference and see if I can gain any tx size benefits.
-
Trying to finish the aiken deposit validator I am having some problems translating the functions we have in haskell to hash the tx outputs (sha_256) and compare it to the commits we stored in a datum. A bit disappointed changes didn't lead final success but we need to keep reducing these scripts. Perhaps I should spend some time profiling the head script?
-
I got all checks for the increment validator implemented and covered with tests
-
Next problem is that the head script grew too big so I need to do some optimisations
MaxTxSizeUTxO 16554 16384
-
I tried using
asData
TH function to use the more efficient data encoding but that clashes withDuplicateRecordFields
and in the end I couldn't compile the code that easily so I gave up - perhaps I'll revisit -
Tried
Strict
pragma but that did basically nothing -
Looking into using our custom
ScriptContext
type for the deposit - how to go fromBuiltinData
to theInterval
type? -
Ok this makes the tx size even bigger for some reason:
MaxTxSizeUTxO 16861 16384
- What if I used deposit script as a reference input in the increment/recover transaction? let's give it a shot. Ok, this didn't help also.
MaxTxSizeUTxO 19697 16384
- Trying to use the same deposit reference but also use custom ScriptContext in the deposit validator.
MaxTxSizeUTxO 19963 16384
-
Wow! nice work...not
-
Ok, I decided to deal with this problem in the end when I alter the close validator too and then perhaps re-write the deposit validator to aiken or plutarch.
-
Continuing with the close validator/redeemer changes - Adding new constructors for increment in the on-chain code was a breeze. Slight problems with the off-chain code where we need to detect if we de/commited and then use the corresponding constructor.
-
e2e tests for incrementally committing now fail but not sure why. Increment fails and it should be ok since the changes I am doing affect only close tx.
-
Perhaps it's just the tx size? NO it is some weird plutus failure I am not even sure what happens since I don't see full error. I'll continue in the morning.
Golden file changes
- Learned that the looking at the diff of the golden files is not very useful! Better to know that they should change if the associated types changed.
Blueprint tx
- When committing, you must provide a transaction (which we call the
blueprintTx), which specifies how to spend the UTxO that has also
provided. The type is
CommitBlueprintTx
. - It's a "blueprint" because we modify it ( see
hydra-tx/src/Hydra/Tx/Commit.hs
) by adding the hydra-node signing key to the signers, adding in a requirement for it to spend the initial input - By setting attributes of the transaction, for example adding a script output, this would bring in additional logic into L2
- "PT" stands for "participation token"
-
Ok, trying to figure out why the added check that the deposit script is spent is not working
-
Now I am trying to trace on-chain representation since I don't see off-chain any differences between the two
TxOutRefs
-
Ok, I found out that calling
toBuiltinData
before comparing the two values did the trick! Question is why we can't compare the two on-chain without turning it intoBuiltinData
? -
By writing the corresponding mutation I discovered the change is actually wrong since I can use any input in the redeemer and tests are green. Wat?
-
I reverted the recent changes so I can start again. Perhaps comparing hashes should work out?
-
So comparing hashes didn't work so I'll just leave this check as the last thing to do since other ones should be easier.
-
Next check in line is the signature validation. How to get the correct
$U\alpha$ ? What we include in the head is contained within the deposit datum. I should decode the datum and then use the information inside to produce the correct hash. -
Correct, this is also what the spec says (though that might be expensive)
-
Again I find it hard to debug when the validator fails since what I see in code seems correct but not having some easer way to trace on-chain code is making my life miserable.
-
Ok, I found the problem. Test snapshot is just plain wrong and doesn't contain anything to commit.
-
Another issue is that sometimes I wasn't able to find the deposit input properly. I think it had to do with the
resolveInputsUTxO
but now I changed the code to not rely on the input ordering and try to find the deposit by address so everything works again. -
This was actually the problem all along! Even the redeemer check from yesterday is ok now! Oh man, I am happy now
-
I don't understand why altering the order of these two conditions affect the increment tx healthyness?
&& claimedDepositIsSpent
&& checkSnapshotSignature
-
After writing the initial mutation that should check the deadline I can't seem to influence the deadline in the increment tx in such way to trigger the validator (deposit deadline seems to be set in the past, it is a negative number).
-
Perhaps I need to make sure the deposit deadline has a meaningful value first in order to resolve this test.
-
Creating the UTCTime from day, month, seconds did the trick but the input datum is for some reason invalid - can't be deserialized.
-
Ok, this happened because I was using the head input instead of deposit one. Now the datum can be decoded properly.
-
Since we are re-using the contestation period for the deadline, both in the http server when setting the deadline and in the handlers when constructing the increment, these values clash now. I opted in to set the deposit deadline as double the amount of contestation period. Hopefully this is enough time to be able to post the increment, if not we can always have a custom cmd line argument.
-
Now the deadline related mutations are green and I also updated the incremental commits documentation PR to reflect these new changes.
-
Next in line is the check for the head id. In the head contract we have the previous and the next datum to compare the headId. I opted in to put the head id in the
Claim
redeemer instead so that I have some value to compare when spending the output. -
I got both mutations I wanted green and I added a fixme to address needed changes in the spec for the Claim redeemer
-
Continuing to tackle the increment redeemer. There is immediately some confusion with the redeemer fields, seems like we need to keep the
TxOutRef
of a deposit we're trying to spend? -
I added one fixme to tackle the spec wording where
n
is mentioned in the first check that parameters are unchanged -
Now I noticed another discrepancyActually...I tell a lie, it is the deposit input.фincrement
vs.фdeposit
where the latter one is not introduced properly. -
This next check is interesting
Claimed deposit is spent
𝜙increment = 𝜙deposit
-
So this would mean I need to find the deposit input to increment transaction and then compare it to the data I have in the increment redeemer (
TxOutRef
) -
I still see deposit input first in the list, is the tx rendering borked? I see
sortBy (compare
onfst) (txIns content)
so we sort than print which could be the cause of my confusion. -
I don't immediately see why the check that deposit script input is spent is not working.
-
I got the request from Trym to time how much time it would take to open a head, then wait for couple of days and then start-chain from the moment the head was opened so we cut down on the chain congestion.
-
My plan is to use faucet keys we have and open a head on mainnet. Then stop the node and give it some slot/block from couple of days before and time how much time it takes to re-sync.
-
Question is would my aws instance survive two cardano-nodes at the same time? Let's see.
-
Downloading mitrhil db snapshot for mainnet
-
Fond a command to look for specific string in the output, hopefully it work for timing since we need to see
HeadIsOpened
eventually
time until websocat ws://localhost:4001 | grep -m 1 "HeadIsOpened"; do : ; done
-
Perhaps I would need to enlarge my aws instance since mainnet db requires a lot of disk space
-
I got the cardano node to run on the mainnet with usual quirks related to conway-genesis file and it's hash in the node config
-
I got the head opened as well but need to connect using websocat in order to be able to measure the time to catch up
-
I am realizing I need to re-map the ports differently since this aws server already runs one hydra-node and one cardano-node on preview
-
Now that I can connect with websockat I stole a small script to be able to measure the time until we see
HeadIsOpen
. This is the block before the head opening https://cexplorer.io/block/9df1b9f9c1357e9fc0960a079ab183fa99c669a3d493aca0af315b0d8f521eee and this is the slot/block hash
, "--start-chain-from", "137342970.9df1b9f9c1357e9fc0960a079ab183fa99c669a3d493aca0af315b0d8f521eee"
- I'll try to test it out now with one day difference to see the timing
-
I am starting by looking at the spec - perhaps there are some areas that we can already unmark from the red color since we implemented them?
-
On the second thought we unmarked everything last time once we had the implementation so I'll wait on this one.
-
Wrote the initial increment mutation that should test that increment happens before the deadline.
-
Funny that the deposit input is the first one, I was expecting the head input to be always the first one so need to see what is happening.
-
There is missing tx upper bound for the increment it seems, we need to set it's validity so it becomes invalid if it is equal or greater than the deadline.
-
Interesting problem - how to come up with a meaningful value for the upper validity slot for increment? We could use the current slot and calculate upper bound like we do for contest. I think this is the correct solution, we don't need to depend on saving the deadline in the local state so that we can use it for increment.
- I've fixed the test I was working on yesterday (I keep forgetting we don't issue any network messages on commits)
-Perhaps we should introduce a network message to signal to other parties
that deposits/recovers/increments are happening? Not sure if there is anything
related in the spec
-
The spec doesn't mention any network messages on deposit so all good.
-
Anyway I think we should always try to snapshot whatever we have pending for deposits and the logic on
AckSn
will discard if something is not valid. -
Continuing to write similar tests as the one we had for decommits to gain better understanding and perhaps find some quirks in the head logic.
-
One thing I found which is different from decommits is that a node can observe increment transaction but not have the txid stored locally since it is not propagated by the network message. Perhaps the commit messages should be propagated over the network
-
Other thing is we need to try to snapshot any pending deposit on each ReqTx. Similar to decommits, if there is something pending it should go into the next snapshot.
-
The fact that genSnapshot is so random and hardly meets our coverage drives me to a different strategy: Generate plausible sequences of snapshots and only pick from them when generating Decrement/Close etc. actions.
-
Bringing back
ProduceSnapshots
action, should it contain a list of snapshots or only a single one and have multiple such actions? -
Considered to remove snapshots from Model now after use, but this would make the overall test suite very optimistic. For example, we would not keep any snapshots with old versions after a decrement and never test outdated snapshots are rejected.
-
Realized that I need to improve the precondition of NewSnapshot to ensure it remains a plausible sequence of snapshots.
-
Turns out we need to keep track of the state version and inferring from knownSnapshots is not good.
-
At some point a fanout was invalid with a hash mismatch on the delta. Printing the corresponding
realWorldModelUTxO
shows an interestingTxOut
:TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraConway) (ShelleyAddress Testnet (ScriptHashObj (ScriptHash "9a81d9534c3eed3e374a921fd06ead4a822018e6a0281ec28423ee67")) (StakeRefBase (ScriptHashObj (ScriptHash "7734a436653455cf24035622e1216c83676446d1088b47d589f81956"))))) (TxOutValueShelleyBased ShelleyBasedEraConway (MaryValue (Coin 10) (MultiAsset (fromList [])))) (TxOutDatumInline BabbageEraOnwardsConway (HashableScriptData "\EOT" (ScriptDataNumber 4))) ReferenceScriptNone
That is, it contains a stake credential! Could this be broken right now? Why don't we test with stake keys in other, more "unit" properties?
- I suspect that the fanout generator is also not generic enough?
- Could not confirm. Neither stake address nor datum was the issue.
-
For a fanout to have a mismatching hash, it could also be that the hash in the closed state is wrong. And inded, in the counter example we have a Close, Contest and then Fanout. Looking closer, the Contest was not resulting in a different delta UTxO hash!
-
The TxTraceSpec is always issuing contest transactions with ContestOutdated redeemers. Why?
-
Now shrinked this is a very intersting case:
do action $ NewSnapshot {newSnapshot = ModelSnapshot {version = 0, number = 4, snapshotUTxO = fromList [(A,10),(B,10),(C,10)], decommitUTxO = fromList []}} action $ NewSnapshot {newSnapshot = ModelSnapshot {version = 0, number = 5, snapshotUTxO = fromList [(A,10),(B,10)], decommitUTxO = fromList [(C,10)]}} action $ Decrement {actor = Alice, snapshot = ModelSnapshot {version = 0, number = 4, snapshotUTxO = fromList [(A,10),(B,10),(C,10)], decommitUTxO = fromList []}} action $ Close {actor = Carol, snapshot = ModelSnapshot {version = 0, number = 4, snapshotUTxO = fromList [(A,10),(B,10),(C,10)], decommitUTxO = fromList []}} action $ Contest {actor = Bob, snapshot = ModelSnapshot {version = 0, number = 5, snapshotUTxO = fromList [(A,10),(B,10)], decommitUTxO = fromList [(C,10)]}} action $ Fanout {utxo = fromList [(A,10),(B,10)], deltaUTxO = fromList [(C,10)]} pure ()
Notably, the
Close
andContest
here are already using outdated snapshots (theDecrement
of version0
already happened). Obviously, the fanout with(C,10)
is not allowed here. This is a model error and we should not expect this to succeed. -
As we will make our genSnapshot generation more model-driven, we should be able to get rid of some complexity in precondition / validFailingAction.
-
When checking coverage again, I realize we are not hitting
fanout
very often. Especially the one without utxo inside, i.e. all utxo was decremented. -
Not reducing the values and just decommitting the whole A-C makes us reach fanout with an empty head more likely.
-
Confused by the fact that when I reduce frequency of generated Decrements to
0
, the coverage still reports "has decrements"?
-
There is a leftover from the incremental-commits off-chain work which specifies that we need to add more interesting tests for our new commits
-
As a first step I wanted to separate the existing decommit tests in the
BehaviorSpec
and do something similar for the commits. -
I realize pretty quickly that we send client inputs for decommits but not for commits so perhaps I need to move the tests someplace else?
-
I think I could just use the mocked chain layer to directly post a deposit, let's see
-
Oh, ofcourse, you can't post deposit, only increment and recover. Perhaps I can post increment directly but that requires a lot of parameters and I am not too happy with this level of testing.
-
I implemented couple of missing validations for the new api endpoints
-
Added a test that on recover we check the head id is correct
-
The deposit deadline is not properly configured so I wanted to re-use the contestation deadline for the same purpose (and see later if users would like to configure it differently)
-
To do this I had to pass the environment to the http server
-
Now the recover tests are failing because of the deadline check which is now set further in the future depending on contestation deadline. They are fixed just by waiting for sufficient period of time to pass aka contestation deadline
-
Now I would like to actually record the deposit deadline and upon each recover warn the users beforehand that recover would fail unless we wait for deposit deadline to pass.
-
I am realizing that these changes conflict with the PR I drafted yesterday to remove some local state and now I need to actually re-add the deadline.
-
The test I am writing is supposed to prove you can do multiple deposits but the issue is, they actually go one by one at the time since snapshots are sequential. Is it safe to increment multiple at once if the state contains multiple UTxO?
-
Since increment is spending only one deposit output this needs to be sequential for now
-
When I try to do two sort of simultaniously I get
ReqSnCommitNotSettled
error which I need to investigate
-
I was looking through code trying to see how to use the deposit deadline in the best way to avoid stuck Head in case when recover spends the output we want to snapshot.
-
What I wanted to do when it comes to deposit/recover actions is to make sure the deadline is correctly set (we can probably define it in terms of contestation deadline + some buffer) and then it seems like the most appropriate place to check the deadline would be when we receive the client input.
-
we could check that in case of any pending deposit we want to recover, it's deadline has passed so that it is safe to do recover. Perhaps the good thing to add is to check that there is no snapshot in flight too and if possible compare the information in the snapshot to make sure the same utxo we want to recover is not in there.
-
This is a bit hand wavy but what we could do is attach a timestamp to each client input (requires a separate type to wrap
ClientInput
and add time information). Then in the head logic, when we receive the client input with a timestamp we could check if recover input time is larger than the set deadline and only then try to recover. -
What I don't like is the fact that we have to deal with time in the pure
HeadLogic
but at the same time we would just compare the time we already would receive together with the client input. -
What are the alternatives to this? I don't see many unless we make recover to be received just by HTTP call and keep the deadline around somehow so we can reject the wrong recovers immediately.
-
After drafting a nice test yesterday with SN I added on top the piece of code I already had to make everything green
-
PR review suggested some code changes which are nice so I did those but there weren't any major objections into merging this
-
I sent the docker image to Micrograx to test it out, hopefully it all works out so we can move on.
-
I continued working on this today, progressing to remove any local UTxO state we actually don't need.
-
Now it is time to use the
spendableUTxO
and find the correct UTxO to spend in increment/recover in there. -
Seems like using the
isScriptOutput
is the easiest way to track down the UTxO I need to spend -
After making everything to compile I don't see e2e tests green as I suspected. I think last time I looked at this I suspected that
adjustUTxO
is doing a bad job when it comes to these new transactions since they are related to L1 so we need to preserve UTxO as they were without adjusting. I'll test my hypothesis real quick. -
I was wrong about this, both increment and recover can find the deposit output but recover fails to post with
D05
- incorrect deposit hash, so debugging this rn. -
In the end it was just a condition in
HeadLogic
that was wrong, everything works now so it is time to wrap things up and make all tests green again. -
One note: I need to think about the deadline since we should incorporate this information we have to perhaps reject recovers that are bound to fail because of the deadline.
-
This is a follow up from the work on the incremental commits off-chain.
-
I kept a local map with the deposit UTxO I need to spend in the recover but this should not be needed since we should have all observed UTxO in the chain state already.
-
Looking at this user issue again. I noticed the rewarding redeemers are missing so I added them to the commit tx and I also see them after the balancing so this change should be correct but the error remains the same one I saw before
WithdrawalsNotInRewardsCERTS
-
If I just try to naively submit the user transaction the error is the same so perhaps the user transaction is invalid (it is missing the outputs anyway) but I am not sure if no outputs are causing this.
-
I also created cardano-cli test using the same commands as user did in the issue description to see if it leads to the same result and it acually did.
-
Looking at the ledger code it seems that there is a check to make sure any rewarding is matched with a proper certificate. The user transaction is missing certificates so I could perhaps try to add them programmaticaly and see if that helps?
-
The proper thing to do is to construct a valid tx that does rewarding and make sure these tests are green.
-
I am pretty sure on hydra end everything should be ok once I add the non-spending redeemers but need to confirm with the user if what they are sending is actually ok and should work since I can't submit a user tx without any error.
-
We looked at pr this morning together with the team and decided to merge it (yay!) and identify next steps in order to get it even more in shape
-
I discovered some weird stuff in the api schema so had to fix that and also add a changelog entry to remind us not to release this version by any means since it contains security holes big as... well major flaws with the validators returning true.
-
Now I can move on with my life and have future small PR to fix all I wanted to fix in there.
-
I finalized incremental commits off-chain branch and need to look at the user reported bug where our blueprint transaction doesn't allow withdraw redeemers.
-
I have the request json, script used to construct the withdraw script output so I'll start by first inspecting the request UTxO
-
There was already a test I drafted in a branch so looking to improve this test to detect possible bug we have
-
Seems like mocking the entire thing is time consuming, I'll just slap around quick e2e test since I have the request body available.
-
Ok so this is the error I see when using the user request:
SubmitTxValidationError (TxValidationErrorInCardanoMode (ShelleyTxValidationError ShelleyBasedEraConway (ApplyTxError (ConwayUtxowFailure (MissingRedeemers [(ConwayRewarding (AsItem {unAsItem = RewardAccount {raNetwork = Testnet, raCredential = ScriptHashObj (ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")}}),ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")]) :| [ConwayUtxowFailure (UtxoFailure (UtxosFailure (CollectErrors [NoRedeemer (ConwayRewarding (AsItem {unAsItem = RewardAccount {raNetwork = Testnet, raCredential = ScriptHashObj (ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")}})),BadTranslation (BabbageContextError (AlonzoContextError (TranslationLogicMissingInput (TxIn (TxId {unTxId = SafeHash "0b95bec9255a426435e6aa7ff9749d76ee4c6d815acb47f10142db889b0edce4"}) (TxIx {unTxIx = 3})))))]))),ConwayUtxowFailure (UtxoFailure (ValueNotConservedUTxO (MaryValue (Coin 21466323) (MultiAsset (fromList [(PolicyID {policyID = ScriptHash "8dc5d80bb963464a696850684f783ea8b7dacd1cea9b38f1425b7716"},fromList [("f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d",1)])]))) (MaryValue (Coin 23466323) (MultiAsset (fromList [(PolicyID {policyID = ScriptHash "8dc5d80bb963464a696850684f783ea8b7dacd1cea9b38f1425b7716"},fromList [("f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d",1)])]))))),ConwayUtxowFailure (UtxoFailure (BadInputsUTxO (fromList [TxIn (TxId {unTxId = SafeHash "0b95bec9255a426435e6aa7ff9749d76ee4c6d815acb47f10142db889b0edce4"}) (TxIx {unTxIx = 3})]))),ConwayCertsFailure (WithdrawalsNotInRewardsCERTS (fromList [(RewardAccount {raNetwork = Testnet, raCredential = ScriptHashObj (ScriptHash "39c520d0627aafa728f7e4dd10142b77c257813c36f57e2cb88f72a5")},Coin 0)]))]))))
notably WithdrawalsNotInRewardsCERTS
is pointing me that this could be
correct reproduction even if utxo is perhaps invalid.
-
Consulting the ledger code I see that this error means Withdrawals that are missing or do not withdrawal the entire amount so hopefully once I fix the missing withdrawal redeemer I can say the bug is fixed.
-
I need to figure out where is the problem - is it in the commit tx building or in our wallet while balancing.
-
Ok seems like the problem is in the commit tx building where we try to re-map spending redeemers since we want to add one (initial) input.
-
Seems like I could just record whatever rewarding redeemers where there and add them on top after the spending redeemers are re-mapped.
-
It worked in the sense I can get passed the error I saw above and now I get the
ScriptFailedInWallet
which probably means that the wallet code is missing to properly adjust the reward redeemers too. -
Nice findings - I would like to have a general function that re-maps all of the redeemers so we are covered for any case there can be in the future.
-
Did some code review to kick off the day
-
Majority of work is done already and now it is the time to cleanup and make all of the tests green.
-
It turns out all tests are green a part from the logging/api schema validation - the most annoying tests to make pass
-
I realized that I am missing to finish up the tutorial also so this is something to do first and then fix the tests
-
Of course I found a bug by doing a tutorial! We need to do this more often. I discovered that we always serve past deposits in a list since the projection is never deleting any. Fixing this and adding a test to make sure it works.
-
Now all that is left is to tackle the schema errors it seems since the tutorial is updated.
-
Yesterday I created the test case that will drive the implementation of blueprint tx in the deposit construction.
-
I noticed the
hydra-tx
is not compiling because of some changes in the observations so I'll make that compile somehow. Not sure if we even want this executable since it is not serving it's purpose really. If you managed to draft deposit/recover using this executable then the tx is not balanced so the UX is not good at all. -
We looked at some minor problems with Noon to solve the new endpoint that lists the deposits so that thing is done (still missing to document it in our api schema)
-
Wrote something about incremental commits in the monthly report
-
Continuing with the blueprint support for the incremental commits - I should look at how I did commit before and do similar stuff in here too.
-
Adding the blueprint support for deposits was done smoothly, main problems were finding the types between cardnano-api and ledger-api but overall nice - the tests are green.
-
Looking at the PR comments and it seems the biggest part is to change
TxIn
toTxId
for recovering so maybe I get that out of the way before starting the final cleanup -
Ok, this proved to be easy enough. I was wondering if it is safe to assume that any deposit
TxId
can be turned intoTxIn
by always using the zero index? -
Now for the cleanup
-
Before diving in to make the recover work with head logic instead of drafting directly from the api code I wanted to fix the observation tests first now that we think our observations are correct.
-
This made me need to return more information when it comes to increment transaction since the head UTxO is not enough to observe this transaction.
-
Now it is the question if I should make all the tests green again or do that after recover is implemented differently. Since nobody will look into this soonish I'll just make the recover changes first and then make all things green again since I don't want to do any duplicated work.
-
I am not sure why I though this step would have a lot of code changes. I propagated all the changes and now see again
D04
which is the deadline validator check. -
I set the recover deadline to whatever was set as a deadline in deposit with a caveat that I do convert to UTCTime using
posixToUTCTime
so perhaps the conversion is wrong or I just need to wait in the e2e test? -
It would also be nice if the deadline is expressed in slots since that way I could emit errors even from the
HeadLogic
. -
Let me summarize the situation right now: deposit deadline is set in the api code (users do not influence it at all), recover deadline is set to whatever we already have in the state as our deposit deadline for certain
TxIn
(we keep the map ofTxIn
containing additional data), so obviously recover deadline is the same as the deposit deadline so the validator check is triggered wrong. To solve this I could pass either expect the users to give me a deadline (not very good UX), I could create new deadline as soon as the recover endpoint is hit (more feasible), I could also look at the existing deadline and make sure it is increased in recover (can lead to tx rejection by the cardano-node if the slot is in the future) -
If I just add some time to the saved deposit deadline the test passes so I realize that I need a way of providing the recover deadline and my plan is to take the currentSlot directly from
HeadLogic
upon receiving the recover request and then convert the slot to UTCTime in the handlers upon constructing the recover transaction -
This seems like a correct thing to do here - construct a deadline from the current slot.
-
Ok, the test is green, moving on.
-
I need another endpoint to list the pending deposits - this is very easy to implement now that we record the pending deposits in the local state. I could draft a test and have one of the guys implement this to help me out while I continue with the rest of the changes needed for the off-chain part of the commits (like suport blueprint transactions)
-
Adding support for the blueprint committing feels like important and also most complex part of the work that is left.
-
Ok, so today I'll get the observations in place correctly before I move on to fixing the recover workflow so that it goes through the Head logic completely instead of drafting the recover directly from the api code.
-
Removing the
depositScriptUTxO
fromIncrementTx
. I find it a bit hard to envision what exactly do I need in order to build theincrement
transaction successfully. If I don't have the correct UTxO for deposit script then I need to find it in the local UTxO state (since this is where UTxO ends up if it's being observed) but then I need to pick from this deposit UTxO (potentially many outputs) the one I am actually trying to deposit. -
Feels natural to include
TxIn
in theDepositObservation
so that I know exactly which deposit I observed so that I can use the sameTxIn
to find the proper UTxO to increment. -
I think I see the whole workflow in my head but it will take some sweet compilation time to get to a place where everything compiles again and works. But on the other hand I can say that the deposits/increments are done correctly then and I can focus on fixing the recover workflow. Let's dig in!
-
Lost a lot of time looking into why deposit is not observed after my changes only to realize I removed the constructor from
observeHeadTx
🤦 -
The condition to make sure deposit datum contains correct information was wrong. The observations tests didn't seem to care and they were green but the e2e tests proved we are not observing the deposit tx correctly.
-
My focus today is just bad. If all observations and e2e work I will not go into digging why did these changes make sense since I need to move faster.
-
I decided to keep
depositScriptUTxO
in the deposit observation since I didn't find the convenient way to find the deposit script UTxO to spend in the local UTxO. It seems a bit weird since If I observed the deposit this means that it should be part ofspendableUTxO
no? -
Instead I propagate the needed values (so deadline, depositTxIn etc) so that
OnDepositTx
can bring them in to theHeadLogic
. -
Now I needed to extend the
IsTx
typeclass with the new associated typeTxInType
since I want to keepMap TxIn (UTxO, UTxO, UTCTime)
locally where first utxo is the one you wanted to deposit, second one is the script utxo that is to be spent in the increment and the map key is there so we can have multiple deposits around. Deadline is just the deposit deadline, we might need that one as well if nothing else just to show the information when it is ok to recover the funds. -
Ok, after tackling all related code upon adding this new associated type I am moving through the head logic with the new �
PendingDeposit
alias that should hold all information we need to do increment/recover later on. -
I propagated changes through the head logic and now would like to use
TxIn
to find the deposits for increment/recover -
I also propagated
TxIn
inIncrement
andOnIncrementTx
to be able do easier find the deposit I want to remove. -
Next step is to start with the api changes in recover (do not call handler code from the api but go though the head logic) and that will reveal easier way to delete recovered deposits by using the new available
TxIn
.
- When handling the error of
etcdctl put
the question arises: what ifbroadcast
fails? How should theHeadLogic
react on this? - Not allow
broadcast
to fail and buffer not succesfully sent things instead. - For some reason
withProcessTerm
now correctly stopsetcd
.. well, fine. - Difference between resilience against minority cluster writes and crash-recovery is really minimal. i.e.
TBQueue
vs. a "persistent queue" - Storing the last known revision in a file seems to be sufficient to avoid re-delivering already seen messages. But do we need this? Our application should be fairly resilient against duplication of messages. On the other hand, writing a file on a best-effort basis should not be too much of a problem (this is not write ahead!)
- With these latest changes temporary package losses of up to
100
percent are handled just fine! The only reason why benchmarks would stop is because of the timeout of confirmations (20 seconds), but if the package losses are intermittent it's just fine - Using pumba, we can also randomly select nodes which perceive a
100
percent package loss every couple of seconds.
-
Continuing the work after yesterday's successful demo
-
The problem I had yesterday with recovers not working is related to the fact that you can't recover from any node (currently). Only the node that crafted the original deposit has in it's state the needed utxo that needs to match with the recover.
-
There are many changes that need my attention but I'll focus on the observations first
-
I moved the recover observation to hydra-tx package and removed deposit and recover from tests that make sure head transactions are observed. Now they have their dedicated tests
-
I got the recover observations working but experience
ScriptErrorRedeemerPointsToUnknownScriptHash
for index 0 which is unexpected here -
Turns out this is just a consequence of
D04
which is the script deadline error -
I made sure to resolve fixme in the deposit observation which mentions that we need to check that the deposit datum contains exactly the deposit transaction inputs
-
Removed
utxo
field from theDepositObservation
since it is not needed (deposit TxIn will be convenient here tho) and propagated changes -
This broke the increment creation since we used this deposit scipt utxo to easily find the deposit we want to spend. Should be simple to fix this.
-
Next work will be probably to track multiple observed deposits in the local state and then just pick one existing one on recover
-
We got the new user issue where they say that the withdraw redeemers are stripped once the wallet balances the transaction.
-
I want to write a test to exercise this theory and I think I could just use the
WalletSpec
to create a transaction with some withdraw redeemers, then callcoverFee
and then take a look at the redeemers and if they stayed the same. -
Upon balancing we adjust the spending redeemers but the others should be intact.
-
I stubbed out the initial test case and noticed that the tx generator doesn't actually produce any redeemers so this is the next step to do.
-
Main takeaways:
- changelog update missing
- inline api docs on post draft commit
- Deposit should work with a blueprint tx!
-
I continued working on the branch: simplifying the api code to re-use existing types we had in place and making the recover use the path instead of query string
-
Wrapping things up so we can have a review soon.
-
I also want to try to decommit from the tui today with our fresh Head
-
Need to address user issue and try to reproduce the issue mentioned with cardano-cli (about the datum hash mismatch)
-
I had a problem where I couldn't ssh into my aws account and it turns out if you have too many ssh keys this sometimes happens. As a workaround I had to provide additional flag
-o "IdentitiesOnly yes"
-
Continuing to fix the red tests - it seems like the logging tests are especially a PITA since it is a bit hard to figure out why we get failures sometimes.
-
We got another user issue related to our blueprint transctions which mentiones that if you try to commit to a Head using blueprint transaction the withdraw redeemers are dropped from the transaction.
-
Need to read upon how withdraw zero trick exactly works but so far I know that you can reduce execution costs if you specify the script purpose to be rewarding instead of spending.
-
I'll run the demo just so I can get to an open Head and try to use the request body in the issue to reproduce the issue on my end.
-
Yesterday evening after work I realized what could be the problem with the e2e test for the recover. As I mention we try to balance the tx twice, I should only return the transaction from the handlers and then make the hydra-node post in on client input.
-
Ok, I was right and now I got the recover e2e green! Yee-haw!
-
I'll include the recover workflow also as part of this pr which makes me happy.
-
Question arizes: should I also observe the recover transaction and then have a way for a cleanup in the local hydra state? I think so since we can't rely to do a cleanup on posting only.
-
For sure, even our protocol.md diagram says so.
-
I'll use the opportunity to clean up unused variables etc. while I add the recover observation.
-
One thing is: how to get the head id from recover transaction? Currently there seems to be no way which means we would need to check the recovered UTxO against whatever we have in the state related to deposits and if the two UTxO match then we could say recover is approved/finalized.
-
I ended up looking for head id in the resolved recover inputs, in the deposit datum and this worked nicely.
-
Tests are green and now it is time for a major cleanup.
-
I almost came to the end of the recover implementation but got
uncaught exception: JSONException
which was surprising. -
Debugging a bit to see where this happens and it seems sometimes on the api level we get this error but I don't see why exactly would this be happening.
-
Realizing I've changed what I return from the recover endpoint - it is no longer the transaction but just the string
OK
. Now the recover transaction is posted by the hydra-node and not the user in the end. -
After making sure all pieces are in the correct place so that I can recover a deposit I get
ErrNoFuelUTxOFound
error when trying to post recover transaction. -
Very weird since my wallet UTxO has plenty of ada.
-
I found out that this is happening in
mkChange
function that is supposed to calculate the change and error out in case inputs are lower or equal to the outputs but there is a fat fixme in there that says we need to account for minimum UTxO value too. -
Realizing this is just the outcome of some other problem I have (most probably in the Handlers when we construct and balance the recover transaction)
-
Recover is trying to spend the deposit output and produce as outputs whatever was deposited (outputs present in the deposit datum)
-
I need to think more deeply about the recover tx and how does it play with the internal wallet since I need to check if all inputs are properly resolved and how come there is
ErrNotEnoughFunds
error coming out of the wallet -
It is time to gain some sanity again I think. One question is how come I didn't see this error before when working with the recover transaction? The answer might be that I prevously passed in the UTxO user wants to commit to the
draftRecoverTx
and now I rely on the datum in the deposit output. -
In the wallet code it seems that the input balance is lower than the output balance by around 6 ada:
> 24687860 - 24681788
6,072.0000000000
What am I not accounting for here?
-
Could it be that the UTxO's are duplicated and I need to remove the deposit and commit UTxO from spendable UTxO before giving the tx to the wallet for balancing?
-
Good thing is that the amount seems to be always the same!
-
I tried several things and I think the bottom line is that we are balancing the recover transaction and then post it which should be doing the balance all over again. In my mind this operation should be idepontent but perhaps it is not?
-
If I just try to submit the recover transaction directly after assembling it I don't see any errors but I fail to query the wallet address to make sure the recover worked.
-
I really feel under the weather right now so I'll leave dealing with recovers for tomorrow or next time while I try to make the rest of the codebase into some revievable state for the team to look at.
-
We did a nice session this morning to see the current state of the PR and discover the next steps
-
These are the steps I wanted to outline that would be needed for the users to test out this new feature:
### Prepare the UTxO to deposit
- How to make the deadline configurable for our internal purposes like testing?
### How to issue a http request to kick off the deposit process
- Can we copy/paste the json utxo representation directly from one of the explorers?
### Sign and submit the deposit transaction
- cardano-cli example most probably
### Query the Head to see the deposited UTxO present on L2
- Issue `GetUTxO` call to view the L2 UTxO state
## How to recover deposited funds
### Follow the deposit steps
- How to demonstrate recover if hydra-node will pick up the deposit as soon as it is observed?
- Play around with the network so one participant doesn't see the new snapshot request?
### Once we make sure deposit is not observed we can issue a recover request
- How to make sure user keeps the committed UTxO representation around so that they are able to
issue the api request with the same json?
- They also need to provide the txin which complicates things? How could we make it more simple, could they query the hydra-node
to get this information?
-
Plan right now is to get the PR into shape as much as possible and outline in the PR what is outstanding which will need to be implemented as series of next PRs.
-
I have a good understanding where the main problems are where are we at currently so this should not be an issue at all.
-
As a concrete next step I'll try to make the recover e2e test pass by shutting down one of the nodes to that the increment is not posted so that I can exercise the recover workflow nicely (right now the recover tx get's rejected by the cardano-node because of the lower validity bound).
-
If I don't get it to work in a reasonable time I will leave the complete recover feature to be something we think about and implement as one of the next steps since in the end we do need to think more about the recover and the optimal way of doing it.
-
Even before any of this I should not expect the users to provide the deposit script UTxO (actually any UTxO since we should be able to have it around in our chain state) and take in
TxId
instead ofTxIn
at the recover endpoint. Also use the path instead of query string (just remembering all of the stuff we talked about this morning). -
I removed the undeeded UTxOs from the recover endpoint request and altered the e2e test to spin up two nodes, observe Head opened and commit approved and then shut down one of the nodes in order to be able to do recover instead of observing the increment.
-
Looking at the recover code - so you need to know the deadline to be able to construct the deposit datum so that one can spend the deposit output but how can you know the deadline if that is something hydra-node decides on?
-
The e2e recover tests are failing with the validator error related to invalid commit hash so currently debugging the UTxOs to see where is the discrepancy (commit utxo I am sending to hydra-node api and the local utxo we get from observing head transactions)
-
Realizing I just need to decode the datum of the deposit script output in order to get all parameters I need for the recover.
-
I looked at the increment observation tests and it turns out they are flaky. So not all of them fail but sometimes we experience some weird behavior and I decided to leave this problem for later on since the e2e is green and this should be a problem somewhere in how we wire things together (increment depends on deposit output also)
-
The most relevant next chunk of work should be connecting the recover endpoint to the handlers.
-
I slowly progressed one thing following another to make the e2e test that will prove that recover transaction worked.
-
This took some time - I decided to aslo encode the query string (
TxIn
) as base 16 string since the hash in the rendered transaction input doesn't play nice with query strings. -
Now I finally get the
unknown input
error on the recover transaction so progress! -
After making sure I use the correct
UTxO
for recover request (utxoFromTx depositTx
) I am gettingErrScriptExecutionFailed
D04
which is related to deposit deadline. Yeee-haw our validator is kicking in. I'll make sure to pass in the proper deadline. How to make the users life easier in terms of these recover inputs they need to provide? -
I commented out the validator check for the dealine to see if the locked commits are matching and I'll worry later on about the deadline. Oh nice...I get the
D05
so the locked commits are also not valid. Debug time! -
Looking at the issue again I realize users don't need to provide the deadline at all so removing the field from the request type.
-
If I just remove true from the recover validator I am getting the
NoFuelUTxOFound
error. I think we modeled the recover as if we will just run it as a standalone cli tool but to have hydra-node wallet balance these transactions we need more. What I think will work is to provide the UTxO which is locked at the deposit script so that we can satisfy the validator. Then we would also need the deposit script UTxO since we need to spend it so it must be given to the hydra-node wallet. -
Success. Validator doesn't run currently but at least our internal wallet is happy. Now turning on the validator checks so that I can construct a proper valid request and see the recover working. Feeling pretty close to success.
-
Ok, adding the appropriate slot lower bound made the deadline check pass. I need to make sure these values make sense.
-
The check for locked outputs is naturally ok once I've provided the correct UTxO to the recover tx so that's satisfying.
-
As the final step for this test let's make sure the deposited UTxO is returned back to the user and is not in the Head.
-
For some reason the test is hanging - can the user actually submit the tx node created?
-
I added the recover client input since I we need the hydra-node signature on the recover tx so it is easier if the node just does it.
-
Adding the
RecoverApproved
message so users can have some sort of insight in what is happening inside of Head. -
I feel burned out a bit, there are a LOT of changes needed.
-
After fixing the
Authenticate
network module to not drop messages, I ran into an issue whereoffline
mode would not open the head when configured with other--hydra-verification-key
parties. -
Fixing manually run
offline
mode nodes to communicate- Need to simulate other parties'
OnCommitTx
- Still seeing
MessageDropped
? - Disabled signature verification for now
- Sending the
ReqSn
now seems not to arrive on the other end. Use distinct keys per node (again). - Seeing
InvalidMultisignature
now:
BTW: at first the
InvalidMultisignature
content was tripping me now because it only contains a singlevkeys
when one signature fails- Each node is seeing the other node's snapshot signature in
AckSn
as invalid. I suspect something's wrong with offline-mode. - Re-generating keys did not help.
- Found it: the
headId
in the snapshot isoffline-alice
andoffline-bob
so we can't have networked offline heads because of that.
- Need to simulate other parties'
-
Side note: Is
cabal bench hydra-cluster
single mode broken? -
Lots of our tests depend on
PeerConnected
messages. How would we deal with "connectivity" information in case ofetcd
? It's not really that we are connected to node x or y, but rather whether we are connected to the majority or not. -
Etcd config
initial-cluster
requires a consistent list of peers across all instances. Right now I derive it from the given--host
,--port
and--peer
cli options, but this is very fragile. -
Problem: submitting messages is not possible while we are not connected to the majority or leader election is ongoing?
-
After giving
etcd
a bit time in the beginning the benchmarks work reliably. However, since I switched towatch
it got slower? Maybe using the command line parsing is not the fastest way to do this (and it depends on the internal loop ofetcdctl watch
).
-
Continuing where I left off yesterday - see how to fix the increment tx posting after changes in the deposit observation.
-
It seems like I need to re-gain some sanity here so let's break down things.
-
I made changes to the deposit observation so now what we return from the observation is not just the deposit UTxO but also what we locked inside of the deposit script so the code around observing deposits/ posting the increment needed to change to accomodate for these changes.
-
Now that e2e is green again and I also added the
GetUTxO
call in the end to assert deposited funds ended up in the head on L2 it is time to consolidate and try to get somebody involved with this work since there are too many things in scope and we want to get this work merged asap. -
One area in which I need to grow is to learn to ask for help! Right now I think the fastest thing to do is keep coding since I know I will lose time explaining what needs to be done but at the same time this puts the pressure on me to deliver all in time.
-
So what is outstanding is to implement recover e2e, implement api endpoint to list pending deposits, remove increment parts (observation mainly) but leave the transaction construction since we want to keep uninplemented validators to return False if we want to merge this thing. Also write a tutorial explaining the steps to deposit/recover.
-
I've looked at the increment observation tests and it seems like we will need to find a way to alter these tests so that we can return some UTxO that is working nicely with increment (needs to contain both head and deposit scripts). Main thing I see is we can't find the deposit datum in the observed UTxO
- After changing the deposit observation (reminder: still need to implement increment observation tests ) the e2e test is broken now since what I return from the observation is input UTxO locked in the deposit script not the actual UTxO we saw coming in from L1 so there is some work needed in order to get this right but nothing too hard to do.
-
As I realized before the observations are not done correctly confirmed by SN. They should be pretty similar to the commit ones since they are even using the same type
Commit
. -
I've made sure to implement the deposit observation in the same manner as commit one (since they are pretty much the same) but didn't have the strength to continue further (too tired) so I'll make it work on monday.
-
After adding the increment mutation I want to make sure that the increment tx is valid as a starting point.
-
Did a small change to provide the correct UTxO as the argument to
incrementTx
. What I need here is the output from a deposit tx. Right now I get:
Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Left (ScriptErrorEvaluationFailed (CekError An error has occurred:
The machine terminated because of an error, either from a built-in function or from an explicit use of 'error'.) ["PT5"])),(ScriptWitnessIndexTxIn 1,Left (ScriptErrorMissingScript (ScriptWitnessIndexTxIn 1) (ResolvablePointers ShelleyBasedEraConway (fromList [(ConwaySpending (AsIx {unAsIx = 0}),(ConwaySpending (AsItem {unAsItem = TxIn (TxId {unTxId = SafeHash "913afe731ffea84a50b0b
so it seems like I didn't attach correctly a deposit script appropriately (index 1 here) and the head script (index 0 here) failed with PT5 (which is expected since validator returns false).
-
After returning
True
from a increment validator I can see that the head script got attached properly while the deposit is still missing (need to look at the deposit datum which is missing) -
Found out this was because of two issues:
- I used
mkScriptReference
and we are not referencing the deposit script so it should bemkScriptWitness
- I was attaching the headScript instead of deposit script derp
- I used
-
I quickly added a test to make sure we don't observe deposits from a different head
-
Now I am able to connect the deposit snapshot signing and posting of the increment tx as a result of it. Posting from head logic and connecting the handlers to be able to actually post the increment tx.
-
Wrote some Handler code to be able to post increment on deposit observation. Also added a type for observing the increment. We need this so that we can say the commit is finalized once the increment is observed.
-
I can try the e2e workflow now to make sure everything works up to observing the increment. It seems that deposit was not observed back after hitting the api so I investigate.
-
Turns out that after balancing we end up with a deposit that has two inputs and two outputs. One of these belongs to the hydra-node wallet as it's the one responsible to pay the fees to we need it's input and also we end up with the output that holds the change from balancing the deposit tx.
-
Debugging to see why we don't observe back the deposit. So it turns out we need a
utxoFromTx
on a deposit tx observation in order to be able to find the correct script output. -
What I notice is that we never issue AckSn but that is not so important right now.
-
Figured it out: I shouldn't include the the deposit tx in
localTxs
onReqSn
. This got the head stuck by waiting to apply the deposit transaction which is never applicable. (perhaps I am misunderstanding something since my explanation is fragile) -
We get to the next error which happens on posting the actual increment:
ConwayUtxowFailure (NotAllowedSupplementalDatums (fromList [SafeHash \"3135e9c31e91873b8b3a7d2922038e934785abd68401589bc67f40591af137c3\"
-
This happened because the increment tx construction was wrong and I should have used
InlineScriptDatum
when constructing the deposit witness. -
Progressing further since now I get to see the
CommitApproved
message and all that is left for the e2e to work is the increment observation! Nice, I am happy things are progressing nice and fast although there is really a lot of changes needed. -
Adding the increment observation now.
-
After needed changes I get to observe increment transaction but the committedTxOuts are not matching with the local deposit utxo. This should be easy to fix.
-
I've decided to return the UTxO directly from the observaton and compare it to the local one.
-
BAM! e2e is green now!
- Need to change semantics of
broadcast
as it should only "deliver" once the majority of the network saw a message. Going to give the callback a name too. - While
NetworkCallback{deliver}
makes things more consistent with literature and easier to read in the components, its also more verbose. Need to ask the team about this trade-off. - When disabling the
broadcast
shortcut, theNodeSpec
tests failed first, but only because it was asserting on additional message being submitted to ourselves. The mocked, recording network is not short-cutting and does not need to, so simply adding some more received messages and asserting less is fine here. - The
BehaviorSpec
tests were eating the whole memory until I removed the filter for own party in thebroadcast
implementation (to follow the expectation). Probably some busy loop wait for a message which never came. - After fixing the
ModelSpec
to also broadcast to self (increateMockNetwork
), I wondered.. which test is actually covering that short-cutting (or non-existence of it)? - Turns out only
hydra-cluster
end-to-end tests are testing this integration. Anyhow, making the lowest level network component (Hydra.Network.Ouroboros
) self-deliver after enqueuing a fire-forgt should resolve the previous behavior. - Not seeing the network layer operational right now (hard to tell through only
hydra-cluster
tests). I suspect it's because the network stack is filtering when seing a message delivered at the lowest layer.
-
Ok so after adding a field to the
Snapshot
that would hold the information on what will be committed to a Head it is time to move further and make the HeadLogic correct upon observing a deposit transaction. -
Looking at the issue it seems that deposit observation was a separate task but I did it already so I updated the issue.
-
Added some code to wait on unresolved decommit before issuing a
ReqSn
on deposit observation. -
Looking into the spec to figure out if there is any validation needed before including a depositted UTxO into a Head. Seems like there is no validation necessary which makes sense. If you managed to lock a utxo with a deposit tranasaction then this UTxO can't be spent twice so it is safe to include it in the Head UTxO.
-
I added check upon observing
ReqSn
where I need to make sure there is no other de/commit in flight before including the deposit UTxO into an active UTxO set and asking for a snapshot signature. -
Also added the
commitUTxO
to the localCoordinatedHeadState
to keep track of it. -
Seems like the time has come to actually construct the increment transaction from the observed deposit (and recover also). Perhaps it is better to make sure all tests are green first and implement recover workflow. The increment transaction is most interesting work but we need to build everything else around it since this would be the last step in this PR.
-
Made some logging tests green and now looking into failing deposit observations in hydra-chain-observer and hydra-node. Basic observation seems to work but plugging in the deposit generation + observation seems to fail.
-
Fixing this required me to pass in also the
utxoFromTx
so that the deposit can be observed since head UTxO doesn't contain the necessary datum as these transactions are not tied to the Head ones. -
Making almost all green (a part from one test in HTTPServer) paved the way directly into increment transaction construction and observation (I could do the observation in the separate PR).
-
Wondering if we merge this (incoplete) branch to master are we harming the regular hydra-node operation in any way? I'd assume not since users should only rely on the released versions anyway.
-
Detecting a place in code in the
HeadLogic
where the posting of the increment transaction should be triggered - most probably on snapshot ackgnowledgement. -
I was right, we can plug the increment posting easily on
AckSn
. Let's build the transaction first and then think about the plugNplay :) -
Doing this soon I realize we will need to add the Deposit to the script registry. BUT on the other hand I have the deposit UTxO so I can find the deposit input.
-
I tried running the end-to-end test for the incremental commits and get
PPViewHashesDoNotMath
which is surprising. -
Seems like our wallet code is assuming we are always having scripts in the Head transactions but deposit is different. It produces the script output but it doesn't run any scripts.
-
Tracing some code in the Hydra wallet module to figure out why we can't have a transactions without any scripts.
-
I did a conditional attaching of script integrity hashes with
scriptIntegrityHashTxL
and notice we also dorecomputeIntegrityHash
after a tx construction again. -
In case this hash is
SNothing
I just return already constructed tx body so hopefully if there are no scripts the wallet would still be able to construct the correct transaction. -
If I just remove
recomputeIntegrityHash
I got success on the deposit but the question is if we actually need this function for other transactions. Why did we need this in the first place? -
Decommit works just fine without
recomputeIntegrityHash
let's see about the other transactions but they should also be fine I think. -
So it turns out that
recomputeIntegrityHash
was a leftover that needed to be removed and all of the sudden the deposit tx is valid again. Nice! -
Moving on to the observation code now.
-
Writing simple observation didn't seem hard at all. Let; see if it actually works.
-
It worked after some tweaks - we look at the input UTxO and don't try to resolve the inputs as we are only interested if we can decode the datum to the correct one. Should we look at the deadline here too?
-
Moving into the changes in the HeadLogic now. I need to add some code to request a snapshot once a deposit tx is observed.
-
I stubbed out a function to be called on observed Deposit, added new field to hold commit utxo data in the
ReqSn
-
Added
utxoToCommit
to theSnapshot
andalphaUTxOHash
to theClose
datum. This required me to build a lot of modules. Tired now, continuing tomorrow.
-
Moving forward, I got the hydra-tx changes so I want to see the deposit end-to-end workflow working up to the API level.
-
Connected the hydra-tx/depositTx to our API and now doing the same for
recoverTx
-
Ok, the recover is connected too. I am itching to try out if deposit could be imported into a wallet but I need to work further :)
-
Continuing where I left off - making the script output that contains the datum hash so to prove that we either have a problem (most likely) or hydra decommit/fanout work as expected.
-
Realized that the
createOutputAtAddress
is just providing a pub key witness (usingmakeShelleyKeyWitness
). Somehow I need to also add theScriptWitness
but not sure exactly how in this setup and why is this needed now? Shouldn't I add the witness when trying to spend? ... Ok yeah. The error is coming from a hydra-node that tries to spend this script output so makes sense. -
It is actually funny since I was the one implementing the commit feature. I need to send a transaction that contains the script witness for the script I am trying to spend :)
-
Giving it a show and the test is green! hmmm.....
-
Failing test related to snapshot signature verification. This is surprising since I didn't touch the related code.
-
Trying out with removing the validation of some snapshot fields to see which field trips the validation.
-
It is the
utxoToDecommit
field that causes this. -
I can get the valid signature if I make sure that I also produce hash for
mempty
in case this field is not present and I don't see that any code changed at all which is surprising. -
It has to be something related to hashing of the empty UTxO by using
serializeData $ toBuiltinData mempty
. Previously we hadMaybe UTxO
as the argument and now we haveMaybe Hash
but I am not sure if I should spend time exploring this. -
Making the signature accept a
Hash
instead ofMaybe Hash
and fixing all the call places. This should introduce changes to our on-chain code mostly while we would still keepMaybe UTxO
in theSnapshot
. -
This yields some changes I would need to think about :
case
contestRedeemer of
Head.ContestCurrent{} -> -- TODO: think of the consequences of this
toBuiltin $ hashUTxO @Tx $ fromMaybe mempty utxoToDecommit
_ -> toBuiltin $ hashUTxO @Tx $ fromMaybe mempty utxoToDecommit
-
so the check
isNothing deltaUTxOHash
should be replaced with the check for empty hashhashTxOuts mempty
. -
Close mutations are affected by these changes.
H30
is happening now in the mutations forCloseUsed
-
Turns out in the
CloseOutdated
redeemer we need to make sure thatutxoToDecommit
is not present. TheCloseOutdated
means we are closing while the decommit is pending but I am not entirely sure why thenutxoToDecommit
needs to beNothing
? -
This highlights the fact that this knowledge needs to be captured somewhere (it was in my head). Most likely spec has the answer to this - checking.
-
The spec uses different terminolgy
Initial, UnusedDec, UsedDec
soUnusedDec
reminds me of ourCloseOutdated
but this is not the case where we expectutxoToDecommit
to be bottom! It is actually the filename of the mutation test that points to the correct redeemer on-chain! -
We need to make the connection between mutations and the spec/on-chain code obvious to avoid spending much time on this in the future and make it obvious which is which!
-
Working on incremental commits will hightlight this since we will need to add more close redeemers for the close!
- I realized I called deposit drafting
draftIncrementalCommitTx
so renaming that and connecting the chain handle to our API server. - Next step would be to actually construct and serve the deposit transaction but since I still don't have the code to do it I will move to the observation.
- I will not be able to complete the OnDepositTx observation off course but I can hook things up.
- After adding
OnDepositTx
I had to add a corresponding type toChainTransition
but should this really be here? Stubbed out the observation code with undefined's and errors here and there - Stubbed out the on-chain increment validator too and now since I got the green light on the PR I'll start using the new code and finally kick off incremental commits for real!
- Continuing with the off-chain changes for incremental commits
- Separating the
recover
flow to use REST DELETE soPOST /commit
route will be responsible fordeposit
flow as well as normal commits -
Commit
client input now takes aUTxO
instead oftx
- For the next stage I would need the deposit PR merged since I'd like to create and balance the deposit tx and return it to the user for signing/submitting.
- Could quickly specify
DELETE /commit
route for recovering pending deposit - Question arises: Should the hydra-node pay for the
recover
tx balancing or we should serve the tx to the user to balance/sign/submit? - It is a bit better UX to make the hydra-node do it since we don't have an easy way (yet) to balance the tx using cardano-cli and we are not sure if importing tx and balancing in the user wallet would even work (eternl can import the tx but not sure of other wallets)
- Let's make the node balance the transactions since we mentioned a plan already that we could have a feature to make only the requester's hydra-node pay for the transactions once they are the next leader.
- Tackling initial code to handle
DELETE /commits/tx-id
so that the user is able to also recover the deposit
-
Moving to this since I got a review and I need this work to continue on incremental commits
-
Not too bad, I can get this done quickly it seems
-
I should get the deadline and headid from the decoded datum of a recover UTxO
-
Recover mutations seem to be missing here also so I'll need to spend some time to add them
-
One mutation should be one that mutates the deadline of the recover transaction. I expect to see
DepositDeadlineNotReached
D04
. -
In order to do that I need to mutate the input datum deadline.
-
If I just remove the lower bound validity I should get
DepositNoLowerBoundDefined
D03
-
It took some sweet time to compile the mutation that changes the input datum deadline and I still don't get the wanted failure. This happens because of slot/time conversion so I am looking into this right now
-
I noticed I was using the same deadline for deposit and recover so that's one problem. I got the mutation working but broke the recover transaction so that's a bit puzzling. Working with time is always a headache especially in the pure Haskell setting like this.
-
Debugging the current seed:
input datum deadline is 7364703000
recover datum deadline is 7364703001
tx lower validity slot is 7364703
and we get D04
which means we tried to recover before deadline has passed or
said differently recover lower validity slot is not after the deadline.
- Ok, I was re-using the recover slot so found a bug. I really like logging the work since it helps me immensely.
- I could start working on implementing first the offchain parts of incremental commits since deposit/recover PR is now in review. I'll check with Noon where did they left off since they started looking at this initially.
- Ended up looking at haddoc builds and our website haskell documentation. Turns out you can't build haddocs if they are in the testlib (at least it seems that way)
- Ok, hydra-tx initial work is merged, deposit/recover validators and tx building is PR'd so I'll continue further and see where things are standing on the initial commit branch
- First I'll make everything compile again and then focus on the mermaid graphs and make changes which should be well specified already in our Hydra spec
- I removed the mermaid diagram in protocol.md which was I suppose added as a possible alternative and kept the one we are currently implementing.
- Did a small refactor to resolve a TODO in HTTPServerSpec and now looking into issue itself to make sure I don't miss something important before diving in.
- Added the new Chain handle to tackle the new request (could reuse the draftCommitTx too but wanted to be verbose at the beginning at least) and I look around the API server code to determine what changes are needed.
- Seems like we need two more types in order to support deposit/recover drafting. Possible code solution:
...
| IncrementalCommitDepositRequest
{ utxo :: UTxOType tx
, deadline :: UTCTime
}
| IncrementalCommitRecoverRequest
{ utxo :: UTxOType tx
, deposittedTxIn :: Text -- TODO: This should be a TxIn but in the API server we are not yet using the Tx (Cardano) type.
, deadline :: UTCTime
, recoverStart :: Natural
}
-
Slightly problematic is that in the API server level we are still abstract over the transaction type so using for example
TxIn
as the argument torecover
should be done differently. We can always just try to parse the request field into the type we expect off course. -
I wanted to explore first to get understanding and so that we all agree on the execution plan in order to minimize implementation time.
-
Will we call hydra-tx directly from the API server? Do we even need new
Chain
handle?
-
I wanted to add number of recover outputs in the recover redeemer
-
While doing this I realize the recover script witness is not properly attached
-
To fix the healthcheck for recover tx I need to do exactly what I mentioned couple of days ago - build depositTx and obtain the resulting utxo so I can create arguments to the recover tx from it.
-
Ok after all this is done my mutations still reveal the issue
D05
which means the locked utxo hash is not matching with the outputs. -
I realized I actually create a script output from a recover transaction. ouch. What I should do is try to decode the datum off-chain and produce whatever was locked in the datum.
-
I like current work on this as I feel like I am gaining better understanding but it can also be that I was too distracted when I implemented this originally?
-
Looking at what we did to unlock the committed outputs - logic should be exactly the same
-
Ok, so I ended up decoding the datum from the user provided TxOut and had a minor bug with the number of hashed outputs therefore the mutations turned out to be red
-
I have stubbed out the deposit and recover transactions and now it is time to do the same for the increment one.
-
Working on increment transaction should reveal missing parts in the deposit/recover (if any) and allow us to write mutation tests to test out the validators.
-
I drafted increment tx for the mutation tests but it is not yet valid:
Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Left
(ScriptErrorTranslationError (RedeemerPointerPointsToNothing (AlonzoSpending
(AsIx {unAsIx = 1}))))),(ScriptWitnessIndexTxIn 1,Left
(ScriptErrorRedeemerPointsToUnknownScriptHash (ScriptWitnessIndexTxIn 1)))]
-
It seems I didn't attach the deposit script properly but from what I see all is good
-
Since the deposit datum needs to match in the increment transaction I could assemble a deposit tx and grab a UTxO from it to pass it to the increment transaction. Maybe this is the cause of increment not being valid?
-
Ok, this seems to produce nicer errors
Redeemer report: fromList [(ScriptWitnessIndexTxIn 0,Right (ExecutionUnits
{executionSteps = 209565180, executionMemory =
672406})),(ScriptWitnessIndexTxIn 1,Left (ScriptErrorEvaluationFailed (CekError
An error has occurred: The machine terminated because of an error, either from
a built-in function or from an explicit use of 'error'.) ["C02","PT5"]))]
-
By the way
C01
means the PT token is missing and if I disregard this for now I get to seeD02
error which means deposit doesn't have upper bound defined. Nice! we are making progress in terms of deposit tx now. Mutation tests like this are really valuable since they allow us to make sure all parts of the assembled transactions are as we expect. -
If I want to add upper validity for the deposit transaction I need a slot and currently we accept UTCTime in hydra-tx cmd line arguments.
-
If I choose to accept slots how hard it would be to build a deposit datum?
-
I couldn't find appropriate functions for now to go from
SlotNo
->POSIXTime
since this type is what get's set in the deposit datum. -
Turns out I was looking at outdated tx traces 🤦 The spec PR contains the new changes related to these validators.
- Starting to work on https://github.com/cardano-scaling/hydra/issues/1574
- We want to have a deposit (both
$\mu_deposit$ and$\v_deposit$ ) and refund validators in order to be able to commit to a Head or refund to the user in case of unsuccessfull commit - I am looking at the tx trace graphs and mermaid diagrams for these workflows
-
$\nu_deposit$ is parametarized by the seed (outputs user wants to commit) and the$\v_deposit$ validator hash - It seems like it would be easiest to start with
$\v_deposit$ and specify needed mutations first but in order to specify mutations we need off-chain tx construction together with the on-chain code so I'll start with drafting the needed types/functions for$\v_deposit$ instead and then$\mu_deposit$ - I added checks for before/after deadline for the Claim/Recover redeemers, check that head id is correct and deposit token is burned (not complete yet)
- Another trial of magic-nix which uses the github internal cache as a binary cache for
nix
. - Magic nix does not offer out-of-the-box any public cache interaction and we would need to make this work together with cachix. In another exploration, @locallycompact encountered problems by having both enabled: https://github.com/DeterminateSystems/magic-nix-cache/issues/78
- However, before tackling this I wondered what the build time improvement would be?
- Plan: Getting a first single package built and rebuilt with both
magic-nix
orcachix
. - Magic-nix:
- Using
accept-flake-config
to seed magic cache from cachix. - Rerun of
hydra-node
build with primed cache: https://github.com/cardano-scaling/hydra/actions/runs/10418269282/job/28854308115 -> build step only:1m04s
, overall2m49s
(still uploading) - Most time is spent on evaluating the derivation (IFD)
- Post action uploads are rate limited - does a typical rebuild result in enough uploads?
- Seems like magix-nix post action is re-uploading similar store paths on each rerun -> overall build time counts:
3m14s
Are our builds non-deterministic due to IFD? - Rebuild hydra-node (changed version number) with magic nix: https://github.com/cardano-scaling/hydra/actions/runs/10418479282 ->
4m45s
and no pushed paths pushed? - Rerun of the same action also resulted in a rebuild (indeed no paths were pushed) ->
4:19s
, also no paths pushed again
- Using
- Cachix:
- Switching over to using
cachix-action
https://github.com/cardano-scaling/hydra/actions/runs/10419403396 -> rebuild of hydra-node in4m47s
- Rerun of cachix build https://github.com/cardano-scaling/hydra/actions/runs/10419403396/job/28857732485 ->
1m28s
,1m50s
- Dedicated rebuild of hydra-node with cachix https://github.com/cardano-scaling/hydra/actions/runs/10419717683/job/28858340501 ->
1m25s
(accidental cache hit) - Now with a proper, new version for rebuild with cachix: https://github.com/cardano-scaling/hydra/actions/runs/10420234054/job/28859845924 ->
5m04s
- Rerun after this https://github.com/cardano-scaling/hydra/actions/runs/10420234054/job/28860297642 ->
1m39s
- Switching over to using
- Conclusion:
- Magic nix is slightly faster in fetching (from github cache than download from cachix), but also spends more time in pushing paths after builds
- Also saw some paths pushed too often, or not at all (e.g. the deliberate rebuild of
hydra-node
was never cached) - It does not provide public access to the cached derivations
- Not worth it
-
When looking at the
BehaviorSpec
tests related to the fix we wanted to do I noticed actor is requesting a decommit with UTxO they don't own. -
Another problem ( and this is th original issue we started to look at) is can process transactions while decommit pending Here the problem is that we still have around the decommit transaction which was snapshotted but not observed (so it is still both in the new snapshot and the confirmed one) and we don't have a way of detecting if the next snapshot (L2 tx in this case) should ignore the pending decommit and just sign the snapshot.
-
I am tempted to add a field to
ReqSn
to detect if the reqest for snapshot is just L2 tx and in this case not check for decommit tx in the snapshot. -
Would this approach somehow influence also requests for snapshots that have some decommit tx? Not sure at this point but I don't think so.
-
So it is possible to have multiple snapshots with the same decommit tx in case there is a pending decommit but this decommit tx can't be altered until the decommit is settled. Only then you can request a new decommit and the local txs would still continue to work with the pending decommit.
- Browsed https://github.com/dao-xyz/peerbit/ a bit: A replicating
document storage framework that uses
libp2p
under the hood to discover peers and establish transport. - Peerbit applications seem to be structured around
Documents
that are replicated across peers. https://github.com/dao-xyz/peerbit/blob/master/packages/programs/data/document/document/src/program.ts#L89 - Eeach document has an and
index
and aSharedLog
that seems to concern itself with replication to "replicas". - Modifications on documents are
AppendOperations
replicated through the shared log https://github.com/dao-xyz/peerbit/blob/master/packages/programs/data/document/document/src/program.ts#L395 - More details on the sharding approach can be found here: https://peerbit.org/#/topics/sharding/sharding
- The shared log ultimately uses lbp2p pubsub to reach replicas and submit messages https://github.com/dao-xyz/peerbit/blob/5205167d4d1be4176741aa08458a6841f9b32c81/packages/programs/rpc/src/controller.ts#L284
- The
RemoteBlocks
abstraction suggests it does something similar like ouroboros BlockFetch https://github.com/dao-xyz/peerbit/tree/5205167d4d1be4176741aa08458a6841f9b32c81/packages/transport/blocks
- Fix I did to make this work enables us to not apply decommit transaction twice by looking at the previous confirmed snapshot.
- We may want to exclude
utxoToDecommit
on the next snapshot if this is the case. - I will draw transaction traces on the miro board to see possible scenarios
- There is another problem with the pr: by adding the
threadDelay
to make sure we observe transactions on the next block we broke some logging test. - Perhaps I can insert this thread delay only if the caller says so?
- Ended up wrapping the
withSimulatedChainAndNetwork
so it takes extra parameter that determines if there should be a delay when posting/observing a tx. - Other main issue is that we are not sure what are the consequences of
keeping/removing
utxoToDecommit
in the following snapshot/s. - It seems that we want to make sure that we do not sign snapshot if the
version is the same as the confirmed one and the
utxoToDecommit
is different. In this case we should reject thereqSn
request and in case where version andutxoToDecommit
are the same we just want to skip applying it. - There are tests already that help us reason about these changes and I'll focus next on trying to come up with a design for both spec and code that we care to maintain. Currently it feels fragile and it seems we can't predict the exact behaviour of decommit feature since we came a long way just to realize this problem right now.
- Looking at can only process one decommit at once in the
BehaviorSpec
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"serverOutput":{"decommitTxId":1,"headId":"31323334","tag":"DecommitFinalized"},"tag":"ClientEffect"},"effectId":0,"inputId":13,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effectId":0,"inputId":13,"tag":"EndEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"inputId":13,"tag":"EndInput"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"effect":{"serverOutput":{"decommitTxId":1,"headId":"31323334","tag":"DecommitFinalized"},"tag":"ClientEffect"},"effectId":0,"inputId":14,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"input":{"chainEvent":{"newChainState":{"slot":7},"observedTx":{"distributedOutputs":[42],"headId":"31323334","newVersion":0,"tag":"OnDecrementTx"},"tag":"Observation"},"tag":"ChainInput"},"inputId":14,"tag":"BeginInput"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"effectId":0,"inputId":14,"tag":"EndEffect"}
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"inputId":14,"tag":"EndInput"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"outcome":{"error":{"message":"decrement observed but no decommit pending","tag":"AssertionFailed"},"tag":"Error"},"tag":"LogicOutcome"}
- Seems like another issue we have is to somehow loose the local decommit tx before the observation happens?
- I see 4 observations of
OnDecommitTx
so on first occurrence we delete the local decommit tx but afterwards we experience the error? - Then afterwards we see a new
DecommitRequested
and aReqSn
:
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"effects":[{"serverOutput":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"headId":"31323334","tag":"DecommitRequested","utxoToDecommit":[22]},"tag":"ClientEffect"}],"stateChanges":[{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"newLocalUTxO":[],"tag":"DecommitRecorded"}],"tag":"Continue"},"tag":"LogicOutcome"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"serverOutput":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"headId":"31323334","tag":"DecommitRequested","utxoToDecommit":[22]},"tag":"ClientEffect"},"effectId":0,"inputId":15,"tag":"BeginEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effectId":0,"inputId":15,"tag":"EndEffect"}
{"by":{"vkey":"ecc1b58727f3f12b3194881a9ecb9de0b28ce7b207230d8e930fe1bce75e256c"},"effect":{"message":{"decommitTx":{"id":2,"inputs":[2],"outputs":[22]},"snapshotNumber":2,"snapshotVersion":0,"tag":"ReqSn","transactionIds":[]},"tag":"NetworkEffect"},"effectId":1,"inputId":15,"tag":"BeginEffect"}
- And eventual failure because of not bumped snapshot version:
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"error":{"requirementFailure":{"lastSeenSv":0,"requestedSv":0,"tag":"ReqSvNumberInvalid"},"tag":"RequireFailed"},"tag":"Error"},"tag":"LogicOutcome"}
-
I start with a failing test case already where we try to decommit, wait to see the decommit approved and then do a new tx and expect it to be snapshotted correctly
-
Snapshot number 1 is confirmed with utxo = 2 and utxoToDecommit = 42
-
Then we see TxValid/ReqTx by both nodes after ReqSn one node can't apply next snapshot:
{"by":{"vkey":"d5bf4a3fcce717b0388bcc2749ebc148ad9969b23f45ee1b605fd58778576ac4"},"outcome":{"error":{"requirementFailure":{"error":{"reason":"cannot apply transaction"},"requestedSn":2,"tag":"SnapshotDoesNotApply","txid":1},"tag":"RequireFailed"},"tag":"Error"},"tag":"LogicOutcome"}
- So we have a utxoToDecommit in the next snapshot and it is not applicable since the utxoToDecommit was already removed from the confirmedUTxO but we also didn't observe the decommit tx.
- How would we know when the next snapshot comes if we already applied decommit tx or not?
- One idea I had is to check before trying to apply decommit tx and see if in the local confirmed snapshot we already have the same utxo and if that's the case then we would not try to apply the decommit transaction again.
- This works but we need to think if this strategy will work in all different scenarios we can have with L2 transactions while decommit tx is still kept locally/not observed.
-
Turns out the ledger rules are actually preventing empty inputs/outputs in some transaction. In case you don't provide any inputs you get
TxBodyEmptyTxIns
since it doesn't make sense to accept a transaction with empty inputs and in case there are no outputs theValueNotConservedUTxO
is thrown. -
I'll add one test that will assert this is true for decommit transaction and assert on the error so we can call this non-issue done.
- Exploring
hydra-doom
watch mode - Start by adding a command line flag to individually enable
-send-hydra
and-recv-hydra
- Looking around for command line arguments and found the -wss and -connect args from doom websockets.. should we integrate as a net_module too?
- The
doom-workers
repo is a nice, small, vanilla JS application to take inspiration from (good shortcuts likedisplay
) - When only watching with
-hydra-recv
I realized that the datum decoding was actually not working. - Disabling the demo makes it nice and idle if
?watch
is set - After figuring out encoding and decoding the redeemer, submitting
ticcmd_t
pieces this was quite easy. Now the question is.. how would we get a synchronized view/start of the game? How is this working in multiplayer?
- Everything looks good to me. Nice job, I am glad this is finally done.
- I introduced new snapshot type in our transaction traces https://miro.com/app/board/uXjVMA_OUlM=/?moveToWidget=3458764594997541131&cot=14
- There can be
Normal
,Inc
orDec
type of snapshots each signaling their purpose. Normal ones are reserved for new L2 transactions.Inc
snapshot includes some new UTxO into theHead
andDec
takes something back to L1. - For the spec changes I decided on the letter
$\kappa$ to represent this new snapshot type for no specific reason$\kapa \cup Normal \cup Inc \cup Dec$ - The tricky part is to explain all of the possible decisions we can make in close transaction based on this snapshot type and the type of a close redeemer.
- In the end it wasn't that bad to update the close transaction. I just had to specify 4 different close current/outdated with respect to
Inc
andDec
snapshot types and also update the figure to include the snapshot type$\kappa$ in the close redeemer.
- I updated the tx traces we added yesteday to depict the flow with the increment/decrement transactions
- I think we gained some clarity together with Franco
- Updated the drawing with the new snapshot type I propose and the close redeemer addition to specify this type
- Now close needs to distinguish what to do based on the snapshot type:
Normal, Inc or Dec
and the redeemer type:CloseCurrent, CloseOutdated, CloseInitial
- It is time to update the spec with these proposed changes
- Changes in spec mainly need to consist of explaining the new snapshot type where appropriate and use the new type in the increment/decrement transaction.
- Validators need to contain check for conditions I explained in the tx traces https://miro.com/app/board/uXjVMA_OUlM=/?moveToWidget=3458764595074776662&cot=14 which take into account the new snapshot type together with close redeemer type
- This snapshot type will be propagated through the close redeemer too
- Seems hard to find information on when the head was initialized on-chain?
- I used the slot/block when I published hydra scripts for that but how would you find it otherwise?
Answer: We could search for tx metadata. All hydra init transactions contain
HydraV1/InitTx
. - Do we need to expand on the documentation for these corner cases in order to provide more information to our users?
- After putting the close redeemer casing, I had also aligned the "outdated" case signature verification to use version - 1 -> this made tests fail!?
- Using
version
instead ofversion - 1
makes mutation tests green. This is wrong? - EndToEnd tests now also fail with "MustNotChangeVersion" -> this is connected!
- Fixing the version used on close to the last known open state version (
offChainVersion
), makes E2E pass - While the "can decommit utxo" was green right away, "can close with pending decommit utxo" was red first. After adding a delay of 2 secs after submitting the decrement it was green -> race condition in
hydra-node
when trying to close while decrement was not yet observed back. I think this is expected and we need to manage the UX here (e.g. try close again). - In any case, the pending commit E2E test is redundant.. it's not a strictly a pending decommit and the test is the same as the one above
- After mutation and e2e tests are fine, model tests still choke on new
MustNotChangeVersion
- Found an original bug (or current inconsistency?) in HeadLogic: version is incremented whenever the snapshot leader sees the decrement, but not to what the chain observation includes.. could be a race condition (model tests fail)
- When trying to tie the knot correctly in
onDecrementTx
, more and more abstract functions are needed. What is the most common denominator? - Testing correct observation of decrement is not easy as all our tests use "pure transactions" and only e2e tests spot misalignment on "real" transactions (with fees paid, change output etc.)
- Removing snapshot number check from decrement: should motivate not having number in open state datum. Running mutation tests, tx trace and model tests after dropping it.
- Only
UseDifferentSnapshotNumber
failed, and fails with a different reason:SignatureVerificationFailed
. - This is expected as the snapshot number to verify in
decrement
txs is still taken from the datum instead of the redeemer. Let's change that now. - Also an interesting
TxTrace
failure with a singleDecrement
of snapshot number0
. Was expected to fail and does not fail now (is this a problem?) - Sub-typing all datums and redeemers likely makes sense. At what cost?
- When fixing the TxTraces to not expect failure on decrement with snapshot 0, I run into another issue where
contestTx
is called with version0
or otherwise too old version, resulting in a "should not happen" case. - I realize there are pros and cons about
utxoToDecommit
anddeltaUTxOHash
being aMaybe UTxO
andMaybe Hash
respectively.
-
Maybe it can be worthwhile exploring the idea of how to trick the Head protocol into decommitting twice.
-
So we would ask for a decommit after which there would be no further snapshots.
-
Then we would close but the contestation should try to contest with the snapshot we used for decommit.
-
Would the fanout re-distribute already decommitted funds again?
-
Franco wrote an e2e test that wanted to send a decommit request, wait to see it approved but then close instead.
-
The test was failing because of
SignatureVerificationFailed
and it was happening because of the version missmatch. -
I fixed this by updating the version check:
- snapshotVersion = alreadyBumpedVersion - 1
+ snapshotVersion = max (version - 1) 0
-
This way we don't go below 0 which was causing the snapshot validation to fail.
-
This worked and got us to another interesting problem. Close transaction was failing with:
InsufficientCollateral (DeltaCoin 0) (Coin 2588312)``
-
This is interesting since the collateral input had around 21ADA available which should be more than enough.
-
I was looking at the cardano-node logs and what seems to happen is that the posted decommit tx got into the mempool just before the close one. The close tx was then failing with the error above but this doesn't explain the error I was seeing. What I would have expected is the close tx to fail with
BadInputsUTxO
. -
Our suspicion is that since the hydra-node has never observed the DecrementTx taking place on-chain, our local view of the wallet UTXO is invalid. This is likely causing the collateral error we are experiencing.
-
We decided to leave exploring this for now since it seemed like a race condition between the decommit and close tx but still I'd like to get to the bottom of it.
-
I wanted to prepare a small demo for the team and demonstrate that decommits are working as expected.
-
In the process of decommitting I had small problems submitting a decommit tx since the balancing is on user side so setting the fees is smallish annoyance.
-
After successfully decommitting (a single utxo of 10ADA) now the head is still open but there are no funds inside.
-
Posting a close yields:
An error happened while trying to post a transaction on-chain: FailedToPostTx {failureReason = "TxValidationErrorInCardanoMode (ShelleyTxValidationError ShelleyBasedEraBabbage (ApplyTxError (UtxowFailure (AlonzoInBabbageUtxowPredFailure (ExtraRedeemers [AlonzoSpending (AsIx {unAsIx = 0})])) :| [UtxowFailure (AlonzoInBabbageUtxowPredFailure (PPViewHashesDontMatch (SJust (SafeHash \"0e96eb70e55ac2cfa0b9d93aa5ef3fd57dff54de704d72b7d355896581e70a85\")) (SJust (SafeHash \"2862039cf16f77fc21cb3673a4cb6e9185636c9b6a3e519f3ec8416a4af469e5\")))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (ValueNotConservedUTxO (MaryValue (Coin 0) (MultiAsset (fromList []))) (MaryValue (Coin 9982912420) (MultiAsset ( fromList [(PolicyID {policyID = ScriptHash \"ff0fc67587997fc3330fb29dff6c388166baaf0d62f934882deee38d\"},fromList [(\"4879647261486561645631\",1),(\"f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d\",1)])])))))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (BadInputsUTxO (fromList [TxIn (TxId {unTxId = SafeHash \"11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2\"}) (TxIx 0),TxIn (TxId {unTxId = SafeHash \"11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2\"}) (TxIx 2)])))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure NoCollateralInputs)),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (InsufficientCollateral (DeltaCoin 0) (Coin 2586728))))])))"}
-
So we get InsufficientCollateral errors and now I want to see why would hydra-node's internal wallet yield this error.
-
If there are no funds in the Head this should not be related to what hydra-node internal wallet owns.
-
Ok, the error above seems to be just the outcome of trying to Close multiple times without waiting for a close tx to be observed BUT fanout yields two UTxO with the same amount!
{
"11e5ce75e751d901d4f233960ffa6e86f4c9e5c3f9c3a3c276eea5a38da213e2#1": {
"address": "addr_test1vp5cxztpc6hep9ds7fjgmle3l225tk8ske3rmwr9adu0m6qchmx5z",
"datum": null,
"datumhash": null,
"inlineDatum": null,
"referenceScript": null,
"value": {
"lovelace": 9836259
}
},
"cddf3345a486396a8516418244c54161aab511efb99234580836114fa0f5b6cc#0": {
"address": "addr_test1vp5cxztpc6hep9ds7fjgmle3l225tk8ske3rmwr9adu0m6qchmx5z",
"datum": null,
"datumhash": null,
"inlineDatum": null,
"referenceScript": null,
"value": {
"lovelace": 9836259
}
}
}
which means we decommitted 98362259 lovelace, then closed with the same
ConfirmedSnapshot
which had the utxoToDecommit
field still populated and
that ended up creating another output with the same amount! A BUG!
-
We went through the decommit model for testing and determined we can't just randomly produce some utxo to snapshot. We need to maintain the utxo set of the head if we want to be able to decommit something but not alter the head value.
-
I did a spike yesterday and tried to evolve the head utxo set in the model but decided to drop it since I noticed even before this change there were some flaky tests.
-
Now I want to make sure everything is green all the time before I try to add decommits again.
-
I ended up removing the
Augmented
snapshot number constructor. We didn't account for them in thevalidFailingAction
and it was causing weird behavior in the presence of negative testing. Now all non-decrement actions are green and I need to come up with a way of evolving the head utxo and make sure there is some value to decrement before actually attempting to decrement (TxBodyOutputNegative
errors). -
I enabled
Decrement
actions again and made sure that the head utxo value contains the initial utxo value at the start. Then, when decrementing I make sure to pick some utxo from thespendableUTxO
and decrement it. This way all is green but we also need to make sure to test actions with various snapshots. -
This is not so straight forward to do and I need to think about the easiest way to do this.
-
Perhaps keep the seen snapshots in the model and occasionally pick one of the old ones?
-
Are the negative actions we produce enough to say the decrement is working properly?
-
Seems like everything is green just because we
adjustUTxO
and not really remove anything from the head utxo. -
I want to add a check to make sure the value in the Head is reduced right after a decrement.
- Lots of pain with emscripten and typescript, but forgot to keep track of log items... anyhow:
- Websocket.addEventListeners were leaking heavily, with removeEventListener things seem fine. I wonder though, is there a way to re-use "subscription infra" on the frontend and put events into a queue of some sorts?
- When running the scenario for a while, the log of the hydra-node grow heavily (should control log density). So I removed the file from disk (the file descriptor likely was still existing) .. and perceived performance got a bump!?
-
Last thing I remember is that we experienced
CannotFindHeadOutput
errors sometimes when running theTxTraceSpec
tests. There is some flakyness here but most of the time the tests are green which is not something I would expect. We wanted to see failures in close/fanout related to Uw (U omega) addition/checks. -
Sprinkled some tracing in the close/contest/fanout tx creation to see what kind of UTxO is there and detect why Head output is not found.
"Close: 9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace\n52f744b6ac#71 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14" "Contest: 1d3367a1b0#0 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14\n9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace" "Fanout: f5110cd941#0 \8614 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.4879647261486561645631 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.6749e4613f026c0e05707edb47e9fcf72a74400b6540a6c65a50dc78 + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.67ec7ad74d4eab5c5a0fcaf7fc333d3e6e64c2488161ac182ce7fe8f + 1 25e84a2978367f8eaf844ab0ec17b27f7af27b0f4277fbc43717cccd.ae254a0090f3a1995aab7b20d8b16d36f5925998e2d503815f28cb14\n9553a3fd89#0 \8614 100 lovelace\n9553a3fd89#1 \8614 100 lovelace\n9553a3fd89#2 \8614 100 lovelace"
-
From the traces above I don't see clearly the head output with some PT's
-
I disabled the
Decommit
action generation and the same problem still exists- so this is not tied to the changes I added for decommit action. I'll try to go back few commits to assert tests were green before decommit snapshot addition.
-
Going back just leads to missing decommit utxo in the snapshot errors.
-
Tracing in the
State
module suprisingly reveals the head UTxO is actually there??? -
Ok, removing shrinking with
qc-max-shrinks=0
reveals the problem again.
Assertion failed (after 4 tests):
do action $ Close {actor = Alice, snapshot = Augmented 3}
failingAction $ Fanout {snapshot = Normal 3}
action $ Fanout {snapshot = Augmented 3}
pure ()
Model {headState = Closed, latestSnapshot = Augmented 3, alreadyContested = []}
Fanout {snapshot = Augmented 3}
Failed to construct transaction: CannotFindHeadOutputToFanout
-
Seems like we always seem to produce something to decrement (decommitUTxO in the snapshot can be
Just mempty
) which doesn't make sense (and there was a comment in the PR related to semantics ofJust mempty
vsNothing
). Let's fix this first since I think it is tied to missing Head utxo problem above. -
After directly producing the snapshot with something to decrement we get the problem of negative output
TxBodyOutputNegative
which is expected. We just generate snapshots without taking care the actual UTxO is part of the Head UTxO. -
I think the next step is to make sure whatever we decrement needs to be part of the Head UTxO.
-
After this change I see
H09
orHeadValueIsNotPreserved
error in the decrement tx. -
This means that the head input value does not match the head out value since we added decommit outputs.
-
If I update the UTxO state in the run model with the new UTxO containing the decommit outputs then I get
H24
which isFannedOutUtxoHashNotEqualToClosedUtxoHash
. -
This points to the fact that we don't do a good job of maintaining the UTxO set between close/fanout in our specs. But we wanted to keep it minimal so that was fine so far, maybe it is time to add more UTxO modelling in our specs.
-
We seem to think that the cardano-cli is using the TextEnvelope transaction format. At least this is what we expressed in one of our tests (accepts transactions produced via cardano-cli).
-
This test was generating an arbitrary transaction and then serialise it to the text envelope format and check if we can parse it back.
-
This test now fails because the tx type we expect is not correct. The transaction type depends on the fact that tx is signed or not ("Un/Witnessed Tx BabbageEra") but the
serialiseToTextEnvelope
produces just "Tx BabbageEra" which makes our test red. -
I tried assembling a tx using cardano-cli and noticed that what we get is the "Ledger CDDL" format instead and proper string for the tx type ("Un/Witnessed Tx BabbageEra").
-
So it seems there is some discrepancy between what we expect and what we actually get as the cardano-cli tx format.
-
Progress. I can see the blockfrost submit error with the instructions provided by MLabs.
-
Seems like the correct hash for the commit tx is detected (
12e5af821e4510d6651e57185b4dae19e8bd72bf90a8c1e6cd053606cbc46514
) but for whatever reason it doesn't hash to61f765d1abd17495366a286ab2da9dbcb95d34729928f0f2117f1e5ad9f9e57b
which was expected hash.
-
Last thing I did in the new model tests for decommits is to add the actual UTxO to decommit to our modelling of the Snapshots.
-
This led to H35/34 validator errors which means the Snapshot signature is invalid and in turn it means whatever outputs wanted to decommit are not complete in the Snapshot and/or they are not properly signed.
-
Tracing the produced Snapshot with something to decommit reveals that decommit output is actually present in the decommit transaction. Does this mean the snapshot is not properly signed?
-
Indeed this is the case. We create a snapshot and the signatures but don't set
utxoToDecommit
. Afterwards we set theutxoToDecommit
before callingdecrement
function but the signatures are invalid at that point. -
After re-signing the proper snapshot with decommit output we go further and the model spec fails with invalid Decrement action which is expected to fail.
-
Turns out the problem was that I was removing the decommit outputs from the utxo set and this is not what we want to do since the snapshot utxo is just generated and was never actually committed.
-
Now we come to a problem where head output was not found for contest/fanout txs.
-
Our users MLabs are having issues with the commit tx produced by hydra-node. The issue is that the auxiliary data hash is not correct.
-
Our current tests are all green and we explored the wallet code in order to figure out at which point the auxiliary data hash becomes invalid but with no success.
-
There are both property tests and the tests that actually complete the whole workflow and return the signed commit tx to the user but nothing reveals the issue.
-
The last attemp was to recalculate the aux hash in the conversion from/to ledger tx but this turned out to not be a proper fix.
-
After this I tried to assert the following for different combinations of:
- Aux data and hash is valid when create tx using
commitTx
function againstdraftCommitTx_
- Aux data and hash is valid when create tx using
draftCommitTx_
function against signedcommitTx
tx - Aux data and hash is valid when we encode/decode tx to make sure that the aux data is not lost in the process and all the tests are green ...
- Aux data and hash is valid when create tx using
-
Is this somehow blockfrost specific? I know MLabs are using blockfrost to submit this transaction and so maybe this is worth exploring.
-
They have the workaround by just recalculating the aux hash since they have a way to re-sign the hydra transaction but this is not a good solution.
-
I think further steps are to try to reproduce the issue together with MLabs and see if we can find the root cause.
-
When working on reproducing the contest bug we found out that our mock chain layer is not validating transactions properly.
-
When calling the
evaluateTx
we getEither EvaluationError EvaluationReport
but we never check ifEvaluationReport
contains someScriptExecutionError
-
Another problem we found is that
addNewBlockToChain
updates the new slot and block but keeps theUTxO
same. -
We added check when trying to
submitTx
to see if the tx we are trying to submit is actually valid, but, with the changes above we observeH03
error which isReimbursedOutputsDontMatch
. -
We error out in the
checkAbort
with this error in case the hash of the committed UTxO doesn't match with our hashed outputs in the abort. -
So since the error points that there is perhaps something wrong in the way we simulate commits I added check in the
performCommit
to make sure when we see theCommitted
output, the UTxO inside matches with what we actually committed. I think we were observing other node's commits and that led toH03
we were seeing. -
Now we get a completely different failure (seed is
590033147
):
TransactionInvalid (TransactionValidityTranslationError (TranslationLogicMissingInput (TxIn (TxId {unTxId = SafeHash "2c7ce242a2cd37f5d53e0c733985eb1430a2cbb52ade7e2d95225f0be578e64c"}) (TxIx 5))))
-
The only thing is, when printing a tx I actually see the input
2c7ce242a2cd37f5d53e0c733985eb1430a2cbb52ade7e2d95225f0be578e64c
with Ix 5 so WAT is happening. -
I see that we need to invoke
toApi
on our utxo in theevaluateTransactionExecutionUnits
and this is justcoerce
so it shouldn't matter but I added roundtrip tests anyway just to regain some sanity.
- Worked on the model and some back and forth, succeeded to issue close
transactions of the
initialUTxO
available in the mock chain (surprised that it is kept around and made available asfutureCommits
?) - The model did not fail and wee seemingly can contest and fanout without problems. Let's look back at the logs of the bug issue
- State changed events: Both nodes see the
HeadClosed
at slot38911656
with deadline2024-01-18T08:48:47Z
- After four blocks with
TickObserved
, theHeadIsReadyToFanout
- Node 1 stores a
ChainRolledBack
state changed event, seemingly to slot38911656
(which would be exactly the point where the head closed) - Node 2 has two rollback events, one to slot
38911660
(after close) and then to38911656
(at close) - In the logs, node 1 observes at
2024-01-18T08:47:36.188861Z
in slot38911656
aOnCloseTx
, at2024-01-18T08:47:40.410448Z
in slot38911660
aOnContestTx
with snapshot1
, and at2024-01-18T09:09:22.762856Z
in slot38911660
anotherOnContestTx
of snapshot1
(Arnaud mentioned he tried to wipe the state, re-observe and try fanout again) - Node 1 tried to post twice the
CloseTx
, first on2024-01-18T08:47:04.257801Z
with success producing tx94a7f0b86ee448dedf3667114f968dd825055bee4c90379a4659d8563b2e7d4a
, the second attempt at2024-01-18T08:47:28.565181Z
failed withTxValidationErrorInMode (ShelleyTxValidationError ShelleyBasedEraBabbage (ApplyTxError [UtxowFailure (AlonzoInBabbageUtxowPredFailure (ExtraRedeemers [RdmrPtr Spend 0])),UtxowFailure (AlonzoInBabbageUtxowPredFailure (PPViewHashesDontMatch (SJust (SafeHash \"1119bb152dc2c11c54389752f0d6c8398912582a73d2517570e9898c1818bb3e\")) (SJust (SafeHash \"7bf7ccdce09da7e95d17ed5faf048b52f3d77a9c96928dd7656ad4e1647059b1\")))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (ValueNotConservedUTxO (MaryValue 0 (MultiAsset (fromList []))) (MaryValue 89629254 (MultiAsset (fromList [(PolicyID {policyID = ScriptHash \"1f8f4076f72a0427ebe14952eb0709c046a49083baef749920cf6bfa\"},fromList [(\"1ad7cb51c9e2d6250bd80395a5c920f6f628adc4b1bd057a81c9be98\",1),(\"dc9f50cac6e292e1aeaa438c647010f7f6f4fc85010092ff6cb21cf3\",1)]),(PolicyID {policyID = ScriptHash \"3ee6157268c7a3b7d21111b6772bc06aed0f425ef64b285223b308fc\"},fromList [(\"2d2da703f0e1fb682f997f121e3f90084d12c11203dc8c1314da16d3\",1),(\"4879647261486561645631\",1),(\"c312067b108b2ddd1a50ad1e757720570100fad3f14d6f88fd64ed42\",1)])])))))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (BadInputsUTxO (fromList [TxIn (TxId {unTxId = SafeHash \"3b6ded925ce15cffc3f47ebee62242e75bb57b63d7dad943397ebe55112c7094\"}) (TxIx 0),TxIn (TxId {unTxId = SafeHash \"b3704d8d7d7067c4b6910e4dd84b2074ffa4713027c14d81f18928ae3700d40d\"}) (TxIx 1)])))),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure NoCollateralInputs)),UtxowFailure (UtxoFailure (AlonzoInBabbageUtxoPredFailure (InsufficientCollateral (Coin 0) (Coin 5293200))))]))
- Node 1 did post a
ContestTx
at2024-01-18T08:47:36.189167Z
with id495878f0639aa2aed384af4b72bcdea058b339e6fd5fc6f63b5e7238d7297baf
- Then node 1 did post a
FanoutTx
at2024-01-18T08:49:18.378833Z
which failed validation with error codeH25
- All further attempts to fanout from node 1 failed for the same reason (2024-01-18T08:50:07.49598Z, 2024-01-18T08:58:38.158696Z, 2024-01-18T09:03:42.313769Z, 2024-01-18T09:17:42.869217Z)
- Node 2 did close the head with the initial snapshot at
2024-01-18T08:46:49.728882Z
with tx id5a4ed22b4e8137118eff07f26b366e031d880e66ebf11b3fafa130b50fc95441
on slot38911656
- All fanout attempts of node 2 failed with error code
H24
, on2024-01-18T08:49:18.323083Z
,2024-01-18T09:04:42.427032Z
and2024-01-18T09:18:44.892685Z
- Node 2 observed
OnCloseTx
at2024-01-18T08:47:36.18Z
on slot38911656
,OnContestTx
at2024-01-18T08:47:40.478408Z
on slot38911660
and the sameContestTx
again at2024-01-18T09:09:26.563872Z
- Unclear what actually caused it
- Situation: Tx requestor did not sign snapshot, the others (including snapshot leader) did
- Consequence: No re-requesting of snapshot, original requestor can't do anything now?
- Can we test this?
- Could / should simulate this in tests (e.g. by deliberately using wrong hydra keys in BehaviorSpec)
- Manipulating
state
is hard, especially as we also haveacks
andnetwork-messages
which need to be consistent- Why are we storing non-confirmed data? e.g. things that aggregate into
localUTxO
andlocalTxs
- Why are we storing non-confirmed data? e.g. things that aggregate into
- Coordinated reset https://github.com/input-output-hk/hydra/issues/1284?
- Not loading in-flight information from persistence would have the same effect
- Gossip of signed snapshots is related to this as well (and would not require persisting network messages?)
- Use the chain to check point a snapshot to coordinatedly (safe) reset point?
- Multiple
ReqSn
for same snapshot (upon new transactions "in the mempool"?)- Can't recover from non-consensus
- Remove
localUTxO
to at least void "inconsistent" local views and instead do that only upon Snapshot request / validation? - This whole thing feels super ad-hoc
- Concrete next step: Sasha pays for closing and initiates a new head
-
Tried removing the commit witness in order to test out if the produced cbor is compliant with the HW/CIP-21 but we get
ScriptFailedInWallet
when trying to cover the fee using internal wallet. -
Not sure if I really want to pursue this further but at the same time it is interesting to see the needed changes.
-
I might just do this in my free time. One option is to make the user pay for the fees instead of hydra-node wallet.
-
Trying to figure out how to produce commit tx from hydra-node that is compatible with HW.
-
There is no option to import the tx into trezor-suite and simply test that it works.
-
Pasting here the answer from the Eternl guys just for the reference:
The issue comes down to the body map ordering like I said in the beginning of
this thread. I then said that it was a requirement to have all maps sorted
canonically according to CIP21. This is the issue we face with Eternl as well.
Eternl parses the input cbor, extracts the body cbor raw as is, ie as its
sorted in original cbor.
This raw body cbor is hashed and signed.
Eternl then needs to re-construct a final tx, for this, CSL is used to
de-serialize the tx, inject the witness Eternl creates and then outputs the new
tx cbor.
But when this is done, the body map is sorted in canonical order and thus the
body doesn't match the original one any more.
So the witness created by Etenl is correct, but the tx cbor returned contain a
different structure for the body
Possible solutions
- Make sure that tx created follow CIP-21
- After creating hydra tx, pass it through cardano-hw-cli transaction transform
to reconstruct body to follow CIP-21. Must be done before any witnesses are
added.
- Change Eternl to put tx back together using the original body cbor and not CSL parsed body.
-
cardano-hw-cli
tool is not really acceptable to us since we should be able to figure out how to produce appropriate tx using cardano-api. Seems like what we produce is just not compatible with CIP-21.
-
Continuing work on incremental decommits I wanted to add the off-chain observation code.
-
Looking at contest tx which should be similar to decrement in a sense that it consumes Head output and produces another one.
-
I added the observation code without too much problems but need to revisit to make sure I didn't skip some expected checks.
-
Now our test fails because of
NotEnoughFuel
which makes sense since we are actually setting fees to zero inmkSimpleTx
. -
Increasing the amount of seeded lovelace for hydra internal walled fixes this issue and now the only thing that fails is the last check in the test where we want to assert that the UTxO we just created is adopted and can be queried from cardano-node.