-
Notifications
You must be signed in to change notification settings - Fork 41
Stuck verifier due bad block stop 3 different sentinels and drop out just now #45
Comments
We will look into this issue. If the block signature was good, this is likely an issue with the "wrong" block being frozen by the verifier, which would indicate a failure of the consensus mechanism. We haven't seen any consensus issues since the transaction limit was added to block scoring in v560, but it's certainly possible that this represents a consensus issue that hasn't been seen before. Thank you for the block and the logs. They will be helpful. |
But why 3 different sentinel that track 10 to 20 verifiers got stuck from that verifier and open way to drop most of my verifier if I had not checked + holo6 that was dropped out after ~4-5h (the issue or attack happened when I mostly sleep) |
The two ways that a verifier can be legitimately dropped from the cycle are:
While most of the removals lately have been due to the second case, the first case was also a common cause for removals, especially when we had consensus issues. These removals will happen almost immediately, because the only other choice is for the blockchain to stall. This case would happen if both the verifier and all sentinels protecting the verifier got stuck on the one incorrect block. There are a few interesting potential solutions to this problem, but the most sensible would be to add a concept of a "semi-frozen edge" or "pseudo-frozen edge" to the sentinel. In this case, the verifier would still get stuck. For reasons that are a little complicated, we actually want the verifier to get stuck in this case, because allowing a verifier to get itself unstuck automatically would introduce some difficult problems. But if the sentinel is able to cover for the verifier, that would buy enough time to manually fix the verifier. This also highlights further that an alternate entrance process, where the manual override is actually used, would strengthen the system. In this case, if we see that holo6 was removed due to an issue with the code, the appropriate action would be to promote holo6 to the top of the entrance queue and allow it to rejoin. This would also reduce concerns of malicious removals, as the cycle allowing a verifier back in would suggest that the cycle did not have any malicious intent against the verifier. |
I also encountered this problem.The incycle verifier got stuck,the latest block can be obtained by the sentinel but it can't update frozen edge for ManagedVerifier. sentinel's log: |
@awshrb This appears to be a different issue. In your case, the sentinel is continuing to track the blockchain and protect verifiers. In the sentinel log, the latest frozen edge is 16,046,092, which is 1,376 blocks past the frozen edge of the stuck verifier. The log statements you are seeing are monitoring that was added in version 612.
This is a special check that is done to see if a verifier is truly behind or whether the sentinel has just gotten unlucky in its recent block requests to that verifier. This happens in the Sentinel.updateFrozenEdge() method. |
@reb0rn21, this is a really interesting case. Your verifier did actually freeze a different block than the rest of the cycle, which would be a failure of the consensus process. We saw a lot of this back when the blockchain stalled all the time, but we hadn't seen it happen in the past few years. The block that your verifier froze had hash The blocks differed in that one included a If we look now at the identifier that produced the block (https://nyzo.co/status?id=bf9e.91c1), we can see two verifiers running on that identifier. The system was not actually designed for this, and we were planning to add some code to handle multiple verifiers running at the same time. Running of multiple verifiers on the same identifier always presents a challenge to the consensus, and it actually looks and behaves exactly like a brute-force attack on consensus would look and behave. The first of the two verifiers is We'll add monitoring to see if other cases of this happen. We did not have the consensus tracking introduced in v543 (https://tech.nyzo.co/releaseNotes/v543) enabled on any of our verifiers for this block. We'd need that level of monitoring to have information that would be useful for tuning the consensus algorithm directly. However, we can still begin to address this by adding the pseudo-frozen edge to the sentinel. We'll get that into our planning for development. If/when we collect more information about a similar issue in the future, we will look at how that information might be applied to a tuning of the consensus algorithm. As I mentioned, this further highlights the need for an alternative entrance process. Issues like this, where a verifier is dropped due to a core issue in the code, should allow for an easy manual override to let the verifier back in. There is a manual override now, but it isn't easy enough to be used frequently. A proper alternative entrance process would also address the join/drop situations. It appears that many users are waiting months to get in, finally join the cycle, and then drop after a cycle or two. |
@reb0rn21 To ensure this issue doesn't happen again without being noticed, we've added a check for hash on the nyzo.co web server. We also made a small modification to Currently, this is only displaying for verifiers in the cycle. However, it has detected a few verifiers that got stuck on block 3835422, which was a blockchain stall that we had back in July of 2019: These out-of-cycle incorrect blocks are not of consequence; they are just interesting. |
holo6 just dropped out:
*** left at block 16025645, cycle length: 2574
verifier version 607
VPS had no issue CPU load was <30%
at block i_016023877 was corrupt verifier got stuck and all sentinels 3 of them that tracked him (each sentinel had 10-20 verifiers in total) got stuck at same block, the holo6 then dropped out some 3-4h later as none of 3 sentinel protected it
holo6 score 6 hours after the drop (score was perfect before it drop a bit only as it was stuck for ~4h before the drop):
holo6 -137532 360 -137574 180 -137520 360 -137544 360 -137874 360 -148116 0 -148106 0 -148116 0
I will post logs of sentinel, holo6 log and will post block that do not match hash of the others, I do not want to presume anything how can perfectly fine verifier that had none issue on past 1year just get bad block all of sudden on top of it how that bad block can kill three sentinels that tracked holo6
sentinel where version: 610, 612 and 3th I updated so I am not sure which version it was but it was 608+ maybe even 612+
All 3 sentinel had to be cleaned of all block and resynced to get working again
holo6 there is log + block that was malformed, only sentinel2 have the log when it got stuck rest I was late to save logs
nyzo.zip
The text was updated successfully, but these errors were encountered: