hotfix(operator): manage subscription to events without panic #1729

Open
wants to merge 12 commits into
base: testnet
from

JuArce
Collaborator

@JuArce JuArce commented Jan 9, 2025

Manage Operator's event subscription

Description

Previously, the Operator panicked if one connection (either main or fallback) failed.

The previous solution to this (#1692) was to iterate infinitely over both connections. This had a "silent failure" problem: an Operator could fail and retry forever without its owner being notified.

Now, if one connection fails, the Operator uses the other one while continuously trying to reconnect to the failed one. If both connections fail and the retries are exhausted, the Operator exits and logs the errors from each RPC connection, so the owner or manager of the Operator server can fix the problem.

This PR also adds use of the fallback connection when the Operator calls DisableVerifiers.

To test

Follow the guide in #1692 to set up a proxy to your anvil node. I recommend setting up two proxies so you can control each Operator RPC connection individually, using the following docker-compose.yml:

version: '3'
services:
  nginx:
    image: nginx:alpine
    container_name: nginx-anvil-proxy
    volumes:
      - ./nginx:/etc/nginx
    ports:
      - "8082:8082"

  nginx2:
    image: nginx:alpine
    container_name: nginx-anvil-proxy-2
    volumes:
      - ./nginx:/etc/nginx
    ports:
      - "8083:8082"

  • Verify that the Operator behaves as described: while submitting proofs, take down one connection, then take down the other connection, then take down both connections.

Type of change

Please delete options that are not relevant.

  • New feature
  • Bug fix
  • Optimization
  • Refactor

Checklist

  • “Hotfix” to testnet, everything else to staging
  • Linked to GitHub Issue
  • This change depends on code or research by an external entity
    • Acknowledgements were updated to give credit
  • Unit tests added
  • This change requires new documentation.
    • Documentation has been added/updated.
  • This change is an Optimization
    • Benchmarks added/run
  • Has a known issue
  • If your PR changes the Operator compatibility (Ex: Upgrade prover versions)
    • This PR adds Operator compatibility for both versions and does not change batcher/docs/examples
    • This PR updates batcher and docs/examples to the newer version. This requires that Operators are already updated to be compatible

@uri-99 uri-99 marked this pull request as ready for review January 13, 2025 14:36
Contributor

@uri-99 uri-99 left a comment

biased self-approve

@Oppen Oppen added audit cantina Audit report from Cantina labels Jan 20, 2025
@JuArce JuArce removed audit cantina Audit report from Cantina labels Jan 20, 2025
Collaborator Author

@JuArce JuArce left a comment

The same comments apply for V3

core/chainio/avs_subscriber.go (outdated, resolved)
core/chainio/avs_subscriber.go (outdated, resolved)
aggregator/pkg/subscriber.go (outdated, resolved)
core/chainio/avs_subscriber.go (outdated, resolved)
3 participants