-
Notifications
You must be signed in to change notification settings - Fork 351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: minimal e2e framework #1586
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1586 +/- ##
=======================================
Coverage 50.96% 50.96%
=======================================
Files 92 92
Lines 5751 5751
=======================================
Hits 2931 2931
Misses 2520 2520
Partials 300 300 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is dope! I appreciate the similarities to the tendermint e2e test, and see how we can add the desired tests.
Are we wanting to eventually enable celestia-node to use this as well? I could see them being able to reuse a lot of the code here.
mainly just a had a few questions, but other than that I think it LGTM, and we should merge and start adding tests/iterating as needed
services: | ||
{{- range .Nodes }} | ||
{{ .Name }}: | ||
labels: | ||
e2e: true | ||
container_name: {{ .Name }} | ||
image: ghcr.io/celestiaorg/celestia-app:{{ index .Versions 0 }} | ||
entrypoint: ["/bin/celestia-appd"] | ||
command: ["start"] | ||
init: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my local docker setup might be wonk, beacuse I had to add
user: root
in order to get the test to work. If not, the test will appear to be waiting, but none of the nodes actually start because they do not have permission to write to the volume.
does anyone know what I should do to fix that locally? change the group permissions I have for docker? cc @sweexordious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. The docker image runs using user 1001 however I made the permissions of the volume 0o777 so everyone should have read write execute access.
Yeah Potentially. Matt mentioned having a common repository for e2e testing of various node types in a system. If we are to do this I'd definitely recommend reusing parts of this code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome stuff 🎉
When I run this locally, the validators start correctly, but when I want to stop them using ctrl + c
, everything hangs until I kill the terminal session.
When this happens, I check the running docker images and I see none, so probably this is not related to docker runtime.
Is that the correct way to stop the framework?
Is anyone having the same issue or it's just me?
It should catch the sig term and just cancel the context. Let me dig in a little. In any case running |
cc @celestiaorg/testing for visibility |
Few questions from my side why we opted for in-house framework instead of using interchaintest?state-sync example why not just re-use their implementation of chain creation using docker-compose and networking?https://github.com/strangelove-ventures/interchaintest/blob/main/chain/cosmos/cosmos_chain.go It has the same capabilities as we have rn. Same with network pruning/setup imho, it looks like they have took care of docker networks cleanup which is clumsy if we want more control over implementation - why not reuse compose-spec like libp2p does?Seeing the matrix versioning schema that libp2p does - we can def reuse the hard work they pushed. Pros of aboveBoth have CI/CD implementation already meaning that we can speed up dev time and contribute back if something we really need |
My main concern over this PR - we are signing up on maintaining long-term, while there are tools that can help us achieve the same without maintaining it (interchaintest is constantly being updated, same is for libp2p/test-plan) |
Hey,
The simple answer is we have a modified fork of the SDK which is not compatible with Secondly, interchaintest is very IBC orientated. This means some of the initial goals of upgrade testing (we will have a different upgrade mechanism to most cosmos chains), compatibility testing and sync testing don't seem to natively fit their framework. Lastly, for an open source library, I think there documentation and tutorials around using their library are lacking (although the inline commenting is relatively helpful). As well as maintenance we have to also consider onboarding costs to understanding a new (and large) codebase (when something simpler might suffice).
I'm sympathetic to this. Ideally we reach our testing goals with little duplication and overlap. When we spoke prior to this it seemed like testground was not suitable. Rene tells me that robusta uses a different more light-weight framework that may be ideal for the outlined purposes. I'm not sure about the other (libp2p) library you mention but I definitely think that what's done here is quite generic: It's simply managing a series of binaries within a network and using the RPC endpoints to make assertions. What I'm more concerned about is that it seems like these testing goals are perhaps the same across the other teams (node have their "swamp" testing) which seems, at least to me, to heavily overlap with what core/app is doing. If we have a "testing" team then perhaps efforts from individual teams can be consolidated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. I'm encountering an error while trying to run locally
$ ./build/e2e -f networks/simple.toml
Setting up network simple-53126
Spinning up testnet
Starting validator01 at height 0 on http://localhost:4202
Error: starting network: exec: "docker compose": executable file not found in $PATH
I have docker compose installed in my $PATH:
$ docker-compose --version
Docker Compose version v2.15.1
$ which docker-compose
/usr/local/bin/docker-compose
- Should we list that installing docker compose is a prerequisite (potentially in README.md)?
- Am I missing a different prerequisite step?
@@ -79,6 +93,7 @@ require ( | |||
github.com/dgraph-io/badger/v2 v2.2007.4 // indirect | |||
github.com/dgraph-io/ristretto v0.1.0 // indirect | |||
github.com/dgryski/go-farm v0.0.0-20200201041132-a6ae2369ad13 // indirect | |||
github.com/docker/docker v20.10.21+incompatible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[informational] I was concerned by the +incompatible
suffix but it seems ok per https://stackoverflow.com/a/57356777
testing/e2e/pkg/rpc.go
Outdated
return nil, errors.New("network is not running") | ||
} | ||
|
||
// return heights in ascending order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] this line claims to sort heights in ascending order but the function godoc comment claims to return heights in descending order.
// GetHeights loops through all running nodes and returns an array of heights
// in order of highest to lowest.
func GetHeights(ctx context.Context, testnet *Testnet) ([]int64, error) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
There are 2 topics that got confused and we clarified them here. The original idea behind this PR is for teams to do nightly tests using minimal e2e framework. I thought that this is only for core/app and integration ones executed on every PR. Going forward, we separated into 2 topics:
Nightly tests@celestiaorg/devops will implement infra features for robusta-nightlies after @cmwaters shares the notion doc with us. Integration testsThis PR should be used on Integration level per PR run. In addition, we need to migrate it to a standalone repo, so @celestiaorg/celestia-node and @rollkit can use to fulfil their testing strategies In addition, we will reuse the compose-spec that libp2p/test-plan has to make it CI/CD ready out-of-the-box Thanks for the input and clarifications 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work on this PR. 👍
"path/filepath" | ||
) | ||
|
||
// execute executes a shell command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// execute executes a shell command. | |
// exec executes a shell command. |
Key crypto.PrivKey | ||
} | ||
|
||
func LoadTestnet(manifest Manifest, file string) (*Testnet, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit-optional] I think filePath
is closer to what this parameter means.
func LoadTestnet(manifest Manifest, file string) (*Testnet, error) { | |
func LoadTestnet(manifest Manifest, filePath string) (*Testnet, error) { |
testing/e2e/pkg/testnet.go
Outdated
|
||
// Address returns an RPC endpoint address for the node. | ||
func (n Node) AddressRPC() string { | ||
return fmt.Sprintf("%v:26657", n.IP.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit-optional]: Introducing a constant variable to represent the port number 26657
would enhance the code's future maintainability and readability.
} | ||
|
||
func (n Node) IsValidator() bool { | ||
return n.SelfDelegation != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] Since a non-zero SelfDelegation
is the only requirement for being recognized as a validator, why not use a boolean type for SelfDelegation
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we may want to set the self delegation to an actual number which will reflect the initial voting power of the validator
morge? |
Sorry for missing this @rootulp. |
96fae19
Ref: #1256
This PR puts together the base pieces for an e2e testing suite. It includes a CLI that can:
Further work aims to support: