Ranking for function signatures. #23
Comments
Sounds like a good idea! Just asked amberdata.io via twitter - maybe they can run this query. |
Why rank them? What's the objective? And why not just look at all method calls in the last N (100k) blocks and produce a toplist? That would also show the size of the supplied data, which can be checked against the ABI parameters. |
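A minimal sketch of such a toplist, assuming the calldata of each call has already been collected as hex strings (how it is collected is left open here). The function name `selector_toplist` is made up for illustration. It counts each (selector, argument size) pair so the size can later be compared with the ABI parameters.

```python
from collections import Counter


def selector_toplist(calldatas, top=10):
    """Count (4-byte selector, argument size in bytes) pairs over hex calldata strings."""
    counts = Counter()
    for data in calldatas:
        raw = bytes.fromhex(data[2:] if data.startswith("0x") else data)
        if len(raw) < 4:
            continue  # no selector present, e.g. a plain value transfer
        selector = "0x" + raw[:4].hex()
        counts[(selector, len(raw) - 4)] += 1
    # most_common sorts by count, highest first
    return counts.most_common(top)
```

Keeping the argument size in the key means one selector can appear several times in the list, once per observed size, which is the signal needed for the length check.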
Seems like a cool idea! Speaking internally (Amberdata). Thanks for reaching out on twitter! |
I think ranking them is a great way to know which one to show as a default. Perhaps even hide the lower ranked one when the difference in rank is really big (e.g. in the case of the ERC-20 forced collisions) |
To weed out collisions, though... Then you need to validate calldata against abi params. And then the collision-game just becomes a bit harder since they need to use identical params. Or would you use natspec to resolve conflict? |
So what I meant with 'why rank' is that it won't solve collisions, not easily at least. |
The whole thing can be done pretty nicely with geth tracing, though. Except the natspec part, I guess. |
@holiman my thought is that
|
Re 2). How? How can you tell which sig is the one being called? |
@holiman I don't think we can know, but my thought is that we're just providing some contextual data and it's up to people who use this list to decide what to do with it. Theoretically EthPM is the real fix for what this list is used for, at least in the case of wallets. If people use packages as their gateway to accessing contracts, then the wallet can just use the ABI data from the package instead of having to use a possibly wrong reverse lookup from this list. |
Oh wow. I remembered having implemented a 4byte-tracer long ago, and thought it may have gotten lost since then. But it's actually sitting right here: https://github.com/ethereum/go-ethereum/blob/master/eth/tracers/internal/tracers/4byte_tracer.js So it's fairly trivial to just let it run on a few thousand blocks, and we can see how good coverage 4byte has right now, and what the big blind spots are. EDIT: linked to the wrong file. |
Usage
The tracer output format is a dictionary, where each key is It's also possible to trace a range of blocks, and continually get pushes when new blocks arrive, if you use ipc+subscription. |
Doing a dump now, going to take a few more hours. In the meanwhile, I've downloaded the data for about 500K (top-level) transactions (the data also covers internal
I'll need to download a new dump of the 4byte db, so I can correlate the signatures with known ones. Then we could also get a toplist of unknowns. Plus also get a list of mismatches, where the supplied data length does not match any possible length of the signature in the db. |
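The correlation step could look roughly like the sketch below. It assumes the 4byte dump has been loaded into a dict mapping a selector to its known text signatures (`known`, a hypothetical shape), and it only checks the size of the ABI head: 32 bytes per static word, one offset word per dynamic argument. Tuples are not handled and are treated as "cannot rule out". The names `head_size` and `classify` are illustrative only.

```python
import re


def head_size(text_signature):
    """Return (min_bytes, exact) for the encoded arguments, or None if unsupported."""
    args = text_signature[text_signature.index("(") + 1 : text_signature.rindex(")")]
    if not args:
        return 0, True
    if "(" in args:
        return None  # tuple arguments are out of scope for this sketch
    words, exact = 0, True
    for arg in args.split(","):
        base = arg.split("[")[0]
        if "[]" in arg or base in ("bytes", "string"):
            words += 1     # dynamic type: one 32-byte offset word in the head
            exact = False  # the tail length varies
        else:
            n = 1
            for k in re.findall(r"\[(\d+)\]", arg):
                n *= int(k)  # fixed-size arrays of static types are laid out inline
            words += n
    return 32 * words, exact


def classify(selector, args_size, known):
    """Label an observed (selector, argument size) as unknown, match or mismatch."""
    candidates = known.get(selector, [])
    if not candidates:
        return "unknown"
    for sig in candidates:
        spec = head_size(sig)
        if spec is None:
            return "match"  # cannot be ruled out by length alone
        size, exact = spec
        if args_size == size or (not exact and args_size >= size):
            return "match"
    return "mismatch"
```

For signatures with dynamic arguments the check is only a lower bound, so a "match" there means "not contradicted by the length", nothing stronger.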
@pipermerriam, you wanted to weigh in the gas and gasprice. Do you really think that is needed? |
@holiman 👍 I do not think the gas and gas price are needed. But I was just thinking: if there is a collision that is not forced but happened by chance, then this is really problematic. This method will never really have the chance to change its position if we hide it in the dark. In the end we really need natspec, and then shame contracts that did not upload a natspec to swarm by showing a big red warning sign on all transactions that interact with such contracts. |
@ligi the full json is |
for comparing 4byte collisions, preferring the one with the simplest signature (shortest name, fewest arguments) is a decent heuristic. tho not exactly sure how best to unify those two dimensions (name, arguments). could maybe just measure the amount of entropy in the stringified function signature and take the smallest one (?) |
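One possible reading of the entropy idea, as a rough sketch: take the Shannon entropy of the character distribution in the stringified signature, multiply by its length to get a total in bits, and prefer the candidate with the smallest total. Both function names are made up here, and this is only one of several ways to fold name length and argument count into a single number.

```python
import math
from collections import Counter


def total_entropy(signature):
    """Shannon entropy of the character distribution, in bits, times string length."""
    n = len(signature)
    counts = Counter(signature)
    per_char = -sum(c / n * math.log2(c / n) for c in counts.values())
    return per_char * n


def simplest(candidates):
    """Pick the candidate with the lowest total entropy; ties break alphabetically."""
    return min(candidates, key=lambda s: (total_entropy(s), s))
```

Because the score grows with length, short names with few arguments tend to win, which matches the "simplest signature" heuristic. It is just as easy to game as any other purely textual measure.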
Here's a starting point for how we might objectively rank function signatures.

Let `C` be the set of all contract addresses which contain bytecode matching the pattern used to `JUMPDEST` based on the first 4 bytes of the message data.

With just `C` we can establish a basic ranking for signatures. This ranking is, however, trivial to game.

Let `T` be the set of all transactions whose first 4 bytes match the signature.

With `len(T)` or `sum(t.gas_price * t.gas for t in T)` we should have a less easy to game metric. I suspect that this will be suitable until we find someone directly attacking the rankings, at which point we can iterate on this.

The question is how we easily get these metrics. I think there is a BigQuery database for most of the chain data that I may be able to get access to; otherwise, maybe someone else knows of a relational database with all the chain data?
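The two metrics above can be sketched as follows, assuming the chain data is already available in memory. The `Tx` record and both function names are hypothetical, and the bytecode check is a naive heuristic: it looks for the `PUSH4` opcode (`0x63`) followed by the selector, which can also match data embedded in the code.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    input: bytes    # raw calldata
    gas: int
    gas_price: int


def contracts_with_selector(bytecodes, selector):
    """C: addresses whose bytecode contains PUSH4 <selector> (naive byte search)."""
    pattern = b"\x63" + selector
    return {addr for addr, code in bytecodes.items() if pattern in code}


def transaction_metrics(transactions):
    """Per selector: (len(T), sum(t.gas_price * t.gas for t in T))."""
    metrics = {}
    for t in transactions:
        if len(t.input) < 4:
            continue  # no selector in the calldata
        selector = t.input[:4].hex()
        count, spend = metrics.get(selector, (0, 0))
        metrics[selector] = (count + 1, spend + t.gas_price * t.gas)
    return metrics
```

Both metrics are keyed by the 4-byte selector, so two colliding signatures get the same numbers. Telling them apart needs extra information, such as the calldata length.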