Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Use parso instead of ast #30

Closed
benjaminjkraft opened this issue May 18, 2018 · 2 comments
Closed

Use parso instead of ast #30

benjaminjkraft opened this issue May 18, 2018 · 2 comments
Assignees

Comments

@benjaminjkraft
Copy link
Contributor

Python's builtin AST doesn't really do what we want: we have to use asttokens to find the information we need, and even then it's kinda messy. Also, it means we can only parse the exact version of python we are running in, which means even if we support python 3, you'll have to end up with a python2 slicker binary and a python3 slicker binary if you want to operate on both. What we really want is something that provides a concrete syntax tree (i.e. including comments and the exact syntax) and does so for any python version.

I looked at a few: astor, lib2to3 and its fork awpa, baron, and parso. Parso seems by far the best: it has proper support for all versions of python, a reasonable API (not quite as nice as ast in some ways, but does have some useful utils defined), and I didn't find any obvious problems from 5 minutes of playing with it. It's also probably easier to contribute back to than lib2to3, which is the next contender.

This also means we can add support for operating on python 3 independently from moving slicker itself to python 3 (although we should do both, of course). (See #21.)

@benjaminjkraft
Copy link
Contributor Author

One issue that we may need to worry about is that Parso is slower (since it's written in python): on the file I tested it took 55ms to parse, vs. 2ms for ast, and 16ms for asttokens. (A few other files gave roughly proportional results.) Hopefully that will be ok: the first quick string-contains check will still work just fine, so we'll mainly just be 4x slower on files we actually touch, which is not terrible. It'll be a lot worse for files we don't have to touch but do have to parse, since for those we could just use ast.

@benjaminjkraft benjaminjkraft self-assigned this May 18, 2018
@benjaminjkraft
Copy link
Contributor Author

I spent a couple hours trying this yesterday. I was not pleased with the result: it took me about an hour to rewrite util.toplevel_names, and the result was longer and required more comments to understand. This is because Parso is still pretty close to lib2to3, which is to say its CST matches the way CPython internally parses the code, not the logical structure. For example in ast a function with a decorator is a single FunctionDef node with a decorator_list attribute, whereas in lib2to3/Parso it's a decorated containing a list of decorators and a funcdef node. Assignment statements are worse -- there isn't even an assign node, but rather an expr_stmt whose second child is an operator with value '='. In many cases parso does wrap the lib2to3 node type to add something closer to ast and add utilities, but this doesn't actually change the structure of the tree, which means if you want to do something the utilities don't provide, you're no better off.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant