-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make path manipulation play nice with symlinks by deferring symlink resolution #749
base: main
Are you sure you want to change the base?
Conversation
…esolution pathlib.Path.resolve() would follow and resolve symlinks, which is usually not what we want. In most cases we want to defer symlink resolution. This change makes all places use pathlib.Path.absolute(). It does not normalize the path like resolve, but in all cases we don't care if it is normalized. It is also probably good in general that downstream functionality is not brittle by accepting only normalized paths. When doing path manipulation we may care about the surrounding directory structure, which may be a collection of multiple symlinks, so doing a symlink resolution would put you in a different directory structure, which is probably not the intent.
THIS_DIR = Path(__file__).parent.resolve() | ||
THIS_DIR = Path(__file__).parent.absolute() | ||
REPO_ROOT = THIS_DIR.parent.parent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give a motivating example file structure and a sample of what errors you encounter with the current behavior? I can definitely imagine there are unhandled edge cases with symlinks in here, but this change seems counter to the recommendation at https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.parent
If you want to walk an arbitrary filesystem path upwards, it is recommended to first call Path.resolve() so as to resolve symlinks and eliminate ".." components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got this error when running tests and the repo directory had a symlink in the path.
INTERNALERROR> File "/home/bpetkant/ws/sharktank/.venv/lib/python3.11/site-packages/_pytest/cacheprovider.py", line 339, in pytest_collection_modifyitems
INTERNALERROR> res = yield
INTERNALERROR> ^^^^^
INTERNALERROR> File "/home/bpetkant/ws/sharktank/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103, in _multicall
INTERNALERROR> res = hook_impl.function(*args)
INTERNALERROR> ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR> File "/home/bpetkant/ws/sharktank/repo/sharktank/conftest.py", line 25, in pytest_collection_modifyitems
INTERNALERROR> rel_path = item_path.relative_to(root_path)
INTERNALERROR> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR> File "/usr/lib/python3.11/pathlib.py", line 730, in relative_to
INTERNALERROR> raise ValueError("{!r} is not in the subpath of {!r}"
I think usually we should treat symlinks as if they are regular directories/files. Isn't that the whole point? For example if you have a structure
/a/b -> /some/directory
/a/c -> /some/other/directory
If you resolve, you lose the context of the symlinks and the relationship and you don't treat them transparently.
In cases where you want to copy/package whole directory structures you may want to copy the link or the contents. But in most situations you should operate with the paths as if regular directories/files.
I actually did not know that Path.parent
would treat ..
lexically. In that case we can use os.path.abspath, which returns the normalized absolute path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the rationale to treat symlinks specifically like that with Path.resolve
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure... in other projects I've just done
THIS_DIR = Path(__file__).parent
then only use .resolve()
or .absolute()
or .relative_to()
for printing to stdout/stderr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a two examples of prominent software.
In Bash when inside a symlink directory
cd ..
would put you in the directory where the link is. Not the target.
on the other hand
ls ..
would print the contents of the symlink's target parent.
I think the cd
behavior is the correct one. Maintaining the context of the symlink. When printing paths we should in general maintain the symlink.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more example.
which python
would print the symlink path and not the target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change to just THIS_DIR = Path(__file__).parent
(no .resolve()
or .absolute()
), does your setup work?
This pattern looks like a mess to put in all files :P
THIS_DIR = Path(os.path.abspath(__file__)).parent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated here with Python 3.9 __file__
is absolute. Does not say normalized though, maybe it is assumed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed os.path.abspath
where we assume it is normalized.
pathlib.Path.resolve() would follow and resolve symlinks, which is usually not what we want. In most cases we want to defer symlink resolution.
This change makes all places use os.path.abspath(). When doing path manipulation we may care about the surrounding directory structure, which may be a collection of multiple symlinks, so doing a symlink resolution would put you in a different directory structure, which is probably not the intent.