[TOSFS #10] Implement rmdir API #23
Force-pushed from 97acf39 to 1ba8d8a.
@xianyinxin @openinx please take a look when you have time.
tosfs/core.py
Outdated
    If the path is not a directory.
TosfsError
    If the directory is not empty,
    or if there is an error during the removal process.
yes, it raises an error.
Thanks @yanghua for the contribution. I just left several comments.
tosfs/core.py
Outdated
    self.tos_client.delete_object(bucket, key.rstrip("/") + "/")
    self.invalidate_cache(path.rstrip("/"))
except tos.exceptions.TosClientError as e:
    logger.error("Tosfs failed with client error: %s", e)
Similar issue: if we decide to raise the error, we don't need to log the message everywhere as well.
Makes sense.
I only removed the unnecessary logs that belong to this PR, and filed an issue (#26) to track the same cleanup for the other APIs.
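The "raise instead of log everywhere" point agreed on above could look roughly like the sketch below. `TosfsError` is the error type named in this PR's docstring. `FakeClientError`, `delete_dir_marker`, and `failing_delete` are hypothetical stand-ins so the snippet runs without the TOS SDK; they are not the PR's actual code.

```python
class TosfsError(Exception):
    """Stand-in for the tosfs error type named in the PR's docstring."""


class FakeClientError(Exception):
    """Hypothetical stand-in for tos.exceptions.TosClientError."""


def delete_dir_marker(delete_object, bucket: str, key: str) -> None:
    """Delete the directory marker object, re-raising failures as TosfsError.

    There is no logger.error call here: whoever finally handles the
    exception decides whether and how to log it.
    """
    try:
        delete_object(bucket, key.rstrip("/") + "/")
    except FakeClientError as e:
        # Chain the original exception so the traceback keeps the root cause.
        raise TosfsError(f"Tosfs failed with client error: {e}") from e


def failing_delete(bucket: str, key: str) -> None:
    """Hypothetical delete call that always fails, used for demonstration."""
    raise FakeClientError("connection reset")
```

Because the original exception is chained with `from e`, a single log statement at the top-level handler still shows the full cause.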
@openinx @xianyinxin From my side, it seems all the concerns have been addressed. Please take another look.
tosfs/core.py
Outdated
if not self.exists(path):
    raise FileNotFoundError(f"Directory {path} not found.")

if not self.isdir(path):
    raise NotADirectoryError(f"{path} is not a directory.")

if len(self.ls(path, refresh=True, detail=False)) > 0:
Will exists and isdir each call info (which means an object HEAD)? And will self.ls issue an extra object LIST, which is quite time-consuming?
> Will exists and isdir each call info (which means an object HEAD)?

Yes, for now.
Some context: there are several basic APIs here, such as the ones you mentioned (exists, isdir, isfile, info, ls, and so on), and they depend on one another in the default implementation of fsspec. In short, we can only start implementing and overriding from the most core methods. Other methods are available by default. They have performance bottlenecks, but the defaults are invoked here mainly to complete the logic of the current API, which is why we have not yet given high priority to stability and performance. Methods whose default implementation performs poorly are on the list for gradual replacement, such as the already-filed exists (#18) and #14.
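To make the dependency described above concrete, here is a toy sketch (not fsspec itself) in which `exists` and `isdir` both delegate to `info`, mirroring the shape of the default implementations. `CountingFS` and its `head_calls` counter are hypothetical names used only to show that the two checks cost two HEAD-style lookups.

```python
class CountingFS:
    """Toy filesystem whose exists/isdir delegate to info, like the defaults."""

    def __init__(self, entries: dict):
        self.entries = entries  # path -> "file" or "directory"
        self.head_calls = 0     # number of simulated HEAD requests

    def info(self, path: str) -> dict:
        # Each info call stands for one object HEAD against the server.
        self.head_calls += 1
        if path not in self.entries:
            raise FileNotFoundError(path)
        return {"name": path, "type": self.entries[path]}

    def exists(self, path: str) -> bool:
        try:
            self.info(path)
            return True
        except FileNotFoundError:
            return False

    def isdir(self, path: str) -> bool:
        try:
            return self.info(path)["type"] == "directory"
        except FileNotFoundError:
            return False


fs = CountingFS({"bucket/dir": "directory"})
fs.exists("bucket/dir")
fs.isdir("bucket/dir")
print(fs.head_calls)  # 2: one HEAD per check
```

Overriding `exists` and `isdir` so they share a single lookup is one way to remove the duplicate request.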
> And will self.ls issue an extra object LIST, which is quite time-consuming?

Actually, the current ls supports the dir cache, controlled by the refresh parameter, whose default value is False. Since deletion is involved here and we must be sure the current dir is empty, the call to ls forces a request to the TOS server. Do we have a better option for this?
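A minimal sketch of why `refresh=True` matters before a delete, assuming a simple dict-based directory cache. `CachedLister`, `fetch`, and `server_calls` are hypothetical names and do not reflect how tosfs stores its cache internally.

```python
class CachedLister:
    """Toy ls with a directory cache controlled by a refresh flag."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable that performs the "server" listing
        self.dircache = {}      # path -> cached listing
        self.server_calls = 0   # number of simulated LIST requests

    def ls(self, path: str, refresh: bool = False) -> list:
        # Hit the server only on a cache miss or an explicit refresh.
        if refresh or path not in self.dircache:
            self.server_calls += 1
            self.dircache[path] = self.fetch(path)
        return self.dircache[path]


remote = {"bucket/dir": []}
lister = CachedLister(lambda p: list(remote[p]))

lister.ls("bucket/dir")                              # fills the cache
remote["bucket/dir"].append("bucket/dir/new.txt")    # another writer adds a file

stale = lister.ls("bucket/dir")                 # served from cache, still empty
fresh = lister.ls("bucket/dir", refresh=True)   # forces a new server listing
```

With the cached result, the directory would look empty and the delete would proceed even though it now contains a file, which is the case the forced refresh guards against.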
Maybe listing objects with a limit of 1 is a better solution?
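The limit-1 idea can be sketched against an in-memory key set. `list_keys` and `is_empty_dir` are hypothetical helpers, not the TOS SDK or the PR's `_listdir`. The `start_after` argument is an assumption added here so that the directory marker object itself (for example `dir/`) is not counted as a child.

```python
from typing import Iterable, List


def list_keys(store: Iterable[str], prefix: str, max_items: int,
              start_after: str = "") -> List[str]:
    """Hypothetical LIST: keys under prefix, sorted, capped at max_items."""
    matched: List[str] = []
    for key in sorted(store):
        if key.startswith(prefix) and key > start_after:
            matched.append(key)
            if len(matched) >= max_items:
                break  # stop early: one hit is enough to prove non-emptiness
    return matched


def is_empty_dir(store: Iterable[str], key: str) -> bool:
    """Return True if no object other than the marker lives under key/."""
    prefix = key.rstrip("/") + "/"
    # start_after=prefix skips the marker object whose key equals the prefix.
    return len(list_keys(store, prefix, max_items=1, start_after=prefix)) == 0


store = {"dir/", "dir/a.txt", "dir/b.txt", "other/"}
print(is_empty_dir(store, "dir"))     # False
print(is_empty_dir(store, "other/"))  # True
```

Capping the listing at one item keeps the response small, but it is still one LIST request per rmdir call.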
I think we can go ahead for now, but I am a bit concerned about rmdir's performance. We need to be careful about how we implement TosFs if we want better performance.
if not self.isdir(path):
    raise NotADirectoryError(f"{path} is not a directory.")

if len(self._listdir(bucket, max_items=1, prefix=key.rstrip("/") + "/")) > 0:
@openinx I pushed a solution that is better than the previous one, but I'm not sure whether a more efficient one exists.
Essentially, the reason I called this inefficient is that we call self.info twice and issue at least one object LIST (object LIST has a limit of 1000 QPS by default), so rmdir will easily reach the QPS limit. Also, the latency of LIST is much higher than that of a normal HEAD (10 ms vs 1 ms).
But I don't want this to block your further PRs, so I plan to merge this first. We can improve it in follow-up PRs.
Thanks.
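As a rough back-of-the-envelope check of the concern above, the figures quoted in the comment (two info/HEAD calls, one LIST, roughly 1 ms vs 10 ms, and a default 1000 QPS LIST limit) can be plugged into a few lines. The constant names are illustrative, the calls are assumed to run sequentially, and the final delete request is left out.

```python
HEAD_LATENCY_MS = 1      # approximate HEAD latency quoted in the review
LIST_LATENCY_MS = 10     # approximate LIST latency quoted in the review
LIST_QPS_LIMIT = 1000    # default LIST QPS limit quoted in the review

HEAD_CALLS_PER_RMDIR = 2  # exists + isdir, each delegating to info
LIST_CALLS_PER_RMDIR = 1  # emptiness check

# Sequential pre-check latency for one rmdir call.
precheck_latency_ms = (HEAD_CALLS_PER_RMDIR * HEAD_LATENCY_MS
                       + LIST_CALLS_PER_RMDIR * LIST_LATENCY_MS)

# Upper bound on rmdir calls per second before the LIST limit is hit.
max_rmdir_per_second = LIST_QPS_LIMIT // LIST_CALLS_PER_RMDIR

# Share of the pre-check latency spent in LIST.
list_share = LIST_CALLS_PER_RMDIR * LIST_LATENCY_MS / precheck_latency_ms

print(precheck_latency_ms, max_rmdir_per_second, round(list_share, 2))
```

Under these assumptions the LIST call dominates both the latency and the throughput ceiling, which is why the emptiness check is the first candidate for later optimization.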
@openinx I have fixed the commit signature issue. It needs