Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TOSFS #10] Implement rmdir API #23

Merged
merged 1 commit into from
Aug 21, 2024
Merged

[TOSFS #10] Implement rmdir API #23

merged 1 commit into from
Aug 21, 2024

Conversation

yanghua
Copy link
Collaborator

@yanghua yanghua commented Aug 19, 2024

Summary 📝

Write an overview about it.

Details

Describe more what you did on changes.

  1. (...)
  2. (...)

Bugfixes 🐛 (delete if dind't have any)

Checks

  • Closed #798
  • Tested Changes
  • Stakeholder Approval

@yanghua yanghua requested a review from xianyinxin August 19, 2024 11:07
@yanghua yanghua self-assigned this Aug 19, 2024
@yanghua yanghua force-pushed the 10-rmdir branch 3 times, most recently from 97acf39 to 1ba8d8a Compare August 19, 2024 12:40
@yanghua yanghua requested a review from openinx August 19, 2024 12:43
@yanghua
Copy link
Collaborator Author

yanghua commented Aug 19, 2024

@xianyinxin @openinx please take a look, when you have time.

tosfs/core.py Outdated Show resolved Hide resolved
tosfs/core.py Outdated
If the path is not a directory.
TosfsError
If the directory is not empty,
or if there is an error during the removal process.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it raises an error.

Copy link
Collaborator

@openinx openinx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yanghua for the contribution, I just left serveral comments.

tosfs/core.py Outdated Show resolved Hide resolved
tosfs/core.py Show resolved Hide resolved
tosfs/core.py Show resolved Hide resolved
tosfs/core.py Outdated Show resolved Hide resolved
tosfs/core.py Show resolved Hide resolved
tosfs/core.py Outdated
self.tos_client.delete_object(bucket, key.rstrip("/") + "/")
self.invalidate_cache(path.rstrip("/"))
except tos.exceptions.TosClientError as e:
logger.error("Tosfs failed with client error: %s", e)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar issue, if we decide to raise the error, then we are not required to log the message everywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only cleared unnecessary logs belonging to this PR. And filed an issue to track this work for other apis. #26

@yanghua
Copy link
Collaborator Author

yanghua commented Aug 20, 2024

@openinx @xianyinxin From my side, it seems all the concerns have been addressed. Please take another look.

tosfs/core.py Outdated
Comment on lines 236 to 242
if not self.exists(path):
raise FileNotFoundError(f"Directory {path} not found.")

if not self.isdir(path):
raise NotADirectoryError(f"{path} is not a directory.")

if len(self.ls(path, refresh=True, detail=False)) > 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exists and isdir will call one info ( which means object HEAD) representively ? And the self.ls will call an extra OBJECT LIST, which is quite time consuming ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exists and isdir will call one info ( which means object HEAD) representively ?

Yes, for now.

Some of the context within this is: There are some basic APIs here, such as the ones you mentioned - exists, isdir, isfile, info, ls and so on. There are dependencies among them in the default implementation of fsspec. To sum up: We can only start implementing and overriding from the most core methods. But some methods are available by default (although there are performance bottlenecks, the default invocation here is mainly to complete the logic of the current API. This is why currently we haven't given high priority to considering the issues of stability and performance. ), as you can see. Some methods have performance issues when using the default implementation. These will also be in the gradual replacement list, such as the already created exists(#18) and #14.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the self.ls will call an extra OBJECT LIST, which is quite time consuming ?

Actually, the current ls supports the dir cache, and it is controlled by the refresh parameter. The default value of this parameter is: False. Here, since deletion is involved and it is necessary to determine that the current dir is empty, therefore, the call to ls here will forcibly initiate a request to the tos server. Regarding this point, do we have a better option?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe list objects and limit 1 is a better solution?

Copy link
Collaborator

@openinx openinx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can go first, but I have a bit concern about the RMDIR 's performance. I think we need to be careful about the approach to implement TosFs if we want to gain the better performance.

if not self.isdir(path):
raise NotADirectoryError(f"{path} is not a directory.")

if len(self._listdir(bucket, max_items=1, prefix=key.rstrip("/") + "/")) > 0:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@openinx Put a solution that is better than the previous one, but it's uncertain whether there is a more efficient one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially, the reason why I commented as not efficient is: we call two self.infos , and at least one OBJECT LIST ( the object LIST the a limit 1000 QPS by default). So for the rmrdir, it will easily reach the QPS limit. Also the latency of limit is much more higher than the normal HEAD (10ms vs 1ms ).

But I don't want this block your further PRs, So I plan to merge this firstly. We can improve it in the next following PRs.

Thanks.

@yanghua
Copy link
Collaborator Author

yanghua commented Aug 21, 2024

@openinx I have fixed the commit signature issue. It needs gpg key.

@openinx openinx merged commit c6e8806 into main Aug 21, 2024
8 checks passed
@openinx openinx deleted the 10-rmdir branch August 21, 2024 01:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants