
ENH: Eliminate the need for anon=True when you might be opening a public bucket & object. #675

Closed
dmoore247 opened this issue Dec 21, 2022 · 5 comments · Fixed by #823


@dmoore247

My routine is given an S3 URL to a file. I don't know a priori whether the bucket is a public read-only bucket or has been secured via an IAM role.

With the current s3fs interface, my code must pass anon=True when the bucket is public; otherwise the call raises an "Unable to locate credentials" error that I have to catch and handle in Python.

This enhancement request asks s3fs to open S3 buckets and paths automatically, whether they are public (permitting anonymous access) or private (requiring a supplied credential), eliminating the need for anon=True.
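For illustration, the requested behaviour can be approximated at the call site today. The sketch below is hypothetical (`read_s3_object` and its `fs_factory` hook are not part of s3fs). It assumes `S3FileSystem.cat` lets botocore's `NoCredentialsError` propagate unchanged when no credentials are found, as the error message above suggests, and retries once with `anon=True` in that case only.

```python
try:
    from botocore.exceptions import NoCredentialsError
except ImportError:  # stand-in so the sketch runs without botocore installed
    class NoCredentialsError(Exception):
        """Placeholder for botocore's 'Unable to locate credentials' error."""


def read_s3_object(path, fs_factory=None):
    """Return the bytes at `path`, retrying anonymously if no credentials exist.

    Hypothetical helper, not part of s3fs. `fs_factory(anon)` must return an
    object with a `.cat(path)` method, e.g. an s3fs.S3FileSystem.
    """
    if fs_factory is None:
        import s3fs  # lazy import so a fake factory can be injected in tests

        def fs_factory(anon):
            return s3fs.S3FileSystem(anon=anon)

    try:
        # First attempt: botocore's default credential chain
        # (environment variables, config files, instance metadata, ...).
        return fs_factory(False).cat(path)
    except NoCredentialsError:
        # Nothing found anywhere: the object may still be public, so retry anonymously.
        return fs_factory(True).cat(path)
```

Only the no-credentials case triggers the retry; a `PermissionError` from credentials that exist but lack access is re-raised, since silently switching to anonymous access there would hide a genuine authorization problem.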

@dmoore247 dmoore247 changed the title ENH: Automatically avoid "Unable to locate credentials" error when accessing public bucket, don't require anon flag. ENH: Eliminate the need for anon=True when you might be opening a public bucket & object. Dec 21, 2022
@martindurant
Member

I can understand why you might want this. It would be difficult to implement, however, since the whole s3fs instance has a single configured aiobotocore client that does all the communication, so we would need two clients (perhaps a global, module-level anonymous one?). We would also need to recognise specific types of auth exceptions (e.g., having no credentials is not the same as being not authorized).
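The distinction drawn above (no credentials versus not authorized) could be made from the exception type. This is a sketch under two assumptions: the no-credentials case surfaces as botocore's `NoCredentialsError`, and s3fs translates an S3 `AccessDenied` response into Python's built-in `PermissionError`. `AuthFailure` and `classify_auth_error` are hypothetical names.

```python
from enum import Enum

try:
    from botocore.exceptions import NoCredentialsError
except ImportError:  # stand-in so the sketch runs without botocore installed
    class NoCredentialsError(Exception):
        """Placeholder for botocore's 'Unable to locate credentials' error."""


class AuthFailure(Enum):
    NO_CREDENTIALS = "no-credentials"  # nothing found in the credential chain
    NOT_AUTHORIZED = "not-authorized"  # credentials exist but lack permission
    OTHER = "other"                    # unrelated failure (missing key, network, ...)


def classify_auth_error(exc):
    """Map an exception from a remote call onto a coarse auth category (hypothetical)."""
    if isinstance(exc, NoCredentialsError):
        return AuthFailure.NO_CREDENTIALS
    if isinstance(exc, PermissionError):
        return AuthFailure.NOT_AUTHORIZED
    return AuthFailure.OTHER
```

Only `NO_CREDENTIALS` would be a candidate for falling back to anonymous access; `NOT_AUTHORIZED` means the user's credentials were found and should not be silently discarded.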

One thing we have done in the past is to try the given credentials immediately, so that we can proactively fall back to anonymous access. The trouble is that, while this would catch the no-credentials case, it may also fail when credentials exist but lack permission for whatever check we attempt with them (because IAM is complex). Also, silently creating an anonymous instance when the user presumably wanted their credentials to be used automatically seems like the wrong thing to do.

@BENR0
Contributor

BENR0 commented Oct 24, 2023

Sorry for bumping this. I don't know much about IAM, but I understand that automatic checking would be difficult. Maybe this is more a problem of documentation and good exception messages?

I am thinking that if the default were anon=True whenever the user supplies no credentials, the public-bucket case would be "easier". The assumption that such a user wants a public bucket (and therefore anon=True) seems justified: a user who knows the bucket needs credentials would supply them rather than try anon=True first. (Maybe this does not hold for the "automatically use credentials" case @martindurant mentions above, which I don't fully understand; config files?)

Defaulting to anon=True when credentials are absent would obviously need to be documented.
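One way to decide that credentials are "absent" without ignoring environment variables, config files, or instance metadata is to ask botocore's own credential chain. A hypothetical sketch: `default_anon` is not an s3fs function, and it relies on `botocore.session.get_session().get_credentials()` returning `None` when no provider finds anything.

```python
def default_anon(session_factory=None):
    """Return True when botocore's credential chain finds no credentials.

    Hypothetical helper illustrating the proposal above; not s3fs behaviour.
    `session_factory()` must return an object with `.get_credentials()`.
    """
    if session_factory is None:
        import botocore.session  # lazy import so a fake session can be injected

        session_factory = botocore.session.get_session
    # get_credentials() walks the provider chain (env vars, shared config and
    # credentials files, instance metadata, ...) and returns None if nothing is found.
    return session_factory().get_credentials() is None
```

Usage would be along the lines of `s3fs.S3FileSystem(anon=default_anon())`. Finding credentials says nothing about whether they grant access, so this covers only the absent-credentials case; the lookup may also take a moment on machines where the instance-metadata provider has to time out.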

If I understand correctly, the problem is that the complexity of IAM, and the inability to distinguish between types of exceptions, make an automatic solution difficult or impossible. But if a check fails while anon=True is the default, the error message should tell users that the operation may not be possible without credentials and that they should try supplying them.

Does that make sense or is that naive due to my underestimation of the complexity?

@martindurant
Member

We could put some effort into recognising the no-credentials situation and at least telling users that they likely wanted anon, rather than passing back botocore.exceptions.NoCredentialsError directly. This error is raised the first time the user makes a remote call, not before.
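A sketch of the friendlier error suggested here: wrap the remote call and, when no credentials are found, re-raise with a hint about anon. `with_anon_hint` is hypothetical, and re-raising as `PermissionError` (chained to the original error) is a design assumption; a real patch might keep botocore's exception type instead.

```python
try:
    from botocore.exceptions import NoCredentialsError
except ImportError:  # stand-in so the sketch runs without botocore installed
    class NoCredentialsError(Exception):
        """Placeholder for botocore's 'Unable to locate credentials' error."""

ANON_HINT = (
    "No AWS credentials were found. If the bucket is public, "
    "try S3FileSystem(anon=True)."
)


def with_anon_hint(func, *args, **kwargs):
    """Call `func`, turning a no-credentials failure into a hinted error (hypothetical)."""
    try:
        return func(*args, **kwargs)
    except NoCredentialsError as exc:
        # Chain the original error so the botocore traceback is preserved.
        raise PermissionError(ANON_HINT) from exc
```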

There is no reliable way to know whether a bucket/key can be accessed anonymously without trying it, however. Automatically falling back to anon also seems foolish: given that any key might be public, it would mean trying every call twice.

In my opinion, most s3fs access is probably using credentials. Any fully public file can be accessed via direct HTTP anyway.
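For the direct-HTTP route, a public object's s3:// URL can be mapped to a virtual-hosted-style HTTPS URL. This sketch assumes AWS's global `s3.amazonaws.com` endpoint; buckets in some regions, bucket names containing dots, or S3-compatible stores may need a different host. `s3_url_to_https` is a hypothetical helper.

```python
from urllib.parse import quote


def s3_url_to_https(url):
    """Map s3://bucket/key to https://bucket.s3.amazonaws.com/key (hypothetical helper)."""
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError(f"not an s3:// URL: {url!r}")
    # Split on the first "/" only: everything after it is the object key.
    bucket, _, key = url[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {url!r}")
    # Percent-encode the key while keeping "/" separators as-is.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
```

The resulting URL can then be fetched with any HTTP client, e.g. `urllib.request.urlopen`, with no AWS credentials involved.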

Docs on credentials in s3fs: https://s3fs.readthedocs.io/en/latest/#credentials ; note that automatic determination from the environment or config files or machine metadata is very common.

@BENR0
Contributor

BENR0 commented Nov 14, 2023

I agree; most users probably access buckets with credentials, and that's exactly my point. Users who know they need credentials will supply them and will rarely try anonymous access. But right now both kinds of users are forced to pass arguments, even those who want anonymous access. I tried to come up with a solution (see #823) that accounts for this. It might not be perfect, and I may have overlooked something because, as I said, I am not very familiar with all the intricacies, but let me know what you think.

@bsipocz

bsipocz commented Jun 8, 2024

This issue may need to be reopened as the PR that closed it has been reverted.
