s3fs w/ multi-processing hangs #914

Open
max-kaufmann opened this issue Nov 13, 2024 · 3 comments

max-kaufmann commented Nov 13, 2024

Hello, we are using s3fs to store data on S3 as part of our AI evaluations library, Inspect AI, and we are seeing that multi-processing (but, interestingly, NOT multi-threading) causes our data loading to hang. For example, with our internal function process_eval_logs, this will hang:

baselines_ctf = ["s3://platform.ws.aisi.gov.uk/agents/eval_data/"]
eval_data = process_eval_logs(baselines_ctf)

and this will not:

baselines_ctf = ["/home/ubuntu/inspect_ai/eval_data_local"]
eval_data = process_eval_logs(baselines_ctf)

It hangs in the S3 case if we use a ProcessPoolExecutor for our data loading:

    with ProcessPoolExecutor(max_workers=num_workers) as executor:
        futures = {executor.submit(process_eval_log_single, eval_log): eval_log for eval_log in eval_logs}

but not with a ThreadPoolExecutor:

    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        futures = {executor.submit(process_eval_log_single, eval_log): eval_log for eval_log in eval_logs}

I'm happy to give more details on how we are using the library, but before I do that I wanted to check whether this pattern matches a known issue/limitation.

@max-kaufmann changed the title from "Using s3fs with multi-processing seems to hang" to "s3fs w/ multi-processing hangs" Nov 13, 2024
@martindurant
Member

The asyncio event loop and background thread used by fsspec's async implementations, including s3fs, are not fork-safe. fsspec makes some attempt to detect when processes are launched by forking (and there was a PR to improve this in fsspec, which I can't immediately find), but those techniques remain imperfect.

Remedies:

  • launch processes with spawn or forkserver
  • do not instantiate any filesystems in the main process before using the process pool
  • call fsspec.asyn.reset_lock() as the first thing in the task run by the processes.

@max-kaufmann
Author

Thank you for the speedy + helpful reply! It might be worth adding to your docs, as I ctrl + f'd "thread-safe", "concurrency" etc. but couldn't find any mention of what you can + can't do w/ s3fs.

@martindurant
Member

Probably this should be added in https://filesystem-spec.readthedocs.io/en/latest/index.html and referenced from the s3fs docs and others. Would you like to add it? I'm not quite sure where it fits in.
