Support multipart downloads when downloading large ranges via TransferManager.download() #248
Comments
@tim-finnigan is there any update on this work? Excited to see #260!
I don't have any updates at the moment but will check in with the team.
@tim-finnigan just checking in again. Did you hear back from the team?
Hi @forrestfwilliams thanks for your patience and apologies for the delay in getting back to you. This issue was reviewed in the last couple of weeks and it was determined that it will need some further investigation at a cross-SDK level. I think there are some planned improvements related to S3 transfers that may or may not overlap with this issue. I wish I had more details to share at this point but unfortunately that is the extent of what I know at this time. I'll still plan to update this issue when there is more information to share.
Hey @tim-finnigan, any updates on the "planned improvements related to S3 transfers" that overlap with this issue? Thanks!
Hi @forrestfwilliams thanks for following up - this feature request is still in process but moving forward. It is part of a broader effort to improve S3 transfers across SDKs and a thorough review process is required before the logic would be updated.
It has been more than a year. Are there any updates on this feature?
This issue references issue #1215 and its duplicate #3466 from the boto3 repository. It has also been discussed in this Stack Overflow post.
Issue
s3transfer supports ranged download requests and multipart downloads; however, it is not possible to perform a multipart download over a specific range. This results in slow download times when attempting to download a 1GB range of data from a 4GB file in S3.
Use Case
I work at the Alaska Satellite Facility, where we distribute large amounts of remote sensing data to users across the globe via AWS. Many of these datasets come in legacy formats, such as zip files, that are not cloud-friendly. Due to the highly structured nature of these datasets, we can identify byte ranges that contain subsets of data that our users would be interested in downloading directly. However, since these subsets are still large (~1GB within a larger 4GB zip file), and multipart downloads are not supported for range requests, we cannot offer low-latency extraction of these datasets. I know of many other groups that have encountered this issue while trying to distribute large remote sensing datasets.
Proposed Solution
It would be great if a range argument were added to TransferConfig, that could then be passed to a TransferManager.download() call, which would then download data ranges with sizes greater than the multipart_threshold via a multipart download.
I am willing to participate in developing this solution.
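To make the request concrete, here is a minimal sketch of the behaviour being asked for: splitting one large byte range into chunk-sized sub-ranges and fetching them concurrently. This is not s3transfer's implementation or API. The names split_range, range_header, and download_range are hypothetical, the fetch callable is an assumed stand-in for whatever performs a single ranged GET, and the default chunk size and worker count are illustrative values only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_range(start: int, end: int, chunksize: int) -> List[Tuple[int, int]]:
    """Split the inclusive byte range [start, end] into inclusive sub-ranges."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    return [
        (offset, min(offset + chunksize - 1, end))
        for offset in range(start, end + 1, chunksize)
    ]


def range_header(first: int, last: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={first}-{last}"


def download_range(
    fetch: Callable[[str], bytes],  # hypothetical: performs one ranged GET
    start: int,
    end: int,
    chunksize: int = 8 * 1024 * 1024,  # illustrative default
    max_workers: int = 10,  # illustrative default
) -> bytes:
    """Fetch [start, end] as concurrent sub-range requests and reassemble."""
    parts = split_range(start, end, chunksize)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the chunks can be
        # joined directly even if requests complete out of order.
        chunks = pool.map(lambda part: fetch(range_header(*part)), parts)
        return b"".join(chunks)
```

With boto3, the fetch callable could wrap a ranged get_object call, for example lambda r: s3.get_object(Bucket=bucket, Key=key, Range=r)["Body"].read(), where s3, bucket, and key are supplied by the caller. Note that this sketch holds the whole range in memory and has no retry logic; a real transfer manager would stream parts to a file.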