A fast, scalable, multithreaded download state machine built on AWS Step Functions and Lambda.
- 10+ threads (Lambda functions) download the files from their respective target URLs.
- 10+ threads upload each chunk to the target S3 bucket.
- The chunks are merged and the job finishes.
- System efficiency: 300+ GB/minute
- Scalability: scales with additional threads (AWS Lambda functions)
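The chunked download described above depends on dividing each file among the worker threads. A minimal sketch of one way to do that follows, assuming the source server supports HTTP Range requests and that the file size is known in advance (for example from a Content-Length header). The function names `plan_chunks` and `range_header` are hypothetical and do not come from this repository.

```python
def plan_chunks(total_size: int, num_workers: int) -> list[tuple[int, int]]:
    """Split a file of total_size bytes into at most num_workers
    contiguous, inclusive (start, end) byte ranges.

    Hypothetical helper for illustration; not part of this project.
    """
    if total_size <= 0 or num_workers <= 0:
        raise ValueError("total_size and num_workers must be positive")

    # Ceiling division so that num_workers chunks always cover the file.
    chunk_size = -(-total_size // num_workers)

    ranges = []
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    # Each range would be handed to one worker (Lambda function).
    for start, end in plan_chunks(total_size=10, num_workers=3):
        print(range_header(start, end))
```

Each worker would then request only its own range and upload the result as one chunk, so adding workers shrinks the chunk each one handles.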
- Install AWS CLI : installation guide
- Install AWS SAM CLI : installation guide
- AWS Account Credentials : How to guide
- Replace each aws-account in template.yaml with your own AWS account number (Ctrl + F, aws-account).
- Configure each Role in the Step Functions state machine and the Lambda functions, and grant the corresponding permissions (Ctrl + F, Role).
- Build and deploy the SAM application to the AWS cloud.
$ sam build
$ sam deploy --guided
- Enter the name of the target S3 bucket for storing the downloaded files.
$ Parameter TargetS3 [targetS3]: your-target-storage-s3-name
{
  "src": [
    {
      "filePath": "https://download-file-1.ext"
    },
    {
      "filePath": "https://download-file-2.ext"
    },
    {
      "filePath": "{other-file-paths.ext}"
    }
  ]
}
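For longer lists of files, the JSON document above can be generated rather than written by hand. The sketch below builds the same `src` / `filePath` structure from a list of URLs. The function name `build_input` is hypothetical, and the sketch assumes the document is passed unchanged as the state machine's execution input.

```python
import json


def build_input(urls: list[str]) -> dict:
    """Build the input document: one {"filePath": url} entry per file.

    Hypothetical helper for illustration; not part of this project.
    """
    if not urls:
        raise ValueError("at least one URL is required")
    return {"src": [{"filePath": url} for url in urls]}


if __name__ == "__main__":
    payload = build_input([
        "https://download-file-1.ext",
        "https://download-file-2.ext",
    ])
    # Print the JSON text to paste or pass in as the execution input.
    print(json.dumps(payload, indent=2))
```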
With the above solution, the system achieved a throughput of 300+ GB per minute. The system could be refined further by setting a threshold on the number of Lambda functions (threads).
- Jeffrey Wang ([email protected])