Running the HLS PGE on AWS Batch Guide
Description
This tutorial demonstrates the steps necessary to execute the HLS PGE v1 er.1 on AWS Batch. Running the HLS PGE on Batch can be useful for quick profiling and performance evaluation of the PGE, among other uses.
Requirements
Setup
Running the HLS PGE in AWS Batch requires the following setup / architecture:
AWS ECR
You'll need to upload the HLS PGE to AWS ECR. Make sure you have write permission to ECR, and follow this tutorial.
Create Repository
First, you'll want to create an AWS ECR repository to store your PGE images. You can do this by navigating to AWS ECR, clicking the "Create Repository" button on the main page, and following the instructions. Ask your system administrator if you don't have such permissions. Note down the repository name / URI (e.g. *.dkr.ecr.us-west-2.amazonaws.com or equivalent), which you'll need in the steps below.
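If you prefer the command line and have the AWS CLI configured, a minimal sketch of creating the repository is below (the repository name simply mirrors the image name used later in this guide; adjust to your own naming conventions):
aws ecr create-repository --repository-name opera_pge/dswx_hls --region us-west-2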
Tag Your PGE Locally
Locally on your own machine, you'll want to tag your HLS PGE image prior to pushing it to AWS ECR. For example:
docker tag opera_pge/dswx_hls:1.0.0-er.1.0 [YOUR-ECR-HOSTNAME].dkr.ecr.us-west-2.amazonaws.com/opera_pge/dswx_hls:1.0.0-er.1.0
Push PGE up to AWS ECR
Then push it up to ECR:
docker push [YOUR-ECR-HOSTNAME].dkr.ecr.us-west-2.amazonaws.com/opera_pge/dswx_hls:1.0.0-er.1.0
Replace [YOUR-ECR-HOSTNAME] with the ECR registry hostname for your AWS account. You can find this information on AWS ECR or from the earlier steps. Once the PGE image has been pushed / uploaded, you should see it appear in the list of ECR images, under the new repository you created. Below is an example of what you should see in your ECR repository after a successful upload.
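Note: if the docker push step fails with an authentication error, you may need to log Docker in to your ECR registry first. A minimal sketch, assuming the AWS CLI v2 is installed and configured for us-west-2:
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin [YOUR-ECR-HOSTNAME].dkr.ecr.us-west-2.amazonaws.com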
AWS EFS
In order to read sample input datasets for your PGE, as well as to read a configuration file and write out output results, your PGE on AWS Batch will need a storage system to interact with. Batch has integrations with AWS EFS to allow for this. Some setup is required however.
Create an EFS filesystem
The first step is to create an AWS EFS file system that you can read files from and write files to. Ask your administrator, or alternatively, if you have permissions, set up your own EFS file system by navigating to the AWS EFS dashboard and clicking the "Create file system" button.
When creating the file system, make sure to place it in the same Virtual Private Cloud (VPC) that you plan to use for AWS Batch's compute environment (details below)! Otherwise, your PGE containers will not be able to access the AWS EFS file system. Feel free to specify a single Availability Zone to limit costs, but again, ensure your Batch compute environment is located within the same Availability Zone.
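For reference, a minimal CLI sketch of creating the file system and a mount target is below. The file system name is illustrative, and the subnet and security group IDs are placeholders - they must belong to the same VPC / Availability Zone as your Batch compute environment:
aws efs create-file-system --region us-west-2 --tags Key=Name,Value=hls-pge-efs
aws efs create-mount-target --file-system-id fs-XXX --subnet-id subnet-XXXXXXXX --security-groups sg-XXXXXXXX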
Once created, note down the file system's fs-XXX ID, which you'll need in subsequent steps.
Upload PGE Necessary Data to EFS
Before you can upload PGE necessary data and folder structures to EFS, you'll need to have an AWS EC2 instance available that has the AWS EFS filesystem from the previous step mounted as a network data store. Ask your system administrator to set up an EC2 node with the prior EFS filesystem as a mount for you, or follow these steps.
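For reference, a minimal sketch of mounting the file system on an Amazon Linux 2 EC2 instance is below (it assumes the amazon-efs-utils package; /mnt/efs is an arbitrary mount point reused in later examples):
sudo yum install -y amazon-efs-utils
sudo mkdir -p /mnt/efs
sudo mount -t efs fs-XXX:/ /mnt/efs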
Once you've confirmed you have an EC2 instance available with your prior AWS EFS file system mounted, you'll want to open up a terminal to begin the upload process:
You'll want to assemble an output_dir to write results in, a run_config_dir that contains your sample configuration file, and a test_datasets folder that contains sub-folders of sample input data. A sketch is given below.
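Assuming a hypothetical hls_testing parent folder and illustrative file names (the three directory names come from this guide; everything else is a placeholder), the structure could be assembled locally like so:
mkdir -p hls_testing/output_dir hls_testing/run_config_dir hls_testing/test_datasets
cp my_dswx_hls_runconfig.yaml hls_testing/run_config_dir/
cp -r my_sample_granules/ hls_testing/test_datasets/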
Next, use the scp client to transfer the folder structure and contents above to your AWS EFS mount on your EC2 instance.
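For example, assuming the illustrative hls_testing folder and /mnt/efs mount point from above, and that the EC2 user can write to the mount (substitute your own key and hostname):
scp -r -i ~/.ssh/my-key.pem hls_testing/ ec2-user@[YOUR-EC2-HOST]:/mnt/efs/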
Ensure the output_dir on EFS is writable by the group or user your containers are leveraging. If the permissions are not set up correctly, your PGE will fail due to write / permission denied errors when trying to write results to the output_dir.
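One permissive option for a quick test run (not appropriate for shared or production systems; the path is the illustrative one from above):
sudo chmod -R 777 /mnt/efs/hls_testing/output_dir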
Consult your system administrator on the proper setup.
AWS Batch
We'll now work on setting up AWS Batch such that you'll be able to do a test run of your PGE against the sample data that you previously uploaded.
Setting Up AWS Batch Compute Environments
You'll need a compute environment to run your PGE on Batch. The compute environment helps tell Batch the resources (which EC2 instance types) and the type of procurement model (Spot market, on-demand) to leverage. You can have as many compute environments as you like for various scenarios.
Ask your system administrator, or follow the steps below if you have permission to do so.
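If you'd rather use the CLI, a rough sketch of creating a managed, on-demand compute environment is below. All names, the subnet, the security group, and the roles are placeholders; your account's networking and IAM setup will differ:
aws batch create-compute-environment \
  --compute-environment-name aws-batch-dev-myuser-ce \
  --type MANAGED \
  --state ENABLED \
  --service-role AWSBatchServiceRole \
  --compute-resources '{"type":"EC2","minvCpus":0,"maxvCpus":16,"desiredvCpus":0,"instanceTypes":["m5.xlarge"],"subnets":["subnet-XXXXXXXX"],"securityGroupIds":["sg-XXXXXXXX"],"instanceRole":"ecsInstanceRole"}'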
Setting up AWS Batch Job Queues
AWS Batch Job Queues are essential for defining a mapping between submitted jobs and respective compute environments. Since you can't submit an AWS Batch job directly to a compute environment, you use queues as the go-between. Create as many queues as you want, including fewer or more than the number of compute environments, in case you want to establish a precedence ordering of which compute environment to leverage based on job volumes. See this guide for more explanation.
Give your queue a descriptive name, aws-batch-(ops|dev)-(username)-spotonly-queue for example. Leave "Priority" at 1 and leave "Scheduling policy ARN" disabled. In the "Tags" section, feel free to add tags as you wish. Contact your system administrator on tagging norms and policies.
In the "Connected compute environments" section, select your previously created compute environments from the previous step, and set the compute environment ordering if you select multiple. Multiple compute environments can be mapped to a single queue and are leveraged based on the AWS job scheduler. Why have multiple compute environments? To help balance your resources if you need massive compute scale. See step 8 in this guide for more features / limitations.
Setting up an AWS Batch Job Definition
Job definitions help define the type of job you'd like to submit. For example, you may have different configurations for your PGE that require different settings, versions of binaries, or Run Configs. Having different job definitions helps you with these needs.
Note that job definitions are versioned: each revision you save receives a :1 or a :2 etc. postfix. Also note that an "Execution timeout" of 3600 seconds may not be enough, so set it carefully depending on your specific PGE. You can always override this value when submitting a job though!
The first step is to specify your "Image" to point to your AWS ECR PGE image. Use the ECR URI from the earlier ECR setup instructions here. Additionally, you'll want to specify "Bash" for "Command syntax" and use the custom HLS PGE Docker arguments to run the command needed (see the HLS PGE Users Guide for more details). An example is given below:
Next, you'll want to fill out the "vCpus" and "Memory" fields per your job requirements. This will help map your future job submissions to appropriate EC2 instance nodes within the compute environments you set up, based on the allocation strategies you specified. The default values below are probably sufficient for a test run.
Next, you'll specify the "Job role" and "Security configuration". For the job role, use the Batch service role you defined earlier, and for "Security configuration", enable it and set the user as conda, per the HLS PGE Users Guide requirements.
Next, you'll set up your "Mount points configuration". Again, consult the HLS PGE Users Guide on the specific requirements of which mount points are needed to run the PGE, but for the HLS v1 er.1 PGE, the following should be copied verbatim:
Next, in the "Volumes configuration" section, you'll want to specify specific sub-directories within your AWS EFS file system to map against the mount points your HLS PGE Docker container requires. Again, consult the HLS PGE Users Guide for more custom specifics for your use case, but for the HLS v1 ecr1 PGE, the following is sufficient, noting to modify the
fs-XXX
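Purely as an illustration of how an EFS volume pairs with a container mount point in a Batch job definition's container properties (the volume name, rootDirectory, and containerPath below are placeholders; use the values required by the HLS PGE Users Guide):
"volumes": [
  { "name": "run_config_dir",
    "efsVolumeConfiguration": { "fileSystemId": "fs-XXX", "rootDirectory": "/hls_testing/run_config_dir" } }
],
"mountPoints": [
  { "sourceVolume": "run_config_dir", "containerPath": "/path/required/by/pge", "readOnly": true }
]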
Submitting a Job
If you followed the steps in the Setup section successfully, you're ready to submit a job! As with all AWS actions, you can use the Command Line Interface (CLI) to programmatically interact with AWS. Submitting jobs is no different. The coverage of using the CLI is outside the scope of this guide, but see this guide for how to set up the CLI and these resources for more details on the Batch CLI interface. For this guide, we'll use the GUI interface. Your local AWS environment may need additional steps to enable CLI use! Consult your system administrator.
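For reference, a CLI submission might look roughly like the following; the job name is arbitrary, and the queue and job definition names are the placeholders used in the earlier sketches:
aws batch submit-job \
  --job-name dswx-hls-test-run \
  --job-queue aws-batch-dev-myuser-queue \
  --job-definition dswx-hls-job-definition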
Customize the "Job configuration" as you see fit or use the defaults within your job definition.
Modify the "Retry Strategies" if desired, otherwise ignore.
Click "Submit" to submit the job
You can also navigate to the Jobs view in the left hand panel to see more information about your job. You should eventually see your job in the "Running" state
If you click on the job ID, you'll be navigated to a job details page, which includes a link titled "Log stream name" that points to the AWS CloudWatch logging results of your job.
After your job completes, check the AWS CloudWatch logs or your AWS EFS file system's output_dir folder for your results!
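For example, from the EC2 instance with the EFS mount (using the illustrative /mnt/efs and hls_testing paths from earlier):
ls -l /mnt/efs/hls_testing/output_dir/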
Congratulations on finishing the tutorial!
Next Steps
Some recommendations on further steps you might want to consider: