Info about how to setup, use and work on UWA's Kaya system. Includes a tutorial with example code

jamespblloyd-uwa/Kaya-ListerLab-Tutorial

 
 

Kaya Server Usage and SLURM Scheduling Tutorial for Lister Lab

This repository contains a tutorial on server usage and SLURM scheduling. The tutorial provides an introduction to server management and demonstrates how to use SLURM for job scheduling.


Introduction

In this tutorial, you will learn the basics of server management and SLURM job scheduling. The tutorial is designed for beginners and covers the following topics:

  • What is a server?
  • Why use SLURM for job scheduling?
  • How to set up and configure a server.
  • How to submit and manage jobs using SLURM.
  • Best practices and tips for efficient server usage.

Kaya server architecture and storage schematic

[Figure: Kaya nodes and storage]

Prerequisites

Before starting the tutorial, make sure you have the following in place:

☑️ Access to UWA's Kaya server. Email David Grey at UWA for access.

  • Importantly, you'll need to have a description of your project and who else will have access to the data.

☑️ VPN access to UWA, including setup of MS Authenticator in case you work outside of the UNIFI network.

☑️ Test that you can log in to Kaya by opening a terminal and connecting over ssh:

ssh <username>@kaya.hpc.uwa.edu.au

This logs you in to Kaya's head node. It is paramount that you don't run programs on the head node: it is meant only for installing programs, writing scripts and submitting SLURM scripts to the SLURM scheduler. Even compressing a file on the head node can cause issues, as it is not designed for heavy-duty tasks.
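One way to guard against accidentally running heavy work on the head node is to check whether the shell is inside a SLURM job. The sketch below assumes that SLURM sets the SLURM_JOB_ID variable inside jobs (interactive sessions included) and that it is unset on the head node; the function name require_compute_node is hypothetical and not part of Kaya or SLURM.

```bash
# Hypothetical helper: refuse to continue unless we are inside a SLURM job.
# Assumption: SLURM_JOB_ID is set inside jobs and unset on the head node.
require_compute_node() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "Not inside a SLURM job: request an interactive session or use sbatch." >&2
        return 1
    fi
    echo "Running inside SLURM job ${SLURM_JOB_ID}"
}

# Usage: put the check in front of a heavy command, for example
#   require_compute_node && gzip big_file.fastq
```

Because the check returns a non-zero status on the head node, chaining it with && stops the heavy command from starting there.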

Installation

You do not have admin (sudo) access to Kaya, so you are limited in what software you can install and how. Many programs are already installed on Kaya as modules. You can list the available modules by typing:

module avail

[Figure: Available Modules on Kaya]

You can work out what modules you have loaded with the command (by default you will not have any)

module list

You can load a module with the command

module load gcc/9.4.0

IMPORTANTLY, you'll need to load the gcc compiler for a lot of the programs. Consider adding this line to your ~/.bashrc file: log in to Kaya and execute nano ~/.bashrc, paste module load gcc/9.4.0 on a new line, then save the file before you exit. You can reload your profile by executing source ~/.bashrc.
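If you prefer not to edit the file by hand, the same change can be scripted. This is a minimal sketch: the helper name add_line_once is hypothetical, and it only appends the line when the file does not already contain it, so running it twice does no harm.

```bash
# Hypothetical helper: append a line to a file only if it is not already there.
add_line_once() {
    # $1 = line to add, $2 = file (for example ~/.bashrc)
    touch "$2"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Usage (safe to run more than once):
#   add_line_once 'module load gcc/9.4.0' ~/.bashrc
#   source ~/.bashrc
```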

To load samtools for example, type either

module load samtools/1.13

or

module load samtools

Try running module list to check that samtools has been loaded successfully.

If you must, you can also unload the modules (in case they clash with conda installs for example) with the command

module unload samtools

Conda installations

Conda is pre-installed on Kaya. To use conda, load the module

module load Anaconda3/2021.05

We strongly recommend creating a new conda environment whose prefix points to the group data. There is more space on the /group volume, and you can easily share the conda installation with your team members so they don't have to re-install everything themselves.

An example of how to create a new conda environment would be:

conda create -p /group/<your_project_name>/conda_environments/bioinfo -c conda-forge mamba

Mamba speeds up installations: once it is installed, you type mamba wherever you would otherwise type conda. For example, you can install unicycler with mamba by typing

mamba install -c bioconda unicycler

Once you have set up your conda environment, loading it is as easy as executing

conda activate /group/<your_project_name>/conda_environments/bioinfo

You should consider activating a default conda environment in your ~/.bashrc so you can use your favourite programs right away. It is best practice not to install anything into conda's default base environment, as this can cause unexpected behaviour that is hard to resolve.

Usage

1. Copy files over to Kaya

You can use either FileZilla or good old scp to copy files over to Kaya. For example, you could log into the old PEB servers and scp the tutorial folder.

scp -r /dd_groupdata/tutorial_kaya/ <username>@kaya.hpc.uwa.edu.au:~

2. Check resources on the Kaya HPC cluster

Kaya uses SLURM scheduling, and you have a couple of options to check what's available and what the queues look like. The graphical interface that tracks usage of resources can be found here.

The Lister Lab server is reserved at node n035.

Therefore, you can check the resources that are being used with

scontrol show node n035

[Figure: Resources Available]

You can see from the picture above that 4 cores are in use, so 92 of the node's 96 cores were available at that time.
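The subtraction above can be done for you. The sketch below assumes that the scontrol output contains CPUAlloc= and CPUTot= fields, as standard SLURM node listings do; the helper name free_cores is hypothetical.

```bash
# Hypothetical helper: read `scontrol show node` output on stdin and
# print the number of cores that are not allocated (CPUTot - CPUAlloc).
free_cores() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^CPUAlloc=/) { split($i, a, "="); alloc = a[2] }
            if ($i ~ /^CPUTot=/)   { split($i, t, "="); tot = t[2] }
        }
    }
    END { if (tot != "") print tot - alloc }'
}

# Usage on Kaya:
#   scontrol show node n035 | free_cores
```

If neither field is found, the function prints nothing rather than guessing.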

To check the queue for the ListerLab server, use the command

squeue -p ll

or for your own jobs

squeue -u <username>

Note that at the time of writing, the ListerLab server is assigned the partition name ll. It was previously peb, but that name is being phased out.
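When the queue is long, a per-state summary can be easier to read than the full listing. This is a sketch only: it relies on the squeue options -h (no header) and -o "%T" (print the job state), and the helper name count_job_states is hypothetical.

```bash
# Hypothetical helper: count how many lines on stdin carry each job state.
count_job_states() {
    awk 'NF { n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on Kaya:
#   squeue -h -p ll -o "%T" | count_job_states          # whole ll partition
#   squeue -h -u <username> -o "%T" | count_job_states  # only your jobs
```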

Check progress on your jobs

sacct

To check the jobs from the last week and see their memory usage (MaxRSS) in GB, use

sacct --starttime $(date -d "1 week ago" +%Y-%m-%d) --format=JobID,JobName,State,Elapsed,MaxRSS | awk '{ if ($5 ~ /^[0-9]+K$/) { sub(/K$/, "", $5); printf "%s %s %s %s %.2fGB\n", $1, $2, $3, $4, $5/1024/1024 } else { print } }'
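The awk part of that one-liner is dense, so here is the conversion on its own. It assumes, as the one-liner does, that sacct reports MaxRSS in kilobytes with a trailing K; the helper name kb_to_gb is hypothetical.

```bash
# Hypothetical helper: convert a value such as "2097152K" to gigabytes.
# Strips the trailing K, then divides by 1024 twice (KB -> MB -> GB).
kb_to_gb() {
    awk -v v="$1" 'BEGIN { sub(/K$/, "", v); printf "%.2fGB\n", v / 1024 / 1024 }'
}

# Usage:
#   kb_to_gb 2097152K
```

The full command applies the same arithmetic to the fifth column of every sacct row and leaves rows without a K-suffixed value unchanged.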

In case the ListerLab (LL; formerly PEB) node is fully utilized, you also have access to the common Kaya nodes. To list the available partitions, run

sinfo --noheader --format="%P"

Partitions are like tags on different nodes within Kaya that mark them as appropriate for specific types of jobs. Not all of the nodes have a GPU, so the gpu partition only includes nodes that have one. The ll partition (formerly peb) allows a longer time limit for jobs because we bought the machine and specifically requested this feature. The available partitions have the following wall-time limits:

| Partition | Time Limit (D-HH:MM:SS) | Publicly Available | Description |
| --- | --- | --- | --- |
| work | 3-00:00:00 | yes | For fairly long tasks (TopHat read mapping) |
| long | 7-00:00:00 | yes | For very long tasks (loop of TopHat read mapping) |
| gpu | 3-00:00:00 | yes | For GPU intensive tasks |
| test | 00:15:00 | yes | For very short test of programs or scripts |
| ll | 14-00:00:00 | no - ListerLab exclusive | For very long tasks (like interactive R sessions) |
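To make the limits concrete, the sketch below picks the partition with the shortest limit that still covers a requested wall time. It hard-codes the limits from the table above (so it goes stale if they change), ignores the gpu partition, and the function name pick_partition is hypothetical. In practice you may still prefer ll for short jobs when the node is free.

```bash
# Hypothetical helper: map a wall time in whole minutes to a partition,
# using the limits listed in the table (gpu is deliberately left out).
pick_partition() {
    if   [ "$1" -le 15 ];    then echo test
    elif [ "$1" -le 4320 ];  then echo work   # 3 days
    elif [ "$1" -le 10080 ]; then echo long   # 7 days
    elif [ "$1" -le 20160 ]; then echo ll     # 14 days, ListerLab only
    else
        echo "No partition allows $1 minutes" >&2
        return 1
    fi
}

# Usage:
#   pick_partition 240     # a 4-hour job
```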

3. Interactive command-line sessions

To test and develop your code/pipeline/environment, it's beneficial to request an interactive session. You can do so by running

srun \
--time=1:00:00 \
--account=<username> \
--partition=ll \
--nodes=1 \
--ntasks=1 \
--cpus-per-task=4 \
--mem-per-cpu=5G \
--pty /bin/bash -l

to request a 1-hour session with 4 cores and 20 GB of RAM in total (4 CPUs × 5 GB per CPU).
If you become disconnected and want to rejoin your interactive command-line session, find the job step ID, such as 449157.0, with sacct. Then rejoin with sattach followed by that ID, for example sattach 449157.0.

IMPORTANTLY exit the session by typing exit in the terminal to free up resources ❗

IMPORTANTLY Unlike the old PEB servers, an interactive session reserves its resources for as long as it runs: even if you are not using them, no one else can. Be polite and exit when you don't need it ❗
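Because the memory request is per CPU, the total that gets reserved is the product of --cpus-per-task and --mem-per-cpu. A small sketch of that arithmetic follows; the helper name total_mem_gb is hypothetical.

```bash
# Hypothetical helper: total memory (GB) reserved by a request.
total_mem_gb() {
    # $1 = --cpus-per-task, $2 = --mem-per-cpu in GB
    echo $(( $1 * $2 ))
}

# Usage: the srun request above (4 CPUs, 5G per CPU)
#   total_mem_gb 4 5
```

Checking the product before you submit helps you avoid reserving far more memory than the session needs.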

4. Interactive graphical sessions

Using the ondemand partition, you can launch a graphical desktop session of up to 4 hours and start Jupyter notebooks from there. Go to https://ondemand.hpc.uwa.edu.au/ and log in with your username and password.

Examples

1. SLURM script syntax

SLURM scripts are essentially bash scripts with a header that indicates what the SLURM scheduler is supposed to do for you. For example it needs to know what resources your scripts require such as

  • time
  • CPUs
  • memory
  • which node(s) (partition) should handle your script.

The way you can specify this is by adding the following lines to your SLURM script:

#!/bin/bash --login

#SBATCH --job-name=<map_da_reads>
#SBATCH --partition=ll
#SBATCH --mem-per-cpu=10G
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=4:00:00   
#SBATCH --export=NONE
#SBATCH --mail-user=<uwa_email_address>@uwa.edu.au
#SBATCH --mail-type=BEGIN,END

This is followed by setting some variable names so you can track which SLURM job has resulted in what output (above and below is all intended to be in one script).

#  Note: SLURM_JOBID is a unique number for every job.
#  These are generic variables
JOBNAME=GRCm39_ERCC_STAR_index
SCRATCH=$MYSCRATCH/$JOBNAME/$SLURM_JOBID
RESULTS=$MYGROUP/$JOBNAME/$SLURM_JOBID

# Start of job (JOBNAME must be set before it is used here)
echo "$JOBNAME job started at $(date)"

# To compile with the GNU toolchain
module load gcc/9.4.0
module load star/2.7.9a
module load Anaconda3/2020.11
conda activate /group/<your_project_name>/conda_environments/bioinfo

# leave in, it lists the environment loaded by the modules
module list
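The SCRATCH and RESULTS paths do not exist until something creates them. A minimal sketch of that step follows, assuming MYSCRATCH and MYGROUP are already set in your environment as the script above expects; the helper name prepare_job_dirs is hypothetical.

```bash
# Hypothetical helper: create per-job scratch and results directories
# and print their paths, one per line.
prepare_job_dirs() {
    # $1 = job name, $2 = job ID (inside a job, pass "$SLURM_JOBID")
    scratch_dir="$MYSCRATCH/$1/$2"
    results_dir="$MYGROUP/$1/$2"
    mkdir -p "$scratch_dir" "$results_dir" || return 1
    echo "$scratch_dir"
    echo "$results_dir"
}

# Usage inside a SLURM script:
#   prepare_job_dirs "$JOBNAME" "$SLURM_JOBID"
#   cd "$SCRATCH"
```

Keeping the job ID in the path means two runs of the same script never overwrite each other's output.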

2. Example SLURM scripts

Now let's run some SLURM scripts and change them so you can see how they work.

☑️ Log in to Kaya and download the tutorial by executing git clone https://github.com/cpflueger2016/Kaya-ListerLab-Tutorial.

3. Launch a SLURM script

sbatch -p ll SLURM_scripts/SLURM_tutorial_fastp.sh

4. Cancel SLURM scripts that have an issue or that are in the queue

scancel <SLURM_JOB_ID>
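To avoid copying the job ID by hand, you can capture it at submission time. The sketch below relies on the sbatch option --parsable, which prints only the job ID, optionally followed by a semicolon and the cluster name; the helper name job_id_from_parsable is hypothetical.

```bash
# Hypothetical helper: strip the optional ";cluster" suffix that
# `sbatch --parsable` may append to the job ID.
job_id_from_parsable() {
    echo "${1%%;*}"
}

# Usage on Kaya:
#   jobid=$(job_id_from_parsable "$(sbatch --parsable -p ll SLURM_scripts/SLURM_tutorial_fastp.sh)")
#   squeue -j "$jobid"     # check on the job
#   scancel "$jobid"       # cancel it if something is wrong
```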

Contributing

Contributions to this tutorial are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.

License

This project is licensed under the [License Name] - add a link to the license file if applicable.

Tutorial initially written by Christian Pflueger, with edits from James Lloyd.
