
Polizzi lab dry lab wiki

Welcome! This wiki serves as

  1. A guide to getting set up using our lab's computational resources (workstation and o2)
  2. Information about shared software we have installed
  3. General tips and tricks for how to use computers

If you are new or just getting started, please follow the Guide: setting up workstation and cluster.

Shared software is installed on the workstation under /nfs/polizzi/shared/ and on o2 under /n/data1/hms/bcmp/polizzi/lab/. Click the sidebar --> to view the software in each category.
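For example, to see what is already available (paths as above), you can simply list the shared directories:

# shared software on the workstation
ls /nfs/polizzi/shared/

# shared software on o2
ls /n/data1/hms/bcmp/polizzi/lab/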

If you install a new piece of software, please:

  • Put it on the workstation and/or o2 under the corresponding category.
  • Add an entry to the wiki page for that software category (hit "edit" in the upper right).
  • Consider including example scripts or a tutorial for how to use the software.
  • Ping the other group members in the #computation Slack channel.

SLURM on o2

For GPU-dependent jobs, you can submit to either the gpu or the gpu_quad partition; gpu_quad tends to have more resources.
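A minimal sketch of an o2 GPU submission, assuming one GPU is enough; the time limit, core count, memory, and script name are placeholders, so adjust them to your job and check the o2 documentation for current partition policies:

# request one GPU on the gpu_quad partition (placeholder resources)
sbatch -p gpu_quad --gres=gpu:1 -t 12:00:00 -c 4 --mem=16G my_gpu_job.sh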

Using Slurm

Slurm is a workload manager that has been installed on our workstations. It helps manage compute resources by allowing users to submit jobs while requesting a certain amount of resources (such as CPU cores or GPUs). Letting Slurm schedule these jobs allows us to allocate resources more efficiently and reduces manual management of jobs, i.e. checking when jobs finish before starting new ones. Read more here: https://slurm.schedmd.com/quickstart.html

Currently Slurm manages three workstations (Slurm nodes): np-cpu-1, np-gpu-1, and np-gpu-2. Our main workstation npl1.in.hwlab is NOT accessible via Slurm right now. Although it is possible to run jobs on these three workstations without Slurm, this is not recommended because it may interfere with existing Slurm jobs; everything submitted to np-cpu-1, np-gpu-1, and np-gpu-2 should go through Slurm. npl1.in.hwlab can be used 'interactively' for general development, prototyping, and shorter runs. Ideally, VS Code should be run through npl1.in.hwlab as well.
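To check which partitions and nodes Slurm currently sees, and what is running on them, you can query Slurm directly; a minimal sketch (the output depends on the current cluster state):

# show partitions, their nodes, and their availability
sinfo

# node-oriented long listing (CPUs, memory, state per node)
sinfo -N -l

# show currently running and pending jobs
squeue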

On Slurm we have two partitions:

  • np-cpu submits to np-cpu-1
  • np-gpu submits to np-gpu-1 or np-gpu-2 depending on which one has more available resources

Here is a commented Slurm bash script. All lines starting with #SBATCH are interpreted by Slurm; their order does not matter.

#!/bin/bash

#SBATCH -A polizzi # must include this for submit permissions
#SBATCH -p np-cpu # either np-gpu or np-cpu
#SBATCH -c 2 # number of cores
#SBATCH -J job_name # job name
#SBATCH -o slurm.%x.%A.out # standard out will be printed to this file. %A is the job ID number and %x is the job name
#SBATCH -e slurm.%x.%A.err # standard error will be printed to this file

#SBATCH [email protected] # email for updates 
#SBATCH --mail-type=ALL # include all types of updates - when the job starts, if it fails, when it finishes


#OPTIONAL commands:

#SBATCH -w np-gpu-1 # ONLY if using np-gpu - you can specify whether you want np-gpu-1 or np-gpu-2 here 
#SBATCH --mem=20G # memory requested - OPTIONAL for now since it's difficult to predict and your run will stop if you exceed this
#SBATCH --array=0-8 # submit an array of jobs. In this case the commands below will be run 9 times, each time the variable `${SLURM_ARRAY_TASK_ID}` will be different. 


#SCRIPT below: everything below is what you would typically type on the command line to submit a job.

python run_combs.py 
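
If you use the --array option above, each task can select its own input via ${SLURM_ARRAY_TASK_ID}. A minimal sketch, where the --input flag and the inputs/ file layout are hypothetical and not part of run_combs.py's actual interface:

# array task i reads inputs/input_i.txt (hypothetical layout)
python run_combs.py --input inputs/input_${SLURM_ARRAY_TASK_ID}.txt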

Other important notes (examples of these commands are sketched below):

  • To submit the above script, save it as slurm_script.sh and run sbatch slurm_script.sh.
  • Use sacct to view your recent and currently running jobs.
  • Use scancel <jobid> to cancel a running or pending job.
  • Jobs can be submitted and viewed while ssh'ed into any workstation, including npl1.in.hwlab.
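A minimal sketch of the submit/monitor/cancel cycle (the job ID shown is a placeholder):

# submit the batch script; sbatch prints the job ID
sbatch slurm_script.sh

# check the status of your jobs
sacct
squeue -u $USER

# cancel a job by its ID (12345 is a placeholder)
scancel 12345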