Guide: setting up workstation and cluster

Nicholas Polizzi edited this page Jul 7, 2024 · 11 revisions

Compute resources

Welcome! Our lab uses the following computer resources:

  • Your personal computer for quick simple tasks
  • The main workstation npl1.in.hwlab for developing code and for relatively lightweight tasks
  • The CPU and GPU servers, np-cpu-1, np-gpu-1, and np-gpu-2 for a similar purpose, although these have much more compute than the workstation
  • The o2 compute cluster, for running more intensive jobs.

Our workstation and servers are shared by all lab members for interactively developing code. They are not intended for larger, more intense jobs! To make sure everybody gets a fair share of the resources, please be mindful and monitor your resource use with htop. If you are running a compute-intensive or memory-intensive job, please submit it to o2. We have a guide below on how to use o2.

Once you have access to the workstations and o2, please configure your account to use our lab's shared python environments as follows:

  • On the workstation, copy the lines in our shared .bashrc onto the top of your personal bashrc at ~/.bashrc. This will set various environment variables.
  • On o2, make sure that you are part of the polizzi user group so that you have read/write permissions to our lab shared directory. To check whether you do, type the command groups to list the user groups your account is part of. If polizzi is not in the list, please email o2 IT and ask them to add you to the UNIX user group polizzi.
  • Once you are in the polizzi group, on o2, copy the lines in our shared .bashrc onto the top of your personal bashrc at ~/.bashrc.
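
The group check described above can be scripted. This is a minimal sketch, not an official lab script: the function name in_group is hypothetical, and its optional second argument (an explicit group list) exists only so the function can be tried without touching a real account.

```shell
# Return success if GROUP appears in a space-separated list of group names.
# Usage: in_group GROUP [GROUP_LIST]
# When GROUP_LIST is omitted, the current account's groups from `id -nG` are used.
in_group() {
    target="$1"
    group_list="${2:-$(id -nG)}"
    for g in $group_list; do
        if [ "$g" = "$target" ]; then
            return 0
        fi
    done
    return 1
}

if in_group polizzi; then
    echo "polizzi group: OK"
else
    echo "polizzi group: missing, ask o2 IT to add you"
fi
```

Running this on o2 prints one of the two messages, which is the same information you would get by reading the output of groups by eye.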

Connecting to the workstation

Most lab members choose to use vscode as a powerful, intuitive, and graphical way to connect to the workstation. It is easy and convenient for running python notebooks, developing python scripts, and general file management.

Using vscode remote for the workstation

Install vscode on your personal computer, then install the remote ssh extension. Use npl1.in.hwlab as the remote host. This will let you run commands and notebooks on the workstation.

While vscode is convenient, it is limited to simple tasks. You will need other tools for more sophisticated tasks, such as uploading/downloading lots of files, viewing .pdb files, submitting longer jobs (to continue running after exiting vscode), or extended text-based terminal commands.

Text-based commands

For running text-based commands, it is recommended to use ssh from a dedicated terminal app. This is a better alternative to the small default vscode terminal window!

  • On Mac, you can use the built-in Terminal app or download iTerm2; on Windows, download putty. The default colors and font sizes are rather ugly; please modify preferences to find something that suits you better.
  • Run the command ssh [email protected]. This will connect you to the workstation, log you in, and present you with a text-based "shell" for interfacing with the workstation. SSH stands for secure shell. If you are new to this way of using computers, please google "introduction to the unix shell" for some guides.
  • Under the hood, ssh is a protocol that your local computer uses to communicate with the workstation. Once the connection is established there are many things you can do. In fact, many other tools such as vscode and scp are built on top of an underlying ssh connection.
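
As one example of what an ssh connection allows, you can run a single command on the workstation without opening an interactive shell. The sketch below assumes the host npl1.in.hwlab named earlier; the function name ws_run and the WS_USER variable are hypothetical, and WS_USER must be set to your own workstation username.

```shell
# Hypothetical settings: replace the placeholder with your workstation username.
WS_USER="${WS_USER:-your_username}"
WS_HOST="npl1.in.hwlab"

# Run a single command on the workstation over ssh and print its output locally.
# Usage: ws_run COMMAND [ARGS...]
ws_run() {
    ssh "${WS_USER}@${WS_HOST}" "$@"
}

# Example (commented out so that pasting this block does not open a connection):
#   ws_run uptime    # shows the current load on the shared workstation
```

Checking the load with uptime before starting work is a quick way to be mindful of the other people sharing the machine.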

For submitting longer-running commands, it is recommended to use tmux, which keeps your commands running after you disconnect. See our Guide to using tmux.
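
As a quick illustration, a long command can be started inside a detached tmux session so that it survives logging out. This is only a sketch: the function name run_detached, the session name train, and the script train.py are all hypothetical.

```shell
# Start COMMAND in a new detached tmux session called NAME.
# Usage: run_detached NAME COMMAND
run_detached() {
    name="$1"
    cmd="$2"
    tmux new-session -d -s "$name" "$cmd"
}

# Example (commented out so that pasting this block does not start anything):
#   run_detached train 'python train.py > train.log 2>&1'
#   tmux ls                 # list running sessions
#   tmux attach -t train    # reattach; press Ctrl-b then d to detach again
```

Note that the session ends as soon as the command finishes, so redirect output to a log file if you want to inspect it afterwards.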

File transfer

Transferring files to and from the workstation can be pretty clunky.

  • Some use vscode; some use command-line tools such as scp or rsync on the terminal (google these if curious).
  • To ease the friction of manually using rsync, I (jchang) like to use a helper script on my local machine as a wrapper around rsync. You can find it here.
  • Try this! If you want to drag-and-drop files on the workstation as if they were local files on your computer, you can use the tool sshfs. This uses an ssh connection to mount a workstation directory as a separate filesystem (fs) on your own computer (google "unix mount" if curious). Once you install sshfs on your computer, run this command on your computer: sudo sshfs -o allow_other,default_permissions {USER}@transfer.sbgrid.org:/nfs/polizzi/{USER} /PATH/TO/MOUNT/POINT. Here {USER} is your sbgrid username, and /PATH/TO/MOUNT/POINT is an empty folder on your computer. Your workstation files will then "magically" appear under that empty folder you specified. Note that you will have to rerun the command each time you lose access to the internet, because the underlying ssh connection will be terminated.
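
For the rsync route, a small wrapper on your local machine might look like the sketch below. It is not the helper script mentioned above. It assumes that the host and path from the sshfs command (transfer.sbgrid.org and /nfs/polizzi/{USER}) also accept rsync over ssh, and the names ws_remote, ws_push, ws_pull, and SBGRID_USER are hypothetical.

```shell
# Hypothetical settings: replace the placeholder with your sbgrid username.
SBGRID_USER="${SBGRID_USER:-your_sbgrid_username}"
REMOTE_HOST="transfer.sbgrid.org"

# Build the remote rsync location for a path relative to your lab directory.
ws_remote() {
    printf '%s@%s:/nfs/polizzi/%s/%s' "$SBGRID_USER" "$REMOTE_HOST" "$SBGRID_USER" "$1"
}

# Copy a local path to the lab directory. Usage: ws_push LOCAL REMOTE_SUBPATH
ws_push() {
    rsync -avz --progress "$1" "$(ws_remote "$2")"
}

# Copy from the lab directory to a local path. Usage: ws_pull REMOTE_SUBPATH LOCAL
ws_pull() {
    rsync -avz --progress "$(ws_remote "$1")" "$2"
}

# Example (commented out so that pasting this block transfers nothing):
#   ws_push results/ runs/results/
#   ws_pull runs/results/ ./results/
```

Because rsync only sends files that have changed, rerunning the same ws_push after an interrupted transfer picks up where it left off.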

Viewing pdb files

  • One way is to transfer them to your local computer with the above methods and then open pymol.
  • Another way is to use the protein viewer extension in vscode. It works but it's a little clunky.
  • Finally, you can host an HTTP server on the workstation and point your pymol to load files from there. Jody uses the script here. Just run the script and copy-paste the load commands into your pymol command window.
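
The HTTP server approach can be sketched as follows. This is not the script linked above; it only illustrates the idea using Python's built-in http.server module. The function name pymol_load_cmds and port 8000 are hypothetical, and the sketch assumes your local machine can reach the workstation on that port.

```shell
# Print a pymol `load` command for each .pdb file in DIR, assuming an HTTP
# server is serving DIR at BASE_URL.
# Usage: pymol_load_cmds DIR BASE_URL
pymol_load_cmds() {
    dir="$1"
    base_url="$2"
    for f in "$dir"/*.pdb; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        echo "load ${base_url}/$(basename "$f")"
    done
}

# Example (commented out so that pasting this block does not start a server):
#   python3 -m http.server 8000 &                   # serve the current directory
#   pymol_load_cmds . http://npl1.in.hwlab:8000     # print the load commands
```

Keep in mind that the server exposes the whole directory to anyone who can reach that port, so stop it when you are done.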

Connecting to o2

TODO

Some helpful links for now: