Skip to content

Latest commit

 

History

History
335 lines (251 loc) · 11.7 KB

README.mkd

File metadata and controls

335 lines (251 loc) · 11.7 KB

OSwitch

One-line access to other operating systems.

Mini-example:

mymacbook:~/2015-02-01-myproject> abyss-pe
    zsh: command not found: abyss-pe
mymacbook:~/2015-02-01-myproject> oswitch -l
    yeban/biolinux:8
    ubuntu:14.04
    ontouchstart/texlive-full
    ipython/ipython
    hlapp/rpopgen
mymacbook:~/2015-02-01-myproject> oswitch biolinux
    ###### You are now running: biolinux in container biolinux-7187. ######
biolinux-7187:~/2015-02-01-myproject> abyss-pe k=25 reads.fastq.gz
    [... just works on your files where they are...]
biolinux-7187:~/2015-02-01-myproject> exit
mymacbook:~/2015-02-01-myproject>
    [... output is where you expect it to be ...]

Detailed information:

Background

Genomic analyses require jumping back and forth between many bioinformatics tools. The data types are young, thus so are the tools. This leads to frequent updates of tools that are often challenging to install. Furthermore, it is challenging to keep different versions of software for different projects, yet changing versions can make analyses difficult to reproduce. To make matters worse, genomicists often lack the skills necessary to setup complex bioinformatics software, and systems administrators can be overwhelmed by large numbers of software installation requests.

Aim

We are developing oswitch to enable seamless switching from one operating system to another - providing access to diverse ranges of tools. This project grew from our own need to rapidly access diverse pieces of specific versions of software including those distributed as part of BioLinux on our MacBooks and our university HPC system. This was previously too difficult, but is important for reproducible research and agility. @bmpvieira made it clear early on that the docker could help make something happen. We first shared the resulting concept during a Balti & Bioinformatics presentation in the form of this (now out of date!) slide:

Slide from Balti

Docker images feel similar to using virtual machine images - but are much more flexible and light-weight. They are thus easily shared or published. We are extremely lucky to be able to build upon this amazing technology.

Features

oswitch is a wrapper facilitating access to docker images. Importantly, when switching operating systems inside a shell, most things remain unchanged:

  • Current working directory is maintained
  • User name, uid and gid are maintained
  • Login shell (bash/zsh/fish) is maintained
  • Home directory is maintained (thus all .dotfiles and config files are maintained).
  • read/write permissions are maintained
  • Paths are maintained whenever possible. Thus volumes (external drives, NAS, USB) mounted on the host are available in the container at the same path.

Example Usage

There are two broad usage scenarios: interactive use & non-interactive use.

Use a package interactively in a normal command-line

Minimalist example:

Yannick@n56-169 ~/myproject> uname -a
Darwin n56-169.sbcs.qmul.ac.uk 14.0.0 Darwin Kernel Version 14.0.0: Fri Sep 19 00:26:44 PDT 2014; root:xnu-2782.1.97~2/RELEASE_X86_64 x86_64
Yannick@n56-169 ~/myproject> oswitch yeban/biolinux
### You are now running: biolinux_8, in container: biolinux_8-27182. ###
Yannick@biolinux_8-27182 ~/myproject> uname -a
Linux biolinux_8-27182 3.16.4-tinycore64 #1 SMP Thu Oct 23 16:14:24 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Biologically relevant example:

# Trying to run blast.
pixel:~/test/ $ ls 
mygene.fasta
pixel:~/test/ $ cat mygene.fa
>myfavoritegene isthisone
MNTLWLSLWDYPGKLPLNFMVFDTKDDLQAAYWRDPYSIPLAVIFEDPQPISQRLIYEIR
TNPSYTLPPPPTKLYSAPISCRKNKTGHWMDDILSIKTGESCPVNNYLHSGFLALQMITD
ITKIKLENSDVTIPDIKLIMFPKEPYTADWMLAFRVVIPLYMVLALSQFITYLLILIVGE
KENKIKEGMKMMGLNDSVF
pixel:~/test/ $ blastp -query mygene.fa -remote -db nr -outfmt 7 > mygene_blastp_nr.tab
zsh: command not found: blastp
# Indeed... blastp is missing from my MacBook. 

# Switch to BioLinux and run blastp.
pixel:~/test/ $ oswitch yeban/biolinux
###### You are now running: biolinux in container biolinux-7187. ######
biolinux-7187:~/test/ $ blastp -query mygene.fa -remote -db nr -outfmt 7 >  mygene_blastp_nr.tab
# BioLinux includes blastp, thus the command ran smoothly.

# View the result.
biolinux-7187:~/test/ $ head mygene_blastp_nr.tab
# BLASTP 2.2.28+
# Query: myfavoritegene isthisone
# RID: BJAHAHU9015
# Database: nr
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 501 hits found
myfavoritegene	gi|322796550|gb|EFZ19024.1|	100.00	199	0	0	1	199	1	199	2e-142	 407
myfavoritegene	gi|307183032|gb|EFN69988.1|	86.07	201	25	2	1	199	80	279	6e-115	 361
myfavoritegene	gi|572260155|ref|XP_006608402.1|	80.60	201	36	2	1	199	95	294	4e-108	 350
myfavoritegene	gi|328778864|ref|XP_397465.4|	80.60	201	36	2	1	199	95	294	5e-108	 350


# [... potentially run other analyses that require biolinux things...]

# Return to normal operating system
biolinux-7187:~/test/ $ exit
pixel:~/test/ $ ls 
mygene.fasta mygene_blastp_nr.txt
# our newly generated file is where we'd expect it to be.
Use a package non-interactively

Alternatively, single commands can be run directly in a container (e.g. BioLinux) without entering it interactively. This can be useful to test new tools, or to run a single piece of not-locally-installed software as part of a single command. The container terminates automatically once the command has been executed, output is printed to the terminal and can be redirected, and the exit status of the command run within container is returned.

# Run command directly in BioLinux and view results if success.
pixel:~/test/ $ oswitch yeban/biolinux blastp -remote -query mygene.fa -db nr > mygene_blastp_nr.txt
Listing available operating system containers

OSwitch can pull any image from docker hub. You can see the images you pulled from docker hub using oswitch as:

pixel:~ $ oswitch -l
yeban/biolinux:8
ubuntu:14.04
ontouchstart/texlive-full
ipython/ipython
hlapp/rpopgen
Availability

We have tested OSwitch on:

  • Mac OS X Yosemite
  • Ubuntu 14.04.1
  • CentOS 7
Caveats
  • Some features work only for Debian, Ubuntu, CentOS based docker images.
  • Host directories/volumes with paths conflicting with container paths are skipped.
  • SELinux must be disabled on CentOS for mounting volumes to work.
  • Volume mounting on Mac OS hosts is imperfect.

Installation

OSwitch first requires a [working docker install](#Install and setup docker).

Install oswitch

Mac
$ brew install https://raw.githubusercontent.com/yeban/oswitch/master/homebrew/oswitch.rb
Linux

Requirements: Ruby 2.0 or higher.

$ gem install oswitch

Test oswitch

$ oswitch ubuntu:14.04

Install and setup docker

Mac OS X

Installing docker is much easier than before - https://docs.docker.com/installation/mac/

Ubuntu

Installing docker - https://docs.docker.com/installation/ubuntulinux/

Add yourself to docker group so you can run docker client without sudo:

    $ sudo usermod -aG docker `whoami`
    
    # then logout and login again for the above command to take effect
CentOS

Installing docker - https://docs.docker.com/installation/centos/

Add yourself to docker group so you can run docker client without sudo:

    $ sudo usermod -aG docker `whoami`
    
    # then logout and login again for the above command to take effect

Disable SELinux as it gets in the way of mounting volumes within the container:

    $ sed -i .bak 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

    # then reboot your system

The above command backs up the original file to /etc/selinux/config.bak. If you are concerned about disabling SELinux, do note that we are trying to work out a better solution.

Test that docker is correctly installed

The following should give an encouraging message:

$ docker run hello-world

FAQ

Q. Directories mounted within container on Mac host are empty.

The problem is, on Mac boot2docker is the real host, not OS X. oswitch can mount only what's available to it from boot2docker. For example, /Applications.

Run boot2docker ssh ls /Applications and you will find it empty as well.

The workaround is to correctly mount the directories you want in boot2docker first.

boot2docker down
VBoxManage sharedfolder remove boot2docker-vm --name Applications
VBoxManage sharedfolder add boot2docker-vm --name Applications --hostpath /Applications
boot2docker up
boot2docker ssh "sudo mkdir -p /Applications && sudo mount -t vboxsf -o uid=1000,gid=50 Applications /Applications"
Q. cwd is empty in the container

This means the said directory was not mounted by oswitch, or was incorrectly mounted. On Linux host, directories that can conflict with paths within container are not mounted. On Mac, boot2docker can get in the way.

Please report this on our issue tracker. To help us debug, please include:

  1. the directory in question
  2. the operating system you are running
Q. oswitch does not work with my docker image

Please report this on our issue tracker with oswitch's output. If the image you are using is not available via docker hub or another public repository, please include the Dockerfile as well.

Q. How does all this work?

We create a new image on the fly that inherits from the given image. While creating the new image we execute a shell script that installs packages required for oswitch to work and creates a user in the image (almost) identical to that on the host.

Roadmap

  1. make it possible to use docker containers without inheriting our current baseimage
  2. gem distribution for easier installation
  3. brew recipe for Mac
  4. deb package
  5. test on QMUL's compute cluster
  6. make available images for common bioinformatics software
  7. deploy at RAL/JASMIN
  8. create an SELinux policy to run oswitch on CentOS without having to disable SELinux entirely
  9. rpm package

Contribute

$ git clone https://github.com/yeban/oswitch
$ cd oswitch
$ gem install bundler && bundle
$ bundle exec bin/oswitch biolinux

Contributors & Funding


Development funded as part of NERC Environmental Omics (EOS) Cloud at
Wurm Lab, Queen Mary University of London.