trackplot.R
is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots) and profile plots from bigWig files.
It has three main utilities:
track_extract()
->track_plot()
- for IGV style track plotsprofile_extract()
->profile_plot()
- for density based profile plotsextract_signal()
->pca_plot()
- PCA analysis based on genomic regions of interest or around TSS sites of reference transcripts.
trackplot.R
is a standalone R script and requires no installation. Just source it and you're good to go! See below for dependencies.
source("https://github.com/PoisonAlien/trackplot/blob/master/R/trackplot.R?raw=true")
# OR
download.file(url = "https://raw.githubusercontent.com/PoisonAlien/trackplot/master/R/trackplot.R", destfile = "trackplot.R")
source('trackplot.R')
# OR If you prefer to have it as package
remotes::install_github(repo = "poisonalien/trackplot")
Features:
- It's significantly fast since most of the heavy lifting is done by
bwtool
. Below examples took less than 2 minutes on my 5 year old macbook Pro - Automatically queries UCSC genome browser for gene models and cytobands, making analysis reproducible.
- Supports GTF and standard UCSC gene formats as well.
- Lightweight and minimal dependency
- data.table and bwtool are the only requirements.
- Plots are generated in pure base R graphics (no ggplot2 or tidyverse packages)
- Customization: Each plot can customized for color, scale, width, etc.
- Tracks can be summarized per condition (by mean, median, max, min)
track_extract()
and track_plot()
are two functions to generate IGV style track plots (aka locus plots) from bigWig files. Additionally, track_summarize
can summarize tracks by condition.
#Path to bigWig files
bigWigs = c("CD34.bw", "EC.bw", "LC.bw", "CD4p.bw", "CD8p.bw")
#Step-1. Extract the siganl for your loci of interst
track_data = track_extract(bigWigs = bigWigs, loci = "chr3:187,715,903-187,752,003")
#Step-1a (optional). Summarize trcks by condition
track_data = track_summarize(summary_list = track_data, condition = c("A", "B", "B", "C", "D"), stat = "mean")
#Step-2.
#Basic Plot
track_plot(summary_list = track_data)
#With gene models (by default autoamtically queries UCSC genome browser for hg19 transcripts)
track_plot(summary_list = track_data, draw_gene_track = TRUE, build = "hg38")
#With GTF file as source for gene models
track_plot(summary_list = track_data, draw_gene_track = TRUE, gene_model = "hg38_refseq.gtf.gz", isGTF = TRUE)
#Heighlight regions of interest
markregions = data.frame(
chr = c("chr3", "chr3"),
start = c(187743255, 187735888),
end = c(187747473, 187736777),
name = c("Promoter-1", "Promoter-2")
)
track_plot(
summary_list = track_data,
draw_gene_track = TRUE,
show_ideogram = TRUE,
build = "hg38",
regions = markregions
)
profile_extract()
and profile_plot()
are functions to generate density based profile-plots from bigWig files.
- Below example for summarizing approx. 33,250 peaks for 5 bigWig files takes around 90 seconds on my 5 year old macbook Pro. This includes generating signal matrix, summarizing, and plotting
- Optionally, it can even query UCSC genome browser for refseq transcripts of desired assembly and summarize around TSS regions
- Replicates can be collapsed into single value per condition
#Example profile plot for a bed file with ~33,250 peaks, centered and extended 2500 bps
profile_data = profile_extract(
bigWigs = bigWigs,
bed = "CD34.bed",
startFrom = "center",
up = 2500,
down = 2500
)
profile_plot(profile_data)
#profile plot for refseq protein-coding genes (TSS +/2500)
profile_data = profile_extract(
bigWigs = bigWigs,
ucsc_assembly = "hg38",
startFrom = "tss",
up = 2500,
down = 2500
)
profile_plot(profile_data)
bwpcaplot()
is a function to perform PCA analysis based on genomic regions of interest or around TSS sites of reference transcripts. Plot data and region summaries returned to the user.
#PCA using UCSC protein coding reference transcripts (TSS+/- 2500 bp)
refseq_summary = extract_summary(
bigWigs = bigWigs,
ucsc_assembly = "hg38"
)
pca_plot(summary_list = refseq_summary)
#PCA using genomic regions of interest (BED file)
bed_summary = extract_summary(
bigWigs = bigWigs,
bed = "sample.bed",
custom_names = c("CD34", "EC", "LC", "CD4+", "CD8+")
)
pca_plot(summary_list = bed_summary)
- data.table R package - which itself has no dependency.
- bwtool - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g;
/usr/local/bin
). Or, you could also add the path where bwtool is located to R session with the below command.
#Example
Sys.setenv(PATH = paste("/Users/anand/Documents/bwtool_dir/", Sys.getenv("PATH"), sep=":"))
- If you have trouble compiling the tool, follow these instructions. Alternatively, you can download the pre-built binary for macOS or centOS
PSA If you find the tool useful, consider starrig this repository or upvoting this Biostars thread so that more poeple can find it :)
- Windows OS is not supported
If you find the script useful consider citing bwtool
Pohl A, Beato M. bwtool: a tool for bigWig files. Bioinformatics. 2014 Jun 1;30(11):1618-9. doi: 10.1093/bioinformatics/btu056. Epub 2014 Jan 30. PMID: 24489365; PMCID: PMC4029031.
Joschka Hey for all the cool suggestions :)