-
Notifications
You must be signed in to change notification settings - Fork 13
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
85841d8
commit f9a908a
Showing
10 changed files
with
319 additions
and
388 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
[deps] | ||
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" | ||
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f" | ||
XAM = "d759349c-bcba-11e9-07c2-5b90f8f05f7c" | ||
|
||
[compat] | ||
Documenter = "0.27" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# BAI files | ||
|
||
The `XAM` package supports the BAI index to fetch records in a specific range from a BAM file. | ||
[Samtools](https://samtools.github.io/) provides `index` subcommand to create an index file (.bai) from a sorted BAM file. | ||
|
||
```console | ||
$ samtools index -b SRR1238088.sort.bam | ||
$ ls SRR1238088.sort.bam* | ||
SRR1238088.sort.bam SRR1238088.sort.bam.bai | ||
``` | ||
|
||
The method `eachoverlap(reader, chrom, range)` returns an iterator of BAM records overlapping the query interval: | ||
|
||
```julia | ||
reader = open(BAM.Reader, "SRR1238088.sort.bam", index="SRR1238088.sort.bam.bai") | ||
for record in eachoverlap(reader, "Chr2", 10000:11000) | ||
# `record` is a BAM.Record object | ||
# ... | ||
end | ||
close(reader) | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
```@meta | ||
CurrentModule = XAM | ||
DocTestSetup = quote | ||
using XAM | ||
end | ||
``` | ||
|
||
# BAM files | ||
|
||
|
||
## The `BAM.Record` | ||
|
||
The `XAM` package supports the following accessors for `BAM.Record` types. | ||
|
||
```@docs | ||
BAM.Record | ||
``` | ||
|
||
## `BAM.Reader` and `BAM.Writer` | ||
|
||
### Reference: | ||
```@docs | ||
BAM.Reader | ||
BAM.Writer | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
|
||
# Readers and writers | ||
|
||
## Reading SAM and BAM files | ||
|
||
Below is an example of a typical script iterating over all records in a file: | ||
|
||
```julia | ||
using XAM | ||
|
||
# Open a BAM file. | ||
reader = open(BAM.Reader, "data.bam") | ||
|
||
# Iterate over BAM records. | ||
for record in reader | ||
# `record` is a BAM.Record object. | ||
if BAM.ismapped(record) | ||
# Print the mapped position. | ||
println(BAM.refname(record), ':', BAM.position(record)) | ||
end | ||
end | ||
|
||
# Close the BAM file. | ||
close(reader) | ||
``` | ||
|
||
The size of a BAM file is often extremely large. | ||
The iterator interface demonstrated above allocates an object for each record and that may be a bottleneck of reading data from a BAM file. | ||
In-place reading reuses a pre-allocated object for every record and less memory allocation happens in reading: | ||
|
||
```julia | ||
reader = open(BAM.Reader, "data.bam") | ||
record = BAM.Record() | ||
while !eof(reader) | ||
empty!(record) | ||
read!(reader, record) | ||
# do something | ||
end | ||
``` | ||
|
||
## Writing SAM and BAM files | ||
|
||
In order to write a BAM or SAM file, you must first create a `SAM.Header`. | ||
|
||
A `SAM.Header` is constructed from a vector of `SAM.MetaInfo` objects. | ||
|
||
For example, to create the following simple header: | ||
|
||
``` | ||
@HD VN:1.6 SO:coordinate | ||
@SQ SN:ref LN:45 | ||
``` | ||
|
||
```julia | ||
julia> a = SAM.MetaInfo("HD", ["VN" => 1.6, "SO" => "coordinate"]) | ||
SAM.MetaInfo: | ||
tag: HD | ||
value: VN=1.6 SO=coordinate | ||
|
||
julia> b = SAM.MetaInfo("SQ", ["SN" => "ref", "LN" => 45]) | ||
SAM.MetaInfo: | ||
tag: SQ | ||
value: SN=ref LN=45 | ||
|
||
julia> h = SAM.Header([a, b]) | ||
SAM.Header(SAM.MetaInfo[SAM.MetaInfo: | ||
tag: HD | ||
value: VN=1.6 SO=coordinate, SAM.MetaInfo: | ||
tag: SQ | ||
value: SN=ref LN=45]) | ||
|
||
``` | ||
|
||
Then to create the writer for a SAM file, construct a `SAM.Writer` using the header and an `IO` type: | ||
|
||
```julia | ||
julia> samw = SAM.Writer(open("my-data.sam", "w"), h) | ||
SAM.Writer(IOStream(<file my-data.sam>)) | ||
|
||
``` | ||
|
||
To make a BAM Writer is slightly different, as you need to use a specific stream type from the [https://github.com/BioJulia/BGZFStreams.jl](https://github.com/BioJulia/BGZFStreams.jl) package: | ||
|
||
```julia | ||
julia> using BGZFStreams | ||
|
||
julia> bamw = BAM.Writer(BGZFStream(open("my-data.bam", "w"), "w")) | ||
BAM.Writer(BGZFStreams.BGZFStream{IOStream}(<mode=write>)) | ||
|
||
``` | ||
|
||
Once you have a BAM or SAM writer, you can use the `write` method to write `BAM.Record`s or `SAM.Record`s to file: | ||
|
||
```julia | ||
julia> write(bamw, rec) # Here rec is a `BAM.Record` | ||
330780 | ||
``` | ||
|
||
|
||
## Getting records overlapping genomic features | ||
|
||
The `eachoverlap` method also accepts the `Interval` type defined in [GenomicFeatures.jl](https://github.com/BioJulia/GenomicFeatures.jl). | ||
|
||
This allows you to do things like first read in the genomic features from a GFF3 file, and then for each feature, iterate over all the BAM records that overlap with that feature. | ||
|
||
```julia | ||
using GenomicFeatures | ||
using GFF3 | ||
using XAM | ||
|
||
# Load genomic features from a GFF3 file. | ||
features = open(collect, GFF3.Reader, "TAIR10_GFF3_genes.gff") | ||
|
||
# Keep mRNA features. | ||
filter!(x -> GFF3.featuretype(x) == "mRNA", features) | ||
|
||
# Open a BAM file and iterate over records overlapping mRNA transcripts. | ||
reader = open(BAM.Reader, "SRR1238088.sort.bam") | ||
for feature in features | ||
for record in eachoverlap(reader, feature) | ||
# `record` overlaps `feature`. | ||
# ... | ||
end | ||
end | ||
close(reader) | ||
``` |
Oops, something went wrong.