Skip to content

PlummerLab/P20150402DJ-GFFProcessing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

P20150402DJ-GFFProcessing

Processing GFF output from CodingQuarry

Author: Darcy Jones
Contact: [email protected]
Date: 02/04/2015

CodingQuarry is a gene prediction program optimised for annotating fungal genomes. We were having a problem extracting ORF sequences from the GFF output of CodingQuarry because it appears to code some exons as 'CDS' types. This notebook converts the type of multiple 'CDS' features with the same parent to 'exon' and recodes the ID appropriately. Then coding sequences of genes for each genome is extracted and saved as fasta files as nucleotide and amino acid sequences.

View the Ipython notebook here.

Required software

This was run with Python 3.4.1 with the following packages:

  • bcbio-gff (0.6.2)
  • biopython (1.65)
  • certifi (14.5.14)
  • ipython (3.0.0)
  • Jinja2 (2.7.3)
  • jsonschema (2.4.0)
  • MarkupSafe (0.23)
  • pyzmq (14.5.0)
  • six (1.9.0)
  • tornado (4.1)
  • mistune (0.5.1)
  • Pygments (2.0.2)

Relevant literature

Testa, A. C., Hane, J. K., Ellwood, S. R., and Oliver, R. P. 2015. CodingQuarry: highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts. BMC Genomics. 16:1–12

About

Coding sequence extraction from CodingQuarry output

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages