All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- deprecated --species in favor of --format
- support for hg19_ensembl (which contains three_prime_utr and five_prime_utr features, unlike gencode)
- untested support for dm6 (for a collaborator).
- fixed python3 compatibility in most_upstream_downstream_positions() and get_longest_transcripts() (iteritems -> future.utils.iteritems)
- fixed a bug in create_region_bedfiles that was incorrectly passing arguments to get_all_exons_dict()
- added example priority text files to datasets/
- [miRNA] is now a distinct noncoding transcript type by default in classify_transcript_type()
- Changed the priority order so that miRNA and noncoding exons rank above introns.
- create_AS_STRUCTURE creates AS_STRUCTURE files to be used with clipper
- added annotation_functions.py which will eventually contain functions shared among scripts
- get_region_lengths now reports total lengths in addition to average length of each genomic region
- Transcript-level region functionality to exons
- Transcript-level region functionality to proximal and distal introns
- Transcript-level region functionality to CDS
- Transcript-level region functionality to 3' and 5' UTR regions
- Functionality to determine prox vs distal introns (500bp threshold)
- classify_transcript_type() allows for finer control of annotating noncoding regions.
- Unit tests for proxdist functions.
- build_gffutils_db
- create_region_bedfiles
- gene_name2id
- miRNA_name2id
- datasets/*priority.txt to reflect prox and distal intron priorities
- is_protein_coding() in favor of classify_transcript_type()
- First sharable commit to GitHub
- Lab slides that show usage
- README now contains examples and default priorities/params
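
The proximal vs distal intron entry above names a 500bp threshold but not how it is applied. The sketch below shows one plausible reading, assuming that "proximal" means intronic bases within 500bp of either flanking exon and "distal" means everything further in. The function name `split_intron`, the half-open coordinates, and the returned dictionary layout are all hypothetical and are not taken from this project's code.

```python
def split_intron(start, end, threshold=500):
    """Split a half-open intron interval [start, end) into proximal and
    distal segments (hypothetical helper, not this project's API).

    Assumption: a base is proximal when it lies within `threshold` bp of
    either end of the intron, and distal otherwise.
    """
    if end <= start:
        raise ValueError("end must be greater than start")

    # Short introns: every base is within `threshold` of a flanking exon,
    # so the whole intron is proximal and there is no distal segment.
    if end - start <= 2 * threshold:
        return {"proximal": [(start, end)], "distal": []}

    # Long introns: one proximal window at each end, distal in between.
    return {
        "proximal": [(start, start + threshold), (end - threshold, end)],
        "distal": [(start + threshold, end - threshold)],
    }


long_intron = split_intron(1000, 3000)
short_intron = split_intron(100, 900)
```

Under these assumptions, a 2000bp intron yields two 500bp proximal windows around a 1000bp distal core, while an 800bp intron is entirely proximal. The actual behavior of the project's functions may differ, for example in strand handling or coordinate conventions.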