Releases: KathrynCampbell/MADDOG
Version 3 - Return of the King
Updated Datasets and Introducing Reports
Lineage Cleaning
- A previous bug resulted in some lineages being designated despite having fewer than 10 sequences. These lineages have now been removed, and any sequences assigned to them are now assigned to the next available parental lineage.
- The code for designating new lineages has been updated to prevent this happening in future.
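
As an illustration of the cleaning step described above, here is a minimal Python sketch. It is not MADDOG's own code: the function name `clean_lineages`, the `parents` mapping and the lineage names are hypothetical, and it assumes that only sequences assigned directly to a lineage count towards the threshold of 10.

```python
from collections import Counter

MIN_SEQUENCES = 10  # threshold stated in these release notes


def clean_lineages(assignments, parents, min_sequences=MIN_SEQUENCES):
    """Move sequences out of undersized lineages into the nearest kept ancestor.

    assignments: dict mapping sequence ID -> lineage name
    parents:     dict mapping lineage name -> parent lineage (None for a root)
    Assumes the parent relationships form a tree (no cycles).
    """
    counts = Counter(assignments.values())
    # Lineages below the threshold are removed; root lineages are always
    # kept because they have no parent to fall back on.
    removed = {
        lineage
        for lineage, n in counts.items()
        if n < min_sequences and parents.get(lineage) is not None
    }
    cleaned = {}
    for seq_id, lineage in assignments.items():
        # Walk up the hierarchy until a lineage that was not removed is found.
        while lineage in removed:
            lineage = parents[lineage]
        cleaned[seq_id] = lineage
    return cleaned


# Hypothetical example: 12 sequences in A1 and 3 in its child A1.1.
example = {f"seq{i}": "A1" for i in range(12)}
example.update({f"extra{i}": "A1.1" for i in range(3)})
hierarchy = {"A1": None, "A1.1": "A1"}
print(Counter(clean_lineages(example, hierarchy).values()))  # Counter({'A1': 15})
```

In this sketch the set of removed lineages is fixed before any sequence is moved, so the result does not depend on the order in which sequences are processed.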
Dataset updates
- Some sequences were included only as N gene sequences even when whole genomes were available. This has now been rectified: all sequences are whole genomes where available, with N gene sequences included only when no whole genome sequence is available.
- Any missing publicly available whole genome and N gene sequences were added. These underwent lineage designation, and any newly discovered lineages were added to the reference.
Reports
- Automated HTML reports are now generated to summarise all the outputs.
Areas to Investigate
- Updates to code to discover emerging/undersampled lineages and singletons of interest
- Individual singletons of interest now also reported
General command line running updates
- Users are prompted to pull the latest changes from GitHub to incorporate any updates before starting a run
- Options to get new lineages verified
Version 2 - Electric Boogaloo
Major update to designation
Please note: All updates apply to the command line tool, not the R package. See README and Vignettes for full details of R package usage.
Features
Cross operating system compatibility
- Same commands for all platforms
- No need for separate windows_assignment command
Major update to designation
- Designation is now done using reference designations as a starting point
- Your sequences will first be assigned a lineage based on the reference set, and then checked to see if the inclusion of your sequences results in new lineages.
- Using the reference set as a starting point allows much greater comparability and lets you see your sequences in the context of existing lineages.
- If you wish to run an entirely new designation, not including the reference sequences, you can run
sh new_designation.sh
which will produce a designation relevant only to your study and not comparable to the global set.
More informative figures
- The updated designation figures now include a sunburst plot that shows the hierarchical relationship of the lineages for the input sequences in context, including all relevant parent and descendant lineages. This makes the evolutionary history much clearer.
- A tree is produced showing the phylogenetic positions of all relevant lineages, along with a tree with tips coloured to indicate which are the input sequences. This gives a much better idea of context and shows where the new sequences are placed within existing phylogenies.
Tests for emerging lineages, undersampling and singletons of interest
- The lineage designation process now also checks for lineages that are undersampled or emerging (all lineage-defining requirements met, but only 5-9 sequences within a 5-year period) and provides information about these.
- The process also identifies singletons of interest that may indicate sequencing errors, very undersampled lineages or very newly emerging lineages.
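
The "5-9 sequences within a 5 year period" rule can be sketched in Python as follows. This is one possible reading rather than MADDOG's implementation: the function names are hypothetical, a 5-year period is taken to mean five consecutive calendar years, and the check assumes all other lineage-defining requirements have already been met.

```python
def max_sequences_in_window(years, window_years=5):
    """Return the largest number of sequences in any window of
    `window_years` consecutive calendar years."""
    ordered = sorted(years)
    best = 0
    start = 0
    for end, year in enumerate(ordered):
        # Shrink the window from the left until it spans fewer than
        # `window_years` calendar years.
        while year - ordered[start] >= window_years:
            start += 1
        best = max(best, end - start + 1)
    return best


def is_emerging_or_undersampled(years, low=5, high=9, window_years=5):
    """True when the densest window holds between `low` and `high`
    sequences (inclusive), as in the 5-9 rule quoted above."""
    return low <= max_sequences_in_window(years, window_years) <= high


# Hypothetical collection years for one candidate cluster.
candidate_years = [2015, 2016, 2016, 2017, 2018, 2019]
print(is_emerging_or_undersampled(candidate_years))  # True (6 sequences in 2015-2019)
```

A cluster with 10 or more sequences in the window would fall outside this range, as would one with fewer than 5.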
Updates to assignment, new functions etc
Many updates to fix bugs, improve assignment, and add new references and new functions for figure production
MADDOG initial bug fixes
Update to designation: Minor fix to the checks for nodes that need removing because too few tips descend from them, and for lineages that are ‘empty’
Update to naming: Allow for multiple initial parent lineages - e.g. B1 does not need to be descended from A1
Updates to assignment:
- Remove accidental arg tests
- Prevent code running indefinitely if sequences do not assign; assign NA instead
Initial MADDOG release
Initial release of the MADDOG command line tool and R package for Unix, Mac and Windows (some features currently unavailable for Windows).