Updated Datasets and Introducing Reports
Lineage Cleaning
- A previous bug resulted in some lineages being designated despite having fewer than 10 sequences. These lineages have now been removed, and any sequences assigned to them are now assigned to the next available parental lineage.
- The code for designating new lineages has been updated to prevent this from happening in future.
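
The clean-up described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: it assumes lineage names are dot-separated so that the parent of `A.1.1` is `A.1`, and the names `parent_of`, `clean_lineages` and `MIN_SEQUENCES` are hypothetical.

```python
from collections import Counter

MIN_SEQUENCES = 10  # threshold taken from the note above


def parent_of(lineage):
    """Return the parent of a dot-separated lineage name, or None at the top level.

    Assumption: 'A.1.1' has parent 'A.1'. The real naming scheme may differ.
    """
    head, sep, _ = lineage.rpartition(".")
    return head if sep else None


def clean_lineages(assignments, min_sequences=MIN_SEQUENCES):
    """Move sequences out of undersized lineages into the nearest retained ancestor.

    `assignments` maps a sequence ID to its lineage name. Counts are taken
    from the original assignments only (a simplification for this sketch).
    """
    counts = Counter(assignments.values())
    kept = {name for name, n in counts.items() if n >= min_sequences}

    cleaned = {}
    for seq_id, lineage in assignments.items():
        current = lineage
        # Walk up the hierarchy until a retained lineage is found.
        while current is not None and current not in kept:
            current = parent_of(current)
        # If no ancestor qualifies, leave the assignment unchanged
        # (an arbitrary choice made for this sketch).
        cleaned[seq_id] = current if current is not None else lineage
    return cleaned
```

In this sketch, a sequence in a lineage with fewer than 10 members is moved up one level at a time until it reaches a lineage that meets the threshold.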
Dataset updates
- Some sequences were included only as N genes even when whole genomes were available. This has now been rectified: all sequences are whole genome when available, and N gene sequences are included only when no whole genome sequence is available.
- Any missing publicly available whole genome and N gene sequences were added. These underwent lineage designation, and any newly discovered lineages were added to the reference.
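
The selection rule above (whole genome when available, N gene otherwise) could look something like the sketch below. The record layout, with its `sample_id` and `type` keys and the values `whole_genome` and `n_gene`, is an assumption made for illustration and is not taken from the dataset itself.

```python
def select_sequences(records):
    """Keep one record per sample, preferring whole genome over N gene.

    Each record is a dict with the hypothetical keys 'sample_id' and
    'type', where 'type' is either 'whole_genome' or 'n_gene'.
    """
    chosen = {}
    for record in records:
        sample_id = record["sample_id"]
        current = chosen.get(sample_id)
        # Take the first record seen for a sample, then upgrade it only
        # if a whole genome turns up for a sample held as N gene.
        if current is None or (
            record["type"] == "whole_genome"
            and current["type"] != "whole_genome"
        ):
            chosen[sample_id] = record
    return list(chosen.values())
```

A sample that has both kinds of sequence ends up represented by its whole genome, while a sample with only an N gene sequence keeps that record.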
Reports
- Automated HTML reports are now generated to summarise all the outputs.
Areas to Investigate
- The code that discovers emerging/undersampled lineages and singletons of interest has been updated.
- Individual singletons of interest are now also reported.
General command line running updates
- Users are now prompted to pull the latest changes from GitHub, to incorporate any updates, before starting a run.
- Options are now available to get new lineages verified.