
tackling the all-vs-all matrix

@ekg ekg released this 29 May 18:10
517e1bc

Buildable Source Tarball: wfmash-v0.14.0.tar.gz

This release adds support for subsetting queries, in addition to the existing target subsetting. A list of queries can now be given. (We still work with only a single target, though.) The idea is to let us subdivide the all-versus-all alignment matrix and run many small jobs in which multiple queries are aligned against a single target. Running a separate job for each query against each target would be computationally infeasible, because there might be many hundreds of thousands of queries. There are some other bug fixes and updates as well, but the main change that triggers a release is the change in the command line API.
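
To make the motivation concrete, here is a rough back-of-the-envelope sketch of how batching queries changes the number of jobs. The function name, the batch size, and the example numbers are hypothetical and chosen only for illustration; they are not taken from wfmash or its scripts.

```python
import math


def count_jobs(n_targets: int, n_queries: int, queries_per_job: int) -> tuple[int, int]:
    """Return (jobs with one query per job, jobs with batched queries).

    Hypothetical helper: it only counts cells of the all-vs-all matrix
    and says nothing about how long each job would run.
    """
    # One job for every (target, query) cell of the matrix.
    per_pair = n_targets * n_queries
    # Each job aligns a batch of queries against a single target.
    batches_per_target = math.ceil(n_queries / queries_per_job)
    batched = n_targets * batches_per_target
    return per_pair, batched


# Illustrative numbers only: 100 targets, 200,000 queries, 2,000 queries per job.
print(count_jobs(100, 200_000, 2_000))  # (20000000, 10000)
```

Under these assumed numbers, batching cuts twenty million single-pair jobs down to ten thousand.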

changelog

Query filtering and specification improvements

  • Added support for specifying a comma-delimited list of query name prefixes to filter queries with the -Q/--query-prefix option.
  • Added -A/--query-list option to specify a file containing a list of query sequence names to use.
  • Updated internal sequence iteration and counting logic to properly apply the new query filtering options.
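
As an illustration of what prefix-based query filtering means, the following minimal sketch keeps a sequence if its name starts with any prefix in a comma-delimited list. This is an assumption about the semantics, not wfmash's actual implementation (which is C++); the function names and sample sequence names are hypothetical.

```python
def parse_prefixes(arg: str) -> list[str]:
    """Split a comma-delimited prefix string, dropping empty entries."""
    return [p for p in arg.split(",") if p]


def filter_by_prefix(names: list[str], prefixes: list[str]) -> list[str]:
    """Keep names that start with at least one prefix.

    Assumption: an empty prefix list means no filtering at all.
    """
    if not prefixes:
        return list(names)
    wanted = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return [name for name in names if name.startswith(wanted)]


names = ["HG002#1#chr1", "HG002#2#chr1", "HG003#1#chr1", "CHM13#1#chr1"]
print(filter_by_prefix(names, parse_prefixes("HG002#,CHM13#")))
# ['HG002#1#chr1', 'HG002#2#chr1', 'CHM13#1#chr1']
```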

Target filtering option name changes

  • Renamed the short flag for target prefix filtering from -P to -T for consistency; the long form --target-prefix is unchanged.
  • Renamed target list filtering option from -A/--target-list to -R/--target-list.

All-to-all alignment script improvements

  • Updated scripts/all2all_jobs.py to:
    • Support grouping by genome, haplotype, or contig.
    • Allow specifying different grouping levels for target and query sequences.
    • Directly generate wfmash command lines.
  • Added scripts/make_source_tarball.sh to generate a source tarball for releases.
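
The grouping idea can be sketched as follows. This is not the code in scripts/all2all_jobs.py. It is a simplified illustration that assumes sequence names follow a `sample#haplotype#contig` layout and that the FASTA file is passed as a positional argument, neither of which is stated in these notes. The function names and sample data are hypothetical; only the -T and -Q flags come from the changelog above.

```python
import shlex
from itertools import product

# Number of '#'-separated fields that define each grouping level (assumed layout).
LEVELS = {"genome": 1, "haplotype": 2, "contig": 3}


def group_prefix(name: str, level: str) -> str:
    """Return the name prefix that identifies a sequence's group."""
    fields = name.split("#")
    prefix = "#".join(fields[: LEVELS[level]])
    # Keep the trailing '#' so that 'HG002#1' cannot also match 'HG002#10'.
    return prefix if level == "contig" else prefix + "#"


def make_jobs(names, fasta, target_level="haplotype", query_level="genome"):
    """Build one command line per (target group, query group) pair."""
    targets = sorted({group_prefix(n, target_level) for n in names})
    queries = sorted({group_prefix(n, query_level) for n in names})
    return [
        # shlex.join quotes the '#' characters so the shell sees them literally.
        shlex.join(["wfmash", "-T", t, "-Q", q, fasta])
        for t, q in product(targets, queries)
    ]


names = ["HG002#1#chr1", "HG002#2#chr1", "HG003#1#chr1"]
for cmd in make_jobs(names, "in.fa"):
    print(cmd)
```

Using a finer level for targets than for queries, as in the defaults here, yields more but smaller jobs. The right trade-off depends on the dataset and the cluster.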

Build and testing updates

  • Re-added the rt library to the CMake configuration.
  • Updated CI tests to run on the main branch.
  • Adjusted CI test cases for the subset of the LPA dataset.

Bug fixes

  • Fixed a heap-use-after-free error in wflign_affine_wavefront().