
v0.4.0

@fedarko released this 29 Dec 22:00

New features

  • Added the s1_name and s2_name parameters to viz_imshow() and viz_spy() (#17).

    • You can use these parameters to adjust the sequence name shown on the x or y axis, respectively.
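A minimal usage sketch of the new parameters. It assumes the DotPlotMatrix(s1, s2, k) constructor call shown in the wotplot README; the helper name plot_named_dot_plot is hypothetical and exists only for illustration.

```python
def plot_named_dot_plot(s1, s2, k, s1_name, s2_name):
    """Draw a dot plot of s1 vs. s2 with custom axis names (illustrative helper)."""
    # Imported here so that defining the helper does not require wotplot.
    import wotplot as wp

    # Assumption: the (s1, s2, k) constructor call from the wotplot README.
    m = wp.DotPlotMatrix(s1, s2, k)

    # s1_name and s2_name are the parameters added in this release;
    # viz_spy() accepts the same two parameters.
    return wp.viz_imshow(m, s1_name=s1_name, s2_name=s2_name)
```

For example, calling plot_named_dot_plot("AGCAGGAGATAAACCTGT", "AGCAGGTTATCTACCTGT", 3, "Sample A", "Sample B") should label the two axes with "Sample A" and "Sample B" rather than the default names.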

Performance

  • Updated the default algorithm for finding shared k-mers between two strings. Rather than manually iterating through the suffix arrays produced by pydivsufsort.divsufsort() to identify shared k-mers, wotplot now uses pydivsufsort.common_substrings(), which accomplishes the same thing much faster.

    • For example, creating a dot plot of two random 1 Mbp sequences (using k = 20) takes 26.52 seconds (max memory usage 563.09 MiB) using the old method and 2.16 seconds (max memory usage 617.53 MiB) using the new method.

    • The new method does have higher memory usage for long sequences -- to the point where some test cases crash Jupyter Notebook with the new method, but succeed (albeit after taking a long time) with the old method. If desired, you can choose to use the old method ("suff-only") by passing suff_only=True to the DotPlotMatrix() constructor.

  • When using the old shared-k-mer-finding method ("suff-only") and creating a self dot plot (i.e. comparing a sequence with itself), wotplot will now detect if the sequences are identical and reuse the sequence's suffix array (rather than unnecessarily creating the same suffix array twice).

    • Note that there are still other, more dramatic ways to speed up the creation of self dot plots; this is a relatively small improvement.
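To make "finding shared k-mers" concrete, here is a small pure-Python sketch of the underlying problem. It is neither the old suffix-array method nor the pydivsufsort.common_substrings() method: it swaps in a plain dictionary lookup, considers only forward matches, and would not scale to the sequence lengths discussed above. The function name shared_kmer_positions is hypothetical.

```python
from collections import defaultdict


def shared_kmer_positions(s1, s2, k):
    """Return sorted (i, j) pairs such that s1[i:i+k] == s2[j:j+k]."""
    if k < 1:
        raise ValueError("k must be at least 1")

    # Index every k-mer of s1 by the positions at which it starts.
    index = defaultdict(list)
    for i in range(len(s1) - k + 1):
        index[s1[i:i + k]].append(i)

    # Look up each k-mer of s2; every hit is one match cell of the dot plot.
    matches = []
    for j in range(len(s2) - k + 1):
        for i in index.get(s2[j:j + k], ()):
            matches.append((i, j))
    return sorted(matches)


print(shared_kmer_positions("ACGTAC", "GTACGT", 3))
# → [(0, 2), (1, 3), (2, 0), (3, 1)]
```

Each returned pair corresponds to one match cell; the suffix-array-based methods described above find the same kind of matches without storing every k-mer in a dictionary.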

Maintenance

  • Breaking change: Removed the binary parameter of the DotPlotMatrix() constructor.

    • Now, all matrices are "not binary". If you want to visualize a matrix in a binary way, set binary=True when calling viz_imshow() or viz_spy() (this is analogous to the force_binary parameter in wotplot 0.3.0). By default, the matrix is visualized in color.

    • This change was motivated by benchmarking -- making the matrix "not binary" didn't seem to impact construction speed. It does impact visualization speed slightly, but the effects of this can be (entirely, I think?) offset by using binary=True during visualization.

    • Sorry for the breaking change. I think this makes the interface a lot more natural to use.

  • Abstracted code from the benchmarking notebook to a separate file in the docs/ folder, and tried to tidy it up.

  • Restructured and tested the SciPy version checking code.

  • Added more tests.

  • Added pytest-mock as a development dependency.
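For the breaking change listed in this section, migrating mostly means moving binary from the constructor to the visualization call. The sketch below assumes the DotPlotMatrix(s1, s2, k) constructor call shown in the wotplot README; the helper name plot_binary_dot_plot is hypothetical.

```python
def plot_binary_dot_plot(s1, s2, k):
    """Build a dot plot matrix and visualize it in binary (illustrative helper)."""
    # Imported here so that defining the helper does not require wotplot.
    import wotplot as wp

    # Before this release, binary was passed to the constructor, e.g.
    # wp.DotPlotMatrix(s1, s2, k, binary=True). That parameter is now removed.
    m = wp.DotPlotMatrix(s1, s2, k)

    # binary=True is now a visualization option of viz_spy() and viz_imshow().
    return wp.viz_spy(m, binary=True)
```

Leaving out binary=True gives the default color visualization of the same matrix, so one matrix can be drawn both ways without being rebuilt.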

Bug fixes

  • Fixed a bug where, if every single cell in the dot plot matrix was a match cell, the resulting dot plot would be empty (#19).

Documentation

  • Added better documentation to wotplot._make._get_row() and wotplot._make._make().

  • Various updates to the README, tutorial notebook, and benchmarking notebook.

  • The tutorial now includes a more complex example of creating a grid of dot plots. The example (shown below) creates a grid figure comparing all pairs of sequences with each other.

Figure: screenshot of a grid of dot plots from the tutorial.