New features
- Added the `s1_name` and `s2_name` parameters to `viz_imshow()` and `viz_spy()` (#17).
  - You can use these parameters to adjust the sequence name shown on the x or y axis, respectively.
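As a rough illustration of the new parameters, the sketch below builds a small dot plot and labels both axes. The wrapper function `plot_with_axis_names()`, the example sequences, and the label strings are made up for this example; only `DotPlotMatrix()`, `viz_imshow()`, `s1_name`, and `s2_name` come from wotplot itself.

```python
def plot_with_axis_names(s1, s2, k, s1_name, s2_name):
    """Draw a dot plot of s1 vs. s2 with custom axis names (illustrative helper)."""
    # Imported here so that wotplot (and matplotlib, which it draws with) are
    # only needed when the function is actually called.
    import wotplot as wp

    m = wp.DotPlotMatrix(s1, s2, k)
    # s1_name labels the x axis and s2_name labels the y axis.
    wp.viz_imshow(m, s1_name=s1_name, s2_name=s2_name)
    return m
```

For instance, `plot_with_axis_names("AGCAGGAGATAAACCTGT", "AGCAGGTTATCTACCTGT", 3, "Sample A", "Sample B")` should produce a plot whose axes mention "Sample A" and "Sample B". `viz_spy()` accepts the same two parameters.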
Performance
- Updated the default algorithm for finding shared k-mers between two strings. Rather than manually iterating through the suffix arrays produced by `pydivsufsort.divsufsort()` to identify shared k-mers, we now instead use `pydivsufsort.common_substrings()` -- which accomplishes the same thing much faster.
  - For example, creating a dot plot of two random 1 Mbp sequences (using k = 20) takes 26.52 seconds (max memory usage 563.09 MiB) using the old method and 2.16 seconds (max memory usage 617.53 MiB) using the new method.
  - The new method does have higher memory usage for long sequences -- to the point where some test cases crash Jupyter Notebook with the new method, but succeed (albeit after taking a long time) with the old method. If desired, you can choose to use the old method ("suff-only") by passing `suff_only=True` to the `DotPlotMatrix()` constructor.
- When using the old shared-k-mer-finding method ("suff-only") and creating a self dot plot (i.e. comparing a sequence with itself), wotplot will now detect if the sequences are identical and reuse the sequence's suffix array (rather than unnecessarily creating the same suffix array twice).
  - Note that there are still other, more dramatic ways to speed up the creation of self dot plots; this is a relatively small improvement.
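To make the idea behind the "suff-only" method more concrete, here is a small pure-Python sketch of finding shared k-mers by walking through two suffix arrays, including the reuse of one suffix array for a self dot plot. This is not wotplot's actual code: it builds the suffix arrays naively with `sorted()` rather than with pydivsufsort, it only looks for exact matches between the two strings as given, and all function names are hypothetical.

```python
from itertools import groupby


def suffix_array(s):
    """Naive suffix array: start positions of all suffixes of s, sorted."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def kmer_runs(s, sa, k):
    """Group suffix-array entries by their length-k prefix.

    Returns a list of (kmer, positions) pairs in sorted k-mer order.
    Suffixes shorter than k cannot start a k-mer, so they are skipped.
    """
    starts = (i for i in sa if i + k <= len(s))
    return [(kmer, list(group))
            for kmer, group in groupby(starts, key=lambda i: s[i:i + k])]


def shared_kmers(s1, s2, k):
    """Return sorted (pos_in_s1, pos_in_s2) pairs where a k-mer is shared."""
    sa1 = suffix_array(s1)
    # Self dot plot: reuse the suffix array instead of building it twice.
    sa2 = sa1 if s1 == s2 else suffix_array(s2)

    runs1 = kmer_runs(s1, sa1, k)
    runs2 = kmer_runs(s2, sa2, k)

    # Both run lists are sorted by k-mer, so a merge-style walk finds matches.
    matches = []
    a = b = 0
    while a < len(runs1) and b < len(runs2):
        kmer1, pos1 = runs1[a]
        kmer2, pos2 = runs2[b]
        if kmer1 < kmer2:
            a += 1
        elif kmer1 > kmer2:
            b += 1
        else:
            matches.extend((i, j) for i in pos1 for j in pos2)
            a += 1
            b += 1
    return sorted(matches)


print(shared_kmers("ACGTAC", "TACGA", 3))  # [(0, 1), (3, 0)]
```

Each returned pair corresponds to one match cell in the dot plot matrix. The naive `sorted()` construction is far too slow for long sequences; it is used here only to keep the sketch self-contained.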
Maintenance
- Breaking change: Removed the `binary` parameter of the `DotPlotMatrix()` constructor.
  - Now, all matrices are "not binary" by default. If you want to visualize a matrix in a binary way, you can now set `binary=True` when calling `viz_imshow()` or `viz_spy()` (this is analogous to the `force_binary` parameter in wotplot 0.3.0). Matrix visualization defaults to visualizing the matrix in color.
  - This change was motivated by benchmarking -- making the matrix "not binary" didn't seem to impact construction speed. It does impact visualization speed slightly, but the effects of this can be (entirely, I think?) offset by using `binary=True` during visualization.
  - Sorry for the breaking change. I think this makes the interface a lot more natural to use.
- Abstracted code from the benchmarking notebook to a separate file in the `docs/` folder, and tried to tidy it up.
- Restructured and tested the SciPy version checking code.
- Additional tests.
- Added `pytest-mock` as a development dependency.
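For code affected by the breaking change above, the migration might look roughly like the sketch below: the matrix is constructed without `binary`, and the binary-vs-color choice moves to the visualization call. The wrapper function `binary_dot_plot()` is hypothetical and exists only to keep the example self-contained.

```python
def binary_dot_plot(s1, s2, k):
    """Build a dot plot matrix and draw it in binary (illustrative helper)."""
    # Imported here so that wotplot (and matplotlib, which it draws with) are
    # only needed when the function is actually called.
    import wotplot as wp

    # The constructor no longer accepts a `binary` argument.
    m = wp.DotPlotMatrix(s1, s2, k)

    # Request binary drawing at visualization time instead; without
    # binary=True, the matrix is drawn in color.
    wp.viz_spy(m, binary=True)
    return m
```

The same `binary=True` argument can be passed to `viz_imshow()`.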
Bug fixes
- Fixed a bug where, if every single cell in the dot plot matrix was a match cell, the resulting dot plot would be empty (#19).
Documentation
- Added better documentation to `wotplot._make._get_row()` and `wotplot._make._make()`.
- Various updates to the README, tutorial notebook, and benchmarking notebook.
- The tutorial now includes a more complex example of creating a grid of dot plots. The example (shown below) creates a grid figure comparing all pairs of sequences with each other.