Merge pull request #179 from waveygang/update_wfa2
update WFA2-lib
AndreaGuarracino authored Jun 15, 2023
2 parents b33491f + b0d92b2 commit 36b875e
Showing 59 changed files with 1,976 additions and 1,062 deletions.
43 changes: 43 additions & 0 deletions src/common/wflign/deps/WFA2-lib/.gitignore
@@ -0,0 +1,43 @@
lib/
bin/
build/

# Prerequisites
*.d

# Compiled Object files
*.slo
*.lo
*.o
*.obj

# Precompiled Headers
*.gch
*.pch

# Compiled Dynamic libraries
*.so
*.dylib
*.dll

# Fortran module files
*.mod
*.smod

# Compiled Static libraries
*.lai
*.la
*.a
*.lib

# Executables
*.exe
*.out
*.app

# Test output files
tests/wfa.utest.log.correct
tests/wfa.utest.log.mem
tests/wfa.utest.log.time


21 changes: 17 additions & 4 deletions src/common/wflign/deps/WFA2-lib/CMakeLists.txt
@@ -59,8 +59,8 @@ if (${CMAKE_BUILD_TYPE} MATCHES Release)
endif()

if ((${CMAKE_BUILD_TYPE} MATCHES Release) OR (${CMAKE_BUILD_TYPE} MATCHES RelWithDebInfo))
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS} ${OPTIMIZE_FLAGS}")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS} ${OPTIMIZE_FLAGS}")
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OPTIMIZE_FLAGS}")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OPTIMIZE_FLAGS}")
endif ()

if (${CMAKE_BUILD_TYPE} MATCHES "Debug")
@@ -123,6 +123,9 @@ set(wfa2lib_SOURCE
wavefront/wavefront_sequences.c
wavefront/wavefront_slab.c
wavefront/wavefront_unialign.c
wavefront/wavefront_termination.c
wavefront/wavefront_extend_kernels_avx.c
wavefront/wavefront_extend_kernels.c
system/mm_stack.c
system/mm_allocator.c
system/profiler_counter.c
@@ -150,6 +153,11 @@ target_include_directories(wfa2_static PUBLIC . wavefront utils)
add_library(wfa2::wfa2 ALIAS wfa2)
add_library(wfa2::wfa2_static ALIAS wfa2_static)

if(OPENMP)
target_link_libraries(wfa2_static PRIVATE OpenMP::OpenMP_C)
target_link_libraries(wfa2 PRIVATE OpenMP::OpenMP_C)
endif(OPENMP)

# ---- C++ binding library

set(wfa2cpp_SOURCE
@@ -163,13 +171,18 @@ add_library(wfa2cpp SHARED ${wfa2cpp_SOURCE})
set_target_properties(wfa2cpp PROPERTIES SOVERSION 0)
set_target_properties(wfa2cpp_static PROPERTIES OUTPUT_NAME wfa2cpp)
target_link_libraries(wfa2cpp PUBLIC wfa2)
target_link_libraries(wfa2cpp_static PUBLIC wfa2)
target_link_libraries(wfa2cpp_static PUBLIC wfa2_static)
add_library(wfa2::wfa2cpp ALIAS wfa2cpp)
add_library(wfa2::wfa2cpp_static ALIAS wfa2cpp_static)

if(OPENMP)
target_link_libraries(wfa2cpp_static PRIVATE OpenMP::OpenMP_CXX)
target_link_libraries(wfa2cpp PRIVATE OpenMP::OpenMP_CXX)
endif(OPENMP)

# ---- Get version

file (STRINGS "VERSION" BUILD_NUMBER)
file (STRINGS "VERSION.txt" BUILD_NUMBER)
add_definitions(-DWFA2LIB_VERSION="${BUILD_NUMBER}")
add_definitions(-DVERSION="${BUILD_NUMBER}")

4 changes: 2 additions & 2 deletions src/common/wflign/deps/WFA2-lib/Makefile
@@ -18,10 +18,10 @@ AR=ar
AR_FLAGS=-rsc

ifndef BUILD_EXAMPLES
BUILD_EXAMPLES=0
BUILD_EXAMPLES=1
endif
ifndef BUILD_TOOLS
BUILD_TOOLS=0
BUILD_TOOLS=1
endif
ifndef BUILD_WFA_PARALLEL
BUILD_WFA_PARALLEL=0
42 changes: 25 additions & 17 deletions src/common/wflign/deps/WFA2-lib/README.md
@@ -12,40 +12,43 @@ The wavefront alignment (WFA) algorithm is an **exact** gap-affine algorithm tha

### 1.2 What is WFA2-lib?

The WFA2 library implements the WFA algorithm for different distance metrics and alignment modes. It supports various [distance functions](#wfa2.distances): indel, edit, gap-lineal, gap-affine, and dual-gap gap-affine distances. The library allows computing only the score or the complete alignment (i.e., CIGAR) (see [Alignment Scope](#wfa2.scope)). Also, the WFA2 library supports computing end-to-end alignments (a.k.a. global-alignment) and ends-free alignments (including semi-global, glocal, and extension alignment) (see [Alignment Span](#wfa2.span)). In the case of long and noisy alignments, the library provides different [low-memory modes](#wfa2.mem) that significantly reduce the memory usage of the naive WFA algorithm implementation. Beyond the exact-alignment modes, the WFA2 library implements [heuristic modes](#wfa2.heuristics) that dramatically accelerate the alignment computation. Additionally, the library provides many other support functions to display and verify alignment results, control the overall memory usage, and more.
The WFA2 library implements the WFA algorithm for different distance metrics and alignment modes. It supports various [distance functions](#wfa2.distances): indel, edit, gap-linear, gap-affine, and dual-gap gap-affine distances. The library allows computing only the score or the complete alignment (i.e., CIGAR) (see [Alignment Scope](#wfa2.scope)). Also, the WFA2 library supports computing end-to-end alignments (a.k.a. global-alignment) and ends-free alignments (including semi-global, glocal, and extension alignment) (see [Alignment Span](#wfa2.span)). In the case of long and noisy alignments, the library provides different [low-memory modes](#wfa2.mem) that significantly reduce the memory usage of the naive WFA algorithm implementation. Beyond the exact-alignment modes, the WFA2 library implements [heuristic modes](#wfa2.heuristics) that dramatically accelerate the alignment computation. Additionally, the library provides many other support functions to display and verify alignment results, control the overall memory usage, and more.

### 1.3 Getting started

Git clone and compile the library, tools, and examples. By default use cmake:
Git clone and compile the library, tools, and examples (by default, use cmake).

```
git clone https://github.com/smarco/WFA2-lib
cd WFA2-lib
mkdir build
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . --verbose
ctest . --verbose
```

There are some flags that can be used:
There are some flags that can be used. For instance:

```
cmake .. -DOPENMP=TRUE
cmake .. -DCMAKE_BUILD_TYPE=Release -DEXTRA_FLAGS="-ftree-vectorizer-verbose=5"
```

To add vector optimization try
To build a shared library (static is the default).

```
cmake .. -DCMAKE_BUILD_TYPE=Release -DEXTRA_FLAGS="-ftree-vectorize -msse2 -mfpmath=sse -ftree-vectorizer-verbose=5"
cmake -DBUILD_SHARED_LIBS=ON
```

To build a shared library (static is the default)
Alternatively, the Makefile build system can be used.

```
cmake -DBUILD_SHARED_LIBS=ON
$> git clone https://github.com/smarco/WFA2-lib
$> cd WFA2-lib
$> make clean all
```

It is possible to build WFA2-lib in a GNU Guix container, for more information see [guix.scm](./guix.scm).
Also, it is possible to build WFA2-lib in a GNU Guix container, for more information see [guix.scm](./guix.scm).

### 1.4 Contents (where to go from here)

@@ -158,7 +161,7 @@ Display the result of the alignment.

```C++
// Display CIGAR & score
string cigar = aligner.getAlignmentCigar();
string cigar = aligner.getCIGAR();
cout << "CIGAR: " << cigar << endl;
cout << "Alignment score " << aligner.getAlignmentScore() << endl;
```
@@ -173,7 +176,7 @@ An example of how to use them is [here](./bindings/rust/example.rs).
## <a name="wfa2.features"></a> 3. WFA2-LIB FEATURES

* **Exact alignment** method that computes the optimal **alignment score** and/or **alignment CIGAR**.
* Supports **multiple distance metrics** (i.e., indel, edit, gap-lineal, gap-affine, and dual-gap gap-affine).
* Supports **multiple distance metrics** (i.e., indel, edit, gap-linear, gap-affine, and dual-gap gap-affine).
* Allows performing **end-to-end** (a.k.a. global) and **ends-free** (e.g., semi-global, extension, overlap) alignment.
* Implements **low-memory modes** to reduce and control memory consumption (down to `O(s)` using the `ultralow` mode).
* Supports various **heuristic strategies** to use on top of the core WFA algorithm.
@@ -417,7 +420,7 @@ The WFA2 library implements various memory modes: `wavefront_memory_high`, `wave

```C
wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
attributes.memory_mode = wavefront_memory_med;
attributes.memory_mode = wavefront_memory_ultralow;
```

### <a name="wfa2.heuristics"></a> 3.5 Heuristic modes
@@ -480,7 +483,7 @@ WFA2's heuristics are classified into the following categories: ['wf-adaptive'](
attributes.heuristic.steps_between_cutoffs = 100;
```

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**Z-drop** implements the Z-drop heuristic (as described in Minimap2). This heuristic halts the alignment process if the score drops too fast in the diagonal direction. Let $sw_{max}$ be the maximum observed score so far, computed at cell ($i'$,$j'$). Then, let $sw$ be the maximum score found in the last computed wavefront, computed at cell ($i$,$j$). The Z-drop heuristic stops the alignment process if $sw_{max} - sw > zdrop + gap_e·|(i-i')-(j-j')|$, being $gap_e$ the gap-extension penalty and $zdrop$ a parameter of the heuristic.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**Z-drop** implements the Z-drop heuristic (as described in Minimap2). This heuristic halts the alignment process if the score drops too fast in the diagonal direction. Let $sw_{max}$ be the maximum observed score so far, computed at cell $(i',j')$. Then, let $sw$ be the maximum score found in the last computed wavefront, computed at cell $(i,j)$. The Z-drop heuristic stops the alignment process if $sw_{max} - sw > zdrop + gap_e·|(i-i')-(j-j')|$, being $gap_e$ the gap-extension penalty and $zdrop$ a parameter of the heuristic.
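
The stopping condition above can be written as a single predicate. The following is a minimal sketch, not the library's implementation: the function name `zdrop_should_stop` and its parameter layout are assumptions made for illustration, and only the inequality itself comes from the text.

```C
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper (not part of the WFA2-lib API): evaluates the Z-drop
// condition  sw_max - sw > zdrop + gap_e * |(i - i') - (j - j')|.
bool zdrop_should_stop(
    int sw_max, int i_max, int j_max,  // best score so far and its cell (i',j')
    int sw, int i, int j,              // best score of the last wavefront and its cell (i,j)
    int zdrop, int gap_e) {            // heuristic parameter and gap-extension penalty
  // Distance (in diagonals) between the two cells
  const int diagonal_shift = abs((i - i_max) - (j - j_max));
  return (sw_max - sw) > zdrop + gap_e * diagonal_shift;
}
```

For example, with `sw_max = 100` at cell (50,50), `sw = 40` at cell (80,60), `zdrop = 30`, and `gap_e = 1`, the diagonal shift is 20, the threshold is 50, and the observed drop of 60 exceeds it, so the alignment would stop.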


```C
@@ -497,7 +500,7 @@ WFA2's heuristics are classified into the following categories: ['wf-adaptive'](
<tr>
<td><p align="center">None</p></td>
<td><p align="center">X-drop(200,1)</p></td>
<td><p align="center">Y-drop(200,1)</p></td>
<td><p align="center">Z-drop(200,1)</p></td>
</tr>
<tr>
<td><img src="img/heuristics.drop.none.png" align="center" width="300px"></td>
@@ -555,11 +558,16 @@ WFA2's heuristics are classified into the following categories: ['wf-adaptive'](

### <a name="wfa2.other.notes"></a> 3.6 Some technical notes

- Thanks to Eizenga's formulation, WFA2-lib can operate with any match score. Although, in practice, M=0 is still the most efficient choice.
- Thanks to Eizenga's formulation, WFA2-lib can operate with any match score. In practice, M=0 is often the most efficient choice.


- Note that edit and LCS are distance metrics and, thus, the score computed is always positive. However, using weighted distances (e.g., gap-linear and gap-affine) the alignment score is computed using the selected penalties (i.e., the alignment score can be positive or negative). For instance, if WFA2-lib is executed using $M=0$, the final score is expected to be negative.


- All WFA2-lib algorithms/variants are stable. That is, for alignments having the same score, the all alignment modes always resolve ties (between M, X, I,and D) using the same criteria: M (highest prio) > X > D > I (lowest prio). Only the memory mode `ultralow` (BiWFA) resolves ties differently (although the results are still optimal).

- Note that edit and LCS are distance metrics and, thus, the score computed is always positive. However, weighted distances, like gap-linear and gap-affine, have the sign of the computed alignment evaluated under the selected penalties. If WFA2-lib is executed using $M=0$, the final score is expected to be negative.

- All WFA2-lib algorithms/variants are stable. That is, for alignments having the same score, the library always resolves ties (between M, X, I,and D) using the same criteria: M (highest prio) > X > D > I (lowest prio). Nevertheless, the memory mode `ultralow` (BiWFA) is optimal (always reports the best alignment) but not stable.
- WFA2lib follows the convention that describes how to transform the (1) Pattern/Query into the (2) Text/Database/Reference used in classic pattern matching papers. However, the SAM CIGAR specification describes the transformation from (2) Reference to (1) Query. If you want CIGAR-compliant alignments, swap the pattern and text sequences argument when calling the WFA2lib's align functions (to convert all the Ds into Is and vice-versa).
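
The last note says that swapping the pattern and text arguments turns every D into an I and vice versa. For an alignment that has already been computed, the same relabeling can be sketched as a post-processing step on the CIGAR string. This is only an illustration under assumptions: `cigar_swap_indels` is a hypothetical helper, not a WFA2-lib function, and it assumes the CIGAR is held as a plain NUL-terminated string such as `3M1I2M2D`. Unlike swapping the arguments, it relabels an existing alignment and does not re-run the aligner.

```C
#include <stddef.h>

// Hypothetical helper (not part of the WFA2-lib API): swaps the 'I' and 'D'
// operations in place in a NUL-terminated CIGAR string. All other characters
// (operation lengths, 'M', 'X', '=') are left untouched.
void cigar_swap_indels(char* cigar) {
  for (size_t k = 0; cigar[k] != '\0'; ++k) {
    if (cigar[k] == 'I') {
      cigar[k] = 'D';
    } else if (cigar[k] == 'D') {
      cigar[k] = 'I';
    }
  }
}
```

Applied to `3M1I2M2D`, the helper yields `3M1D2M2I`.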

## <a name="wfa2.complains"></a> 4. REPORTING BUGS AND FEATURE REQUEST

1 change: 0 additions & 1 deletion src/common/wflign/deps/WFA2-lib/VERSION

This file was deleted.

1 change: 1 addition & 0 deletions src/common/wflign/deps/WFA2-lib/VERSION.txt
@@ -0,0 +1 @@
v2.3