Skip to content

Latest commit

 

History

History
109 lines (77 loc) · 4.21 KB

README.md

File metadata and controls

109 lines (77 loc) · 4.21 KB

strings

license Swift Xcode SPM Swift

Just playing around with strings... It started as a collection of string algorithms implemented in order to learn a bit more about them.

Making it a package so people can import if they find useful.

Algorithms Available

Levenshtein

An implementation of Levenshtein distance algorithm with a few algorithms and swift optimization. Step by step:

  • An optimization by trimming equal prefixes and suffixes from source and destination and since it doesn't affect the result and we can reduce the cost. Example "abcd" -> "auid" == "bc" -> "ui".

  • Early exit for empty cases.

  • Allocate two rows (current and previous) which at the initial stage of the algorithm is first and second row initialized accordingly the first row is 0...destination.count and second row is 1 at initial position and all zeros. Note that the original algorithm pseudo code describes a matrix source.count x destination.count but since through the algorithm iterations it only operates in terms of the current and the previous rows a size allocation optimization can be done.

  • Iterate through "all the rows" calculating the minimum of insertion, deletion and substitution + 1. Pseudo code:

if s[i] = t[j]:
    substitutionCost := 0
else:
    substitutionCost := 1

d[i, j] := minimum(d[i-1, j] + 1,                   // deletion
                   d[i, j-1] + 1,                   // insertion
                   d[i-1, j-1] + substitutionCost)  // substitution

Given that we don't have the whole matrix, the iteration is done by at the end of operating the current row we swap the previous and current so the previous row becomes the current a we reuse the area which was the previous(that will be discarded) for operate the new current initializing the first position of the current accordingly.

Jaro and Jaro-Winkler

Hamming

  • The Hamming distance between two strings of equal length is the count of positions in which the caracter is different in the between the two.

TODO

  • Implement other algorithms e.g. Damerau–Levenshtein(differs from the Levenshtein by including also transponsitions on the calculation in addition to the insertion, substitution and deletion).

Usage

Is very simple to use...

import strings

"friend".levenshteinDistance(to: "fresh") // 3
"adlsajdlsa".jaroDistance(to: "asv") // ~= 0.6222
"friend".hammingDistance(to: "frinnd") // 1

Instalation

You can use The Swift Package Manager to install by adding the proper description to your Package.swift

import PackageDescription

let package = Package(
    name: "YOUR_PROJECT_NAME",
    targets: [],
    dependencies: [
        .package(url: "https://github.com/LucianoPAlmeida/strings.git", from: "0.0.1")
    ]
)

Then add the target as dependency

.target(
    name: "YOUR_TARGET_NAME",
    dependencies: [
        "strings", ... 
    ]
),

Or you can manually just copy the implementations into your project.

Benchmarks

Benchmark was there to make us strive to the most performant implementation given the knowledge of the algorithms and of the swift language.

References

Licence

strings is released under the MIT License.