Similarity metrics (New) #53

parantak · 2020-05-12T17:02:52Z

Needleman-Wunsch Algorithm
Smith-Waterman Algorithm
These algorithms were originally developed for DNA sequencing but I read on SO, that they are at times used as string similarity metrics as well as they account for mismatches and gaps (spaces). Moreover, we can penalize gaps and mismatches according to a value the users choice.
Should we implement this, @rajaswa and @someshsingh22? If yes, then I'll do it in some time.

rajaswa · 2020-05-12T18:27:58Z

It'd be great if we have a source for this. It won't make sense putting efforts into something which won't be used eventually.

parantak · 2020-05-12T23:37:45Z

https://math.mit.edu/classes/18.417/Slides/alignment.pdf
http://sepwww.stanford.edu/public/docs/sep112/bob2/paper_html/node3.html
http://web.archive.org/web/20081224234350/http://www.dcs.shef.ac.uk/~sam/stringmetrics.html#needleman

someshsingh22 · 2020-05-13T00:10:50Z

@parantak As much as I remember especially Smith-Waterman Algorithm isn't relevant for string, in case of DNA it matches a particular sequence, so if the minority sentence was a discount union of subsets of another string it would give 100% similairty-
"My name" and "My name is Somesh Singh" will be 100% same (Smith-Waterman looks after exclusion) while Needle-Wunshch looks for inclusion. So personally I don't think it would be that useful. Check what we used for BIOF110 here and you will notice the difference

parantak · 2020-05-13T05:30:06Z

@someshsingh22 Right, sorry. I vaguely remembered them both. I searched a bit, and I believe Smith-Waterman is a local alignment algorithm whereas Needle-Wunsch is a global alignment algorithm. So, I guess Smith-Waterman might not be as relevant. However, I am sure Needlman-Wunsch should be a good metric because unlike static penalties in Levenshtein, the algorithm implements different penalties for matches, mismatches, and gaps.
As for Smith-Waterman, I'll look into it soon to gain a better understanding and to make sure we aren't missing out on anything.

someshsingh22 · 2020-05-13T06:03:33Z

Yes, that was my point, I don't remember Needleman-Wunsch well either. Do look for some supporting literature in similar domains of NLP before you move on though.

parantak · 2020-05-13T06:15:54Z

@someshsingh22 Yeah, of course.

rajaswa added the question Further information is requested label May 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Similarity metrics (New) #53

Similarity metrics (New) #53

parantak commented May 12, 2020

rajaswa commented May 12, 2020

parantak commented May 12, 2020

someshsingh22 commented May 13, 2020

parantak commented May 13, 2020 •

edited

Loading

someshsingh22 commented May 13, 2020

parantak commented May 13, 2020

Similarity metrics (New) #53

Similarity metrics (New) #53

Comments

parantak commented May 12, 2020

rajaswa commented May 12, 2020

parantak commented May 12, 2020

someshsingh22 commented May 13, 2020

parantak commented May 13, 2020 • edited Loading

someshsingh22 commented May 13, 2020

parantak commented May 13, 2020

parantak commented May 13, 2020 •

edited

Loading