Similarity metrics (New) #53
Comments
It'd be great if we had a source for this. It wouldn't make sense to put effort into something that won't be used eventually.
@parantak As far as I remember, the Smith-Waterman algorithm in particular isn't relevant for strings. In the DNA case it matches a particular sequence, so if the shorter sentence were a disjoint union of subsets of another string, it would give 100% similarity.
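To make the concern above concrete, here is a minimal sketch of a Smith-Waterman local alignment score in plain Python. The function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration, not anything from this project. It only covers the simplest case, where the shorter string is a contiguous substring of the longer one.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Hypothetical helper; the scoring values are illustrative defaults.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are clamped at 0, so an alignment can start anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


short, long_text = "cat", "the cat sat"
# Normalise by the best score the shorter string could possibly get.
similarity = smith_waterman(short, long_text) / (2 * len(short))
print(similarity)
```

Because everything outside the best-matching region is ignored, a short string that appears verbatim inside a much longer one reaches the maximum score, which is likely not what a sentence similarity metric should report.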
@someshsingh22 Right, sorry. I only vaguely remembered them both. I searched a bit, and I believe Smith-Waterman is a local alignment algorithm whereas Needleman-Wunsch is a global alignment algorithm. So, I guess Smith-Waterman might not be as relevant. However, I am sure Needleman-Wunsch should be a good metric because, unlike the fixed penalties in Levenshtein, the algorithm allows different scores for matches, mismatches, and gaps.
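As a rough illustration of the configurable scoring mentioned above, here is a minimal sketch of the Needleman-Wunsch global alignment score in plain Python. The function name `needleman_wunsch` and the default values for `match`, `mismatch`, and `gap` are assumptions for illustration only, and the sketch uses a simple linear gap penalty.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return the global alignment score between sequences a and b.

    Hypothetical helper; match, mismatch and gap are user-chosen scores.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap in b
            left = score[i][j - 1] + gap  # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


print(needleman_wunsch("abc", "abd"))
# With match=0 and unit penalties the score is the negated Levenshtein distance.
print(needleman_wunsch("kitten", "sitting", match=0, mismatch=-1, gap=-1))
```

Setting `match=0`, `mismatch=-1`, and `gap=-1` reduces the score to the negated Levenshtein distance, so Levenshtein can be seen as one fixed choice of these penalties.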
Yes, that was my point. I don't remember Needleman-Wunsch well either. Do look for some supporting literature in similar domains of NLP before you move on, though.
@someshsingh22 Yeah, of course.
These algorithms were originally developed for DNA sequence alignment, but I read on SO that they are at times used as string similarity metrics as well, since they account for mismatches and gaps (spaces). Moreover, we can penalize gaps and mismatches according to values of the user's choice.
Should we implement this, @rajaswa and @someshsingh22? If yes, then I'll do it in some time.