Stemming unknown proper nouns #9

Open
meliksahturker opened this issue Jul 2, 2021 · 1 comment

Comments

@meliksahturker

Hey Olga, good work with zeyrek.

I have a small improvement suggestion. Zeyrek can return the stem of a known proper noun when the inflection is attached with an apostrophe. Example:
"istanbul'daki" -> "İstanbul"

but for an unknown proper noun it merges the inflection into the stem instead of parsing it. Example:
"melik'in" -> "melikin"

So my suggestion is that it should return the part before the apostrophe. I'm not sure whether it should also parse the inflection after the apostrophe, though. I might be missing some other apostrophe case, but here I am only pointing out unknown proper nouns and their inflections.
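
To make the suggestion concrete, here is a minimal sketch of the fallback, written as a standalone function rather than against Zeyrek's internals. The name `stem_before_apostrophe` is hypothetical and not part of Zeyrek. It assumes the apostrophe may be either the ASCII `'` or the typographic `’`, and it does not try to parse the suffix or normalize capitalization.

```python
import re

# Characters treated as the proper-noun apostrophe (an assumption of this sketch):
# ASCII apostrophe (U+0027) and right single quotation mark (U+2019).
_APOSTROPHES = re.compile("['\u2019]")


def stem_before_apostrophe(word: str) -> str:
    """Return the part of `word` before its first apostrophe.

    Hypothetical helper, not a Zeyrek function. Words without an
    apostrophe are returned unchanged.
    """
    head = _APOSTROPHES.split(word, maxsplit=1)[0]
    # If the word starts with an apostrophe, keep it as-is instead of
    # returning an empty stem.
    return head or word


print(stem_before_apostrophe("melik'in"))  # prints: melik
```

This would only make sense as a fallback for words the analyzer does not recognize, since known proper nouns such as "istanbul'daki" are already handled.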

@obulat
Owner

obulat commented Jul 2, 2021

Thank you for opening the first issue here, @meliksahturker :)
Zemberek-nlp can parse unknown words, and I was planning to port that to Zeyrek as well, but haven't had time for it, sorry. I would greatly appreciate any contributions in this area. Other than that, you are free to fork Zeyrek and just add the 'remove the part after the apostrophe' functionality if you want.
