Benchmarking for the different algorithms #26

assem-ch · 2018-03-05T12:37:23Z

No description provided.

sneetsher · 2018-06-01T06:36:39Z

@assem-ch, this will be a lot of work for one person. I see many separate reports for individual words, and it is not practical to follow each one manually. A stemmer will never be 100% accurate the way manual analysis by a human can be.

It would be better to switch to coverage-based tests that use manually processed data, at least word roots/derivatives without attached pronouns and conjunctions, which may already be available on the web under a free-use license. (For example, harfbuzz coverage testing was done on Wikipedia data rendered in multiple browsers: https://github.com/harfbuzz/harfbuzz-testing-wikipedia.) I can possibly implement these tests if you can find some data sets on the web that I can use or scrape.
We could also merge all the reports about specific words into one or several reports (one report per algorithm and version). I can help with this; just write short debugging/reporting instructions in the readme file, such as how to report the release number and how a user can tell which algorithm is in use.
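To make the coverage idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `coverage`, `toy_stem`, and the three-pair `GOLD` list are hypothetical names and data, not part of this project or of any data set mentioned here. The real stemmer would be passed in place of `toy_stem`, and the gold pairs would come from manually verified data.

```python
from typing import Callable, Iterable, List, Tuple

Mismatch = Tuple[str, str, str]  # (word, expected stem, actual stem)


def coverage(stem: Callable[[str], str],
             gold: Iterable[Tuple[str, str]]) -> Tuple[float, List[Mismatch]]:
    """Return the fraction of gold words stemmed as expected, plus the misses."""
    total = 0
    misses: List[Mismatch] = []
    for word, expected in gold:
        total += 1
        got = stem(word)
        if got != expected:
            misses.append((word, expected, got))
    score = (total - len(misses)) / total if total else 0.0
    return score, misses


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer: only strips a leading definite article."""
    if word.startswith("ال") and len(word) > 3:
        return word[2:]
    return word


# Hypothetical gold data: (surface form, manually verified stem).
GOLD = [
    ("الكتاب", "كتاب"),
    ("كتاب", "كتاب"),
    ("والكتاب", "كتاب"),  # leading conjunction, which the toy stemmer ignores
]

score, misses = coverage(toy_stem, GOLD)
print(f"coverage: {score:.2%}")
for word, expected, got in misses:
    print(f"miss: {word} expected={expected} got={got}")
```

Running one such check per algorithm and per release would give a single number to compare, and the list of misses could replace the separate per-word reports.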

sneetsher · 2018-06-01T06:44:28Z

By the way, not all release packages state their version.

sneetsher · 2018-06-01T07:11:55Z

@assem-ch Do you think the Arabic stemmer is valuable enough for Alfanous and for many other Arabic projects that I should put considerable time into it?

I will consider starting a separate project for a stop-word list and a derivatives list, built through crowd-sourced verification, to get high-quality test data for the stemmer. I say this because I see few Arabic stop-word lists, and those are basically the effort of a few people.

It all depends on how to reach the interested and active Arab community.

assem-ch · 2018-06-05T20:52:28Z

@sneetsher there is already a project for test data, https://github.com/ibnmalik/golden-corpus-arabic. It can be exposed together with the stemmer to collect new suggestions from users.

This is my PhD project and I should really focus on it... I will work on a demo/review web app for it to welcome feedback and improve visibility.

For Alfanous it is not that valuable, but it fills a gap: stemming words that do not appear in the Quran in that exact form but do appear in other forms.
