Improve detection of fixed snippets. #70

Open
eggpi opened this issue Jan 8, 2017 · 1 comment

@eggpi
Owner

eggpi commented Jan 8, 2017

The way we estimate fixed snippets right now works well for the most part, but it can produce false positives in some interesting ways.

For example, right now the database contains the following snippet for Frankfurt:

Frankfurt has the State Institution of Higher Learning for Artistic Education known as the Städelschule, founded in 1817 by Johann Friedrich Städel. It was taken over by the city in 1942 and turned into a state art school

It existed in the page when the en database was last built, but has since been removed.

A couple of hours before the removal, someone had clicked "I got this" for that snippet, which caused us to start watching the page for 3 hours to check whether the snippet was fixed. When the snippet was removed within that window, we counted it as fixed, which is a false positive.

I think the same problem motivated 0a97cf7, though I didn't investigate as deeply in that case.

@eggpi
Owner Author

eggpi commented Apr 27, 2017

Another interesting case: for ruwiki, the expansion of {{ fact }} contains the number of days since the template was added to the page.

This means that if we naively hash and compare snippets, they will always be different, so we always get false positives.

So yeah, the comparison should use more robust metrics such as the number of {{ fact }} templates present in the snippets, rather than their contents.
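
To make the idea concrete, here is a rough sketch of comparing snippets by template count instead of by hash. It is not the project's actual code: the function names, the regex, and the assumption that snippets are available as wikitext containing a literal `{{ fact }}` template are all hypothetical, and template aliases are not handled. The `days=` parameter in the example is only a stand-in for the part of the expansion that changes every day.

```python
import hashlib
import re

# Hypothetical pattern: matches {{fact}}, {{ fact }} and {{fact|...}}.
# Aliases and nested templates are deliberately not handled.
FACT_TEMPLATE = re.compile(r"\{\{\s*fact\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE)


def count_fact_templates(snippet: str) -> int:
    """Count the {{ fact }} templates in a snippet, ignoring their contents."""
    return len(FACT_TEMPLATE.findall(snippet))


def naive_changed(old: str, new: str) -> bool:
    """Naive check: any difference in the text changes the hash."""
    old_hash = hashlib.sha1(old.encode("utf-8")).hexdigest()
    new_hash = hashlib.sha1(new.encode("utf-8")).hexdigest()
    return old_hash != new_hash


def looks_fixed(old: str, new: str) -> bool:
    """Count-based check: fixed only if fewer templates remain than before."""
    return count_fact_templates(new) < count_fact_templates(old)


# The template contents differ from one day to the next, the claim does not.
yesterday = "Some claim.{{ fact|days=10 }}"
today = "Some claim.{{ fact|days=11 }}"
cited = "Some claim.<ref>Some source</ref>"

print(naive_changed(yesterday, today))  # hashes differ even though nothing was fixed
print(looks_fixed(yesterday, today))    # template count is unchanged
print(looks_fixed(yesterday, cited))    # template is gone
```

Note that a count-based check alone would still report the Frankfurt case as fixed, since a removed snippet also contains zero templates; telling removal apart from a fix would need a separate check.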

eggpi added a commit that referenced this issue Oct 6, 2017