Quality control statistics #83

Open
myrmoteras opened this issue May 6, 2020 · 0 comments

Thanks to the new statistics built by @gsautter, we now have a better way to look at what we do, to get an idea of the errors and, last but not least, to understand where we should invest in the future to minimize errors in production. They also help us communicate the limitations of this utterly "stupid" thing we do: liberating imprisoned biodiversity data.
.....................................

Hi Donat,

in order to get some more numbers on our QC efforts than that little specialized tool could previously provide, I spent the past two days finally building some dedicated stats for the error protocols ... see
https://tb.plazi.org/GgServer/ephStats

Here, for instance, is an overview of all the error categories:
https://tb.plazi.org/GgServer/ephStats/stats?outputFields=errorCat.name+errorCat.label+errorCat.errorsRemoved+errorCat.falsePosAdded+errorCat.typeCount&groupingFields=errorCat.name&FA-errorCat.label=max&FA-errorCat.typeCount=max&format=HTML

The same for the error types:
https://tb.plazi.org/GgServer/ephStats/stats?outputFields=errorType.name+errorType.label+errorType.parentCat+errorType.errorsRemoved+errorType.falsePosAdded&groupingFields=errorType.name+errorType.parentCat&orderingFields=errorType.parentCat&FA-errorType.label=max&format=HTML
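
The report links in this message all follow one query pattern on the `ephStats/stats` endpoint. As a rough sketch, such a link could be assembled as below. The helper `build_stats_url` and its argument names are hypothetical; only the parameter names (`outputFields`, `groupingFields`, `orderingFields`, `format`, and the `FA-`/`FP-` prefixed ones) are taken from the links themselves, and the `FA-`/`FP-` parameters are passed through verbatim because their exact semantics are not spelled out here.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the links of this message.
BASE_URL = "https://tb.plazi.org/GgServer/ephStats/stats"


def build_stats_url(output_fields, grouping_fields=(), ordering_fields=(),
                    extra_params=None, fmt="HTML"):
    """Assemble a stats link (hypothetical helper, not part of the server).

    Field lists are joined with spaces, which urlencode() turns into the
    '+' separators seen in the links.
    """
    params = [("outputFields", " ".join(output_fields))]
    if grouping_fields:
        params.append(("groupingFields", " ".join(grouping_fields)))
    if ordering_fields:
        params.append(("orderingFields", " ".join(ordering_fields)))
    # FA-... / FP-... parameters are copied as given (semantics assumed unknown).
    params.extend((extra_params or {}).items())
    params.append(("format", fmt))
    return BASE_URL + "?" + urlencode(params)


# Rebuilds the error-category overview link.
category_url = build_stats_url(
    output_fields=[
        "errorCat.name", "errorCat.label", "errorCat.errorsRemoved",
        "errorCat.falsePosAdded", "errorCat.typeCount",
    ],
    grouping_fields=["errorCat.name"],
    extra_params={"FA-errorCat.label": "max", "FA-errorCat.typeCount": "max"},
)
print(category_url)
```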

Here is an overview of what the individual users did:
https://tb.plazi.org/GgServer/ephStats/stats?outputFields=doc.docId+doc.subjectDocId+doc.updateUser+doc.prevDocId+doc.errorsRemoved+doc.falsePosAdded&groupingFields=doc.updateUser&orderingFields=doc.subjectDocId&FP-doc.prevDocId=0-&format=HTML

Note1: "Replaces Error Protocol UUID" not being empty indicates there is an earlier error protocol for the same document, before the respective users did their QC work. The "Number of Errors Fixed" and the "Number of Errors Marked as False Positives" are deltas to the respective predecessor error protocol, so the sums of said deltas reflect what the individual users did.
Note2: Keep in mind that computing these deltas is only possible if there are at least two error protocols for a document. So if anyone runs the batch on their desktop machine and does the QC before even uploading the IMF to the server, there is no way of telling the amount of QC work done.
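
To make the two notes above concrete, here is a minimal sketch of the delta computation. The `ErrorProtocol` record and its field names are hypothetical, as is the assumption that "errors fixed" is the drop in the error count and "marked as false positives" is the rise in the false-positive count relative to the predecessor protocol; the server may compute these differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorProtocol:
    # Hypothetical record layout, for illustration only.
    protocol_id: str
    document_id: str
    user: str
    prev_id: Optional[str]  # "Replaces Error Protocol UUID"; None if empty
    errors: int
    false_pos: int


def qc_work_by_user(protocols):
    """Sum per-user deltas against each protocol's predecessor.

    Protocols without a predecessor are skipped: with a single protocol
    for a document there is nothing to compare against (Note2).
    """
    by_id = {p.protocol_id: p for p in protocols}
    totals = defaultdict(lambda: {"errors_removed": 0, "false_pos_added": 0})
    for p in protocols:
        prev = by_id.get(p.prev_id) if p.prev_id else None
        if prev is None:
            continue
        totals[p.user]["errors_removed"] += prev.errors - p.errors
        totals[p.user]["false_pos_added"] += p.false_pos - prev.false_pos
    return dict(totals)
```

A document that was QCed on a desktop machine before its first upload contributes only one protocol, so it adds nothing to any user's totals in this sketch.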

This is an overview of all the errors currently present in (the QCed part of) our data collection:
https://tb.plazi.org/GgServer/ephStats/stats?outputFields=doc.docId+doc.subjectDocId+doc.nextDocId+doc.errors+doc.falsePos+doc.errorsBlocker+doc.falsePosBlocker+doc.errorsCritical+doc.falsePosCritical+doc.catCount+doc.typeCount&orderingFields=doc.subjectDocId&FP-doc.nextDocId=-0&FA-doc.nextDocId=min&FA-doc.catCount=max&FA-doc.typeCount=max&format=HTML

Note1: "Replaced by Error Protocol UUID" being empty indicates there is no later error protocol for a specific document, i.e., this restricts the numbers to the error protocols indicating the current status of their respective documents.
Note2: The number under "Document UUID" also indicates the number of documents we have QCed so far.
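
The restriction described in these two notes can be sketched as follows. The dictionary keys (`document_id`, `next_id`, `errors`, `false_pos`) and the function `current_status` are hypothetical names chosen for illustration; the idea is only that a protocol with no successor describes the current state of its document.

```python
def current_status(protocols):
    """Summarize protocols that have no successor ("Replaced by" is empty).

    Each protocol is a dict with hypothetical keys: document_id,
    next_id (None when there is no later protocol), errors, false_pos.
    """
    current = [p for p in protocols if p["next_id"] is None]
    return {
        # One current protocol per document, so this is also the
        # number of documents QCed so far.
        "documents": len({p["document_id"] for p in current}),
        "errors": sum(p["errors"] for p in current),
        "false_pos": sum(p["false_pos"] for p in current),
    }


sample = [
    {"document_id": "docA", "next_id": "p2", "errors": 10, "false_pos": 0},
    {"document_id": "docA", "next_id": None, "errors": 4, "false_pos": 2},
    {"document_id": "docB", "next_id": None, "errors": 1, "false_pos": 0},
]
summary = current_status(sample)
```

Here the superseded protocol for `docA` is ignored, so only the latest state of each document enters the totals.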

Same as the above, expanded by the individual error types:
https://tb.plazi.org/GgServer/ephStats/stats?outputFields=doc.nextDocId+errorType.name+errorType.label+errorType.parentCat+errorType.errors+errorType.falsePos+errorType.errorsBlocker+errorType.falsePosBlocker+errorType.errorsCritical+errorType.falsePosCritical&groupingFields=errorType.name+errorType.parentCat&orderingFields=errorType.parentCat&FP-doc.nextDocId=-0&FA-doc.nextDocId=min&FA-errorType.label=max&format=HTML

Best,
Guido
