Ghost records #162
dt-162-ghost-agents-report.xlsx contains three tabs:
Due to the amount of data in play, dt-162-ghost-agents-query.js.txt had to be run in three modes. The numbering of this list does not correspond to the numbering of the list above.
@clarkepeterf and @azaroth42, below is the technique that was used to find where the IRIs in the triple store and the URIs of documents do not overlap, where starter plan included the
Because the above does not also incorporate the URI lexicon, I'm led to believe the IRI lexicon is populated by the IRIs of the documents in the database, as opposed to all IRIs in the triple store. See the attached query for additional context and details.
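The set-difference idea can be illustrated in plain JavaScript, independent of the attached query. This is a minimal sketch, not the technique in dt-162-ghost-agents-query.js.txt: it assumes the triple-store IRIs and the document URIs have already been exported as arrays of strings, and `findGhostIris` and its `toDocUri` mapping are hypothetical names.

```javascript
// Minimal sketch: report IRIs that appear in the triple store but have no
// matching document URI. Inputs are assumed to be plain string arrays that
// were exported beforehand; gathering them is outside this sketch.
// Hypothetical: `toDocUri` maps an IRI to the document URI it would be stored
// under; the default identity mapping assumes both use the same string.
function findGhostIris(tripleIris, documentUris, toDocUri = (iri) => iri) {
  const docs = new Set(documentUris);
  const ghosts = new Set(); // a Set so an IRI used in many triples is reported once
  for (const iri of tripleIris) {
    if (!docs.has(toDocUri(iri))) ghosts.add(iri);
  }
  return [...ghosts].sort();
}
```

If IRIs and document URIs follow different conventions, only `toDocUri` needs to change; the comparison itself stays a single `Set` lookup per IRI.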
Apologies for coming back to this two months later; I blame the holidays. I'm going to check the rest of the missing People to see if it's the same issue. Again, I haven't figured out the 404 issue yet.
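Re-checking which ghost records still return 404 can be scripted over their URLs. A minimal sketch, assuming Node 18+ (which provides a global `fetch`); `findMissing` is a hypothetical helper, and sending `HEAD` requests is an assumption about what the LUX data endpoint accepts (switch to `GET` if it does not).

```javascript
// Minimal sketch: return the subset of URLs that currently respond with 404.
// `fetchFn` defaults to the global fetch (Node 18+) but can be replaced with a
// stub for testing. HEAD avoids downloading bodies; this assumes the server
// supports HEAD for these URLs.
async function findMissing(urls, fetchFn = fetch) {
  const missing = [];
  for (const url of urls) {
    // Sequential requests, one at a time, to keep the load on the server low.
    const res = await fetchFn(url, { method: "HEAD" });
    if (res.status === 404) missing.push(url);
  }
  return missing;
}
```

For example, `findMissing(["https://lux.collections.yale.edu/data/person/0133a1e2-998e-447b-bd33-657d36941876"]).then(console.log)` would print whichever of the listed URLs still 404 at the time it is run.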
Update after some research: the majority of the 65 ghost People records are date issues (57 records, 56 of which are YCBA and one YUL). The rest are a mixture of "I see no reason why these are not merging", "there are no contributing records and this shouldn't exist", and "one supremely bizarre mystery". None of this research explains why these return 404s, because they should still be creating records, even with the wonky data. Breaking them down below:

No contributing records
https://lux.collections.yale.edu/data/person/32be3d65-95e7-4d2a-a58d-a5f047d24498
These two only have timestamps in their idmaps (both from December runs, not the most recent January run) and no contributing records. I can only assume they are not meant to exist and will go away.

Why aren't these merging?
These should be merging. They're all YUL contributing records with LC equivalents in the YUL URI data, and the "real" LUX records for these people ALSO have that LC. There are no timespans to throw off reconciliation/merging, so I have no idea what's happening here.

One supremely bizarre mystery
This looks like a great record in the idmap: tons of equivalents and two contributing YUL URIs. One of those YUL URIs also belongs to the "real" LUX record. The other seems to be contributing to an overmerge in this record, because it has the wrong VIAF equivalent. I created a unit-data ticket for that fix, but I don't know if it will solve the problem.

Timespan mismatches causing the collector to throw out the equivalencies during reconciliation
What is truly weird, pipeline-wise, is that in some cases (if not all; I did not check them all), information from the suppressed YCBA records still ends up in the "real" LUX records. has the death date from the contributing Wikidata record, but the birth date from the YCBA: that is NOT attached to that record and a casual observer would have no way of knowing was extant. So that seems like a real bug, but a challenge.
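To make the timespan failure mode concrete, here is a hypothetical reconciliation rule under which a date mismatch causes an equivalence to be discarded. This is an illustrative sketch, not the collector's actual logic: the record shape (`birthYear`, `deathYear`), `timespansCompatible`, `filterEquivalences`, and the `toleranceYears` parameter are all assumptions.

```javascript
// Hypothetical rule, not the LUX collector's logic: two agent records are
// compatible unless both carry the same kind of date (birth or death) and
// those years differ by more than `toleranceYears`.
function timespansCompatible(a, b, toleranceYears = 0) {
  for (const key of ["birthYear", "deathYear"]) {
    const x = a[key];
    const y = b[key];
    // A date missing on either side cannot cause a mismatch.
    if (x == null || y == null) continue;
    if (Math.abs(x - y) > toleranceYears) return false;
  }
  return true;
}

// Keep only the candidate equivalences that pass the timespan check.
function filterEquivalences(record, candidates, toleranceYears = 0) {
  return candidates.filter((c) => timespansCompatible(record, c, toleranceYears));
}
```

Under a rule like this, a record whose only equivalence is rejected stays unmerged and would be given its own identifier, which is consistent with (though not proof of) the ghost YCBA records described above.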
CSV with my research
TL;DR: This is a challenge. Some possible to-dos are below, but a longer-term plan to refactor all this code may also help.
Unit-data:
Pipeline tasks:
The pipeline is losing some Agent records: they are being re-identified but not linked together properly.
Example:
This object:
https://lux.collections.yale.edu/view/object/ccca43ea-1fd7-4449-9f3f-fb026edf7b07
was published by Martinus van den Enden:
(YCBA record, as vended)
https://ycba-lux.s3.amazonaws.com/v3/person/a4/a4d1963c-d3cc-4f57-bb49-0204574106ca.json
(LUX record, which returns a 404):
https://lux.collections.yale.edu/data/person/0133a1e2-998e-447b-bd33-657d36941876
There's a live Martinus van den Enden in LUX:
https://lux.collections.yale.edu/view/person/e2990454-a285-4b92-bb4f-dcd8b62a344b
which doesn't have the YCBA as a contributor.
Brent to attach a list of 65 unique missing agents with this issue.
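The expected linking behavior, where records that share an equivalent such as an LC or VIAF URI end up as one entity, can be sketched with union-find. A minimal sketch under stated assumptions: each record is a hypothetical `{ id, equivalents }` object, and `groupBySharedEquivalents` is not the pipeline's implementation.

```javascript
// Minimal sketch: group records that share any equivalent URI using
// union-find. Record ids and equivalent URIs are assumed inputs.
function groupBySharedEquivalents(records) {
  const parent = new Map();
  const find = (x) => {
    while (parent.get(x) !== x) {
      parent.set(x, parent.get(parent.get(x))); // path halving
      x = parent.get(x);
    }
    return x;
  };
  const union = (a, b) => {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) parent.set(ra, rb);
  };
  for (const { id } of records) parent.set(id, id);
  // Map each equivalent URI to the first record that cited it; any later
  // record citing the same URI is unioned with that first record.
  const owner = new Map();
  for (const { id, equivalents } of records) {
    for (const eq of equivalents) {
      if (owner.has(eq)) union(id, owner.get(eq));
      else owner.set(eq, id);
    }
  }
  // Collect record ids by their root, preserving input order.
  const groups = new Map();
  for (const { id } of records) {
    const root = find(id);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(id);
  }
  return [...groups.values()];
}
```

If an equivalence is dropped before this step (for example by a timespan check), the affected record falls into a singleton group and would be minted separately. That is one way a YCBA-contributed agent could end up with its own LUX identifier while a separate live record exists.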