Preserve DE-Sol1 holdings from Aleph, add them to lobid and allow updates #1

Open
acka47 opened this issue Dec 12, 2024 · 14 comments

Comments

@acka47
Contributor

acka47 commented Dec 12, 2024

This ticket is for implementing a simple successor system for DE-Sol1 to

  1. have all their existing holding data in lobid
  2. add new holdings to existing lobid records
  3. provide a search UI based on a filtered lobid version

Detailed background can be found in https://dienst-wiki.hbz-nrw.de/x/wIjsX (hbz-internal wiki).

Here is the implementation plan:

  • On 2024-12-16 Aleph cataloging will stop and a last export of Aleph data will be created.
  • We will have a meeting next week with DE-Sol1 partners to find out which holdings data needs to be preserved.
  • We pull out the holdings data from the Aleph export and store it somewhere, preferably as a CSV file that can be modified by DE-Sol1 staff.
  • We generate lobid holding information (in JSON-LD) from the CSV and add it to existing lobid records (weekly updates should suffice here).
  • We set up a lobid search filtered on titles held by DE-Sol1 available through a dedicated URL. (I guess we can do this with some Apache config).
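
The CSV-to-JSON-LD step of the plan could look roughly like the following. This is only a sketch: the column names (`almaMmsId`, `callNumber`) and the item ID scheme with a running number are assumptions for illustration, not something that has been decided in this ticket.

```python
import csv
import io
import json

ISIL = "DE-Sol1"
ORG_ID = f"http://lobid.org/organisations/{ISIL}#!"
ORG_LABEL = "Stadtarchiv Solingen, Bibliothek"


def row_to_item(row, position):
    """Build one lobid-style item from a CSV row (column names are assumptions)."""
    record_id = row["almaMmsId"]
    return {
        # Hypothetical ID scheme: record ID, ISIL and a running number.
        "id": f"http://lobid.org/items/{record_id}:{ISIL}:{position}#!",
        "type": ["Item", "PhysicalObject"],
        "callNumber": row["callNumber"],
        "heldBy": {"isil": ISIL, "id": ORG_ID, "label": ORG_LABEL},
        "inCollection": [{"id": ORG_ID, "label": ORG_LABEL}],
    }


def csv_to_items(text):
    """Parse CSV text and return one item dict per data row."""
    reader = csv.DictReader(io.StringIO(text))
    return [row_to_item(row, n) for n, row in enumerate(reader, start=1)]


# Sample data for illustration only.
SAMPLE_CSV = "almaMmsId,callNumber\n990121068670206441,GA 3472\n"
items = csv_to_items(SAMPLE_CSV)
print(json.dumps(items, indent=2, ensure_ascii=False))
```

Keeping the CSV as the editable source and regenerating the JSON on every run would fit the weekly update cycle mentioned above.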
@TobiasNx
Contributor

TobiasNx commented Dec 12, 2024

Quick sidenote: we could already use this for setting the dedicated URL before presenting it.
https://aleph.lobid.org/resources/search?owner=http%3A%2F%2Flobid.org%2Forganisations%2FDE-Sol1%23!&aggregations=owner

The query against the Alma-based lobid is different because the aggregation has a different name: http://lobid.org/resources/search?owner=http%3A%2F%2Flobid.org%2Forganisations%2FDE-Sol1%23%21&aggregations=hasItem.heldBy.id
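
For reference, both URLs can be built with the standard library instead of hand-encoding the owner URI. The helper name `search_url` is made up for this sketch; the base URLs and parameter names are the ones from the two links above.

```python
from urllib.parse import urlencode

OWNER = "http://lobid.org/organisations/DE-Sol1#!"


def search_url(base, aggregation):
    """Return a lobid search URL filtered on DE-Sol1 with the given aggregation."""
    params = {"owner": OWNER, "aggregations": aggregation}
    return f"{base}?{urlencode(params)}"


alma_url = search_url("http://lobid.org/resources/search", "hasItem.heldBy.id")
aleph_url = search_url("https://aleph.lobid.org/resources/search", "owner")
print(alma_url)
print(aleph_url)
```

Note that `urlencode` also escapes the trailing `!` as `%21`, which is equivalent to the unescaped form in the first link.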

@TobiasNx
Contributor

TobiasNx commented Dec 16, 2024

Old Aleph JSON:

{
      "id": "http://lobid.org/items/HT013786250:DE-Sol1:GA%203472#!",
      "type": [
        "Item"
      ],
      "heldBy": {
        "id": "http://lobid.org/organisations/DE-Sol1#!",
        "label": "lobid Organisation"
      },
      "callNumber": "GA 3472",
      "label": "GA 3472"
    }

Current ALMA JSON:

   {
      "label": "lobid Bestandsressource",
      "type": [
        "Item",
        "PhysicalObject"
      ],
      "callNumber": "GA 3472",
      "serialNumber": "keinBarcode",
      "currentLibrary": "Z9035",
      "currentLocation": "kA",
      "heldBy": {
        "isil": "DE-Sol1",
        "id": "http://lobid.org/organisations/DE-Sol1#!",
        "label": "Stadtarchiv Solingen, Bibliothek"
      },
      "inCollection": [
        {
          "id": "http://lobid.org/organisations/DE-Sol1#!",
          "label": "Stadtarchiv Solingen, Bibliothek"
        }
      ],
      "id": "http://lobid.org/items/990121068670206441:DE-Sol1:236120120007830#!"
    }
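
Judging only from these two examples, item IDs seem to follow the pattern record ID, ISIL, local part, where the old Aleph data used the percent-encoded call number as local part and Alma uses a numeric identifier. A small sketch for taking such IDs apart (the function name `split_item_id` is hypothetical):

```python
from urllib.parse import unquote

PREFIX = "http://lobid.org/items/"
SUFFIX = "#!"


def split_item_id(item_id):
    """Split a lobid item ID into (record ID, ISIL, decoded local part)."""
    if not (item_id.startswith(PREFIX) and item_id.endswith(SUFFIX)):
        raise ValueError(f"unexpected item ID: {item_id}")
    core = item_id[len(PREFIX):-len(SUFFIX)]
    # Split at most twice so that colons in the local part survive.
    record_id, isil, local = core.split(":", 2)
    return record_id, isil, unquote(local)


aleph = split_item_id("http://lobid.org/items/HT013786250:DE-Sol1:GA%203472#!")
alma = split_item_id(
    "http://lobid.org/items/990121068670206441:DE-Sol1:236120120007830#!"
)
print(aleph)
print(alma)
```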

@TobiasNx
Contributor

TobiasNx commented Dec 16, 2024

Had a first look at the backed-up Aleph data from the Verbundgruppe. They provided three Aleph sequential (seq) files and three MAB files, named hbz01, hbz18 and hbz60 in each format. I am not sure what the difference between them is.

The MAB files contain line-separated MAB records.
The Aleph seq files each contain many records.

mab files

With Metafacture I am able to open hbz01 and hbz60 as MAB with decode-mab, see: https://github.com/TobiasNx/metafacture_workflows/blob/master/mab2DE-Sol1Holdings/mab2De-Sol1Holdings.flux . hbz18 cannot be opened with decode-mab; it throws an exception.
But I can open this file with decode-marc21. Not sure why.

PS: hbz18 is GND MARC, which explains why it can be opened with the MARC decoder.

seq files

I am able to process all three files with: https://github.com/TobiasNx/metafacture_workflows/blob/master/mab2DE-Sol1Holdings/mab2De-Sol1Holdings_seq.flux

However, it seems that decode-aseq can only process a single record per file: when a file contains multiple records, decode-aseq merges them into one.

I am not sure how to process each of the records in a file individually.

@dr0i
Member

dr0i commented Dec 17, 2024

It seems that decode-aseq can only process a single record per file: when a file contains multiple records, decode-aseq merges them into one.

It would (half) work with lines-to-records and a regexp, but that would exclude the LDR:

@TobiasNx
Contributor

Thanks. I had another idea on my bike this morning that works: reading the file with as-lines and then using merge-same-ids at stream level.
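
Outside of Metafacture, the same idea (read line by line, then group consecutive lines that share an ID) can be sketched in a few lines of Python. This is not the Flux workflow itself. It assumes that every line of the seq file starts with the record's system number followed by a space; the sample lines are made up.

```python
from itertools import groupby


def record_id(line):
    # Assumption: the system number is the first space-delimited token.
    return line.split(" ", 1)[0]


def split_records(lines):
    """Yield (system number, lines) for each run of lines sharing an ID."""
    non_empty = (line.rstrip("\n") for line in lines if line.strip())
    for sys_no, group in groupby(non_empty, key=record_id):
        yield sys_no, list(group)


# Invented sample lines, for illustration only.
SAMPLE = [
    "000000001 LDR   L -----nM2.01200024------h\n",
    "000000001 100   L $$aGA 3472\n",
    "000000002 LDR   L -----nM2.01200024------h\n",
    "000000002 100   L $$aGA 2251 (a)\n",
]
records = list(split_records(SAMPLE))
for sys_no, record_lines in records:
    print(sys_no, len(record_lines))
```

Because the grouping key is the system number rather than a record-start pattern, the LDR line stays part of its record.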

@TobiasNx
Contributor

TobiasNx commented Dec 17, 2024

HBZ60 contains the holdings.

I checked the provided fields with list-fix-paths. I documented the elements according to https://service-wiki.hbz-nrw.de/display/WAL/Linksammlung+und+Literatur+zu+Datenthemen+und+Format+MARC+21?preview=%2F647888944%2F647888945%2FM2M_Konkordanz_Holdings_Aleph.20200605_aktuell.pdf

Field                Definition (translated from German)
001.a                Identification number of the record
002a.a               Date of first entry
003.a                Date of last correction
004.a                Creation date of the record at export
010.a                Identification number of the directly superordinate record
012.a                Identification number of the title record
020a.a               Identification number of a delivered record (supra-regional identification number)
025z.a               Supra-regional identification number (035 in MARC)
030.a                Coded information about the record
050.a                Data carrier
057.a                Material-specific codes for microforms
070.a                Identifying marks of the processing institution
070a.a               not defined
070b.a               not defined
071.a                Identifying marks of the holding institution
071a.a               Identifying marks of the holding institution
071d.a               Identifying marks of the holding institution
072.a                Coded information about the holding institution
075.a                ZDB priority number
076a.a               not defined
076g.a               Local notes of the Zeitschriftendatenbank (ZDB) / usage restrictions, loan status, lendability
077b.a               ZDB local identification number, ZDB title identification number
100.a                Call number (Signatur)
105.a                Location
107.a                Additional call number
115.a                Accession number (Akzessionsnummer)
120.a                Booking number (Buchungsnummer)
125a.a               Notes
125b.a               Notes
200.0                Summary holdings statement: L $$01$$aEinleitenderText$$cLückenangabe$$dLücken(Desiderat)$$eKommentar$$hSonderstandort$$kKommentarGrundsignatur$$b1.1969(1970)-39.2007$$f36J2$$g04
200.a                see 200.0
200.b                see 200.0
200.c                see 200.0
200.e                see 200.0
200.f                see 200.0
210a.*.d / 210a.d    Standardized holdings statement, not further defined
210a.*.j / 210a.j    Standardized holdings statement, not further defined
210a.*.k / 210a.k.*  Standardized holdings statement, not further defined
210a.*.n / 210a.n    Standardized holdings statement, not further defined
210b.d               not defined
210b.j / 210b.j.*    not defined
220.a                First call number
220.l                First call number
710.*.a              Subject headings and subject heading chains
953.*.a / 953.a      License information on the use of the metadata
FMT                  Format
LDR                  Leader

It seems that some elements are repeatable, and this will make it difficult to transform them to a CSV.
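
One possible way around this, sketched here under assumptions, is to keep one CSV column per field path and join repeated values inside the cell with a delimiter that does not occur in the data. The delimiter `" | "` and the sample values are invented for this example.

```python
import csv
import io

SEPARATOR = " | "  # hypothetical in-cell delimiter; must not occur in the data


def to_row(record, columns):
    """record maps a field path to a list of values; repeats share one cell."""
    return {col: SEPARATOR.join(record.get(col, [])) for col in columns}


def from_cell(cell):
    """Inverse of the join above, for reading an edited CSV back in."""
    return cell.split(SEPARATOR) if cell else []


columns = ["012.a", "100.a", "210a.j"]
# Made-up sample values, for illustration only.
record = {"012.a": ["HT013786250"], "100.a": ["GA 3472"], "210a.j": ["1969", "1970"]}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerow(to_row(record, columns))
print(buffer.getvalue())
```

The trade-off is that staff editing the file have to respect the delimiter; the alternative of one row per repetition keeps cells simple but spreads a holding over several rows.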

Checking the indexes of these paths shows that there are up to 39 repetitions (see 210a.39.j):

001.a
002a.a
003.a
004.a
010.a
012.a
020a.a
025z.a
030.a
050.a
057.a
070.a
070a.a
070b.a
071.a
071a.a
071d.a
072.a
075.a
076a.a
076g.a
077b.a
100.a
105.a
107.a
115.a
120.a
125a.a
125b.a
200.0
200.a
200.b
200.c
200.e
200.f
210a.1.d
210a.1.j
210a.1.k
210a.1.n
210a.10.d
210a.10.j
210a.10.k
210a.10.n
210a.11.d
210a.11.j
210a.11.k
210a.11.n
210a.12.d
210a.12.j
210a.12.k
210a.12.n
210a.13.d
210a.13.j
210a.13.k
210a.13.n
210a.14.d
210a.14.j
210a.14.k
210a.14.n
210a.15.d
210a.15.j
210a.15.k
210a.16.d
210a.16.j
210a.16.k
210a.17.d
210a.17.j
210a.18.d
210a.18.j
210a.19.d
210a.19.j
210a.2.d
210a.2.j
210a.2.k
210a.2.n
210a.20.d
210a.20.j
210a.20.k
210a.21.d
210a.21.j
210a.21.k
210a.22.d
210a.22.j
210a.23.d
210a.23.j
210a.24.d
210a.24.j
210a.25.d
210a.25.j
210a.26.d
210a.26.j
210a.27.d
210a.27.j
210a.27.k
210a.28.d
210a.28.j
210a.29.j
210a.3.d
210a.3.j
210a.3.k
210a.3.n
210a.30.j
210a.31.j
210a.32.j
210a.33.j
210a.34.j
210a.35.j
210a.36.j
210a.37.j
210a.38.j
210a.39.j
210a.4.d
210a.4.j
210a.4.k
210a.4.n
210a.5.d
210a.5.j
210a.5.k
210a.5.n
210a.6.d
210a.6.j
210a.6.k
210a.6.n
210a.7.d
210a.7.j
210a.7.k
210a.7.n
210a.8.d
210a.8.j
210a.8.k
210a.8.n
210a.9.d
210a.9.j
210a.9.k
210a.9.n
210a.d
210a.j
210a.k
210a.k.1
210a.k.2
210a.k.3
210a.n
210b.d
210b.j
210b.j.1
210b.j.2
210b.j.3
220.a
220.l
710.1.a
710.2.a
953.1.a
953.2.a
953.3.a
953.4.a
953.5.a
953.a
FMT
LDR
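
To get the repetition counts from such a path list without reading it by eye, something like the following could be used. It is a sketch that only looks at paths of the form field.index.subfield; paths with a trailing index such as 210a.k.1 are deliberately ignored here.

```python
import re
from collections import defaultdict

# Matches paths like "210a.12.d": field, numeric index, subfield.
INDEXED = re.compile(r"^(?P<field>\w+)\.(?P<index>\d+)\.(?P<sub>\w+)$")


def max_repetitions(paths):
    """Map 'field.*.subfield' to the highest index seen among the paths."""
    highest = defaultdict(int)
    for path in paths:
        match = INDEXED.match(path)
        if match:
            key = f"{match['field']}.*.{match['sub']}"
            highest[key] = max(highest[key], int(match["index"]))
    return dict(highest)


# A few paths taken from the list above.
sample = ["001.a", "210a.1.j", "210a.39.j", "210a.28.d", "710.2.a", "953.5.a", "210a.k.3"]
print(max_repetitions(sample))
```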

@TobiasNx
Contributor

Also, there seems to be a lot more information than our old morph transformation could handle; that transformation was based on element 088. This element does not exist in our hbz01, and the holding information in hbz60 is a lot more comprehensive.

@TobiasNx
Contributor

TobiasNx commented Dec 17, 2024

First draft to transform the data on basis of HBZ60 seq to lobid

@TobiasNx TobiasNx transferred this issue from hbz/lobid-resources Dec 18, 2024
TobiasNx added a commit that referenced this issue Dec 18, 2024
to be able to map from hbzId 2 alma
@TobiasNx
Contributor

TobiasNx commented Dec 18, 2024

TODO:

  • specify holding type.
  • find out which elements are really catalogued by DE-Sol1 and which are only enriched.
  • Combine holdings for the same record.

@TobiasNx
Contributor

TobiasNx commented Jan 9, 2025

I adjusted the transformation.

The transformation now results in records with one or more holdings, and the id is the id of the lobid record. We still need to adjust the holding ids.

{
"id" : "http://lobid.org/resources/990110486750206441#!",
"hasItem" : [ {
"id" : "http://lobid.org/items/990110486750206441:ITEMMMSIDMUSSERSETZTWERDEN#!",
"callNumber" : "MA 3264/[1]",
"inCollection" : [ {
"id" : "http://lobid.org/organisations/DE-Sol1#!",
"label" : "Stadtarchiv Solingen, Bibliothek"
} ],
"heldBy" : {
"isil" : "DE-Sol1",
"id" : "http://lobid.org/organisations/DE-Sol1#!"
},
"label" : "Stadtarchiv Solingen, Bibliothek",
"type" : [ "Item", "PhysicalObject" ]
}, {
"id" : "http://lobid.org/items/990110486750206441:ITEMMMSIDMUSSERSETZTWERDEN#!",
"callNumber" : "MA 3264/[1] (b)",
"inCollection" : [ {
"id" : "http://lobid.org/organisations/DE-Sol1#!",
"label" : "Stadtarchiv Solingen, Bibliothek"
} ],
"heldBy" : {
"isil" : "DE-Sol1",
"id" : "http://lobid.org/organisations/DE-Sol1#!"
},
"label" : "Stadtarchiv Solingen, Bibliothek",
"type" : [ "Item", "PhysicalObject" ]
} ]
}
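
The grouping step behind this output can be pictured as follows. This is an illustrative Python sketch, not the actual transformation; `merge_by_record` is a made-up name and the items are reduced to their call numbers. The placeholder ITEMMMSIDMUSSERSETZTWERDEN in the JSON above reads "item MMS ID must be replaced", so item IDs are left out here.

```python
def merge_by_record(holdings):
    """holdings: iterable of (record ID, item dict) pairs, in any order."""
    records = {}
    for record_id, item in holdings:
        record = records.setdefault(
            record_id,
            {"id": f"http://lobid.org/resources/{record_id}#!", "hasItem": []},
        )
        record["hasItem"].append(item)
    return list(records.values())


holdings = [
    ("990110486750206441", {"callNumber": "MA 3264/[1]"}),
    ("990110486750206441", {"callNumber": "MA 3264/[1] (b)"}),
]
merged = merge_by_record(holdings)
print(merged)
```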

@acka47
Contributor Author

acka47 commented Jan 15, 2025

On 1/15/25 14:41, Stadtarchiv wrote:

In field 115 (if I am not mistaken) we have (often) recorded our accession number / inventory entry. I would be very pleased if this were still possible in the future.

This will be the "Akzessionsnummer" which should also be added to the item data.

@acka47
Contributor Author

acka47 commented Jan 15, 2025

Also, we should take account of multiple items which are only recorded in the Aleph export data:

On 1/15/25 14:41, Stadtarchiv wrote:

For multiple copies we have so far, if I remember correctly, recorded each existing copy individually under the same call number (e.g. GA 2251), for example in the form GA 2251 (a), GA 2251 (b), GA 2251 (z).

@TobiasNx
Contributor

Also, we should take account of multiple items which are only recorded in the Aleph export data:

On 1/15/25 14:41, Stadtarchiv wrote:

For multiple copies we have so far, if I remember correctly, recorded each existing copy individually under the same call number (e.g. GA 2251), for example in the form GA 2251 (a), GA 2251 (b), GA 2251 (z).

See: #1 (comment)
It should already combine multiple items for one resource into one record with one hasItem Array.
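
If the copy designations ever need to be separated from the base call number (for example to check which call numbers have multiple copies), a regular expression along these lines might do. It assumes the copy marker is a single lowercase letter in parentheses at the end of the call number, as in the examples from the mail; other patterns are not covered.

```python
import re
from collections import defaultdict

# Assumption: copy marker is "(x)" with one lowercase letter at the very end.
COPY_SUFFIX = re.compile(r"^(?P<base>.*?)\s*\((?P<copy>[a-z])\)$")


def split_call_number(call_number):
    """Return (base call number, copy letter or None)."""
    match = COPY_SUFFIX.match(call_number.strip())
    if match:
        return match["base"], match["copy"]
    return call_number.strip(), None


def group_copies(call_numbers):
    """Group copy letters by base call number."""
    groups = defaultdict(list)
    for call_number in call_numbers:
        base, copy = split_call_number(call_number)
        groups[base].append(copy)
    return dict(groups)


grouped = group_copies(["GA 2251", "GA 2251 (a)", "GA 2251 (b)", "MA 3264/[1] (b)"])
print(grouped)
```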

@TobiasNx
Contributor

I compared the ITM of the bridge with an Aleph holding entry. The following elements from the Aleph data are represented there:

002 -> ITM $D (Inventory Date)
003 -> ITM $Y (Date Updated)
076 -> ITM $p (Item policy subfield)
100 -> ITM $c (Call Number)
115 -> ITM $B (Inventory number)

Of these ITM elements we only map ITM $c in lobid, as callNumber. DE-Sol1 also requested the inventory number. The inventory date and the update date could be relevant to us and to lobid as well, so perhaps we can map them too. They would be a nice add-on to lobid for tracking acquisition histories, in the context of hbz/lobid-resources#2128.
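
Written down as data, the correspondence could look like this. Only `callNumber` is an existing lobid property according to this thread; the other property names (`inventoryDate`, `dateUpdated`, `itemPolicy`, `inventoryNumber`) are placeholders invented for the sketch, as are the sample values.

```python
# Aleph field -> (ITM subfield, property name). Names other than
# callNumber are hypothetical placeholders.
ALEPH_TO_ITM = {
    "002": ("D", "inventoryDate"),
    "003": ("Y", "dateUpdated"),
    "076": ("p", "itemPolicy"),
    "100": ("c", "callNumber"),
    "115": ("B", "inventoryNumber"),
}


def to_item_properties(aleph_fields, wanted):
    """Pick the wanted properties from a dict of Aleph field -> value."""
    item = {}
    for field, (_subfield, name) in ALEPH_TO_ITM.items():
        if name in wanted and field in aleph_fields:
            item[name] = aleph_fields[field]
    return item


# Made-up sample values.
sample = {"002": "20101104", "100": "GA 3472", "115": "2010/123"}
result = to_item_properties(sample, {"callNumber", "inventoryNumber"})
print(result)
```

Extending the `wanted` set would be enough to include the two dates once it is decided that they should be mapped.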
