Use shorter interval for updates #350
Comments
@acka47 says we have to check whether DNB offers 10-minute updates as RDF.
"The query period should not reach too far, to avoid a result set of more than 100,000 records. Recommendation for non-time-critical processes for query period/frequency: 30 minutes. For small sets (e.g. online dissertations), harvesting once a day or once a week is sufficient, since a record that was changed several times in this period is then retrieved only once and the result set still does not grow too large." From: https://www.dnb.de/DE/Professionell/Metadatendienste/Datenbezug/OAI/oai_node.html
Also in Der Linked-Data-Service der Deutschen Nationalbibliothek: Auslieferung der Metadaten it reads:
So we should just try out shorter update intervals, I guess.
Hourly seems to be possible:
This follows the DNB's recommendation. It allows a bit of overlap to ensure that we get all the data. See #350 (comment).
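The overlap idea can be sketched as follows. This is a minimal sketch, not the project's actual implementation: the function name `harvest_window` and the 60-minute interval and 10-minute overlap are assumptions chosen for illustration. It computes `from`/`until` timestamps in the UTC format OAI-PMH uses, so that each request reaches slightly back into the previous window.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values for illustration; the real interval and overlap may differ.
INTERVAL = timedelta(minutes=60)
OVERLAP = timedelta(minutes=10)


def harvest_window(now, interval=INTERVAL, overlap=OVERLAP):
    """Return (from, until) OAI-PMH timestamps for one harvesting run.

    The window starts `overlap` earlier than strictly needed, so records
    changed right at the boundary of the previous run are not missed.
    """
    start = now - interval - overlap
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # UTC datetime granularity defined by OAI-PMH
    return start.strftime(fmt), now.strftime(fmt)


if __name__ == "__main__":
    run_time = datetime(2023, 1, 1, 12, 40, tzinfo=timezone.utc)
    print(harvest_window(run_time))
```

With an overlap, some records will be fetched twice, so the consuming side would need to treat updates idempotently (for example, by overwriting a resource by its ID).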
If #355 is merged, the cron scheduler can be adjusted to get the data e.g. every 10 minutes.
As getting data every 10 minutes often results in an empty data set, we disable sending the emails that warn about empty data sets for now. We may want to discuss this further, e.g. implement a daily report, or get the OAI-PMH server's header or response and work on these (i.e. ignore if the server reports
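One way to tell a legitimately empty harvest from a failure is to inspect the OAI-PMH response itself. The sketch below assumes the server answers an empty result with the standard OAI-PMH error code `noRecordsMatch`; whether the DNB endpoint actually does so has not been verified here, and the function name `is_empty_harvest` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def is_empty_harvest(response_xml):
    """Return True if the response carries a `noRecordsMatch` error.

    Assumption: an empty time window is reported this way, so no warning
    email would be needed for such a response.
    """
    root = ET.fromstring(response_xml)
    return any(
        err.get("code") == "noRecordsMatch"
        for err in root.iter(OAI_NS + "error")
    )


# Hand-written sample response, not captured from a real server.
SAMPLE_EMPTY = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<responseDate>2023-01-01T12:40:00Z</responseDate>"
    '<error code="noRecordsMatch">No records match the request</error>'
    "</OAI-PMH>"
)

if __name__ == "__main__":
    print(is_empty_harvest(SAMPLE_EMPTY))
```

Other error codes, or a response without records and without an error element, would still be worth a warning under this approach.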
Scheduled to get data every 10 minutes.
We checked this and it seems to work. Got 7 new resources in the last 20 minutes!
We should also update http://lobid.org/gnd/dataset:
(Hm, I wonder why the data seems to be updated only every hour, not every 10 minutes, even though we try to get data every 10 minutes. Maybe the RDF dumps are not provided as often as the PICA data? If that's the case, we should fetch the data less often, @acka47.)
As we have just discussed in the review, we will schedule hourly updates.
Done scheduling hourly. Every_hour:40m. 👍
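Assuming "Every_hour:40m" means one run at minute 40 of every hour (in standard cron syntax that would be `40 * * * *`), the next run time can be computed as in this small sketch. The function name `next_run` is hypothetical, and this is not the scheduler's own code.

```python
from datetime import datetime, timedelta


def next_run(now, minute=40):
    """Return the next point in time strictly after `now` at hh:<minute>:00."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This hour's slot has already passed, so move to the next hour.
        candidate += timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    print(next_run(datetime(2023, 1, 1, 12, 15)))
```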
No one is willing to write the blog post. As this is not mandatory, I am closing this issue.
I think we still should do this. We could keep it short:
I've deployed it, see https://blog.lobid.org/.
W.G. from UB Münster requested a shorter interval for the updates. Daily updates would not be sufficient for them, and DNB provides updates every 10 minutes.
Perhaps we can shorten our update interval, even if we do not reach 10 minutes.