Support for minutely/hourly/daily updates #26

Open
missinglink opened this issue Nov 25, 2014 · 3 comments

Comments

@missinglink
Member

missinglink commented Nov 25, 2014

Once the schema is stable enough we can consider doing partial updates to the index instead of importing the whole OSM planet every time.

ref: http://planet.openstreetmap.org/replication/

Note: some investigation needs to be done about how these diffs work, for example:

  • Are only new tags shown, and are deletions marked explicitly?
  • How should node/way/relation deletions be handled?
  • Are records complete, i.e. do they contain a snapshot of the record or only a diff of the changes?
  • etc.

ref: http://wiki.openstreetmap.org/wiki/OsmChange
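
As far as the OsmChange page describes the format, a diff groups elements under create, modify and delete actions, and an element under create or modify is the complete new version of that element (all tags included) rather than a delta. Under that reading, applying a diff could look like the minimal sketch below. The `applyChange` function, the shape of the `change` objects and the `Map` standing in for the index are all hypothetical and only illustrate the replace-not-merge idea.

```javascript
// Apply one OsmChange action to an in-memory index keyed by "type/id".
// `index` is a Map standing in for the real document store (hypothetical).
function applyChange(index, change) {
  const key = `${change.type}/${change.id}`;
  const current = index.get(key);

  // Skip changes that are not newer than what is already stored.
  if (current && current.version >= change.version) {
    return 'skipped';
  }

  if (change.action === 'delete') {
    index.delete(key);
    return 'deleted';
  }

  // create/modify carry the complete element, so replace rather than merge.
  index.set(key, {
    version: change.version,
    lat: change.lat,
    lon: change.lon,
    tags: { ...change.tags }
  });
  return current ? 'replaced' : 'created';
}

// Usage: the second change drops the `amenity` tag simply by not listing it.
const exampleIndex = new Map();
applyChange(exampleIndex, {
  action: 'create', type: 'node', id: 1, version: 1,
  lat: 52.5, lon: 13.4, tags: { name: 'A', amenity: 'cafe' }
});
applyChange(exampleIndex, {
  action: 'modify', type: 'node', id: 1, version: 2,
  lat: 52.5, lon: 13.4, tags: { name: 'B' }
});
```

Note that this sketch keeps no tombstones, so an old create replayed after a delete would bring the record back; a real importer would need to decide how to handle that.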

We also need to come up with a strategy for Elasticsearch, as it does not support partial updates to documents (without scripting enabled). We will most likely need to GET the record, modify it and PUT it back, which has the potential to cause problems related to idempotency, atomicity and race conditions.

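E.g. reading the nodes from an OsmChange file with node-osmium:

// Print every node found in an OsmChange file as pretty-printed JSON.
var osmium = require('osmium');
var file = new osmium.File('/tmp/209.osc.gz');
// Only nodes are read here; ways are skipped.
var reader = new osmium.Reader(file, { node: true, way: false });

var handler = new osmium.Handler();
// Called once for each node in the file.
handler.on('node', function(object) {
  console.log( JSON.stringify({
    type: 'node',
    id: object.id,
    lat: object.lat,
    lon: object.lon,
    tags: object.tags()
  }, null, 2));
});

// Run the reader and feed each object to the handler.
osmium.apply(reader, handler);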
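
One common way to make the GET/modify/PUT cycle mentioned above safe against concurrent writers is optimistic concurrency: remember the version that was read and have the write fail if the stored version has moved on. Elasticsearch's document versioning is meant for this kind of check. The sketch below does not talk to Elasticsearch; `VersionedStore` and `updateWithRetry` are hypothetical names, and the in-memory store only mimics a version-checked write so the retry loop can be shown on its own.

```javascript
// Hypothetical in-memory store that rejects writes based on a stale version,
// standing in for a version-checked write against the real index.
class VersionedStore {
  constructor() {
    this.docs = new Map();
  }

  // Returns { version, source } or null when the document does not exist.
  get(id) {
    const entry = this.docs.get(id);
    return entry ? { version: entry.version, source: { ...entry.source } } : null;
  }

  // Writes only if the stored version still equals `expectedVersion`
  // (0 means "the document must not exist yet").
  put(id, source, expectedVersion) {
    const entry = this.docs.get(id);
    const currentVersion = entry ? entry.version : 0;
    if (currentVersion !== expectedVersion) {
      const err = new Error('version conflict');
      err.code = 'VERSION_CONFLICT';
      throw err;
    }
    this.docs.set(id, { version: currentVersion + 1, source: { ...source } });
    return currentVersion + 1;
  }
}

// GET the record, apply `modify`, PUT it back; on a conflict, re-read and retry.
function updateWithRetry(store, id, modify, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = store.get(id);
    const version = current ? current.version : 0;
    const next = modify(current ? current.source : {});
    try {
      return store.put(id, next, version);
    } catch (err) {
      // Give up on unrelated errors or once the attempts are used up.
      if (err.code !== 'VERSION_CONFLICT' || attempt === maxAttempts) throw err;
    }
  }
}
```

Because `modify` may run more than once, it should be a pure function of the document it is given; that is also what keeps the update idempotent when a diff is replayed.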
@masterpropper

Any updates on this? How do you handle partial updates? Or could someone share an update script that does the full import and switches the ES index using aliases?

@orangejulius
Member

Hi @masterpropper,
It's looking less and less likely that we will support minutely updates, for the following reasons:

  • The complexity of running Pelias would increase greatly: right now the data in Elasticsearch can be considered static once built. This means we can save a copy of the database and know it's current, and nothing related to the importers needs to be running while the Pelias services are up
  • Elasticsearch performance suffers quite a bit when it is both adding new data and handling queries
  • With cloud hardware, launching a separate cluster to do background builds is very easy

We will eventually share some scripts for doing this update. Unfortunately the previously used scripts for this went down with the Mapzen ship and can't be used or shared.

The Elasticsearch Index Aliases and Snapshot and Restore pages are where you should start for now. There's a good description of the process over in pelias/pelias#412 (comment)
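
For orientation, the alias switch itself boils down to one request to the Elasticsearch `_aliases` endpoint whose `actions` list removes the alias from the old index and adds it to the freshly built one; Elasticsearch applies the actions of a single request atomically. The sketch below only builds that request body. `buildAliasSwap` and the index and alias names are hypothetical, and sending the body (with an HTTP client or the Elasticsearch client of your choice) is left out.

```javascript
// Build the body for a single `POST /_aliases` request that moves `alias`
// from `oldIndex` to `newIndex`. Pass a falsy `oldIndex` on the first build,
// when the alias does not point anywhere yet.
function buildAliasSwap(alias, oldIndex, newIndex) {
  const actions = [];
  if (oldIndex) {
    actions.push({ remove: { index: oldIndex, alias: alias } });
  }
  actions.push({ add: { index: newIndex, alias: alias } });
  return { actions: actions };
}

// Usage with made-up names: queries keep using the alias "pelias" while the
// underlying index changes from one monthly build to the next.
const swapBody = buildAliasSwap('pelias', 'pelias-2019-05', 'pelias-2019-06');
console.log(JSON.stringify(swapBody, null, 2));
```

The old index can be deleted, or kept around for a quick rollback, once the alias points at the new one.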

@bboure
Member

bboure commented Jun 23, 2019

@orangejulius Are these scripts for background imports available somewhere?
Eventually, I would be interested in automating the process and getting a fresh build every month or so.

P.S.: I am using AWS

michaelkirk pushed a commit to michaelkirk-pelias/openstreetmap that referenced this issue Jun 14, 2023
…mbered_streets

Add cleanup module with numbered street handling

5 participants