
Strip backslash from OSM names (and fix provisioning and change analysis parameters) #947

Merged: 7 commits into develop, May 15, 2023

Conversation

@KlaasH (Contributor) commented May 12, 2023

Overview

The most important part of this PR is the smallest—a pair of lines in import_osm.sh that clears any backslash characters from the converted.osm file before we feed it to osm2pgrouting to import into the database. It turns out osm2pgrouting really doesn't like backslash characters. The error looks like this:
[screenshot: osm2pgrouting error output]

That certainly looks like an error, but I had assumed it would only affect the actual segments with bad names. It turns out, though, that the "Vertices inserted: 0 Split ways inserted 0" line after the errors indicates it lost a lot more than the affected segments. Apparently osm2pgrouting processes the input in chunks, and whatever other segments happen to be in the same chunk as the bad segments just get dropped on the floor.

I don't believe there's any functional purpose for which we would want backslashes in our OSM input. They're not used for encoding (the file is in UTF-8), and we don't actually use segment names for anything in the analysis anyway. So I just added a simple sed 's/\\/backslash/' to make them go away.
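The cleanup step can be sketched as a shell snippet like the following. The file name comes from the PR; writing through a temp file (rather than GNU sed's non-portable -i flag) and the 'g' global flag are my assumptions, not necessarily what the script does:

```shell
# Stand-in for the real converted.osm produced by the import pipeline.
printf 'name="Foo\\Bar"\n' > converted.osm

# Replace every backslash with the word "backslash" before the file is
# fed to osm2pgrouting. Without the 'g' flag, sed would only replace the
# first backslash on each line.
sed 's/\\/backslash/g' converted.osm > converted.osm.tmp
mv converted.osm.tmp converted.osm

cat converted.osm   # -> name="FoobackslashBar"
```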

Other items:

  • The next most important thing this does is increase the memory available to the analysis task from 32GB to 61GB. It only matters for Los Angeles, I think, but it does seem to matter for Los Angeles. And we're using i3.2xlarge instances anyway, so we might as well allocate all the memory.
  • I also adjusted some of the parameters used to set PostgreSQL memory and disk usage limits. We have lots of memory and lots of disk, so it makes sense to be generous, to get all the speed we can and avoid any failures in the largest jobs.
  • Speaking of those parameters, I made the staging ones match production. They had been set lower, presumably to make repeated testing cheaper, but we only very occasionally test on staging these days, so it makes sense to make it as similar to production as possible.
  • There were a few errors in the build when I tried to spin my development instance back up. I tried to find the path of least resistance for getting around them, which meant upgrading the base image in the case of the tilegarden container but just adjusting the Debian package list to be able to keep using the old image for the angularjs container.
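For context, the PostgreSQL limits mentioned above are applied with ALTER SYSTEM statements inside the analysis container (the setting names appear in this PR's diff; the literal values below are illustrative assumptions, since the script actually fills them in from PFB_* environment variables):

```sql
-- Illustrative values only; the real script substitutes PFB_* variables.
ALTER SYSTEM SET maintenance_work_mem TO '2GB';
ALTER SYSTEM SET max_wal_size TO '8GB';
ALTER SYSTEM SET shared_buffers TO '8GB';
ALTER SYSTEM SET temp_file_limit TO '100GB';
```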

Demo

I found the location of one of the segments named "\" and made a tiny boundary centered on it. Here are the before-and-after shots showing the effect of losing chunks from the OSM import:
Before:
[screenshot: results before the fix]

After:
[screenshot: results after the fix]

Notes

The GitHub Actions deployment to staging isn't working (see #937), but I pushed this branch to staging manually.

Testing Instructions

I don't think it's necessary to spin up a Batch worker on staging and do more runs there, but it probably makes sense to test this locally to the point of running a small analysis.

  • Spin up a dev instance per the README instructions
  • Make a neighborhood for this boundary (via the "Neighborhoods" management page): backslash_location.zip (use FIPS code 3651000)
  • Run an analysis for it (click "Run Analysis" on the Jobs page then copy-paste the command that gets printed into the Docker log into a shell within the VM).
  • Confirm that it runs successfully and you can see the results by clicking on it in the All places list once it has finished.

Checklist

  • Add entry to CHANGELOG.md

Resolves #946

KlaasH added 7 commits May 5, 2023 15:56
Fixes a few errors that have shown up in the container provisioning due to
everything being pretty old. Specifically:
- Debian 'stretch' is out of support.
  - I changed the tilegarden container to use buster, because it's already on a
    version of Node for which there's a 'buster' base image
  - The AngularJS container wouldn't build with the newer Node versions I tried,
    so I left it on 'stretch' and updated the apt sources list to "archive", where
    the packages live now.
- The Bower registry is using an SSL cert based on a root certificate that 'stretch'
  doesn't have. I briefly tried updating the certs, but settled on just setting Bower
  to "strict-ssl: false"
This is something that we started doing on more recent projects and it's
nice to have, since uploading the modified tfvars file is something we need to
do frequently and composing the command by hand is kind of annoying.
We've been running this job with 8 vCPUs but only 31 GiB of RAM, which doesn't
make a ton of sense because that's all the CPU but only half the RAM of an
i3.2xlarge instance. To be fair, we didn't think we would actually get close to
filling up even the 31 GiB, but apparently we are on the largest jobs, so it's
time to allocate the rest.

I also:
- Increased the parameters we apply to the PostgreSQL instance within the
  analysis container, to make more use of the copious resources we're providing
  to these tasks. Some of the values are probably overkill, but there's no reason
  not to use the resources, and some could make the jobs faster or more robust.
- Matched all the parameters in the staging job definition to the ones in
  production. We seldom run staging jobs, so making them smaller doesn't appreciably
  help with keeping costs down, and it seems valuable to make the test environment
  as similar to production as we practically can.
Sets the size of /dev/shm (a shared-memory RAM drive built into Linux) to 4GB
via the Batch task definition. Apparently Postgres uses it for some tasks (like
VACUUM ANALYZE) and has been running into space constraints (that's what the
"ERROR: could not resize shared memory segment" errors are about).
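In a Batch job definition this is expressed through the container's linuxParameters. A minimal sketch of the relevant fragment (field names per the AWS Batch API, with sharedMemorySize given in MiB; the surrounding structure is abbreviated and the rest of the job definition is omitted):

```json
{
  "containerProperties": {
    "linuxParameters": {
      "sharedMemorySize": 4096
    }
  }
}
```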
It turns out osm2pgrouting *really* doesn't like it if you have a backslash in
a segment name.  It produces an error that knocks out not just the segment in
question, but everything that the tool would have otherwise loaded in the chunk
that it's processing.
Here's a Stack Exchange answer about it: https://gis.stackexchange.com/a/12873,
but the recommendation there is to modify the osm2pgrouting code. Instead, I just
added a command to the import_osm.sh script that uses 'sed' to strip out any
backslash characters.
@@ -39,7 +39,7 @@ ALTER SYSTEM SET maintenance_work_mem TO '${PFB_MAINTENANCE_WORK_MEM}';
ALTER SYSTEM SET max_wal_size TO '${PFB_MAX_WAL_SIZE}';
ALTER SYSTEM SET shared_buffers TO '${PFB_SHARED_BUFFERS}';
ALTER SYSTEM SET synchronous_commit TO ${PFB_SYNCHRONOUS_COMMIT};
-ALTER SYSTEM SET temp_file_limit TO ${PFB_TEMP_FILE_LIMIT};
+ALTER SYSTEM SET temp_file_limit TO '${PFB_TEMP_FILE_LIMIT}';
@KlaasH (Contributor, Author):

This had been a (large) numerical value, but changing it to "100GB" made it start breaking without quotes.
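The distinction is that PostgreSQL accepts a bare integer (interpreted in the setting's base unit, kilobytes for temp_file_limit) without quotes, but a value with a unit suffix is a string and must be quoted:

```sql
-- A bare number of kilobytes needs no quotes (104857600 kB = 100 GB):
ALTER SYSTEM SET temp_file_limit TO 104857600;
-- ...but a unit-suffixed value must be a quoted string:
ALTER SYSTEM SET temp_file_limit TO '100GB';
```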

@@ -1 +1 @@
-boto3==1.4.4
+boto3==1.26.132
@KlaasH (Contributor, Author):

This is the requirements file for the little script that updates the batch task definition. Long-term we want to get rid of it and handle the task definition in Terraform, but for now we're still using it. This upgrade was necessary because the very-old version of boto3 didn't know about the linuxParameters option.

@KlaasH requested a review from @ddohler, May 12, 2023 20:54
@ddohler (Contributor) left a comment:

Everything seems to be working great! I wasn't able to actually view the details for the job due to an apparent CORS error, but the job did run successfully (once I got my VM resuscitated).

}, {
"name": "PFB_WORK_MEM",
-  "value": "3072MB"
+  "value": "3GB"
@ddohler (Contributor):

This seems like it's the same as it was, just in GB rather than MB -- was that intentional?

@KlaasH (Contributor, Author):

Yeah, I was going to raise that one, too, but it's the one that gets allocated for every thread in parallel operations, so it can actually balloon pretty fast. So I left the value alone but still changed the format to match the others.

@KlaasH merged commit 54b27a9 into develop, May 15, 2023
@KlaasH deleted the feature/kjh/strip-backslash-from-osm-names branch, May 15, 2023 20:29
Development

Successfully merging this pull request may close these issues.

Brooklyn analysis results are missing some road segments
2 participants