This repo contains the scripts and documentation needed to re-sync the entire Argo dataset from ifremer, and re-establish regular imports of this data to Argovis.
If you have no files downloaded from ifremer and nothing in your `argo` or `argoMeta` collections, but have defined those collections per https://github.com/argovis/db-schema (ie you are rebuilding from nothing):
- Start by rsyncing ifremer's argo data:

  ```
  rsync -avzhi --delete --omit-dir-times --no-perms vdmzrs.ifremer.fr::argo/ /ifremer
  ```
- Follow the instructions in the 'Rebuild mongo argo collections without repeating rsync' section to load these results into MongoDB.
- Build the image defined in `Dockerfile`, and run it as the image in the Kube cron job described in `ifremer-cron.yaml` if you're orchestrating with Kube, or as a regular cronjob via `ifremer-cron.sh` on Swarm or a bare container server (a crontab sketch follows this list). Note the storage requirements assumed in both cases.
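For the bare-cron route, a minimal crontab entry might look like the following; the schedule, image tag, mounts, and log path are illustrative assumptions, not values fixed by this repo:

```
# Hypothetical crontab entry: run the nightly sync at 02:00 and keep a log.
# Image name, mounts, and log path are assumptions -- adjust to your deployment.
0 2 * * * docker run --rm -v /ifremer:/ifremer argovis/ifremer-sync:latest ./ifremer-cron.sh >> /var/log/ifremer-cron.log 2>&1
```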
Note the first two steps together can take weeks, depending on resourcing. From there, if all goes well, your cron script of choice will update your MongoDB instance with new data nightly. Check the logs periodically, as edge cases do appear in the Argo data, and decisions may have to be made on how you'd like your Argovis instance to handle them.
If for some reason the rsync'ed mirror is intact but the mongo collections need to be rebuilt from scratch (irrecoverably corrupted, or a schema change), see `freshrebuild.sh`. This assumes the rsync mirror can be found in the filesystem at `/ifremer` (ie mount the `ifremer-mirror` PVC at `/ifremer`), and will leave you with a file `/tmp/profiles2translate` appropriate for feeding to `testload.sh` to rebuild the `argo` and `argoMeta` collections. This workflow was confirmed to produce exactly the same number of lines in `/tmp/profiles2translate` as there were argo profiles concurrently in mongo, as it should; future runs should verify this where possible.
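One quick spot check of that invariant, assuming `mongosh` access and that the profiles live in an `argo` database and `argo` collection (both assumptions; adjust to your deployment), is to compare the two counts directly:

```
# Line count of the worklist vs. profile count in mongo; these should match.
wc -l < /tmp/profiles2translate
mongosh "mongodb://localhost:27017/argo" --quiet --eval 'db.argo.countDocuments()'
```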
Consider parallelizing this by slicing `/tmp/profiles2translate` into equal parts and running one per pod, for example as in `devpod.yaml`.
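Coreutils `split` handles the slicing without breaking lines; the part count and suffix scheme here are arbitrary examples:

```
# Slice the worklist into 8 line-aligned parts, one per pod:
# /tmp/profiles2translate.part.00 ... /tmp/profiles2translate.part.07
split -n l/8 -d /tmp/profiles2translate /tmp/profiles2translate.part.
```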
Note also that `testload.sh` has some simple fault tolerance built in, and will try to keep track of progress and resume after an interrupt; checking the profiles immediately before and after these breakpoints is the first place to look if an unexpected(ly small) number of profiles appears in the final collection rebuild.
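The general shape of that resume behavior, sketched here with a hypothetical `load_profile` helper rather than `testload.sh`'s actual bookkeeping, is a checkpoint counter consulted on startup:

```
# Sketch of a checkpoint/resume loop; the real script's mechanism may differ.
CHECKPOINT=/tmp/testload.checkpoint
done_count=0
[ -f "$CHECKPOINT" ] && done_count=$(cat "$CHECKPOINT")
tail -n +"$((done_count + 1))" /tmp/profiles2translate | while read -r profile; do
    load_profile "$profile"             # hypothetical per-profile loader
    done_count=$((done_count + 1))
    echo "$done_count" > "$CHECKPOINT"  # persist progress after every profile
done
```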
If a nightly update finishes rsync'ing and parsing the rsync log but is interrupted during the mongo load, it's best to suspend the cronjob and redo that evening's update; see `redo-sync.yaml` for a pod to manage `ifremer-sync-redo.sh`, which takes the existing `updatedprofiles` list in the logging directory you must specify in the yaml file's command, and reruns the corresponding uploads to mongo.
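On Kube, suspending and later resuming the cronjob is a pair of one-liners; the cronjob name below is an assumption, so substitute whatever `ifremer-cron.yaml` names yours:

```
kubectl patch cronjob ifremer-sync -p '{"spec":{"suspend":true}}'   # pause nightly runs
kubectl apply -f redo-sync.yaml                                     # run the redo pod
kubectl patch cronjob ifremer-sync -p '{"spec":{"suspend":false}}'  # resume afterwards
```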
`roundtrip.[py|yaml]` and `Dockerfile-roundtrip` define a pod that randomly picks profiles from mongo, redownloads the ifremer source that defines them, and double-checks that the collection contents are correct. This is meant to run as a background process to flag errors and demonstrate robustness.
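Getting it running should just be a matter of building the image and applying the pod spec; the image tag here is an assumption, so match whatever `roundtrip.yaml` actually references:

```
docker build -f Dockerfile-roundtrip -t argovis/roundtrip:latest .
kubectl apply -f roundtrip.yaml
```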