The code and queries here are unlikely to be updated as my process evolves. Later repos will likely take progressively different approaches and use more elaborate tooling, as my habit is to try to improve at least one part of the process each time around.
All the relevant metadata now lives in config.json: ideally nothing will need to be tweaked after this. We need to be careful here to get the history of Wikidata IDs for the constituency correct.
jq -r .wikipedia config.json | xargs bundle exec ruby scraper.rb | tee wikipedia.csv
xsv search -v -s party 'Q' wikipedia.csv
Only one unmatched party: "Glow Bowling Party" from 1997. Skipping this for now, but I might fix it up later.
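For anyone without xsv to hand, the same "which rows have no Wikidata ID in this column?" check can be roughly sketched in plain awk. This is only an illustration, not part of the actual pipeline: `missing_qid` is a hypothetical helper name, and it assumes a simple CSV with a header row and no quoted fields containing commas (which xsv handles properly and this does not).

```shell
# Hypothetical helper: print the header plus every row whose named column
# does not contain a "Q" (i.e. no Wikidata ID was found for it).
# Assumes no quoted commas in the CSV; use xsv for real data.
missing_qid() {
  awk -F, -v col="$1" '
    NR == 1 {
      # locate the requested column by its header name
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      print
      next
    }
    c && $c !~ /Q/ { print }
  '
}

# Example with made-up rows:
printf 'name,party\nA,Q1\nB,Glow\n' | missing_qid party
```

Like `xsv search -v`, this keeps the header line so that the output is still a usable CSV.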
xsv search -v -s election 'Q' wikipedia.csv | xsv select electionLabel | uniq
Nothing missing.
xsv search -v -s id 'Q' wikipedia.csv | xsv select name | tail -n +2 |
sed -e 's/^/"/' -e 's/$/"@en/' | paste -s - |
xargs -0 wd sparql find-candidates.js |
jq -r '.[] | [.name, .item.value, .election.label, .constituency.label, .party.label] | @csv' |
tee candidates.csv
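The middle of that pipeline turns each bare name into an English language-tagged literal (`"Name"@en`) and joins them all onto a single line, so that they reach the query as one argument. A small sketch of just that step, using made-up names and a hypothetical `quote_names` wrapper around the same sed and paste calls:

```shell
# Hypothetical wrapper around the quoting step used above:
# wrap each input line as "..."@en, then join all lines into one
# (paste -s separates them with tabs by default).
quote_names() {
  sed -e 's/^/"/' -e 's/$/"@en/' | paste -s -
}

# Made-up names, purely for illustration; prints both literals on one
# line, separated by a tab.
printf 'Jane Doe\nJohn Smith\n' | quote_names
```

Because the joined line contains no NUL bytes, `xargs -0` then hands the whole thing to `wd sparql` as a single argument rather than splitting it on whitespace.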
xsv join -n --left 2 wikipedia.csv 1 candidates.csv | xsv select '10,1-8' | sed $'1i\\\nfoundid' | tee combo.csv
bundle exec ruby generate-qs.rb config.json | tee commands.qs
Then sent to QuickStatements as https://tools.wmflabs.org/editgroups/b/QSv2T/1598003214078