
every-politician-scrapers/bristol-northwest-elections-wikipedia


The code and queries etc. here are unlikely to be updated as my process evolves. Later repos will likely have progressively different approaches and more elaborate tooling, as my habit is to try to improve at least one part of the process each time around.


Step 0: Configure config.json

All the relevant metadata now lives in config.json: ideally nothing will need to be tweaked after this. We need to be careful here to get the history of Wikidata IDs for the constituency correct.

Step 1: Scrape the results

jq -r .wikipedia config.json | xargs bundle exec ruby scraper.rb | tee wikipedia.csv
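
The pipeline reads the `wikipedia` value out of config.json, and xargs appends it as the final argument to scraper.rb. One way to check what will actually run before scraping is to put `echo` in front of the command. In this minimal sketch `printf` stands in for the jq output, and the URL is an assumed example value, not necessarily what config.json contains; `preview_scrape` is a hypothetical helper name.

```shell
# ASSUMPTION: this URL is an illustrative stand-in for `jq -r .wikipedia config.json`.
url='https://en.wikipedia.org/wiki/Bristol_North_West_(UK_Parliament_constituency)'

# Dry run: print the scraper command instead of executing it.
preview_scrape() {
  # xargs appends each whitespace-separated token from stdin to the command line
  printf '%s\n' "$1" | xargs echo bundle exec ruby scraper.rb
}

preview_scrape "$url"
```

Dropping the `echo` gives back the real command.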

Step 2: Check for missing party IDs

xsv search -v -s party 'Q' wikipedia.csv
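
Here `-s party` restricts the regex to the party column and `-v` inverts the match, so the output is the header plus every row whose party field contains no "Q", i.e. no Wikidata ID. A rough awk stand-in, useful for seeing the logic on a toy file, could look like this. The helper name `missing_ids` and the sample rows are hypothetical, and it assumes fields contain no embedded commas, since `awk -F,` does not understand CSV quoting the way xsv does.

```shell
# Rough awk stand-in for `xsv search -v -s <column> 'Q' <file>`.
# ASSUMPTION: no quoted fields containing commas.
missing_ids() {
  col=$1; file=$2
  awk -F, -v col="$col" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i     # locate the named column
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print; next                                        # keep the header row
    }
    $c !~ /Q/ { print }                                  # rows with no Q-ID in that column
  ' "$file"
}

# Hypothetical sample data
sample=$(mktemp)
printf '%s\n' 'id,name,party,partyLabel' \
  'Q111,Alice Example,Q222,Example Party' \
  ',Bob Example,,Independent' > "$sample"
missing_ids party "$sample"
```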

Two are aligned independents, but there is also one "Campaign for a More Prosperous Britain" candidate, which could have had a Wikipedia link but didn't, so I've linked that up.

Step 3: Check for missing election IDs

xsv search -v -s election 'Q' wikipedia.csv | xsv select electionLabel | uniq
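
Note that `uniq` only collapses adjacent duplicate lines, so it de-duplicates here only if rows for the same election sit next to each other in wikipedia.csv (likely for results scraped one table at a time, but that is an assumption). `sort -u` does not depend on ordering. A small demonstration with made-up, deliberately ungrouped labels:

```shell
# Hypothetical election labels; the two 1997 rows are not adjacent.
labels='electionLabel
1997 United Kingdom general election
2001 United Kingdom general election
1997 United Kingdom general election'

# uniq removes only adjacent repeats: the header and all three data lines survive
printf '%s\n' "$labels" | uniq

# skip the header, then sort -u removes every repeat regardless of order
printf '%s\n' "$labels" | tail -n +2 | sort -u
```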

Nothing missing.

Step 4: Generate possible missing person IDs

xsv search -v -s id 'Q' wikipedia.csv | xsv select name | tail -n +2 |
  sed -e 's/^/"/' -e 's/$/"@en/' | paste -s - |
  xargs -0 wd sparql find-candidates.js |
  jq -r '.[] | [.name, .item.value, .election.label, .constituency.label, .party.label] | @csv' |
  tee candidates.csv
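
The `sed`/`paste` stage turns each name into an English-language string literal and joins them all onto one tab-separated line, which `xargs -0` then hands to find-candidates.js as a single argument (presumably to be spliced into the SPARQL query; the script itself isn't shown here). This sketch runs only that stage on two made-up names so the intermediate string is visible. The helper name `to_sparql_literals` is hypothetical, and names containing double quotes would need escaping that this does not do.

```shell
# Wrap each input line as "..."@en, then join all lines with tabs
# (tab is paste's default delimiter).
to_sparql_literals() {
  sed -e 's/^/"/' -e 's/$/"@en/' | paste -s -
}

# Hypothetical names
printf '%s\n' 'Jane Example' 'John Sample' | to_sparql_literals
# prints: "Jane Example"@en<TAB>"John Sample"@en
```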

Step 5: Combine those

xsv join -n --left 2 wikipedia.csv 1 candidates.csv | xsv select '10,1-8' | sed '1s/^/foundid/' | tee combo.csv
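
The join matches column 2 of wikipedia.csv (the name) against column 1 of candidates.csv, and `--left` keeps every wikipedia.csv row even when no candidate was found, leaving the candidate fields empty. Because `-n` treats the header as data, the header row has no match either, so its candidate fields come out empty too. A rough awk stand-in for the join-and-select part is sketched below. The helper name `left_join_found` and the sample rows are hypothetical; it assumes unquoted CSV with no embedded commas, and keeps only the last candidate per name, whereas xsv would emit one row per match.

```shell
# Rough awk stand-in for:
#   xsv join -n --left 2 wikipedia.csv 1 candidates.csv | xsv select '10,1-8'
# Usage: left_join_found candidates.csv wikipedia.csv
left_join_found() {
  awk -F, -v OFS=, '
    FILENAME == ARGV[1] { found[$1] = $2; next }   # candidates: name -> found item
    { print found[$2], $0 }                        # prepend found item (empty if none)
  ' "$1" "$2"
}

# Hypothetical sample data
cand=$(mktemp); wiki=$(mktemp)
printf '%s\n' 'Jane Example,Q111' > "$cand"
printf '%s\n' 'id,name,party' ',Jane Example,Q999' ',John Sample,Q999' > "$wiki"
left_join_found "$cand" "$wiki"
```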

Step 6: Generate QuickStatements commands

bundle exec ruby generate-qs.rb config.json | tee commands.qs
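
generate-qs.rb isn't shown here, so this is only a sketch of the kind of line QuickStatements accepts in its tab-separated V1 syntax: item, property, value, then qualifier property/value pairs, then source pairs written with an `S` prefix. It assumes the candidacy-in-election model (P3602 "candidacy in election", qualified with P768 "electoral district" and P1268 "represents" for the party) and cites English Wikipedia (Q328) via S143 "imported from Wikimedia project". The real script may model things differently, `qs_candidacy` is a hypothetical name, and the IDs in the usage line are placeholders.

```shell
# ASSUMPTION: candidacy modelled as P3602 with P768 (district) and
# P1268 (party) qualifiers; generate-qs.rb may emit different properties.
qs_candidacy() {
  person=$1 election=$2 district=$3 party=$4
  # V1 syntax: item, property, value, qualifier pairs, S-prefixed source pair
  printf '%s\tP3602\t%s\tP768\t%s\tP1268\t%s\tS143\tQ328\n' \
    "$person" "$election" "$district" "$party"
}

# Placeholder IDs for illustration only
qs_candidacy Q111 Q222 Q333 Q444
```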

These were then sent to QuickStatements as edit group https://tools.wmflabs.org/editgroups/b/QSv2T/1598002289195
