This repository has been archived by the owner on Jun 9, 2022. It is now read-only.

uniformize older scrapers #18

Open
alexn11 opened this issue Dec 21, 2020 · 1 comment
Assignees
Labels
maybe Possible improvement

Comments

@alexn11
Collaborator

alexn11 commented Dec 21, 2020

Some of the scrapers output different columns (the-bfd.py, cato-institute.py, co2-coalition.py) or are missing the source column (bbc-non-climate, breibart-defense, the-onion-politics). If these are to be used again, they should be changed so that their output matches the other scrapers, and the scripts in the normalizer directory, which exist only to correct these differences, should then be removed.
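To illustrate what a uniform output could look like, here is a minimal sketch of mapping one scraped row onto a fixed set of columns. The column names in `TARGET_COLUMNS`, the `normalize_row` helper, and the sample row are all assumptions made for illustration; the issue does not state the actual schema.

```python
# Hypothetical target schema; the real column names are not stated in this issue.
TARGET_COLUMNS = ["url", "title", "content", "date", "source"]


def normalize_row(row, source, renames=None):
    """Return a dict with exactly TARGET_COLUMNS, filling gaps with ''."""
    renames = renames or {}
    # Map scraper-specific column names onto the target names.
    renamed = {renames.get(key, key): value for key, value in row.items()}
    # Add the source column when the scraper did not emit one.
    renamed.setdefault("source", source)
    # Drop unknown columns and fill missing ones with an empty string.
    return {column: renamed.get(column, "") for column in TARGET_COLUMNS}


# A row as a scraper without a source column might produce it (made-up data).
row = {"link": "https://example.com/a", "headline": "Example", "text": "Body"}
normalized = normalize_row(
    row,
    source="example-site",
    renames={"link": "url", "headline": "title", "text": "content"},
)
```

If each scraper applied something like this before writing its output, the separate correction step would no longer be needed.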

@alexn11 alexn11 self-assigned this Dec 21, 2020
@alexn11 alexn11 added the maybe Possible improvement label Dec 21, 2020
@ricjhill
Collaborator

This scraping script works on most data sources, and its output is standardized.

Just build the Docker image and have a look. The differences between the data sources can be fixed in the filter function for each data source and by custom extractions from the HTML.

https://github.com/ClimateMisinformation/Scrapers/tree/create-container-climatediscussionnexus.com/infrastructure/docker/climatediscussionnexus-scrape
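As a rough sketch of the per-data-source filter idea, one option is a small registry that maps a host name to its own filter function, with a permissive default for sources that need no special handling. The function names, the `FILTERS` registry, and the `/tag/` rule are hypothetical; the filter function in the linked script and its signature are not shown in this thread.

```python
from urllib.parse import urlparse


# Hypothetical per-source URL filters, for illustration only.
def keep_all(url):
    """Default filter: accept every URL."""
    return True


def skip_tag_pages(url):
    """Example rule: drop index pages such as /tag/... that are not articles."""
    return not urlparse(url).path.startswith("/tag/")


# Registry keyed by host name; sources without an entry use keep_all.
FILTERS = {"example-site.org": skip_tag_pages}


def filter_urls(urls):
    """Keep only URLs accepted by the filter registered for their host."""
    kept = []
    for url in urls:
        host = urlparse(url).netloc
        if FILTERS.get(host, keep_all)(url):
            kept.append(url)
    return kept


urls = [
    "https://example-site.org/tag/climate",
    "https://example-site.org/2020/12/article",
    "https://other.example/tag/kept",
]
kept = filter_urls(urls)
```

With this layout, source-specific quirks stay in one small function per source while the rest of the pipeline, including the output format, is shared.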


2 participants