Tools to Explore Price Transparency Data

This repository is part of a personal project. I'm exploring the price transparency data posted by large insurance companies.

Insurance companies must post transparency data on their websites, but they don't make it easy to access and retrieve the data in an aggregate form.

collectRawDataURLs.py is a script that retrieves the URLs where the data lives. It collects thousands of URLs that point to .json.gz files.
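
As a rough illustration of the idea (not the code in collectRawDataURLs.py), here is a minimal sketch that walks an already-parsed JSON index and keeps any string that looks like a link to a .json.gz file. The function name `collect_gz_urls`, the `sample_index` document, and its keys are all hypothetical and made up for this example. Real insurer index pages may be laid out quite differently.

```python
from urllib.parse import urlparse


def collect_gz_urls(node, found=None):
    """Recursively collect string values that look like URLs to .json.gz files."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for value in node.values():
            collect_gz_urls(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_gz_urls(item, found)
    elif isinstance(node, str):
        parsed = urlparse(node)
        # Check the path only, so URLs carrying a query string still match.
        if parsed.scheme in ("http", "https") and parsed.path.endswith(".json.gz"):
            found.append(node)
    return found


# Hypothetical index document, invented for illustration only.
sample_index = {
    "files": [
        {"location": "https://example.com/data/rates_1.json.gz?sig=abc"},
        {"location": "https://example.com/data/readme.txt"},
    ],
    "extra": "https://example.com/data/rates_2.json.gz",
}
urls = collect_gz_urls(sample_index)
print(urls)
```

Walking the whole document instead of reading fixed keys means the sketch makes no assumption about where an insurer puts its links, at the cost of possibly picking up links that are not data files.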

parseRecords.py is a tool that decompresses and parses the large files themselves.
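
To show what "decompress and parse" can look like with only the Python standard library, here is a minimal sketch. It is an assumption about the general approach, not a copy of parseRecords.py. The helper name `load_json_gz` and the sample record are hypothetical. Note that `json.load` reads the whole document into memory, so for very large files a streaming parser would probably be needed instead.

```python
import gzip
import json
import os
import tempfile


def load_json_gz(path):
    """Decompress a .json.gz file and parse its contents as JSON."""
    # "rt" opens the gzip stream in text mode so json.load can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Round-trip a tiny made-up record through a temporary .json.gz file.
sample = {"example_key": [1, 2, 3]}
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.json.gz")
    with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
        json.dump(sample, handle)
    loaded = load_json_gz(sample_path)
print(loaded)
```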

Each file is built around a schema recommended, but not required, by CMS. Here's a diagram of part of it, generated by JSON Crack.

Figure 1 - Part of the extremely complex schema defined by CMS for JSON files containing price transparency data.
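
When a diagram is not at hand, one way to get a feel for a deeply nested file is to print its keys and value types down to a fixed depth. The sketch below does that. The function name `summarize_shape` and the field names in `sample` are invented for illustration and are not taken from the CMS schema. It also assumes that every item in a list has the same shape as the first one.

```python
def summarize_shape(node, depth=0, max_depth=3):
    """Return lines describing the keys and value types of nested JSON."""
    indent = "  " * depth
    lines = []
    if depth >= max_depth:
        return lines
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            lines.extend(summarize_shape(value, depth + 1, max_depth))
    elif isinstance(node, list) and node:
        # Inspect only the first element, assuming list items share a shape.
        lines.extend(summarize_shape(node[0], depth, max_depth))
    return lines


# Made-up document; these field names are placeholders, not CMS field names.
sample = {
    "entity": "Example",
    "items": [{"code": "A1", "prices": [{"amount": 10.5}]}],
}
shape_lines = summarize_shape(sample)
print("\n".join(shape_lines))
```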
