This repository contains scripts for storing and querying Office for National Statistics (ONS) geographies in the GeoParquet file format and importing them into a DuckDB database.
DuckDB is a fast in-process database system. It is designed for analytical workloads and can be used as a library in other applications. We are compiling a number of scripts to import ONS geographies into DuckDB, where they will then be available to embed in applications.
The end goal is to quickly generate a DuckDB database file (or multiple files) that can be used to query ONS geographies.
- DuckDB CLI - installed via brew
- Planet Labs' gpq - installed via brew
- GDAL - installed via brew
brew install duckdb
brew install planetlabs/tap/gpq
brew install gdal
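Before running the scripts it can help to confirm that the tools are actually on the PATH. The following is a minimal sketch, not part of this repository: missing_tools is a hypothetical helper, and ogr2ogr is assumed here as a representative GDAL command (the scripts may call a different one).

```shell
#!/bin/sh
# Print the name of each listed command that cannot be found on PATH.
# Prints nothing when every command is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example (ogr2ogr is an assumed stand-in for GDAL):
#   missing_tools duckdb gpq ogr2ogr
```

Any name printed by the example call is a tool that still needs installing.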
Census boundaries and the ONS postcode directory are downloaded from the ONS Geoportal.
./download.sh
Once the downloads have finished, the data is processed to create a number of GeoParquet files.
./process.sh
These are pregenerated as part of this repository, and can be found in the data directory.
- lsoas.parquet - Lower Super Output Areas
- msoas.parquet - Middle Super Output Areas
- ons_postcode_directory.parquet - A selection of columns from the ONS Postcode Directory
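A quick way to confirm that processing produced everything is to check for the three files by name. This is a sketch only: missing_outputs is a hypothetical helper, and it assumes the files sit directly inside the data directory.

```shell
#!/bin/sh
# Print the path of each expected GeoParquet file that is missing from the
# given directory (default: data). Prints nothing when all are present.
missing_outputs() {
    dir=${1:-data}
    for name in lsoas.parquet msoas.parquet ons_postcode_directory.parquet; do
        [ -f "$dir/$name" ] || printf '%s\n' "$dir/$name"
    done
}

# Example:
#   missing_outputs data
```

This only checks that the files exist. To check that a file is valid GeoParquet, gpq provides a validate command.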
The geoparquet files can be imported into DuckDB using a shell script.
./createpostcodesdb.sh
This creates a file named ons_postcodes.duckdb which can be used to query the data.
When this repository is released, the DuckDB database file will be attached to the release page as a build artifact, so there is no need to run the above commands if you simply want the database file.
See the releases page for the latest release.
With duckdb installed the database can be launched:
duckdb ons_postcodes.duckdb
Each time the database is opened, the following command should be run to enable the geospatial functions:
LOAD spatial;
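For scripted, non-interactive use, the LOAD statement can be prepended to each query and passed to the CLI with its -c option. The sketch below assumes the duckdb CLI is on the PATH and ons_postcodes.duckdb is in the current directory. with_spatial and ons_query are hypothetical helper names.

```shell
#!/bin/sh
# Prefix a SQL string with the statement that loads the spatial extension.
with_spatial() {
    printf 'LOAD spatial; %s\n' "$1"
}

# Run one query against the database file, print the result, and exit.
# Assumes duckdb is on PATH and the database file is in the current directory.
ons_query() {
    duckdb ons_postcodes.duckdb -c "$(with_spatial "$1")"
}

# Example:
#   ons_query "SELECT count(*) FROM postcodes;"
```

Keeping the prefix in one small function means the extension is loaded for every query, without having to repeat the statement in each one.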
The database can be queried using SQL.
Find the postcode for a given point:
SELECT postcode, date_of_termination, county_code, county_electoral_division_code,
       local_authority_district_code, ward_code, easting, northing, country_code,
       region_code, westminster_parliamentary_constituency_code,
       output_area_11_code, lower_super_output_area_11_code,
       middle_super_output_area_11_code, built_up_area_24_code,
       rural_urban_11_code, index_multiple_deprivation_rank,
       output_area_21_code, lower_super_output_area_21_code,
       middle_super_output_area_21_code, longitude, latitude
FROM (
    SELECT
        ST_Distance(ST_Point(-2.250, 51.346), geometry) AS distance,
        *
    FROM postcodes
    WHERE ST_Within(geometry, ST_Buffer(ST_Point(-2.250, 51.346), 0.01))
      AND date_of_termination IS NULL
    ORDER BY distance ASC
    LIMIT 1
);
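To run the same lookup for arbitrary coordinates, the query can be generated from a longitude and latitude. This is a sketch under assumptions: nearest_postcode_sql and is_number are hypothetical helpers, the column list is shortened to postcode and distance, and the 0.01 search radius is copied from the query above (its units are those of the geometry column, which appear to be degrees).

```shell
#!/bin/sh
# Succeed only if the argument is non-empty and contains nothing but
# digits, dots and minus signs. This keeps arbitrary SQL out of the query.
is_number() {
    case $1 in
        ''|*[!0-9.-]*) return 1 ;;
    esac
}

# Print a nearest-live-postcode query for a longitude ($1) and latitude ($2).
# Returns 1 and prints nothing if either argument fails is_number.
nearest_postcode_sql() {
    is_number "$1" && is_number "$2" || return 1
    cat <<EOF
LOAD spatial;
SELECT postcode,
       ST_Distance(ST_Point($1, $2), geometry) AS distance
FROM postcodes
WHERE ST_Within(geometry, ST_Buffer(ST_Point($1, $2), 0.01))
  AND date_of_termination IS NULL
ORDER BY distance ASC
LIMIT 1;
EOF
}

# Example:
#   nearest_postcode_sql -2.250 51.346 | duckdb ons_postcodes.duckdb
```

If no live postcode falls inside the buffer the query returns no rows, so callers may want to retry with a larger radius.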
Look up the record for a given postcode (spaces are removed before comparing, so the search value is written without a space):
SELECT postcode, date_of_termination, county_code, county_electoral_division_code,
       local_authority_district_code, ward_code, easting, northing, country_code,
       region_code, westminster_parliamentary_constituency_code,
       output_area_11_code, lower_super_output_area_11_code,
       middle_super_output_area_11_code, built_up_area_24_code,
       rural_urban_11_code, index_multiple_deprivation_rank,
       output_area_21_code, lower_super_output_area_21_code,
       middle_super_output_area_21_code, longitude, latitude
FROM postcodes
WHERE replace(postcode, ' ', '') = 'BA151DS';
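Because the comparison strips spaces from the stored postcode, user input needs the same treatment before it is placed in the query. The sketch below assumes stored postcodes are upper case, as in the example value. normalise_postcode and postcode_sql are hypothetical helpers, and the column list is shortened for brevity.

```shell
#!/bin/sh
# Drop everything except letters and digits, then upper-case the result,
# so 'ba15 1ds' and 'BA15 1DS' both become 'BA151DS'.
normalise_postcode() {
    printf '%s' "$1" | tr -cd '[:alnum:]' | tr '[:lower:]' '[:upper:]'
}

# Print the lookup query for a postcode given in any spacing or case.
postcode_sql() {
    printf "SELECT postcode, longitude, latitude FROM postcodes WHERE replace(postcode, ' ', '') = '%s';\n" \
        "$(normalise_postcode "$1")"
}

# Example:
#   duckdb ons_postcodes.duckdb -c "$(postcode_sql 'ba15 1ds')"
```

Removing non-alphanumeric characters also keeps quotes and semicolons in user input out of the generated SQL.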