TP2000-1493 - TAP driven quota open data export #1302

Draft
wants to merge 40 commits into
base: master
9f1eee5
added task for quotas open data export - WIP
dougmills-DIT Sep 20, 2024
0865312
added task for quotas open data export - WIP
dougmills-DIT Sep 25, 2024
fd2010d
added config for S3
dougmills-DIT Sep 26, 2024
26eac10
added unit tests for storages
dougmills-DIT Sep 27, 2024
cf2cfdb
added unit tests for storages and quotas exporter
dougmills-DIT Oct 4, 2024
eb334b8
added task for quotas open data export - WIP
dougmills-DIT Sep 20, 2024
cc3d7f2
added task for quotas open data export - WIP
dougmills-DIT Sep 25, 2024
46db08a
added config for S3
dougmills-DIT Sep 26, 2024
a1edf32
added unit tests for storages
dougmills-DIT Sep 27, 2024
80f5e87
added unit tests for storages and quotas exporter
dougmills-DIT Oct 4, 2024
9697913
Merge remote-tracking branch 'origin/TP2000-1493_quota_open_data_expo…
dougmills-DIT Oct 4, 2024
33723ac
added unit tests for storages and quotas exporter
dougmills-DIT Oct 4, 2024
ea8bc08
added descriptions
dougmills-DIT Oct 7, 2024
5abd5e3
added descriptions
dougmills-DIT Oct 7, 2024
08ad865
added descriptions
dougmills-DIT Oct 24, 2024
8518de0
fix tests
dougmills-DIT Nov 13, 2024
5e44da3
fix tests
dougmills-DIT Nov 13, 2024
e308050
fix tests
dougmills-DIT Nov 13, 2024
38ce45a
added task for quotas open data export - WIP
dougmills-DIT Sep 20, 2024
8ca560d
added task for quotas open data export - WIP
dougmills-DIT Sep 25, 2024
d0f5b43
added config for S3
dougmills-DIT Sep 26, 2024
d4a9ab3
added unit tests for storages
dougmills-DIT Sep 27, 2024
20e5af4
added unit tests for storages and quotas exporter
dougmills-DIT Oct 4, 2024
eed621d
added task for quotas open data export - WIP
dougmills-DIT Sep 20, 2024
56c6cae
added task for quotas open data export - WIP
dougmills-DIT Sep 25, 2024
a945d7f
added config for S3
dougmills-DIT Sep 26, 2024
1c2bcde
added unit tests for storages
dougmills-DIT Sep 27, 2024
2871e1e
added unit tests for storages and quotas exporter
dougmills-DIT Oct 4, 2024
15b8a0c
added unit tests for storages and quotas exporter
dougmills-DIT Oct 4, 2024
d04f938
added descriptions
dougmills-DIT Oct 7, 2024
20bc8ca
added descriptions
dougmills-DIT Oct 7, 2024
ef83567
added descriptions
dougmills-DIT Oct 24, 2024
ac2f71d
fix tests
dougmills-DIT Nov 13, 2024
ab9dfd9
fix tests
dougmills-DIT Nov 13, 2024
b279cba
fix tests
dougmills-DIT Nov 13, 2024
6e681ec
test updates
dougmills-DIT Dec 31, 2024
198eb72
Merge remote-tracking branch 'origin/TP2000-1493_quota_open_data_expo…
dougmills-DIT Dec 31, 2024
d7cdfef
lint updates
dougmills-DIT Dec 31, 2024
41e399f
Merge branch 'master' into TP2000-1493_quota_open_data_export
dougmills-DIT Jan 8, 2025
41310fe
minor change to exporter query
dougmills-DIT Jan 14, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -167,3 +167,4 @@ _dumped_cache.pkl
# Database dumps
*.sql
/.vscode/settings.json
/quotas_export/**
50 changes: 50 additions & 0 deletions exporter/management/commands/export_quotas.py
@@ -0,0 +1,50 @@
import logging
from typing import Any
from typing import Optional

from django.core.management import BaseCommand
from django.core.management.base import CommandParser

from exporter.quotas.tasks import export_and_upload_quotas_csv

logger = logging.getLogger(__name__)


class Command(BaseCommand):
    help = (
        "Create a CSV of quotas for use within data workspace to produce the "
        "HMRC tariff open data CSV. The filename takes the form "
        "quotas_export_<yyyymmdd>.csv. Care should be taken to ensure that "
        "there is sufficient local file system storage to accommodate the "
        "CSV file, although it should not be very large: less than 5MB "
        "(1.8MB at time of creation). If you choose to target remote S3 "
        "storage, a temporary local copy of the file will be created "
        "and cleaned up."
    )

    def add_arguments(self, parser: CommandParser) -> None:
        parser.add_argument(
            "--asynchronous",
            action="store_const",
            help="Queue the CSV export task to run in an asynchronous process.",
            const=True,
            default=False,
        )
        parser.add_argument(
            "--save-local",
            help=(
                "Save the quotas CSV to the local file system under the "
                "(existing) directory given by DIRECTORY_PATH."
            ),
            dest="DIRECTORY_PATH",
        )
        return super().add_arguments(parser)

    def handle(self, *args: Any, **options: Any) -> Optional[str]:
        logger.info("Triggering quotas export to CSV")

        # None when --save-local is not given.
        local_path = options["DIRECTORY_PATH"]
        if options["asynchronous"]:
            # Queue the export as an asynchronous task.
            export_and_upload_quotas_csv.delay(local_path)
        else:
            # Run the export inline in this process.
            export_and_upload_quotas_csv(local_path)
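
Reviewer note: the way the two flags above turn into `options` can be checked without Django, since Django's `CommandParser` subclasses `argparse.ArgumentParser`. The following is a minimal sketch using plain `argparse`; `build_parser` and `describe` are hypothetical helper names for illustration only and are not part of this PR.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the two arguments declared in Command.add_arguments.
    parser = argparse.ArgumentParser(prog="export_quotas")
    parser.add_argument(
        "--asynchronous",
        action="store_const",
        const=True,
        default=False,
    )
    # dest= makes the value available under the key "DIRECTORY_PATH".
    parser.add_argument("--save-local", dest="DIRECTORY_PATH")
    return parser


def describe(argv: List[str]) -> Tuple[str, Optional[str]]:
    # Hypothetical helper: reports which branch handle() would take
    # and which local path it would pass on.
    options = vars(build_parser().parse_args(argv))
    mode = "queued" if options["asynchronous"] else "inline"
    return mode, options["DIRECTORY_PATH"]


if __name__ == "__main__":
    print(describe([]))
    print(describe(["--asynchronous", "--save-local", "/tmp/quotas"]))
```

With no flags, `DIRECTORY_PATH` is `None`, so the task receives `None` as its local path. What the task does in that case (presumably the S3 upload described in the help text) is not visible in this diff.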
43 changes: 43 additions & 0 deletions exporter/quotas/__init__.py
@@ -0,0 +1,43 @@
"""
Quotas Export
=============

The quotas export system queries the TAP database for published quota data and stores it in a CSV
file.

The general process is:

1. Query the TAP database for the correct dataset to export.
2. Iterate over the query result and create the data for the output.
3. Write the data to the CSV file.
4. Upload the result to the designated storage (S3 or local).


This process has been chosen to optimise for:

- Speed: querying and data production are much faster when processed at source.
- Testability: the output and the process can be tested effectively within TAP.
- Adaptability: test coverage highlights any issues caused by database changes etc., so this
  implementation is easy to adapt.
- Data quality: using TAP to produce the data improves the quality of the output, as it uses the same filters
  and joins as TAP itself does. This removes the need to run queries in SQL, which has been problematic and is
  difficult to maintain.
"""

import os
import shutil
from itertools import chain
from pathlib import Path
from tempfile import NamedTemporaryFile

import apsw
from django.apps import apps
from django.conf import settings

from exporter.quotas import runner
from exporter.quotas import tasks


def make_export(quotas_csv_named_temp_file: NamedTemporaryFile):
    quota_csv_exporter = runner.QuotaExport(quotas_csv_named_temp_file)
    quota_csv_exporter.run()
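
Reviewer note: the four-step process in the module docstring (query, iterate, write CSV, upload) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's `runner.QuotaExport`, which is not shown in this diff. The column names, the sample rows and every function name below are hypothetical, and "upload" is modelled as a copy into a local directory. Only the `quotas_export_<yyyymmdd>.csv` filename pattern comes from the command's help text.

```python
import csv
import shutil
from datetime import date
from pathlib import Path
from tempfile import NamedTemporaryFile, TemporaryDirectory
from typing import Dict, Iterable, Iterator, List

# Hypothetical columns and rows standing in for the TAP query result.
HEADERS = ["order_number", "description"]
FAKE_QUOTAS = [
    {"order_number": "050001", "description": "Example quota A"},
    {"order_number": "050002", "description": "Example quota B"},
]


def query_quotas() -> Iterable[Dict[str, str]]:
    # Step 1: fetch the dataset (a real exporter would query the database).
    return FAKE_QUOTAS


def build_rows(quotas: Iterable[Dict[str, str]]) -> Iterator[List[str]]:
    # Step 2: turn each record into one CSV row.
    for quota in quotas:
        yield [quota[column] for column in HEADERS]


def export_filename(today: date) -> str:
    # Filename pattern taken from the command help text.
    return f"quotas_export_{today:%Y%m%d}.csv"


def export_quotas(destination_dir: Path, today: date) -> Path:
    # Step 3: write the rows to a temporary local file.
    with NamedTemporaryFile("w", newline="", suffix=".csv", delete=False) as tmp:
        writer = csv.writer(tmp)
        writer.writerow(HEADERS)
        writer.writerows(build_rows(query_quotas()))
    tmp_path = Path(tmp.name)
    target = destination_dir / export_filename(today)
    try:
        # Step 4: "upload" by copying to the destination directory.
        shutil.copy(tmp_path, target)
    finally:
        # Clean up the temporary copy whether or not the copy succeeded.
        tmp_path.unlink()
    return target


if __name__ == "__main__":
    with TemporaryDirectory() as directory:
        print(export_quotas(Path(directory), date.today()).name)
```

Writing to a temporary file first and cleaning it up in a `finally` block matches the behaviour the help text describes for the S3 case, where a temporary local copy is created and then removed.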