Fixed holiday date import. Added date samples.
Tim Howgego committed Jan 31, 2021
1 parent 47ee509 commit d1cd0bd
Showing 8 changed files with 43 additions and 16 deletions.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
include samples/*
18 changes: 11 additions & 7 deletions README.md
@@ -2,7 +2,7 @@

[![Latest Version](https://img.shields.io/pypi/v/atcociftogtfs.svg)](https://pypi.org/project/atcociftogtfs/) [![Test Status](https://github.com/timhowgego/atcociftogtfs/workflows/test_atcociftogtfs/badge.svg)](https://github.com/timhowgego/atcociftogtfs/actions?query=workflow%3Atest_atcociftogtfs)

Converts ATCO.CIF (ATCO-CIF) public transport schedule files to [static GTFS format](https://gtfs.org/reference/static). ATCO (Association of Transport Coordinating Officers) CIF (Common Interface File) was the United Kingdom standard for bus schedule data transfer for the first decade of the 2000s, but has since been largely replaced by [TransXchange](https://www.gov.uk/government/collections/transxchange). ATCO-CIF differs from [the CIF format used by UK railways](https://wiki.openraildata.com/index.php/CIF_File_Format).
Converts ATCO.CIF (ATCO-CIF) public transport schedule files to [static GTFS format](https://gtfs.org/reference/static). ATCO (Association of Transport Coordinating Officers) CIF (Common Interface File) was the United Kingdom standard for bus timetable data transfer for the first decade of the 2000s, but has since been largely replaced by [TransXchange](https://www.gov.uk/government/collections/transxchange). ATCO-CIF differs from [the CIF format used by UK railways](https://wiki.openraildata.com/index.php/CIF_File_Format).

The converter supports ATCO-CIF version 5 (the only version ever deployed) but the current implementation focuses only on the core schedule/stop information that characterises most networks: There is no support for interchange (transfers), clustering (stop parents), journey associations (blocks), or most AIM data extensions (including hail-and-ride). By default, bank (public) holiday variations are ignored, and all dates are assumed to be in school term-time - but both assumptions can be overridden if the user provides bespoke lists of dates (via command line arguments `-b` and `-s`). Stop grid coordinate conversion is included, but the (EPSG) grid must be defined (via command line argument `-e`).

@@ -22,17 +22,16 @@ followed by one or more space-separated ATCO.CIF data sources (ATCO.CIF file, di

If you do not understand the data you are importing, initially add two switches: `-u` (which protects against common _gotchas_, such as one bus operator with two identically numbered routes in different places) and `-v` (which gives feedback on processing and data).

To output comprehensive GTFS information you will need to specify `-b` (with a file listing bank holidays), `-e` (`29903` in Ireland, `27700` in Great Britain), and `-s` (with a file listing school term time periods) - all detailed below.
To output comprehensive GTFS information you will need to specify `-b` (with a file listing bank holidays), `-e` (`29903` in Ireland, `27700` in Great Britain), and `-s` (with a file listing school term time periods) - all detailed under Command Line Usage below.

## Usage
Example files containing Northern Ireland [bank holiday](https://www.nidirect.gov.uk/articles/bank-holidays) and [school term](https://www.education-ni.gov.uk/articles/school-holidays) dates can be found in the `samples` subdirectory. These files are provided as examples only, and may no longer be accurate.

Command prompt usage:
## Command Line Usage

python -m atcociftogtfs [optional arguments] source [source ...]

where `source` is one or more ATCO.CIF data sources: directory, cif, url, zip (mixed sources, or sources containing a mixture, are fine). Possible optional arguments:

* `-h`, `--help`: Show help.
* `-b [BANK_HOLIDAYS]`, `--bank_holidays [BANK_HOLIDAYS]`: Filename (directory optional) for text file containing `yyyymmdd` bank (public) holidays, one per line. Optional, defaults to treating all days as non-holiday.
* `-d`, `--directional_routes`: Uniquely identify inbound and outbound directions as different routes. Optional, defaults to combining inbound and outbound into the same route.
* `-e [EPSG]`, `--epsg [EPSG]`: EPSG Geodetic Parameter Dataset code. For Ireland, `29903`. For Great Britain, `27700`. Optional, but GTFS stop lat and lon will be 0 if argument is omitted.
@@ -43,11 +42,12 @@ where `source` is one or more ATCO.CIF data sources: directory, cif, url, zip (m
* `-m [MODE]`, `--mode [MODE]`: GTFS mode integer code. Optional, defaults to `3` (bus).
* `-u`, `--unique_ids`: Force IDs for operators, routes and stops to be unique to each ATCO-CIF file processed within a multi-file batch. Safely reconciles files from different sources, but creates data redundancies within the resulting GTFS file. Optional, defaults to the identifiers used in the original ATCO-CIF files.
* `-v`, `--verbose`: Verbose feedback of all progress to log or console. Optional, defaults to warnings and errors only.
* `-V`, `--version`: Prints atcociftogtfs version and exits.
* `-s [SCHOOL_TERM]`, `--school_term [SCHOOL_TERM]`: Filename (directory optional) for text file containing `yyyymmdd,yyyymmdd` (startdate,enddate) school term periods, one comma-separated pair of dates per line. Optional, defaults to treating all periods as school term-time.
* `-t [TIMEZONE]`, `--timezone [TIMEZONE]`: Timezone in IANA TZ format. Optional, defaults to `Europe/London`.

## Module
Used alone, `-h` or `--help` shows help, while `-V` or `--version` shows the version.
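
As an illustration of the two date file formats described for `-b` and `-s`, the following is a minimal sketch of how such files could be read. It is not the converter's own code: the function names are hypothetical, and treating both ends of a term period as inclusive is an assumption.

```python
from datetime import date, datetime


def parse_yyyymmdd(text):
    """Convert a yyyymmdd string such as '20210317' into a date."""
    return datetime.strptime(text.strip(), "%Y%m%d").date()


def read_bank_holidays(lines):
    """Return the set of dates listed one per line, skipping blank lines."""
    return {parse_yyyymmdd(line) for line in lines if line.strip()}


def read_school_terms(lines):
    """Return (start, end) date pairs from 'yyyymmdd,yyyymmdd' lines."""
    terms = []
    for line in lines:
        if not line.strip():
            continue
        start, end = line.split(",")
        terms.append((parse_yyyymmdd(start), parse_yyyymmdd(end)))
    return terms


def in_term(day, terms):
    """True if day falls within any term period (assumed inclusive)."""
    return any(start <= day <= end for start, end in terms)


# Lines taken from the sample files added in this commit.
holidays = read_bank_holidays(["20210101", "20210317"])
terms = read_school_terms(["20210102,20210217", "20210220,20210316"])

is_holiday = date(2021, 3, 17) in holidays
is_term_day = in_term(date(2021, 2, 18), terms)
```

Any iterable of lines works here, so an open file object can be passed directly in place of the lists.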

## Module Usage

The converter can also be integrated into any Python script as a module, for example:

@@ -62,6 +62,10 @@ Such an instance can be initialised with an `args` Namespace, in which values ar

The instance's internal Sqlite database can be queried directly using a cursor created as `my_instance.db.cursor()`. The structure of this database mimics that of the GTFS output, except table names are filenames stripped of their `.txt` (detailed by `_gtfs_structure` in `atcocif.py`).

## Northern Ireland Railways

At the time of writing, [Northern Ireland Railways timetable open data](https://www.opendatani.gov.uk/dataset/nir20160126v2) is officially labelled ATCO.CIF, but is not: The feed is a railway CIF - a lightweight version of the format used by the Rail Delivery Group (and previously _ATOC_) in Great Britain. NIR's `.CIF` file is equivalent to RDG's `.MCA` file. Instead of using this converter, use software intended for ATOC/RDG feeds, but spoof most of the other expected filenames as empty text files. A valid Master Station Name File (`.MSN`) is important - [a basic version is available here](https://gist.github.com/timhowgego/abf52c70edfabc3601f1d09dfe1fc4db). Note that any station opened since 2021 will need to be added manually. Since Ireland uses a different grid system, coordinates cannot be processed as if in Great Britain, so the coordinates in that dummy file are all zeros. GTFS creators can provide stop geography by adding [this stops.txt file](https://gist.github.com/timhowgego/90dd8a7c276f49e4217445701c5fb3f1) to any NIR GTFS file produced. That GTFS's `agency.txt` will likely also need to be hacked to add a complete "NI" record.

## Bugs and Contributions

Error reports and code improvements/extensions [are welcome](https://github.com/timhowgego/atcociftogtfs/issues). The current code should be functional, but is far from optimal. Please attach a copy of the relevant ATCO.CIF source file to reports about unexpected errors.
2 changes: 1 addition & 1 deletion atcociftogtfs/__init__.py
@@ -1 +1 @@
__version__ = "2021.1.26"
__version__ = "2021.1.31"
5 changes: 4 additions & 1 deletion atcociftogtfs/atcocif.py
@@ -163,7 +163,10 @@ def arguments(self, args=None):
for key, value in arguments.items():
if key in self._arg_vars:

if key in ["bh", "stt"] and value is not None:
if (
key in ["bank_holidays", "school_term"]
and value is not None
):
setattr(
atcocif,
key,
Expand Down
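
The key change above is consistent with how `argparse` names its results: by default each value is stored under the long option name, not the short flag. The sketch below assumes the arguments reach this method as an `argparse` Namespace and that `-b` and `-s` are declared with `nargs="?"` (suggested by the README's `[BANK_HOLIDAYS]` notation, but not shown in this diff). Under those assumptions, a test against `"bh"` and `"stt"` would never match.

```python
import argparse

# Hypothetical stand-in for the loader's parser; only the two date options.
parser = argparse.ArgumentParser()
parser.add_argument("-b", "--bank_holidays", nargs="?")
parser.add_argument("-s", "--school_term", nargs="?")

args = parser.parse_args(["-b", "holidays.csv"])
arguments = vars(args)  # Namespace as a plain dict, keyed by long option name

# Keys matched by the corrected condition.
new_matches = [
    key
    for key, value in arguments.items()
    if key in ["bank_holidays", "school_term"] and value is not None
]

# Keys matched by the previous condition.
old_matches = [
    key
    for key, value in arguments.items()
    if key in ["bh", "stt"] and value is not None
]
```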
14 changes: 7 additions & 7 deletions atcociftogtfs/loader.py
@@ -89,6 +89,12 @@ def arguments():
)
parser.version = __version__

parser.add_argument(
"-V",
"--version",
action="version",
help="""Show version.""",
)
parser.add_argument(
"source",
nargs="+",
@@ -188,12 +194,6 @@ def arguments():
help="""Verbose feedback of all progress to log or console. Optional,
defaults to warnings and errors only.""",
)
parser.add_argument(
"-V",
"--version",
action="version",
help="""Prints atcociftogtfs version and exits.""",
)
parser.add_argument(
"-s",
"--school_term",
@@ -250,7 +250,7 @@ def walk(source, processor):
else:
status = processor.file(filename=source)
if status == 0:
logging.info("Completed %s", os.path.basename(source))
logging.info("Processed %s", os.path.basename(source))

else:

Expand Down
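
The `-V` definition in this file passes no `version` string to `add_argument`, so it relies on `argparse` falling back to the `parser.version` attribute set just before it. A minimal standalone sketch of that behaviour follows; the `prog` name is an assumption, and the version value is the one set in this commit's `__init__.py`.

```python
import argparse

__version__ = "2021.1.31"  # value from this commit's __init__.py

parser = argparse.ArgumentParser(prog="atcociftogtfs")  # prog is assumed
parser.version = __version__  # read by the version action when it has no text
parser.add_argument("-V", "--version", action="version", help="Show version.")
parser.add_argument("source", nargs="+")

# The version action prints and exits as soon as it is parsed,
# so the required positional "source" is never checked.
try:
    parser.parse_args(["-V"])
except SystemExit as exc:
    exit_code = exc.code
```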
10 changes: 10 additions & 0 deletions samples/bank_holiday_ni.csv
@@ -0,0 +1,10 @@
20210101
20210317
20210402
20210405
20210503
20210531
20210712
20210830
20211227
20211228
8 changes: 8 additions & 0 deletions samples/school_term_ni.csv
@@ -0,0 +1,8 @@
20200901,20201028
20201031,20201221
20210102,20210217
20210220,20210316
20210318,20210331
20210410,20210502
20210504,20210530
20210601,20210630
1 change: 1 addition & 0 deletions setup.py
@@ -44,4 +44,5 @@ def read_version():
"pyproj",
],
python_requires='>=3.3', # According to vermin
include_package_data=True,
)
