feat(basemaps): New cli listing jobs for topo raster map standardised workflow. BM-1127 #1145

Draft
wants to merge 47 commits into
base: master
934394c
WIP
Wentao-Kuang Nov 22, 2024
fb1ab61
New topo-list-job commands to list the jobs for topo raster import
Wentao-Kuang Nov 24, 2024
722fe74
Minor fixes and unit tests.
Wentao-Kuang Nov 24, 2024
b6a8d7a
merge conficts
Wentao-Kuang Nov 29, 2024
50ef9ca
Process standardising in the argo-tasks for topo map
Wentao-Kuang Nov 26, 2024
b5499e3
Build on a dgal container
Wentao-Kuang Nov 26, 2024
5f21781
WIP
Wentao-Kuang Nov 26, 2024
9dcf630
Add stac creation and download for gdal
Wentao-Kuang Nov 27, 2024
8768af9
Update to use basemaps/shared fsa
Wentao-Kuang Nov 27, 2024
83db0cb
update the tmp folder for each cog processing
Wentao-Kuang Nov 27, 2024
11d3fc5
Group the task to process cogs in seperated nodes
Wentao-Kuang Nov 28, 2024
a283ab9
Ouput to tmp for tile.json
Wentao-Kuang Nov 28, 2024
342e6b2
Check exist before download
Wentao-Kuang Nov 28, 2024
c57dd2d
Remove the slice for test.
Wentao-Kuang Nov 28, 2024
ef9d78a
refined Stac item.json and collection.json creation processes
tawera-manaena Nov 29, 2024
86a6711
added new task to append file:checksum info after cog creation
tawera-manaena Nov 29, 2024
edac1c1
added 'include-latest' flag for validation task (wip)
tawera-manaena Nov 29, 2024
63e2a58
refined the topo-stac-creation processes and moved its functions into…
tawera-manaena Dec 2, 2024
8fa54f3
Add alpha layer for the created cogs
Wentao-Kuang Dec 2, 2024
d676df2
updated the StacItem creation process to extract the epsg from each T…
tawera-manaena Dec 3, 2024
1186935
Reproject chatham islands
Wentao-Kuang Dec 3, 2024
5ac61ab
Add checksum for source in cog creation.
Wentao-Kuang Dec 3, 2024
ec3ce26
Stop thrown for broken epsg tiles
Wentao-Kuang Dec 3, 2024
d8028e7
Write the broken file out
Wentao-Kuang Dec 3, 2024
6c8c790
Set same target resolution for NZTM
Wentao-Kuang Dec 3, 2024
5677c04
updated cog creation process to match gsd based on projection
tawera-manaena Dec 4, 2024
659fd61
Remove checksum for collection json item links
Wentao-Kuang Dec 4, 2024
b19e45d
refactor: updated processes to be epsg agnostic
tawera-manaena Dec 8, 2024
3994738
fix: updated processes to passthrough tiff metadata
tawera-manaena Dec 9, 2024
3800b9d
fix: update target.json artifact format to work with Argo 'withParam'
tawera-manaena Dec 9, 2024
6ef612f
refactor: split the gdal processes into functions and added option su…
tawera-manaena Dec 9, 2024
f62220e
fix: updated targets.json to include epsg code with url
tawera-manaena Dec 9, 2024
e14d1af
Add the trim pixels
Wentao-Kuang Dec 10, 2024
db4fab3
Don't error if no size
Wentao-Kuang Dec 10, 2024
849f52e
refactor: remove unused gdal command options
tawera-manaena Dec 16, 2024
6214183
refactor: type safety for additional Stac Item properties
tawera-manaena Dec 17, 2024
fadb57c
Revert "Build on a dgal container"
tawera-manaena Dec 17, 2024
ae4a30e
Give ulid for tmp job folder and remove gdalwrap
Wentao-Kuang Dec 18, 2024
243c4ea
Fix the path of temp folder
Wentao-Kuang Dec 18, 2024
daa9c08
Add docker.io to the container
Wentao-Kuang Dec 18, 2024
da8db62
feature: append basemaps options to generated Stac Item files
tawera-manaena Jan 7, 2025
35c7343
Add more information to the stac item
Wentao-Kuang Jan 8, 2025
34373a2
Update the tile.json structure for cogify to parse
Wentao-Kuang Jan 8, 2025
7b8b839
Remove tileMatrix
Wentao-Kuang Jan 9, 2025
75dc131
feat: append slug and tile matrix metadata to generated Stac Item files
tawera-manaena Jan 9, 2025
32edd5c
Remove topo-cog-creation cli and use cogify creat cog instead.
Wentao-Kuang Jan 12, 2025
6f1f8b0
Remove docker in the container.
Wentao-Kuang Jan 12, 2025
5,214 changes: 4,717 additions & 497 deletions package-lock.json

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions package.json
@@ -39,7 +39,9 @@
"@aws-sdk/credential-providers": "^3.438.0",
"@aws-sdk/lib-storage": "^3.440.0",
"@basemaps/config": "^7.7.0",
"@basemaps/config-loader": "^7.12.0",
"@basemaps/geo": "^7.5.0",
"@basemaps/shared": "^7.12.0",
"@chunkd/fs": "^10.0.9",
"@chunkd/source-aws-v3": "^10.1.3",
"@cogeotiff/core": "^9.0.3",
5 changes: 5 additions & 0 deletions src/cli.info.ts
@@ -1,3 +1,5 @@
import * as ulid from 'ulid';

export const CliInfo = {
package: '@linzjs/argo-tasks',
// Git version information
@@ -7,3 +9,6 @@ export const CliInfo = {
// Github action that the CLI was built from
buildId: process.env['GITHUB_RUN_ID'] ? `${process.env['GITHUB_RUN_ID']}-${process.env['GITHUB_RUN_ATTEMPT']}` : '',
};

/** Unique Id for this instance of the cli being run */
export const CliId = ulid.ulid();
@@ -0,0 +1,37 @@
import assert from 'node:assert';
import { describe, it } from 'node:test';

import { extractMapCodeAndVersion } from '../extractors/extract-map-code-and-version.js';

describe('extractMapCodeAndVersion', () => {
const FakeDomain = 's3://topographic/fake-domain';
const FakeFiles = [
{ input: `${FakeDomain}/MB07_GeoTifv1-00.tif`, expected: { mapCode: 'MB07', version: 'v1-00' } },
{ input: `${FakeDomain}/MB07_GRIDLESS_GeoTifv1-00.tif`, expected: { mapCode: 'MB07', version: 'v1-00' } },
{ input: `${FakeDomain}/MB07_TIFFv1-00.tif`, expected: { mapCode: 'MB07', version: 'v1-00' } },
{ input: `${FakeDomain}/MB07_TIFF_600v1-00.tif`, expected: { mapCode: 'MB07', version: 'v1-00' } },
{
input: `${FakeDomain}/AX32ptsAX31AY31AY32_GeoTifv1-00.tif`,
expected: { mapCode: 'AX32ptsAX31AY31AY32', version: 'v1-00' },
},
{
input: `${FakeDomain}/AZ36ptsAZ35BA35BA36_GeoTifv1-00.tif`,
expected: { mapCode: 'AZ36ptsAZ35BA35BA36', version: 'v1-00' },
},
];

it('Should parse the correct MapSheet Names', async () => {
for (const file of FakeFiles) {
const output = extractMapCodeAndVersion(file.input);
assert.equal(output.mapCode, file.expected.mapCode, 'Map code does not match');
assert.equal(output.version, file.expected.version, 'Version does not match');
}
});

it('Should not be able to parse a version from the file name', async () => {
const wrongFiles = [`${FakeDomain}/MB07_GeoTif1-00.tif`, `${FakeDomain}/MB07_TIFF_600v1.tif`];
for (const file of wrongFiles) {
assert.throws(() => extractMapCodeAndVersion(file), new Error('Version not found in the file name'));
}
});
});
@@ -0,0 +1,24 @@
import { Bounds } from '@basemaps/geo';
import { Tiff } from '@cogeotiff/core';

import { logger } from '../../../log.js';
import { findBoundingBox } from '../../../utils/geotiff.js';

/**
* Attempts to extract bounds from the given Tiff object.
*
* @param tiff: The Tiff object from which to extract bounds
*
* @returns if succeeded, a Bounds object. Otherwise, null.
*/
export async function extractBoundsFromTiff(tiff: Tiff): Promise<Bounds | null> {
try {
const bounds = Bounds.fromBbox(await findBoundingBox(tiff));

logger.info({ found: true }, 'extractBoundsFromTiff()');
return bounds;
} catch (e) {
logger.info({ found: false }, 'extractBoundsFromTiff()');
return null;
}
}
@@ -0,0 +1,35 @@
import { Epsg } from '@basemaps/geo';
import { Tiff, TiffTagGeo } from '@cogeotiff/core';

import { logger } from '../../../log.js';
import { extractEpsg } from '../../generate-path/path.generate.js';

const projections = [
['Universal Transverse Mercator Zone', Epsg.Wgs84],
['Chatham Islands Transverse Mercator 2000', Epsg.Citm2000],
['New Zealand Transverse Mercator 2000', Epsg.Nztm2000],
] as const;

export function extractEpsgFromTiff(tiff: Tiff): Epsg | null {
// try to extract the epsg directly from the tiff
try {
const epsg = Epsg.get(extractEpsg(tiff));
if (epsg != null) {
logger.info({ found: epsg.code }, 'extractEpsgFromTiff()');
return epsg;
}
} catch {
// try to extract the epsg from the tiff's projected citation geotag
const tag = tiff.images[0]?.valueGeo(TiffTagGeo.ProjectedCitationGeoKey);

for (const [citation, epsg] of projections) {
if (tag?.startsWith(citation)) {
logger.info({ found: epsg.code }, 'extractEpsgFromTiff()');
return epsg;
}
}
}

logger.info({ found: false }, 'extractEpsgFromTiff()');
return null;
}
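
Review note: the citation fallback above can be exercised without a Tiff. The following is a minimal sketch, assuming the citation string has already been read from the ProjectedCitationGeoKey tag. The helper name `epsgFromCitation` is hypothetical, and the numeric codes stand in for the `Epsg` objects used in the PR (assumed here to be 4326, 3793 and 2193 for Wgs84, Citm2000 and Nztm2000).

```typescript
// Hypothetical standalone version of the citation-prefix lookup.
// Numeric EPSG codes replace the Epsg objects from @basemaps/geo (assumption).
const citationPrefixes: ReadonlyArray<readonly [string, number]> = [
  ['Universal Transverse Mercator Zone', 4326],
  ['Chatham Islands Transverse Mercator 2000', 3793],
  ['New Zealand Transverse Mercator 2000', 2193],
];

function epsgFromCitation(citation: string | null | undefined): number | null {
  // a missing tag means there is nothing to match against
  if (citation == null) return null;

  // return the code of the first prefix that the citation starts with
  for (const [prefix, code] of citationPrefixes) {
    if (citation.startsWith(prefix)) return code;
  }
  return null;
}

console.log(epsgFromCitation('New Zealand Transverse Mercator 2000'));
console.log(epsgFromCitation('Some other projection'));
```

Keeping the prefix table separate from the Tiff access makes the mapping easy to unit test on its own.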
@@ -0,0 +1,33 @@
import path from 'path';

import { logger } from '../../../log.js';
import { tryParseUrl } from '../../common.js';

/**
* Extract the map code and version from the provided path.
* Throws an error if either detail cannot be parsed.
*
* @param file: filepath from which to extract the map code and version
*
* @example
* file: "s3://linz-topographic-upload/topographic/TopoReleaseArchive/NZTopo50_GeoTif_Gridless/CJ10_GRIDLESS_GeoTifv1-00.tif"
* returns: { mapCode: "CJ10", version: "v1-00" }
*
* @returns an object containing the map code and version
*/
export function extractMapCodeAndVersion(file: string): { mapCode: string; version: string } {
const url = tryParseUrl(file);
const filePath = path.parse(url.href);
const fileName = filePath.name;

// extract map code from head of the file name (e.g. CJ10)
const mapCode = fileName.split('_')[0];
if (mapCode == null) throw new Error('Map sheet not found in the file name');

// extract version from tail of the file name (e.g. v1-00)
const version = fileName.match(/v(\d)-(\d\d)/)?.[0];
if (version == null) throw new Error('Version not found in the file name');

logger.info({ mapCode, version }, 'extractMapCodeAndVersion()');
return { mapCode, version };
}
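
Review note: the parsing rule can be checked in isolation. This is a minimal sketch, assuming the file name is the last path segment. The function name `parseMapCodeAndVersion` is hypothetical, and it deliberately skips `tryParseUrl` and the logger so that it has no dependencies.

```typescript
// Hypothetical dependency-free variant of the map code and version parser.
function parseMapCodeAndVersion(file: string): { mapCode: string; version: string } {
  // take the last path segment and drop the extension, e.g. "MB07_GeoTifv1-00"
  const base = file.split('/').pop() ?? '';
  const fileName = base.replace(/\.[^.]+$/, '');

  // the map code is everything before the first underscore (e.g. "MB07")
  const mapCode = fileName.split('_')[0];
  if (mapCode == null || mapCode === '') throw new Error('Map sheet not found in the file name');

  // the version is "v", one digit, a hyphen, then two digits (e.g. "v1-00")
  const version = fileName.match(/v\d-\d\d/)?.[0];
  if (version == null) throw new Error('Version not found in the file name');

  return { mapCode, version };
}

console.log(parseMapCodeAndVersion('s3://topographic/fake-domain/MB07_GRIDLESS_GeoTifv1-00.tif'));
```

File names such as `MB07_GeoTif1-00.tif` (no `v`) or `MB07_TIFF_600v1.tif` (no two-digit suffix) do not match the pattern and throw, which mirrors the unit tests in this PR.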
@@ -0,0 +1,23 @@
import { Size } from '@basemaps/geo';
import { Tiff } from '@cogeotiff/core';

import { logger } from '../../../log.js';

/**
* Attempts to extract a size from the given Tiff object.
*
* @param tiff: The Tiff object from which to extract the size
*
* @returns if succeeded, a Size object. Otherwise, null.
*/
export function extractSizeFromTiff(tiff: Tiff): Size | null {
try {
const size = tiff.images[0]?.size ?? null;

// report whether a size was actually found; images[0] may be missing
logger.info({ found: size != null }, 'extractSizeFromTiff()');
return size;
} catch (e) {
logger.info({ found: false }, 'extractSizeFromTiff()');
return null;
}
}
@@ -0,0 +1,79 @@
import { Tiff } from '@cogeotiff/core';

import { logger } from '../../../log.js';
import { extractBoundsFromTiff } from '../extractors/extract-bounds-from-tiff.js';
import { extractEpsgFromTiff } from '../extractors/extract-epsg-from-tiff.js';
import { extractMapCodeAndVersion } from '../extractors/extract-map-code-and-version.js';
import { extractSizeFromTiff } from '../extractors/extract-size-from-tiff.js';
import { brokenTiffs } from '../topo-stac-creation.js';
import { ByDirectory } from '../types/by-directory.js';
import { TiffItem } from '../types/tiff-item.js';

/**
* We need to assign each tiff to a group based on its map code (e.g. "AT24").
* For each group, we then need to identify the latest version and set it aside from the rest.
* The latest version will have special metadata, whereas the rest will have similar metadata.
*
* @param tiffs: The tiffs to group by epsg and map code
* @returns a `ByDirectory<TiffItem>` promise
*/
export async function groupTiffsByDirectory(tiffs: Tiff[]): Promise<ByDirectory<TiffItem>> {
// group the tiffs by directory, epsg, and map code
const byDirectory = new ByDirectory<TiffItem>();

// create items for each tiff and store them into 'all' by {epsg} and {map code}
for (const tiff of tiffs) {
const source = tiff.source.url;
const { mapCode, version } = extractMapCodeAndVersion(source.href);

const bounds = await extractBoundsFromTiff(tiff);
const epsg = extractEpsgFromTiff(tiff);
const size = extractSizeFromTiff(tiff);

if (bounds == null || epsg == null || size == null) {
if (bounds == null) {
brokenTiffs.noBounds.push(`${mapCode}_${version}`);
logger.warn({ mapCode, version }, 'Could not extract bounds from tiff');
}

if (epsg == null) {
brokenTiffs.noEpsg.push(`${mapCode}_${version}`);
logger.warn({ mapCode, version }, 'Could not extract epsg from tiff');
}

if (size == null) {
brokenTiffs.noSize.push(`${mapCode}_${version}`);
logger.warn({ mapCode, version }, 'Could not extract width or height from tiff');
}

continue;
}

const item = new TiffItem(tiff, source, mapCode, version, bounds, epsg, size);

// push the item into 'all' by {epsg} and {map code}
byDirectory.all.get(epsg.toString()).get(mapCode, []).push(item);
}

// for each {epsg} and {map code}, identify the latest item by {version} and copy it to 'latest'
for (const [epsg, byMapCode] of byDirectory.all.entries()) {
for (const [mapCode, items] of byMapCode.entries()) {
const sortedItems = items.sort((a, b) => a.version.localeCompare(b.version));

const latestItem = sortedItems[sortedItems.length - 1];
if (latestItem == null) throw new Error(`No items found for map code: ${mapCode}`);

// store the item into 'latest' by {epsg} and {map code}
byDirectory.latest.get(epsg).set(mapCode, latestItem);
}
}

logger.info(
byDirectory.all.entries().reduce((obj, [epsg, byMapCode]) => {
return { ...obj, [epsg]: byMapCode.entries().length };
}, {}),
'numItemsPerEpsg',
);

return byDirectory;
}
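
Review note: the "latest per epsg and map code" step can be illustrated with plain Maps. This is a minimal sketch under stated assumptions: `SheetVersion` and `latestByEpsgAndMapCode` are hypothetical names, the `ByDirectory` type from the PR is not used, and versions are assumed to share the fixed `vN-NN` shape so that string comparison orders them correctly.

```typescript
// Hypothetical simplified item; the PR's TiffItem carries more fields.
interface SheetVersion {
  epsg: number;
  mapCode: string;
  version: string; // fixed shape "vN-NN", e.g. "v1-02"
}

function latestByEpsgAndMapCode(items: SheetVersion[]): Map<string, SheetVersion> {
  const latest = new Map<string, SheetVersion>();

  for (const item of items) {
    // one bucket per {epsg} and {map code}
    const key = `${item.epsg}/${item.mapCode}`;
    const current = latest.get(key);

    // keep the item whose version string sorts last
    if (current == null || item.version.localeCompare(current.version) > 0) {
      latest.set(key, item);
    }
  }

  return latest;
}

const latest = latestByEpsgAndMapCode([
  { epsg: 2193, mapCode: 'CJ10', version: 'v1-00' },
  { epsg: 2193, mapCode: 'CJ10', version: 'v1-02' },
  { epsg: 2193, mapCode: 'CJ10', version: 'v1-01' },
  { epsg: 3793, mapCode: 'CI01', version: 'v1-00' },
]);
console.log([...latest.entries()]);
```

A single pass avoids sorting each group in place, at the cost of not keeping the full ordered history that the PR's `all` collection retains.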
27 changes: 27 additions & 0 deletions src/commands/basemaps-topo-import/mappers/map-epsg-to-slug.ts
@@ -0,0 +1,27 @@
import { EpsgCode } from '@basemaps/geo';

import { logger } from '../../../log.js';

const slugs: { [key in EpsgCode]?: string } = {
[EpsgCode.Nztm2000]: 'new-zealand-mainland',
[EpsgCode.Citm2000]: 'chatham-islands',
};

/**
* Attempts to map the given EpsgCode enum to a slug.
*
* @param epsg: The EpsgCode enum to map to a slug
*
* @returns if succeeded, a slug string. Otherwise, null.
*/
export function mapEpsgToSlug(epsg: EpsgCode): string | null {
const slug = slugs[epsg];

if (slug == null) {
logger.info({ found: false }, 'mapEpsgToSlug()');
return null;
}

logger.info({ found: true }, 'mapEpsgToSlug()');
return slug;
}
79 changes: 79 additions & 0 deletions src/commands/basemaps-topo-import/stac/create-base-stac-item.ts
@@ -0,0 +1,79 @@
import { TileMatrixSet } from '@basemaps/geo';
import { GeoJSONPolygon } from 'stac-ts/src/types/geojson.js';

import { CliId, CliInfo } from '../../../cli.info.js';
import { logger } from '../../../log.js';
import { MapSheetStacItem } from '../types/map-sheet-stac-item.js';
import { TiffItem } from '../types/tiff-item.js';

const CLI_DATE = new Date().toISOString();
const DEFAULT_TRIM_PIXEL_RIGHT = 1.7;

/**
* This function creates a base StacItem object based on the provided parameters.
*
* @param fileName: The map sheet's filename
* @example "CJ10" or "CJ10_v1-00"
*
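* @param tiffItem: The TiffItem supplying the source URL, map code, version, epsg, size, and bounds
*
* @param tileMatrix: The TileMatrixSet whose identifier is written to the basemaps options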
*
* @returns a StacItem object
*/
export function createBaseStacItem(fileName: string, tiffItem: TiffItem, tileMatrix: TileMatrixSet): MapSheetStacItem {
logger.info({ fileName }, 'createBaseStacItem()');

const item: MapSheetStacItem = {
type: 'Feature',
stac_version: '1.0.0',
id: fileName,
links: [
{ rel: 'self', href: `./${fileName}.json`, type: 'application/json' },
{ rel: 'collection', href: './collection.json', type: 'application/json' },
{ rel: 'parent', href: './collection.json', type: 'application/json' },
{ rel: 'linz_basemaps:source', href: tiffItem.source.href, type: 'image/tiff; application=geotiff' },
],
assets: {
source: {
href: tiffItem.source.href,
type: 'image/tiff; application=geotiff',
roles: ['data'],
},
},
stac_extensions: ['https://stac-extensions.github.io/file/v2.0.0/schema.json'],
properties: {
datetime: CLI_DATE,
map_code: tiffItem.mapCode,
version: tiffItem.version.replace('-', '.'), // e.g. "v1-00" to "v1.00"
'proj:epsg': tiffItem.epsg.code,
'source.width': tiffItem.size.width,
'source.height': tiffItem.size.height,
'linz_basemaps:options': {
tileId: fileName,
tileMatrix: tileMatrix.identifier,
preset: 'webp',
blockSize: 512,
bigTIFF: 'no',
compression: 'webp',
quality: 100,
overviewCompress: 'webp',
overviewQuality: 90,
overviewResampling: 'lanczos',
sourceEpsg: tiffItem.epsg.code,
addalpha: true,
noReprojecting: true,
srcwin: [0, 0, tiffItem.size.width - DEFAULT_TRIM_PIXEL_RIGHT, tiffItem.size.height],
},
'linz_basemaps:generated': {
package: CliInfo.package,
hash: CliInfo.hash,
version: CliInfo.version,
datetime: CLI_DATE,
},
},
geometry: { type: 'Polygon', coordinates: tiffItem.bounds.toPolygon() } as GeoJSONPolygon,
bbox: tiffItem.bounds.toBbox(),
collection: CliId,
};

return item;
}