Skip to content

Latest commit

 

History

History
86 lines (75 loc) · 4.75 KB

README.md

File metadata and controls

86 lines (75 loc) · 4.75 KB

DEPRECATED

That repository is deprecated and not supported anymore.

Please switch to the current version of the library: https://github.com/agile-data-engine/adenotifier

adenotifier

This is a Python library for using Agile Data Engine Notify API. See Notify API documentation.

Modules:

  • manifest
  • notifier

The repository is provided for reference purposes only and the solution may require modifications to fit your use case. Note that this solution is not part of the Agile Data Engine product. Please use at your own caution.

Installation

Install with pip (define version):

pip install git+https://github.com/solita/[email protected]

Include in requirements.txt (define version):

git+https://github.com/solita/[email protected]

Usage

notifier

  • Use the add_to_manifest function to add a new entry into a manifest for a given data source. The function will find an open manifest or, if one does not exist, it will create a new one. Arguments:
  • Use the notify_manifests function to notify all open manifests for a given data source. Arguments:

Note that the base URL, key and key secret are specific to your ADE Runtime environment. Also the IP address range of your solution has to be allowed in the ADE service.

The functions expect a data source configuration JSON object in the following format:

{
    "id": "example_source/example_entity",
    "attributes": {
        "ade_source_system": "example_source",
        "ade_source_entity": "example_entity",
        "batch_from_file_path_regex": "batch\\.(\\d*)\\.csv\\.gz",
        "path_replace": "https://",
        "path_replace_with": "azure://",
        "single_file_manifest": false,
        "max_files_in_manifest": 1000
    },
    "manifest_parameters": {
        "columns": ["test1", "test2"],
        "compression": "GZIP",
        "delim": "SEMICOLON",
        "format": "CSV",
        "fullscanned": false,
        "skiph": 1
    }
}

where:

Attribute Mandatory Description
id x Unique identifier for the data source.
ade_source_system x Source system defined for the source entity in ADE.
ade_source_entity x Source entity name in ADE.
batch_from_file_path_regex Regular expression for parsing a batch number from a file name. Supports capturing groups, which are concatenated before casting to integer.
path_replace Old string value to be replaced in the source file path.
path_replace_with New string value the source file path will be replaced with.
single_file_manifest Add_to_manifest calls notify_manifest after file has been added.
max_files_in_manifest Max files to be added to single manifest.
columns ADE manifest parameter, see Notify API documentation
compression ADE manifest parameter, see Notify API documentation
delim ADE manifest parameter, see Notify API documentation
format x ADE manifest parameter, see Notify API documentation
fullscanned ADE manifest parameter, see Notify API documentation
skiph ADE manifest parameter, see Notify API documentation

In addition to these attributes, you may include other details in the configuration for the purposes of your application. For example, add the storage account name, container name and folder path (Azure) or bucket name and folder path (AWS, GCP) to identify the data source from a file url in a file created event before calling add_to_manifest or notify_manifests.

manifest

The manifest module contains a class for managing manifests in the ADE Notify API. The Manifest class is used by the functions in the notifier module. Use the Manifest class for custom solutions, see comments in the code.