Merge pull request #131 from lanl/danieljohnson/doc_changes
New plugin reader documentation
jpulidojr authored Dec 17, 2024
2 parents b60599b + 6f84d29 commit 5288553
Showing 3 changed files with 100 additions and 0 deletions.
90 changes: 90 additions & 0 deletions docs/contributing_readers.rst
@@ -0,0 +1,90 @@
====================================
Making a Reader for Your Application
====================================

DSI readers are the primary way to transform outside data into metadata that DSI can ingest. Readers are Python classes that must include a few methods, namely ``__init__``, ``pack_header``, and ``add_rows``.

Initializer: ``__init__(self) -> None:``
-------------------------------------------
``__init__`` is where you can include all of your initialization logic; just make sure to initialize your superclass.
Note: ``__init__`` can also take whatever parameters are needed for a given application.

Example ``__init__``: ::

    def __init__(self) -> None:
        super().__init__()  # see "plugins" to determine which superclass your reader should extend
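
As the note above says, ``__init__`` may accept application-specific parameters. The following is a minimal sketch of that idea. ``StructuredMetadataStub``, ``CsvReader``, ``filename``, and ``delimiter`` are hypothetical names introduced only for illustration; the stub stands in for the real DSI superclass so the snippet runs on its own.

```python
class StructuredMetadataStub:
    """Stand-in for the DSI superclass (an assumption, not DSI code)."""

    def __init__(self) -> None:
        self._buffer = []  # hypothetical placeholder for internal state


class CsvReader(StructuredMetadataStub):
    """Hypothetical reader that remembers which file it should parse."""

    def __init__(self, filename: str, delimiter: str = ",") -> None:
        super().__init__()  # initialize the superclass before anything else
        self.filename = filename
        self.delimiter = delimiter


reader = CsvReader("results.csv")
```

Storing the parameters on ``self`` keeps them available to ``pack_header`` and ``add_rows`` later on.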

Pack Header: ``pack_header(self) -> None``
---------------------------------------------

``pack_header`` is responsible for setting a schema, registering which columns
will be populated by the reader. The ``set_schema(self, table_data: list, validation_model=None) -> None`` method
is available to subclasses of ``StructuredMetadata``, which allows one to simply give a list of column names to register.
``validation_model`` is a pydantic model that can help you enforce types, but is completely optional.

Example ``pack_header``: ::

    def pack_header(self) -> None:
        column_names = ["foo", "bar", "baz"]
        self.set_schema(column_names)

Add Rows: ``add_rows(self) -> None``
-------------------------------------

``add_rows`` is responsible for appending to the internal metadata buffer.
Whatever data is being ingested, it's done here. The ``add_to_output(self, row: list) -> None`` method is available to subclasses
of ``StructuredMetadata``, which takes a list of data that matches the schema and appends it to the internal metadata buffer.

Note: ``pack_header`` must be called before metadata is appended in ``add_rows``. Another helper method of
``StructuredMetadata`` is ``schema_is_set``, which provides a way to tell if this restriction is met.

Example ``add_rows``: ::

    def add_rows(self) -> None:
        if not self.schema_is_set():
            self.pack_header()

        # data parsing can go here (or abstracted to other functions)
        my_data = [1, 2, 3]

        self.add_to_output(my_data)
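
To show how the three methods fit together, here is a self-contained sketch of a complete reader. It does not import DSI: ``StructuredMetadataStub`` is a hypothetical stand-in that only mimics the ``set_schema``, ``schema_is_set``, and ``add_to_output`` helpers as they are described above, and ``CsvTextReader`` and its attributes are likewise invented for illustration. A real reader would extend the DSI superclass instead.

```python
import csv
import io


class StructuredMetadataStub:
    """Stand-in mimicking the helpers described above (NOT DSI's code)."""

    def __init__(self) -> None:
        self._columns = None  # filled in by set_schema
        self.rows = []        # stand-in for the internal metadata buffer

    def set_schema(self, table_data: list, validation_model=None) -> None:
        self._columns = list(table_data)

    def schema_is_set(self) -> bool:
        return self._columns is not None

    def add_to_output(self, row: list) -> None:
        if len(row) != len(self._columns):
            raise ValueError("row does not match the schema")
        self.rows.append(list(row))


class CsvTextReader(StructuredMetadataStub):
    """Hypothetical reader that ingests CSV text held in memory."""

    def __init__(self, text: str) -> None:
        super().__init__()
        self._records = list(csv.reader(io.StringIO(text)))

    def pack_header(self) -> None:
        # The first CSV record supplies the column names.
        self.set_schema(self._records[0])

    def add_rows(self) -> None:
        if not self.schema_is_set():
            self.pack_header()
        # Every remaining record becomes one row in the buffer.
        for record in self._records[1:]:
            self.add_to_output(record)


reader = CsvTextReader("foo,bar,baz\n1,2,3\n4,5,6\n")
reader.add_rows()
```

Note that the values stay strings here; converting them to numbers would be part of the data parsing step in ``add_rows``.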

*Alternate* Add Rows: ``add_rows(self) -> None``
------------------------------------------------
If you are confident that the data you read in ``add_rows`` is in the form of an OrderedDict (the data structure used to store all ingested data), you can bypass the use of ``pack_header`` and ``add_to_output`` with an alternate ``set_schema`` function.

This function, ``set_schema_2(self, collection, validation_model=None) -> None``, directly assigns the data you read in ``add_rows`` to the internal DSI abstraction layer, provided that the data you pass as the ``collection`` variable is an OrderedDict. This method allows you to quickly append data to the abstraction wholesale, rather than row-by-row.

Example alternate ``add_rows``: ::

    def add_rows(self) -> None:
        # Requires ``from collections import OrderedDict`` at module level.
        # The data is already an OrderedDict, so set_schema_2 can take it wholesale.
        my_data = OrderedDict()
        my_data["jack"] = 10
        my_data["joey"] = 20
        my_data["amy"] = 30

        self.set_schema_2(my_data)
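
If your parser produces rows rather than an OrderedDict, the collection can be built first and then handed over in one call. The sketch below is one way to do that conversion. ``rows_to_collection`` is a hypothetical helper, and the column-name-to-list-of-values layout is an assumption made for illustration; check the layout DSI expects before relying on it.

```python
from collections import OrderedDict


def rows_to_collection(column_names, rows):
    """Group row-oriented data by column, preserving column order."""
    collection = OrderedDict((name, []) for name in column_names)
    for row in rows:
        for name, value in zip(column_names, row):
            collection[name].append(value)
    return collection


collection = rows_to_collection(["foo", "bar"], [[1, 2], [3, 4]])
```

Inside a reader, the resulting ``collection`` would be the argument passed in place of ``my_data`` in the example above.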

Implemented Examples
--------------------------------
Full reader implementations can be found in
`dsi/plugins/env.py <https://github.com/lanl/dsi/blob/main/dsi/plugins/env.py>`_.
``Hostname`` is an especially simple example to start from.

Loading Your Reader
-------------------------
There are two ways to load your reader: internally and externally.

- Internally: If you want your reader loadable internally with the rest of the provided implementations (in `dsi/plugins <https://github.com/lanl/dsi/tree/main/dsi/plugins>`_), it must be registered in the class variables of ``Terminal`` in `dsi/core.py <https://github.com/lanl/dsi/blob/main/dsi/core.py>`_. If this is done correctly, your reader will be loadable by the ``load_module`` method of ``Terminal``.
- Externally: If your reader does not live alongside the other provided implementations, for example somewhere else on the filesystem, it must be loaded externally. This is done with the ``add_external_python_module`` method of ``Terminal``. Once you load an external Python module this way (e.g. ``term.add_external_python_module('plugin','my_python_file','/the/path/to/my_python_file.py')``), your reader will be loadable by the ``load_module`` method of ``Terminal``.
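
For intuition about what external loading involves, the sketch below shows the general Python mechanism for importing a module from an arbitrary file path using the standard library's ``importlib``. This is an illustration of the technique only, not DSI's implementation of ``add_external_python_module``; ``load_module_from_path`` and ``MyReader`` are hypothetical names.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(module_name: str, path: str):
    """Import a Python file that is not on the normal import path."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Write a tiny plugin file to a temporary directory, then load it by path.
with tempfile.TemporaryDirectory() as tmp:
    plugin_file = Path(tmp) / "my_python_file.py"
    plugin_file.write_text("class MyReader:\n    name = 'my_reader'\n")
    module = load_module_from_path("my_python_file", str(plugin_file))

reader_class = module.MyReader
```

The practical consequence is that the file must be importable on its own: any top-level code in it runs at load time.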


Contributing Your Reader
--------------------------
If your reader is helpful and acceptable for public use, you should consider making a pull request (PR) into DSI.

Please note that any accepted PRs into DSI should satisfy the following:

- Passes all tests in ``dsi/plugins/tests``
- Has no ``pylama`` errors/warnings (see `dsi/.githooks <https://github.com/lanl/dsi/tree/main/.githooks>`_)
1 change: 1 addition & 0 deletions docs/index.rst
@@ -15,6 +15,7 @@ The Data Science Infrastructure Project (DSI)
plugins
backends
core
contributing_readers
tiers
examples

9 changes: 9 additions & 0 deletions docs/plugins.rst
@@ -21,3 +21,12 @@ Note that any contributed plugins or extension should include unit tests in ``p

.. automodule:: dsi.plugins.env
   :members:

Optional Plugin Type Enforcement
==================================

Plugins take data in an arbitrary format and transform it into metadata that is queryable in DSI. Plugins may enforce types, but are not required to. Plugin type enforcement can be static, like the Hostname default plugin, or dynamic, like the Bueno default plugin.


.. automodule:: dsi.plugins.plugin_models
   :members:
