Skip to content

Commit

Permalink
doc: (dataset-)path resolution pattern for command implementations
Browse files Browse the repository at this point in the history
This aims to illustrate the necessary boilerplate to add the
conventional DataLad path resolution  behavior to a command.

Importantly, this is done completely outside the implementation
of the actual command. It is all parameterization of the
`@datalad_command` decorator.
  • Loading branch information
mih committed Oct 28, 2024
1 parent 257f35e commit a380d68
Show file tree
Hide file tree
Showing 2 changed files with 65 additions and 2 deletions.
10 changes: 8 additions & 2 deletions docs/index.rst
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
The `datalad-core` documentation
================================

Package overview
----------------
Package API overview
--------------------

Also see the :ref:`modindex`.

Expand All @@ -17,6 +17,12 @@ Also see the :ref:`modindex`.
repo
runners

Implementation patterns
-----------------------

.. toctree::
patterns/dataset_paths


Indices and tables
==================
Expand Down
57 changes: 57 additions & 0 deletions docs/patterns/dataset_paths.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
(Dataset-specific) path resolution
----------------------------------

If a command operates on an item on the file system that is identified by its
path, and can (optionally) operate in the context of a specific dataset, this
is achieved with two arguments: ``dataset``, and ``path``. It can be the case
that only one of these parameters is needed (and occasionally even neither one
is required) for a specific command invocation, hence they are typically both
used with ``None`` default values.

The following parameter preprocessor setup for ``@datalad_command`` can be used
to guarantee that the function body of a command will always receive:

- a ``pathlib`` path object instance via the ``path`` argument
- a ``Dataset`` instance via the ``dataset`` argument from which an
(optional) dataset context can be identified.

>>> from datalad_core.commands import (
... EnsureDataset,
... JointParamProcessor,
... )
>>> from datalad_core.constraints import EnsurePath
>>> preproc=JointParamProcessor(
... {
... 'dataset': EnsureDataset(),
... 'path': EnsurePath()
... },
... proc_defaults={'dataset', 'path'},
... tailor_for_dataset={
... 'path': 'dataset',
... }
... ),

The above pattern takes care of resolving any path given to ``path``
(including ``None``) to an actual path. This path is adequately determined,
also considering the value of any ``dataset`` parameter. This follows
the conventions of path resolution in DataLad:

- any absolute path is taken as-is
- any relative path is interpreted as relative to CWD, unless there is
a dataset context based on a :class:`Worktree` instance. Only in the
latter case a relative path is interpreted as relative to the
worktree root

Technically, this pattern works, because

- even ``None`` defaults will be processed for ``dataset`` and ``path``
- the ``tailored_for_dataset`` mapping causes the ``dataset`` argument
to be resolved first (before ``path``)
- and the :class:`EnsurePath` constraint is tailored to any :class:`Dataset`
given to ``dataset``, by generating a matching :class:`EnsureDatasetPath`
constraint for the ``path`` parameter internally

Taken together, this pattern simplifies command implementation, because
a number of checks are factored out into common implementation following
accepted conventions. Command implementations can inspect the specifics
of a user-provided dataset context via :attr:`Dataset.pristine_spec`.

0 comments on commit a380d68

Please sign in to comment.