Coordination: RIA annex remote requirements, Implementation alternatives, and status #107

Open
11 of 15 tasks
christian-monch opened this issue Apr 12, 2024 · 0 comments

christian-monch commented Apr 12, 2024

This issue is intended to serve as a coordination hub for RIA annex remote requirements, a description of implementation alternatives, and the selection of implementation options. (I am using "RIA annex remote" instead of "ORA" here to keep the number of names in use a little smaller.)

Requirements for the annex remote

The following lists contain the identified functional and non-functional requirements. Check-marked requirements apply. Un-checked requirements are identified but do not need to be fulfilled. Add new requirements by editing this issue and leaving a notification about the changes in the changelog.

Functional requirements

  • Compatible with the RIA implementation in datalad core:
    • support archives
  • Support side-channel git annex access on ria+ssh:-stores (git annex should be able to locally access the objects in the RIA store)
  • Support side-channel git annex access on ria+file:-stores
  • Support side-channel git annex access on ria+http:-stores
  • Support for ria+ssh:
  • Support for ria+file:
  • Support for ria+sftp: (from issue Support for SFTP as a RIA store? #100)
  • Read-only support for ria+https:
  • Write support for ria+https:
  • Support for POSIX-hosted RIA Stores
  • Support for Windows-hosted RIA Stores

Non-functional requirements

  • Correct (not negotiable)
  • Maintainable (not negotiable)
  • Efficient

Implementation alternatives and status for the annex remote

IO abstraction vs multi-flavor RIA annex-remote implementation

In issue #99 we concluded that it is too restrictive to base the RIA annex-remote implementation on a file-system paradigm. It turned out that this abstraction layer is a logical bottleneck: it works well for file-based access but does not translate easily to HTTP-based access. It is also unlikely to work for general object stores, because that would require extending the abstraction layer with object-store-specific operations and switching between them in the higher-level implementation. See also #30.

The chosen alternative is an implementation that uses object-store-specific handlers to implement the basic annex-remote operations, e.g. TRANSFER RETRIEVE, TRANSFER STORE, CHECKPRESENT, and REMOVE.

This is currently done in PR #106. An abstract base class defines transfer_store, transfer_retrieve, checkpresent, and remove. ssh-, file-, and http-specific subclasses implement the abstract methods for the respective store.
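The structure described above can be sketched as follows. This is a minimal illustration, not the code in PR #106: the class names `RiaHandler` and `FileRiaHandler`, the method signatures, and the flat one-file-per-key layout are assumptions made for the example. Only the four method names come from the description above.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import shutil


class RiaHandler(ABC):
    """Hypothetical base class; one subclass per store flavor (ssh, file, http)."""

    @abstractmethod
    def transfer_store(self, key: str, local_file: Path) -> None:
        """Copy the content of `local_file` into the store under `key`."""

    @abstractmethod
    def transfer_retrieve(self, key: str, local_file: Path) -> None:
        """Copy the content stored under `key` to `local_file`."""

    @abstractmethod
    def checkpresent(self, key: str) -> bool:
        """Report whether `key` is present in the store."""

    @abstractmethod
    def remove(self, key: str) -> None:
        """Remove `key` from the store; a missing key is not an error."""


class FileRiaHandler(RiaHandler):
    """Illustrative file flavor using a flat directory (assumption).

    The object layout of a real RIA store is more involved.
    """

    def __init__(self, object_dir: Path) -> None:
        self.object_dir = Path(object_dir)

    def _key_path(self, key: str) -> Path:
        return self.object_dir / key

    def transfer_store(self, key: str, local_file: Path) -> None:
        self.object_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_file, self._key_path(key))

    def transfer_retrieve(self, key: str, local_file: Path) -> None:
        shutil.copyfile(self._key_path(key), local_file)

    def checkpresent(self, key: str) -> bool:
        return self._key_path(key).is_file()

    def remove(self, key: str) -> None:
        self._key_path(key).unlink(missing_ok=True)
```

With this shape, the annex-remote front end only needs to pick a handler per URL scheme and forward each request to it. Flavor-specific concerns such as atomic writes stay inside the respective subclass.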

Current choice: multi-flavor RIA annex-remote implementation

URL-operations vs. individual implementations

Generally, URL-operations map nicely onto annex remote-operations, e.g. TRANSFER RETRIEVE maps onto download. So it seems natural to completely rely on UrlOperations to implement the RIA annex remote (for supported URL-schemes).
But issue #102 (atomicity) and issue #103 (ensure_writable) highlight that UrlOperations might not yet fully support everything an annex remote needs.
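To make the mapping concrete, here is a small sketch of a lookup from git-annex requests to URL-operation names. Only the pairing of TRANSFER RETRIEVE with download is stated above. The other three pairings (`upload`, `stat`, `delete`), the table name, and the helper `url_operation_for` are assumptions for illustration. REMOVE is the request the external special remote protocol sends for deletion.

```python
# Request prefixes of the git-annex external special remote protocol,
# mapped to assumed URL-operation names.
ANNEX_TO_URL_OPERATION = {
    "TRANSFER RETRIEVE": "download",  # pairing named in the text above
    "TRANSFER STORE": "upload",       # assumption
    "CHECKPRESENT": "stat",           # assumption
    "REMOVE": "delete",               # assumption
}


def url_operation_for(request_line: str) -> str:
    """Return the URL-operation name for a git-annex request line.

    The request may be followed by arguments (key, file name), so the
    lookup matches on the leading request words only.
    """
    for request, operation in ANNEX_TO_URL_OPERATION.items():
        if request_line == request or request_line.startswith(request + " "):
            return operation
    raise ValueError(f"no URL operation for request: {request_line!r}")
```

The table covers only the plain data path. Guarantees such as atomic stores or making a target writable have no direct counterpart in it, which is the gap that issues #102 and #103 point at.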

There might also be an efficiency issue, at least for SshUrlOperations: it sets up a new ssh connection for each operation. Therefore PR #106 uses the new persistent shell from datalad_next.shell (which is not yet merged into the main branch of datalad-next). The persistent shell supports arbitrary shell commands, which allows for efficient implementations of atomicity and ensure_writable. It also allows the remote execution of scripts, which can improve the efficiency of complex operations like ensure_writable.
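As an illustration of why arbitrary shell commands help with atomicity, the following sketch composes a single POSIX shell command that writes incoming data to a temporary file and then renames it to its final name. It does not use the datalad_next.shell API. The function name `atomic_store_command` and the `.transfer` suffix are assumptions for the example, and the approach relies on a rename within one directory being atomic on POSIX file systems.

```python
import posixpath
import shlex


def atomic_store_command(dest_path: str, tmp_suffix: str = ".transfer") -> str:
    """Compose a POSIX shell command that stores stdin atomically at `dest_path`.

    Data is first written to `dest_path + tmp_suffix` in the same directory.
    Only after the write succeeded is the file renamed to its final name, so
    readers never observe a partially written object.
    """
    parent = shlex.quote(posixpath.dirname(dest_path) or ".")
    tmp = shlex.quote(dest_path + tmp_suffix)
    dest = shlex.quote(dest_path)
    return f"mkdir -p {parent} && cat > {tmp} && mv -f {tmp} {dest}"
```

The resulting string could be sent through any channel that executes shell commands on the store host and feeds the object content to stdin. Sending it over an already open connection avoids one connection setup per step.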

Current choice: individual implementations, using UrlOperations and persistent shells

Requirements for datalad create-sibling-ria

The "datalad create-sibling-ria"-commands should move from datalad-core to datalad-ria. The commands use the io-abstraction. If we drop the io-abstraction (as argued above), the commands should probably be re-implemented to remove the io-abstraction layer.

  • move datalad create-sibling-ria from datalad-core to datalad-ria
  • implement datalad create-sibling-ria without the io-abstraction. That means basing it on UrlOperations, datalad_next.shell, and other existing mechanisms.

Changelog

2024-04-12: @christian-monch: created
