This issue is intended to serve as a coordination hub for RIA annex remote requirements, a description of implementation alternatives, and the selection of implementation options. (I am using "RIA annex remote" instead of "ORA" here to reduce the name space a little).
Requirements for the annex remote
The following lists contain the identified functional and non-functional requirements. Check-marked requirements apply. Un-checked requirements are identified but do not need to be fulfilled. Add new requirements by editing this issue and leaving a notification about the changes in the changelog.
Functional requirements
Compatible with the RIA implementation in datalad core:
support archives
Support side-channel git annex access on ria+ssh:-stores (git annex should be able to locally access the objects in the RIA store)
Support side-channel git annex access on ria+file:-stores
Support side-channel git annex access on ria+http:-stores
Implementation alternatives and status for the annex remote
IO abstraction vs multi-flavor RIA annex-remote implementation
In issue #99 we concluded that it is too restrictive to base the RIA annex-remote implementation on a file-system paradigm. It turned out that this abstraction layer is a logical bottleneck that works well for file-based access but does not translate easily to HTTP-based access. It is also unlikely to work for general object stores (it would require extending the abstraction layer with object-store-specific operations and switching between them in the higher-level implementation). See also #30.
The chosen alternative is an implementation that uses object-store-specific handlers to implement the basic annex-remote operations, e.g. TRANSFER RETRIEVE, TRANSFER STORE, CHECKPRESENT, and REMOVE.
This is currently done in PR #106. An abstract base class defines transfer_store, transfer_retrieve, checkpresent, and remove. ssh-, file-, and http-specific subclasses implement the abstract methods for the respective store.
Current choice: multi-flavor RIA annex-remote implementation
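The multi-flavor layout could be sketched roughly as follows. This is a minimal illustration and not the code from PR #106: the class names `RiaHandler` and `FileRiaHandler`, the `key` and `local_path` parameters, the flat one-file-per-key layout, and the `handler_for` helper are all assumptions made for the example. Only the four method names are taken from the description above.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path
from urllib.parse import urlparse


class RiaHandler(ABC):
    """Hypothetical base class: one subclass per store flavor."""

    @abstractmethod
    def transfer_store(self, key: str, local_path: Path) -> None: ...

    @abstractmethod
    def transfer_retrieve(self, key: str, local_path: Path) -> None: ...

    @abstractmethod
    def checkpresent(self, key: str) -> bool: ...

    @abstractmethod
    def remove(self, key: str) -> None: ...


class FileRiaHandler(RiaHandler):
    """File flavor; keys are stored as plain files (simplified layout)."""

    def __init__(self, store_dir: Path) -> None:
        self.store_dir = Path(store_dir)

    def _object_path(self, key: str) -> Path:
        return self.store_dir / key

    def transfer_store(self, key, local_path):
        self.store_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, self._object_path(key))

    def transfer_retrieve(self, key, local_path):
        shutil.copyfile(self._object_path(key), local_path)

    def checkpresent(self, key):
        return self._object_path(key).is_file()

    def remove(self, key):
        # Removing an absent key is treated as success in this sketch.
        self._object_path(key).unlink(missing_ok=True)


def handler_for(url: str) -> RiaHandler:
    """Pick a flavor from the URL scheme; ssh and http flavors are omitted."""
    parsed = urlparse(url)
    if parsed.scheme == "ria+file":
        return FileRiaHandler(Path(parsed.path))
    raise NotImplementedError(f"no handler sketched for scheme {parsed.scheme!r}")
```

In this layout the annex-remote front end only talks to the four abstract methods, so an ssh or http flavor can use whatever mechanism suits its store without going through a shared file-system abstraction.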
URL-operations vs. individual implementations
Generally, URL-operations map nicely onto annex remote-operations, e.g. TRANSFER RETRIEVE maps onto download. So it seems natural to completely rely on UrlOperations to implement the RIA annex remote (for supported URL-schemes).
But issue #102 (atomicity) and issue #103 (ensure_writable) highlight that UrlOperations might not yet fully support what annex remotes need.
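To make these two concerns concrete for a file-based store, here is a minimal local sketch. It assumes that atomicity means a key never becomes visible under its final name while only partially written, and that ensure_writable means temporarily granting write permission on a directory and restoring the previous mode afterwards; issues #102 and #103 may define them differently. The function names and signatures are illustrative and are not the datalad-next API.

```python
import os
import shutil
import stat
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ensure_writable(directory: Path):
    """Temporarily add owner write permission to `directory` (assumed semantics)."""
    original_mode = stat.S_IMODE(directory.stat().st_mode)
    directory.chmod(original_mode | stat.S_IWUSR)
    try:
        yield
    finally:
        # Restore the mode the directory had before, even on failure.
        directory.chmod(original_mode)


def atomic_store(source: Path, destination: Path) -> None:
    """Copy `source` to a temporary name, then rename it into place."""
    with ensure_writable(destination.parent):
        fd, tmp_name = tempfile.mkstemp(dir=destination.parent, prefix=".tmp-")
        os.close(fd)
        try:
            shutil.copyfile(source, tmp_name)
            # os.replace is atomic when both names are on the same file system.
            os.replace(tmp_name, destination)
        except BaseException:
            # Clean up the partial file; the destination is left untouched.
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise
```

A plain upload without a temporary name and a final rename offers neither guarantee, which is the kind of gap the two issues point at.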
There might also be an efficiency issue, at least for SshUrlOperations, which sets up a new ssh connection for each operation. Therefore PR #106 uses the new persistent shell from datalad_next.shell (which is not yet merged into the main branch of datalad-next). The persistent shell supports arbitrary shell commands, which allows for efficient implementations of atomicity and ensure_writable. It also allows the remote execution of scripts, which can improve the efficiency of complex operations like ensure_writable.
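The efficiency argument can be illustrated with a small stand-in for a persistent shell. The sketch below is not the datalad_next.shell API: the `PersistentShell` class, its `run` method, and the end-of-command marker are assumptions for illustration, and a local POSIX `sh` process takes the place of an ssh connection. The point is only that the process is started once and then serves any number of commands.

```python
import subprocess


class PersistentShell:
    """Hypothetical stand-in: one long-lived shell process serving many commands."""

    MARKER = "__COMMAND_DONE__"

    def __init__(self, argv=("sh",)):
        # The process (or, by assumption, an ssh connection) is set up exactly once.
        self.process = subprocess.Popen(
            list(argv),
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def run(self, command):
        """Run `command` in the shell; return (exit_code, stdout)."""
        # After the command, print a newline and a marker line carrying the
        # exit status, so the reader knows where this command's output ends.
        self.process.stdin.write(
            f"{command}\nprintf '\\n%s %s\\n' {self.MARKER} $?\n"
        )
        self.process.stdin.flush()
        lines = []
        for line in self.process.stdout:
            if line.startswith(self.MARKER):
                exit_code = int(line.split()[1])
                # Drop the single newline that was inserted before the marker.
                return exit_code, "".join(lines)[:-1]
            lines.append(line)
        raise RuntimeError("shell terminated unexpectedly")

    def close(self):
        self.process.stdin.close()
        self.process.wait()
```

Because shell state survives between calls, a multi-step operation (make a directory writable, copy to a temporary name, rename, restore the mode) can be sent as a few commands or one script over the same connection instead of paying the connection setup cost for each step.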
Current choice: individual implementations, using UrlOperations and persistent shells
Requirements for datalad create-sibling-ria
The "datalad create-sibling-ria"-commands should move from datalad-core to datalad-ria. The commands use the io-abstraction. If we drop the io-abstraction (as argued above), the commands should probably be re-implemented to remove the io-abstraction layer.
move datalad create-sibling-ria from datalad-core to datalad-ria
implement datalad create-sibling-ria without the io-abstraction, that is, base it on UrlOperations, datalad_next.shell, and other existing mechanisms.
Further entries in the functional-requirements checklist name the store URL schemes ria+ssh:, ria+file:, ria+sftp: (from issue #100, "Support for SFTP as a RIA store?"), and ria+https:.
Non-functional requirements
Changelog
2024-04-12: @christian-monch: created