Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop replacement for ConfigManager #397

Closed
5 tasks
mih opened this issue Jun 1, 2023 · 5 comments
Closed
5 tasks

Develop replacement for ConfigManager #397

mih opened this issue Jun 1, 2023 · 5 comments

Comments

@mih
Copy link
Member

mih commented Jun 1, 2023

This is a key component of any datalad code. It is present in any datalad session datalad.cfg and part of any Dataset or *Repo instance.

The implementation in datalad-core is old, has various spaghetti pieces, and comprises at least 3 different APIs.

I believe this code needs an update for a number of reasons:

  • complex logic that is hard to understand from the given implementation
  • origin information of config items is complicated to obtain (both internally and from a user perspective), but for security reasons it is often desirable to be able to ignore sources that can be manipulated "externally" (e.g. via a "pull")
  • the scope setup does not match those of git-config -- ie. no support for worktree
  • the notion of overrides is clashing with Git's own command scope -- see Adjust ConfigManager to post overrides into the ENV #325 for the struggle to sort this out

Given the essential nature of this code, I think a replacement must

  • be initially API compatible with the present ConfigManager
  • shrink the API to what is needed (not what is possible; right now it is dict, and ConfigParser in Python and ConfigParser in GitPython, and some other pieces) via deprecations resulting actual removal
  • be better aligned with git-config in terms of scopes and concepts
  • be as fast or faster
  • come with minimal dependencies

TODO:

  • Assess actual API usage
  • Look for code duplication with other datalad code, and (re)use of (internal) helpers (e.g., for parsing .gitmodules) to inform refactoring
  • (Re)evaluate principle behavior (right now we read all config at once, and write individual items more-or-less one-by-one, but other behaviors are possible)
  • Propose design
    • better separation of read and write?
    • type support?
    • recognize scopes distinct from origins of config items
    • optimal internal/runtime representation of configuration items? presently optimized for repeated reads within a Python session, with reloads and writes being expensive
    • double-check API to support all desirable operations (e.g. something like replace-all within a particular scope...)
    • no longer rely on generic CommandError to communicate permission issues when setting (e.g. system) configuration
  • Implement and benchmark

Thoughts

Maybe have a dataclass

GitConfigItem:
   scope: GitConfigScopeType
   origin: str
   key: str
   value: str | Any

maybe with additional accessors like

  • section
  • name
  • asbool()
  • asint()

The internal representation within ConfigManager could be a list of these items, with amend/overwrite/replace rules applied on access. Or a series of lists, grouped by scope in a dict (likely scope removed from GitConfigScopeType then), or even grouped by (scope, origin) to facilitate reloads.

It could also make sense to have an explicit GitConfigMultiValueItem, that exposes a get_all() equivalent, rather than (only) having ConfigManager.get(getall: bool)

@adswa
Copy link
Member

adswa commented Jun 1, 2023

linking a few config manager related issues from core:

@mih
Copy link
Member Author

mih commented Jun 1, 2023

More corner/use cases are:

  • be able to turn on/off include directives
  • knowing which values (of multi-value items) came from which source
  • honor "protected configuration" different than other (Git treats these scopes as if they are controlled by the user or a trusted administrator. This is because an attacker who controls these scopes can do substantial harm without using Git, so it is assumed that the user’s environment protects these scopes against attackers.
  • honor GIT_CONFIG_NOSYSTEM
  • be aware of GIT_CONFIG[_GLOBAL|_SYSTEM]

mih added a commit to mih/datalad-next that referenced this issue Jun 2, 2023
This achieve the main goal of making any `datalad -c ...` specification
affect not just the datalad-specific config in the main Python process,
but can now handle *any* Git config, and also impact the behavior of
any subprocesses.

Furthermore this handling is extended to cover also `DATALAD_...`
ENV variables, including `DATALAD_CONFIG_OVERRIDES_JSON`.

Within the session `ConfigManager` instance the behavior is now more
uniform. `ConfigManager.overrides` are now exclusively instance-specific
overrides -- matching their description and implementation. No
configuration override coming from CLI or process ENV is reflected in
`ConfigManager.overrides` anymore.

Closes datalad#325 -- although the scope is a bit broader.

This changeset defers the need to address datalad#397, but does not resolve it.
Ideally there would not be a need for any CLI specific behavior and
implementation -- everything should be done by the `ConfigManager`.
However, given the numerous conceptual and design limitations, it felt
necessary to address the override impact limitation separately.

Ping

- datalad/datalad#4119
- datalad/datalad#3456
- datalad/datalad#7344
mih added a commit to mih/datalad-next that referenced this issue Jun 2, 2023
This achieve the main goal of making any `datalad -c ...` specification
affect not just the datalad-specific config in the main Python process,
but can now handle *any* Git config, and also impact the behavior of
any subprocesses.

Furthermore this handling is extended to cover also `DATALAD_...`
ENV variables, including `DATALAD_CONFIG_OVERRIDES_JSON`.

Within the session `ConfigManager` instance the behavior is now more
uniform. `ConfigManager.overrides` are now exclusively instance-specific
overrides -- matching their description and implementation. No
configuration override coming from CLI or process ENV is reflected in
`ConfigManager.overrides` anymore.

Closes datalad#325 -- although the scope is a bit broader.

This changeset defers the need to address datalad#397, but does not resolve it.
Ideally there would not be a need for any CLI specific behavior and
implementation -- everything should be done by the `ConfigManager`.
However, given the numerous conceptual and design limitations, it felt
necessary to address the override impact limitation separately.

Ping

- datalad/datalad#4119
- datalad/datalad#3456
- datalad/datalad#7344
mih added a commit to mih/datalad-next that referenced this issue Jun 2, 2023
This achieve the main goal of making any `datalad -c ...` specification
affect not just the datalad-specific config in the main Python process,
but can now handle *any* Git config, and also impact the behavior of
any subprocesses.

Furthermore this handling is extended to cover also `DATALAD_...`
ENV variables, including `DATALAD_CONFIG_OVERRIDES_JSON`.

Within the session `ConfigManager` instance the behavior is now more
uniform. `ConfigManager.overrides` are now exclusively instance-specific
overrides -- matching their description and implementation. No
configuration override coming from CLI or process ENV is reflected in
`ConfigManager.overrides` anymore.

Closes datalad#325 -- although the scope is a bit broader.

This changeset defers the need to address datalad#397, but does not resolve it.
Ideally there would not be a need for any CLI specific behavior and
implementation -- everything should be done by the `ConfigManager`.
However, given the numerous conceptual and design limitations, it felt
necessary to address the override impact limitation separately.

Ping

- datalad/datalad#4119
- datalad/datalad#3456
- datalad/datalad#7344
@mih
Copy link
Member Author

mih commented Jun 10, 2023

The ping in datalad/datalad#5383 reminded me that the present config manager also has no support for git annex config. And this one is special, because it has a fixed set it items, and rules of "engagement" are more complex

@mih
Copy link
Member Author

mih commented Sep 16, 2024

Making a start for this effort now.

mih added a commit to mih/datalad-next that referenced this issue Sep 17, 2024
mih added a commit to mih/datalad-next that referenced this issue Sep 17, 2024
mih added a commit to mih/datalad-next that referenced this issue Sep 23, 2024
mih added a commit to mih/datalad-next that referenced this issue Sep 23, 2024
mih added a commit to mih/datalad-next that referenced this issue Sep 25, 2024
mih added a commit to mih/datalad-next that referenced this issue Sep 26, 2024
mih added a commit to mih/datalad-next that referenced this issue Sep 26, 2024
mih added a commit to mih/datalad-next that referenced this issue Sep 26, 2024
mih added a commit to mih/datalad-next that referenced this issue Sep 26, 2024
mih added a commit to mih/datalad-next that referenced this issue Sep 27, 2024
mih added a commit to datalad/datasalad that referenced this issue Sep 29, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to datalad/datasalad that referenced this issue Sep 29, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to mih/datalad-next that referenced this issue Sep 29, 2024
mih added a commit to datalad/datasalad that referenced this issue Sep 30, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to datalad/datasalad that referenced this issue Sep 30, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to datalad/datasalad that referenced this issue Sep 30, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to mih/datalad-next that referenced this issue Oct 1, 2024
mih added a commit to mih/datalad-next that referenced this issue Oct 1, 2024
mih added a commit to mih/datalad-next that referenced this issue Oct 1, 2024
mih added a commit to mih/datalad-next that referenced this issue Oct 1, 2024
mih added a commit to datalad/datasalad that referenced this issue Oct 2, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to datalad/datasalad that referenced this issue Oct 3, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to datalad/datasalad that referenced this issue Oct 3, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to datalad/datasalad that referenced this issue Oct 3, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to datalad/datasalad that referenced this issue Oct 7, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to datalad/datasalad that referenced this issue Oct 7, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to datalad/datasalad that referenced this issue Oct 8, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
mih added a commit to datalad/datasalad that referenced this issue Oct 8, 2024
The new sub-package `datasalad.settings` provides a framework for
implementing a system where information items can be read from and
written to any number of sources, and sources are ordered to implement a
simple precedence rule.  An example of such a system is the layered Git
config setup, which system, global, local and other scopes.

This can serve as the basis for a revamped configuration manager for
DataLad.

This changeset is complete with tests and documentation.

Refs: datalad/datalad-next#397
@mih
Copy link
Member Author

mih commented Oct 14, 2024

datalad/datalad-core#19 provides a new ConfigManager implementation, built to support all aspects gathered in this issue. Even though, it may not actually support all proposed features at this point.

#760 has an exploration whether it would be possible to implement an adapter for the new manager with the API of the old one. This seems possible, except for some edge cases (some of which a more like bugs than features).

I am closing this issue here. Feature requests can be posted to https://github.com/datalad/datalad-core

@mih mih closed this as completed Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants