[RFC] Platform-Level Security Enhancements #5052

Open
cwperks opened this issue Jan 22, 2025 · 2 comments
Labels: enhancement, untriaged

Comments

@cwperks
Member

cwperks commented Jan 22, 2025

Platform-Level Security Enhancements

This RFC is a collection of several efforts pertinent to Security for OpenSearch and Plugins. Collectively, these RFCs are about positioning OpenSearch and the security plugin to better manage security across the entire platform of OpenSearch and the plugins.

The RFCs in this list seek to:

  1. Move towards a zero-trust model for plugins and empower cluster administrators with controls to explicitly grant plugins access instead of relying on implicit trust
  2. Remove the 2 most common reasons why plugins have dependencies on common-utils
    • Create a consistent and re-usable mechanism for resource sharing and authorization that empowers a resource owner to specify the level of access when sharing
    • Provide a replacement for Roles Injection that puts the security plugin into critical paths and removes the reliance on plugins telling security what roles to use when evaluating actions

RFCs:

Below is a high-level overview of each RFC. See the links above for further details.

1. JSM Replacement

The Java Security Manager (JSM) was once a core feature of Java that allowed a system administrator to control the access that code running within the JVM has to system resources such as the file system, the network and other sensitive operations. OpenSearch uses JSM to sandbox plugins and prevent them from performing system operations without explicit approval. A cluster administrator agrees to JSM policies when installing a plugin. In JDK 17, JSM was deprecated for removal (see JEP 411) without a replacement. Below are 2 snippets from the JEP:

It is not a goal to provide a replacement for the Security Manager.

In the quarter-century since the Security Manager was introduced, adoption has been low. Only a handful of applications ship with policy files that constrain their own operations (e.g., ElasticSearch).

Although deprecated, JSM remained functional through JDK 23. In JDK 24 it will be permanently disabled (see JEP 486).

The first RFC is about providing a replacement that offers sufficient security to sandbox OpenSearch and its plugins.

2. ThreadContext.stashContext deprecation and replacement - Strengthen System Index Protection in the Plugin Ecosystem

This project is about making sure that plugins have access to perform the actions they need, but are restricted otherwise.

Current Model

Currently, when plugins want to access a system index, they wrap index operations in a block similar to:

try (ThreadContext.StoredContext ctx = threadContext.stashContext()) {
     // system index operations here
}

This executes the wrapped action in a fresh context and is done without authorization checks (analogous to sudo).
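
To make the "fresh context" behaviour concrete, here is a minimal, self-contained sketch. ToyThreadContext is a hypothetical stand-in written for this illustration, not the OpenSearch ThreadContext class; it only models the idea that stashing swaps in an empty set of headers and that closing the handle restores the previous ones.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a thread context (assumption: NOT the real OpenSearch class).
final class ToyThreadContext {
    private Map<String, String> headers = new HashMap<>();

    void putHeader(String key, String value) { headers.put(key, value); }

    String getHeader(String key) { return headers.get(key); }

    // Swap in an empty header map; closing the returned handle restores the old one.
    StoredContext stashContext() {
        final Map<String, String> saved = headers;
        headers = new HashMap<>();
        return () -> headers = saved;
    }

    interface StoredContext extends AutoCloseable {
        @Override
        void close(); // no checked exception, so try-with-resources needs no catch
    }
}

final class StashContextSketch {
    // Returns the "user" header as seen before, inside and after the stashed block.
    static String[] observeUser() {
        ToyThreadContext ctx = new ToyThreadContext();
        ctx.putHeader("user", "alice"); // hypothetical header name
        String before = ctx.getHeader("user");
        String inside;
        try (ToyThreadContext.StoredContext ignored = ctx.stashContext()) {
            inside = ctx.getHeader("user"); // null: the caller's identity is gone here
        }
        String after = ctx.getHeader("user");
        return new String[] { before, inside, after };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observeUser()));
    }
}
```

Inside the block nothing identifies the caller, so in this toy model an authorization layer reading the context has no user or plugin to evaluate.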

Proposed Model

With the goal of moving to zero-trust for plugins, this RFC seeks to create a replacement for ThreadContext.stashContext which provides the context necessary for the security plugin to perform authorization checks. With the replacement, plugins will be able to perform actions directly on their own system indices but are prevented from performing other actions on the cluster without explicit approval. In order to perform non-system index actions, a cluster administrator would need to accept the terms at installation time, similar to JSM policies.

With the concept of a Subject introduced into the IdentityPlugin extension point in core, the replacement for ThreadContext.stashContext() will involve using a subject associated with the plugin. Plugins that utilize system indices can request a subject by extending the IdentityAwarePlugin extension point and use this subject to run code. For example:

pluginSubject.runAs(() -> {
    // system index operations here
});

Using this replacement will inject the necessary information for security to authorize actions in code wrapped by this block.
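
As a rough sketch of why running under a subject helps, the toy model below records which plugin is acting and lets an authorizer allow only that plugin's registered system indices. Everything here is an assumption made for illustration: ToyPluginSubject, ToyAuthorizer, the Runnable-based runAs signature and the index names are hypothetical and do not mirror the actual core or security plugin APIs.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of a plugin-scoped subject; names are illustrative only.
final class ToyPluginSubject {
    private static final ThreadLocal<String> CURRENT_PLUGIN = new ThreadLocal<>();
    private final String pluginName;

    ToyPluginSubject(String pluginName) { this.pluginName = pluginName; }

    // Run the action with this plugin recorded as the acting identity.
    void runAs(Runnable action) {
        String previous = CURRENT_PLUGIN.get();
        CURRENT_PLUGIN.set(pluginName);
        try {
            action.run();
        } finally {
            if (previous == null) {
                CURRENT_PLUGIN.remove();
            } else {
                CURRENT_PLUGIN.set(previous);
            }
        }
    }

    static String currentPlugin() { return CURRENT_PLUGIN.get(); }
}

// Hypothetical authorizer: a plugin may only touch system indices registered to it.
final class ToyAuthorizer {
    private final Map<String, Set<String>> systemIndicesByPlugin;

    ToyAuthorizer(Map<String, Set<String>> systemIndicesByPlugin) {
        this.systemIndicesByPlugin = systemIndicesByPlugin;
    }

    boolean isAllowed(String index) {
        String plugin = ToyPluginSubject.currentPlugin();
        if (plugin == null) {
            return false; // no subject in context: nothing to authorize against
        }
        return systemIndicesByPlugin.getOrDefault(plugin, Set.of()).contains(index);
    }
}

final class PluginSubjectSketch {
    static boolean[] demo() {
        ToyAuthorizer authz = new ToyAuthorizer(Map.of("ad-plugin", Set.of(".ad-detectors")));
        ToyPluginSubject subject = new ToyPluginSubject("ad-plugin");
        boolean[] results = new boolean[3];
        subject.runAs(() -> {
            results[0] = authz.isAllowed(".ad-detectors"); // plugin's own system index
            results[1] = authz.isAllowed("customer-logs"); // not registered to the plugin
        });
        results[2] = authz.isAllowed(".ad-detectors");     // outside runAs: no subject
        return results;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

The contrast with stashing is that the context is never empty: the authorizer always knows which plugin is acting and can deny anything outside what that plugin was granted.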

3. Resource Sharing + Authorization

At a high-level, this seeks to solve 2 problems:

  1. Provide a consistent and re-usable mechanism from the security plugin - this aims to remove a lot of code duplication across plugins with sharable resources and ensures more consistent behavior across these plugins
  2. Provide a richer authorization scheme that empowers resource owners to specify the access level when sharing their own resources

Current Model

Comment explaining problems with current sharing and authz model for sharable plugin resources: opensearch-project/OpenSearch#16030 (comment)

There is widespread use amongst plugins of a setting called filter_by_backend_role. This setting provides a crude implementation of sharing where a resource is shared with other users on the platform if they share backend roles with the creator of the resource. For example, the creator of an anomaly detector automatically shares a detector with users that they share a backend role with.

What the end user can do with that detector is based on the roles they are mapped to. The user who creates the detector has no mechanism to specify the level of access when sharing.

For example, if the end-user is mapped to anomaly_detection_read_only then they cannot modify the detector, but if they are mapped to anomaly_detection_full_access then they have full access on the detector.

The steps that plugin developers take are:

  1. Set up a REST API that will create a resource as part of handling the request (e.g. a REST API to create a model group - ml commons use-case)
  2. When a user calls this API with credentials, security will authenticate the request and inject user info into the ThreadContext
  3. Plugins parse the authenticated user from the ThreadContext using common-utils. Like this.
  4. The plugin will store a copy of the user (including their backend roles) with the resource metadata in a system index
  5. When searching through the resource index, plugins will find resources owned by the current user or owned by a user they share a backend role with
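
The visibility rule in step 5 can be sketched as a simple predicate. This is a minimal illustration only: the User and Resource classes are hypothetical stand-ins for the user copy stored with each resource, and real plugins typically express this filter as a query against their system index rather than an in-memory loop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class BackendRoleFilterSketch {
    // Hypothetical stand-in for the user copy stored with each resource.
    static final class User {
        final String name;
        final Set<String> backendRoles;

        User(String name, Set<String> backendRoles) {
            this.name = name;
            this.backendRoles = backendRoles;
        }
    }

    // Hypothetical stand-in for a sharable resource and its stored owner.
    static final class Resource {
        final String id;
        final User owner;

        Resource(String id, User owner) {
            this.id = id;
            this.owner = owner;
        }
    }

    // Visible if the current user owns it or shares at least one backend role with the owner.
    static boolean isVisible(Resource resource, User current) {
        if (resource.owner.name.equals(current.name)) {
            return true;
        }
        return !Collections.disjoint(resource.owner.backendRoles, current.backendRoles);
    }

    static List<String> visibleIds(List<Resource> all, User current) {
        List<String> ids = new ArrayList<>();
        for (Resource resource : all) {
            if (isVisible(resource, current)) {
                ids.add(resource.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        User alice = new User("alice", Set.of("analysts"));
        User bob = new User("bob", Set.of("analysts", "ops"));
        User carol = new User("carol", Set.of("finance"));
        List<Resource> all = List.of(new Resource("d1", alice), new Resource("d2", carol));
        System.out.println(visibleIds(all, bob)); // bob shares "analysts" with alice only
    }
}
```

Note that the predicate only answers "can this user see the resource"; it says nothing about what they may do with it, which is the gap described in the paragraphs that follow.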

For a mental model, imagine the example of a Searchable Photo Album Plugin (relevant to me as I got married recently and we had to allow guests to upload photos, but wanted to prohibit them from removing any).

Let's say this plugin defines 3 access levels: 1) full_access, 2) comment and 3) read_only (note: comment implies read access as well)

In the current security model, when the creator of an album shares it with other users they cannot specify that the target user only has read access, comment access or full access. They can share it with a target user, but the target user's level of access is determined by their roles mapping.

For any Amazonians reading this, imagine if Quip didn't allow the creator of a doc to specify the level of access when sharing. Instead, Quip administrators would give blanket full access, blanket read access or blanket read + comment access to a user and then the user would get that level of access over any doc shared with them.

Proposed Model

At a high-level, security will introduce a method to register sharable resource indices with the security plugin. Since security will be conscious of the indices where sharable resources are stored, it can intercept calls and ensure that security is applied consistently. This removes a lot of code duplication across plugins in the default distribution that are left reimplementing the same pattern described above.

To re-iterate, there are a couple of things this RFC is trying to solve:

  1. Centralize the logic for filter_by_backend_role so that the security plugin determines which resources are accessible to the authenticated user
  2. Ultimately, provide fine-grained access for plugin resources where the sharer can specify the level of access when sharing
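
To illustrate the second goal with the photo album example, here is a minimal sketch in which the sharer picks the access level per recipient. The class, the action names and the assumption that levels are strictly ordered (read_only < comment < full_access) are all hypothetical; the RFC does not prescribe this data model.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class SharedAlbumSketch {
    // Assumption: levels are ordered weakest to strongest, and a stronger level implies the weaker ones.
    enum AccessLevel { READ_ONLY, COMMENT, FULL_ACCESS }

    // Hypothetical actions, each tagged with the minimum level it requires.
    enum Action {
        VIEW(AccessLevel.READ_ONLY),
        ADD_COMMENT(AccessLevel.COMMENT),
        DELETE_PHOTO(AccessLevel.FULL_ACCESS);

        final AccessLevel required;

        Action(AccessLevel required) { this.required = required; }
    }

    private final String owner;
    private final Map<String, AccessLevel> shares = new HashMap<>();

    SharedAlbumSketch(String owner) { this.owner = owner; }

    // The sharer, not a roles mapping, decides the recipient's level.
    void share(String user, AccessLevel level) { shares.put(user, level); }

    boolean isAllowed(String user, Action action) {
        if (owner.equals(user)) {
            return true; // the owner can always act on their own resource
        }
        AccessLevel granted = shares.get(user);
        return granted != null && granted.compareTo(action.required) >= 0;
    }

    static boolean[] demo() {
        SharedAlbumSketch album = new SharedAlbumSketch("owner");
        album.share("guest", AccessLevel.COMMENT);
        return new boolean[] {
            album.isAllowed("guest", Action.VIEW),         // comment implies read
            album.isAllowed("guest", Action.ADD_COMMENT),
            album.isAllowed("guest", Action.DELETE_PHOTO), // would need full_access
            album.isAllowed("stranger", Action.VIEW)       // never shared with
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

In this sketch the level travels with the share itself, so the same recipient can hold comment access on one album and read-only access on another.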

4. Roles Injection Deprecation + Replacement

To understand the motivation for this effort, it's helpful to first articulate the current model for async task security:

Current Model

See comment on the Pull-Based Ingestion PR that describes the current paradigm for async task (scheduled job) security: opensearch-project/OpenSearch#16958 (comment)

The conventional pattern currently used across the system is:

  1. Set up a REST API that will create a resource as part of handling the request (e.g. a REST API to create an anomaly detector - anomaly detection use-case)
  2. When a user calls this API with credentials, security will authenticate the request and inject user info into the ThreadContext
  3. Plugins parse this user from the ThreadContext using common-utils. Like this.
  4. The plugin will store a copy of the user (including their mapped roles) with the resource metadata in a system index

^ The steps above are all about setup. Their purpose is to capture the security roles that will be injected back into the ThreadContext when an async task executes.

The steps below are for using the user that was persisted above and instructing the security plugin how to authorize any actions:

  1. Obtain the user info that was persisted above
  2. Use common-utils InjectSecurity to inject roles into the thread context (Example). This is done because async tasks run in a fresh thread context, so there are no credentials to authenticate.
  3. When security authorizes transport actions it will use the roles that were injected into the ThreadContext.
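
The setup and run steps above can be sketched end to end as follows. This is a toy model only: the maps stand in for the plugin's system index and for security's role definitions, and the header key and action names are hypothetical rather than the ones used by common-utils or the security plugin.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RolesInjectionSketch {
    // Hypothetical header key; not the real key used by common-utils.
    static final String INJECTED_ROLES = "injected_roles";

    // Stands in for the plugin's system index: job id -> roles copied from the creating user.
    private final Map<String, List<String>> rolesByJob = new HashMap<>();
    // Stands in for security's role definitions: role -> permitted actions.
    private final Map<String, Set<String>> actionsByRole;

    RolesInjectionSketch(Map<String, Set<String>> actionsByRole) {
        this.actionsByRole = actionsByRole;
    }

    // Setup: the plugin persists the user's mapped roles with the job.
    void createJob(String jobId, List<String> mappedRoles) {
        rolesByJob.put(jobId, List.copyOf(mappedRoles));
    }

    // Run: the task starts in a fresh context with no credentials, so the plugin injects the stored roles.
    boolean runJob(String jobId, String action) {
        Map<String, String> freshContext = new HashMap<>();
        freshContext.put(INJECTED_ROLES, String.join(",", rolesByJob.getOrDefault(jobId, List.of())));
        return authorize(freshContext, action);
    }

    // Security evaluates the action against whatever roles it finds in the context.
    private boolean authorize(Map<String, String> context, String action) {
        for (String role : context.getOrDefault(INJECTED_ROLES, "").split(",")) {
            if (actionsByRole.getOrDefault(role, Set.of()).contains(action)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RolesInjectionSketch sketch = new RolesInjectionSketch(
            Map.of("ad_read", Set.of("detector/search")));
        sketch.createJob("job-1", List.of("ad_read"));
        System.out.println(sketch.runJob("job-1", "detector/search"));
    }
}
```

The point to notice is that authorize has no way to verify where the roles came from: the plugin is the one telling security which roles to use.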

Proposed Model

This RFC seeks to put security in the middle and centrally store the permissions that a job should run with in an index owned and managed by the security plugin. As part of this effort, we plan an interim milestone of introducing API Tokens, which bring key capabilities to the security plugin that can be used to formally deprecate Roles Injection and provide a replacement. In the proposed model, security will centrally store the authorization info necessary for a job's runtime and hook into the Job Scheduler before runJob is invoked, ensuring that the security plugin has the information it needs to authorize any actions executed while the job runs.
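
One way to picture the proposed flow is the toy sketch below, where the scheduler asks security for a job's permissions before running it. ToySecurityPlugin, ToyJobScheduler, the beforeRunJob hook and the action strings are all hypothetical names chosen for this illustration; the actual hook and storage format are still to be defined by the RFC.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical security-owned store of what each job is allowed to do.
final class ToySecurityPlugin {
    private final Map<String, Set<String>> permissionsByJob = new HashMap<>();

    void registerJob(String jobId, Set<String> allowedActions) {
        permissionsByJob.put(jobId, Set.copyOf(allowedActions));
    }

    // Hook called by the scheduler before a job runs; unknown jobs get no permissions.
    Predicate<String> beforeRunJob(String jobId) {
        Set<String> allowed = permissionsByJob.getOrDefault(jobId, Set.of());
        return allowed::contains;
    }
}

// Hypothetical scheduler that consults security instead of trusting plugin-supplied roles.
final class ToyJobScheduler {
    private final ToySecurityPlugin security;

    ToyJobScheduler(ToySecurityPlugin security) { this.security = security; }

    // Returns the actions that were authorized and therefore executed.
    List<String> runJob(String jobId, List<String> attemptedActions) {
        Predicate<String> authorizer = security.beforeRunJob(jobId);
        List<String> executed = new ArrayList<>();
        for (String action : attemptedActions) {
            if (authorizer.test(action)) {
                executed.add(action);
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        ToySecurityPlugin security = new ToySecurityPlugin();
        security.registerJob("job-1", Set.of("indices:data/read/search"));
        ToyJobScheduler scheduler = new ToyJobScheduler(security);
        System.out.println(scheduler.runJob("job-1",
            List.of("indices:data/read/search", "indices:admin/delete")));
    }
}
```

Compared with roles injection, the plugin never handles roles in this sketch; the only input it supplies is the job id, and security resolves the rest from its own store.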

cwperks added the enhancement and untriaged labels Jan 22, 2025
@cwperks
Member Author

cwperks commented Jan 22, 2025

I'm confident that tackling these areas both enhances the overall security of the platform and would allow us to publish an SDK for plugin developers (+ docs) that allows plugin developers to focus on the core functionality of their plugin and not worry about the intricacies of interfacing with security. There is very tight coupling in the ecosystem today and this seeks to bring greater consistency and re-use.

@kkhatua
Member

kkhatua commented Jan 22, 2025

Given that making a clean switch to a more secure model can be difficult and very disruptive, I like that this not only does that with 1, 2 and 4, but is also forward-thinking with #3 (Resource Sharing + Authorization).
