Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function allowlist for Executor #1146

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Function allowlist for Executor #1146

wants to merge 2 commits into from

Conversation

diptanu
Copy link
Collaborator

@diptanu diptanu commented Jan 5, 2025

Making the executors only run functions of specified compute graphs.
The functions are specified as
indexify-cli executor --function <namespace>:<workflow>:<compute_func>:<version> --function ....

The allowlist is used with Executors in production setups. E.g. only run
a certain function in a Kubernetes POD with an Executor.
This is a default production setup because each function has different
resource usage and it’s easier to size containers per function.
Same with other container configs like secrets, roles, volumes.

indexify-cli executor --dev mode is still present. It makes server try
to run any function on the executor. This is convenient for development
of Indexify and for integration testing.

Co-authored-by: Diptanu Gon Choudhury [email protected]
Co-authored-by: Eugene Batalov [email protected]

@diptanu diptanu force-pushed the executor-containers branch from 38dc7bf to 864f356 Compare January 5, 2025 05:01
@eabatalov eabatalov self-requested a review January 6, 2025 11:16
@@ -24,11 +24,14 @@ def __init__(
self,
executor_id: str,
code_path: Path,
compute_graph: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be optional.

@@ -28,11 +33,19 @@ def __init__(
self.config_path = config_path
self._logger = structlog.get_logger(module=__name__)

hostname = socket.gethostname()
ip_address = socket.gethostbyname(hostname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks error prone as we don't guarantee that this IP address is reachable by server if we obtain it this way. It's better to pass it in CLI args into Executor. Because this depends on deployment environment. I'm also not sure why we need to pass this IP address to the Server because so far Server wasn't calling Executor but Executor was calling Server which is guaranteed to have a hostname reachable by Executors.

@@ -525,11 +524,21 @@ pub struct InvocationId {
pub id: String,
}

#[derive(Debug, Serialize, Deserialize, ToSchema)]
pub struct FunctionContainer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In current OpenSource terms this is FunctionExecutor.

ip_address = socket.gethostbyname(hostname)

functions = []
if namespace is not None and compute_graph is not None and function is not None and graph_version is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I expected all these fields to be optional and passed directly to Server and it does filtering on its side with whatever provided values. E.g. if only namespace and compute_graph are set then server sees this in the ExecutorMetadata and then routes tasks for the namespace and the compute_graph to this Executor.

#[derive(Debug, Serialize, Deserialize, ToSchema)]
pub struct ExecutorMetadata {
pub id: String,
pub executor_version: String,
pub addr: String,
pub functions: Vec<FunctionContainer>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not instantiated FunctionContainers/FunctionExecutors. Because of this I'd call these things as Executor filtering labels. Server will need to eventually track FunctionContainers/FunctionExecutors that really exist. Not mixing the filtering labels and existing FunctionContainers/FunctionExecutors would be nice imo.

@diptanu diptanu force-pushed the executor-containers branch from a005315 to 6fdec9e Compare January 8, 2025 07:50
@@ -27,11 +33,31 @@ def __init__(
self.config_path = config_path
self._logger = structlog.get_logger(module=__name__)

hostname = socket.gethostname()
ip_address = socket.gethostbyname(hostname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines also result in requiring to run Executor under sudo, fyi.

@eabatalov eabatalov force-pushed the executor-containers branch from 54eb620 to f08a42d Compare January 9, 2025 18:59
@eabatalov eabatalov changed the title making executors start containers explicitly Function allowlist for Executor Jan 9, 2025
@eabatalov eabatalov force-pushed the executor-containers branch from f08a42d to 474ea2a Compare January 9, 2025 19:19
Making the executors only run functions of specified compute graphs.
The functions are specified as

`indexify-cli executor --function <namespace>:<workflow>:<compute_func>:<version> --function ...`.

The allowlist is used with Executors in production setups. E.g. only run
a certain function in a Kubernetes POD with an Executor.
This is a default production setup because each function has different
resource usage and it’s easier to size containers per function.
Same with other container configs like secrets, roles, volumes.

`indexify-cli executor --dev` mode is still present. It makes server try
to run any function on the executor. This is convenient for development
of Indexify and for integration testing.

Co-authored-by: Diptanu Gon Choudhury <[email protected]>
Co-authored-by: Eugene Batalov <[email protected]>
@eabatalov eabatalov force-pushed the executor-containers branch from 474ea2a to d85f840 Compare January 9, 2025 19:34
This allows users to implement versioning of their graphs with
any semantic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants