Complex-filters recap #2182

Closed
tcompa opened this issue Jan 15, 2025 · 5 comments
Labels: flexibility (Support more workflow-execution use cases)

Comments

tcompa (Collaborator) commented Jan 15, 2025

This placeholder issue will collect all remaining open questions and TBD items from:

tcompa added the flexibility (Support more workflow-execution use cases) label on Jan 15, 2025
tcompa (Collaborator, Author) commented Jan 16, 2025

Here is a first recap of the relevant points, as they are currently implemented in v2.11.

EDIT: we updated this comment after #2187, #2190 and #2191.

A review would be most welcome - cc @jluethi

Which objects have filters

  • DatasetV2 has columns for attribute_filters and type_filters.
  • JobV2 has an attribute_filters column.
  • WorkflowTaskV2 has a type_filters column.
  • TaskV2 has input_types and output_types columns.
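
For reference, the shapes of these columns correspond roughly to the following sketch (the type-alias names are illustrative, not the actual fractal-server definitions):

```python
from typing import Union

# Illustrative type aliases (names are not the actual fractal-server ones).
# Attribute filters: each key maps to a list of allowed scalar values.
AttributeFilters = dict[str, list[Union[int, float, str, bool]]]
# Type filters: each type name maps to a required boolean flag.
TypeFilters = dict[str, bool]

# DatasetV2:      attribute_filters (AttributeFilters), type_filters (TypeFilters)
# JobV2:          attribute_filters (AttributeFilters)
# WorkflowTaskV2: type_filters (TypeFilters)
# TaskV2:         input_types (TypeFilters), output_types (TypeFilters)
```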

Valid attribute filters / typical cases

Standard valid examples:

attribute_filters={}
-> do not apply any filter

attribute_filters={"key1": ["A"]}
-> require that image has attribute key1 equal to A

attribute_filters={"key1": ["A", "B"]}
-> require that image has attribute key1 equal to A or B

attribute_filters={"key1": [1, 2]}
-> require that image has attribute key1 equal to 1 or 2

attribute_filters={"key1": [True, False]}
-> require that image has attribute key1 equal to True or False (which corresponds to no filtering)

attribute_filters={"key1": [1.5, -1.2]}
-> require that image has attribute key1 equal to 1.5 or -1.2

attribute_filters={"key1": [1, 2], "key2": ["A", "B"]}
-> require that image has (attribute key1 equal to 1 or 2) and (attribute key2 equal to A or B)
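
In other words, values within a single key are combined with OR, while different keys are combined with AND. A minimal sketch of this matching logic (illustrative only, not the actual runner code; the function name and image layout are assumptions):

```python
def image_matches_attribute_filters(
    image_attributes: dict,
    attribute_filters: dict[str, list],
) -> bool:
    """Return True if the image attributes satisfy all attribute filters."""
    for key, allowed_values in attribute_filters.items():
        # AND across keys, OR within a single key.
        if image_attributes.get(key) not in allowed_values:
            return False
    return True


image = {"key1": "A", "key2": 2}
assert image_matches_attribute_filters(image, {})                    # no filtering
assert image_matches_attribute_filters(image, {"key1": ["A", "B"]})  # key1 equal to A or B
assert not image_matches_attribute_filters(image, {"key1": ["A"], "key2": [3]})
```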

Valid attribute filters / unusual cases (TBD)

There are two edge cases which are currently considered valid:

First

attribute_filters={"key1": []}
-> require that image has attribute key1 in [], meaning that no image could ever match with this filter

This one is formally correct but meaningless, and we could turn it into an invalid case.

Second

attribute_filters={"key1": None, "key2": ["A", "B"]}
-> require that image has attribute key2 equal to A or B, and do not apply any filtering for key1

This is due to:

  • Remaining backwards-compatible (since up to v2.10 setting an attribute value to None would disable that filter).
  • The fact that we often had to merge different sets of filters (not any more, see Replace attribute filters rather than merging them #2155), and this was a way to represent a patch that would disable a previously set filter.

The latter reason no longer holds, and I guess the first one is not very relevant in real-life usage, so I'd be in favor of deprecating this option. Either way, we will have to review this (because it is currently not fully backwards compatible in the API schemas) and update the data-migration script.

Invalid attribute filters

The following sets of attribute filters are currently considered invalid:

{True: ["value"]}
because the key is not a string

{1: ["value"]},  # non-string key
because the key is not a string

{"key1": 1}
because the value is not a list

{"key1": True}
because the value is not a list

{"key1": "something"}
because the value is not a list

{"key1": [1], " key1  ": [1]}
because keys are not unique, after their normalization

{"key1": [None]}
because None is not a valid type

{"key1": [1, 2.0]}
because 1 is an integer and 2.0 is a float
(if needed, we could modify this by defining a new type which is a union of int and float)

{"key1": [1, "a"]}
because the list of values has non-homogeneous types

{"key1": [1, True]}
because the list of values has non-homogeneous types

{"key1": [[1, 2], [1, 2]]}
because the list of values has non-scalar items
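
A rough illustration of these constraints as a standalone check (a sketch only; the actual validation lives in Pydantic schemas, and the key normalization is assumed here to be whitespace stripping):

```python
SCALAR_TYPES = (int, float, str, bool)


def validate_attribute_filters(attribute_filters: dict) -> None:
    """Raise ValueError for the invalid cases listed above (illustrative sketch)."""
    normalized_keys = []
    for key, values in attribute_filters.items():
        if not isinstance(key, str):
            raise ValueError(f"Non-string key: {key!r}")
        normalized_keys.append(key.strip())  # assumed normalization: strip whitespace
        if values is None:
            continue  # the (to-be-deprecated) "None disables this filter" case
        if not isinstance(values, list):
            raise ValueError(f"Value for {key!r} is not a list")
        if not all(isinstance(value, SCALAR_TYPES) for value in values):
            raise ValueError(f"Non-scalar or None item for {key!r}")
        # Homogeneous types only (note that bool and int count as different types).
        if len({type(value) for value in values}) > 1:
            raise ValueError(f"Non-homogeneous value types for {key!r}")
    if len(set(normalized_keys)) != len(normalized_keys):
        raise ValueError("Keys are not unique after normalization")
```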

Net effect of running a job on dataset filters

  1. Running a job cannot lead to a change of dataset.attribute_filters, since there is no mechanism for changing dataset.attribute_filters via job execution.
  2. Running a job can lead to a change of dataset.type_filters.

Which filters are consumed within the runner

  1. Dataset attribute filters are ignored in the runner; job attribute filters are the only ones that are applied.
  2. The clients are responsible for offering an easy way for the user to say "let's use the dataset filters as job filters", or to provide a custom set of filters.
  3. When preparing the list of images to be processed by a single task, the following filters are used:
    • job.attribute_filters (identical for all tasks)
    • dataset.type_filters (may change during workflow execution)
    • workflowtask.type_filters and workflowtask.task.input_types (both may be different for each task)
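
A sketch of how these three sources could combine when selecting the images for a single task (the function and argument names, the image layout, and the merge precedence are illustrative assumptions, not the actual runner code):

```python
def select_images_for_task(images, job_attribute_filters, dataset_type_filters,
                           wftask_type_filters, task_input_types):
    """Illustrative per-task image selection (not the actual runner code)."""
    # Combine the type-filter sources (assumed precedence, left to right).
    type_filters = {**dataset_type_filters, **wftask_type_filters, **task_input_types}
    selected = []
    for image in images:
        attributes_ok = all(
            image["attributes"].get(key) in values
            for key, values in job_attribute_filters.items()
        )
        types_ok = all(
            image["types"].get(key, False) == flag
            for key, flag in type_filters.items()
        )
        if attributes_ok and types_ok:
            selected.append(image)
    return selected
```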

Which filters are produced/updated within the runner

After a task has run, there is a single possible source for updates to the dataset type_filters, namely the workflowtask.task.output_types. This is typically set in the task-package manifest.
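
As a minimal sketch of this update (plain dict-merge semantics and the type names are assumptions for illustration):

```python
def updated_dataset_type_filters(current_type_filters: dict, output_types: dict) -> dict:
    """Illustrative update of dataset type filters after a task has run."""
    return {**current_type_filters, **output_types}


# E.g. a task declaring output_types={"3D": False, "registered": True}
# (hypothetical type names) sets/overrides those flags in the dataset type filters:
assert updated_dataset_type_filters(
    {"3D": True}, {"3D": False, "registered": True}
) == {"3D": False, "registered": True}
```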

tcompa (Collaborator, Author) commented Jan 21, 2025

We edited the comment above (#2182 (comment)) as per the latest discussions and updates.

@jluethi: a review would be useful (this can go together with actual tests on the staging server). A minor point which is still to be confirmed is the one in the "Valid attribute filters / unusual cases" section.

(cc @ychiucco)

jluethi (Collaborator) commented Jan 21, 2025

@tcompa Agreed to all! Looks good to me. I'll review the additional type-filter questions in that issue separately.

attribute_filters={"key1": []}

I have no strong opinion on this. In general, we should not allow people to actually set such filters in the interface.

attribute_filters={"key1": None, "key2": ["A", "B"]}
...
Thus I'd be in favor of deprecating this option.

For job attribute filters, we can remove this without an issue. Given that the only source of defaults for job attribute filters is the dataset attribute filters, I don't see a use for keeping the "None = unset the filter" support anymore either.
Agreed on deprecating it.

{"key1": 1}
because the value is not a list

That was the old data structure, right? As long as we transition all existing dataset attribute filters that may have been {"key1": 1} to {"key1": [1]}, I'm fine with that. I don't think they will have been used very often.

tcompa (Collaborator, Author) commented Jan 21, 2025

{"key1": 1}
because the value is not a list

That was the old data structure, right?

Yes.

As long as we transition over all existing dataset attribute filters that may have been {"key1": 1} to {"key1": [1]}, I'm fine with that.

Yes, this is part of a data-migration script for v2.11.0.
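
As an illustration, the transformation could look roughly like this (a sketch; the handling of None-valued entries is an assumption, not the actual migration code):

```python
def migrate_attribute_filters(old_filters: dict) -> dict:
    """Sketch of migrating old-style scalar filters to the new list-based format."""
    new_filters = {}
    for key, value in old_filters.items():
        if value is None:
            continue  # old "None disables this filter" entries are dropped (assumed)
        new_filters[key] = value if isinstance(value, list) else [value]
    return new_filters


assert migrate_attribute_filters({"key1": 1, "key2": None}) == {"key1": [1]}
```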

tcompa (Collaborator, Author) commented Jan 28, 2025

This was a recap issue for 2.11 work. Closing.
