Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data structure for filters #2153

Open
tcompa opened this issue Dec 19, 2024 · 2 comments · Fixed by #2168
Open

Data structure for filters #2153

tcompa opened this issue Dec 19, 2024 · 2 comments · Fixed by #2168
Assignees
Labels
flexibility Support more workflow-execution use cases

Comments

@tcompa
Copy link
Collaborator

tcompa commented Dec 19, 2024

As of fractal-analytics-platform/fractal-web#678 (comment), we are not anymore defining attribute filters for workflow tasks. But we still have type filters. On the other hand, we are introducing attribute filters for jobs, but not type filters. This would be a validation layer which is not defined in the database schema but rather in the application/API schemas, which is a scenario we always try to avoid (it's more complex and error prone, and it requires redundant definitions).

Our suggestion (with @mfranzon) is that we should also split the data structure in two parts, attributes and types, rather than having single homogeneous filters objects + validation.

We describe the different options below, and suggest that we pick the third (or even fourth) one.

Notes:

  • We describe the different options under the assumption that we don't need the "exclude" field any more (see Drop attributes_exclude and keep attributes_include #2154), but this is not a critical point.
  • Task input_types are not used for filtering, but rather to validate the image list after filtering took place. Thus they are not related to the current discussion, and they would not require any change.

1. Current situation

DatasetV2.filters = {"types": {"is_3D": false}, "attributes": {"well": "B03"}}
WorkflowTaskV2.intput_filters: {"types": {"is_3D": false}, "attributes": {"well": "B03"}}

2. Original plan for complex filters

DatasetV2.filters = {"types": {"is_3D": false}, "attributes": {"well": ["B03"]}}   # note that we now have a list of possible wells
WorkflowTaskV2.input_filters: {"types": {"is_3D": false}}                          # note that here we only accept "types"
JobV2.filters = {"attributes": {"well": ["B03"]}}                                  # note that here we only accept "attributes"

3. Improved plan for complex filters

DatasetV2.type_filters = {"is_3D": false}
DatasetV2.attribute_filters = {"well": ["B03"]}
WorkflowTaskV2.type_filters: {"is_3D": false}
JobV2.attribute_filters = {"well": ["B03"]}}

At the cost of one additional field, we are removing one level of nesting from all these JSON objects.

4. Improved plan for complex filters / without attribute filters for dataset

To discuss: are dataset attribute filters actually relevant? If not, we could go even further and have

DatasetV2.type_filters = {"is_3D": false}
WorkflowTaskV2.type_filters: {"is_3D": false}
JobV2.attribute_filters = {"well": "B03"}}
@jluethi
Copy link
Collaborator

jluethi commented Dec 19, 2024

I'd be in favor of 3. I do think there is likely a point for dataset attribute filters.

@tcompa tcompa changed the title To discuss: Data structure for filters Data structure for filters Jan 7, 2025
@tcompa
Copy link
Collaborator Author

tcompa commented Jan 7, 2025

Recap:

  1. New columns:

DatasetV2.type_filters = {"is_3D": false}
DatasetV2.attribute_filters = {"well": ["B03"]}
WorkflowTaskV2.type_filters: {"is_3D": false}
JobV2.attribute_filters = {"well": ["B03"]}}

  1. Old columns cannot be removed from db models until the new release is deployed - because they are needed for the data migration.
  2. Old columns should be removed from API schemas.
  3. This then must be implemented in the runner as well - see upcoming issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flexibility Support more workflow-execution use cases
Projects
Development

Successfully merging a pull request may close this issue.

3 participants