Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent tasks from setting an output attribute filter #2170

Open
tcompa opened this issue Jan 8, 2025 · 5 comments · Fixed by #2168
Open

Prevent tasks from setting an output attribute filter #2170

tcompa opened this issue Jan 8, 2025 · 5 comments · Fixed by #2168
Labels
flexibility Support more workflow-execution use cases runner

Comments

@tcompa
Copy link
Collaborator

tcompa commented Jan 8, 2025

Job attribute filters always take priority over dataset attribute filters. The dataset attribute filters are replaced by the job filters.

What if a task sets an output attribute filter? Do we even allow that? We probably should not


Types: First set by datasets type filters. Overwritten by workflow task input type filter. Workflow task output can set new types that overwrite both before.

Originally posted by @jluethi in #2155 (comment)

@tcompa tcompa added flexibility Support more workflow-execution use cases runner labels Jan 8, 2025
@tcompa
Copy link
Collaborator Author

tcompa commented Jan 8, 2025

I confirm that setting both origin and attributes/plate in the output is valid, and the value which will take priority is the one in attributes/plate.

Originally posted by @tcompa in fractal-analytics-platform/fractal-tasks-core#866 (comment)

@jluethi
Copy link
Collaborator

jluethi commented Jan 8, 2025

We should certainly still allow tasks to set new attributes to be added to the image list (like it's done in the projection task). I still think it probably makes sense that tasks can't set attribute filters, given what we have planned for the attributes.

I'm not aware of any task that would set its own attribute filters so far. The MIP task uses a is_3D=True type filter and sets it to is_3D=False after it ran. The plate attribute isn't used for filtering in standard workflows

@tcompa
Copy link
Collaborator Author

tcompa commented Jan 8, 2025

You are right, thanks. I opened the issue as a self-reminder, before reviewing the current code, and I mixed up attributes and attribute filters.

I'm not aware of any task that would set its own attribute filters so far.

I confirm that this is the case (at least for for fractal-tasks-core, scmultiplex and fractal-faim-ipa).

@tcompa tcompa changed the title What if a task sets an output attribute filter? Prevent tasks from setting an output attribute filter Jan 8, 2025
@tcompa tcompa mentioned this issue Jan 8, 2025
3 tasks
@tcompa
Copy link
Collaborator Author

tcompa commented Jan 13, 2025

Here a more systematic check - for these repositories

$ cat repositories.txt 
https://github.com/fractal-analytics-platform/fractal-tasks-core
https://github.com/fractal-analytics-platform/fractal-faim-ipa
https://github.com/leukemia-kispi/operetta-compose
https://github.com/fractal-analytics-platform/fractal-lif-converters
https://github.com/fractal-analytics-platform/fractal-helper-tasks
https://github.com/fmi-basel/gliberal-scMultipleX
https://github.com/m-albert/fractal-ome-zarr-hcs-stitching
https://github.com/pelkmanslab/abbott

with this script

#!/bin/bash


CURRENT_FOLDER=$(pwd)
for REPO in $(cat repositories.txt); do
    echo "$REPO"
    git clone $REPO tmpclone --quiet
    cd tmpclone
    git grep filters | grep "\.py"
    cd $CURRENT_FOLDER
    rm -rf tmpclone
    echo
done

Output:

https://github.com/fractal-analytics-platform/fractal-tasks-core
fractal_tasks_core/cellvoyager/metadata.py:    # and are included in the filters (see issue #287)
poetry.lock:    {file = "pandocfilters-1.5.1-py2.py3-none-any.whl", hash = "sha256:93be382804a9cdb0a7267585f157e5d1731bbe5545a85b268d6f5fe6232de2bc"},

https://github.com/fractal-analytics-platform/fractal-faim-ipa

https://github.com/leukemia-kispi/operetta-compose

https://github.com/fractal-analytics-platform/fractal-lif-converters

https://github.com/fractal-analytics-platform/fractal-helper-tasks
src/fractal_helper_tasks/convert_2D_segmentation_to_3D.py:        filters=dict(
src/fractal_helper_tasks/rechunk_zarr.py:            filters=dict(types=dict(rechunked=True)),
tests/test_rechunk_zarr.py:        filters=dict(types=dict(rechunked=True)),

https://github.com/fmi-basel/gliberal-scMultipleX
src/scmultiplex/aics_shparam/shtools.py:from skimage import filters as skfilters
src/scmultiplex/aics_shparam/shtools.py:            img = skfilters.gaussian(img.astype(np.float32), sigma=sigma_scaled)
src/scmultiplex/aics_shparam/shtools.py:            img = skfilters.gaussian(
src/scmultiplex/meshing/LabelFusionFunctions.py:from skimage.filters import gaussian, threshold_otsu

https://github.com/m-albert/fractal-ome-zarr-hcs-stitching

https://github.com/pelkmanslab/abbott

TLDR

src/fractal_helper_tasks/rechunk_zarr.py does set filters=dict(types=dict(rechunked=True)), but that's the only known example.

@jluethi
Copy link
Collaborator

jluethi commented Jan 13, 2025

Makes sense. I guess most tasks that set output filters do this via manifest settings. The reason I did not do this with rechunking is that the rechunked type is a bit of an edge-case. And I'm only setting it if input data is not overwritten (=> it's not written in all cases).

@tcompa tcompa linked a pull request Jan 15, 2025 that will close this issue
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flexibility Support more workflow-execution use cases runner
Projects
Development

Successfully merging a pull request may close this issue.

2 participants