New Fractal task list page that uses the task metadata #871

Closed
jluethi opened this issue Nov 26, 2024 · 7 comments
Labels
documentation Improvements or additions to documentation

Comments

@jluethi
Collaborator

jluethi commented Nov 26, 2024

With the work on task metadata progressing (see #853) and work on showing task packages in Fractal web starting (see fractal-analytics-platform/fractal-web#654), we should also start thinking about how we'll showcase what tasks are available on the web.

I'm not looking for a quick fix here, but let's start the larger conversation on where we want this to go. We have a decent start with our current task overview page:

[Screenshot of the current task overview page]

It has some limitations:

  1. Only displays task names
  2. Only sorts tasks by package
  3. Requires us to manually add packages to the list

My long-term goal would be a great overview page of all the available Fractal tasks, which uses the richer metadata to be searchable, browsable, etc. This page should be a starting point for answering "will Fractal be able to solve my image analysis problem?".

This page doesn't need to live in tasks-core documentation. Maybe it's part of the Fractal splash page? Or we make a whole new page just dedicated to this?

Brainstorming, mocks & prototyping is welcome! We can also list more requirements here.

cc @zonia3000 @tcompa @lorenzocerrone

@jluethi jluethi added the documentation Improvements or additions to documentation label Nov 26, 2024
@jluethi jluethi changed the title New Fractal task list page that uses the metadata New Fractal task list page that uses the task metadata Nov 26, 2024
@tcompa
Collaborator

tcompa commented Nov 27, 2024

Some comments

Data retrieval

There exist very different approaches to data retrieval for this page (sorted by increasing complexity):

  1. A read-only page, where only the Fractal team can add new entries. This boils down to:
    • Fetching structured data from known sources - which can be fully automated (up to the actual choice of sources);
    • Hosting the UI on a GitHub page.
    • Accepting contributions in the form of PRs that add one more URL to the GitHub repository of this landing page.
  2. An interactive page, where an (authenticated?) user can add a new entry.
  3. Any other scenario in which data are retrieved from non-public sources (e.g. each one of some "federated" Fractal instance regularly pushes its tasks data to the landing-page app). Also in this case we would need a web application.

Options 2 and 3 are small but full-fledged web applications, with quite some additional complexity (hosting, authentication, ...).
Our opinion (with @mfranzon) is that they are not worth the effort. There would still be a way to contribute tasks, if someone really wants to: a PR that adds a manifest URL to a list of URLs in a public GitHub repository.

UI

Independently of the choice above, we can always offer a richer and more interactive visualization. The simplest example is likely a table with some kind of sort/filter/show/hide features, or otherwise we could explore a different UI like https://bioimage.io.

Who should or should not consume this page

A clear audience for this page is "whoever is interested in Fractal", either current or prospective users. This audience would visit the page directly, in their browser.

Once this page is built, it will be tempting to say "let's expose this information in the fractal frontend as well, so that a user knows which task packages are available and can install directly from there". I am strongly against this approach, where a "lightweight" landing page then turns into a critical component of actual deployments. If we want to discuss a public registry of Fractal tasks (to be integrated with Fractal components), then we should design it as such.

@jluethi
Collaborator Author

jluethi commented Nov 27, 2024

One additional feature: also displaying images for each task, stored per task in some way.

@jluethi
Collaborator Author

jluethi commented Nov 27, 2024

> Requires us to manually add packages to the list

Maybe more towards: Make it easy for Fractal admins to update e.g. a list of task packages that then get rendered nicely

@jluethi
Collaborator Author

jluethi commented Nov 27, 2024

> Once this page is built, it will be tempting to say "let's expose this information in the fractal frontend as well, so that a user knows which task packages are available and can install directly from there". I am strongly against this approach, where a "lightweight" landing page then turns into a critical component of actual deployments. If we want to discuss a public registry of Fractal tasks (to be integrated with Fractal components), then we should design it as such.

Discussion to be had on whether this is an end goal for this feature. Certainly not the initial goal.

@tcompa
Collaborator

tcompa commented Nov 28, 2024

Here is a prototype: https://tcompa.github.io/test-table-github-page (it's a minimal-working example, with no attempt to make it "nice"). All work was with @mfranzon.

Data retrieval

Data are retrieved based on the following list of sources:

# pypi projects:
fractal-tasks-core
fractal-faim-ipa
fractal-lif-converters
operetta-compose

# URL of a public wheel file (valid but not actually included)
# https://files.pythonhosted.org/packages/52/c1/7d2e6c17ee636404c3f99909b21b0a60b78391c2d553a1b941bc74fb056b/fractal_tasks_core-1.3.2-py3-none-any.whl

# GitHub-release zip files
https://github.com/fractal-analytics-platform/fractal-helper-tasks/archive/refs/tags/v0.1.1.zip
https://github.com/fmi-basel/gliberal-scMultipleX/archive/refs/tags/v0.7.9.zip
https://github.com/Apricot-Therapeutics/APx_fractal_task_collection/archive/refs/tags/0.3.21.zip
https://github.com/fractal-analytics-platform/fractal-plantseg-tasks/archive/refs/tags/0.1.2.zip
https://github.com/m-albert/fractal-ome-zarr-hcs-stitching/archive/refs/tags/v0.0.5.zip
https://github.com/fractal-analytics-platform/fractal-ilastik-tasks/archive/refs/tags/0.1.1.zip

The different kinds of sources have different properties:

  • PyPI projects are the best, because we can run the build GitHub action e.g. once per week and we will always have the latest package version listed on our page. Also: processing is simple and fast.
  • URLs of public wheels are also very easy/fast to process, but they correspond to a specific version. Updating a version means updating the URL manually.
  • GitHub-release zip files are the worst: they point to a specific version and they are slow/suboptimal to process (e.g. the scmultiplex zip file is 44M large and takes 200 seconds to process, compared with less than one second for PyPI/wheel sources). Also: it's cumbersome to retrieve the version field, because the version only appears in the name that GitHub gives the zip file.
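
To make the version issue concrete, here is a minimal sketch of how a version string could be recovered from the two URL-based kinds of sources. It is only an illustration under assumptions: for GitHub zip files it takes the tag from the `/archive/refs/tags/<tag>.zip` part of the URL and strips a leading `v`, which yields the package version only if the repository tags its releases that way. For wheels it relies on the standard wheel file name layout, where the version is the second dash-separated field. The function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Matches the tag in URLs like .../archive/refs/tags/v0.1.1.zip
_TAG_ZIP = re.compile(r"/archive/refs/tags/(?P<tag>[^/]+)\.zip$")


def version_from_github_zip(url: str) -> str:
    """Guess the version from the tag embedded in a GitHub archive URL."""
    match = _TAG_ZIP.search(urlparse(url).path)
    if match is None:
        raise ValueError(f"Not a GitHub tag archive URL: {url}")
    tag = match.group("tag")
    # Assumption: a tag like "v0.1.1" corresponds to version "0.1.1".
    return tag[1:] if re.match(r"v\d", tag) else tag


def version_from_wheel_url(url: str) -> str:
    """Read the version from a wheel file name (name-version-...-.whl)."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"Not a wheel URL: {url}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"Unexpected wheel file name: {filename}")
    return parts[1]


print(version_from_github_zip(
    "https://github.com/fmi-basel/gliberal-scMultipleX/archive/refs/tags/v0.7.9.zip"
))
```

Note that the tag-based guess never opens the archive, so it does not confirm that the packaged metadata carries the same version.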

There is one more data source that we could enable, which is the URL of a manifest. That one is very easy to parse, but it carries no information about the package version. For instance, the manifest in main for a given repository may differ from the one in the latest release.
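
A minimal sketch of what parsing a manifest could look like, to show why the version has to come from somewhere else. The field names used here (`task_list`, `name`, `category`, `input_types`) are assumptions about the manifest layout, the example manifest is made up, and `summarize_manifest` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, Optional


def summarize_manifest(
    manifest_text: str, package_version: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Build one table row per task found in a manifest JSON string.

    The manifest itself is not asked for a package version, so the
    caller has to supply one (or accept None).
    """
    manifest = json.loads(manifest_text)
    rows = []
    for task in manifest.get("task_list", []):
        rows.append(
            {
                "name": task["name"],
                "category": task.get("category"),
                "input_types": task.get("input_types", {}),
                "package_version": package_version,
            }
        )
    return rows


# Made-up manifest, only for illustration.
EXAMPLE_MANIFEST = """
{
  "manifest_version": "2",
  "task_list": [
    {"name": "Example task A", "category": "Conversion"},
    {"name": "Example task B", "input_types": {"is_3D": true}}
  ]
}
"""

for row in summarize_manifest(EXAMPLE_MANIFEST):
    print(row["name"], row["package_version"])
```

With a manifest URL as the only input, `package_version` stays `None` unless it is provided through some other channel.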

My opinion here is that we should encourage/nudge task developers to publish at least some wheels (or even publish on PyPI), and provide all kinds of help and support to make it easy.
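
For context on why PyPI sources are cheap to keep up to date, here is a minimal sketch that finds the latest version and its wheel URL through the PyPI JSON API (`https://pypi.org/pypi/<project>/json`). It is not the prototype's code. The helper names are hypothetical, and it assumes the latest release ships at least one wheel.

```python
import json
from typing import Any, Dict, Tuple
from urllib.request import urlopen


def pypi_json_url(project: str) -> str:
    """URL of the PyPI JSON metadata for a project."""
    return f"https://pypi.org/pypi/{project}/json"


def latest_wheel(metadata: Dict[str, Any]) -> Tuple[str, str]:
    """Return (version, wheel URL) from parsed PyPI JSON metadata."""
    version = metadata["info"]["version"]
    # "urls" lists the files of the latest release.
    for release_file in metadata["urls"]:
        if release_file["packagetype"] == "bdist_wheel":
            return version, release_file["url"]
    raise LookupError(f"No wheel published for version {version}")


def fetch_latest_wheel(project: str) -> Tuple[str, str]:
    """Query PyPI (network access required) for the latest wheel."""
    with urlopen(pypi_json_url(project), timeout=30) as response:
        return latest_wheel(json.load(response))
```

Calling `fetch_latest_wheel("fractal-tasks-core")` from a scheduled job would return whatever version is current at that time, with no URL to update by hand.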

UI

The UI is admittedly very basic, but it's good as a proof of concept for how we would display data.
I think it's not worth polishing this UI until we take a decision on whether it's going to be a table-based one or something different like https://bioimage.io.

Attributes

I only included the basic attributes that we already have. We can obviously expand this list, and find the best possible ways of displaying additional information. For each required attribute, I would first need to know:

  1. Its name
  2. Its format (a string, a markdown string, an image, a video, ...)
  3. Where is it defined/stored? Examples:
    • It's part of the package manifest
    • It's available at a given public URL
    • It's defined as part of the logic for creating the task-list landing page
    • ...
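
One possible way to write these three answers down per attribute is a small record type, sketched below. All names here (`AttributeSpec`, the enum members, the example attributes) are hypothetical and only meant to make the three questions concrete.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class AttributeFormat(Enum):
    STRING = "string"
    MARKDOWN = "markdown string"
    IMAGE = "image"
    VIDEO = "video"


class AttributeSource(Enum):
    MANIFEST = "package manifest"
    PUBLIC_URL = "public URL"
    PAGE_BUILD = "landing-page build logic"


@dataclass(frozen=True)
class AttributeSpec:
    name: str                 # 1. its name
    format: AttributeFormat   # 2. its format
    source: AttributeSource   # 3. where it is defined/stored


def group_by_source(
    specs: Iterable[AttributeSpec],
) -> Dict[AttributeSource, List[str]]:
    """Collect attribute names per storage location."""
    grouped = defaultdict(list)
    for spec in specs:
        grouped[spec.source].append(spec.name)
    return dict(grouped)


# Hypothetical attributes, only for illustration.
EXAMPLE_SPECS = [
    AttributeSpec("example_description", AttributeFormat.MARKDOWN, AttributeSource.MANIFEST),
    AttributeSpec("example_image", AttributeFormat.IMAGE, AttributeSource.PUBLIC_URL),
]

for source, names in group_by_source(EXAMPLE_SPECS).items():
    print(source.value, names)
```

Grouping by source shows at a glance which attributes the build step can read from the manifest and which need an extra fetch.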

@jluethi
Collaborator Author

jluethi commented Nov 29, 2024

@tcompa Awesome to see this experimentation with it. Even this table surely is already very useful!

> My opinion here is that we should encourage/nudge task developers to publish at least some wheels (or even publish on PyPI), and provide all kinds of help and support to make it easy.

+1 on this in general, this addition to the template will help with this: fractal-analytics-platform/fractal-tasks-template#18
Let's keep an eye on whether that's always a valid option @lorenzocerrone for more complex tasks like bioformats, plantseg & ilastik.

If we need to update the listing for specific versions, that seems very suboptimal. But I can see that "manifest in main is different from released version" can be a potential issue.

> We can obviously expand this list, and find the best possible ways of displaying additional information

The initial things I'm thinking about:
a) Showing docs info (markdown => maybe with images in the future?)
b) Showing future short_info (just a string?)
c) input_types

@jluethi
Collaborator Author

jluethi commented Dec 13, 2024

I'd say the recent additions to https://github.com/fractal-analytics-platform/fractal-analytics-platform.github.io and the now available https://fractal-analytics-platform.github.io/fractal_tasks/ show that this issue can be successfully closed :)
