Automate MALDI extraction pipeline #15

Open
8 of 26 tasks
alex-l-kong opened this issue Oct 24, 2023 · 0 comments

alex-l-kong commented Oct 24, 2023

Relevant background

The MALDI pipeline currently requires a lot of manual work from the user. Much of it can be automated to make the overall process more efficient.

Design overview

The components that can be automated are:

  • Pre-extraction:
    • Automatic generation of input masks for MALDI acquisition (#13): bypasses manual drawing around cores in FlexImaging. Frankie thinks improving contrast beforehand may mitigate issues with the FlexImaging Magic Wand tool.
    • Automatic .imzml and .ibd file generation (Automate .imzml extraction using timsconvert package #14): bypasses the SCiLS bottleneck and is 5x faster using the timsconvert CLI (supports normalization, if needed)
      • Combine spectra across multiple slides (timsconvert or matchms?)
    • Ensure large .imzml files (about 1 TB) can be loaded using pyimzml.ImzMLParser (will most likely entail combining multiple ImzMLParser objects together)
  • Extraction
    • Restructure the data directory to support extraction of multiple slides in the MALDI notebook (Data Directory Restructure #16)
    • Ensure the extracted signal per spot is computed as the area under the curve, not raw intensity
    • Glycan peaks (investigation)
      • Combine extracted glycan peaks between multiple slides (potentially matchms?)
      • Identify the correct peak for a glycan when multiple peaks match
    • Integrate spatialdata and napari for visualizing the MALDI cohorts
  • Post-extraction
    • Low-level clustering for masking out matrix regions outside the core (Cardinal, R package)
    • Automatic identification of RNCM positions on TMA cores (#12):
      • Use Albert tiler for naming (actual tiling already implemented, maybe code needed for generating input .json file)
      • Use .imzml file to locate pixels on cores for cropping out
  • Repo health
  • Investigation
    • Slide-to-slide batch effects
    • Implement co-registration of MALDI images with HNE slides (napari plugin: napari-imsmicrolink)
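
To make the "area under the curve, not raw intensity" item in the Extraction list concrete, here is a minimal sketch of what the per-spot calculation could look like. It is not the current notebook code: the function name `peak_auc` and the `half_width` parameter (window size in m/z units) are hypothetical, and it assumes each spectrum is available as two parallel sequences with m/z values sorted in ascending order.

```python
from bisect import bisect_left, bisect_right


def peak_auc(mzs, intensities, target_mz, half_width):
    """Trapezoidal area under the spectrum within target_mz +/- half_width.

    Assumes mzs is sorted ascending. half_width is a hypothetical parameter,
    not an existing pipeline setting.
    """
    # Locate the slice of the spectrum that falls inside the m/z window.
    lo = bisect_left(mzs, target_mz - half_width)
    hi = bisect_right(mzs, target_mz + half_width)
    window_mz = mzs[lo:hi]
    window_int = intensities[lo:hi]

    # Trapezoidal rule: sum the area of each segment between adjacent points.
    area = 0.0
    for i in range(1, len(window_mz)):
        mean_height = 0.5 * (window_int[i] + window_int[i - 1])
        area += mean_height * (window_mz[i] - window_mz[i - 1])
    return area


# Toy spectrum with a single triangular peak centred at m/z 100.2.
toy_mzs = [100.0, 100.1, 100.2, 100.3, 100.4]
toy_intensities = [0.0, 2.0, 4.0, 2.0, 0.0]
toy_area = peak_auc(toy_mzs, toy_intensities, target_mz=100.2, half_width=0.15)
empty_area = peak_auc(toy_mzs, toy_intensities, target_mz=200.0, half_width=0.15)
```

Unlike taking the maximum intensity in the window, the integrated value depends on peak width as well as height, so a broad, low peak and a narrow, tall one are not treated the same way. A window with fewer than two points contributes zero area in this sketch; whether that is the right behaviour for sparse spectra is an open question.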

Code mockup

Please refer to the linked issues for more detail.
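
As a rough illustration of the "combine multiple ImzMLParser objects" idea from the pre-extraction list, one option is to treat several parsers as a single lazy stream of spectra so that only one spectrum is held in memory at a time. The sketch below relies only on the `coordinates` attribute and the `getspectrum(index)` method of `pyimzml.ImzMLParser`. The function name `iter_spectra` and the `StubParser` class are hypothetical; the stub stands in for real parsers so the example runs without data files, and this has not been tried on a 1 TB file.

```python
def iter_spectra(parsers):
    """Lazily yield (slide_index, (x, y), mzs, intensities) across parsers.

    Each parser is expected to expose `coordinates` (a list of (x, y, z)
    tuples) and `getspectrum(index)`, as pyimzml.ImzMLParser does.
    """
    for slide_index, parser in enumerate(parsers):
        for spot_index, coord in enumerate(parser.coordinates):
            mzs, intensities = parser.getspectrum(spot_index)
            yield slide_index, tuple(coord[:2]), mzs, intensities


class StubParser:
    """Hypothetical stand-in for pyimzml.ImzMLParser, for illustration only."""

    def __init__(self, coordinates, spectra):
        self.coordinates = coordinates
        self._spectra = spectra

    def getspectrum(self, index):
        return self._spectra[index]


# Two toy "slides": the first has two spots, the second has one.
slide_a = StubParser(
    [(1, 1, 1), (2, 1, 1)],
    [([100.0, 100.1], [3.0, 4.0]), ([100.0, 100.1], [1.0, 0.0])],
)
slide_b = StubParser([(5, 6, 1)], [([100.0, 100.1], [7.0, 8.0])])

spectra = list(iter_spectra([slide_a, slide_b]))
```

With real data, the list passed in would hold `ImzMLParser` instances, one per file or per slide. The slide index in each yielded tuple keeps track of which file a spot came from, which may also help with the multi-slide directory restructure.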

Required inputs

The same inputs will be needed.

Output files

The same output files will be produced.

Timeline

Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.

  • A couple days
  • A week
  • Multiple weeks. For large projects, make sure to agree on a plan that isn't just a single monster PR at the end.

Estimated date when a fully implemented version will be ready for review:

TBD

Estimated date when the finalized project will be merged in:

TBD

@srivarra srivarra added the enhancement New feature or request label Oct 25, 2023