Remove the need for `imzml` and poslog files for core cropping #28

alex-l-kong · 2024-12-02T21:52:23Z

Relevant background

The existing pipeline requires users to specify a poslog file for core cropping. This can be difficult because:

1. If multiple TMAs are combined, the poslogs have to be specified in the order of acquisition
   a. This won't be an issue after the pyTDFSDK workflow is merged, but the other points still stand
2. Users have to specify the full path to each poslog
3. Having to load in the imzml file again will be difficult, especially once the core cropping step gets separated into a different notebook

The ark-analysis repo already uses labeling algorithms provided by scikit-image to accomplish this without needing an external coordinate list. A similar workflow can be added to the maldi-pipeline repo.

Design overview

After the core list has been created using TSAI, the skimage.measure.label function can be used to automatically segment out the cores and their corresponding coordinates. Each unique label corresponds to one core, so for each label we can extract that core's pixel coordinates from the labeled array and match them to the core centroid found in the TSAI list.
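As a rough illustration of the idea above, here is a minimal sketch on a toy mask. The mask, the centroid dictionary, and the core names are made up for this example and do not come from the pipeline; only `label` and `regionprops` are real scikit-image calls.

```python
import numpy as np
from skimage.measure import label, regionprops

# Toy binary mask with two separate "cores" (hypothetical data)
mask = np.zeros((6, 8), dtype=np.uint8)
mask[1:3, 1:3] = 1  # first core
mask[3:6, 5:8] = 1  # second core

# 4-connectivity labeling: each connected blob gets its own integer label
labeled = label(mask, connectivity=1, background=0)

# Hypothetical centroid list, keyed by (x, y) = (col, row)
centroids = {(1, 1): "core_A", (6, 4): "core_B"}

label_to_core = {}
for region in regionprops(labeled):
    # region.coords is an (N, 2) array of (row, col) pixels; flip to (x, y)
    pixels = {(int(c), int(r)) for r, c in region.coords}
    for xy, name in centroids.items():
        if xy in pixels:
            label_to_core[region.label] = name
```

Here `label_to_core` ends up mapping each labeled component to the name of the centroid that falls inside it.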

Note that the "blank centroid" issue still needs to be addressed, although the existing logic can be ported over to this new workflow.
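The existing "blank centroid" logic is not reproduced here. Purely as an illustration, and assuming the issue refers to a centroid that lands on a background (zero) pixel of the mask, one possible fallback is to assign the centroid to the nearest labeled pixel. The helper name `nearest_label` is hypothetical.

```python
import numpy as np

def nearest_label(labeled_image: np.ndarray, x: int, y: int) -> int:
    """Return the label of the labeled pixel closest to (x, y), or 0 if there is none.

    Hypothetical fallback, not the pipeline's existing logic.
    """
    foreground = np.argwhere(labeled_image > 0)  # (row, col) of every labeled pixel
    if foreground.size == 0:
        return 0
    # Squared Euclidean distance from the centroid to each labeled pixel
    sq_dist = ((foreground - np.array([y, x])) ** 2).sum(axis=1)
    row, col = foreground[sq_dist.argmin()]
    return int(labeled_image[row, col])
```

This brute-force search is fine for a handful of centroids; whether nearest-pixel assignment is the right behavior for this pipeline would need to be checked against the existing logic.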

Code mockup

The following code written by @kxleow provides a template we can use:

import json
import os
from pathlib import Path
from typing import Dict, Tuple, Union

import numpy as np
from alpineer.io_utils import validate_paths
from skimage.io import imread, imsave
from skimage.measure import label, regionprops


def match_centroids_to_cores(
    glycan_crop_save_dir: Union[str, Path],
    glycan_mask_path: Union[str, Path],
    centroid_core_mapping: Union[str, Path],
) -> None:
    validate_paths([glycan_mask_path, centroid_core_mapping])
    glycan_crop_save_dir = Path(glycan_crop_save_dir)
    os.makedirs(glycan_crop_save_dir, exist_ok=True)

    # Step 1: Load the mask
    glycan_mask: np.ndarray = imread(glycan_mask_path)

    # Label connected components (4-connectivity)
    labeled_image: np.ndarray = label(glycan_mask, connectivity=1, background=0)

    # Count the components
    num_labels: int = int(labeled_image.max())

    # Get properties of labeled regions
    regions: list = regionprops(labeled_image)

    # Step 2: Load the JSON file
    with open(centroid_core_mapping, "r") as infile:
        centroid_data: dict = json.load(infile)

    # Prepare a mapping of FOV coordinates to their names
    fov_mapping: Dict[Tuple, str] = {
        tuple(core["centerPointPixels"].values()): core["name"]
        for core in centroid_data["fovs"]
    }

    # Step 3: Match FOV coordinates to regions
    # NOTE: we should think about optimizing this section, since iterating through several coordinates adds extra complexity
    component_to_fov: Dict[int, str] = {}
    for region in regions:
        # Get the region's coordinates: an (N, 2) array of (row, col) pixels
        coords: np.ndarray = region.coords

        for coord in coords:
            x, y = int(coord[1]), int(coord[0])  # skimage stores (row, col); the mapping uses (x, y)
            if (x, y) in fov_mapping:
                component_to_fov[region.label] = fov_mapping[(x, y)]
                break  # Stop searching once we map this region
        else:
            # No centroid fell inside this region
            # TODO: add logic to handle "blank centroid" case
            pass

    # Create a renamed labeled image excluding unmatched regions
    renamed_image: np.ndarray = np.zeros_like(labeled_image, dtype=int)
    for region in regions:
        if region.label in component_to_fov:
            renamed_image[labeled_image == region.label] = region.label

    # Step 4: Save individual FOVs as TIFFs
    for region_label, fov_name in component_to_fov.items():
        # Create a binary mask for the specific region
        fov_mask: np.ndarray = (labeled_image == region_label).astype(np.uint8) * 255

        # Save the binary mask as a TIFF
        output_path: Path = glycan_crop_save_dir / f"{fov_name}.tiff"
        imsave(str(output_path), fov_mask)
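On the optimization NOTE in the mockup: one option, sketched here as a suggestion rather than an agreed design, is to skip the per-pixel search and index the labeled image directly at each centroid. The helper name `match_by_lookup` and the toy data are hypothetical, and the sketch assumes every centroid lies inside the image bounds and is stored as integer (x, y) pixels.

```python
from typing import Dict, List, Tuple

import numpy as np
from skimage.measure import label

def match_by_lookup(
    labeled_image: np.ndarray, fov_mapping: Dict[Tuple[int, int], str]
) -> Tuple[Dict[int, str], List[str]]:
    """Map region label -> FOV name with one array lookup per centroid."""
    component_to_fov: Dict[int, str] = {}
    unmatched: List[str] = []
    for (x, y), name in fov_mapping.items():
        region_label = int(labeled_image[y, x])  # arrays are indexed (row, col)
        if region_label == 0:
            unmatched.append(name)  # centroid fell on background
        else:
            component_to_fov[region_label] = name
    return component_to_fov, unmatched

# Toy example (hypothetical mask and FOV names)
mask = np.zeros((5, 5), dtype=np.uint8)
mask[0:2, 0:2] = 1
mask[3:5, 3:5] = 1
labeled_image = label(mask, connectivity=1, background=0)
fov_mapping = {(0, 0): "R1C1", (4, 4): "R1C2", (2, 2): "R1C3"}
matched, unmatched = match_by_lookup(labeled_image, fov_mapping)
```

The work then scales with the number of centroids instead of the number of mask pixels, and centroids that land on background are collected in `unmatched` so they can be passed to whatever "blank centroid" handling is ported over.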

Required inputs

Same as before, except without the need for a poslog or an imzml file.

Output files

Same as before

Timeline
Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.

A couple days
A week
Multiple weeks. For large projects, make sure to agree on a plan that isn't just a single monster PR at the end.

Estimated date when a fully implemented version will be ready for review:

Before the winter closure

Estimated date when the finalized project will be merged in:

Just after the winter closure


alex-l-kong self-assigned this Dec 2, 2024
