Switching nuc_est_conv and max projection order

Summary

We want to investigate whether swapping the order of operations when computing nuc_est_conv across z-stacks improves identification of protein localisation.

Current behaviour/setbacks

nuc_est_conv isn't computed by default during extraction.

Desired behaviour/advantages

Compute nuc_est_conv as an additional measure for an experiment of interest. Then go through the usual extraction and post-processing routines.
Investigate whether (a) finding the max projection across z-stacks and then computing nuc_est_conv, or (b) computing nuc_est_conv for each z-stack and then taking the max across z-stacks, does better at identifying changes in protein localisation.

Also see https://www.wiki.ed.ac.uk/display/SWAIN/z-stacks+and+nucEstConv -- which suggests that swapping the order may improve things. However, this was based on the MATLAB version of the image segmentation & analysis pipeline.

Implementation sketch

I will split this into two parts based on the two parts in 'Desired behaviour/advantages'.

Part 1: computing nuc_est_conv

I have identified 3 options. These options are not mutually exclusive -- the solution may well be a combination of all three.

Option 1: Add nuc_est_conv as a default measure in extraction

How: uncomment line 38 in https://git.ecdf.ed.ac.uk/swain-lab/aliby/aliby/-/blob/master/extraction/core/functions/defaults.py, then run the whole pipeline again on the desired experiment.

Pros: easy, takes literally 3 seconds to implement

Cons: re-segmenting takes time and may not be desired if cell outlines have already been identified. We may also have to redo this for multiple experiments, making the data output inconsistent between experiments.

Discussion: Do we want to re-integrate nuc_est_conv permanently into the pipeline? @amuoz commented out the measure in cd1b134e, but no reason was given.

Option 2: Define an Extractor object, adding nuc_est_conv as part of parameters, and re-extract images that have cell outlines already defined

How: Specify parameters by defining an ExtractorParameters object (https://git.ecdf.ed.ac.uk/swain-lab/aliby/aliby/-/blob/master/extraction/core/extractor.py), adding nuc_est_conv as a measure in addition to the existing defaults. Then define an Extractor object (same file) with these parameters. Use this object, which takes the images and parameters as arguments, to re-do extraction (or perhaps only the nuc_est_conv part). The new information should be written to the HDF5 file. Post-processing can then be run again.
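The "adding nuc_est_conv as a measure" step can be sketched on a plain dictionary, without aliby. This is only an illustration: the helper add_measure and the example tree are hypothetical, and the channel -> reduction -> measures layout is assumed from the way the tree is indexed in the sketch further down. The real container may be a set or a list, so the helper handles both.

```python
def add_measure(tree, channel, reduction, measure):
    """Add `measure` under tree[channel][reduction], creating missing levels.

    Hypothetical helper: works whether the existing container is a set or a list.
    """
    metrics = tree.setdefault(channel, {}).setdefault(reduction, set())
    if isinstance(metrics, set):
        metrics.add(measure)
    elif measure not in metrics:
        metrics.append(measure)
    return tree


# Made-up parameter tree; the real aliby defaults may differ.
tree = {"mCherry": {"np_max": ["mean", "median"]}}
add_measure(tree, "mCherry", "np_max", "nuc_est_conv")  # existing list is extended
add_measure(tree, "GFP", "np_max", "nuc_est_conv")      # missing levels are created
print(tree)
```

The modified dictionary would then be passed to ExtractorParameters.from_dict, as in the sketch below.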

Pros: This is the ideal case. This should require the least resources and is the least redundant way to solve the problem. Plus, it takes advantage of aliby's modularity and parameters-process paradigm.

Cons:

Discussion: Arin has attempted to do this, but has struggled to find the method within Extractor to achieve this. His attempt was based on https://git.ecdf.ed.ac.uk/swain-lab/aliby/skeletons/-/blob/master/notebooks/4.%20Re-postprocessing.ipynb, but apparently PostProcessor and Extractor objects are structured in quite different ways. Here is a sketch:

from pathlib import Path

import h5py
from pathos.multiprocessing import Pool

from aliby.pipeline import PipelineParameters
from extraction.core.extractor import Extractor, ExtractorParameters

folder = Path("/home/jupyter-arin/data/23174_2022_03_25_flavin_htb2_glucose_limitation_hard_delft_04_02")

# Start from the default pipeline parameters for this experiment
pipeline_params = PipelineParameters.default(
    general={ 
        "expt_id": 23174, # should match the experiment so that channels match
        "distributed": 10, # doesn't matter
        "server_info": {
            "host": *****,
            "username": *****,
            "password": *****,
        },
    },
)

# Add nuc_est_conv to the mCherry measures under the np_max reduction
extractor_params_dict = pipeline_params.to_dict()['extraction']
extractor_params_dict['tree']['mCherry']['np_max'].update({'nuc_est_conv'})

def extract_file(filepath):
    """Delete any existing extraction results in filepath, then re-run extraction."""
    try:
        with h5py.File(filepath, "a") as f:
            if "extraction" in f:
                del f["/extraction"]
            extractor = Extractor(
                ExtractorParameters.from_dict(extractor_params_dict), filepath)
            extractor.run()
            print(filepath, " PASSED\n")

    except Exception as e:
        print(filepath, " FAILED\n")
        print(e)

# Re-extract every HDF5 file under the experiment folder
with Pool(1) as p:
    results = p.map(extract_file, folder.rglob("*.h5"))

This sketch currently fails.

Option 3: Call the nuc_est_conv function directly on images.

How: Import it from https://git.ecdf.ed.ac.uk/swain-lab/aliby/aliby/-/blob/master/extraction/core/functions/custom/localisation.py

Pros:

Cons: Doesn't take advantage of how things are organised in aliby.

Part 2: find max projection

The original method should be implemented within nuc_conv_3d in https://git.ecdf.ed.ac.uk/swain-lab/aliby/aliby/-/blob/master/extraction/core/functions/custom/localisation.py.

The alternative method should be:

Assuming that nuc_est_conv is computed separately for each z-stack, we just need to call numpy.max on the outputs from each.
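The difference between the two orders can be sketched with numpy on synthetic data. Everything here is an assumption for illustration: toy_localisation is a made-up stand-in (mean of the brightest pixels divided by the median), not the real nuc_est_conv, whose signature I have not checked, and the image stack is random numbers with one bright patch.

```python
import numpy as np


def toy_localisation(image, mask, k=5):
    """Hypothetical stand-in for nuc_est_conv (NOT the aliby implementation):
    mean of the k brightest pixels in the cell divided by the cell median."""
    pixels = image[mask]
    brightest = np.sort(pixels)[-k:]
    return brightest.mean() / np.median(pixels)


def project_then_measure(stack, mask):
    """Method (a): max-project along z (axis 0), then measure once."""
    return toy_localisation(stack.max(axis=0), mask)


def measure_then_project(stack, mask):
    """Method (b): measure each z-section, then take the max of the values."""
    per_z = np.array([toy_localisation(section, mask) for section in stack])
    return per_z.max()


rng = np.random.default_rng(0)
stack = rng.uniform(100, 200, size=(5, 16, 16))  # (z, y, x), synthetic data
stack[2, 6:9, 6:9] += 300                        # bright "nucleus" in one z-section
mask = np.ones((16, 16), dtype=bool)             # whole tile treated as the cell

a = project_then_measure(stack, mask)
b = measure_then_project(stack, mask)
print(f"(a) project then measure: {a:.3f}")
print(f"(b) measure then project: {b:.3f}")
```

With this toy measure the two orders give different values because the max projection raises the background median. Whether the same holds for the real nuc_est_conv on real images is exactly what needs testing.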

Then the results from each method can be plotted and thus compared.
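Alongside plots, a numeric summary might help. The sketch below is one possible comparison, not an agreed method: the functions dynamic_range and compare_methods are hypothetical, the traces are synthetic, and the array layout (cells x time points) is an assumption.

```python
import numpy as np


def dynamic_range(traces):
    """Per-cell (max - min) / median over time; traces has shape (cells, timepoints)."""
    return (traces.max(axis=1) - traces.min(axis=1)) / np.median(traces, axis=1)


def compare_methods(traces_a, traces_b):
    """Summarise, cell by cell, how two sets of localisation traces differ."""
    corr = np.array([np.corrcoef(a, b)[0, 1] for a, b in zip(traces_a, traces_b)])
    return {
        "correlation": corr,
        "dynamic_range_a": dynamic_range(traces_a),
        "dynamic_range_b": dynamic_range(traces_b),
    }


# Synthetic example: 3 cells, 50 time points, localisation steps up at t = 25.
rng = np.random.default_rng(1)
step = np.where(np.arange(50) >= 25, 1.0, 0.0)
traces_a = 1.5 + 0.5 * step + rng.normal(0, 0.05, size=(3, 50))
traces_b = 1.5 + 1.0 * step + rng.normal(0, 0.05, size=(3, 50))

summary = compare_methods(traces_a, traces_b)
for name, values in summary.items():
    print(name, np.round(values, 2))
```

A method that gives a larger dynamic range for a known localisation change, while staying well correlated with the other method, would be a candidate for "doing better".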

Edited Aug 31, 2022 by Arin Wongprommoon