
ALIBY roadmap

Overview of potential improvements, goals, issues and other thoughts worth keeping in the repository. In general, these are things that the original developer would have liked to implement had there been enough time.

General goals

  • Simplify code base
  • Reduce dependency on BABY
  • Abstract components beyond cells
  • Implement multiple
  • Enable providing metadata defaults
  • (Relevant to BABY): Migrate aliby-baby from Keras to PyTorch. Immediately afterwards, upgrade h5py to the latest version (we are stuck on 2.10.0 due to Keras).

Long-term tasks

  • Split segmentation, tracking and lineage into independent Steps
  • Implement the pipeline as an acyclic graph
  • Isolate lineage and tracking into a section of aliby or an independent package
  • Abstract cells into “ROIs” or “Outlines”
  • Abstract lineage into “Outline relationships” (this may help study cell-to-cell interactions in the future)
  • Support external segmentation/tracking/lineage/processing tools
  • Make live cell processing great again!
  • Add support for next-generation microscopy formats.
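
The acyclic-graph idea above could be sketched with the standard library's graphlib module. The step names and the run_pipeline helper are hypothetical placeholders, not aliby's current API. The point is only that each Step declares what it depends on, and the execution order is derived from the graph rather than hard-coded.

```python
from graphlib import TopologicalSorter

# Hypothetical step graph: each key is a step, each value is the set of
# steps whose output it needs. These names are illustrative only.
STEP_GRAPH = {
    "tiler": set(),
    "segmentation": {"tiler"},
    "tracking": {"segmentation"},
    "lineage": {"tracking"},
    "extraction": {"tiler", "segmentation"},
    "postprocessing": {"extraction", "lineage"},
}


def run_pipeline(graph, steps):
    """Run each step once, after all of its dependencies have finished."""
    results = {}
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = steps[name](inputs)
    return results


# Placeholder steps that only record which inputs they received.
steps = {name: (lambda inputs, n=name: (n, sorted(inputs))) for name in STEP_GRAPH}
results = run_pipeline(STEP_GRAPH, steps)
print(list(results))  # one valid execution order
```

With this layout, splitting segmentation, tracking and lineage into independent Steps amounts to adding nodes and edges, and an external tool can replace a node as long as it accepts the same inputs.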

Potential features

  • Flat field correction (requires research on what is the best way to do it)
  • Support for monotiles (e.g., agarose pads)
  • Support the user providing location of tiles (could be a GUI in which the user selects a region)
  • Support multiple neural networks (e.g., vacuole/nucleus in addition to cell segmentation)

Potential CLI(+matplotlib) interfaces

The fastest way to get a GUI-like interface is to use matplotlib as a panel that updates its display and reads keyboard inputs to interact with the data. All of this can be done within matplotlib in a few hundred lines of code.

  • Annotate intracellular contents
  • Interface to adjust the parameters for calibration
  • Basic selection of a region of interest on a per-position basis
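
A minimal sketch of the keyboard-driven approach described above, assuming the data is a sequence of 2D frames. The names handle_key and browse are hypothetical; the only matplotlib features used are the "key_press_event" hook registered with mpl_connect, imshow and set_data. The key handling is kept as a pure function so it can be tested without opening a window.

```python
def handle_key(state, key):
    """Return a new state after a key press; unknown keys change nothing."""
    index, n_frames = state["index"], state["n_frames"]
    if key == "right":
        index = min(index + 1, n_frames - 1)  # stop at the last frame
    elif key == "left":
        index = max(index - 1, 0)  # stop at the first frame
    return {**state, "index": index}


def browse(frames):
    """Show frames one at a time; left/right arrow keys change the frame."""
    import matplotlib.pyplot as plt  # imported lazily: only needed for display

    state = {"index": 0, "n_frames": len(frames)}
    fig, ax = plt.subplots()
    image = ax.imshow(frames[0])

    def on_key(event):
        nonlocal state
        state = handle_key(state, event.key)
        image.set_data(frames[state["index"]])
        ax.set_title(f"frame {state['index']}")
        fig.canvas.draw_idle()  # redraw on the next GUI idle cycle

    fig.canvas.mpl_connect("key_press_event", on_key)
    plt.show()
```

Calling browse(frames) with a list of 2D arrays opens the panel. Annotation or parameter adjustment could follow the same pattern by mapping more keys to state changes in handle_key.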

Sections in need of refactoring

Extraction

Extraction could easily increase its processing speed. Most of the code was not originally written using casting and vectorised operations.

  • Reducing the use of Python loops to a minimum
  • Replacing nested functions with functional mappings (extraction would be faster and clearer with a functional programming approach)
  • Replacing the tree with a set of tuples and delegating processing order to dask. Dask can produce its own internal tree and optimise the order, rendering our tree unnecessary
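
As a sketch of the last point, the nested tree can be flattened into a set of independent tuples. This assumes the tree is nested as channel, then reduction, then a list of metrics; the flatten_tree helper and the example channel and metric names are hypothetical.

```python
def flatten_tree(tree):
    """Turn a nested {channel: {reduction: [metrics]}} mapping into a set of tuples."""
    return {
        (channel, reduction, metric)
        for channel, reductions in tree.items()
        for reduction, metrics in reductions.items()
        for metric in metrics
    }


# Hypothetical extraction tree, for illustration only.
tree = {
    "GFP": {"max": ["mean", "median"]},
    "mCherry": {"max": ["mean"]},
}

tasks = flatten_tree(tree)
for task in sorted(tasks):
    print(task)
```

Each tuple is a self-contained unit of work, so a scheduler such as dask could decide the order and share intermediate results (for example, the reduced image for a channel) without the code walking the tree itself.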

Postprocessing

  • Clarify the limits of the picking and merging classes: these are temporary procedures. In the future, segmentation should become more accurate, making Picker redundant; better tracking/lineage assignment will make merging redundant.
  • Formalise how lineage and reshaper processes are handled
  • Non-destructive postprocessing. The way postprocessing is done is destructive at the moment. If we aim to perform more complex data analysis automatically, an implementation of complementary and tractable sub-pipelines is essential.
  • Functionalise the parameter-process schema. This schema provides a decent structure, but it requires a lot of boilerplate code. To transition, the best option is probably a function that converts Process classes into functions, and another that extracts default values from a Parameters class. This could in theory replace most Process-Parameters pairs. Lineage functions will pose a problem, and a common interface to get lineage or outline-to-outline relationships will need to be engineered.
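
The two helper functions suggested in the last point might look like the sketch below. ScaleParameters and Scale are hypothetical stand-ins for a Parameters/Process pair, and the sketch assumes defaults are declared in the Parameters constructor signature, which may not match how aliby stores them.

```python
import inspect


class ScaleParameters:
    """Hypothetical Parameters class: defaults live in __init__'s signature."""

    def __init__(self, factor=2.0, offset=0.0):
        self.factor = factor
        self.offset = offset


class Scale:
    """Hypothetical Process class following a parameters-then-run pattern."""

    def __init__(self, parameters):
        self.parameters = parameters

    def run(self, values):
        p = self.parameters
        return [v * p.factor + p.offset for v in values]


def get_defaults(parameters_cls):
    """Read default values from the __init__ signature of a Parameters class."""
    signature = inspect.signature(parameters_cls.__init__)
    return {
        name: param.default
        for name, param in signature.parameters.items()
        if param.default is not inspect.Parameter.empty
    }


def as_function(process_cls, parameters_cls):
    """Wrap a Process/Parameters pair as a plain function with keyword overrides."""
    defaults = get_defaults(parameters_cls)

    def function(data, **overrides):
        parameters = parameters_cls(**{**defaults, **overrides})
        return process_cls(parameters).run(data)

    return function


scale = as_function(Scale, ScaleParameters)
print(scale([1, 2], offset=1.0))
```

A wrapper like this keeps existing classes usable while new code calls plain functions. Processes that also need lineage information would require an extra argument, which is where the common interface mentioned above comes in.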

Compiler/Reporter

  • Remove the compiler step and focus on designing an adequate report, then build it straight after postprocessing ends.

Writers/Readers

  • Consider storing signals that are similar (e.g., signals arising from each channel) in a single multidimensional array to save storage space.
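
A small sketch of this idea with NumPy, assuming every channel produces a signal of the same shape (cells by time points). The channel names and array sizes are made up for illustration; any saving would come mainly from storing shared row and column labels once rather than once per signal.

```python
import numpy as np

# Hypothetical per-channel signals with a shared (cells x time points) shape.
n_cells, n_timepoints = 4, 10
signals = {
    "GFP": np.random.default_rng(0).random((n_cells, n_timepoints)),
    "mCherry": np.random.default_rng(1).random((n_cells, n_timepoints)),
}

# Fix the channel order once, then stack along a new leading axis.
channels = sorted(signals)
stacked = np.stack([signals[ch] for ch in channels], axis=0)
print(stacked.shape)  # (channel, cell, time point)

# A single channel is recovered by indexing the leading axis.
gfp = stacked[channels.index("GFP")]
```

The channel order has to be stored alongside the array (for example, as an attribute of the dataset) so that each slice can be mapped back to its signal.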

Pipeline

Pipeline is in dire need of refactoring, as it coordinates too many things. The best approach would be to modify the structure to delegate more responsibilities to Steps (such as validation) and Writers (such as writing metadata).

Testing

  • I/O interfaces
  • Visualisation helpers and other functions
  • Running one pipeline from another

Documentation

  • Tutorials and how-to for the usual tasks
  • How to deal with different types of data
  • How to aggregate data from multiple experiments
  • Contribution guidelines (after developing some)

Tools/alternatives that may be worth considering for the future

  • trio/asyncio/anyio for concurrent processing of individual threads
  • Pandas -> Polars: Reconsider after pandas 2.0; they will become interoperable
  • awkward arrays: Better way to represent
  • h5py -> zarr: The OME-ZARR format is out now, and it is possible that the field will move in that direction. This would also make being stuck on h5py 2.10.0 less of a problem.
  • Use CellACDC’s work on producing a common interface to access a multitude of segmentation algorithms.

Secrets in the code

  • As aliby is adapted to future Python versions, keep up with the “FUTURE” statements that describe how the code can be improved in newer Python versions
  • Track FIXMEs and, if we cannot solve them immediately, open an associated issue