Compare revisions: swain-lab/aliby/aliby-mirror → swain-lab/aliby/alibylite
Commits on source: 197
Showing with 4447 additions and 1790 deletions
......@@ -67,7 +67,7 @@ Static Type:
- poetry run mypy . --exclude 'setup\.py$'
# We can remove the flag once this is resolved https://github.com/pypa/setuptools/issues/2345
# TODO add more tests before activate automatic release
# TODO add more tests before activating auto-release
# Bump_release:
# stage: release
# script:
......
## Summary
(Summarize the bug encountered concisely)
{Summarize the bug encountered concisely}
I confirm that I have (if relevant):
- [ ] Read the troubleshooting guide: https://gitlab.com/aliby/aliby/-/wikis/Troubleshooting-(basic)
- [ ] Updated aliby and aliby-baby.
- [ ] Tried the unit test.
- [ ] Tried a scaled-down version of my experiment (distributed=0, filter=0, tps=10)
- [ ] Tried re-postprocessing.
## Steps to reproduce
(How one can reproduce the issue - this is very important)
{How one can reproduce the issue - this is very important}
- aliby version: 0.1.{...}, or if development/unreleased version, commit SHA: {...}
- platform(s):
- [ ] Jura
- [ ] Other Linux, please specify distribution and version: {...}
- [ ] MacOS, please specify version: {...}
- [ ] Windows, please specify version: {...}
- experiment ID: {...}
- Any special things you need to know about this experiment: {...}
## What is the current bug behavior?
......@@ -19,6 +35,12 @@
(Paste any relevant logs - please use code blocks (```) to format console output, logs, and code, as
it's very hard to read otherwise.)
```
{PASTE YOUR ERROR MESSAGE HERE!!}
```
## Possible fixes
(If you can, link to the line of code that might be responsible for the problem)
......@@ -11,15 +11,18 @@ End-to-end processing of cell microscopy time-lapses. ALIBY automates segmentati
## Quickstart Documentation
Installation of [VS Studio](https://visualstudio.microsoft.com/downloads/#microsoft-visual-c-redistributable-for-visual-studio-2022) may be required on Windows. Native MacOS support is under development, but you can use containers (e.g., Docker, Podman) in the meantime.
For analysing local data
To analyse local data
```bash
pip install aliby # aliby[network] if you want to access an OMERO server
pip install aliby
```
Add any of the optional flags `omero` and `utils` (e.g., `pip install aliby[omero, utils]`). `omero` provides tools to connect with an OMERO server and `utils` provides visualisation, user interface and additional deep learning tools.
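A minimal sketch of the install variants described above:
```bash
pip install aliby                # local processing only
pip install aliby[omero]         # add tools to connect to an OMERO server
pip install aliby[omero,utils]   # also add visualisation, UI and deep learning extras
```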
See our [installation instructions]( https://aliby.readthedocs.io/en/latest/INSTALL.html ) for more details.
### CLI
If installed via poetry, you have access to a Command Line Interface (CLI)
```bash
aliby-run --expt_id EXPT_PATH --distributed 4 --tps None
```
......@@ -31,14 +34,15 @@ And to run Omero servers, the basic arguments are shown:
The output is a folder with the original logfiles and a set of hdf5 files, one with the results of each multidimensional image inside.
For more information, including available options, see the page on [running the analysis pipeline](https://aliby.readthedocs.io/en/latest/PIPELINE.html)
## Using specific components
### Access raw data
ALIBY's tooling can also be used as an interface to OMERO servers, taking care of fetching data when needed.
ALIBY's tooling can also be used as an interface to OMERO servers, for example, to fetch a brightfield channel.
```python
from aliby.io.dataset import Dataset
from aliby.io.image import Image
from aliby.io.omero import Dataset, Image
server_info= {
"host": "host_address",
......@@ -73,27 +77,33 @@ in time.
It fetches the metadata from the Image object and uses the TilerParameters values. (All Processes in aliby depend on an associated Parameters class, which is in essence a dictionary turned into a class.)
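As a sketch of how a Process is paired with its Parameters class (the `Tiler`, `from_image` and module path below are assumptions, since the tiling call itself is not shown in this excerpt):
```python
from aliby.tile.tiler import Tiler, TilerParameters  # assumed import path

# `image` is the Image object opened with server_info above;
# TilerParameters.default() supplies the Parameters dictionary-as-class.
tiler = Tiler.from_image(image, TilerParameters.default())
tiler.run_tp(0)  # tile the first time point
```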
#### Get a timelapse for a given trap
#### Get a timelapse for a given tile (remote connection)
```python
fpath = "h5/location"
trap_id = 9
trange = list(range(0, 30))
tile_id = 9
trange = range(0, 10)
ncols = 8
riv = remoteImageViewer(fpath)
trap_tps = riv.get_trap_timepoints(trap_id, trange, ncols)
trap_tps = [riv.tiler.get_tiles_timepoint(tile_id, t) for t in trange]
# You can also access labelled traps
m_ts = riv.get_labelled_trap(tile_id=0, tps=[0])
# And plot them directly
riv.plot_labelled_trap(trap_id=0, channels=[0, 1, 2, 3], trange=range(10))
```
This can take several seconds at the moment.
Depending on the network speed, this can take several seconds at the moment.
For a speed-up: take fewer z-positions if you can.
#### Get the traps for a given time point
#### Get the tiles for a given time point
Alternatively, if you want to get all the tiles at a given time point:
```python
timepoint = 0
seg_expt.get_tiles_timepoints(timepoint, tile_size=96, channels=None,
timepoint = (4,6)
tiler.get_tiles_timepoint(timepoint, channels=None,
z=[0,1,2,3,4])
```
......
......@@ -62,7 +62,7 @@ For Windows, the simplest way to install it is using conda (or mamba). You can i
$ \PATH\TO\POETRY\LOCATION\poetry install
- MacOS
Under work (See issue https://github.com/ome/omero-py/issues/317)
For local access and processing, follow the same instructions as Linux. Remote access to OMERO servers depends on some issues in one of our dependencies being solved (see issue https://github.com/ome/omero-py/issues/317)
### Git version
......@@ -71,9 +71,23 @@ Install [ poetry ](https://python-poetry.org/docs/#installation) for dependency
In case you want to have a local version:
$ git clone git@gitlab.com:aliby/aliby.git
$ cd aliby && poetry install --all-extras
$ cd aliby
and then either
This will automatically install the [ BABY ](https://gitlab.com/aliby/baby) segmentation software. Support for additional segmentation and tracking algorithms is under development.
$ poetry install --all-extras
for everything, including tools to access OMERO servers, or
$ poetry install
for a version with only local access, or
$ poetry install --with dev
to install with compatible versions of the development tools we use, such as black.
These commands will automatically install the [ BABY ](https://gitlab.com/aliby/baby) segmentation software. Support for additional segmentation and tracking algorithms is under development.
## Omero Server
......@@ -111,3 +125,45 @@ docker-compose stop
Segmentation has been tested on: Mac OSX Mojave, Ubuntu 20.04 and Arch Linux.
Data processing has been tested on all the above and Windows 11.
### Detailed Windows installation
#### Create environment
Open anaconda powershell as administrator
```shell script
conda create -n devaliby2 -c conda-forge python=3.8 omero-py
conda activate devaliby2
```
#### Install poetry
You may have to specify the Python executable to get this to work:
```shell script
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | C:\Users\USERNAME\Anaconda3\envs\devaliby2\python.exe -
```
Also specify the full path when running poetry (there must be a way to sort this).
- Clone the repository (assuming you have SSH properly set up)
```shell script
git clone git@gitlab.com:aliby/aliby.git
cd aliby
poetry install --all-extras
```
You may need to run poetry via its full path twice: the first attempt may give an error message but the second usually works.
```shell script
C:\Users\v1iclar2\AppData\Roaming\Python\Scripts\poetry install --all-extras
```
Confirm the installation of aliby: start `python`, run `import aliby`, and check that there is no error message.
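For example, from the activated environment:
```shell script
python -c "import aliby"  # a clean exit with no traceback confirms the installation
```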
#### Access the virtual environment from the IDE (e.g., PyCharm)
New project
- In Location, navigate to the aliby folder (e.g., C:/Users/Public/Repos/aliby)
- Select the correct Python interpreter:
- Click the interpreter name at the bottom right
- Click "Add local interpreter"
- On the left, click "Conda environment"
- Click the three dots to the right of the interpreter path and navigate to the Python executable of the environment created above (e.g., C:\Users\v1iclar2\Anaconda3\envs\devaliby2\python.exe)
#### Potential Windows issues
- Sometimes the library pywin32 gives trouble, just install it using pip or conda
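For example, either of the following from the same environment:
```shell script
pip install pywin32
# or, with conda
conda install pywin32
```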
# Running the analysis pipeline
You can run the analysis pipeline either via the command line interface (CLI) or using a script that incorporates the `aliby.pipeline.Pipeline` object.
## CLI
On a CLI, you can use the `aliby-run` command. This command takes options as follows:
- `--host`: Address of image-hosting server.
- `--username`: Username to access image-hosting server.
- `--password`: Password to access image-hosting server.
- `--expt_id`: Number ID of experiment stored on host server.
- `--distributed`: Number of distributed cores to use for segmentation and signal processing. If 0, there is no parallelisation.
- `--tps`: Optional. Number of time points from the beginning of the experiment to use. If not specified, the pipeline processes all time points.
- `--directory`: Optional. Parent directory to save the data files (HDF5) generated, `./data` by default; the files will be stored in a child directory whose name is the name of the experiment.
- `--filter`: Optional. List of positions to use for analysis. Alternatively, a regex (regular expression) or list of regexes to search for positions. **Note: the CLI cannot currently take a list of strings as input.**
- `--overwrite`: Optional. Whether to overwrite an existing data directory. True by default.
- `--override_meta`: Optional. Whether to override the metadata stored in an existing data directory. True by default.
Example usage:
```bash
aliby-run --expt_id EXPT_PATH --distributed 4 --tps None
```
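A hypothetical invocation combining some of the optional flags above (all values are placeholders):
```bash
aliby-run --expt_id EXPT_PATH --distributed 4 --tps 100 --directory ./data --filter "pos001"
```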
And to run with OMERO servers, the basic arguments are shown below:
```bash
aliby-run --expt_id XXX --host SERVER.ADDRESS --user USER --password PASSWORD
```
## Script
Use the `aliby.pipeline.Pipeline` object and supply a dictionary, following the example below. The meanings of the parameters are the same as described in the CLI section above.
```python
#!/usr/bin/env python3
from aliby.pipeline import Pipeline, PipelineParameters
# Specify experiment IDs
ids = [101, 102]
for i in ids:
print(i)
try:
params = PipelineParameters.default(
# Create dictionary to define pipeline parameters.
general={
"expt_id": i,
"distributed": 6,
"host": "INSERT ADDRESS HERE",
"username": "INSERT USERNAME HERE",
"password": "INSERT PASSWORD HERE",
# Ensure data will be overwritten
"override_meta": True,
"overwrite": True,
}
)
# Fine-grained control beyond general parameters:
# change specific leaf in the extraction tree.
# This example tells the pipeline to additionally compute the
# nuc_est_conv quantity, which is a measure of the degree of
# localisation of a signal in a cell.
params = params.to_dict()
leaf_to_change = params["extraction"]["tree"]["GFP"]["np_max"]
leaf_to_change.add("nuc_est_conv")
# Regenerate PipelineParameters
p = Pipeline(PipelineParameters.from_dict(params))
# Run pipeline
p.run()
# Error handling
except Exception as e:
print(e)
```
This example code can be the contents of a `run.py` file, and you can run it via
```bash
python run.py
```
in the appropriate virtual environment.
Alternatively, the example code can be the contents of a cell in a jupyter notebook.
......@@ -4,21 +4,15 @@
contain the root `toctree` directive.
.. toctree::
:hidden:
Home page <self>
Installation <INSTALL.md>
Pipeline options <PIPELINE.md>
Contributing <CONTRIBUTING.md>
..
Examples <examples.rst>
Reference <api.rst>
Reference <api.rst>
..
Contributing <CONTRIBUTING.md>
..
ALIBY reference <_autosummary/aliby>
extraction reference <_autosummary/extraction>
agora reference <_autosummary/agora>
postprocessor reference <_autosummary/postprocessor>
logfile_parser reference <_autosummary/logfile_parser>
.. include:: ../../README.md
:parser: myst_parser.sphinx_
#+title: Input/Output Stage Dependencies
Overview of what fields are required for each consecutive step to run, and what each step produces.
- Registration
- Tiler
- Requires:
- None
# - Optionally:
- Produces:
- /trap_info
- Tiler
- Requires:
- None
- Produces:
- /trap_info
#+title: ALIBY roadmap
Overview of potential improvements, goals, issues and other thoughts worth keeping in the repository. In general, these are things that the original developer would have liked to implement had there been enough time.
* General goals
- Simplify code base
- Reduce dependency on BABY
- Abstract components beyond cell outlines (i.e., vacuoles or other ROIs)
- Enable providing metadata defaults (remove the dependency on metadata)
- (Relevant to BABY): Migrate aliby-baby from Keras to PyTorch. Immediately afterwards, upgrade h5py to the latest version (we are stuck on 2.10.0 due to Keras).
* Long-term tasks (Soft Eng)
- Support external segmentation/tracking/lineage/processing tools
- Split segmentation, tracking and lineage into independent Steps
- Implement the pipeline as an acyclic graph
- Isolate lineage and tracking into a section of aliby or an independent package
- Abstract cells into "ROIs" or "Outlines"
- Abstract lineage into "Outline relationships" (this may help study cell-to-cell interactions in the future)
- Add support for next-generation microscopy formats.
- Make live cell processing great again! (low priority)
* Potential features
- Flat field correction (requires research on what is the best way to do it)
- Support for monotiles (e.g., agarose pads)
- Support the user providing location of tiles (could be a GUI in which the user selects a region)
- Support multiple neural networks (e.g., vacuole/nucleus in addition to cell segmentation)
- Use CellPose as a backup for accuracy-first pipelines
* Potential CLI(+matplotlib) interfaces
The fastest way to get a GUI-like interface is to use matplotlib as a panel that updates and reads keyboard input to interact with the data. All of this can be done within matplotlib in a few hundred lines of code (see the sketch after this list).
- Annotate intracellular contents
- Interface to adjust the parameters for calibration
- Basic selection of a region of interest on a per-position basis
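A minimal sketch of the idea, using only matplotlib (all names and data are illustrative):
#+begin_src python
import matplotlib.pyplot as plt
import numpy as np

def on_key(event):
    # React to keyboard input, e.g. move to the next tile or record an annotation
    print(f"pressed {event.key}")

fig, ax = plt.subplots()
ax.imshow(np.random.rand(96, 96), cmap="gray")  # stand-in for a tile image
fig.canvas.mpl_connect("key_press_event", on_key)
plt.show()
#+end_src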
* Sections in need of refactoring
** Extraction
Extraction could easily increase its processing speed. Most of the code was not originally written using casting and vectorised operations.
- Reducing the use of python loops to the minimum
- Replacing nested functions with functional mappings (extraction would be faster and clearer with a functional programming approach)
- Replacing the tree with a set of tuples and delegating the processing order to dask (see the sketch below).
Dask can produce its own internal tree and optimise the processing order, rendering the explicit tree unnecessary
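A minimal sketch of the dask idea (the `extract` function is a placeholder; the leaf tuples reuse the `np_max`/`nuc_est_conv` names from the extraction-tree example in the pipeline docs):
#+begin_src python
import dask

def extract(channel, reduction, metric):
    # Placeholder for extracting one (channel, reduction, metric) leaf
    return (channel, reduction, metric)

# Each leaf becomes a delayed task; dask resolves the execution order,
# making an explicit extraction tree unnecessary.
leaves = [("GFP", "np_max", "mean"), ("GFP", "np_max", "nuc_est_conv")]
tasks = [dask.delayed(extract)(*leaf) for leaf in leaves]
results = dask.compute(*tasks)
#+end_src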
** Postprocessing.
- Clarify the limits of the picking and merging classes: these are temporal procedures; in the future segmentation should become more accurate, making picking (Picker) redundant; better tracking/lineage assignment will make merging redundant.
- Formalise how lineage and reshaper processes are handled
- Non-destructive postprocessing.
The way postprocessing is done is destructive at the moment. If we aim to perform more complex data analysis automatically, an implementation of complementary and tractable sub-pipelines is essential. (low priority, perhaps within scripts)
- Functionalise the parameter-process schema. This schema provides a decent structure, but it requires a lot of boilerplate code. To transition, the best option is probably a function that converts Process classes into a function, and another that extracts default values from a Parameters class (see the sketch below). This could in theory replace most Process-Parameters pairs. Lineage functions will pose a problem, and a common interface to get lineage or outline-to-outline relationships needs to be engineered.
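A rough sketch of the two helpers described above (names are hypothetical; only `default()` and `to_dict()` come from the existing Parameters interface):
#+begin_src python
# Hypothetical helpers for the Process/Parameters refactor sketched above
def process_as_function(process_cls, parameters):
    """Wrap a Process class so it can be called like a plain function."""
    def run(*args, **kwargs):
        return process_cls(parameters).run(*args, **kwargs)
    return run

def parameter_defaults(parameters_cls):
    """Extract the default values stored in a Parameters class."""
    return parameters_cls.default().to_dict()
#+end_src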
** Compiler/Reporter
- Remove compiler step, and focus on designing an adequate report, then build it straight after postprocessing ends.
** Writers/Readers
- Consider storing signals that are similar (e.g., signals arising from each channel) in a single multidimensional array to save storage space. (mid priority)
- Refactor (Extraction/Postprocessing) Writer to use the DynamicWriter Abstract Base Class.
** Pipeline
Pipeline is in dire need of refactoring, as it coordinates too many things. The best approach would be to modify the structure to delegate more responsibilities to Steps (such as validation) and Writers (such as writing metadata).
* Testing
- I/O interfaces
- Visualisation helpers and other functions
- Running one pipeline from another
- Groupers
* Documentation
- Tutorials and how-to for the usual tasks
- How to deal with different types of data
- How to aggregate data from multiple experiments
- Contribution guidelines (after developing some)
* Tools/alternatives that may be worth considering for the future
- trio/asyncio/anyio for concurrent processing of individual threads
- Pandas -> Polars: Reconsider after pandas 2.0; they will become interoperable
- awkward arrays: Better way to represent data series with different sizes
- h5py -> zarr: the OME-ZARR format is out now, and it is possible that the field will move in that direction. This would also make being stuck on h5py 2.10.0 less egregious.
- Use CellACDC's work on producing a common interface to access a multitude of segmentation algorithms.
* Secrets in the code
- As aliby is adapted to future Python versions, keep up with the "FUTURE" statements that enunciate how code can be improved in newer Python versions
- Track FIXMEs and, if we cannot solve them immediately, open an associated issue
* Minor inconveniences to fix
- Update CellTracker models by training with current scikit-learn (currently it warns that the models were trained with an older version of sklearn)
2022-10-10 15:31:27,350 - INFO
Swain Lab microscope experiment log file
GIT commit: e5d5e33 fix: changes to a few issues with focus control on Batman.
Microscope name: Batman
Date: 2022-10-10 15:31:27
Log file path: D:\AcquisitionDataBatman\Swain Lab\Ivan\RAW DATA\2022\Oct\10-Oct-2022\pH_med_to_low00\pH_med_to_low.log
Micromanager config file: C:\Users\Public\Microscope control\Micromanager config files\Batman_python_15_4_22.cfg
Omero project: Default project
Omero tags:
Experiment details: Effect on growth and cytoplasmic pH of switch from normal pH (4.25) media to higher pH (5.69). Switching is run using the Oxygen software
-----Acquisition settings-----
2022-10-10 15:31:27,350 - INFO Image Configs:
Image config,Channel,Description,Exposure (ms), Number of Z sections,Z spacing (um),Sectioning method
brightfield1,Brightfield,Default bright field config,30,5,0.6,PIFOC
pHluorin405_0_4,pHluorin405,Phluorin excitation from 405 LED 0.4v and 10ms exposure,5,1,0.6,PIFOC
pHluorin488_0_4,GFPFast,Phluorin excitation from 488 LED 0.4v,10,1,0.6,PIFOC
cy5,cy5,Default cy5,30,1,0.6,PIFOC
Device properties:
Image config,device,property,value
pHluorin405_0_4,DTOL-DAC-1,Volts,0.4
pHluorin488_0_4,DTOL-DAC-2,Volts,0.4
cy5,DTOL-DAC-3,Volts,4
2022-10-10 15:31:27,353 - INFO
group: YST_247 field: position
Name, X, Y, Z, Autofocus offset
YST_247_001,-8968,-3319,2731.125040696934,123.25
YST_247_002,-8953,-3091,2731.3000406995416,123.25
YST_247_003,-8954,-2849,2731.600040704012,122.8
YST_247_004,-8941,-2611,2730.7750406917185,122.8
YST_247_005,-8697,-2541,2731.4500407017767,118.6
group: YST_247 field: time
start: 0
interval: 300
frames: 180
group: YST_247 field: config
brightfield1: 0xfffffffffffffffffffffffffffffffffffffffffffff
pHluorin405_0_4: 0xfffffffffffffffffffffffffffffffffffffffffffff
pHluorin488_0_4: 0xfffffffffffffffffffffffffffffffffffffffffffff
cy5: 0xfffffffffffffffffffffffffffffffffffffffffffff
2022-10-10 15:31:27,356 - INFO
group: YST_1510 field: position
Name,X,Y,Z,Autofocus offset
YST_1510_001,-6450,-230,2343.300034917891,112.55
YST_1510_002,-6450,-436,2343.350034918636,112.55
YST_1510_003,-6450,-639,2344.000034928322,116.8
YST_1510_004,-6450,-831,2344.250034932047,116.8
YST_1510_005,-6848,-536,2343.3250349182636,110
group: YST_1510 field: time
start: 0
interval: 300
frames: 180
group: YST_1510 field: config
brightfield1: 0xfffffffffffffffffffffffffffffffffffffffffffff
pHluorin405_0_4: 0xfffffffffffffffffffffffffffffffffffffffffffff
pHluorin488_0_4: 0xfffffffffffffffffffffffffffffffffffffffffffff
cy5: 0xfffffffffffffffffffffffffffffffffffffffffffff
2022-10-10 15:31:27,359 - INFO
group: YST_1511 field: position
Name, X, Y, Z, Autofocus offset
YST_1511_001,-10618,-1675,2716.900040484965,118.7
YST_1511_002,-10618,-1914,2717.2250404898077,122.45
YST_1511_003,-10367,-1695,2718.2500405050814,120.95
YST_1511_004,-10367,-1937,2718.8250405136496,120.95
YST_1511_005,-10092,-1757,2719.975040530786,119.45
group: YST_1511 field: time
start: 0
interval: 300
frames: 180
group: YST_1511 field: config
brightfield1: 0xfffffffffffffffffffffffffffffffffffffffffffff
pHluorin405_0_4: 0xfffffffffffffffffffffffffffffffffffffffffffff
pHluorin488_0_4: 0xfffffffffffffffffffffffffffffffffffffffffffff
cy5: 0xfffffffffffffffffffffffffffffffffffffffffffff
2022-10-10 15:31:27,362 - INFO
group: YST_1512 field: position
Name,X,Y,Z,Autofocus offset
YST_1512_001,-8173,-2510,2339.0750348549336,115.65
YST_1512_002,-8173,-2718,2338.0250348392874,110.8
YST_1512_003,-8173,-2963,2336.625034818426,110.8
YST_1512_004,-8457,-2963,2336.350034814328,110.9
YST_1512_005,-8481,-2706,2337.575034832582,113.3
group: YST_1512 field: time
start: 0
interval: 300
frames: 180
group: YST_1512 field: config
brightfield1: 0xfffffffffffffffffffffffffffffffffffffffffffff
pHluorin405_0_4: 0xfffffffffffffffffffffffffffffffffffffffffffff
pHluorin488_0_4: 0xfffffffffffffffffffffffffffffffffffffffffffff
cy5: 0xfffffffffffffffffffffffffffffffffffffffffffff
2022-10-10 15:31:27,365 - INFO
group: YST_1513 field: position
Name,X,Y,Z,Autofocus offset
YST_1513_001,-6978,-2596,2339.8750348668545,113.3
YST_1513_002,-6978,-2380,2340.500034876168,113.3
YST_1513_003,-6971,-2163,2340.8750348817557,113.3
YST_1513_004,-6971,-1892,2341.2500348873436,113.3
YST_1513_005,-6692,-1892,2341.550034891814,113.3
group: YST_1513 field: time
start: 0
interval: 300
frames: 180
group: YST_1513 field: config
brightfield1: 0xfffffffffffffffffffffffffffffffffffffffffffff
pHluorin405_0_4: 0xfffffffffffffffffffffffffffffffffffffffffffff
pHluorin488_0_4: 0xfffffffffffffffffffffffffffffffffffffffffffff
cy5: 0xfffffffffffffffffffffffffffffffffffffffffffff
2022-10-10 15:31:27,365 - INFO
2022-10-10 15:31:27,365 - INFO
-----Experiment started-----
[tool.poetry]
name = "aliby"
version = "0.1.52"
version = "0.1.64"
description = "Process and analyse live-cell imaging data"
authors = ["Alan Munoz <alan.munoz@ed.ac.uk>"]
packages = [
......@@ -14,6 +14,8 @@ readme = "README.md"
[tool.poetry.scripts]
aliby-run = "aliby.bin.run:run"
aliby-annotate = "aliby.bin.annotate:annotate"
aliby-visualise = "aliby.bin.visualise:napari_overlay"
[build-system]
requires = ["setuptools", "poetry-core>=1.0.0"]
......@@ -24,9 +26,10 @@ python = ">=3.8, <3.11"
PyYAML = "^6.0"
flatten-dict = "^0.4.2"
gaussianprocessderivatives = "^0.1.5"
h5py = "2.10" # File I/O
numpy = ">=1.21.6"
opencv-python = "4.1.2.30"
Bottleneck = "^1.3.5"
opencv-python = "^4.7.0.72"
pathos = "^0.2.8" # Lambda-friendly multithreading
p-tqdm = "^1.3.3"
pandas = ">=1.3.3"
py-find-1st = "^1.1.5" # Fast indexing
......@@ -34,29 +37,32 @@ scikit-learn = ">=1.0.2" # Used for an extraction metric
scipy = ">=1.7.3"
# Pipeline + I/O
aliby-baby = "^0.1.15"
dask = "^2021.12.0"
imageio = "2.8.0" # For image-visualisation utilities
requests-toolbelt = "^0.9.1"
scikit-image = ">=0.18.1"
tqdm = "^4.62.3" # progress bars
xmltodict = "^0.13.0" # read ome-tiff metadata
zarr = "^2.14.0"
GitPython = "^3.1.27"
h5py = "2.10" # File I/O
# Networking
omero-py = { version = ">=5.6.2", optional = true } # contact omero server
# Baby segmentation
aliby-baby = {version = "^0.1.17", optional=true}
# Postprocessing
[tool.poetry.group.pp.dependencies]
leidenalg = "^0.8.8"
more-itertools = "^8.12.0"
pathos = "^0.2.8" # Lambda-friendly multithreading
pycatch22 = "^0.4.2"
# Networking
omero-py = { version = ">=5.6.2", optional = true } # contact omero server
zeroc-ice = { version="3.6.5", optional = true } # networking interface, slow to build
GitPython = "^3.1.27"
Bottleneck = "^1.3.5"
[tool.poetry.group.pp]
optional = true
[tool.poetry.extras]
omero = [ "omero-py" ]
network = [ "omero-py", "zeroc-ice" ]
[tool.poetry.group.dev]
optional = true
......@@ -73,9 +79,10 @@ pre-commit = "^2.20.0"
seaborn = "^0.11.2"
debugpy = "^1.6.3"
coverage = "^7.0.4"
# [tool.poetry.group.gui.dependencies]
# napari = ">=0.4.16"
jupytext = "^1.14.4"
grid-strategy = "^0.0.1"
readchar = "^4.0.3"
ipdb = "^0.13.11"
[tool.poetry.group.docs]
optional = true
......@@ -92,6 +99,21 @@ optional = true
[tool.poetry.group.test.dependencies]
pytest = "^6.2.5"
[tool.poetry.group.utils]
optional = true
# Dependency groups can only be used by a poetry installation, not pip
[tool.poetry.group.utils.dependencies]
napari = {version = ">=0.4.16", optional=true}
Torch = {version = "^1.13.1", optional=true}
pytorch-lightning = {version = "^1.9.3", optional=true}
torchvision = {version = "^0.14.1", optional=true}
trio = {version = "^0.22.0", optional=true}
grid-strategy = {version = "^0.0.1", optional=true}
[tool.poetry.extras]
omero = ["omero-py"]
baby = ["aliby-baby"]
[tool.black]
line-length = 79
......
......@@ -3,7 +3,7 @@ import typing as t
from abc import ABC, abstractmethod
from collections.abc import Iterable
from copy import copy
from pathlib import Path, PosixPath
from pathlib import Path
from time import perf_counter
from typing import Union
......@@ -60,14 +60,14 @@ class ParametersABC(ABC):
else:
return iterable
def to_yaml(self, path: Union[PosixPath, str] = None):
def to_yaml(self, path: Union[Path, str] = None):
"""
Returns a yaml stream of the attributes of the class instance.
If path is provided, the yaml stream is saved there.
Parameters
----------
path : Union[PosixPath, str]
path : Union[Path, str]
Output path.
"""
if path:
......@@ -80,7 +80,7 @@ class ParametersABC(ABC):
return cls(**d)
@classmethod
def from_yaml(cls, source: Union[PosixPath, str]):
def from_yaml(cls, source: Union[Path, str]):
"""
Returns instance from a yaml filename or stdin
"""
......@@ -202,7 +202,7 @@ class ProcessABC(ABC):
def run(self):
pass
def _log(self, message: str, level: str = "warn"):
def _log(self, message: str, level: str = "warning"):
# Log messages in the corresponding level
logger = logging.getLogger("aliby")
getattr(logger, level)(f"{self.__class__.__name__}: {message}")
......@@ -211,7 +211,7 @@ class ProcessABC(ABC):
def check_type_recursive(val1, val2):
same_types = True
if not isinstance(val1, type(val2)) and not all(
type(x) in (PosixPath, str) for x in (val1, val2) # Ignore str->path
type(x) in (Path, str) for x in (val1, val2) # Ignore str->path
):
return False
if not isinstance(val1, t.Iterable) and not isinstance(val2, t.Iterable):
......@@ -249,5 +249,5 @@ class StepABC(ProcessABC):
return self._run_tp(tp, **kwargs)
def run(self):
# Replace run withn run_tp
# Replace run with run_tp
raise Warning("Steps use run_tp instead of run")
......@@ -162,5 +162,8 @@ def image_creds_from_h5(fpath: str):
attrs = attrs_from_h5(fpath)
return (
attrs["image_id"],
yaml.safe_load(attrs["parameters"])["general"]["server_info"],
{
k: yaml.safe_load(attrs["parameters"])["general"][k]
for k in ("username", "password", "host")
},
)
import logging
import typing as t
from collections.abc import Iterable
from itertools import groupby
from pathlib import Path, PosixPath
from pathlib import Path
from functools import lru_cache, cached_property
import h5py
......@@ -27,13 +26,13 @@ class Cells:
"""
def __init__(self, filename, path="cell_info"):
self.filename: t.Optional[t.Union[str, PosixPath]] = filename
self.filename: t.Optional[t.Union[str, Path]] = filename
self.cinfo_path: t.Optional[str] = path
self._edgemasks: t.Optional[str] = None
self._tile_size: t.Optional[int] = None
@classmethod
def from_source(cls, source: t.Union[PosixPath, str]):
def from_source(cls, source: t.Union[Path, str]):
return cls(Path(source))
def _log(self, message: str, level: str = "warn"):
......@@ -146,12 +145,47 @@ class Cells:
)
def mask(self, cell_id, trap_id):
"""
Returns the times and the binary masks of a given cell in a given tile.
Parameters
----------
cell_id : int
The unique ID of the cell.
tile_id : int
The unique ID of the tile.
Returns
-------
Tuple[np.ndarray, np.ndarray]
The times when the binary masks were taken and the binary masks of the given cell in the given tile.
"""
times, outlines = self.outline(cell_id, trap_id)
return times, np.array(
[ndimage.morphology.binary_fill_holes(o) for o in outlines]
)
def at_time(self, timepoint, kind="mask"):
def at_time(
self, timepoint: t.Iterable[int], kind="mask"
) -> t.List[t.List[np.ndarray]]:
"""
Returns a list of lists of binary masks in a given list of time points.
Parameters
----------
timepoints : Iterable[int]
The list of time points for which to return the binary masks.
kind : str, optional
The type of binary masks to return, by default "mask".
Returns
-------
List[List[np.ndarray]]
A list of lists with binary masks grouped by tile IDs.
"""
ix = self["timepoint"] == timepoint
traps = self["trap"][ix]
edgemasks = self._edgem_from_masking(ix)
......@@ -162,6 +196,33 @@ class Cells:
]
return self.group_by_traps(traps, masks)
def at_times(
self, timepoints: t.Iterable[int], kind="mask"
) -> t.List[t.List[np.ndarray]]:
"""
Returns a list of lists of binary masks for a given list of time points.
Parameters
----------
timepoints : Iterable[int]
The list of time points for which to return the binary masks.
kind : str, optional
The type of binary masks to return, by default "mask".
Returns
-------
List[List[np.ndarray]]
A list of lists with binary masks grouped by tile IDs.
"""
return [
[
np.stack(tile_masks) if len(tile_masks) else []
for tile_masks in self.at_time(tp, kind=kind).values()
]
for tp in timepoints
]
def group_by_traps(
self, traps: t.Collection, cell_labels: t.Collection
) -> t.Dict[int, t.List[int]]:
......@@ -217,8 +278,39 @@ class Cells:
def ntimepoints(self) -> int:
return self["timepoint"].max() + 1
@cached_property
def _cells_vs_tps(self):
# Binary matrix showing the presence of all cells in all time points
ncells_per_tile = [len(x) for x in self.labels]
cells_vs_tps = np.zeros(
(sum(ncells_per_tile), self.ntimepoints), dtype=bool
)
cells_vs_tps[
self._cell_cumsum[self["trap"]] + self["cell_label"] - 1,
self["timepoint"],
] = True
return cells_vs_tps
@cached_property
def _cell_cumsum(self):
# Cumulative sum indicating the number of cells per tile
ncells_per_tile = [len(x) for x in self.labels]
cumsum = np.roll(np.cumsum(ncells_per_tile), shift=1)
cumsum[0] = 0
return cumsum
def _flat_index_to_tuple_location(self, idx: int) -> t.Tuple[int, int]:
# Convert a cell index to a tuple
# Note that it assumes tiles and cell labels are flattened, but
# it is agnostic to tps
tile_id = int(np.where(idx + 1 > self._cell_cumsum)[0][-1])
cell_label = idx - self._cell_cumsum[tile_id] + 1
return tile_id, cell_label
@property
def ncells_matrix(self):
def _tiles_vs_cells_vs_tps(self):
ncells_mat = np.zeros(
(self.ntraps, self["cell_label"].max(), self.ntimepoints),
dtype=bool,
......@@ -228,46 +320,28 @@ class Cells:
] = True
return ncells_mat
def matrix_trap_tp_where(
self, min_ncells: int = None, min_consecutive_tps: int = None
def cell_tp_where(
self,
min_consecutive_tps: int = 15,
interval: None or t.Tuple[int, int] = None,
):
"""
Return a matrix of shape (ntraps x ntps - min_consecutive_tps to
indicate traps and time-points where min_ncells are available for at least min_consecutive_tps
Parameters
---------
min_ncells: int Minimum number of cells
min_consecutive_tps: int
Minimum number of consecutive time-points
Returns
---------
(ntraps x ( ntps-min_consecutive_tps )) 2D boolean numpy array where rows are trap ids and columns are timepoint windows.
If the value in a cell is true its corresponding trap and timepoint contains more than min_ncells for at least min_consecutive time-points.
"""
if min_ncells is None:
min_ncells = 2
if min_consecutive_tps is None:
min_consecutive_tps = 5
window = sliding_window_view(
self.ncells_matrix, min_consecutive_tps, axis=2
self._cells_vs_tps, min_consecutive_tps, axis=1
)
tp_min = window.sum(axis=-1) == min_consecutive_tps
ncells_tp_min = tp_min.sum(axis=1) >= min_ncells
return ncells_tp_min
def random_valid_trap_tp(
self, min_ncells: int = None, min_consecutive_tps: int = None
):
# Return a randomly-selected pair of trap_id and timepoints
mat = self.matrix_trap_tp_where(
min_ncells=min_ncells, min_consecutive_tps=min_consecutive_tps
)
traps, tps = np.where(mat)
rand = np.random.randint(mat.sum())
return (traps[rand], tps[rand])
# Apply an interval filter to focus on a slice
if interval is not None:
interval = tuple(np.array(interval))
else:
interval = (0, window.shape[1])
low_boundary, high_boundary = interval
tp_min[:, :low_boundary] = False
tp_min[:, high_boundary:] = False
return tp_min
@lru_cache(20)
def mothers_in_trap(self, trap_id: int):
......@@ -288,8 +362,17 @@ class Cells:
@cached_property
def mothers_daughters(self) -> np.ndarray:
"""
Return mothers and daugters as a single array with three columns:
trap, mothers and daughters
Return a single array with three columns, containing information about
the mother-daughter relationships: tile, mothers and daughters.
Returns
-------
np.ndarray
An array with shape (n, 3) where n is the number of mother-daughter pairs found.
The columns contain:
- tile: the tile where the mother cell is located.
- mothers: the index of the mother cell within the tile.
- daughters: the index of the daughter cell within the tile.
"""
nested_massign = self.mothers
......@@ -311,7 +394,41 @@ class Cells:
@staticmethod
def mother_assign_to_mb_matrix(ma: t.List[np.array]):
# Convert from list of lists to mother_bud sparse matrix
"""
Convert from a list of lists of mother-bud paired assignments to a
sparse matrix with a boolean dtype. The rows correspond
to daughter buds. The values are boolean and indicate whether a
given cell is a mother cell and a given daughter bud is assigned
to the mother cell in the next timepoint.
Parameters:
-----------
ma : list of lists of integers
A list of lists of mother-bud assignments. The i-th sublist contains the
bud assignments for the i-th tile. The integers in each sublist
represent the mother label; if it is zero, no mother was found.
Returns:
--------
mb_matrix : boolean numpy array of shape (n, m)
An n x m boolean numpy array where n is the total number of cells (sum
of the lengths of all sublists in ma) and m is the maximum number of buds
assigned to any mother cell in ma. The value at (i, j) is True if cell i
is a daughter cell and cell j is its mother assigned to i.
Examples:
--------
ma = [[0, 0, 1], [0, 1, 0]]
Cells(None).mother_assign_to_mb_matrix(ma)
# array([[False, False, False, False, False, False],
# [False, False, False, False, False, False],
# [ True, False, False, False, False, False],
# [False, False, False, False, False, False],
# [False, False, False, True, False, False],
# [False, False, False, False, False, False]])
"""
ncells = sum([len(t) for t in ma])
mb_matrix = np.zeros((ncells, ncells), dtype=bool)
c = 0
......@@ -326,10 +443,26 @@ class Cells:
@staticmethod
def mother_assign_from_dynamic(
ma, cell_label: t.List[int], trap: t.List[int], ntraps: int
):
ma: np.ndarray, cell_label: t.List[int], trap: t.List[int], ntraps: int
) -> t.List[t.List[int]]:
"""
Interpolate the list of lists containing the associated mothers from the mother_assign_dynamic feature
Interpolate the associated mothers from the 'mother_assign_dynamic' feature.
Parameters
----------
ma: np.ndarray
An array with shape (n_t, n_c) containing the 'mother_assign_dynamic' feature.
cell_label: List[int]
A list containing the cell labels.
trap: List[int]
A list containing the trap labels.
ntraps: int
The total number of traps.
Returns
-------
List[List[int]]
A list of lists containing the interpolated mother assignment for each cell in each trap.
"""
idlist = list(zip(trap, cell_label))
cell_gid = np.unique(idlist, axis=0)
......@@ -352,14 +485,33 @@ class Cells:
return nested_massign
@lru_cache(maxsize=200)
def labelled_in_frame(self, frame: int, global_id=False) -> np.ndarray:
def labelled_in_frame(
self, frame: int, global_id: bool = False
) -> np.ndarray:
"""
Return labels in a ndarray with the global ids
with shape (ntraps, max_nlabels, ysize, xsize)
at a given frame.
Returns labels in a 4D ndarray with the global ids with shape
(ntraps, max_nlabels, ysize, xsize) at a given frame.
Parameters
----------
frame : int
The frame number.
global_id : bool, optional
If True, the returned array contains global ids, otherwise it
contains only the local ids of the labels. Default is False.
Returns
-------
np.ndarray
A 4D numpy array containing the labels in the given frame.
The array has dimensions (ntraps, max_nlabels, ysize, xsize),
where max_nlabels is specific for this frame, not the entire
experiment.
Notes
-----
This method uses lru_cache to cache the results for faster access.
max_nlabels is specific for this frame, not
the entire experiment.
"""
labels_in_frame = self.labels_at_time(frame)
n_labels = [
......@@ -396,22 +548,180 @@ class Cells:
labels_mat[trap_id] += global_id_masks
return labels_mat
def get_stacks_in_frame(self, frame: int, tile_shape: t.Tuple[int]):
# Stack all cells in a trap-wise manner
def get_stacks_in_frame(
self, frame: int, tile_shape: t.Tuple[int]
) -> t.List[np.ndarray]:
"""
Returns a list of stacked masks, each corresponding to a tile at a given timepoint.
Parameters
----------
frame : int
Frame for which to obtain the stacked masks.
tile_shape : Tuple[int]
Shape of a tile to stack the masks into.
Returns
-------
List[np.ndarray]
List of stacked masks for each tile at the given timepoint.
"""
masks = self.at_time(frame)
return [
stack_masks_in_trap(
stack_masks_in_tile(
masks.get(trap_id, np.array([], dtype=bool)), tile_shape
)
for trap_id in range(self.ntraps)
]
def _sample_tiles_tps(
self,
size=1,
min_consecutive_ntps: int = 15,
seed: int = 0,
interval=None,
) -> t.Tuple[np.ndarray, np.ndarray]:
"""
Sample tiles that have a minimum number of cells and are occupied for at least a minimum number of consecutive timepoints.
Parameters
----------
size: int, optional (default=1)
The number of tiles to sample.
min_consecutive_ntps: int, optional (default=15)
The minimum number of consecutive timepoints a cell must be present in a trap.
seed: int, optional (default=0)
Random seed value for reproducibility.
interval: None or Tuple(int, int), optional (default=None)
If given, restrict the sampling to time points within this interval.
Returns
-------
Tuple[np.ndarray, np.ndarray,np.ndarray]
A tuple of 1D numpy arrays containing the indices of the sampled tiles and the corresponding timepoints.
"""
# cell_availability_matrix = self.matrix_trap_tp_where(
# min_ncells=min_ncells, min_consecutive_tps=min_consecutive_ntps
# )
# # Find all valid tiles with min_ncells for at least min_tps
# tile_ids, tps = np.where(cell_availability_matrix)
cell_availability_matrix = self.cell_tp_where(
min_consecutive_tps=min_consecutive_ntps,
interval=interval,
)
# Find all valid tiles with min_ncells for at least min_tps
index_id, tps = np.where(cell_availability_matrix)
if interval is None: # Limit search
interval = (0, cell_availability_matrix.shape[1])
np.random.seed(seed)
choices = np.random.randint(len(index_id), size=size)
linear_indices = np.zeros_like(self["cell_label"], dtype=bool)
for cell_index_flat, tp in zip(index_id[choices], tps[choices]):
tile_id, cell_label = self._flat_index_to_tuple_location(
cell_index_flat
)
linear_indices[
(
(self["cell_label"] == cell_label)
& (self["trap"] == tile_id)
& (self["timepoint"] == tp)
)
] = True
return linear_indices
def _sample_masks(
self,
size: int = 1,
min_consecutive_ntps: int = 15,
interval: t.Union[None, t.Tuple[int, int]] = None,
seed: int = 0,
) -> t.Tuple[t.Tuple[t.List[int], t.List[int], t.List[int]], np.ndarray]:
"""
Sample a number of cells from within an interval.
Parameters
----------
size: int, optional (default=1)
The number of cells to sample.
min_consecutive_ntps: int, optional (default=15)
The minimum number of consecutive timepoints a cell must be present in a trap.
interval: None or Tuple[int, int], optional (default=None)
If given, restrict the sampling to time points within this interval.
seed: int, optional (default=0)
Random seed value for reproducibility.
Returns
-------
Tuple[Tuple[np.ndarray, np.ndarray, List[int]], List[np.ndarray]]
Two tuples are returned. The first tuple contains:
- `tile_ids`: A 1D numpy array of the tile ids that correspond to the tile identifier.
- `tps`: A 1D numpy array of the timepoints at which the cells were sampled.
- `cell_ids`: A list of integers that correspond to the local id of the sampled cells.
The second tuple contains:
- `masks`: A list of 2D numpy arrays representing the binary masks of the sampled cells at each timepoint.
"""
sampled_bitmask = self._sample_tiles_tps(
size=size,
min_consecutive_ntps=min_consecutive_ntps,
seed=seed,
interval=interval,
)
# Sort sampled tiles to use automatic cache when possible
tile_ids = self["trap"][sampled_bitmask]
cell_labels = self["cell_label"][sampled_bitmask]
tps = self["timepoint"][sampled_bitmask]
masks = []
for tile_id, cell_label, tp in zip(tile_ids, cell_labels, tps):
local_idx = self.labels_at_time(tp)[tile_id].index(cell_label)
tile_mask = self.at_time(tp)[tile_id][local_idx]
masks.append(tile_mask)
return (tile_ids, cell_labels, tps), np.stack(masks)
def matrix_trap_tp_where(
self, min_ncells: int = 2, min_consecutive_tps: int = 5
):
"""
NOTE: CURRENTLY UNUSED WITHIN ALIBY. MAY BE USEFUL IN THE FUTURE.
Return a matrix of shape (ntraps x ntps - min_consecutive_tps) to
indicate traps and time-points where min_ncells are available for at least min_consecutive_tps
Parameters
---------
min_ncells: int Minimum number of cells
min_consecutive_tps: int
Minimum number of consecutive time-points
Returns
---------
(ntraps x ( ntps-min_consecutive_tps )) 2D boolean numpy array where rows are trap ids and columns are timepoint windows.
If the value in a cell is true its corresponding trap and timepoint contains more than min_ncells for at least min_consecutive time-points.
"""
window = sliding_window_view(
self._tiles_vs_cells_vs_tps, min_consecutive_tps, axis=2
)
tp_min = window.sum(axis=-1) == min_consecutive_tps
ncells_tp_min = tp_min.sum(axis=1) >= min_ncells
return ncells_tp_min
def stack_masks_in_trap(
def stack_masks_in_tile(
masks: t.List[np.ndarray], tile_shape: t.Tuple[int]
) -> np.ndarray:
# Stack all masks in a trap padding accordingly if no outlines found
result = np.zeros((0, *tile_shape), dtype=bool)
if len(masks):
result = np.array(masks)
result = np.stack(masks)
return result
"""
Anthology of interfaces for different parsers and lack of them.
Anthology of interfaces for dispatch_metadata_parser: different parsers and the lack of them.
ALIBY decides on using different metadata parsers based on two elements:
......@@ -12,10 +12,11 @@ If there are no metadata files, ALIBY requires indicating indices for tiler, seg
"""
import glob
import logging
import os
import typing as t
from datetime import datetime
from pathlib import Path, PosixPath
from pathlib import Path
import pandas as pd
from pytz import timezone
......@@ -97,7 +98,9 @@ def find_file(root_dir, regex):
)
file = [sorted(file)[0]]
if len(file) == 0:
print("Warning:Metadata: No valid swainlab .log found.")
logging.getLogger("aliby").log(
logging.WARNING, "Metadata: No valid swainlab .log found."
)
else:
return file[0]
return None
......@@ -173,7 +176,7 @@ def get_meta_from_legacy(parsed_metadata: dict):
return result
def parse_swainlab_metadata(filedir: t.Union[str, PosixPath]):
def parse_swainlab_metadata(filedir: t.Union[str, Path]):
"""
Dispatcher function that determines which parser to use based on the file ending.
......@@ -192,7 +195,7 @@ def parse_swainlab_metadata(filedir: t.Union[str, PosixPath]):
raw_parse = parse_from_swainlab_grammar(filepath)
minimal_meta = get_meta_swainlab(raw_parse)
else:
if filedir.is_file():
if filedir.is_file() or str(filedir).endswith(".zarr"):
filedir = filedir.parent
legacy_parse = parse_logfiles(filedir)
minimal_meta = (
......@@ -202,7 +205,7 @@ def parse_swainlab_metadata(filedir: t.Union[str, PosixPath]):
return minimal_meta
def dispatch_metadata_parser(filepath: t.Union[str, PosixPath]):
def dispatch_metadata_parser(filepath: t.Union[str, Path]):
"""
Function to dispatch different metadata parsers that convert logfiles into a
basic metadata dictionary. Currently only contains the swainlab log parsers.
......@@ -219,7 +222,7 @@ def dispatch_metadata_parser(filepath: t.Union[str, PosixPath]):
return parsed_meta
def dir_to_meta(path: PosixPath, suffix="tiff"):
def dir_to_meta(path: Path, suffix="tiff"):
filenames = list(path.glob(f"*.{suffix}"))
try:
......
import logging
import typing as t
from copy import copy
from functools import cached_property, lru_cache
from pathlib import PosixPath
from pathlib import Path
import bottleneck as bn
import h5py
......@@ -10,7 +11,7 @@ import pandas as pd
from agora.io.bridge import BridgeH5
from agora.io.decorators import _first_arg_str_to_df
from agora.utils.association import validate_association
from agora.utils.indexing import validate_association
from agora.utils.kymograph import add_index_levels
from agora.utils.merge import apply_merges
......@@ -22,7 +23,7 @@ class Signal(BridgeH5):
Signal assumes that the metadata and data are accessible to perform time-adjustments and apply previously recorded post-processes.
"""
def __init__(self, file: t.Union[str, PosixPath]):
def __init__(self, file: t.Union[str, Path]):
"""Define index_names for dataframes, candidate fluorescence channels, and composite statistics."""
super().__init__(file, flag=None)
self.index_names = (
......@@ -46,20 +47,25 @@ class Signal(BridgeH5):
def __getitem__(self, dsets: t.Union[str, t.Collection]):
"""Get and potentially pre-process data from h5 file and return as a dataframe."""
if isinstance(dsets, str): # no pre-processing
df = self.get_raw(dsets)
return self.add_name(df, dsets)
return self.get(dsets)
elif isinstance(dsets, list): # pre-processing
is_bgd = [dset.endswith("imBackground") for dset in dsets]
# Check we are not comaring tile-indexed and cell-indexed data
# Check we are not comparing tile-indexed and cell-indexed data
assert sum(is_bgd) == 0 or sum(is_bgd) == len(
dsets
), "Tile data and cell data can't be mixed"
return [
self.add_name(self.apply_prepost(dset), dset) for dset in dsets
]
return [self.get(dset) for dset in dsets]
else:
raise Exception(f"Invalid type {type(dsets)} to get datasets")
def get(self, dsets: t.Union[str, t.Collection], **kwargs):
"""Get and potentially pre-process data from h5 file and return as a dataframe."""
if isinstance(dsets, str): # no pre-processing
df = self.get_raw(dsets, **kwargs)
prepost_applied = self.apply_prepost(dsets, **kwargs)
return self.add_name(prepost_applied, dsets)
@staticmethod
def add_name(df, name):
"""Add column of identical strings to a dataframe."""
......@@ -85,10 +91,10 @@ class Signal(BridgeH5):
"""Find the interval between time points (minutes)."""
tinterval_location = "time_settings/timeinterval"
with h5py.File(self.filename, "r") as f:
if tinterval_location in f:
if tinterval_location in f.attrs:
return f.attrs[tinterval_location][0]
else:
print(
logging.getlogger("aliby").warn(
f"{str(self.filename).split('/')[-1]}: using default time interval of 5 minutes"
)
return 5
......@@ -128,18 +134,24 @@ class Signal(BridgeH5):
Returns an array with three columns: the tile id, the mother label, and the daughter label.
"""
if lineage_location is None:
lineage_location = "postprocessing/lineage"
if merged:
lineage_location += "_merged"
lineage_location = "modifiers/lineage_merged"
with h5py.File(self.filename, "r") as f:
# if lineage_location not in f:
# lineage_location = lineage_location.split("_")[0]
if lineage_location not in f:
lineage_location = "postprocessing/lineage"
tile_mo_da = f[lineage_location]
lineage = np.array(
(
tile_mo_da["trap"],
tile_mo_da["mother_label"],
tile_mo_da["daughter_label"],
)
).T
if isinstance(tile_mo_da, h5py.Dataset):
lineage = tile_mo_da[()]
else:
lineage = np.array(
(
tile_mo_da["trap"],
tile_mo_da["mother_label"],
tile_mo_da["daughter_label"],
)
).T
return lineage
@_first_arg_str_to_df
......@@ -170,7 +182,7 @@ class Signal(BridgeH5):
"""
if isinstance(merges, bool):
merges: np.ndarray = self.get_merges() if merges else np.array([])
merges: np.ndarray = self.load_merges() if merges else np.array([])
if merges.any():
merged = apply_merges(data, merges)
else:
......@@ -202,9 +214,8 @@ class Signal(BridgeH5):
merged = pd.DataFrame([], index=index)
return merged
# Alan: do we need two similar properties - see below?
@property
def datasets(self):
@cached_property
def p_available(self):
"""Print data sets available in h5 file."""
if not hasattr(self, "_available"):
self._available = []
......@@ -213,11 +224,6 @@ class Signal(BridgeH5):
for sig in self._available:
print(sig)
@cached_property
def p_available(self):
"""Print data sets available in h5 file."""
self.datasets
@cached_property
def available(self):
"""Get data sets available in h5 file."""
......@@ -297,7 +303,7 @@ class Signal(BridgeH5):
self._log(f"Could not fetch dataset {dataset}: {e}", "error")
raise e
def get_merges(self):
def load_merges(self):
"""Get merge events going up to the first level."""
with h5py.File(self.filename, "r") as f:
merges = f.get("modifiers/merges", np.array([]))
......@@ -314,7 +320,9 @@ class Signal(BridgeH5):
with h5py.File(self.filename, "r") as f:
picks = set()
if path in f:
picks = set(zip(*[f[path + name] for name in names]))
picks = set(
zip(*[f[path + name] for name in names if name in f[path]])
)
return picks
def dataset_to_df(self, f: h5py.File, path: str) -> pd.DataFrame:
......
......@@ -5,14 +5,12 @@ import itertools
import logging
import operator
import typing as t
from functools import partial, wraps
from functools import wraps
from pathlib import Path
from time import perf_counter
from typing import Callable
import cv2
import h5py
import numpy as np
def repr_obj(obj, indent=0):
......
......@@ -11,7 +11,6 @@ import yaml
from utils_find_1st import cmp_equal, find_1st
from agora.io.bridge import BridgeH5
from agora.io.utils import timed
#################### Dynamic version ##################################
......@@ -173,7 +172,6 @@ class DynamicWriter:
# append or create new dataset
self._append(value, key, hgroup)
except Exception as e:
print(key, value)
self._log(
f"{key}:{value} could not be written: {e}", "error"
)
......@@ -231,7 +229,6 @@ class LinearBabyWriter(DynamicWriter):
Assumes the edgemasks are of form ((None, tile_size, tile_size), bool).
"""
# TODO make this YAML: Alan: why?
compression = "gzip"
_default_tile_size = 117
datatypes = {
......@@ -320,11 +317,7 @@ class StateWriter(DynamicWriter):
@staticmethod
def format_values_tpback(states: list, val_name: str):
"""Unpacks a dict of state data into tp_back, trap, value."""
# initialise as empty lists
# Alan: is this initialisation necessary?
tp_back, trap, value = [
[[] for _ in states[0][val_name]] for _ in range(3)
]
# store results as a list of tuples
lbl_tuples = [
(tp_back, trap, cell_label)
......@@ -335,6 +328,11 @@ class StateWriter(DynamicWriter):
# unpack list of tuples to define variables
if len(lbl_tuples):
tp_back, trap, value = zip(*lbl_tuples)
else:
# set as empty lists
tp_back, trap, value = [
[[] for _ in states[0][val_name]] for _ in range(3)
]
return tp_back, trap, value
@staticmethod
......@@ -410,9 +408,9 @@ class StateWriter(DynamicWriter):
#################### Extraction version ###############################
class Writer(BridgeH5):
"""Class to transform data into compatible structures."""
# Alan: when is this used?
"""
Class to transform data into compatible structures.
Used by Extractor and Postprocessor within the pipeline."""
def __init__(self, filename, flag=None, compression="gzip"):
"""
......@@ -474,7 +472,7 @@ class Writer(BridgeH5):
self.write_pd(f, path, data, compression=self.compression)
# data is a multi-index dataframe
elif isinstance(data, pd.MultiIndex):
# Alan: should we still not compress here?
# TODO: benchmark I/O speed when using compression
self.write_index(f, path, data) # , compression=self.compression)
# data is a dictionary of dataframes
elif isinstance(data, Dict) and np.all(
......@@ -551,7 +549,7 @@ class Writer(BridgeH5):
compression=kwargs.get("compression", None),
)
dset = f[values_path]
dset[()] = df.values
dset[()] = df.values.astype("float16")
# create dateset and write indices
if not len(df): # Only write more if not empty
......@@ -568,21 +566,18 @@ class Writer(BridgeH5):
)
dset = f[indices_path]
dset[()] = df.index.get_level_values(level=name).tolist()
# create dataset and write columns
if (
df.columns.dtype == int
or df.columns.dtype == np.dtype("uint")
or df.columns.name == "timepoint"
):
# create dataset and write time points as columns
tp_path = path + "/timepoint"
f.create_dataset(
name=tp_path,
shape=(df.shape[1],),
maxshape=(max_tps,),
dtype="uint16",
)
tps = list(range(df.shape[1]))
f[tp_path][tps] = tps
if tp_path not in f:
f.create_dataset(
name=tp_path,
shape=(df.shape[1],),
maxshape=(max_tps,),
dtype="uint16",
)
tps = list(range(df.shape[1]))
f[tp_path][tps] = tps
else:
f[path].attrs["columns"] = df.columns.tolist()
else:
......
#!/usr/bin/env jupyter
"""
Utilities based on association are used to efficiently acquire indices of tracklets with some kind of relationship.
This can be:
- Cells that are to be merged
- Cells that have a linear relationship
"""
import numpy as np
import typing as t
def validate_association(
association: np.ndarray,
indices: np.ndarray,
match_column: t.Optional[int] = None,
) -> t.Tuple[np.ndarray, np.ndarray]:
"""Select rows from the first array that are present in both.
We use casting for fast multiindexing, generalising for lineage dynamics
Parameters
----------
association : np.ndarray
2-D array where columns are (trap, mother, daughter) or 3-D array where
dimensions are (X, (trap,mother), (trap,daughter))
indices : np.ndarray
2-D array where each column is a different level.
match_column: int
int indicating a specific column is required to match (i.e.
0-1 for target-source when trying to merge tracklets or mother-bud for lineage)
must be present in indices. If None, a single match suffices for the resultant indices
vector entry to be True.
Returns
-------
np.ndarray
1-D boolean array indicating valid merge events.
np.ndarray
1-D boolean array indicating indices with an association relationship.
Examples
--------
>>> import numpy as np
>>> from agora.utils.association import validate_association
>>> merges = np.array(range(12)).reshape(3,2,2)
>>> indices = np.array(range(6)).reshape(3,2)
>>> print(merges, indices)
>>> print(merges); print(indices)
[[[ 0 1]
[ 2 3]]
[[ 4 5]
[ 6 7]]
[[ 8 9]
[10 11]]]
[[0 1]
[2 3]
[4 5]]
>>> valid_associations, valid_indices = validate_association(merges, indices)
>>> print(valid_associations, valid_indices)
[ True False False] [ True True False]
"""
if association.ndim < 3:
# Reshape into 3-D array for broadcasting if needed
association = np.stack(
(association[:, [0, 1]], association[:, [0, 2]]), axis=1
)
# Compare existing association with available indices
# Swap trap and label axes for the association array to correctly cast
valid_ndassociation = association[..., None] == indices.T[None, ...]
# Broadcasting is confusing (but efficient):
# First we check the dimension across trap and cell id, to ensure both match
valid_cell_ids = valid_ndassociation.all(axis=2)
if match_column is None:
# Then we check the merge tuples to check which cases have both target and source
valid_association = valid_cell_ids.any(axis=2).all(axis=1)
# Finally we check the dimension that crosses all indices, to ensure the pair
# is present in a valid merge event.
valid_indices = (
valid_ndassociation[valid_association].all(axis=2).any(axis=(0, 1))
)
else: # We fetch specific indices if we aim for the ones with one present
valid_indices = valid_cell_ids[:, match_column].any(axis=0)
# Valid association then becomes a boolean array, true means that there is a
# match (match_column) between that cell and the index
valid_association = (
valid_cell_ids[:, match_column] & valid_indices
).any(axis=1)
return valid_association, valid_indices
#!/usr/bin/env jupyter
"""
Convert some types to others
"""
def _str_to_int(x: str or None):
"""
Cast a string as an int if possible. If x is None, return None.
"""
if x is not None:
try:
return int(x)
except (TypeError, ValueError):
return x