Compare revisions
Commits on Source (980): 479 additions and 846 deletions.
@@ -110,7 +110,6 @@ venv.bak/
omero_py/omeroweb/
omero_py/pipeline/
**.ipynb
data/
notebooks/
*.pdf
*.h5
......
image: python:3.7
image: python:3.8
cache:
key: "project-${CI_JOB_NAME}"
@@ -12,8 +12,8 @@ variables:
TRIGGER_PYPI_NAME: ""
stages:
- test
- check
- tests
- checks
# - release
before_script:
@@ -25,28 +25,49 @@ before_script:
# - git remote rm origin && git remote add origin https://${ACCESS_TOKEN_NAME}:${ACCESS_TOKEN}@${CI_SERVER_HOST}/${CI_PROJECT_PATH}.git
# - git config pull.rebase false
# - git pull origin HEAD:master
- rm -rf ~/.cache/pypoetry
- if [ ${var+TRIGGER_PYPI_NAME} ]; then echo "Pipeline triggered by ${TRIGGER_PYPI_NAME}"; poetry add ${TRIGGER_PYPI_NAME}@latest; fi
- poetry install -vv
# - rm -rf ~/.cache/pypoetry
# - if [ ${var+TRIGGER_PYPI_NAME} ]; then echo "Pipeline triggered by ${TRIGGER_PYPI_NAME}"; poetry add ${TRIGGER_PYPI_NAME}@latest; fi
# - export WITHOUT="docs,network";
- export ARGS="--with test,dev";
- if [[ "$CI_STAGE_NAME" == "tests" ]]; then echo "Installing system dependencies for ${CI_STAGE_NAME}"; apt update && apt install -y ffmpeg libsm6 libxext6; fi
- if [[ "$CI_JOB_NAME" == "Static Type" ]]; then echo "Activating development group"; export ARGS="${ARGS},dev"; fi
- if [[ "$CI_JOB_NAME" == "Network Tools Tests" ]]; then echo "Setting flag to compile zeroc-ice"; export ARGS="${ARGS} --all-extras"; fi
- poetry install -vv $ARGS
Unit test:
stage: test
Local Tests:
stage: tests
script:
- apt update && apt install ffmpeg libsm6 libxext6 -y
- poetry run pytest ./tests/
# - poetry install -vv
- poetry run coverage run -m --branch pytest ./tests --ignore ./tests/aliby/network --ignore ./tests/aliby/pipeline
- poetry run coverage report -m
- poetry run coverage xml
coverage: '/(?i)total.*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$/'
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage.xml
Python Code Lint:
stage: check
Network Tools Tests:
stage: tests
script:
- poetry run black .
- poetry run pytest ./tests/aliby/network
- DIRNAME="test_datasets"
- curl https://zenodo.org/record/7513194/files/test_datasets.tar.gz\?download\=1 -o "test_datasets.tar.gz"
- mkdir -p $DIRNAME
- tar xvf test_datasets.tar.gz -C $DIRNAME
- poetry run pytest -s tests/aliby/pipeline --file $DIRNAME/560_2022_11_30_pypipeline_unit_test_reconstituted_00
Static Type:
stage: check
stage: checks
allow_failure: true
script:
- poetry run black .
- poetry run isort .
- poetry run mypy . --exclude 'setup\.py$'
# We can remove the flag once this is resolved https://github.com/pypa/setuptools/issues/2345
# TODO add more tests before activating auto-release
# Bump_release:
# stage: release
# script:
......
## Summary
(Summarize the bug encountered concisely)
{Summarize the bug encountered concisely}
I confirm that I have (if relevant):
- [ ] Read the troubleshooting guide: https://gitlab.com/aliby/aliby/-/wikis/Troubleshooting-(basic)
- [ ] Updated aliby and aliby-baby.
- [ ] Tried the unit test.
- [ ] Tried a scaled-down version of my experiment (distributed=0, filter=0, tps=10)
- [ ] Tried re-postprocessing.
## Steps to reproduce
(How one can reproduce the issue - this is very important)
{How one can reproduce the issue - this is very important}
- aliby version: 0.1.{...}, or if development/unreleased version, commit SHA: {...}
- platform(s):
- [ ] Jura
- [ ] Other Linux, please specify distribution and version: {...}
- [ ] MacOS, please specify version: {...}
- [ ] Windows, please specify version: {...}
- experiment ID: {...}
- Any special things you need to know about this experiment: {...}
## What is the current bug behavior?
@@ -19,6 +35,12 @@
(Paste any relevant logs - please use code blocks (```) to format console output, logs, and code, as
it's very hard to read otherwise.)
```
{PASTE YOUR ERROR MESSAGE HERE!!}
```
## Possible fixes
(If you can, link to the line of code that might be responsible for the problem)
@@ -9,7 +9,7 @@ version: 2
build:
os: ubuntu-20.04
tools:
python: "3.7"
python: "3.8"
# Build documentation in the docs/ directory with Sphinx
sphinx:
......
# Contributing
We focus our work on python 3.7 due to the current neural network being developed on tensorflow 1. In the near future we will migrate the network to pytorch to support more recent versions of all packages.
We focus our work on python 3.8 due to the current neural network being developed on tensorflow 1. In the near future we will migrate the network to pytorch to support more recent versions of all packages.
## Issues
All issues are managed within the GitLab [repository](https://git.ecdf.ed.ac.uk/swain-lab/aliby/aliby/-/issues). If you don't have an account on the University of Edinburgh's GitLab instance and would like to submit issues, please get in touch with [Prof. Peter Swain](mailto:peter.swain@ed.ac.uk).
All issues are managed within the GitLab [repository](https://gitlab.com/aliby/aliby/-/issues). If you don't have an account on the University of Edinburgh's GitLab instance and would like to submit issues, please get in touch with [Prof. Peter Swain](mailto:peter.swain@ed.ac.uk).
## Data aggregation
......
# ALIBY (Analyser of Live-cell Imaging for Budding Yeast)
[![docs](https://readthedocs.org/projects/aliby/badge/?version=master)](https://aliby.readthedocs.io/en/latest)
[![PyPI version](https://badge.fury.io/py/aliby.svg)](https://badge.fury.io/py/aliby)
[![readthedocs](https://readthedocs.org/projects/aliby/badge/?version=latest)](https://aliby.readthedocs.io/en/latest)
[![pipeline status](https://git.ecdf.ed.ac.uk/swain-lab/aliby/aliby/badges/master/pipeline.svg)](https://git.ecdf.ed.ac.uk/swain-lab/aliby/aliby/-/pipelines)
[![pipeline](https://gitlab.com/aliby/aliby/badges/master/pipeline.svg?key_text=master)](https://gitlab.com/aliby/aliby/-/pipelines)
[![dev pipeline](https://gitlab.com/aliby/aliby/badges/dev/pipeline.svg?key_text=dev)](https://gitlab.com/aliby/aliby/-/commits/dev)
[![coverage](https://gitlab.com/aliby/aliby/badges/dev/coverage.svg)](https://gitlab.com/aliby/aliby/-/commits/dev)
The core classes and methods for Python-based microfluidics, microscopy, data analysis, and reporting.
### Installation
See [INSTALL.md](./INSTALL.md) for installation instructions.
End-to-end processing of cell microscopy time-lapses. ALIBY automates segmentation, tracking, lineage predictions, post-processing and report production. It leverages the existing Python ecosystem and open-source scientific software available to produce seamless and standardised pipelines.
## Quickstart Documentation
### Setting up a server
For testing and development, the easiest way to set up an OMERO server is by
using Docker images.
[The software carpentry](https://software-carpentry.org/) and the [Open
Microscopy Environment](https://www.openmicroscopy.org) have provided
[instructions](https://ome.github.io/training-docker/) to do this.
The `docker-compose.yml` file can be used to create an OMERO server with an
accompanying PostgreSQL database, and an OMERO web server.
It is described in detail
[here](https://ome.github.io/training-docker/12-dockercompose/).
Our version of the `docker-compose.yml` has been adapted from the above to
use version 5.6 of OMERO.
To start these containers (in background):
```shell script
cd pipeline-core
docker-compose up -d
```
Omit the `-d` to run in foreground.
On Windows, installation of the [Visual C++ Redistributable](https://visualstudio.microsoft.com/downloads/#microsoft-visual-c-redistributable-for-visual-studio-2022) may be required. Native MacOS support is under work, but you can use containers (e.g., Docker, Podman) in the meantime.
To stop them, in the same directory, run:
```shell script
docker-compose stop
```
To analyse local data
```bash
pip install aliby
```
Add any of the optional extras `omero` and `utils` (e.g., `pip install aliby[omero,utils]`). `omero` provides tools to connect with an OMERO server and `utils` provides visualisation, user interface and additional deep learning tools.
See our [installation instructions]( https://aliby.readthedocs.io/en/latest/INSTALL.html ) for more details.
### CLI
If installed via poetry, you have access to a Command Line Interface (CLI)
```bash
aliby-run --expt_id EXPT_PATH --distributed 4 --tps None
```
To connect to an OMERO server, the basic arguments are:
```bash
aliby-run --expt_id XXX --host SERVER.ADDRESS --user USER --password PASSWORD
```
The output is a folder with the original logfiles and a set of HDF5 files, one per multidimensional image, each containing the results.
### Raw data access
For more information, including available options, see the page on [running the analysis pipeline](https://aliby.readthedocs.io/en/latest/PIPELINE.html)
## Using specific components
### Access raw data
ALIBY's tooling can also be used as an interface to OMERO servers, for example, to fetch a brightfield channel.
```python
from aliby.io.dataset import Dataset
from aliby.io.image import Image
from aliby.io.omero import Dataset, Image
server_info = {
"host": "host_address",
@@ -76,39 +77,36 @@ in time.
It fetches the metadata from the Image object and uses the TilerParameters values (all Processes in aliby depend on an associated Parameters class, which is in essence a dictionary turned into a class).
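For illustration, here is a minimal sketch of what such a Parameters class boils down to (hypothetical field names and defaults; not aliby's actual implementation):
```python
class TilerParameters:
    """A dictionary turned into a class: entries become attributes."""

    def __init__(self, **entries):
        for key, value in entries.items():
            setattr(self, key, value)

    @classmethod
    def default(cls, **overrides):
        # Hypothetical default values, overridable per experiment.
        defaults = {"tile_size": 96, "ref_channel": "Brightfield"}
        defaults.update(overrides)
        return cls(**defaults)

    def to_dict(self):
        return vars(self)


params = TilerParameters.default(tile_size=117)
assert params.to_dict()["tile_size"] == 117
```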
#### Get a timelapse for a given tile (remote connection)
```python
fpath = "h5/location"
tile_id = 9
trange = range(0, 10)
ncols = 8

riv = remoteImageViewer(fpath)
trap_tps = [riv.tiler.get_tiles_timepoint(tile_id, t) for t in trange]

# You can also access labelled traps
m_ts = riv.get_labelled_trap(tile_id=0, tps=[0])

# And plot them directly
riv.plot_labelled_trap(trap_id=0, channels=[0, 1, 2, 3], trange=range(10))
```
Depending on the network speed, this can take several seconds at the moment.
For a speed-up: take fewer z-positions if you can.
#### Get the tiles for a given time point
Alternatively, if you want to get all the tiles at a given timepoint:
```python
timepoint = (4, 6)
tiler.get_tiles_timepoint(timepoint, channels=None, z=[0, 1, 2, 3, 4])
```
### Contributing
See [CONTRIBUTING.md](./CONTRIBUTING.md) for installation instructions.
See [CONTRIBUTING](https://aliby.readthedocs.io/en/latest/INSTALL.html) on how to help out or get involved.
#!/usr/bin/env python3
import shutil
from pathlib import Path, PosixPath
from typing import Union
import omero
from aliby.io.image import ImageLocal
from aliby.io.omero import Argo
class DatasetLocal:
"""Load a dataset from a folder
We use a given image of the dataset to obtain its metadata, since we cannot expect the folder itself to contain it straight away.
"""
def __init__(self, dpath: Union[str, PosixPath], *args, **kwargs):
self.fpath = Path(dpath)
assert len(self.get_images()), "No tif files found"
def __enter__(self):
return self
def __exit__(self, *exc):
return False
@property
def dataset(self):
return self.fpath
@property
def name(self):
return self.fpath.name
@property
def unique_name(self):
return self.fpath.name
@property
def date(self):
return ImageLocal(list(self.get_images().values())[0]).date
def get_images(self):
return {f.name: str(f) for f in self.fpath.glob("*.tif")}
@property
def files(self):
if not hasattr(self, "_files"):
self._files = {
f: f for f in self.fpath.rglob("*") if str(f).endswith(".txt")
}
return self._files
def cache_logs(self, root_dir):
for name, annotation in self.files.items():
shutil.copy(annotation, root_dir / name.name)
return True
class Dataset(Argo):
def __init__(self, expt_id, **server_info):
super().__init__(**server_info)
self.expt_id = expt_id
# lazily populated caches for annotation files and tags
self._files = None
self._tags = None
@property
def dataset(self):
return self.conn.getObject("Dataset", self.expt_id)
@property
def name(self):
return self.dataset.getName()
@property
def date(self):
return self.dataset.getDate()
@property
def unique_name(self):
return "_".join(
(
str(self.expt_id),
self.date.strftime("%Y_%m_%d").replace("/", "_"),
self.name,
)
)
def get_images(self):
return {im.getName(): im.getId() for im in self.dataset.listChildren()}
@property
def files(self):
if self._files is None:
self._files = {
x.getFileName(): x
for x in self.dataset.listAnnotations()
if isinstance(x, omero.gateway.FileAnnotationWrapper)
}
if not len(self._files):
raise Exception(
"exception:metadata: experiment has no annotation files."
)
return self._files
@property
def tags(self):
if self._tags is None:
self._tags = {
x.getName(): x
for x in self.dataset.listAnnotations()
if isinstance(x, omero.gateway.TagAnnotationWrapper)
}
return self._tags
def cache_logs(self, root_dir):
for name, annotation in self.files.items():
filepath = root_dir / annotation.getFileName().replace("/", "_")
if str(filepath).endswith("txt") and not filepath.exists():
# save only the text files
with open(str(filepath), "wb") as fd:
for chunk in annotation.getFileInChunks():
fd.write(chunk)
return True
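# Usage sketch (hypothetical experiment id and credentials): the Argo base
# class supplies __enter__/__exit__, so the connection is managed for us.
#
#   with Dataset(expt_id=10101, host="omero.example.org",
#                username="user", password="pass") as dataset:
#       images = dataset.get_images()     # {image name: OMERO image id}
#       dataset.cache_logs(Path("./logs"))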
#!/usr/bin/env python3
import typing as t
from datetime import datetime
from pathlib import Path, PosixPath
import dask.array as da
import xmltodict
from agora.io.writer import load_attributes
from dask.array.image import imread
from tifffile import TiffFile
from aliby.io.omero import Argo, get_data_lazy
def get_image_class(source: t.Union[str, int, t.Dict[str, str], PosixPath]):
"""
Wrapper to pick the appropriate Image class depending on the source of data.
"""
if isinstance(source, int):
instantiator = Image
elif isinstance(source, dict) or (
isinstance(source, (str, PosixPath)) and Path(source).is_dir()
):
instantiator = ImageDirectory
elif isinstance(source, str) and Path(source).is_file():
instantiator = ImageLocal
else:
raise Exception(f"Invalid data source at {source}")
return instantiator
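# Dispatch examples (the id and paths below are placeholders):
#   get_image_class(10101)                      -> Image (OMERO image id)
#   get_image_class("pos001.tif")               -> ImageLocal (single file)
#   get_image_class({"Brightfield": "bf_dir/"}) -> ImageDirectory (folder per channel)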
class ImageLocal:
def __init__(self, path: str, dimorder=None):
self.path = path
self.image_id = str(path)
meta = dict()
try:
with TiffFile(path) as f:
self.meta = xmltodict.parse(f.ome_metadata)["OME"]
for dim in self.dimorder:
meta["size_" + dim.lower()] = int(
self.meta["Image"]["Pixels"]["@Size" + dim]
)
meta["channels"] = [
x["@Name"] for x in self.meta["Image"]["Pixels"]["Channel"]
]
meta["name"] = self.meta["Image"]["@Name"]
meta["type"] = self.meta["Image"]["Pixels"]["@Type"]
except Exception as e:
print("Metadata not found: {}".format(e))
assert (
self.dimorder or self.meta.get("dims") is not None
), "No dimensional info provided."
# Mark non-existent dimensions for padding
base = "TCZXY"
self.base = base
self.ids = [base.index(i) for i in dimorder]
self._dimorder = dimorder
self._meta = meta
def __enter__(self):
return self
def __exit__(self, *exc):
for e in exc:
if e is not None:
print(e)
return False
@property
def name(self):
return self._meta["name"]
@property
def data(self):
return self.get_data_lazy_local()
@property
def date(self):
date_str = [
x
for x in self.meta["StructuredAnnotations"]["TagAnnotation"]
if x["Description"] == "Date"
][0]["Value"]
return datetime.strptime(date_str, "%d-%b-%Y")
@property
def dimorder(self):
"""Order of dimensions in image"""
if not hasattr(self, "_dimorder"):
self._dimorder = self.meta["Image"]["Pixels"]["@DimensionOrder"]
return self._dimorder
@dimorder.setter
def dimorder(self, order: str):
self._dimorder = order
return self._dimorder
@property
def metadata(self):
return self._meta
def get_data_lazy_local(self) -> da.Array:
"""Return 5D dask array. For lazy-loading local multidimensional tiff files"""
if not hasattr(self, "formatted_img"):
if not hasattr(self, "ids"): # Standard dimension order
img = (imread(str(self.path))[0],)
else: # Custom dimension order, we rearrange the axes for compatibility
img = imread(str(self.path))[0]
for i, d in enumerate(self._dimorder):
self._meta["size_" + d.lower()] = img.shape[i]
target_order = (
*self.ids,
*[
i
for i, d in enumerate(self.base)
if d not in self.dimorder
],
)
reshaped = da.reshape(
img,
shape=(
*img.shape,
*[1 for _ in range(5 - len(self.dimorder))],
),
)
img = da.moveaxis(
reshaped, range(len(reshaped.shape)), target_order
)
self._formatted_img = da.rechunk(
img,
chunks=(1, 1, 1, self._meta["size_y"], self._meta["size_x"]),
)
return self._formatted_img
class ImageDirectory(ImageLocal):
"""
Image class for case where all images are split in one or multiple folders with time-points and channels as independent files.
It inherits from ImageLocal so we only override methods that are critical.
Assumptions:
- Assumes individual folders for individual channels. If only one path provided it assumes it to be brightfield.
- Assumes that images are flat.
- Provides Dimorder as TCZYX
"""
def __init__(self, path: t.Union[str, t.Dict[str, str]]):
if isinstance(path, str):
path = {"Brightfield": path}
self.path = path
self.image_id = str(path)
self._meta = dict(channels=path.keys(), name=list(path.values())[0])
# Parse name if necessary
# Build lazy-loading array using dask?
def get_data_lazy_local(self) -> da.Array:
"""Return 5D dask array. For lazy-loading local multidimensional tiff files"""
img = da.stack([imread(v) for v in self.path.values()])
if (
img.ndim < 5
): # Files do not include z-stack: Add and swap with time dimension.
img = da.stack((img,)).swapaxes(0, 2)
# TODO check whether x and y swap is necessary
# Use images to redefine axes
for i, dim in enumerate(("t", "c", "z", "y", "x")):
self._meta["size_" + dim] = img.shape[i]
self._formatted_img = da.rechunk(
img,
chunks=(1, 1, 1, self._meta["size_y"], self._meta["size_x"]),
)
return self._formatted_img
class Image(Argo):
"""
Loads images from OMERO and gives access to the data and metadata.
"""
def __init__(self, image_id, **server_info):
"""
Establishes the connection to the OMERO server via the Argo
base class.
Parameters
----------
image_id: integer
server_info: dictionary
Specifies the host, username, and password as strings
"""
super().__init__(**server_info)
self.image_id = image_id
# images from OMERO
self._image_wrap = None
@classmethod
def from_h5(
cls,
filepath: t.Union[str, PosixPath],
):
"""Instatiate Image from a hdf5 file.
Parameters
----------
cls : Image
Image class
filepath : t.Union[str, PosixPath]
Location of hdf5 file.
Examples
--------
FIXME: Add docs.
"""
metadata = load_attributes(filepath)
image_id = metadata["image_id"]
server_info = metadata["parameters"]["general"].get("server_info", {})
return cls(image_id, **server_info)
@property
def image_wrap(self):
"""
Get images from OMERO
"""
if self._image_wrap is None:
# get images using OMERO
self._image_wrap = self.conn.getObject("Image", self.image_id)
return self._image_wrap
@property
def name(self):
return self.image_wrap.getName()
@property
def data(self):
return get_data_lazy(self.image_wrap)
@property
def metadata(self):
"""
Store metadata saved in OMERO: image size, number of time points,
labels of channels, and image name.
"""
meta = dict()
meta["size_x"] = self.image_wrap.getSizeX()
meta["size_y"] = self.image_wrap.getSizeY()
meta["size_z"] = self.image_wrap.getSizeZ()
meta["size_c"] = self.image_wrap.getSizeC()
meta["size_t"] = self.image_wrap.getSizeT()
meta["channels"] = self.image_wrap.getChannelLabels()
meta["name"] = self.image_wrap.getName()
return meta
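# Usage sketch (hypothetical id and credentials): thanks to the Argo base
# class, Image works as a context manager, and `data` stays lazy until
# sliced and computed.
#
#   with Image(image_id=12345, host="omero.example.org",
#              username="user", password="pass") as image:
#       print(image.metadata["channels"])
#       first_plane = image.data[0, 0, 0].compute()  # fetches one plane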
import dask.array as da
import numpy as np
from dask import delayed
from omero.gateway import BlitzGateway
from omero.model import enums as omero_enums
# convert OMERO definitions into numpy types
PIXEL_TYPES = {
omero_enums.PixelsTypeint8: np.int8,
omero_enums.PixelsTypeuint8: np.uint8,
omero_enums.PixelsTypeint16: np.int16,
omero_enums.PixelsTypeuint16: np.uint16,
omero_enums.PixelsTypeint32: np.int32,
omero_enums.PixelsTypeuint32: np.uint32,
omero_enums.PixelsTypefloat: np.float32,
omero_enums.PixelsTypedouble: np.float64,
}
class Argo:
"""
Base class to interact with OMERO.
See
https://docs.openmicroscopy.org/omero/5.6.0/developers/Python.html
"""
def __init__(
self,
host="islay.bio.ed.ac.uk",
username="upload",
password="***REMOVED***",
):
"""
Parameters
----------
host : string
web address of OMERO host
username: string
password : string
"""
self.conn = None
self.host = host
self.username = username
self.password = password
# standard method required for Python's with statement
def __enter__(self):
self.conn = BlitzGateway(
host=self.host, username=self.username, passwd=self.password
)
self.conn.connect()
self.conn.c.enableKeepAlive(60)
return self
# standard method required for Python's with statement
def __exit__(self, *exc):
for e in exc:
if e is not None:
print(e)
self.conn.close()
return False
def get_data_lazy(image) -> da.Array:
"""
Get 5D dask array, with delayed reading from OMERO image.
"""
nt, nc, nz, ny, nx = [getattr(image, f"getSize{x}")() for x in "TCZYX"]
pixels = image.getPrimaryPixels()
dtype = PIXEL_TYPES.get(pixels.getPixelsType().value, None)
# using dask
get_plane = delayed(lambda idx: pixels.getPlane(*idx))
def get_lazy_plane(zct):
return da.from_delayed(get_plane(zct), shape=(ny, nx), dtype=dtype)
# 5D stack: TCZXY
t_stacks = []
for t in range(nt):
c_stacks = []
for c in range(nc):
z_stack = []
for z in range(nz):
z_stack.append(get_lazy_plane((z, c, t)))
c_stacks.append(da.stack(z_stack))
t_stacks.append(da.stack(c_stacks))
return da.stack(t_stacks)
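# Because every plane is wrapped in dask.delayed, slicing the returned
# array fetches only the planes actually needed. For example (hypothetical
# server details):
#
#   with Argo(host="omero.example.org", username="u", password="p") as argo:
#       image = argo.conn.getObject("Image", 12345)
#       stack = get_data_lazy(image)          # shape (T, C, Z, Y, X)
#       z_profile = stack[0, 0, :].compute()  # one getPlane() call per z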
import re
import struct
def clean_ascii(text):
return re.sub(r"[^\x20-\x7F]", ".", text)
def xxd(x, start=0, stop=None):
if stop is None:
stop = len(x)
for i in range(start, stop, 8):
# Row number
print("%04d" % i, end=" ")
# Hexadecimal bytes
for r in range(i, i + 8):
print("%02x" % x[r], end="")
if (r + 1) % 4 == 0:
print(" ", end="")
# ASCII
print(
" ",
clean_ascii(x[i : i + 8].decode("utf-8", errors="ignore")),
" ",
end="",
)
# Int32
print(
"{:>10} {:>10}".format(*struct.unpack("II", x[i : i + 8])),
end=" ",
)
print("") # Newline
return
# Buffer reading functions
def read_int(buffer, n=1):
res = struct.unpack("I" * n, buffer.read(4 * n))
if n == 1:
res = res[0]
return res
def read_string(buffer):
return "".join([x.decode() for x in iter(lambda: buffer.read(1), b"\x00")])
def read_delim(buffer, n):
delim = read_int(buffer, n)
assert all([x == 0 for x in delim]), "Unknown nonzero value in delimiter"
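# Example (hypothetical bytes): reading two little-endian uint32 values and
# a NUL-terminated string with the helpers above.
#
#   import io
#   buf = io.BytesIO(b"\x01\x00\x00\x00\x02\x00\x00\x00abc\x00")
#   first, second = read_int(buf, 2)  # -> (1, 2)
#   label = read_string(buf)          # -> "abc"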
"""
Post-processing utilities
Notes: I don't have statistics on ranges of radii for each of the knots in
the radial spline representation, but we regularly extract the average of
these radii for each cell. So, depending on camera/lens, we get:
* 60x evolve: mean radii of 2-14 pixels (and measured areas of 30-750
pixels^2)
* 60x prime95b: mean radii of 3-24 pixels (and measured areas of 60-2000
pixels^2)
And I presume that for a 100x lens we would get an ~5/3 increase over those
values.
In terms of the current volume estimation method, it's currently only
implemented in the AnalysisToolbox repository, but it's super simple:
mVol = 4/3*pi*sqrt(mArea/pi).^3
where mArea is simply the sum of pixels for that cell.
These functions were developed by D. Adjavon in 2020 to find the most accurate
way to calculate cell volume from a mask.
"""
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
from scipy import ndimage
from skimage import draw, measure
from skimage.morphology import ball, erosion
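# For reference, the simple volume estimate quoted in the module docstring,
# translated from MATLAB (m_area is the pixel count of the cell mask):
def sphere_equivalent_volume(m_area: float) -> float:
    """Volume of the sphere whose cross-section has area m_area."""
    return 4 / 3 * np.pi * np.sqrt(m_area / np.pi) ** 3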
def my_ball(radius):
"""Generates a ball-shaped structuring element.
This is the 3D equivalent of a disk.
A pixel is within the neighborhood if the Euclidean distance between
it and the origin is no greater than radius.
Parameters
----------
radius : int
The radius of the ball-shaped structuring element.
Returns
-------
selem : ndarray
The structuring element where elements of the neighborhood
are 1 and 0 otherwise.
"""
n = 2 * radius + 1
Z, Y, X = np.mgrid[
-radius : radius : n * 1j,
-radius : radius : n * 1j,
-radius : radius : n * 1j,
]
X **= 2
Y **= 2
Z **= 2
X += Y
X += Z
# s = X ** 2 + Y ** 2 + Z ** 2
return X <= radius * radius
def circle_outline(r):
return ellipse_perimeter(r, r)
def ellipse_perimeter(x, y):
im_shape = int(2 * max(x, y) + 1)
img = np.zeros((im_shape, im_shape), dtype=np.uint8)
rr, cc = draw.ellipse_perimeter(
int(im_shape // 2), int(im_shape // 2), int(x), int(y)
)
img[rr, cc] = 1
return np.pad(img, 1)
def capped_cylinder(x, y):
max_size = y + 2 * x + 2
pixels = np.zeros((max_size, max_size))
rect_start = ((max_size - x) // 2, x + 1)
rr, cc = draw.rectangle_perimeter(
rect_start, extent=(x, y), shape=(max_size, max_size)
)
pixels[rr, cc] = 1
circle_centres = [
(max_size // 2 - 1, x),
(max_size // 2 - 1, max_size - x - 1),
]
for r, c in circle_centres:
rr, cc = draw.circle_perimeter(
r, c, (x + 1) // 2, shape=(max_size, max_size)
)
pixels[rr, cc] = 1
pixels = ndimage.morphology.binary_fill_holes(pixels)
pixels ^= erosion(pixels)
return pixels
def volume_of_sphere(radius):
return 4 / 3 * np.pi * radius**3
def plot_voxels(voxels):
verts, faces, _, _ = measure.marching_cubes_lewiner(voxels, 0)
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111, projection="3d")
mesh = Poly3DCollection(verts[faces])
mesh.set_edgecolor("k")
ax.add_collection3d(mesh)
ax.set_xlim(0, voxels.shape[0])
ax.set_ylim(0, voxels.shape[1])
ax.set_zlim(0, voxels.shape[2])
plt.tight_layout()
plt.show()
# Volume estimation
def union_of_spheres(outline, shape="my_ball", debug=False):
filled = ndimage.binary_fill_holes(outline)
nearest_neighbor = (
ndimage.morphology.distance_transform_edt(outline == 0) * filled
)
voxels = np.zeros((filled.shape[0], filled.shape[1], max(filled.shape)))
c_z = voxels.shape[2] // 2
for x, y in zip(*np.where(filled)):
radius = nearest_neighbor[(x, y)]
if radius > 0:
if shape == "ball":
b = ball(radius)
elif shape == "my_ball":
b = my_ball(radius)
else:
raise ValueError(
f"{shape} is not an accepted value for " f"shape."
)
centre_b = ndimage.measurements.center_of_mass(b)
I, J, K = np.ogrid[: b.shape[0], : b.shape[1], : b.shape[2]]
voxels[
I + int(x - centre_b[0]),
J + int(y - centre_b[1]),
K + int(c_z - centre_b[2]),
] += b
if debug:
plot_voxels(voxels)
return voxels.astype(bool).sum()
def improved_uos(outline, shape="my_ball", debug=False):
filled = ndimage.binary_fill_holes(outline)
nearest_neighbor = (
ndimage.morphology.distance_transform_edt(outline == 0) * filled
)
voxels = np.zeros((filled.shape[0], filled.shape[1], max(filled.shape)))
c_z = voxels.shape[2] // 2
while np.any(nearest_neighbor != 0):
radius = np.max(nearest_neighbor)
x, y = np.argwhere(nearest_neighbor == radius)[0]
if shape == "ball":
b = ball(np.ceil(radius))
elif shape == "my_ball":
b = my_ball(np.ceil(radius))
else:
raise ValueError(f"{shape} is not an accepted value for shape")
centre_b = ndimage.measurements.center_of_mass(b)
I, J, K = np.ogrid[: b.shape[0], : b.shape[1], : b.shape[2]]
voxels[
I + int(x - centre_b[0]),
J + int(y - centre_b[1]),
K + int(c_z - centre_b[2]),
] += b
# Use the central disk of the ball from voxels to get the circle
# = 0 if nn[x,y] < r else nn[x,y]
rr, cc = draw.circle(x, y, np.ceil(radius), nearest_neighbor.shape)
nearest_neighbor[rr, cc] = 0
if debug:
plot_voxels(voxels)
return voxels.astype(bool).sum()
def conical(outline, debug=False):
nearest_neighbor = ndimage.morphology.distance_transform_edt(
outline == 0
) * ndimage.binary_fill_holes(outline)
if debug:
hf = plt.figure()
ha = hf.add_subplot(111, projection="3d")
X, Y = np.meshgrid(
np.arange(nearest_neighbor.shape[0]),
np.arange(nearest_neighbor.shape[1]),
)
ha.plot_surface(X, Y, nearest_neighbor)
plt.show()
return 4 * nearest_neighbor.sum()
def volume(outline, method="spheres"):
if method == "conical":
return conical(outline)
elif method == "spheres":
return union_of_spheres(outline)
else:
raise ValueError(f"Method {method} not implemented.")
numpydoc>=1.3.1
aliby>=0.1.25
aliby[network]>=0.1.43
sphinx-autodoc-typehints==1.19.2
sphinx-rtd-theme==1.0.0
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
myst-parser
sphinx-autodoc-typehints
# Installation
Tested on: Mac OSX Mojave and Ubuntu 20.04
## Requirements
We strongly recommend installing within a virtual environment, as there are many dependencies that you may not want polluting your regular python environment.
Make sure you are using python 3.
An environment can be created using the conda package manager:
An environment can be created using [Anaconda](https://www.anaconda.com/):
$ conda create --name <env>
$ conda activate <env>
@@ -32,33 +30,140 @@ In your local environment, run:
Or using [pyenv](https://github.com/pyenv/pyenv) with pyenv-virtualenv:
$ pyenv install 3.7.9
$ pyenv virtualenv 3.7.9 aliby
$ pyenv install 3.8.14
$ pyenv virtualenv 3.8.14 aliby
$ pyenv local aliby
## Pipeline installation
### Pip version
Once you have created your local environment, run:
Once you have created and activated your virtual environment, run:
$ cd aliby
$ pip install -e ./
If you are not using an OMERO server setup:
$ pip install aliby
Otherwise, if you are contacting an OMERO server:
$ pip install aliby[network]
NOTE: Support for OMERO servers on GNU/Linux requires building ZeroC-Ice and thus build tools. The versions for Windows and MacOS are provided as Python wheels, so installation is faster.
### FAQ
- Installation fails during zeroc-ice compilation (Windows and MacOS).
For Windows, the simplest way to install it is using conda (or mamba). You can install the (OMERO) network components separately:
$ conda create -n aliby -c conda-forge python=3.8 omero-py
$ conda activate aliby
$ cd c:/Users/Public/Repos/aliby
$ \PATH\TO\POETRY\LOCATION\poetry install
- MacOS
For local access and processing, follow the same instructions as Linux. Remote access to OMERO servers depends on some issues in one of our dependencies being solved (see https://github.com/ome/omero-py/issues/317).
### Git version
We use [poetry](https://python-poetry.org/docs/#installation) for dependency management.
Install [poetry](https://python-poetry.org/docs/#installation) for dependency management.
In case you want a local version:
$ git clone git@gitlab.com:aliby/aliby.git
$ cd aliby
and then either
$ poetry install --all-extras
for everything, including tools to access OMERO servers, or
$ poetry install
for a version with only local access, or
$ poetry install --with dev
to install with compatible versions of the development tools we use, such as black.
These commands will automatically install the [BABY](https://gitlab.com/aliby/baby) segmentation software. Support for additional segmentation and tracking algorithms is under development.
## Omero Server
We use (and recommend) [OMERO](https://www.openmicroscopy.org/omero/) to manage our microscopy database, but ALIBY can process both locally-stored experiments and remote ones hosted on a server.
### Setting up a server
For testing and development, the easiest way to set up an OMERO server is by
using Docker images.
[The software carpentry](https://software-carpentry.org/) and the [Open
Microscopy Environment](https://www.openmicroscopy.org) have provided
[instructions](https://ome.github.io/training-docker/) to do this.
The `docker-compose.yml` file can be used to create an OMERO server with an
accompanying PostgreSQL database, and an OMERO web server.
It is described in detail
[here](https://ome.github.io/training-docker/12-dockercompose/).
Our version of the `docker-compose.yml` has been adapted from the above to
use version 5.6 of OMERO.
To start these containers (in background):
```shell script
cd pipeline-core
docker-compose up -d
```
Omit the `-d` to run in foreground.
To stop them, in the same directory, run:
```shell script
docker-compose stop
```
### Troubleshooting
Segmentation has been tested on: Mac OSX Mojave, Ubuntu 20.04 and Arch Linux.
Data processing has been tested on all the above and Windows 11.
### Detailed Windows installation
#### Create environment
Open anaconda powershell as administrator
```shell script
conda create -n devaliby2 -c conda-forge python=3.8 omero-py
conda activate devaliby2
```
#### Install poetry
You may have to specify the python executable to get this to work:
```shell script
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | C:\Users\USERNAME\Anaconda3\envs\devaliby2\python.exe -
```
Also specify the full path when running poetry (there must be a way to sort this).
- Clone the repository (Assuming you have ssh properly set up)
```shell script
git clone git@gitlab.com:aliby/aliby.git
cd aliby
poetry install --all-extras
```
You may need to run poetry via its full path twice; the first run may give an error message, but the second should work:
```shell script
C:\Users\v1iclar2\AppData\Roaming\Python\Scripts\poetry install --all-extras
```
Confirm the installation of aliby: run `python`, then `import aliby`; you should get no error message.
#### Access the virtual environment from the IDE (e.g., PyCharm)
Create a new project:
- In location, navigate to the aliby folder (e.g., c:/Users/Public/Repos/aliby)
- Select the correct python interpreter:
  - click the interpreter name at the bottom right
  - click add local interpreter
  - on the left, click conda environment
  - click the 3 dots to the right of the interpreter path and navigate to the python executable from the environment created above (e.g., C:\Users\v1iclar2\Anaconda3\envs\devaliby2\python.exe)
#### Potential Windows issues
- Sometimes the library pywin32 gives trouble; just install it using pip or conda
# Running the analysis pipeline
You can run the analysis pipeline either via the command line interface (CLI) or using a script that incorporates the `aliby.pipeline.Pipeline` object.
## CLI
On a CLI, you can use the `aliby-run` command. This command takes options as follows:
- `--host`: Address of image-hosting server.
- `--username`: Username to access image-hosting server.
- `--password`: Password to access image-hosting server.
- `--expt_id`: Number ID of experiment stored on host server.
- `--distributed`: Number of distributed cores to use for segmentation and signal processing. If 0, there is no parallelisation.
- `--tps`: Optional. Number of time points from the beginning of the experiment to use. If not specified, the pipeline processes all time points.
- `--directory`: Optional. Parent directory to save the data files (HDF5) generated, `./data` by default; the files will be stored in a child directory whose name is the name of the experiment.
- `--filter`: Optional. List of positions to use for analysis. Alternatively, a regex (regular expression) or list of regexes to search for positions. **Note: the CLI currently cannot take a list of strings as input.**
- `--overwrite`: Optional. Whether to overwrite an existing data directory. True by default.
- `--override_meta`: Optional. Whether to override the metadata in an existing data directory. True by default.
Example usage:
```bash
aliby-run --expt_id EXPT_PATH --distributed 4 --tps None
```
To process experiments hosted on an OMERO server, the basic arguments are:
```bash
aliby-run --expt_id XXX --host SERVER.ADDRESS --user USER --password PASSWORD
```
## Script
Use the `aliby.pipeline.Pipeline` object and supply a dictionary, following the example below. The meanings of the parameters are the same as described in the CLI section above.
```python
#!/usr/bin/env python3
from aliby.pipeline import Pipeline, PipelineParameters
# Specify experiment IDs
ids = [101, 102]
for i in ids:
print(i)
try:
params = PipelineParameters.default(
# Create dictionary to define pipeline parameters.
general={
"expt_id": i,
"distributed": 6,
"host": "INSERT ADDRESS HERE",
"username": "INSERT USERNAME HERE",
"password": "INSERT PASSWORD HERE",
# Ensure data will be overwritten
"override_meta": True,
"overwrite": True,
}
)
# Fine-grained control beyond general parameters:
# change specific leaf in the extraction tree.
# This example tells the pipeline to additionally compute the
# nuc_est_conv quantity, which is a measure of the degree of
# localisation of a signal in a cell.
params = params.to_dict()
leaf_to_change = params["extraction"]["tree"]["GFP"]["np_max"]
leaf_to_change.add("nuc_est_conv")
# Regenerate PipelineParameters
p = Pipeline(PipelineParameters.from_dict(params))
# Run pipeline
p.run()
# Error handling
except Exception as e:
print(e)
```
This example code can be the contents of a `run.py` file, and you can run it via
```bash
python run.py
```
in the appropriate virtual environment.
Alternatively, the example code can be the contents of a cell in a jupyter notebook.
@@ -10,4 +10,7 @@
:recursive:
aliby
agora
extraction
postprocessor
logfile_parser
@@ -4,11 +4,15 @@
contain the root `toctree` directive.
.. toctree::
:hidden:
Home page <self>
ALIBY reference <_autosummary/aliby>
extraction reference <_autosummary/extraction>
Installation <INSTALL.md>
Pipeline options <PIPELINE.md>
Contributing <CONTRIBUTING.md>
..
Examples <examples.rst>
Reference <api.rst>
..
.. include:: ../../README.md
:parser: myst_parser.sphinx_
#+title: Input/Output Stage Dependencies
Overview of what fields are required for each consecutive step to run, and what each step produces.
- Registration
- Tiler
- Requires:
- None
# - Optionally:
- Produces:
- /trap_info
- Tiler
- Requires:
- None
- Produces:
- /trap_info
#+title: Aliby metadata specification
Draft for recommended metadata for images to provide a standard interface for aliby. I attempt to follow OMERO metadata structures.
* Essential data
- DimensionOrder: str
Order of dimensions (e.g., TCZYX for Time, Channel, Z, Y, X)
- PixelSize: float
Size of pixel, useful for segmentation.
- Channels: List[str]
Channel names, used to refer to channels in parameters.
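For concreteness, a hypothetical minimal record satisfying the essential fields (values are illustrative):
#+begin_src python
essential_metadata = {
    "DimensionOrder": "TCZYX",
    "PixelSize": 0.182,  # e.g., microns per pixel
    "Channels": ["Brightfield", "GFP", "mCherry"],
}
#+end_src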
* Optional but useful data
- ntps: int
Number of time-points
- Date
Date of experiment
- interval: float
Time interval when the experiment has a constant acquisition time. If it changes depending on the position or it is a dynamic experiment, this is the largest value that divides all the different intervals (i.e., their greatest common divisor).
- Channel conditions: DataFrame
Dataframe with acquisition features for each image as a function of a minimal time interval unit.
- Group config: DataFrame
If multiple groups are used, it indicates the time-points at which the corresponding channel was acquired.
- LED: List[str]
Led names. Useful when images are acquired with the same LED and filter but multiple voltage conditions.
- Filter: List[str]
Filter names. Useful when images are acquired with the same LED and filter but multiple voltage conditions.
- tags : List[str]
Tags associated with the experiment. Useful for semi-automated experiment exploration.
- Experiment-wide groups: List[int]
List of the group to which each position belongs.
- Group names: List[str]
List of groups
* Optional
- hardware information : Dict[str, str]
Name of all hardware used to acquire images.
- Acquisition software and version: Tuple[str,str]
- Experiment start: date
- Experiment end: date
#+title: ALIBY roadmap
Overview of potential improvements, goals, issues and other thoughts worth keeping in the repository. In general, it is things that the original developer would have liked to implement had there been enough time.
* General goals
- Simplify code base
- Reduce dependency on BABY
- Abstract components beyond cell outlines (e.g., vacuoles or other ROIs)
- Enable providing metadata defaults (remove the dependency on metadata)
- (Relevant to BABY): Migrate aliby-baby from Keras to Pytorch. Immediately after, upgrade h5py to the latest version (we are stuck on 2.10.0 due to Keras).
* Long-term tasks (Soft Eng)
- Support external segmentation/tracking/lineage/processing tools
- Split segmentation, tracking and lineage into independent Steps
- Implement the pipeline as an acyclic graph
- Isolate lineage and tracking into a section of aliby or an independent package
- Abstract cells into "ROIs" or "Outlines"
- Abstract lineage into "Outline relationships" (this may help study cell-to-cell interactions in the future)
- Add support for next-generation microscopy formats.
- Make live cell processing great again! (low priority)
* Potential features
- Flat field correction (requires research on what is the best way to do it)
- Support for monotiles (e.g., agarose pads)
- Support the user providing location of tiles (could be a GUI in which the user selects a region)
- Support multiple neural networks (e.g., vacuole/nucleus in addition to cell segmentation)
- Use CellPose as a backup for accuracy-first pipelines
* Potential CLI(+matplotlib) interfaces
The fastest way to get a GUI-like interface is by using matplotlib as a panel to update and read keyboard inputs to interact with the data. All of this can be done within matplotlib in a few hundred lines of code.
- Annotate intracellular contents
- Interface to adjust the parameters for calibration
- Basic selection of a region of interest on a per-position basis
* Sections in need of refactoring
** Extraction
Extraction could easily increase its processing speed. Most of the code was not originally written using casting and vectorised operations.
- Reducing the use of python loops to the minimum
- Replacing nested functions with functional mappings (extraction would be faster and clearer with a functional programming approach)
- Replacing the tree with a set of tuples and delegating the processing order to dask.
Dask can produce its own internal tree and optimise the order, rendering our tree unnecessary.
** Postprocessing.
- Clarify the limits of the picking and merging classes: these are temporal procedures; in the future, segmentation should become more accurate, making Picker redundant, and better tracking/lineage assignment will make merging redundant.
- Formalise how lineage and reshaper processes are handled
- Non-destructive postprocessing.
The way postprocessing is done is destructive at the moment. If we aim to perform more complex data analysis automatically, an implementation of complementary and tractable sub-pipelines is essential. (low priority, perhaps within scripts)
- Functionalise the parameter-process schema. This schema provides a decent structure, but it requires a lot of boilerplate code. To transition, the best option is probably a function that converts Process classes into functions, and another that extracts default values from a Parameters class. This could in theory replace most Process-Parameters pairs. Lineage functions will pose a problem, and a common interface to get lineage or outline-to-outline relationships needs to be engineered.
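A rough sketch of the conversion suggested above (hypothetical names; assumes Processes expose run() and Parameters expose default(), as PipelineParameters does):
#+begin_src python
def as_function(process_cls, parameters_cls):
    """Convert a Process/Parameters pair into a plain function."""
    def run(data, **overrides):
        params = parameters_cls.default(**overrides)  # defaults + overrides
        return process_cls(params).run(data)
    return run
#+end_src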
** Compiler/Reporter
- Remove compiler step, and focus on designing an adequate report, then build it straight after postprocessing ends.
** Writers/Readers
- Consider storing signals that are similar (e.g., signals arising from each channel) in a single multidimensional array to save storage space. (mid priority)
- Refactor (Extraction/Postprocessing) Writer to use the DynamicWriter Abstract Base Class.
** Pipeline
Pipeline is in dire need of refactoring, as it coordinates too many things. The best approach would be to modify the structure to delegate more responsibilities to Steps (such as validation) and Writers (such as writing metadata).
* Testing
- I/O interfaces
- Visualisation helpers and other functions
- Running one pipeline from another
- Groupers
* Documentation
- Tutorials and how-to for the usual tasks
- How to deal with different types of data
- How to aggregate data from multiple experiments
- Contribution guidelines (after developing some)
* Tools/alternatives that may be worth considering for the future
- trio/asyncio/anyio for concurrent processing of individual threads
- Pandas -> Polars: Reconsider after pandas 2.0; they will become interoperable
- awkward arrays: Better way to represent data series with different sizes
- h5py -> zarr: OME-ZARR format is out now; it is possible that the field will move in that direction. This would also make being stuck on h5py 2.10.0 less egregious.
- Use CellACDC's work on producing a common interface to access a multitude of segmentation algorithms.
* Secrets in the code
- As aliby is adapted to future Python versions, keep up with the "FUTURE" statements that enunciate how code can be improved in newer Python versions
- Track FIXMEs and, if we cannot solve them immediately, open an associated issue
* Minor inconveniences to fix
- Update CellTracker models by training with the current scikit-learn (currently they warn that the models were trained with an older version of sklearn)