Extractor properly handles number of time points from metadata

Arin Wongprommoon requested to merge hotfix-issue-020 into master

When an extractor (an instance of the Extractor class) is run without specifying time points (the tps argument of .run()), the program raises a TypeError.

When tps is not passed to .run(), its default value is computed on the assumption that the number of time points is stored as a scalar in the metadata of the HDF5 file (self.meta["time_settings/ntimepoints"]). This assumption is wrong: the number of time points is stored as a one-element numpy array. This comes from how the log file parser (https://git.ecdf.ed.ac.uk/swain-lab/aliby/agora/-/blob/master/logfile_parser/logfile_parser.py) processes tables as defined in the JSON file that specifies the grammar.
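A minimal sketch of the failure and the fix, assuming the default tps is derived with something like `range(ntimepoints)` (the actual Extractor code may differ). A plain dict stands in for `self.meta`, and the value 5 is illustrative only.

```python
import numpy as np

# Stand-in for self.meta: the parser stores ntimepoints as a one-element array.
meta = {"time_settings/ntimepoints": np.array([5])}
ntps = meta["time_settings/ntimepoints"]

# Treating the one-element array as a scalar index raises TypeError,
# because only 0-d integer arrays can be converted to a scalar index.
raised = False
try:
    tps = list(range(ntps))
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Indexing the single element (the approach taken in this MR) yields a scalar.
tps = list(range(ntps[0]))
print(tps)  # [0, 1, 2, 3, 4]
```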

This issue does not occur when the whole pipeline is run, because the scalar element of the numpy array that stores the number of time points is properly extracted in https://git.ecdf.ed.ac.uk/swain-lab/aliby/aliby/-/blob/master/aliby/pipeline.py, line 89.

The alternative would have been to fix the behaviour of the log file parser. The 'proper' fix is to store the number of time points as a scalar, not as an array with one element. However: (a) it is JSON-parsing code that I am not familiar with, and (b) there is a risk of breaking parts of the code base that rely on other parts of the metadata.

So, this is essentially a quick-and-dirty fix, and it should be okay given that time_settings/ntimepoints only appears twice in the project. However, if we decide to handle the metadata differently (unlikely, given the fixed format of our log files and that we have already extracted essentially all the useful information from them), we will have to add [0]s in various places.
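One way to avoid scattering [0]s would be a single accessor that normalises the value. The sketch below is hypothetical (`get_ntimepoints` is not part of this MR or the code base). It accepts either a scalar or a one-element array and returns a plain Python int.

```python
import numpy as np

def get_ntimepoints(meta, key="time_settings/ntimepoints"):
    """Hypothetical helper: return the number of time points as a Python int,
    whether the metadata stores it as a scalar or as a one-element array."""
    value = np.asarray(meta[key])
    if value.size != 1:
        raise ValueError(f"Expected a single value for {key!r}, got shape {value.shape}")
    # .item() extracts the sole element as a native Python scalar.
    return int(value.item())
```

Both storage formats then give the same result, e.g. `get_ntimepoints({"time_settings/ntimepoints": np.array([5])})` and `get_ntimepoints({"time_settings/ntimepoints": 5})` both return 5, so a later change to the parser would not require touching every call site.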

Fixes issue #20 (closed).
