FEATURE: Mutual information process
Release notes
Adds mutual information estimation as a post-processing process.
Problem to solve
Currently, users import MIdecoding from https://git.ecdf.ed.ac.uk/pswain/mutual-information/-/blob/master/MIdecoding.py and use its estimateMI() function to estimate the mutual information between multiple sets of data, passing a list of arrays as input.
Converting this function into a process will integrate it into aliby-post and its process-parameters paradigm. This will simplify the code, make it consistent with the other post-processes, and remove the need to import an external module. Users can then compute the mutual information from a DataFrame input, which is much easier to manipulate than a list of arrays.
Proposal
The mutual information process will follow the process-parameters paradigm as in https://git.ecdf.ed.ac.uk/swain-lab/aliby/agora/-/blob/master/agora/abc.py.
The process class will be modelled after the estimateMI() function:
- The input will be one pandas DataFrame rather than the list of arrays taken by the existing data parameter. This DataFrame will contain indices/multi-indices that define the data labels.
- The output will be the existing res output, an array that contains the summary statistics. Optionally, I could change it to a named tuple or dictionary so that it is obvious which summary statistic is which.
- I don't expect to modify the internals much, beyond slight changes to the I/O data types described above. The code is already well organised and thoroughly commented.
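A minimal sketch of how the DataFrame input could be split back into the list of arrays that estimateMI() currently takes, so that its internals can stay unchanged. It assumes, hypothetically, that each row is one cell, each column is one time point, and an index level named condition labels the data set each cell belongs to; the function name dataframe_to_arrays is also hypothetical.

```python
from typing import List

import numpy as np
import pandas as pd


def dataframe_to_arrays(df: pd.DataFrame, level: str = "condition") -> List[np.ndarray]:
    """Split a multi-indexed DataFrame into one 2D array per condition.

    Hypothetical layout: rows are cells, columns are time points, and the
    index level `level` labels which condition (data set) each cell is in.
    """
    # sort=False keeps conditions in order of first appearance,
    # so the list order matches the order of the labels in the index.
    return [group.to_numpy() for _, group in df.groupby(level=level, sort=False)]


if __name__ == "__main__":
    index = pd.MultiIndex.from_tuples(
        [("low", 0), ("low", 1), ("high", 0), ("high", 1), ("high", 2)],
        names=["condition", "cell"],
    )
    df = pd.DataFrame(np.arange(15.0).reshape(5, 3), index=index)
    arrays = dataframe_to_arrays(df)
    print([a.shape for a in arrays])  # prints [(2, 3), (3, 3)]
```

Keeping this conversion at the process boundary means the labels live in the DataFrame index, while the existing estimation code keeps receiving the data structure it was written for.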
The parameters will be modelled after the other parameters in the estimateMI() function:
- These parameters include overtime, n_bootstraps, c1, Crange and gammarange.
- Optionally include verbose. I'm not sure whether printing fits the paradigm of the other processes; we could instead change its behaviour to write to the metadata or a log file.
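A sketch of the parameters, using a plain dataclass as a stand-in for the parameters base class in agora/abc.py, whose exact interface is not reproduced here. The class name MIParameters, its default() and to_dict() methods, the report_progress() helper and all default values are hypothetical placeholders, not estimateMI()'s actual defaults. In place of verbose printing, progress messages go through Python's standard logging module, one of the options raised above.

```python
import logging
from dataclasses import asdict, dataclass, field
from typing import List

logger = logging.getLogger(__name__)


@dataclass
class MIParameters:
    """Hypothetical parameters mirroring estimateMI()'s keyword arguments.

    All default values are illustrative placeholders only.
    """

    overtime: bool = True
    n_bootstraps: int = 100
    c1: float = 1.0
    Crange: List[float] = field(default_factory=lambda: [1e-2, 1e3])
    gammarange: List[float] = field(default_factory=lambda: [1e-3, 1e1])

    @classmethod
    def default(cls, **kwargs) -> "MIParameters":
        """Build parameters from defaults, overriding any given keywords."""
        return cls(**kwargs)

    def to_dict(self) -> dict:
        """Plain dict, e.g. for writing to metadata."""
        return asdict(self)


def report_progress(step: int, n_steps: int) -> None:
    """Log progress where estimateMI() would print when verbose is set."""
    logger.info("Bootstrap %d of %d", step, n_steps)
```

With this approach, users who want the messages in a file could configure logging themselves, for example with logging.basicConfig(filename="mi.log", level=logging.INFO), rather than the process deciding whether to print.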
Testing
- Convert https://github.com/swainlab/mi-by-decoding/blob/master/matlab/ExampleScript/fig1_sfp1_replicates.json into test input data as CSV files (or use the JSON as-is).
- Use this test input data to generate reference output data by running estimateMI() outside the aliby ecosystem.
- Write test scripts (pytest) under https://git.ecdf.ed.ac.uk/swain-lab/aliby/postprocessor/-/tree/master/tests to compare the output of the proposed mutual information process with this reference output data. A test should succeed if there is an exact or approximate match.
- These tests will then run automatically in our CI/CD pipeline, supporting further development.
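A sketch of the pytest comparison. The reference values are synthetic placeholders, the result is a stand-in for the process output, and the helper names are hypothetical. np.testing.assert_allclose passes on an exact match and on an approximate one within the given tolerances. Because n_bootstraps suggests random resampling, the reference run and the test would likely need a fixed random seed, or a looser tolerance, to agree.

```python
from pathlib import Path

import numpy as np


def load_reference(path: Path) -> np.ndarray:
    """Load reference output previously saved from estimateMI() as CSV."""
    return np.loadtxt(path, delimiter=",", ndmin=2)


def assert_matches_reference(result, reference, rtol=1e-6, atol=1e-12):
    """Pass on an exact match, or an approximate one within rtol/atol."""
    result = np.asarray(result, dtype=float)
    reference = np.asarray(reference, dtype=float)
    assert result.shape == reference.shape, (result.shape, reference.shape)
    np.testing.assert_allclose(result, reference, rtol=rtol, atol=atol)


def test_mi_process_matches_reference(tmp_path: Path):
    # Synthetic placeholder values; the real reference file would be produced
    # by running estimateMI() on the converted test input data.
    reference = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
    ref_file = tmp_path / "reference.csv"
    np.savetxt(ref_file, reference, delimiter=",")

    # Stand-in for the output of the new mutual information process.
    result = reference + 1e-9
    assert_matches_reference(result, load_reference(ref_file))
```

pytest supplies the tmp_path fixture automatically, so the test writes its reference file to a fresh temporary directory on every run.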
Documentation
- Copy the documentation from the original function, which is already well documented.
Intended users
People who benefit from the existing mutual information code.
Metrics
...