logfile_parser

Simple log file parsing according to grammars specified in JSON

Basic usage

This package comes with three built-in grammars: 'multiDGUI_acq_format', 'multiDGUI_log_format' and 'cExperiment_log_format'. As an example, the 'multiDGUI_acq_format' grammar can be used to parse the included example using:

>>> from logfile_parser import Parser
>>> acq_parser = Parser('multiDGUI_acq_format')
>>> with open('examples/example_multiDGUI_acq.txt', 'r') as f:
...     parsed = acq_parser.parse(f)
>>> print(parsed)

The parsed output is a dict containing any fields satisfying the grammar.

Defining new grammars

Custom grammars should be written in JSON as a dictionary whose keys name the fields to extract from the log file.

The built-in grammars are useful examples or starting points for defining custom grammars. They can be found in the logfile_parser/grammars directory.

Let's start with a basic example of a log file that we might want to parse:

Date: 16 Apr 2020
Microscope: Batgirl
Experiment details:
My lengthy description of what will certainly be a great experiment.
This description takes multiple lines.
Tags:
User name, Project name, Experiment name

A basic grammar that just extracts the description of the experiment could be defined using:

{
    "description": {
        "trigger_startswith": "Experiment details:"
    },
    "stop": {
        "trigger_startswith": "Tags:",
        "type": "stop"
    }
}

This tells the parser to fill the "description" field of the parsed result with the text on the lines that follow a line starting with "Experiment details:", and to stop parsing as soon as it encounters a line starting with "Tags:". By default the trigger line itself is not included; to include it, specify "skip": "false" as an additional property for "description".
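To make the trigger and stop behaviour concrete, here is a minimal standalone sketch in plain Python. It does not use or reproduce the logfile_parser implementation; the helper name collect_after is hypothetical and exists only for illustration.

```python
def collect_after(lines, start, stop):
    """Collect the lines that follow a line starting with `start`,
    ending at the first line that starts with `stop`.

    Hypothetical helper for illustration only; not part of logfile_parser.
    """
    collected = []
    active = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(stop):
            break  # mimics the "stop" entry: end parsing here
        if active:
            collected.append(line)
        elif line.startswith(start):
            active = True  # the trigger line itself is skipped
    return "\n".join(collected)


log = """Date: 16 Apr 2020
Microscope: Batgirl
Experiment details:
My lengthy description of what will certainly be a great experiment.
This description takes multiple lines.
Tags:
User name, Project name, Experiment name
"""

description = collect_after(log.splitlines(), "Experiment details:", "Tags:")
print(description)
```

The two description lines are joined with a newline, which matches the shape of the "description" value shown in the final parsed output of this README.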

If we also want to fill a "tags" field with the comma-separated tags, we replace the "stop" entry with a "tags" entry that uses the same trigger but has the type "list":

{
    "description": {
        "trigger_startswith": "Experiment details:"
    },
    "tags": {
        "trigger_startswith": "Tags:",
        "type": "list"
    }
}
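As a rough illustration of what a "list" field ends up holding, the snippet below splits the example tags line in plain Python. It assumes that items are separated by commas and that surrounding whitespace is trimmed, which is consistent with the parsed output shown at the end of this README but is not taken from the parser's source.

```python
tags_line = "User name, Project name, Experiment name"

# Assumption: split on commas, then trim whitespace around each item.
tags = [item.strip() for item in tags_line.split(",")]
print(tags)  # ['User name', 'Project name', 'Experiment name']
```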

To extract the microscope name, we can make use of the "regex" type:

{
    "microscope": {
        "trigger_startswith": "Microscope:",
        "type": "regex",
        "regex": "^Microscope:\\s*(.*)$"
    }
}

The text matched by the parenthesised capture group will be stored in the "microscope" field of the parsed result.
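The same pattern can be tried out directly with Python's re module before adding it to a grammar. This is only a sketch of the regular expression itself, not of how the parser applies it. Note that the Python raw string uses a single backslash where the JSON grammar needs "\\".

```python
import re

# Same pattern as in the grammar; JSON's "\\s" is the regex token \s.
pattern = re.compile(r"^Microscope:\s*(.*)$")

match = pattern.match("Microscope: Batgirl")
microscope = match.group(1) if match else None
print(microscope)  # Batgirl
```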

Finally, to extract a date, we combine the "regex" type with a "map" property that converts the captured text into a Python datetime object:

{
    "date": {
        "trigger_startswith": "Date:",
        "type": "regex",
        "regex": "^.*(\\d{2} [A-Z][a-z]{2} \\d{4})$",
        "map": "datetime:%d %b %Y"
    }
}
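The two steps can be reproduced by hand with the standard library. The sketch below assumes that the text after "datetime:" in the "map" property is a strptime-style format string; this is consistent with the parsed output shown at the end of this README, but it is an assumption about the parser rather than a description of its code. Note also that %b matches month abbreviations in the current locale, so "Apr" relies on an English or default C locale.

```python
import re
from datetime import datetime

line = "Date: 16 Apr 2020"

# Step 1: the regex captures the date text at the end of the line.
match = re.match(r"^.*(\d{2} [A-Z][a-z]{2} \d{4})$", line)
date_text = match.group(1)

# Step 2 (assumed meaning of "datetime:%d %b %Y"): parse with strptime.
date = datetime.strptime(date_text, "%d %b %Y")
print(repr(date))  # datetime.datetime(2020, 4, 16, 0, 0)
```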

Putting this all together gives us the following grammar:

{
    "date": {
        "trigger_startswith": "Date:",
        "type": "regex",
        "regex": "^.*(\\d{2} [A-Z][a-z]{2} \\d{4})$",
        "map": "datetime:%d %b %Y"
    },
    "microscope": {
        "trigger_startswith": "Microscope:",
        "type": "regex",
        "regex": "^Microscope:\\s*(.*)$"
    },
    "description": {
        "trigger_startswith": "Experiment details:"
    },
    "tags": {
        "trigger_startswith": "Tags:",
        "type": "list"
    }
}
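Before handing a grammar to the parser, it can help to confirm that it is valid JSON and that every "regex" value compiles. The following sketch does this with the standard library only; the function name check_grammar is hypothetical and not part of logfile_parser, and the grammar is abbreviated to the two regex fields.

```python
import json
import re


def check_grammar(text):
    """Load a grammar from JSON text and compile each "regex" value.

    Hypothetical helper for illustration; raises json.JSONDecodeError or
    re.error on a malformed grammar, otherwise returns the loaded dict.
    """
    grammar = json.loads(text)
    for field, spec in grammar.items():
        if "regex" in spec:
            re.compile(spec["regex"])
    return grammar


# Raw string, so the doubled backslashes reach the JSON decoder unchanged.
grammar_text = r"""
{
    "date": {
        "trigger_startswith": "Date:",
        "type": "regex",
        "regex": "^.*(\\d{2} [A-Z][a-z]{2} \\d{4})$",
        "map": "datetime:%d %b %Y"
    },
    "microscope": {
        "trigger_startswith": "Microscope:",
        "type": "regex",
        "regex": "^Microscope:\\s*(.*)$"
    }
}
"""

grammar = check_grammar(grammar_text)
print(sorted(grammar))  # ['date', 'microscope']
```

After loading, each "\\" in the JSON text has become a single backslash, so the regex strings are exactly what re expects.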

If this grammar is saved to a file newgrammar.json, and the example log file above is saved as logfile.txt, the log can be parsed as follows:

>>> from logfile_parser import Parser
>>> parser = Parser('newgrammar.json')
>>> with open('logfile.txt', 'r') as f:
...     parsed = parser.parse(f)
>>> print(parsed)
{'date': datetime.datetime(2020, 4, 16, 0, 0), 'microscope': 'Batgirl',
'description': 'My lengthy description of what will certainly be a great
experiment.\nThis description takes multiple lines.', 'tags': ['User name',
'Project name', 'Experiment name']}