Configurable columns?

The columns in the input data are highly variable, so it sometimes doesn't make much sense to hard-code Column objects in each datasource. Maybe we could have a separate config file per dataset:

# some_dataset_2020-12-06.csv
sample_id  this  that  other
CCP0001    0.1   4     some description
...


# some_dataset.yaml
---
description: 'Some dataset'
columns:
  - name: patient_id
    description: Clean ISARIC patient ID
    type: string
    patient_id: true

  - name: sample_id
    description: ISARIC sample ID
    type: string
    dirty_sample_id: true

  - name: clean_kit_id
    description: Clean ISARIC kit ID
    type: string
    clean_kit_id: true

  - name: this
    type: float

  - name: that
    type: integer

  - name: other
    type: string
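
As a rough sketch of how such a config could replace the hard-coded columns: the code below turns an already-parsed config mapping (what a YAML loader would return for some_dataset.yaml) into column objects and uses them to cast one CSV row. The `Column` dataclass, `CASTS` table, and `columns_from_config` function are hypothetical stand-ins, not the pipeline's existing classes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Map the `type` strings used in the dataset config to Python casts.
CASTS: Dict[str, Callable[[str], Any]] = {
    "string": str,
    "float": float,
    "integer": int,
}


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for the pipeline's Column class."""
    name: str
    type: str = "string"
    description: str = ""
    patient_id: bool = False
    dirty_sample_id: bool = False
    clean_kit_id: bool = False

    def cast(self, raw: str) -> Any:
        """Convert a raw CSV string to this column's declared type."""
        return CASTS[self.type](raw)


def columns_from_config(config: Dict[str, Any]) -> List[Column]:
    """Build Column objects from a parsed dataset config mapping."""
    columns = []
    for spec in config["columns"]:
        if spec.get("type", "string") not in CASTS:
            raise ValueError(f"Unknown column type: {spec.get('type')!r}")
        # Unknown keys in a spec raise TypeError, which surfaces config typos.
        columns.append(Column(**spec))
    return columns


# Mirrors part of some_dataset.yaml after YAML parsing.
config = {
    "description": "Some dataset",
    "columns": [
        {"name": "sample_id", "description": "ISARIC sample ID",
         "type": "string", "dirty_sample_id": True},
        {"name": "this", "type": "float"},
        {"name": "that", "type": "integer"},
        {"name": "other", "type": "string"},
    ],
}
columns = columns_from_config(config)

row = {"sample_id": "CCP0001", "this": "0.1", "that": "4",
       "other": "some description"}
typed_row = {col.name: col.cast(row[col.name]) for col in columns}
```

Keeping the role flags (`patient_id`, `dirty_sample_id`, `clean_kit_id`) as plain booleans on the column means the linkage code could look up "the column flagged as the sample ID" without knowing the dataset's own column names.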

Then linkage_config.yaml would have:

---
data_sources:
  some_dataset:
    input_data: path/to/some_dataset_????-??-??.csv
    config: path/to/some_dataset.yaml
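
The `input_data` pattern can match several dated files. A minimal sketch of resolving it, assuming the intent is to pick the most recent file (the proposal does not say this explicitly), and using a hypothetical helper named `latest_input_file`:

```python
import glob
import os
import tempfile
from typing import Optional


def latest_input_file(pattern: str) -> Optional[str]:
    """Return the newest file matching a dated glob pattern, or None."""
    matches = glob.glob(pattern)
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings,
    # so the lexicographic maximum is the most recent file.
    return max(matches, default=None)


# Demonstration in a throwaway directory with two dated files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("some_dataset_2020-11-29.csv", "some_dataset_2020-12-06.csv"):
        open(os.path.join(tmp, name), "w").close()
    pattern = os.path.join(tmp, "some_dataset_????-??-??.csv")
    latest = os.path.basename(latest_input_file(pattern))
```

This relies on all matching files living in the same directory and using the same date format in their names.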

This way, adding new datasets would just be a config change. We'd need some way of defining custom behaviour, though.
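
One possible way to handle custom behaviour, sketched here purely as an assumption: keep the custom code in Python, register it under a name, and let a column refer to it through an extra `transform` key. Neither the `transform` key nor the `register_transform` decorator is part of the config format proposed above.

```python
from typing import Callable, Dict

# Registry mapping names usable in a config file to Python functions.
TRANSFORMS: Dict[str, Callable[[str], str]] = {}


def register_transform(name: str):
    """Decorator that makes a function available to config files by name."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        TRANSFORMS[name] = func
        return func
    return decorator


@register_transform("strip_and_upper")
def strip_and_upper(value: str) -> str:
    """Example custom behaviour: trim whitespace and upper-case the value."""
    return value.strip().upper()


# Hypothetical column spec carrying an extra `transform` key.
spec = {"name": "sample_id", "type": "string", "transform": "strip_and_upper"}

# No `transform` key means the value passes through unchanged;
# an unregistered name raises KeyError rather than being ignored.
name = spec.get("transform")
transform = TRANSFORMS[name] if name else (lambda value: value)
cleaned = transform(" ccp0001 ")
```

With this shape, the common case stays a pure config change, and only datasets that need special handling require new code.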

Further down the line, once automatic file syncing is in place, we could potentially let users drop their data into SharePoint and supply their own config file alongside it, and we would just connect it up to the pipeline.