WP1 Improved parallelisation: speeding up Tadah!
Objectives and success metrics:
- Improve parallelisation of the machine-learning process by replacing OpenMP-threaded Eigen library calls with MPI-based ScaLAPACK routines. This will complement the existing MPI-parallelised hyperparameter optimisation, giving two-level parallelism. Success metric: significantly improved parallel scaling of Tadah!MLIP for a single machine-learning step with fixed hyperparameters.
The hyperparameter loop in Tadah!MLIP is essentially a minimisation problem on a rough surface, which is solved with the MaxLIPO+TR algorithm. We define a global objective function that includes both the validation data and macrostate properties such as lattice parameters and elastic constants. The search can be trivially parallelised by looking for optimal parameter sets in different regions of hyperparameter space. However, this parallelisation is only effective up to dozens of processes; beyond that, the same optimal hyperparameters are found multiple times, so the apparent speed-up is not useful.
Moreover, every evaluation of an MLIP with fixed hyperparameters requires linear algebra on a system the size of the training data, and this too must be parallelised. On ARCHER2, the hyperparameter search is therefore normally parallelised across nodes rather than cores. Each independent MLIP training process currently uses the Eigen library for linear algebra and OpenMP to parallelise expensive routines. However, Eigen supports OpenMP for only a limited number of algorithms, which causes the current bottleneck during the final optimisation step.
We will rewrite the linear algebra in Tadah! using ScaLAPACK so that it fully utilises MPI. The result will be a hybrid code architecture in which inter-node parallelisation over hyperparameter settings is combined with intra-node parallelisation of each MLIP training process.
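The two-level layout described above can be illustrated with a small rank-mapping sketch. It is not taken from the Tadah! source: the names `TwoLevelRank` and `split_rank` are hypothetical, and it assumes that world ranks are numbered contiguously within a node and that every training group has the same size (for example, one 128-core ARCHER2 node per group).

```cpp
#include <cassert>

// Hypothetical helper type: where a global MPI rank sits in the two-level layout.
struct TwoLevelRank {
    int group;       // index of the hyperparameter-search worker (one per node)
    int local_rank;  // rank inside that worker's linear-algebra group
};

// Map a world rank to (group, local_rank).
// Assumption: ranks are laid out contiguously, ranks_per_group per node.
inline TwoLevelRank split_rank(int world_rank, int ranks_per_group) {
    assert(world_rank >= 0 && ranks_per_group > 0);
    return {world_rank / ranks_per_group, world_rank % ranks_per_group};
}

// In an MPI program the two values would typically be used as the colour and
// key of a communicator split, e.g.
//   MPI_Comm_split(MPI_COMM_WORLD, r.group, r.local_rank, &train_comm);
// so that each group runs one MLIP training with its own ScaLAPACK process
// grid, while one rank per group takes part in the hyperparameter search.
```

With 128 ranks per group, world rank 130 would land in group 1 with local rank 2. Whether the split is made per node or per some other unit is a design choice that this sketch leaves open.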
- MODELS · Remove Eigen from Empirical Kernel Map
- MODELS · Remove Eigen from Evidence Approximation
- Tadah.MLIP · Replace Eigen regression
- Tadah.MLIP · New matrix class for Tadah!, as required by LAPACK
- Tadah.MLIP · Replace Eigen data structures
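As an illustration of what a matrix class "required by LAPACK" might involve, the sketch below stores a dense matrix in column-major order, which is the layout LAPACK and ScaLAPACK routines expect, and exposes the raw pointer and leading dimension those routines take as arguments. The class name `ColMajorMatrix` and its interface are hypothetical; this is a minimal sketch, not the planned Tadah! design.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal dense matrix with LAPACK-compatible storage.
class ColMajorMatrix {
public:
    ColMajorMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Element (i, j) lives at offset i + j * rows (column-major).
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i + j * rows_];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i + j * rows_];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Leading dimension as LAPACK expects it (at least 1, even if empty).
    int ld() const { return static_cast<int>(rows_ > 0 ? rows_ : 1); }

    // Raw storage, suitable for passing as the matrix argument of a
    // LAPACK routine together with ld().
    double* data() { return data_.data(); }
    const double* data() const { return data_.data(); }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};
```

A regression routine could then pass `A.data()` and `A.ld()` as the matrix and leading-dimension arguments of a LAPACK least-squares driver such as DGELS. A distributed ScaLAPACK version would additionally need an array descriptor and a block-cyclic local layout, which this sketch does not cover.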