
perf(postproc): align circumvents pandas indexing in a for loop

Arin Wongprommoon requested to merge issue-030 into dev

WHY IS THIS CHANGE NEEDED?:

  • The align process in postproc is slow.

HOW DOES THE CHANGE SOLVE THE PROBLEM?: (see #30 (comment 105095))

  • df_shift() slows down the align process because it uses .loc inside a for loop. Instead of calling .loc in a for loop, the change works on NumPy arrays:
  • Argsort the list that gives the number of elements by which each row is shifted.
  • Convert the pandas DataFrame into a NumPy matrix.
  • Split the big matrix into N sub-matrices, where N is the number of distinct shift values.
  • Roll each sub-matrix by its shift value.
  • Concatenate the rolled sub-matrices and return a DataFrame that combines them with the existing MultiIndex and columns.
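A minimal sketch of the steps above, for illustration only (not the code in this merge request). It assumes that df_shift() rolls each row of a DataFrame left by a per-row integer shift, wrapping around as np.roll does; the function names df_shift_loop and df_shift_vectorized and the example index labels are hypothetical. The loop version mirrors the slow .loc approach so the two can be compared.

```python
import numpy as np
import pandas as pd


def df_shift_loop(df: pd.DataFrame, shifts: np.ndarray) -> pd.DataFrame:
    """Slow reference (hypothetical): roll each row via .loc in a Python loop."""
    out = df.copy()
    for label, s in zip(df.index, shifts):
        # Assumption: a positive shift moves values to the left, with wraparound.
        out.loc[label] = np.roll(df.loc[label].to_numpy(), -s)
    return out


def df_shift_vectorized(df: pd.DataFrame, shifts: np.ndarray) -> pd.DataFrame:
    """Hypothetical vectorized version: one np.roll call per distinct shift value."""
    shifts = np.asarray(shifts)
    # 1. Argsort the shifts so rows sharing a shift value become contiguous.
    order = np.argsort(shifts, kind="stable")
    # 2. Convert the DataFrame to a NumPy matrix, with rows in sorted order.
    values = df.to_numpy()[order]
    sorted_shifts = shifts[order]
    # 3. Split into N sub-matrices, N = number of distinct shift values.
    unique_shifts, starts = np.unique(sorted_shifts, return_index=True)
    blocks = np.split(values, starts[1:])
    # 4. Roll each sub-matrix along the columns by its shift value.
    rolled = [np.roll(block, -s, axis=1) for block, s in zip(blocks, unique_shifts)]
    # 5. Concatenate and rebuild a DataFrame; the index is permuted the same way
    #    as the rows so every row keeps its original (Multi)Index label.
    result = np.concatenate(rolled, axis=0)
    return pd.DataFrame(result, index=df.index[order], columns=df.columns)


# Usage: both versions agree once the fast result is put back in the original row order.
idx = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)], names=["group", "id"])
df = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), index=idx, columns=range(4))
shifts = np.array([1, 0, 1])
fast = df_shift_vectorized(df, shifts)
slow = df_shift_loop(df, shifts)
pd.testing.assert_frame_equal(fast.reindex(df.index), slow)
```

The vectorized version calls np.roll once per distinct shift value instead of once per row, so the Python-level loop shrinks from the number of rows to the number of distinct shifts. Because the rows come back sorted by shift, `.reindex(df.index)` restores the original order if a caller needs it.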

REFERENCES:
