Our repeated_fisher_yates function is used in the construction of SASOs. In some cases, constructing the SASO can be a bottleneck. We should be able to speed this up substantially by adding parallelism. Because of how repeated_fisher_yates manages its workspace, it should be straightforward to have a wrapper function call it in parallel, with each call handling a block of short-axis vectors. This is related to #111.
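
A rough sketch of the wrapper idea, written in Python only to show the structure. Everything here is an assumption for illustration: the names `fisher_yates_block` and `parallel_fisher_yates`, the `block_size` and `max_workers` parameters, and the per-vector seeding scheme are hypothetical and are not the actual repeated_fisher_yates signature. It assumes each call owns a private workspace that is restored after every vector, so blocks never share state.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fisher_yates_block(n, k, first_vec, num_vecs, seed):
    """Sample k distinct indices from range(n) for each vector in one block."""
    work = list(range(n))  # workspace private to this block (assumed design)
    out = []
    for j in range(first_vec, first_vec + num_vecs):
        # Seed per vector (hypothetical scheme) so the output does not
        # depend on how the vectors are partitioned into blocks.
        rng = random.Random(seed * 2**32 + j)
        swaps = []
        for i in range(k):
            ell = rng.randrange(i, n)  # partial Fisher-Yates step
            work[i], work[ell] = work[ell], work[i]
            swaps.append(ell)
        out.append(work[:k])
        # Undo the swaps in reverse order so the workspace is the
        # identity permutation again before the next vector.
        for i in reversed(range(k)):
            ell = swaps[i]
            work[i], work[ell] = work[ell], work[i]
    return out


def parallel_fisher_yates(n, k, num_vecs, seed, block_size=64, max_workers=4):
    """Hypothetical wrapper: hand out blocks of vectors to worker threads."""
    starts = range(0, num_vecs, block_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `starts`.
        blocks = pool.map(
            lambda s: fisher_yates_block(
                n, k, s, min(block_size, num_vecs - s), seed
            ),
            starts,
        )
        return [idxs for block in blocks for idxs in block]
```

Python threads will not give a real speedup for this pure-Python loop. The point is only that, with a private workspace per call and a seed derived from the vector index, the blocks are independent and the result is the same for any block size.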