Acknowledgements
We are grateful for support from the US National Science Foundation under grant NSF OCI-0946441, and from NVIDIA Corp. for equipment donations under the CUDA Fellows Program. LAB acknowledges the hospitality of the Berkeley Institute for Data Science (BIDS), where this paper was written.