We may run into serious memory issues on large data sets, specifically in the initial sampling step of `makeNhoods`. I suspect that very sparse initial sampling is sub-optimal on large data sets, and that the sampling proportion should be closer to ~0.3. However, on large data sets (~200,000 cells, ~140 donors), memory usage explodes at that setting.
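For context, a minimal sketch of the call in question, assuming the miloR interface where `makeNhoods` takes a sampling proportion via `prop` (the `milo` object and the specific parameter values below are illustrative, not a recommendation):

```r
library(miloR)

# Assumes `milo` is a Milo object (~200,000 cells) with a KNN graph
# already built via buildGraph(). `prop` is the fraction of cells
# sampled as candidate neighbourhood indices; raising it from a very
# sparse value towards ~0.3 may give better neighbourhood coverage,
# but this is where memory blows up at this scale.
milo <- makeNhoods(
  milo,
  prop    = 0.3,   # denser initial sampling (memory-heavy on large data)
  k       = 30,    # should match the k used in buildGraph()
  d       = 30,    # number of reduced dimensions used for refinement
  refined = TRUE   # refine sampled cells towards local density peaks
)
```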