Do you have updated guidance for handling large datasets (>100 samples and/or >500,000 cells)?
A previous issue addressed this topic nearly 4 years ago (#108), but I'm hoping you have more insights now. The clearest recommendation from that discussion was: "for very large datasets with many samples, use large k~[50, 100] and small prop~[0.01, 0.1] to reduce neighborhood redundancy."
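To make that advice concrete, here is a rough back-of-envelope calculation (the dataset dimensions below are my own assumptions, not numbers from #108) showing how `k` and `prop` translate into neighborhood count and per-sample coverage:

```python
# Illustrative arithmetic only; 500,000 cells and 100 samples are assumed numbers.
n_cells, n_samples = 500_000, 100
k, prop = 100, 0.05                 # "large k, small prop" from the quoted advice

# Index cells sampled as neighborhood seeds (roughly, before any refinement).
n_nhoods = int(prop * n_cells)
# Average cells contributed per sample per neighborhood, if samples were balanced.
cells_per_sample_per_nhood = k / n_samples

print(n_nhoods)                     # 25000 neighborhoods
print(cells_per_sample_per_nhood)   # ~1 cell per sample per neighborhood on average
```

Even at k = 100, an average neighborhood would hold only about one cell per sample with 100 samples, which is part of why I'm unsure whether k should grow with the number of samples.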
The main concerns are:
- Sample representation: How do I ensure each neighborhood captures enough cells from each sample? Should `k` scale with sample size? (A small diagnostic sketch follows below.)
- Computational constraints: Does adjusting `prop` address memory/computation limits, or are other strategies needed?
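For the sample-representation point, this is the kind of check I have in mind. It is a minimal sketch using only numpy/pandas (the names `knn_idx`, `sample_labels`, and `nhood_sample_counts` are mine, not from the package), assuming a precomputed KNN index matrix of shape (n_cells, k):

```python
import numpy as np
import pandas as pd

def nhood_sample_counts(knn_idx, sample_labels, index_cells):
    """For each index cell's neighborhood, count member cells per sample."""
    rows = []
    for i in index_cells:
        members = knn_idx[i]  # the k nearest neighbors of index cell i
        rows.append(pd.Series(sample_labels[members]).value_counts())
    return pd.DataFrame(rows, index=index_cells).fillna(0).astype(int)

# Toy stand-in data: 10,000 cells, 20 samples, k = 50, prop = 0.05 (all assumed).
rng = np.random.default_rng(0)
n_cells, k, prop = 10_000, 50, 0.05
sample_labels = np.array([f"sample_{j % 20}" for j in range(n_cells)])
knn_idx = rng.integers(0, n_cells, size=(n_cells, k))   # stand-in for a real KNN graph
index_cells = rng.choice(n_cells, size=int(prop * n_cells), replace=False)

counts = nhood_sample_counts(knn_idx, sample_labels, index_cells)
# Fraction of neighborhoods in which every sample contributes at least 3 cells.
print((counts.min(axis=1) >= 3).mean())
```

On the random toy data this is just noise, but run on a real KNN graph it would flag neighborhoods where some samples are barely represented, which is what I would like guidance on tuning `k` and `prop` to avoid.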