This repository hosts the code for our paper accepted at IEEE DSAA 2019: "Lightweight and Scalable Particle Tracking and Motion Clustering of 3D Cell Trajectories". It provides a pipeline for tracking 3D cell trajectories and clustering them by motion pattern.
This repository features three versions of the code:
- 🔗 Serial Version: A straightforward implementation for running on a single machine.
- ⚙️ Local Dask Version: A parallelized version that leverages Dask for local computation.
- ☁️ Dask on Cluster Version: A scalable implementation for running on distributed computing clusters.
- Python 3.6
- Scientific Computing Libraries: NumPy, SciPy, scikit-learn, matplotlib
- Computer Vision Tools: OpenCV 3.1 for detection and tracking
- Parallel and Distributed Computing:
  - Dask Arrays & DataFrames for scalable processing
  - Dask-ML & Dask-NDArray for distributed machine learning
  - Joblib & multiprocessing for local parallel execution
The methods and results of this project are detailed in our paper, which can be accessed through the following links:
- IEEE Download Page: Link to IEEE
- arXiv Preprint: Link to arXiv
To cite this work, please use the following BibTeX entry:
@INPROCEEDINGS{8964177,
author={Sedigh Fazli, Mojtaba and Stadler, Rachel V. and Alaila, BahaaEddin and Vella, Stephen A. and Moreno, Silvia N. J. and Ward, Gary E. and Quinn, Shannon},
booktitle={2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)},
title={Lightweight and Scalable Particle Tracking and Motion Clustering of 3D Cell Trajectories},
year={2019},
volume={},
number={},
pages={412-421},
keywords={Large scale 3D Cell Tracking;Motion Trajectories;Geodesic distance;Spectral Clustering;Toxoplasma gondii;Martin Distance},
doi={10.1109/DSAA.2019.00056}
}
Ensure that all image data is stored locally on your machine. Update the following parameters in the code before execution:
- folders: Path to the directories containing image slices.
- sample_address: Path to a sample TIFF image used to extract initial parameters.
- cluster_numbers: The number of clusters to use in the analysis.
Additional parameters can be modified within the script based on your specific application needs.
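A minimal sketch of how these three parameters might be set and sanity-checked before a long run. The paths and the check_parameters helper are hypothetical and not part of the repository, and the sketch assumes folders is a list of directory paths; adapt it to how the script you run actually defines them.

```python
from pathlib import Path

# Hypothetical example values; replace them with the locations of your own data.
folders = ["data/experiment_01", "data/experiment_02"]  # directories of image slices
sample_address = "data/experiment_01/slice_0001.tif"    # one TIFF used for initial parameters
cluster_numbers = 5                                     # number of clusters


def check_parameters(folders, sample_address, cluster_numbers):
    """Return a list of problems found; an empty list means the values look usable."""
    problems = []
    for folder in folders:
        if not Path(folder).is_dir():
            problems.append(f"image folder not found: {folder}")
    sample = Path(sample_address)
    if sample.suffix.lower() not in {".tif", ".tiff"}:
        problems.append(f"sample image is not a TIFF file: {sample_address}")
    elif not sample.is_file():
        problems.append(f"sample image not found: {sample_address}")
    if not isinstance(cluster_numbers, int) or cluster_numbers < 1:
        problems.append(f"cluster_numbers must be a positive integer, got {cluster_numbers!r}")
    return problems


# Report every problem up front instead of failing midway through the pipeline.
for problem in check_parameters(folders, sample_address, cluster_numbers):
    print("WARNING:", problem)
```

Running such a check first can save time, since a mistyped path would otherwise only surface after the script has started loading images.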
After setting up the parameters, run the code and measure the wall time with:
time python <The_code_filename.py>
When the script exits, the shell's time prefix reports the elapsed wall-clock time (real) together with the user and system CPU time, which serves as a simple performance benchmark.
We opted to include all functions in a single script to facilitate seamless wall-time computation for the entire pipeline. This structure allows us to efficiently test and compare running times across different implementations.
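Because the whole pipeline lives in one script, the same comparison can also be broken down per stage from inside Python. The sketch below is a stand-alone illustration using only the standard library; wall_time and the stage name "tracking" are hypothetical and do not come from the repository.

```python
import time
from contextlib import contextmanager


@contextmanager
def wall_time(label, results):
    """Record the elapsed wall-clock seconds of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so partial runs still produce timings.
        results[label] = time.perf_counter() - start


timings = {}
with wall_time("tracking", timings):
    time.sleep(0.01)  # stand-in for a real pipeline stage

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Wrapping each stage this way would show where the serial and Dask versions differ, while time python still gives the end-to-end figure.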
Before running the code, ensure you have the following libraries installed:
pip install numpy scipy scikit-learn matplotlib opencv-python dask dask-ml joblib
For the cluster version, additional Dask configurations may be required.
- Cluster Configuration: Adjust Dask scheduler and client settings to optimize performance on distributed systems.
- Memory Management: For large datasets, tune Dask's worker memory limits and spill thresholds to avoid out-of-memory failures.
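The two points above might look like the following sketch. The helper names, worker count, memory limit, and threshold values are illustrative assumptions rather than the settings used in the paper; only the dask.config.set and dask.distributed.Client calls and the worker-memory configuration keys are Dask's own.

```python
# Fractions of each worker's memory limit at which Dask reacts.
# The keys are Dask's worker-memory settings; the values are illustrative only.
MEMORY_SETTINGS = {
    "distributed.worker.memory.target": 0.60,     # begin spilling data to disk
    "distributed.worker.memory.spill": 0.70,      # spill based on process memory
    "distributed.worker.memory.pause": 0.80,      # stop starting new tasks on the worker
    "distributed.worker.memory.terminate": 0.95,  # restart the worker
}


def client_kwargs(scheduler_address=None, n_workers=4, memory_limit="4GB"):
    """Build keyword arguments for dask.distributed.Client (hypothetical helper)."""
    if scheduler_address is not None:
        # Connect to a scheduler that is already running on the cluster.
        return {"address": scheduler_address}
    # Otherwise describe a local cluster on this machine.
    return {
        "n_workers": n_workers,
        "threads_per_worker": 1,
        "memory_limit": memory_limit,  # per-worker limit
    }


def start_client(scheduler_address=None, **local_options):
    """Apply the memory settings, then create and return a Client."""
    # Imported here so the helpers above can be inspected without Dask installed.
    import dask
    from dask.distributed import Client

    dask.config.set(MEMORY_SETTINGS)
    return Client(**client_kwargs(scheduler_address, **local_options))
```

Calling start_client() with no arguments would start a local cluster, while passing an address such as tcp://scheduler-host:8786 (a placeholder) would connect to a running scheduler instead. On an existing cluster, the memory thresholds generally have to be configured on the workers themselves, since settings applied on the client side do not reach workers that are already running.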
For questions, feedback, or collaboration opportunities, feel free to reach out:
🚀 Dive into the code and explore the powerful capabilities of lightweight and scalable 3D cell tracking!