Library for neural data analysis with the Spark cluster computing framework
Spark is a powerful new framework for cluster computing, particularly well suited to iterative computations; see the project webpage. Thunder is a family of analyses for finding structure in high-dimensional spatiotemporal neural imaging data (e.g. calcium) implemented in Spark.
To run these functions, first install Spark and Scala.
For Python functions, call using pyspark:
SPARK_HOME/pyspark ica.py local data/ica_test.txt results 4 4
For Scala functions, build and run in sbt:
sbt package
sbt "run local data/hierarchical_test.txt results.txt"
All functions use neural data as input, and some additionally use information about external covariates (e.g. stimuli or behavioral attributes).
All functions use the same format for neural data: a text file, where the rows are voxels and the columns are time points. The first three entries in each row are the x,y,z coordinates of that voxel, and the subsequent entries are the neural signals for that voxel at each time point. For example, a data set with 2x2x2 voxels and 8 time points might look like:
1 1 1 11 41 2 17 43 24 56 87
1 2 1 ...
2 1 1 ...
2 2 1 ...
1 1 2 ...
1 2 2 ...
2 1 2 ...
2 2 2 ...
Subsets of voxels (e.g. different imaging planes) can be stored in separate text files within the same directory, or all in one file.
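As a rough illustration of the neural data format, here is a minimal sketch that splits one row into its voxel coordinates and its time series. The helper name `parse_voxel_line` is hypothetical and not part of Thunder, and the sketch assumes whitespace-separated values with integer coordinates.

```python
def parse_voxel_line(line):
    """Split one row into ((x, y, z), [signal at each time point]).

    Hypothetical helper for illustration only; assumes whitespace-separated
    values and integer voxel coordinates.
    """
    values = line.split()
    if len(values) < 4:
        raise ValueError("expected 3 coordinates followed by at least one time point")
    coords = tuple(int(v) for v in values[:3])   # first three entries: x, y, z
    series = [float(v) for v in values[3:]]      # remaining entries: signal over time
    return coords, series


# First row of the example data set above
coords, series = parse_voxel_line("1 1 1 11 41 2 17 43 24 56 87")
print(coords, len(series))  # (1, 1, 1) 8
```

In a pyspark script, a function like this could be mapped over the lines of the input, for example with `sc.textFile(path).map(parse_voxel_line)`, where `sc` is the SparkContext and `path` points to a file or a directory of files.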
Many functions make use of covariates, and there is a common input format: a text file of 0s and 1s, where the rows are variables, and the columns are time points. For example, if eight orientations were presented in random order for the example above, the file would be:
1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1
0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0
For parametric models (e.g. tuning), also provide a text file with the stimulus value corresponding to each row, like this:
0 45 90 135 180 225 270 315
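To show how the two covariate files relate, the sketch below combines a 0/1 covariate matrix with the per-row stimulus values to recover the stimulus value shown at each time point. The function names are hypothetical and not part of Thunder. The sketch assumes that exactly one variable is active at each time point, as in the orientation example; the format itself does not require this. A smaller 3x3 matrix is used to keep the example short.

```python
def load_matrix(text):
    """Parse whitespace-separated 0/1 rows into a list of integer lists."""
    return [[int(v) for v in line.split()] for line in text.strip().splitlines()]


def stimulus_per_time_point(covariates, values):
    """Return the stimulus value presented at each time point.

    covariates: rows are variables, columns are time points (0s and 1s).
    values: one stimulus value per row of covariates.
    Assumes exactly one active variable per time point (illustrative only).
    """
    if len(values) != len(covariates):
        raise ValueError("need one stimulus value per covariate row")
    result = []
    for t in range(len(covariates[0])):
        active = [i for i, row in enumerate(covariates) if row[t] == 1]
        if len(active) != 1:
            raise ValueError("time point %d has %d active variables" % (t, len(active)))
        result.append(values[active[0]])
    return result


covariates = load_matrix("""
1 0 0
0 0 1
0 1 0
""")
values = [float(v) for v in "0 45 90".split()]
print(stimulus_per_time_point(covariates, values))  # [0.0, 90.0, 45.0]
```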
pca - principal components analysis
empca - iterative PCA using EM algorithm
ica - independent components analysis
cca - canonical correlation analysis
rpca - robust PCA
fourier - Fourier analysis on time series data
query - get average time series from voxels with desired indices
kmeans - k-means clustering
bisecting - divisive hierarchical clustering using bisecting k-means
hierarchical - agglomerative hierarchical clustering
mantis - streaming analysis of neuroimaging data (prototype)
Scala versions of all functions