The goal of the project is to explore how to "stitch together" two stacks of images capturing nerve cell structure in different regions of a brain.
In particular, the program applies a user-configurable set of algorithms to given source and target image stacks, each comprising thousands of feature points. These algorithms:
- Detect the overlap between the two stacks and generate matching correspondences (feature points from both stacks).
- Generate a score indicating confidence in the match.
Currently, the program only supports stitching two image stacks, but it can be extended to stitch thousands of such stacks in the future.
Software requirements:
- Linux, which is supported as both a development and a production platform
- Java 7 or above. You can download the latest Oracle JDK from http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html
- Apache Hadoop 2.7.3
- Apache Maven 3.3.9 or above
- Git client
Make sure that the `git`, `mvn`, `hadoop`, and `java` commands can be executed directly from the command line. If they cannot, add their locations to the PATH environment variable.
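The PATH check described above can be sketched as a small shell function. This is only an illustration: the function name `missing_tools` is hypothetical and not part of the project. It relies on the standard `command -v` builtin to test whether each tool can be resolved.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the project):
# prints the name of every given tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    # `command -v` exits non-zero when the name cannot be resolved.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Check the four commands this project requires.
missing=$(missing_tools git mvn hadoop java)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined on one line.
  echo "Not found on PATH:" $missing
else
  echo "All required tools are available."
fi
```

If a tool is reported as missing, add its `bin` directory to PATH (for example in `~/.profile`) and open a new shell before building.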
The following steps set up and build the project in a Linux environment using the CLI (verified on Ubuntu 16.04):
- Check out the project using `git clone https://github.com/ankur-shanbhag/ImageStitcher.git`
- Build the project using `./build.sh`
- Building the project generates a self-contained `target/image-stitcher-1.0.tar.gz` file. This file can be shipped to any machine that meets the specified software requirements.
- Once the `image-stitcher-1.0.tar.gz` file is copied to the required location, untar it using `tar -zxf image-stitcher-1.0.tar.gz`, then run `cd image-stitcher-1.0`.
- Open `conf/configuration.properties` and set values for all the properties, following the instructions in the file.
  Eg: `hadoop.home=/usr/local/hadoop-2.7.3/`
  The properties related to GnuPlot are optional and only needed by developers for debugging (discussed below).
- Run the project using `./image-stitcher.sh run local.output.path=[output directory path to be created]`
NOTE:
- Some algorithms may need input, such as clustering configuration parameters (a sample DBSCAN clustering configuration can be found at `sample-data/input`). In such cases, pass the input when you run the algorithm using the parameter `local.input.path=[input path on local machine]`
- Similarly, there are many other configurable parameters that the user can control. Users can also configure the classes that perform specific tasks.
  Example: to control the number of input records passed to a map task, use `num.input.lines.mapper=2000`. For an extensive list of all the configurable parameters, refer to the developers guide.
- If any of the steps above do not work, verify that all the required software is installed with the specified versions.
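As a rough sketch of how the parameters above might be combined, the function below only prints a full run command so it can be reviewed first. The function name `run_stitcher` and the example paths are hypothetical. Passing several `key=value` parameters in one invocation, separated by spaces, is an assumption based on the single-parameter examples above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the run command instead of executing it.
# $1 = output directory to be created
# $2 = local input path (e.g. clustering configuration)
# $3 = number of input lines passed to each map task
run_stitcher() {
  echo ./image-stitcher.sh run \
    "local.output.path=$1" \
    "local.input.path=$2" \
    "num.input.lines.mapper=$3"
}

# Example values are placeholders; adjust them for your machine.
run_stitcher /tmp/stitcher-output sample-data/input 2000
```

Once the printed command looks right, run it from inside the `image-stitcher-1.0` directory.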
To run the project on AWS EMR:
- Start an EMR cluster with Hadoop 2.7 or above.
- SSH to the AWS master node. For the steps, see http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-connect-master-node-ssh.html
- Create an S3 bucket (recommended for working on EMR) or use HDFS as distributed storage. To use HDFS, refer to http://stackoverflow.com/questions/22343918/how-do-i-use-hdfs-with-emr
- Generate the `image-stitcher-1.0.tar.gz` file on your local machine using the steps specified above. This file can then be copied to the AWS master node using the `scp` command.
  Eg: `scp -i [keyFile].pem ~/ImageStitcher/target/image-stitcher-1.0.tar.gz hadoop@[emr-master-dns]:/home/hadoop`
- Optionally, you can copy input files, source and target image files to the EMR master node using `scp`.
- The steps to untar and deploy the project on the EMR server are the same as for the local environment.
NOTE: By default, Hadoop on EMR machines is installed at `/usr/lib/hadoop`. Also, GnuPlot is not supported on EMR.
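The copy-and-unpack steps for EMR could be scripted along the following lines. This is a dry-run sketch that only prints the commands. The key file path and master DNS name are placeholders, the function name `emr_deploy_commands` is hypothetical, and unpacking over `ssh` with `tar -C` is one possible approach rather than the project's documented procedure.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own key file and master DNS.
KEY_FILE="$HOME/my-key.pem"
MASTER="hadoop@emr-master-dns.example.com"
ARCHIVE="$HOME/ImageStitcher/target/image-stitcher-1.0.tar.gz"

# Hypothetical helper: prints the deployment commands without running them.
emr_deploy_commands() {
  # Copy the archive to the home directory on the master node.
  echo "scp -i $KEY_FILE $ARCHIVE $MASTER:/home/hadoop"
  # Unpack the archive on the master node.
  echo "ssh -i $KEY_FILE $MASTER tar -zxf /home/hadoop/image-stitcher-1.0.tar.gz -C /home/hadoop"
}

emr_deploy_commands
```

After unpacking, remember to set `hadoop.home=/usr/lib/hadoop` in `conf/configuration.properties` on the master node.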
Optionally, developers can debug the output generated by the program by plotting a 3D scatter plot on their local machines. However, the GnuPlot feature is not supported on EMR instances. Users working with EMR should copy the generated output files to their local environment using `scp`.
The following are the instructions to use GnuPlot on a local machine:
- Install GnuPlot:
  - For Debian-based distributions (like Ubuntu), use `sudo apt-get install gnuplot-x11`
  - For RHEL, Fedora, and CentOS, run `sudo yum update` followed by `sudo yum install gnuplot`
  - Note: If the instructions below do not work, enter the gnuplot shell using the command `gnuplot` and change the terminal to `x11` using the command `set terminal x11`
- Before you can use gnuplot, you must generate output using the `./image-stitcher.sh run` command.
- Make sure you have correctly set the parameter `gnuplot.process.path` in the `conf/configuration.properties` file.
- The output directory may contain multiple part files generated by the MapReduce program (depending on the number of lines in the input file). Each part file contains a list of matching correspondences (from source to target) generated from one line of input configuration parameters. To visualize these points on a 3D scatter plot, execute the `./image-stitcher.sh plot` command.
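Before plotting, it can help to confirm that `gnuplot.process.path` is actually set. The sketch below reads a key from a properties file. The helper name `get_prop` is hypothetical, it assumes plain `key=value` lines with no spaces around `=`, and it assumes that `gnuplot.process.path` holds the path to the gnuplot executable.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a key=value properties file.
# $1 = properties file, $2 = key. If the key appears more than once, the last wins.
get_prop() {
  awk -F= -v k="$2" '
    $1 == k { sub(/^[^=]*=/, ""); v = $0 }   # strip "key=" and keep the rest
    END { if (v != "") print v }
  ' "$1"
}

CONF=conf/configuration.properties
if [ -f "$CONF" ]; then
  gp=$(get_prop "$CONF" gnuplot.process.path)
  # Assumption: the property points at the gnuplot binary itself.
  if [ -n "$gp" ] && [ -x "$gp" ]; then
    echo "gnuplot found at $gp"
  else
    echo "gnuplot.process.path is unset or not executable: '$gp'"
  fi
fi
```

Run it from inside the `image-stitcher-1.0` directory so that the relative path to the configuration file resolves.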