My first stab at Hadoop
Once you have cloned this repository, this guide/tutorial will take you through the steps to set up Hadoop and run two simple word count problems.
My software configuration runs on Ubuntu 12.04 and uses:
- VirtualBox
- Vagrant
- Git
Note that there are quite a few software dependencies, such as Sun Java and of course Hadoop, but the cookbook already takes care of installing these.
- To download and install the Vagrant VirtualBox image, just run vagrant up
- To log into the box, run vagrant ssh
For the following steps I have followed Michael Noll's tutorial. I found that guide really easy to follow, so I will just refer you to that site for the detailed steps.
The vagrant image comes with Sun Java 6 so we can skip this section.
Follow guide
sudo chown -R hduser:hadoop /etc/hadoop
sudo chown -R hduser:hadoop /usr/lib/hadoop
Follow guide
Should be auto-generated.
If you are getting binding errors (maybe around 0.0.0.0) then follow the guide
Hadoop is already installed on the image. The install directory is:
- /usr/lib/hadoop
- /usr/lib/hadoop-hdfs
- /usr/lib/hadoop-mapreduce
- /usr/lib/hadoop-yarn
And the configuration files can be found at /etc/hadoop/conf
However, to keep this guide in sync with Michael's (which installs Hadoop in /usr/local/hadoop), we shall add some symbolic links. Check that you are user vagrant (if you are user hduser, just type exit, maybe twice), then:
sudo ln -s /usr/lib/hadoop /usr/local/hadoop
sudo ln -s /usr/lib/hadoop/libexec/ /usr/lib/hadoop-hdfs/libexec
sudo ln -s /etc/hadoop/conf /usr/local/hadoop/conf
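To confirm a symbolic link resolves where you expect, readlink is handy. A minimal sketch, using a throwaway directory so it is safe to run anywhere (the /tmp paths are purely illustrative):

```shell
# Illustrative only: create a target and a symlink in a scratch directory,
# then confirm the link resolves to the target with readlink.
mkdir -p /tmp/linkcheck/target
ln -sf /tmp/linkcheck/target /tmp/linkcheck/alias
readlink /tmp/linkcheck/alias    # prints /tmp/linkcheck/target
rm -rf /tmp/linkcheck
```

For the real links above, readlink /usr/local/hadoop should print /usr/lib/hadoop.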
No need to update attributes.
Should be auto-generated
Just type hadoop namenode -format
This is a little different to the way the guide starts it. As user hduser:

cd /usr/local/hadoop/sbin
./hadoop-daemon.sh start namenode
./hadoop-daemon.sh start datanode
This will start up a NameNode and a DataNode. To check they are running, type jps. You should see Java processes called NameNode and DataNode running.
Note if you get the following error: /usr/lib/hadoop-hdfs/bin/hdfs: line 34: /usr/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
Then one way to fix this is to create a symbolic link: sudo ln -s /usr/lib/hadoop/libexec/ /usr/lib/hadoop-hdfs/libexec
cd /usr/local/hadoop/sbin
./hadoop-daemon.sh stop namenode
- Create a file with some text

mkdir ~/input
echo "Could this guide actually be a useful guide" > ~/input/sample.txt
- Create a folder in hadoop to store this file
hdfs dfs -mkdir -p /user/hduser/sample
To check it worked
hdfs dfs -ls /user/hduser
should show output similar to
Found 1 items
drwxr-xr-x - hduser supergroup 0 2014-12-23 15:28 /user/hduser/sample
- Upload the file into the hadoop filesystem

hdfs dfs -copyFromLocal ~/input /user/hduser/sample
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.11.0-1.jar wordcount /user/hduser/sample/input /user/hduser/sample-output
hdfs dfs -getmerge /user/hduser/sample-output ~/output
This will create a file ~/output containing our list of words and the number of times they occurred.
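If you want a quick sanity check of what wordcount produces, the same shape of result can be sketched locally with standard shell tools on our sample sentence (this is only an approximation of the MapReduce job, not part of the Hadoop setup):

```shell
# Approximate the wordcount job locally: split the sentence into words,
# sort them, count duplicates, and print as "word count".
echo "Could this guide actually be a useful guide" \
  | tr ' ' '\n' | sort | uniq -c | awk '{print $2, $1}'
# "guide" should appear with a count of 2; every other word with 1.
```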
See the guide for details of how to view the output without extracting it from Hadoop's filesystem. It also has an example using a larger test set.