Brandon Holt edited this page May 11, 2013 · 19 revisions

Apache Hadoop includes a distributed file system called "HDFS" which we plan to use in some incarnation in Grappa.

For now, we have the latest stable version of Hadoop downloaded from hadoop.apache.org, installed at: /sampa/share/hadoop-1.0.3

How to get it back up and running

For a variety of reasons, our HDFS setup sometimes dies or needs a kick to get it working again. Here's the sequence of commands I typically run to restart it:

# from 'n71.sampa'
/sampa/share/hadoop-1.0.3/bin/stop-dfs.sh
/sampa/share/hadoop-1.0.3/bin/start-dfs.sh
ssh n69
# from 'n69'
/sampa/share/polysh/polysh.py `sinfo -p grappa -o '%n' -h`
# from 'polysh' prompt:
ready (12)> sudo /sampa/share/hadoop-1.0.3/bin/stop-fuse-dfs.sh
ready (12)> :hide_password
<enter password>
ready (12)> sudo /sampa/share/hadoop-1.0.3/bin/start-fuse-dfs.sh

Environment variables:

JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
HADOOP_HOME=/sampa/share/hadoop-1.0.3
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/server:$HADOOP_HOME/c++/Linux-amd64-64/lib:$HADOOP_HOME/lib/native/Linux-amd64-64
CLASSPATH=$(echo $HADOOP_HOME/*.jar | tr ' ' ':'):$(echo $HADOOP_HOME/lib/*.jar | tr ' ' ':'):$HADOOP_HOME/conf
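The CLASSPATH line above globs every jar under $HADOOP_HOME and joins them with colons. A minimal sketch of how that expansion works, using a throwaway directory standing in for $HADOOP_HOME (the jar names here are illustrative, not the actual contents of the install):

```shell
# Stand-in for $HADOOP_HOME with a few dummy jars (names are made up).
HADOOP_HOME=$(mktemp -d)
mkdir -p "$HADOOP_HOME/lib" "$HADOOP_HOME/conf"
touch "$HADOOP_HOME/hadoop-core-1.0.3.jar" \
      "$HADOOP_HOME/lib/commons-logging-1.1.1.jar" \
      "$HADOOP_HOME/lib/log4j-1.2.15.jar"

# Same construction as above: glob the jars, join with ':' via tr,
# then append the conf directory so Hadoop's XML config is found.
CLASSPATH=$(echo $HADOOP_HOME/*.jar | tr ' ' ':'):$(echo $HADOOP_HOME/lib/*.jar | tr ' ' ':'):$HADOOP_HOME/conf
echo "$CLASSPATH"

rm -rf "$HADOOP_HOME"
```

Note the glob is expanded when the variable is assigned, so the jars must already exist at that point; re-source the line after adding jars to the install.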

Building

Include/library flags:

-I$(HADOOP_HOME)/c++/Linux-amd64-64/include
-I$(HADOOP_HOME)/src/c++/libhdfs
-I$(JAVA_HOME)/include
-L$(HADOOP_HOME)/c++/Linux-amd64-64/lib
-L$(JAVA_HOME)/jre/lib/amd64/server
-lhdfs
-ljvm
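A sketch of how these flags might fit into a Makefile rule for a libhdfs program; the install paths are the ones described above, and the `hdfs_test.c` target name is a placeholder:

```make
# Sketch: compiling a libhdfs program against the hadoop-1.0.3 install
# described on this page (adjust paths for your system).
HADOOP_HOME := /sampa/share/hadoop-1.0.3
JAVA_HOME   := /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64

HDFS_CFLAGS  := -I$(HADOOP_HOME)/c++/Linux-amd64-64/include \
                -I$(HADOOP_HOME)/src/c++/libhdfs \
                -I$(JAVA_HOME)/include
HDFS_LDFLAGS := -L$(HADOOP_HOME)/c++/Linux-amd64-64/lib \
                -L$(JAVA_HOME)/jre/lib/amd64/server \
                -lhdfs -ljvm

hdfs_test: hdfs_test.c
	$(CC) $(HDFS_CFLAGS) -o $@ $< $(HDFS_LDFLAGS)
```

Remember that anything linked against -ljvm also needs LD_LIBRARY_PATH set as shown above at runtime, since the JVM shared library is not on the default search path.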

Daemons

Configuration files for HDFS are in $(HADOOP_HOME)/conf.

  • masters: n71.sampa
  • slaves: [grappa nodes]?
  • core-site.xml, mapred-site.xml, hdfs-site.xml: Configure various things like:
    • where daemons run (n71 & all Grappa nodes)
    • block size, amount of memory for caching, etc.
    • amount of duplication (1)
    • where hadoop files go on each of the 'slaves' (/scratch/hadoop.{name,data})
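A sketch of what the corresponding conf/hdfs-site.xml entries could look like for the settings listed above; the exact values in the cluster's config may differ:

```xml
<!-- sketch of conf/hdfs-site.xml matching the bullets above -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/scratch/hadoop.name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/scratch/hadoop.data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```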

Startup/teardown

> cd $(HADOOP_HOME)
# note: if things are going wrong, you must first physically delete the HDFS data on all nodes
# start a shell on all grappa nodes (assuming that's where HDFS's data lives)
> clush -bw `sinfo -p grappa -o '%N' -h`
> rm -rf /scratch/hadoop.data
> quit
# set up & format HDFS (need to do this the first time)
> bin/hadoop namenode -format
# ssh to master node
> ssh n71.sampa  
# start dfs daemons (should see nameservers & dataservers start up)
# note: this must be called from the master node or else the nameserver will be running in the wrong place
> bin/start-dfs.sh
# shutdown
> bin/stop-dfs.sh

Interact with FS on cmdline

You can't interact with HDFS through the normal filesystem tools, so you have to go through the hadoop executable. Note: it seems to work best to give an "absolute" path for HDFS destinations ("/" refers to the root of HDFS's filesystem).

# 'ls'
> $(HADOOP_HOME)/bin/hadoop dfs -ls /grappa_ckpts
# Copy files into HDFS; they should get distributed across the datanodes
> $(HADOOP_HOME)/bin/hadoop dfs -put <localfile> <dst>
# List the rest of the available filesystem commands
> $(HADOOP_HOME)/bin/hadoop dfs

WebHDFS

  • Allows for access over HTTP
  • Built into hadoop v1.0.3 and integrated with the DFS NameNodes and DataNodes, so no extra servers need to be fired up
  • To enable, add the following to conf/hdfs-site.xml (and restart dfs servers):
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
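Once enabled, WebHDFS exposes the filesystem through a REST API on the NameNode's HTTP port. A hedged sketch of reading a file with curl (the hostname, port, and file path below are placeholders; check your NameNode's actual HTTP port):

```shell
# Sketch: reading a file over WebHDFS with curl.
NAMENODE=n71.sampa
PORT=50070                       # NameNode HTTP port, not the 8020 RPC port
FILE=/grappa_ckpts/example.dat   # hypothetical example path

URL="http://${NAMENODE}:${PORT}/webhdfs/v1${FILE}?op=OPEN"
echo "$URL"

# -L follows the redirect from the NameNode to the DataNode that
# actually serves the file's blocks. Only attempt the request if the
# namenode hostname resolves on this machine.
if getent hosts "$NAMENODE" >/dev/null 2>&1; then
    curl -i -L -m 5 "$URL"
fi
```

Other useful operations follow the same URL shape, e.g. `op=LISTSTATUS` for a directory listing and `op=GETFILESTATUS` for metadata.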

Fuse DFS

For Hadoop v1.0.3, the code for fuse-dfs can be found in $HADOOP_HOME/src/contrib/fuse-dfs.

Building:

# in $HADOOP_HOME/src/contrib/fuse-dfs
# make sure `$JAVA_HOME` and `$HADOOP_HOME` env. variables are set correctly
> ./configure LDFLAGS="-L$HADOOP_HOME/c++/Linux-amd64-64/lib -L$JAVA_HOME/jre/lib/amd64/server" CFLAGS="-I$HADOOP_HOME/src/c++/libhdfs"
> make PERMS=1
# executable `fuse_dfs` should be built in fuse-dfs/src

Running:

  • Find fuse_dfs_wrapper.sh in fuse-dfs/src and edit the paths in it to reflect your system
  • Find out which port the namenode is listening on. I think the default for v1.0.3 is 8020, but you can also check in the NameNode's log file by searching for this line:
2012-09-07 11:09:28,854 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: n71.sampa/10.1.2.71:8020
> grep -R 'Namenode up at' $HADOOP_HOME/logs/
  • Test out your configuration:
# in $HADOOP_HOME/src/contrib/fuse-dfs/src
> sudo ./fuse_dfs_wrapper.sh dfs://<namenode-hostname>:<namenode-port> <mount-point>
# (for example)
> mkdir /scratch/hdfs
> sudo ./fuse_dfs_wrapper.sh dfs://n71.sampa:8020 /scratch/hdfs
# ignore the warning 'fuse-dfs didn't recognize /scratch/hdfs,-2', it apparently says that no matter what
# check that it's working:
> ls /scratch/hdfs
# I have made some simple scripts to start and stop fuse-dfs nodes when they go down.
# You'll know they've gone down if they say "Transport endpoint is not connected."
# To restart, just ssh to the node and run:
> sudo /sampa/share/hadoop-1.0.3/bin/stop-fuse-dfs.sh
> sudo /sampa/share/hadoop-1.0.3/bin/start-fuse-dfs.sh

Debugging

  • Fuse logs things in /var/log/messages, so check there for messages
    • ERROR: could not connect to n71.sampa:50070 fuse_impls_getattr.c:37 meant that I had the wrong port
  • Input/output error (ls: cannot access /scratch/hdfs: Input/output error)
    • Might have the wrong port. Check the NameNode log for the port (see above)
    • Kill the ./fuse_dfs process
    • Clean up the mounted fs: sudo umount -l /scratch/hdfs (if you don't, you'll get errors that say Transport endpoint is not connected)

Automount

You should be able to add the following to /etc/fstab, provided the wrapper script is on your path and named fuse_dfs.

fuse_dfs#dfs://<namenode>:<port> /mountpoint fuse usertrash,rw 0 0