The single-namespace-server architecture of HDFS has recently raised doubts about the file system’s scalability and availability. In light of these issues, this work is an attempt to achieve:
- High availability for the Namenode, i.e., no single point of failure.
- Horizontal scalability for the Namenode, i.e., to handle heavier loads one only needs to add more Namenodes to the system, rather than having to upgrade a single Namenode’s hardware.
To do this, we’ve modified the HDFS Namenode to store its metadata in MySQL Cluster as opposed to keeping it in memory. The work is still in progress and experimental in nature. The current state of affairs:
- Multiple stateless Namenodes can be run. Clients and Datanodes should be statically partitioned to talk to a particular Namenode (a configuration sketch follows this list).
- Inodes, Blocks and Triplet data structures have been migrated to the DB.
- Performance of an individual Namenode is currently limited by the FSNamesystem/FSDirectory write locks. These can be phased out once we migrate all of the Namenode’s data structures to MySQL Cluster, which should help write-heavy workloads in particular.
- If you would like to plug in a DB other than MySQL Cluster behind the Namenode, you will need to re-implement the InodeTableHelper and BlocksHelper classes. We’ll clean this up soon. :) (See the interface sketch after the TODO list below.)
- For best results, set dfs.dbconnector.num-session-factories to a value that matches the number of Namenode worker threads, and make sure that the sum of this value across all Namenodes does not exceed the number of free mysqld/api slots in the MySQL cluster! (A sizing sketch follows this list.)
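On the static partitioning mentioned above: one way to express it (a minimal sketch; the hostname and port are placeholders, and this simply uses the stock Hadoop configuration mechanism) is to point each client’s and Datanode’s core-site.xml at its assigned Namenode:

<!-- core-site.xml on every client and Datanode assigned to this Namenode;
     hostname and port below are placeholders -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode1.example.com:8020</value>
</property>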
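And for the session-factory sizing rule (the hdfs-site.xml placement and the value 40 are our assumptions; in stock HDFS the worker-thread count is governed by dfs.namenode.handler.count):

<!-- hdfs-site.xml on each Namenode: one session factory per worker thread
     (the value 40 is only an example) -->
<property>
  <name>dfs.dbconnector.num-session-factories</name>
  <value>40</value>
</property>

With two Namenodes configured like this, the MySQL cluster must have at least 2 × 40 = 80 free mysqld/api slots.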
Still on the TODO list:
- Full statelessness.
- A protocol/service for clients and Datanodes to associate with a random Namenode (needs full statelessness to work correctly).
- A clean interface class to talk to the DB rather than the static helper methods we’re using currently.
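As an illustration of the kind of interface meant here, a minimal sketch (every name and signature below is hypothetical, not the project’s current code):

// Hypothetical sketch of a narrow DB-access interface that could replace
// the static InodeTableHelper/BlocksHelper methods. INode and BlockInfo
// are the Namenode's existing metadata classes; their package locations
// may differ across Hadoop versions.
import org.apache.hadoop.hdfs.server.namenode.INode;
import org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;

public interface MetadataStore {
  INode getInode(long inodeId);      // fetch one inode row from the DB
  void putInode(INode inode);        // insert or update an inode row
  void removeInode(long inodeId);    // delete an inode row
  BlockInfo getBlock(long blockId);  // fetch one block row from the DB
  void putBlock(BlockInfo block);    // insert or update a block row
}

A MySQL Cluster implementation would own the ClusterJ SessionFactory behind this interface, and plugging in another DB would then only require a new implementation of it.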
To build and run this, you will need:
- Linux 32/64 bit (use the 32- or 64-bit libndbclient.so accordingly)
- Maven 3
- A running MySQL cluster environment with at least one empty slot to allow client connections.
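To check how many free slots your cluster has, the NDB management client can be used (assuming the management server runs on mgmhost; adjust to your setup):

$: ndb_mgm -c mgmhost -e show

Free slots show up in the [mysqld(API)] section as "not connected".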
In the description of the steps below, we will refer to the project’s root folder as Scaling-HDFS-Namenode.
- Build the ClusterJ connector according to your architecture (see the note after this step if you are unsure which one you need).
$: cd Scaling-HDFS-Namenode/clusterj
Edit Scaling-HDFS-Namenode/clusterj/pom.xml, changing linux_32 to linux_64 if needed.
$: mvn clean install
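If you are unsure which architecture applies, check it first:

$: uname -m

x86_64 means linux_64; i386/i686 means linux_32.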
- Dump the SQL files in Scaling-HDFS-Namenode/other/database_create_sqls.txt into the MySQL cluster DB. This can be done using a MySQL client or a GUI tool like MySQL Workbench (whichever suits you better :) )
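With the command-line client, for instance (host and user are placeholders; append the target database name if the SQL file does not select one itself):

$: mysql -h dbhost -u root -p < Scaling-HDFS-Namenode/other/database_create_sqls.txt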
- Build the hadoop-common, hadoop-mapreduce and hadoop-hdfs projects from the root folder (be aware that protobuf is needed to build the Hadoop project):
$: mvn package -DskipTests -Dtar -Pdist
This will build the project without running the tests for it. If this doesn’t work, add the flag -P-cbuild to the maven package command.
- Now move to the project target folder:
$: cd Scaling-HDFS-Namenode/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-0.24.0-SNAPSHOT
- Once there, first format the Namenode, then start it, and also start the Datanodes:
$: bin/hdfs namenode -format
$: bin/hdfs namenode
$: bin/hdfs datanode
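Note that the Namenode and Datanode run in the foreground, so start each in its own terminal. From another terminal, jps (shipped with the JDK) should then list both a NameNode and a DataNode process:

$: jps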
- Write some files into the file system:
$: bin/hdfs dfs -copyFromLocal /localFile /remoteName
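To verify that the write succeeded, list the root of the file system:

$: bin/hdfs dfs -ls /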
- Enjoy!
- A presentation on this project, given at KTH and SICS.
Contributors:
- Jim Dowling - Swedish Institute of Computer Science
- Wasif Malik *
- Ying Solomon *
- Lalith Suresh *
- Mariano Vallés *
*KTH - Royal Institute of Technology