
Setting up Apache Hadoop Multi-Node Cluster

June 1, 2013

We are sharing our experience of installing Apache Hadoop on Linux machines in a multi-node setup. We will also share our troubleshooting experience here and update this article in the future.

User creation and other configuration steps -

  • We start by adding a dedicated Hadoop system user on each node of the cluster.

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

  • Next we configure SSH (Secure Shell) on all the cluster nodes to enable secure data communication.

user@node1:~$ su - hduser
hduser@node1:~$ ssh-keygen -t rsa -P ""

The output will be something like the following:

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu

  • Next we need to enable SSH access to the local machine with this newly created key:

hduser@node1:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Repeat the above steps on all the cluster nodes and test by executing the following command:

hduser@node1:~$ ssh localhost

This step also saves the local machine's host key fingerprint to the hduser user's known_hosts file.

Next we need to edit the /etc/hosts file, in which we put the IP address and hostname of each system in the cluster.

In our scenario we have one master node and one slave node.

$ sudo vi /etc/hosts

and we add each node's IP address and hostname to the file as a key value pair, one entry for master and one for slave.
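As an illustration, the entries would look like the following (the IP addresses here are placeholders, not the actual cluster's; substitute your nodes' real addresses):

```
# /etc/hosts (example addresses; replace with your own)
192.168.0.1    master
192.168.0.2    slave
```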
  • Providing the SSH Access

The hduser user on the master node must be able to connect:

    1. to its own user account on the master via ssh master (in this context not just ssh localhost);
    2. to the hduser account of the slave(s) via a password-less SSH login.

So we distribute the SSH public key of hduser@master to all the slaves (in our case we have only one slave; if you have more, execute the following command for each machine name, i.e. slave, slave1, slave2).

hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
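With several slaves, the same command can be looped over the hostnames. This is only a sketch: it assumes the id_rsa.pub key generated above, and the names slave1 and slave2 follow the naming suggested earlier.

```
# Push hduser@master's public key to every slave node (expect one
# password prompt per host before password-less login works).
for host in slave slave1 slave2; do
    ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hduser@${host}"
done
```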

Test by connecting from master to master and from master to the slave(s), and check that everything works.

Configuring Hadoop

  • Let us edit conf/masters (only on the master node)

and we enter master into the file.

Despite its name, this file tells Hadoop on which machine of our multi-node cluster to start the SecondaryNameNode daemon.

The primary NameNode and the JobTracker will always run on the machine on which we execute bin/start-dfs.sh and bin/start-mapred.sh respectively.

  • Let us now edit conf/slaves (only on the master node), listing one host per line:

master
slave

This means that we run a DataNode process on the master machine as well, where the NameNode is also running. If we have enough dedicated DataNode machines at our disposal, we can leave the master out of this file so it does not act as a slave.

If we have more slaves, we add one host per line, like the following:

slave1
slave2

Let's now edit two important files (on all the nodes in our cluster):

a. conf/core-site.xml
b. conf/hdfs-site.xml

a) conf/core-site.xml

We have to change the fs.default.name parameter, which specifies the NameNode host and port (in our case, the master machine).
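A sketch of the property as it might appear in conf/core-site.xml (the port 54310 is an assumption on our part, not dictated by the article; substitute whatever port you choose):

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>URI of the NameNode (host and port).</description>
</property>
```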



…..[Other XML Values]


Create a directory into which Hadoop will store its data -

$ mkdir /app/hadoop

We have to ensure the directory is writable by any user:

$ chmod 777 /app/hadoop

Modify core-site.xml once again to add the following property:
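A sketch of that property, assuming it is hadoop.tmp.dir pointed at a tmp subdirectory of the directory created above (consistent with the /app/hadoop/tmp/dfs/data path used in the troubleshooting section of this article):

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>Base directory for Hadoop's working data.</description>
</property>
```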


b) conf/hdfs-site.xml

We have to change the dfs.replication parameter, which specifies the default block replication. It defines how many nodes each block should be replicated to before it becomes available. If we set this to a value higher than the number of available slave nodes (more precisely, the number of DataNodes), we will start seeing a lot of "(Zero targets found, forbidden1.size=1)" type errors in the log files.

The default value of dfs.replication is 3. However, as we have only two nodes available in our scenario, we set dfs.replication to 2.
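The corresponding entry in conf/hdfs-site.xml might look like:

```xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default number of block replicas.</description>
</property>
```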

…..[Other XML Values]

  • Let us format the HDFS filesystem via the NameNode.

Run the following command on the master:

bin/hadoop namenode -format

  • Let us start the multi-node cluster:

Run the following command (in our case on the machine named master) to start the HDFS daemons:

bin/start-dfs.sh

(The MapReduce layer can be started afterwards with bin/start-mapred.sh.)
Checking of Hadoop Status -

After everything has started, run the jps command on all the nodes to check whether everything is running well.

On the master node the desired output will be -

$ jps

14799 NameNode
15314 Jps
14880 DataNode
14977 SecondaryNameNode

In Slave(s):

$ jps
15314 Jps
14880 DataNode

Of course the process IDs will vary from machine to machine.



It might happen that the DataNode does not start on all of our nodes. In that case, check

logs/hadoop-hduser-datanode-<hostname>.log on the affected nodes for the exception - Incompatible namespaceIDs

In this case we need to do the following -

  1. Stop the full cluster, i.e. both MapReduce and HDFS layers.
  2. Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml. In our case, the relevant directory is /app/hadoop/tmp/dfs/data
  3. Reformat the NameNode. All HDFS data will be lost during the format process.
  4. Restart the cluster.


Alternatively, we can manually update the namespaceID of the problematic DataNodes:

  1. Stop the problematic DataNode(s).
  2. Edit the value of namespaceID in ${dfs.data.dir}/current/VERSION to match the corresponding value of the current NameNode in ${dfs.name.dir}/current/VERSION.
  3. Restart the fixed DataNode(s).
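The edit in step 2 can be scripted. The sketch below demonstrates the technique on throw-away files created in a temporary directory; on a real cluster the files live under ${dfs.name.dir}/current/VERSION and ${dfs.data.dir}/current/VERSION (e.g. under /app/hadoop/tmp in our setup), and the example IDs are invented for illustration.

```shell
# Copy the NameNode's namespaceID into a stale DataNode VERSION file.
tmp=$(mktemp -d)
printf 'namespaceID=12345\ncTime=0\n' > "$tmp/nn_VERSION"   # NameNode's copy
printf 'namespaceID=99999\ncTime=0\n' > "$tmp/dn_VERSION"   # stale DataNode copy

# Read the NameNode's namespaceID ...
nn_id=$(grep '^namespaceID=' "$tmp/nn_VERSION" | cut -d= -f2)

# ... and write it into the DataNode's VERSION file in place.
sed -i "s/^namespaceID=.*/namespaceID=${nn_id}/" "$tmp/dn_VERSION"

grep '^namespaceID=' "$tmp/dn_VERSION"   # → namespaceID=12345
```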

In Running Map-Reduce Job in Apache Hadoop (Multi-node Cluster), we will share our experience of running a MapReduce job based on the Apache Hadoop examples.

Contribution -

The working material for this article was primarily gathered by Debopom Mitra, a J2EE programmer associated with us since 2012. Piyas De helped him learn Hadoop and the troubleshooting involved in setting up a multi-node cluster, and edited the article content.
