Namenode High Availability
Environment Variable setup for HDFS client
Let's assume that you have downloaded Apache Hadoop 2.7.7 and unpacked it to /home/user/opt/hadoop-2.7.7, the path used for HADOOP_HOME below.
To set the environment variables permanently (so that they are defined every time you
log in), we can use shell profiles. Each shell reads a different profile file; this
tutorial assumes CentOS Linux with Bash, which reads ~/.bashrc. Open the
~/.bashrc file in
an editor of your choice (either a command-line editor or a GUI editor).
The first environment variable is
PATH, which lets you run Hadoop
commands without providing their absolute path. Add the following line to the file:
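A line of roughly the following shape extends PATH. This is a minimal sketch that assumes Hadoop was unpacked to /home/user/opt/hadoop-2.7.7 (the location used for HADOOP_HOME later in this guide); adjust the path to your own installation.

```shell
# Assumed install location; change it if Hadoop lives elsewhere.
# Appending keeps every directory already on PATH and adds Hadoop's bin directory last.
export PATH="$PATH:/home/user/opt/hadoop-2.7.7/bin"
```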
The second environment variable is
JAVA_HOME, which points to the installation
directory of your JDK/JRE. Add the following line to the file:
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
If you are a Mac user, use
export JAVA_HOME=$(/usr/libexec/java_home -v1.8)
The next environment variable tells Hadoop where its configuration
files are, and it is
HADOOP_CONF_DIR.
Let's assume that you have downloaded the configuration files from the namenode and put
them in the etc/my-cluster directory under the Hadoop installation.
Add the following lines to the file:
export HADOOP_HOME=/home/user/opt/hadoop-2.7.7
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/my-cluster
Note that
HADOOP_HOME is not strictly necessary; defining it is just good practice.
Save the file and run
source ~/.bashrc to set the environment
variables for the current session. From now on, they are also set every time you log in
as this user.
If everything is set properly, the following command should return the content of the root directory of your Hadoop cluster.
hadoop fs -ls /
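If the listing fails, it can help to confirm the three settings described above before looking at the cluster itself. The function below is a hypothetical helper (it is not part of Hadoop) and only a sketch: it checks that JAVA_HOME and HADOOP_CONF_DIR name existing directories and that the hadoop launcher is reachable through PATH.

```shell
# Hypothetical helper, not shipped with Hadoop: reports problems with the client setup.
# Returns the number of problems found, so 0 means everything looks fine.
check_hadoop_env() {
  problems=0
  # JAVA_HOME and HADOOP_CONF_DIR must both name existing directories.
  for name in JAVA_HOME HADOOP_CONF_DIR; do
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ ! -d "$value" ]; then
      echo "$name is unset or not a directory"
      problems=$((problems + 1))
    fi
  done
  # The hadoop launcher must be reachable through PATH.
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop is not on PATH"
    problems=$((problems + 1))
  fi
  return "$problems"
}
```

Calling check_hadoop_env && hadoop fs -ls / runs the listing only when no problem is reported.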
Impersonate Hadoop user
Hadoop commands and libraries use the user currently logged in on the client as the
effective user (on a Linux machine, run
whoami
to see who that is).
If this user is different from the user that has permission on the HDFS directories and
files, you might need to impersonate the user that does. To achieve this, set the
HADOOP_USER_NAME environment variable to that user's name.
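As a minimal sketch, the variable can be exported in the current shell. The user name hdfs is an assumption for illustration; substitute the account that actually owns the HDFS paths. This override is honored only when the cluster uses simple authentication; with Kerberos enabled, the identity comes from the Kerberos ticket instead.

```shell
# "hdfs" is an assumed user name; replace it with the account that has the permissions.
export HADOOP_USER_NAME=hdfs

# Client commands started from this shell now act as that user, for example:
#   hadoop fs -ls /
# To limit the override to a single command instead of the whole session:
#   HADOOP_USER_NAME=hdfs hadoop fs -ls /
```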
Introduction to Zookeeper
The code related to this section can be found on GitHub.
Installing Apache Zookeeper
The Apache Zookeeper distribution includes both the server and the CLI binaries. This guide covers only Linux; on Mac and Windows, a few tweaks are needed.
Download Apache Zookeeper
Go to the Apache Zookeeper Releases page and follow the steps to download the release that you need. You will get a TAR file. Move it to ~/opt and unpack it there:
mkdir -p ~/opt
mv ~/Downloads/zookeeper-*.tar.gz ~/opt/
cd ~/opt
tar -xf zookeeper-*.tar.gz
mv ./zookeeper-*/ ./zookeeper
Open the ~/.bashrc file and add the following lines.
export ZOOKEEPER_HOME=~/opt/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Then finish by running source ~/.bashrc.
Run Zookeeper Server locally
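Open the $ZOOKEEPER_HOME/conf/zoo.cfg file (create it if it doesn't exist) and add
the following lines. Zookeeper does not expand environment variables in this file, so
dataDir must be written as a literal path; /home/user/opt/zookeeper stands here for
the expanded value of $ZOOKEEPER_HOME.
tickTime=2000
dataDir=/home/user/opt/zookeeper/data
clientPort=2181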
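Zookeeper reads zoo.cfg as a plain properties file and does not substitute shell variables, so dataDir has to end up in the file as a literal path. One way to get that, sketched here under the assumption that the installation lives in ~/opt/zookeeper as set up earlier, is to generate the file from the shell and let the shell do the expansion.

```shell
# Assumption: fall back to the install location used earlier if the variable is unset.
: "${ZOOKEEPER_HOME:=$HOME/opt/zookeeper}"

# Make sure both the configuration directory and the data directory exist.
mkdir -p "$ZOOKEEPER_HOME/conf" "$ZOOKEEPER_HOME/data"

# The EOF delimiter is unquoted, so the shell replaces $ZOOKEEPER_HOME
# with its value before the file is written.
# Note that this overwrites any existing zoo.cfg.
cat > "$ZOOKEEPER_HOME/conf/zoo.cfg" <<EOF
tickTime=2000
dataDir=$ZOOKEEPER_HOME/data
clientPort=2181
EOF
```

Afterwards, cat "$ZOOKEEPER_HOME/conf/zoo.cfg" should show dataDir with the full path rather than the variable name.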
Run the server using the zkServer.sh script from $ZOOKEEPER_HOME/bin, for example zkServer.sh start.
Run Zookeeper CLI
In order to run the Zookeeper CLI, just execute
zkCli.sh -server localhost:2181.