By far the best tutorial for you to get started with Hadoop installation.
Source: http://alanxelsys.com/2014/02/01/hadoop-2-2-single-node-installation-on-centos-6-5/
Introduction
This HOWTO covers Hadoop 2.2 installation on CentOS 6.5. My series of tutorials is meant just as that: tutorials. The intent is to allow the user to gain familiarity with the application. The tutorials should not be construed as any type of best-practices document for a production environment, and as such performance, reliability and security considerations are compromised. The tutorials are freely available and may be distributed with the proper acknowledgements. Actual screenshots of the commands are used to eliminate any possibility of typographical errors; in addition, long sequences of text are placed in front of the screenshots to facilitate copy and paste. Command text is printed using Courier font. In general the document covers only the bare minimum needed to get a single-node cluster up and running, with the emphasis on HOW rather than WHY. For more in-depth information the reader should consult the many excellent publications on Hadoop, such as Tom White's Hadoop: The Definitive Guide, 3rd edition, and Eric Sammer's Hadoop Operations, along with the Apache Hadoop website.
Please consult www.alan-johnson.net for an online version of this document.
Prerequisites
- CentOS 6.5 installed
Machine configuration
In this HOWTO a physical machine was used, but for educational purposes VMware Workstation or VirtualBox (https://www.virtualbox.org/) would work just as well. The screenshot below shows acceptable virtual machine settings for VMware.
**Note: an additional network adapter and physical drive have been added. Memory allocation is 2 GB, which is sufficient for the tutorial.**

User configuration
If installing CentOS from scratch, then select a user; this HOWTO uses hadoopuser.

**Note: the initial configuration is done as user root.**

=> Run passwd hadoopuser to enable log-in for this user.

Now make hadoopuser a member of hadoopgroup.

    usermod -g hadoopgroup hadoopuser

Verify by issuing the id command.

    id hadoopuser

The next step is to give hadoopuser access to sudo commands. Do this by executing the visudo command and adding the highlighted line shown below.

Reboot and now log in as user hadoopuser.

Setting up ssh
Set up ssh for password-less authentication using keys.

    ssh-keygen -t rsa -P ''

Next change file ownership and mode.

    sudo chown hadoopuser ~/.ssh
    sudo chmod 700 ~/.ssh
    sudo chmod 600 ~/.ssh/id_rsa

Then append the public key to the file authorized_keys.

    sudo cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Change permissions.

    sudo chmod 600 ~/.ssh/authorized_keys

Edit /etc/ssh/sshd_config. Set PasswordAuthentication to no and allow empty passwords, as shown below in the extract of the file.

Verify that login can be accomplished without requiring a password.

Installing and configuring Java
Installing OpenJDK
It is recommended to install the full OpenJDK package to take advantage of some of the Java tools.

    yum install java-1.7.0-openjdk*

After the installation, verify the Java version.

    java -version

The folder /etc/alternatives contains a link to the Java installation; perform a long listing of the file to show the link and use it as the location for JAVA_HOME.

Set the JAVA_HOME environment variable by editing ~/.bashrc.

Installing Hadoop
Downloading Hadoop
From the Hadoop releases page <http://hadoop.apache.org/releases.html>, download hadoop-2.2.0.tar.gz from one of the mirror sites.

Next untar the file.

    tar xzvf hadoop-2.2.0.tar.gz

Move the untarred folder.

    sudo mv hadoop-2.2.0 /usr/local/hadoop

Change the ownership.

    sudo chown -R hadoopuser:hadoopgroup /usr/local/hadoop

Next create namenode and datanode folders.

    mkdir -p ~/hadoopspace/hdfs/namenode
    mkdir -p ~/hadoopspace/hdfs/datanode

Configuring Hadoop
Next edit ~/.bashrc to set up the environment variables for Hadoop.

    # User specific aliases and functions
    export HADOOP_INSTALL=/usr/local/hadoop
    export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_HOME=$HADOOP_INSTALL
    export HADOOP_HDFS_HOME=$HADOOP_INSTALL
    export YARN_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
    export PATH=$PATH:$HADOOP_INSTALL/sbin
    export PATH=$PATH:$HADOOP_INSTALL/bin

Now apply the variables.

A number of xml files within the Hadoop folder require editing, along with hadoop-env.sh:
- mapred-site.xml
- yarn-site.xml
- core-site.xml
- hdfs-site.xml
- hadoop-env.sh

The files can be found in /usr/local/hadoop/etc/hadoop/. First copy the mapred-site template file over and then edit it.

mapred-site.xml
Add the following text between the configuration tags.

yarn-site.xml
Add the following text between the configuration tags.

core-site.xml
Add the following text between the configuration tags.

hdfs-site.xml
Add the following text between the configuration tags.

**Note: other locations can be used in hdfs by separating values with a comma, e.g.**

    file:/home/hadoopuser/hadoopspace/hdfs/datanode, .disk2/Hadoop/datanode, . .

hadoop-env.sh
Add an entry for JAVA_HOME.

    export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64/

=> Actually you don't need to configure JAVA_HOME here since you've already done that in ~/.bashrc.

Next format the namenode.

. . .

Issue the following commands.

    start-dfs.sh
    start-yarn.sh

Issue the jps command and verify that the expected Hadoop jobs are running.

At this point Hadoop has been installed and configured.

Testing the installation
A number of test files exist that can be used to benchmark Hadoop. Running the hadoop-mapreduce-client-jobclient-2.2.0-tests.jar used in the commands below without any arguments will list the available tests.

The TestDFSIO test below can be used to measure read performance; initially create the files and then read:

    hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 100
    hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 100

The results are logged in TestDFSIO_results.log, which will show throughput rates.

During the test run a message will be printed with a tracking url such as that shown below. The link can be selected or the address can be pasted into a browser.

Another test is mrbench, which is a map/reduce test.

    hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar mrbench -maps 100

Finally, the test below is used to calculate pi. The first parameter refers to the number of maps and the second is the number of samples for each map.

    hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 20

. . .

**Note: accuracy can be improved by increasing the value of the second parameter.**

Working from the command line
Invoking a command with no parameters, or with insufficient parameters, will generally print out help data.

hdfs commands

    hdfs dfsadmin -help

. . .

hadoop commands

    hadoop version

Web Access
The location for checking the Namenode status is localhost:50070/. This web page contains status information relating to the cluster. There are also links for browsing the filesystem. Logs can also be examined from the NameNode Logs link.

. . .

The secondary namenode can be accessed using port 50090.

Online documentation
Comprehensive documentation can be found at the Apache website or locally using a browser by pointing it at $HADOOP_INSTALL/share/doc/hadoop/index.html

Feedback, corrections and suggestions are welcome, as are suggestions for further HOWTOs.
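
As a closing aid, the jps verification from the start-up step can be scripted rather than checked by eye. The sketch below is not part of the original HOWTO. It assumes the five daemons that start-dfs.sh and start-yarn.sh normally launch on a single-node Hadoop 2.x setup (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager), and the function name missing_daemons is hypothetical. It reads jps-style output on standard input and prints every expected daemon that is absent.

```shell
#!/usr/bin/env bash
# Assumed daemon list for a single-node Hadoop 2.x cluster started with
# start-dfs.sh and start-yarn.sh; adjust it if your setup differs.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# missing_daemons (hypothetical helper): reads jps-style lines ("<pid> <name>")
# on stdin and prints each expected daemon that does not appear, one per line.
missing_daemons() {
    local running daemon
    # Keep only the second column (the process name) of every input line.
    running=$(awk '{print $2}')
    for daemon in $EXPECTED_DAEMONS; do
        # -x matches whole lines, so "NameNode" does not match "SecondaryNameNode".
        if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
            printf '%s\n' "$daemon"
        fi
    done
}
```

With the cluster running, `jps | missing_daemons` should print nothing; any name it does print points at a daemon whose log is worth examining.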