Installing Hadoop on Ubuntu

Date: 2022-05-06

Before installing Hadoop you will need:

 1. Java 1.6.x, preferably Sun's JDK (1.5.x also works)

 2. ssh

Install ssh and rsync:

$ sudo apt-get install ssh

$ sudo apt-get install rsync
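Before going further, you can confirm that the prerequisites are actually on the PATH. A minimal, read-only check (nothing here is Hadoop-specific):

```shell
# Read-only check that each prerequisite command is installed.
report=""
for tool in java ssh rsync; do
  if command -v "$tool" >/dev/null 2>&1; then
    report="${report}${tool}: found
"
  else
    report="${report}${tool}: MISSING - install it before continuing
"
  fi
done
printf '%s' "$report"
```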

Download Hadoop

 Download a recent release from http://hadoop.apache.org/core/releases.html

 It is best to create a dedicated user for Hadoop:

 For example, create a user named hadoop in a group named hadoop:

$ sudo addgroup hadoop

$ sudo adduser --ingroup hadoop hadoop

Extract the downloaded Hadoop archive into /home/hadoop and rename the directory to hadoop.
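The unpack step can be sketched as below. The tarball name (hadoop-0.20.2.tar.gz) is an assumption — substitute whatever release you downloaded; a stand-in tarball is created first so the commands can be run end-to-end without a download.

```shell
# Stand-in for the downloaded release, so the unpack commands below are runnable.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p hadoop-0.20.2/conf
touch hadoop-0.20.2/conf/hadoop-env.sh
tar -czf hadoop-0.20.2.tar.gz hadoop-0.20.2
rm -r hadoop-0.20.2

# The actual unpack-and-rename steps:
tar -xzf hadoop-0.20.2.tar.gz   # extract the release
mv hadoop-0.20.2 hadoop         # rename the directory to plain "hadoop"
ls hadoop/conf/hadoop-env.sh    # the file edited in the next step
```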

Configure JAVA_HOME:

  gedit ~/hadoop/conf/hadoop-env.sh

Change these lines:

 # The java implementation to use.  Required.

 # export JAVA_HOME=/usr/lib/j2sdk1.5-sun

to point to your Java installation directory:

# The java implementation to use. Required.

export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.15
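As a non-interactive alternative to editing the file in gedit, sed can replace the commented-out JAVA_HOME line in one shot. Shown here on a mock copy of hadoop-env.sh; point the conf variable at your real ~/hadoop/conf to apply it for real.

```shell
# Mock copy of hadoop-env.sh so the edit can be demonstrated safely.
conf=$(mktemp -d)
cat > "$conf/hadoop-env.sh" <<'EOF'
# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
EOF

# Uncomment and set JAVA_HOME in place.
sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.15|' "$conf/hadoop-env.sh"
grep JAVA_HOME "$conf/hadoop-env.sh"
```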

Now you can run Hadoop in standalone (single-node) mode:

$ cd hadoop

$ mkdir input

$ cp conf/*.xml input

$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

$ cat output/*
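The grep example extracts every match of the regular expression from the input files and counts how often each distinct match occurs. The same computation can be sketched with ordinary shell tools on a small sample file:

```shell
# What the Hadoop "grep" example computes, sketched with plain shell tools:
# pull out each match of 'dfs[a-z.]+' and count occurrences per distinct match.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<name>dfs.replication</name>
<name>dfs.replication</name>
<name>fs.default.name</name>
EOF

grep -oE 'dfs[a-z.]+' "$sample" | sort | uniq -c | sort -rn
```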

 Running in pseudo-distributed mode:

Configure ssh

$ su - hadoop

$ ssh-keygen -t rsa -P ""

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):

Created directory '/home/hadoop/.ssh'.

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

9d:47:ab:d7:22:54:f0:f9:b9:3b:64:93:12:75:81:27 hadoop@ubuntu

Enable passwordless login:

  hadoop@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Then try:

  $ ssh localhost

and verify that it logs you in without prompting for a password.
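If ssh still asks for a password, the most common cause is loose permissions: sshd ignores authorized_keys when ~/.ssh or the file itself is group- or world-writable. The fix is sketched below on a temporary directory so it is safe to run anywhere; apply the two chmods to your real ~/.ssh.

```shell
# sshd requires ~/.ssh to be 700 and authorized_keys to be 600.
# Demonstrated on a throwaway directory, not your real ~/.ssh.
sshdir=$(mktemp -d)/.ssh
mkdir -p "$sshdir"
touch "$sshdir/authorized_keys"
chmod 700 "$sshdir"
chmod 600 "$sshdir/authorized_keys"
ls -ld "$sshdir" "$sshdir/authorized_keys"
```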

Hadoop configuration files:

  conf/core-site.xml


<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

   <property>

    <name>hadoop.tmp.dir</name>

        <value>/home/hadoop/hadoop-datastore/hadoop-${user.name}</value>

   </property>

   <property>

    <name>fs.default.name</name>

    <value>hdfs://localhost:9000</value>

   </property>

</configuration>

Set hadoop.tmp.dir to any path you like; ${user.name} expands automatically to the name of the user running Hadoop.
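The directory behind hadoop.tmp.dir must exist and be writable by the hadoop user before the filesystem is formatted. A sketch of creating it, using a temporary prefix as a stand-in for /home/hadoop so it runs anywhere:

```shell
# Create the hadoop.tmp.dir target. A mktemp prefix stands in for /home/hadoop;
# on a real install use /home/hadoop/hadoop-datastore as in the config above.
base=$(mktemp -d)/hadoop-datastore
dir="$base/hadoop-$(whoami)"   # mirrors Hadoop's ${user.name} expansion
mkdir -p "$dir"
chmod 755 "$base"
echo "created $dir"
```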

conf/hdfs-site.xml

 <configuration>

  <property>

    <name>dfs.replication</name>

    <value>1</value>

  </property>

</configuration>

 dfs.replication is the default number of block replicas; 1 is appropriate here since there is only a single node.

conf/mapred-site.xml


 <configuration>

  <property>

    <name>mapred.job.tracker</name>

    <value>localhost:9001</value>

  </property>

</configuration>

  Execution

 Format the distributed filesystem:

 $ bin/hadoop namenode -format

  Start Hadoop:


 $ bin/start-all.sh

 You can check the NameNode and the JobTracker through their web interfaces:

 NameNode - http://localhost:50070/

JobTracker - http://localhost:50030/

 Run the example:

 $ bin/hadoop fs -put conf input

$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

 Examine the results:

 $ bin/hadoop fs -get output output

$ cat output/*