Hadoop 2.0 Installation

Tags: hadoop | Published: 2014-02-09 17:42 | Author: suncf1985
Source: http://www.iteye.com

No.  Host IP          Hostname (root/redhat)  Remote mgmt IP   Remote mgmt account/password
1    192.168.101.120  cup-slave-4             192.168.101.150  user1/hadoop123
2    192.168.101.121  cup-slave-1             192.168.101.151  user1/hadoop123
3    192.168.101.122  cup-master-1            192.168.101.152  user1/hadoop123
4    192.168.101.123  cup-master-2            192.168.101.153  user1/hadoop123
5    192.168.101.124  cup-slave-3             192.168.101.154  user1/hadoop123
6    192.168.101.125  cup-slave-2             192.168.101.155  user1/hadoop123

Temporary file directories:
C:\ProgramFilesDev\CDH4\on cup-master-1\
C:\ProgramFilesDev\CDH4\install files\

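Note: edit the configuration files with a tool such as UltraEdit rather than WordPad or similar; otherwise the files may cause errors on Linux (typically because of Windows line endings).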

/etc/sysconfig/network: (permanently sets the hostname)
NETWORKING=yes
HOSTNAME=cup-master-1
GATEWAY=192.168.101.1

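Do this on each node in turn, using that node's own HOSTNAME. GATEWAY must be correct; you can check the current default gateway with $route -n (the Bcast field shown by $ifconfig is the broadcast address, not the gateway).

$source /etc/sysconfig/network
Run on each node in turn.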


Set the hostname (run the matching command on each node):  ## this step is required; otherwise formatting the NN may fail with UnknownHostException: cup-master-1
$hostname cup-master-1
$hostname cup-master-2
$hostname cup-slave-1
$hostname cup-slave-2
$hostname cup-slave-3
$hostname cup-slave-4


Hosts configured in /etc/hosts:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.101.122  cup-master-1
192.168.101.123  cup-master-2
192.168.101.121  cup-slave-1
192.168.101.125  cup-slave-2
192.168.101.124 cup-slave-3
192.168.101.120 cup-slave-4



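Apply the same /etc/hosts on each node in turn. The file takes effect as soon as it is saved; it is not a shell script, so it does not need to be sourced.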
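
As a quick sanity check of the host mappings above, the sketch below looks a hostname up in a hosts-format file and reports any cluster node that has no entry. The function names lookup_host_ip and check_cluster_hosts are illustrative only (they are not part of any tool), and the sketch assumes a POSIX shell with awk.

```shell
# Print the IP that a hosts-format file maps to the given hostname.
# Comment lines and blank lines are ignored; prints nothing if not found.
lookup_host_ip() {
    # $1 = hostname, $2 = hosts file (defaults to /etc/hosts)
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }          # skip comment lines
        {
            for (i = 2; i <= NF; i++)      # fields 2..NF are names/aliases
                if ($i == name) { print $1; exit }
        }
    ' "${2:-/etc/hosts}"
}

# Report each hostname as OK or MISSING; returns non-zero if any is missing.
check_cluster_hosts() {
    # $1 = hosts file, remaining args = hostnames to check
    hosts_file=$1; shift
    missing=0
    for h in "$@"; do
        ip=$(lookup_host_ip "$h" "$hosts_file")
        if [ -z "$ip" ]; then
            echo "MISSING $h"
            missing=1
        else
            echo "OK $h $ip"
        fi
    done
    return $missing
}

# Example (on a cluster node):
#   check_cluster_hosts /etc/hosts cup-master-1 cup-master-2 cup-slave-1 cup-slave-2 cup-slave-3 cup-slave-4
```

Running the check on every node before formatting the NN should surface a missing or mistyped entry early.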


DNS:
Add to /etc/resolv.conf:
search localdomain
nameserver 192.168.101.110 ##dns ip
nameserver 8.8.8.8
Run on each node in turn.


Locale configuration:
/etc/sysconfig/i18n
LANG=en_US
$source /etc/sysconfig/i18n
Run on each node in turn, then verify with
$echo $LANG



Stop the firewall:   $sudo service iptables stop
Check the firewall:  $sudo service iptables status
Run on each node in turn.

Disable permanently: $chkconfig iptables off
                     $iptables -F
                     $service iptables save


Uninstall OpenJDK:
1. rpm -qa|grep jdk
   java-1.6.0-openjdk-1.6.0.0-1.41.1.10.4.el6.x86_64
2. rpm -e java-1.6.0-openjdk-1.6.0.0-1.41.1.10.4.el6.x86_64

Install the JDK:
1. Java SE 1.6 or later; download page: http://www.oracle.com/technetwork/java/javase/downloads/index.html
   Download jdk-6u32-linux-x64.bin
2. cd /usr/jdk6
3. chmod 755 *.bin
4. ./jdk-6u32-linux-x64.bin
5. Configure the environment variables


Append to the end of /etc/profile:
#set java environment
JAVA_HOME=/usr/jdk6/jdk1.6.0_32
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
JAVA_OPTS="$JAVA_OPTS -server"
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME JAVA_OPTS CLASSPATH PATH

#JAVA_OPTS="$JAVA_OPTS -server -Xms2g -Xmx12g -XX:NewSize=128m -XX:MaxNewSize=128m"
$source /etc/profile  (makes the environment variables take effect)




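ulimit: limits on the maximum number of open files (open file handles) and processes
$ulimit -n shows the current open-file limit; $ulimit -u shows the current process limit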
1. /etc/security/limits.conf
* soft nofile 655350
* hard nofile 655350
2. /etc/security/limits.d/90-nproc.conf
*          soft    nproc     10240
*          hard    nproc     60240
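
After logging in again, the new limits can be compared with the configured values. The following is a minimal sketch, assuming bash (ulimit -u is not available in every POSIX shell); the function names limit_at_least and report_limits are illustrative, and the thresholds are the soft values from the two files above.

```shell
# Succeeds when a ulimit value meets a required minimum ("unlimited" always passes).
limit_at_least() {
    # $1 = current value as printed by ulimit, $2 = required minimum
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Compare the limits of the current session with the values configured above.
report_limits() {
    if limit_at_least "$(ulimit -n)" 655350; then
        echo "nofile OK"
    else
        echo "nofile too low: $(ulimit -n)"
    fi
    if limit_at_least "$(ulimit -u)" 10240; then
        echo "nproc OK"
    else
        echo "nproc too low: $(ulimit -u)"
    fi
}

# Example: run report_limits as the hadoop user in a fresh login session.
```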



6. Configure the hadoop user

   In /etc/sudoers, add a hadoop line below root ALL=(ALL) ALL:
   root    ALL=(ALL) ALL
   hadoop    ALL=(ALL) ALL

   $groupadd hadoop
   $useradd hadoop -g hadoop
   $passwd hadoop

7. Log in to cup-master-1 as root and stop the firewall: $service iptables stop  (repeat on each node)

8. As root, edit /etc/ssh/sshd_config
   Change #UseLogin no to
   UseLogin yes
   Restart ssh: $service sshd restart
   Otherwise you get -bash: ulimit: open files: cannot modify limit: Operation not permitted

8. Passwordless SSH from cup-master-1 to the other nodes:
   Log in to cup-master-1 as the hadoop user
   $mkdir .ssh      ------ not needed on the master node
   $ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
   On cup-master-2, cup-slave-1, cup-slave-2, cup-slave-3 and cup-slave-4, create the .ssh directory and the sshcm1 staging directory: $mkdir -p .ssh sshcm1
   $scp .ssh/id_rsa.pub hadoop@cup-slave-1:/home/hadoop/sshcm1/  (repeat for each node)
   $scp .ssh/id_rsa.pub hadoop@cup-slave-2:/home/hadoop/sshcm1/
   $scp .ssh/id_rsa.pub hadoop@cup-slave-3:/home/hadoop/sshcm1/
   $scp .ssh/id_rsa.pub hadoop@cup-slave-4:/home/hadoop/sshcm1/
   $scp .ssh/id_rsa.pub hadoop@cup-master-2:/home/hadoop/sshcm1/

   Log in to cup-master-1 as the hadoop user and configure the local machine
   $cd ~/.ssh
   $chmod 700 ~/.ssh
   $cat id_rsa.pub >> authorized_keys
   $chmod 600 ~/.ssh/authorized_keys

   Log in to cup-slave-1 as the hadoop user and configure the remote machine
   $mkdir .ssh
   $chmod 700 .ssh
   $cd .ssh
   $cat ~/sshcm1/id_rsa.pub >> ~/.ssh/authorized_keys
   $chmod 600 ~/.ssh/authorized_keys

   Repeat on the other nodes, logged in as the hadoop user

   Log in to cup-master-1 as the hadoop user and test passwordless SSH login: $ssh hadoop@cup-master-2  or $ssh cup-master-2  (repeat for the other nodes)
   Note:
   The first connection prints a confirmation prompt; answer yes.
   A known_hosts file is then created under ~/.ssh/.
   If passwordless SSH login fails later, delete that file, regenerate the RSA key pair, and repeat the remote SSH login.
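
The per-node copy and append steps above are repetitive, so it can help to generate them in a loop. The sketch below only prints the commands (a dry run) so they can be reviewed before anything is executed; the function name print_key_distribution is illustrative, and it assumes the staging directory (sshcm1 here) already exists on each target.

```shell
# Print (not execute) the key-distribution commands for each target node.
# $1 = staging directory name on the targets, remaining args = target hostnames
print_key_distribution() {
    staging=$1; shift
    for node in "$@"; do
        # Copy the public key into the staging directory on the target
        echo "scp ~/.ssh/id_rsa.pub hadoop@$node:/home/hadoop/$staging/"
        # Append it to authorized_keys on the target and fix the permissions
        echo "ssh hadoop@$node 'cat ~/$staging/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'"
    done
}

# Example:
#   print_key_distribution sshcm1 cup-master-2 cup-slave-1 cup-slave-2 cup-slave-3 cup-slave-4
```

Each printed ssh command still asks for the hadoop password once, since the key is not yet authorized at that point.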


known_hosts file:
cup-slave-1,192.168.98.225 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr5bf6Fe2TRprWmB+RK1ZeriV+wwlwsIKLv9Y1sneLoXgPqIA9RBi9RodiWogImu5J8Ht4KZ2UyXIb/w2/NQeZKYJExpGlpXGSdKfDjDe+8wzXi01FPhkwzClhjstGNHaPwZVnDKtGERX4PE985xq9wOuyGl1AFAhYz8neCTpKqRGA+/cquulTTdwQ8mLsWumZHKNcgkGtGU6MvqbVt4mDNwEJmUizeThp/h03bCoSlg2YG9Zqf/W71WA9ZqCPB2nWBRn9buhHOvNaUTn6/6dQna8Quzg8DC9WGYgecLNUIt6LMSnQUgsONl2AiNbVN+W7DHA4BkuCIafXj7g5Hj8ow==

cup-slave-2,192.168.98.227 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr5bf6Fe2TRprWmB+RK1ZeriV+wwlwsIKLv9Y1sneLoXgPqIA9RBi9RodiWogImu5J8Ht4KZ2UyXIb/w2/NQeZKYJExpGlpXGSdKfDjDe+8wzXi01FPhkwzClhjstGNHaPwZVnDKtGERX4PE985xq9wOuyGl1AFAhYz8neCTpKqRGA+/cquulTTdwQ8mLsWumZHKNcgkGtGU6MvqbVt4mDNwEJmUizeThp/h03bCoSlg2YG9Zqf/W71WA9ZqCPB2nWBRn9buhHOvNaUTn6/6dQna8Quzg8DC9WGYgecLNUIt6LMSnQUgsONl2AiNbVN+W7DHA4BkuCIafXj7g5Hj8ow==



9. Passwordless SSH from cup-master-2 to the other nodes:
   Log in to cup-master-2 as the hadoop user
   $mkdir .ssh
   $ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
   On cup-master-1, cup-slave-1, cup-slave-2, cup-slave-3 and cup-slave-4, create the .ssh directory and the sshcm2 staging directory: $mkdir -p .ssh sshcm2
   $scp .ssh/id_rsa.pub hadoop@cup-master-1:/home/hadoop/sshcm2/  (repeat for each node)
   $scp .ssh/id_rsa.pub hadoop@cup-slave-1:/home/hadoop/sshcm2/
   $scp .ssh/id_rsa.pub hadoop@cup-slave-2:/home/hadoop/sshcm2/
   $scp .ssh/id_rsa.pub hadoop@cup-slave-3:/home/hadoop/sshcm2/
   $scp .ssh/id_rsa.pub hadoop@cup-slave-4:/home/hadoop/sshcm2/

   Log in to cup-master-2 as the hadoop user and configure the local machine
   $cd ~/.ssh
   $chmod 700 ~/.ssh
   $cat id_rsa.pub >> authorized_keys
   $chmod 600 ~/.ssh/authorized_keys

   Log in to cup-slave-1 as the hadoop user and configure the remote machine
   $mkdir .ssh
   $chmod 700 .ssh
   $cd .ssh
   $cat ~/sshcm2/id_rsa.pub >> ~/.ssh/authorized_keys
   $chmod 600 ~/.ssh/authorized_keys

   Repeat on the other nodes, logged in as the hadoop user

   Log in to cup-master-2 as the hadoop user and test passwordless SSH login: $ssh hadoop@cup-master-1  or $ssh cup-master-1  (repeat for the other nodes)


   Note:
   ~/.ssh/authorized_keys must have mode 600; if the permissions are more open than that, ssh reports a security error!
   $cat ~/sshcm2/id_rsa.pub >> ~/.ssh/authorized_keys appends the contents of sshcm2/id_rsa.pub to the end of ~/.ssh/authorized_keys





1. Log in to cup-master-1 as the hadoop user
   Install Hadoop and deploy the namenode
   Upload the Hadoop package hadoop-2.0.0-cdh4.1.2.tar.gz

   $tar zxvf hadoop-2.0.0-cdh4.1.2.tar.gz  (extract the archive)

2. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/hadoop-env.sh
   JAVA_HOME=/usr/jdk6/jdk1.6.0_32

2.1 /home/hadoop/.bash_profile:

# User specific environment and startup programs
HADOOP_HOME=/home/cup/hadoop-2.0.0-cdh4.2.1
HADOOP_MAPRED_HOME=$HADOOP_HOME
HADOOP_COMMON_HOME=$HADOOP_HOME
HADOOP_HDFS_HOME=$HADOOP_HOME
YARN_HOME=$HADOOP_HOME
HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

ANT_HOME=/home/cup/apache-ant-1.8.4
MAVEN_HOME=/home/cup/apache-maven-3.0.4

ZOOKEEPER_HOME=/home/cup/zookeeper-3.4.5-cdh4.2.1
HBASE_HOME=/home/cup/hbase-0.94.2-cdh4.2.1

HADOOP_HOME_WARN_SUPPRESS=1
HADOOP_CLASSPATH=$CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/common:${HADOOP_HOME}/share/hadoop/common/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/hdfs:${HADOOP_HOME}/share/hadoop/hdfs/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/mapreduce:${HADOOP_HOME}/share/hadoop/mapreduce/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/tools/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/yarn:${HADOOP_HOME}/share/hadoop/yarn/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase classpath`:$HADOOP_CLASSPATH

JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib

PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$ANT_HOME/bin:$MAVEN_HOME/bin:/home/cup/shell:$PATH

export JAVA_LIBRARY_PATH LD_LIBRARY_PATH HADOOP_CLASSPATH
export HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME
export ZOOKEEPER_HOME HBASE_HOME ANT_HOME MAVEN_HOME HADOOP_HOME_WARN_SUPPRESS PATH


# HIVE_HOME=/home/cup/hive-0.10.0-cdh4.2.1
# HADOOP_CLASSPATH=${HIVE_HOME}/lib:$HADOOP_CLASSPATH
# HIVE_CLASSPATH=$HBASE_HOME/conf
# PATH=$HIVE_HOME/bin:$PATH
# export HIVE_HOME HIVE_CLASSPATH HADOOP_CLASSPATH PATH

$source /home/hadoop/.bash_profile
 


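Once the Hadoop cluster is installed, the first thing to do is edit hadoop-env.sh (etc/hadoop/hadoop-env.sh in this layout) to set the memory sizes. Mainstream nodes have 32 GB of RAM; typical settings are:
NN: 15-25 GB
JT: 2-4 GB
DN: 1-4 GB
TT: 1-2 GB, Child VM 1-2 GB
The right settings depend on how the cluster is used: if the cluster holds a large number of small files, the NN needs at least 20 GB and each DN at least 2 GB.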




3. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/core-site.xml

      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://cup-master-1:9000</value>
      </property>
      <property>
           <name>hadoop.tmp.dir</name>
           <value>/home/hadoop/hadoopworkspace/tmp</value>
      </property>


<property> 
  <name>fs.trash.interval</name> 
  <value>1440</value> 
</property> 
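With this setting, $hadoop fs -rmr /xxx/xxx does not delete data permanently; the deleted data is moved to the ".Trash" folder under the operating user's directory.
The value is in minutes. With the trash enabled, add the "-skipTrash" option to the delete command if you want a file removed immediately:
$hadoop fs -rm -skipTrash /xxxx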



4. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/hdfs-site.xml

   <property>
       <name>dfs.namenode.name.dir</name>
       <value>/home/hadoop/hadoopworkspace/dfs/name</value>
   </property>
   <property>
       <name>dfs.datanode.data.dir</name>
       <value>/home/hadoop/hadoopworkspace/dfs/data</value>
   </property>
   <property>
       <name>dfs.replication</name>
       <value>3</value>
   </property>
   <property>
       <name>dfs.permissions</name>
       <value>false</value>
   </property>


5. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/mapred-site.xml

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.tracker</name>
<value>hdfs://cup-master-1:9001</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>cup-master-1:9002</value>
<description>The host and port that the MapReduce job tracker runs
at.  If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/hadoop/hadoopworkspace/mapred/system</value>
<final>true</final>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/hadoopworkspace/mapred/local</value>
<final>true</final>
</property>

6. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/yarn-site.xml

<property>
<name>yarn.resourcemanager.address</name>
<value>cup-master-1:8080</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>cup-master-1:8081</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>cup-master-1:8082</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

7. On every node, log in as the hadoop user and create the Hadoop working directory
   $mkdir /home/hadoop/hadoopworkspace


6. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/slaves

cup-slave-1
cup-slave-2
cup-slave-3
cup-slave-4

   /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/masters  (this file is optional)

cup-master-1
cup-master-2




Hadoop compression -----------------------------------------------------
7.0 Copy the native library files (libhadoop, hadoop-lzo, hadoop-snappy)
    to /home/hadoop/hadoop-2.0.0-cdh4.1.2/lib/native/
    and copy the corresponding hadoop-lzo/hadoop-snappy jars.
    hadoop-snappy is already integrated into hadoop-common, so it has no separate jar.

1). The snappy shared libraries themselves: /usr/local/lib/libsnappy*.*
2). The hadoop-common jar: hadoop-common-2.0.0-cdh4.2.0.jar
   Source is in hadoop-2.0.0-cdh4.2.0\src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\compress\snappy
3). The hadoop-common native libraries: libhadoop.a, libhadoop.so, libhadoop.so.1.0.0
   Source is in hadoop-2.0.0-cdh4.2.0\src\hadoop-common-project\hadoop-common\src\main\native\src\org\apache\hadoop\io\compress\snappy


    snappy-1.1.0   # install as root
    $./configure
    $make
    $make install
    /usr/local/lib/libsnappy*.*

    If make reports
    libtool: Version mismatch error.  This is libtool 2.4.2 Debian-2.4.2-1ubuntu1, but the
    libtool: definition of this LT_INIT comes from libtool 2.4.
    libtool: You should recreate aclocal.m4 with macros from libtool 2.4.2 Debian-2.4.2-1ubuntu1
    libtool: and run autoconf again.
    then run
    $autoreconf -ivf
    ## $autoreconf --force --install
    and run $make again

core-site.xml:
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec,com.hadoop.compression.lzo.LzoCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
  <name>io.compression.codec.snappy.class</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

## Configure only one of LzoCodec and SnappyCodec, whichever compression you actually use


mapred-site.xml:  use snappy compression for MR output:
<!-- enable snappy for MRv1 -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
<!-- enable snappy for YARN -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>BLOCK</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>



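7. Multi-disk storage for DN nodes:
Before adding disks, the system disk / was almost full (99% used).
After adding disks, usage of the system disk / dropped to roughly 80%~90%.
Keep watching to see whether it continues to drop.

To reclaim the system disk: stop one datanode first and let the cluster move the data automatically.

Optimization plan: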
1)stop the entire cluster
2)mv /home/cup/hadoopworkspace/dfs/data/current/* /cup/d0/dfs2/data/current/
3)add /cup/d0/dfs2/data into the dfs.datanode.data.dir
4)start the entire cluster





7. Install Hadoop and deploy the datanodes
   hadoop-->cup-master-1
   $scp -rp hadoop-2.0.0-cdh4.1.2 hadoop@cup-master-2:/home/hadoop/   (repeat for each node)


8. $hdfs namenode -format  (the namenode must be formatted before the first start)
   ./start-dfs.sh
   ./start-yarn.sh
   ./stop-dfs.sh
   ./stop-yarn.sh
   These scripts start and stop the slave nodes automatically

9. Open http://192.168.101.122:8088 in a browser to see the Hadoop cluster status
        http://192.168.101.122:50070 to see the namenode status

10. $jps  (list the processes)
   NN: ResourceManager NameNode SecondaryNameNode
   DN: NodeManager DataNode













1. zookeeper/hbase install

2. hadoop-->cup-master-1:
   Extract zookeeper-3.4.3-cdh4.1.2 and hbase-0.92.1-cdh4.1.2

1. Append to the end of /etc/profile:
   see above

$source /etc/profile  (makes the environment variables take effect)


2. zookeeper install
   /home/hadoop/zookeeper-3.4.3-cdh4.1.2/conf/zoo_sample.cfg  (rename to zoo.cfg)

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/hadoopworkspace/zookeeper/data
dataLogDir=/home/hadoop/hadoopworkspace/zookeeper/log
clientPort=2181
server.1=cup-master-1:2888:3888
server.2=cup-slave-1:2888:3888
server.3=cup-slave-2:2888:3888
server.4=cup-slave-3:2888:3888
server.5=cup-slave-4:2888:3888

$mkdir -p /home/hadoop/hadoopworkspace/zookeeper/data  (run on each node; ZK does not create it automatically)
$mkdir -p /home/hadoop/hadoopworkspace/zookeeper/log  (run on each node; ZK does not create it automatically)

3. $scp -rp /home/hadoop/zookeeper-3.4.3-cdh4.1.2 hadoop@cup-slave-1:/home/hadoop/

4. Create myid in dataDir  (on each node in turn)
for cup-master-1, the content in myid file should be 1
for cup-slave-1, the content in myid file should be 2
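
The id for each node follows from the server.N lines in zoo.cfg, so it can be derived rather than typed by hand. The sketch below is one way to do that under stated assumptions: hostnames contain no dots (as in this cluster), and the function name zk_myid_for is illustrative.

```shell
# Print the id N of the "server.N=<host>:port:port" line in zoo.cfg that matches the host.
# Prints nothing when the host is not listed.
zk_myid_for() {
    # $1 = hostname, $2 = path to zoo.cfg
    awk -F'[.=:]' -v host="$1" '
        $1 == "server" && $3 == host { print $2; exit }
    ' "$2"
}

# Example (on each ZooKeeper node):
#   zk_myid_for "$(hostname)" /home/hadoop/zookeeper-3.4.3-cdh4.1.2/conf/zoo.cfg \
#       > /home/hadoop/hadoopworkspace/zookeeper/data/myid
```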

4. Configure the ZK automatic purge policy
   /home/hadoop/zookeeper-3.4.3-cdh4.1.2/conf/zoo.cfg
autopurge.purgeInterval=2
autopurge.snapRetainCount=10

5. /home/hadoop/zookeeper-3.4.3-cdh4.1.2/bin/
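   $ ./zkServer.sh start  (start on each node in turn; the first machine prints many errors at startup, which is harmless because no leader has been elected yet)

6. $jps  (check the processes)
   Each node now has one additional QuorumPeerMain process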











7. hbase install
   /home/hadoop/hbase-0.92.1-cdh4.1.2/conf/hbase-env.sh
export HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
export HBASE_HOME=/home/hadoop/hbase-0.92.1-cdh4.1.2
export JAVA_HOME=/usr/jdk6/jdk1.6.0_32
export HBASE_MANAGES_ZK=false
export HBASE_HEAPSIZE=4000
  
   /home/hadoop/hbase-0.92.1-cdh4.1.2/conf/hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://cup-master-1:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>cup-master-1:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>cup-master-1,cup-slave-1,cup-slave-2,cup-slave-3,cup-slave-4</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
</property>
<property>
<name>hbase.master.maxclockskew</name>
<value>600000</value>
<description>Time difference of regionserver from master</description>
</property>

HBase compression -----------------------------------------------------
hbase-site.xml:
<property>
<name>hbase.regionserver.codecs</name>
<value>snappy,lzo</value>
</property>





   /home/hadoop/hbase-0.92.1-cdh4.1.2/conf/regionservers
cup-slave-1
cup-slave-2
cup-slave-3
cup-slave-4

8. $ scp -rp hbase-0.92.1-cdh4.1.2 hadoop@cup-slave-1:/home/hadoop/  (repeat for the other slave nodes)

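9. Time synchronization: the master and every slave need synchronized clocks (including the time zone). The difference must not exceed 30000 ms; otherwise the hbase regionserver fails at startup with org.apache.hadoop.hbase.ClockOutOfSyncException

9.1 Synchronize the time manually
    Log in as root
    $date -s 20130219
    $date -s 14:37:00
    $ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

9.2 Add to hbase-site.xml to raise the allowed skew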
<property>
<name>hbase.master.maxclockskew</name>
<value>180000</value>
<description>Time difference of regionserver from master</description>
</property>
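
Before starting HBase it can be useful to measure the skew against each slave and compare it with the configured limit. The following is a rough sketch only: the function names clock_skew_ms and skew_ok are illustrative, and taking the remote time with ssh and date +%s gives one-second resolution and ignores the ssh round trip.

```shell
# Absolute difference, in milliseconds, between two epoch timestamps given in seconds.
clock_skew_ms() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo $(( diff * 1000 ))
}

# Succeeds when the skew between two timestamps is within the allowed maximum.
skew_ok() {
    # $1 = local epoch seconds, $2 = remote epoch seconds, $3 = max skew in ms
    [ "$(clock_skew_ms "$1" "$2")" -le "$3" ]
}

# Example (from cup-master-1, using the 30000 ms default mentioned above):
#   skew_ok "$(date +%s)" "$(ssh cup-slave-1 date +%s)" 30000 || echo "cup-slave-1 clock is off"
```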

10. /home/hadoop/hbase-0.92.1-cdh4.1.2/bin/
   $ ./start-hbase.sh   (slave nodes are started automatically)
   $ ./stop-hbase.sh    (slave nodes are stopped automatically)

11. http://192.168.101.122:50070 shows the namenode status and the /hbase directory on HDFS
    http://192.168.101.122:60010 shows the HBase status

12. Processes
    NN:
13326 ResourceManager
18617 QuorumPeerMain
19630 Jps
12980 NameNode
13190 SecondaryNameNode
19411 HMaster
    DN:
30404 Jps
30181 HRegionServer
27489 QuorumPeerMain
14014 DataNode
14148 NodeManager

Testing snappy compression in HBase:
$hbase org.apache.hadoop.hbase.util.CompressionTest /home/cup/kv.txt snappy


   
HBase tuning parameters:

/etc/profile:
JAVA_OPTS="$JAVA_OPTS -server -Xms2g -Xmx12g -XX:NewSize=128m -XX:MaxNewSize=128m"



hbase-env.sh:
export HBASE_HEAPSIZE=4000

export HBASE_OPTS="$HBASE_OPTS -XX:NewSize=128m -XX:MaxNewSize=128m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-master-$(hostname).log"

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xmx12g -Xms12g -XX:NewSize=256m -XX:MaxNewSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-regionserver-$(hostname).log"


export HBASE_OPTS="$HBASE_OPTS -Xms4g -Xmx4g -XX:NewSize=1g -XX:MaxNewSize=1g -XX:NewRatio=3  -XX:SurvivorRatio=6 -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=73 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-master-$(hostname).log"

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms12g -Xmx12g -XX:NewSize=3g -XX:MaxNewSize=3g -XX:NewRatio=3 -XX:SurvivorRatio=6 -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=73 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-regionserver-$(hostname).log"




hbase-site.xml

hbase.client.write.buffer: 20MB
hbase.regionserver.handler.count: 100
hbase.hregion.memstore.flush.size: 384MB
hbase.hregion.max.filesize: 2GB
hbase.hstore.compactionThreshold: 3
hbase.hstore.blockingStoreFiles: 10
hbase.hstore.flush.thread: 20
hbase.hstore.compaction.thread: 15
hbase.master.distributed.log.splitting: false



zoo.cfg:
# The number of milliseconds of each tick
tickTime=30000


Keep HBase's time-related settings (for example the ZooKeeper session timeout) within the range [2*tickTime, 20*tickTime].
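
To make the range concrete, the sketch below computes the two bounds for a given tickTime and clamps a requested timeout into them, which is roughly what happens to a session timeout when ZooKeeper's min/max session timeout settings are left at their defaults. The function names are illustrative and all values are in milliseconds.

```shell
# Print the lower and upper bounds (ms) for a given tickTime (ms).
zk_timeout_bounds() {
    echo "$(( $1 * 2 )) $(( $1 * 20 ))"
}

# Clamp a requested timeout (ms) into the bounds for the given tickTime (ms).
clamp_zk_timeout() {
    # $1 = requested timeout, $2 = tickTime
    min=$(( $2 * 2 ))
    max=$(( $2 * 20 ))
    if [ "$1" -lt "$min" ]; then
        echo "$min"
    elif [ "$1" -gt "$max" ]; then
        echo "$max"
    else
        echo "$1"
    fi
}

# Example: with tickTime=30000 the usable range is 60000 to 600000 ms.
#   zk_timeout_bounds 30000
```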



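1. Adding a new machine to the cluster does not require restarting the existing nodes.
   First do the basic setup, such as passwordless SSH from the NN to the new machine,
   then add the new machine's hostname to
   /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/slaves
   /home/hadoop/hbase-0.92.1-cdh4.1.2/conf/regionservers
   and run the start commands for Hadoop and HBase again; the processes on the existing nodes are not affected

2. Hadoop Balancer redistributes data blocks across the DataNodes according to a balancing policy
   /home/hadoop/hadoop-2.0.0-cdh4.1.2/sbin/start-balancer.sh -threshold 10
   The -threshold value is the allowed deviation in disk usage, in percent, for HDFS to count as balanced.
   If each DataNode's disk usage is within 10 percentage points of the cluster average, the HDFS cluster is considered balanced.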
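
The balance criterion can be illustrated with a small calculation. The sketch below assumes the criterion compares each node's utilization with the cluster-wide average; it reads hypothetical "name used capacity" lines (such as figures copied by hand from a dfsadmin report) and lists the nodes that fall outside the threshold. The function name unbalanced_nodes and the input format are assumptions for illustration.

```shell
# Print the nodes whose utilization differs from the cluster average by more
# than the threshold. Input on stdin: one "name used capacity" line per
# DataNode, with used and capacity in the same unit. Needs at least one line.
unbalanced_nodes() {
    # $1 = threshold in percentage points
    awk -v t="$1" '
        { name[NR] = $1; used[NR] = $2; cap[NR] = $3; su += $2; sc += $3 }
        END {
            avg = 100 * su / sc                 # cluster-wide utilization in percent
            for (i = 1; i <= NR; i++) {
                u = 100 * used[i] / cap[i]      # this node utilization in percent
                d = u - avg
                if (d < 0) d = -d
                if (d > t) print name[i]
            }
        }'
}

# Example:
#   printf "%s\n" "cup-slave-1 90 100" "cup-slave-2 50 100" | unbalanced_nodes 10
```

An empty result means every node is within the threshold, which is the state the balancer works toward.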








1. Oozie install

   /etc/profile:
   OOZIE_HOME=/home/hadoop/oozie-3.2.0-cdh4.1.2


   $OOZIE_HOME/oozie-server/bin/catalina.sh:
   JAVA_HOME=/usr/jdk6/jdk1.6.0_32
   CATALINA_HOME=/home/cup/oozie-3.3.0-cdh4.2.1/oozie-server

   $OOZIE_HOME/bin/oozie-setup.sh:
   $oozie-setup.sh -extjs /home/hadoop/ext-2.2.zip -hadoop 0.20.200 $HADOOP_HOME

   $oozie-setup.sh -extjs /home/hadoop/ext-2.2.zip -hadoop 2.0 $HADOOP_HOME

2. $OOZIE_HOME/bin/oozie-run.sh  (starts oozie)


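5. If oozie fails at startup because the class org/apache/hadoop/utils/ReflectionUtils cannot be found,
   copy /home/hadoop/oozie-3.2.0-cdh4.1.2/libtools/*.jar to /home/hadoop/oozie-3.2.0-cdh4.1.2/oozie-server/webapps/oozie/WEB-INF/lib

6. If oozie reports the following at startup, create the Oozie database schema (step 7):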
REASON: org.apache.oozie.service.ServiceException: E0103: Could not load service classes, Schema 'SA' does not exist {SELECT t0.bean_type, t0.conf, t0.console_url, t0.cred, t0.data, t0.error_code, t0.error_message, t0.external_child_ids, t0.external_id, t0.external_status, t0.name, t0.retries, t0.stats, t0.tracker_uri, t0.transition, t0.type, t0.user_retry_count, t0.user_retry_interval, t0.user_retry_max, t0.end_time, t0.execution_path, t0.last_check_time, t0.log_token, t0.pending, t0.pending_age, t0.signal_value, t0.sla_xml, t0.start_time, t0.status, t0.wf_id FROM WF_ACTIONS t0 WHERE t0.bean_type = ? AND t0.id = ?} [code=30000, state=42Y07]

7. $OOZIE_HOME/bin/ooziedb.sh create -sqlfile oozie.sql -run

Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '3.2.0-cdh4.1.2'


The SQL commands have been written to: oozie.sql

The SQL script is saved to $OOZIE_HOME/bin/oozie.sql.

8. oozie-site.xml:
    <!-- Default proxyuser configuration for Hue -->
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.cup.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.cup.groups</name>
        <value>*</value>
    </property>


8. If startup fails with
Error occurred during initialization of VM
Incompatible minimum and maximum heap sizes specified

set consistent heap sizes in oozie-env.sh:
export CATALINA_OPTS="$CATALINA_OPTS -Xms2g -Xmx4g"


8. $OOZIE_HOME/bin/oozie-run.sh  (starts oozie)
   $OOZIE_HOME/bin/oozie-run.sh &  (starts oozie in the background)

   In newer versions:
   $oozied.sh run

   $ jps
   28945 Bootstrap

9. $OOZIE_HOME/bin/oozie admin -oozie http://192.168.101.122:11000/oozie -status
   System mode: NORMAL means Oozie started successfully
   The Oozie web console is then available at http://192.168.101.122:11000/oozie













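After a reboot the hostname changed and the cluster would not start:

2. The hostname changed, so /etc/sysconfig/network needs to be fixed
/etc/sysconfig/network: (permanently sets the hostname)
NETWORKING=yes
HOSTNAME=cup-master-1
GATEWAY=192.168.101.1
Do this on each node in turn

$source /etc/sysconfig/network
Run on each node in turn

3. Move the environment variables from /etc/profile to the hadoop user's profile

5. Stop the firewall:  $sudo service iptables stop
   Check the firewall: $sudo service iptables status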


6. /etc/hosts:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.122  cup-master-1
192.168.101.123  cup-master-2
192.168.101.121  cup-slave-1
192.168.101.125  cup-slave-2
192.168.101.124 cup-slave-3
192.168.101.120 cup-slave-4

#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
If this line is not commented out, HBase will not start.

7. Time synchronization: date -s


HBase will not start:
ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

a. Stop the firewall: $sudo service iptables stop
b. In /etc/hosts, comment out the ::1         localhost line, i.e. disable the IPv6 localhost entry
c. Synchronize the time across the cluster nodes










mysql5.1.67
8. $ sudo /etc/init.d/mysqld start  (starts mysql; equivalent to $service mysqld start)
   $ sudo service mysqld status
   $ mysql  (opens the mysql client)
   mysql>
   mysql>exit  (returns to the bash shell)

   $ /usr/bin/mysqladmin -u root password '123'  (sets the root password)
   $ /usr/bin/mysqladmin -u root -h cup-master-1 password '123'


1. Hive Install
1.1 .bash_profile
HIVE_HOME=/home/hadoop/hive-0.9.0-cdh4.1.2
export HIVE_HOME
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/hadoop/hive-0.9.0-cdh4.1.2/lib:$CLASSPATH:$HADOOP_HOME/bin
1.2 $ cd /home/hadoop/hive-0.9.0-cdh4.1.2/conf
1.3 $ cp hive-default.xml.template hive-site.xml
1.4 hive-site.xml:

Add at the top:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///root/hive-0.10.0-cdh4.2.0/lib/hive-hbase-handler-0.10.0-cdh4.2.0.jar,file:///root/hive-0.10.0-cdh4.2.0/lib/hbase-0.94.2-cdh4.2.0.jar,file:///root/hive-0.10.0-cdh4.2.0/lib/zookeeper-3.4.5-cdh4.2.0.jar</value>
</property>

hive.metastore.warehouse.dir: /home/hadoop/hive-0.9.0-cdh4.1.2/warehouse
hive.exec.scratchdir: /home/hadoop/hive-0.9.0-cdh4.1.2/hive-${user.name}

javax.jdo.option.ConnectionURL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
javax.jdo.option.ConnectionDriverName: com.mysql.jdbc.Driver
javax.jdo.option.ConnectionUserName: hive
javax.jdo.option.ConnectionPassword: hive


The description tags in the following two places are malformed; add the missing </description>:
1) hive.optimize.union.remove  at line 474
2) hive.mapred.supports.subdirectories at line 489
The partition-dir tags in the following three places are malformed; add the missing </partition-dir>:
1) hive.exec.list.bucketing.default.dir at line 561
2) hive.exec.list.bucketing.default.dir at line 562
3) hive.exec.list.bucketing.default.dir at line 563


hive-env.sh:

export HADOOP_HOME=/home/cup/hadoop-2.0.0-cdh4.2.1
export HBASE_HOME=/home/cup/hbase-0.94.2-cdh4.2.1
export JAVA_HOME=/usr/jdk6/jdk1.6.0_32
export HIVE_CLASSPATH=$HBASE_HOME/conf
####export HIVE_AUX_JARS_PATH=/home/cup/hive-0.10.0-cdh4.2.1/lib:$HADOOP_CLASSPATH
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib


Why HIVE_AUX_JARS_PATH is commented out:
When Hive submits an MR job it uses the hive.aux.jars.path variable,
whose value should be file:///root/hive-0.10.0-cdh4.2.0/lib/hive-hbase-handler-0.10.0-cdh4.2.0.jar,file:///root/hive-0.10.0-cdh4.2.0/lib/hbase-0.94.2-cdh4.2.0.jar,file:///root/hive-0.10.0-cdh4.2.0/lib/zookeeper-3.4.5-cdh4.2.0.jar
That value is configured in hive-site.xml, and
export HIVE_AUX_JARS_PATH in hive-env.sh needs to be commented out;
otherwise you get java.io.FileNotFoundException: File file:/home/hadoop/hive-0.10.0-cdh4.4.0/lib:***** does not exist
If you do not comment it out, it must at least be changed to
export HIVE_AUX_JARS_PATH=file:///home/cup/hive-0.10.0-cdh4.2.1/lib


## When a Hive script inserts data into an external table (mapped to a snappy-compressed HBase table), Hive needs to find the following jars through HIVE_AUX_JARS_PATH:
hive-hbase-handler-0.10.0-cdh4.2.0.jar
hbase-0.94.2-cdh4.2.0.jar
zookeeper-3.4.5-cdh4.2.0.jar
So here it needs to be set to HIVE_AUX_JARS_PATH=/root/hive-0.10.0-cdh4.2.0/lib/:$HADOOP_CLASSPATH
$HADOOP_CLASSPATH is added because the snappy classes cannot be found when creating an external table in Hive that is linked to a snappy-compressed HBase table

Copy the hadoop-common jar to /home/cup/hive-0.10.0-cdh4.2.1/lib;
otherwise you get
Failed with exception java.io.IOException:java.io.IOException:
Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
or
Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.SnappyCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:134)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:174)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 23 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.io.compress.SnappyCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:127)
... 25 more


hive-log4j.properties:

hive.log.dir=/home/cup/hive-0.10.0-cdh4.2.1/logs
hive.log.file=hive.log


   Restart mysql
   $ mysql -u root -p  (enter the password 123)
   mysql>

   mysql> create database hive;
   ## grant select on <database>.* to <user>@<login host> identified by "<password>"
   mysql> grant all on hive.* to 'hive'@'localhost' identified by 'hive';
   mysql> grant all on hive.* to 'hive'@'%' identified by 'hive';

   Copy mysql-connector-java-5.1.22-bin.jar to /home/hadoop/hive-0.9.0-cdh4.1.2/lib

1.5
hive --service hwi &
http://192.168.98.20:9999/hwi

hive --service hiveserver &
[hadoop@cup-master-1 bin]$ Starting Hive Thrift Server

$ jps
29082 RunJar

$nohup hive --service hiveserver &
[hadoop@cup-master-1 bin]$ nohup: ignoring input and appending output to `nohup.out'

Alternatively, once HUE is installed, HUE can start these services centrally.


Hive integration with HBase:

hive>create external table snappy_hive(key int, value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping"=":key,cf:value")
tblproperties ("hbase.table.name"="snappy_table");

hive>create table hive (key int,value string) row format delimited fields terminated by ',';
hive>load data local inpath '/home/cup/kv.txt' into table hive;
hive>insert overwrite table snappy_hive select * from hive;


snappy --- HIVE
To enable Snappy compression for Hive output when creating SequenceFile outputs, use the following settings:
SET hive.exec.compress.output=true;
SET hive.exec.compress.intermediate=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET hive.cli.print.header=true;
SET hive.cli.print.current.db=true;


# JVM reuse
Hadoop will typically launch map or reduce tasks in a forked JVM.
the JVM startup may create significant overhead, especially when launching
jobs with hundreds or thousands of tasks, most which have short execution times.
Reuse allows a JVM instance to be reused up to N times for the same job.
in mapred-site.xml:
<property>
<name>mapred.job.reuse.jvm.num.tasks</name>
<value>10</value>
</property>


hive.exec.scratchdir:
/home/cup/hive-0.10.0-cdh4.2.1/hive-${user.name}
hive.metastore.warehouse.dir:
/home/cup/hive-0.10.0-cdh4.2.1/warehouse



Using Oracle for the Hive metastore:
1) Manually run the Oracle version of the Hive metastore schema script hive-0.10.0-cdh4.2.1\scripts\metastore\upgrade\oracle\hive-schema-0.10.0.oracle.sql
2) Change the JDBC connection in hive-site.xml
3) nohup hive --service hiveserver &


Hive user permissions:

Other users who want to run Hive need the following:
.bash_profile
/home/hadoop/cdh42/cdhworkspace/tmp               chmod 777
/home/hadoop/cdh42/hive-0.10.0-cdh4.2.0/logs      chmod 777

hive>grant all on database default to user xhyt;   (or grant create on database default to user xhyt;)
hive>show grant user xhyt on database default;
hive>grant select on table hive_t to user xhyt;
hive>grant select on table hive_t to group xhyt;




hbase-env.sh also contains export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
hadoop-env.sh also contains export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib




/etc/profile for the root user:
#set java environment
JAVA_HOME=/usr/jdk6/jdk1.6.0_32
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
JAVA_OPTS="$JAVA_OPTS -server -Xms1024m -Xmx4096m"
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME JAVA_OPTS CLASSPATH PATH


/home/hadoop/.bash_profile (hadoop user):
# User specific environment and startup programs
HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
HADOOP_MAPRED_HOME=$HADOOP_HOME
HADOOP_COMMON_HOME=$HADOOP_HOME
HADOOP_HDFS_HOME=$HADOOP_HOME
YARN_HOME=$HADOOP_HOME
ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.3-cdh4.1.2
HBASE_HOME=/home/hadoop/hbase-0.92.1-cdh4.1.2
OOZIE_HOME=/home/hadoop/oozie-3.2.0-cdh4.1.2
CATALINA_HOME=$OOZIE_HOME/oozie-server
ANT_HOME=/home/hadoop/apache-ant-1.8.4
MAVEN_HOME=/home/hadoop/apache-maven-3.0.4
HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase classpath`

PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$OOZIE_HOME/bin:$CATALINA_HOME/bin:$ANT_HOME/bin:$MAVEN_HOME/bin:$PATH
export HADOOP_CLASSPATH HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME ZOOKEEPER_HOME HBASE_HOME OOZIE_HOME CATALINA_HOME ANT_HOME MAVEN_HOME PATH



3. Adjust the JDK memory size in /etc/profile
   export JAVA_OPTS="$JAVA_OPTS -server -Xms1024m -Xmx4096m"
   $source /etc/profile
   Run on each node in turn






Hadoop rack awareness (improves network performance)

core-site.xml:
<property>
<name>topology.script.file.name</name>
<value>/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop/rackaware.sh</value>
</property>

/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop/rackaware.sh
#!/bin/bash

HADOOP_CONF=/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop

while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec< ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default/rack "
  else
    echo -n "$result "
  fi
done

$chmod 755 rackaware.sh


/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop/topology.data
cup-master-1  /default/rack1
cup-master-2  /default/rack1
cup-slave-1  /default/rack1
cup-slave-2  /default/rack1
cup-slave-3  /default/rack1
cup-slave-4  /default/rack1
cup-slave-5  /default/rack1
cup-slave-6  /default/rack1
cup-slave-7  /default/rack2
cup-slave-8  /default/rack2
cup-slave-9  /default/rack2
cup-slave-10 /default/rack2
cup-slave-11 /default/rack2
cup-slave-12 /default/rack2
10.204.193.10 /default/rack1
10.204.193.11 /default/rack1
10.204.193.20 /default/rack1
10.204.193.21 /default/rack1
10.204.193.22 /default/rack1
10.204.193.23 /default/rack1
10.204.193.24 /default/rack1
10.204.193.25 /default/rack1
10.204.193.26 /default/rack2
10.204.193.27 /default/rack2
10.204.193.28 /default/rack2
10.204.193.29 /default/rack2
10.204.193.30 /default/rack2
10.204.193.31 /default/rack2
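
The lookup that rackaware.sh performs can also be sketched in Python, which may make the matching rule easier to follow. This is only an illustration and not part of the original setup: the names load_topology and resolve are hypothetical, and SAMPLE is a small excerpt of the topology.data entries above. As in the shell script, an unknown node falls back to /default/rack, and if a node appears twice the last matching line wins.

```python
# Illustrative Python equivalent of the rackaware.sh lookup (hypothetical names).
DEFAULT_RACK = "/default/rack"

SAMPLE = """\
cup-master-1  /default/rack1
cup-slave-7  /default/rack2
10.204.193.10 /default/rack1
10.204.193.26 /default/rack2
"""


def load_topology(text):
    """Parse 'host-or-ip  rack' lines into a dict; malformed lines are skipped."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0]] = fields[1]  # a later line overrides an earlier one
    return table


def resolve(table, nodes):
    """Return one rack path per requested node, in the same order."""
    return [table.get(node, DEFAULT_RACK) for node in nodes]


table = load_topology(SAMPLE)
# Hadoop passes one or more host names or IPs; the script answers with one rack each.
print(" ".join(resolve(table, ["cup-slave-7", "10.204.193.10", "unknown-host"])))
# /default/rack2 /default/rack1 /default/rack
```

Listing both host names and IP addresses in topology.data, as the file above does, means the lookup succeeds whichever form Hadoop passes in.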






1. hue install (Hadoop User Experience)
   $python   starts the Python interpreter
   Ctrl+D exits the Python interpreter (on Linux, Ctrl+Z only suspends it)

   Required Dependencies:
   gcc, g++,
   libgcrypt-devel, libxml2-devel, libxslt-devel,
   cyrus-sasl-devel, cyrus-sasl-gssapi,
   mysql-devel, python-devel, python-setuptools, python-simplejson,
   sqlite-devel, openldap-devel,
   ant

libgcrypt-devel-1.4.5-9.el6.x86_64
libxslt-devel-1.1.26-2.el6.x86_64
cyrus-sasl-devel-2.1.23-13.el6.x86_64
mysql-devel-5.1.52.el6_0.1.x86_64
openldap-devel-2.4.23-20.el6.x86_64

   install ant
   install maven


$make
/home/hadoop/hue-2.1.0-cdh4.1.2/Makefile.vars:42: *** "Error: must have python development packages for 2.4, 2.5, 2.6 or 2.7. Could not find Python.h. Please install python2.4-devel, python2.5-devel, python2.6-devel or python2.7-devel".  Stop.

/usr/include/python2.6/ contains only pyconfig-64.h; Python.h is missing.
/home/hadoop/hue-2.1.0-cdh4.1.2/Makefile.vars checks for this file.

The cause is that the python-devel package is not installed.



5. $ cd /home/hadoop/hue-2.1.0-cdh4.1.2
   $ PREFIX=/home/hadoop/hue-2.1.0-cdh4.1.2-bin make install
   $ sudo chmod 4750 apps/shell/src/shell/build/setuid



2. hadoop config
hdfs-site.xml:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

core-site.xml:
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>

httpfs-site.xml:
<property>
  <name>httpfs.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>httpfs.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
<property>
  <name>httpfs.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>httpfs.proxyuser.hue.groups</name>
  <value>*</value>
</property>

mapred-site.xml:
<property>
  <name>jobtracker.thrift.address</name>
  <value>0.0.0.0:9290</value>
</property>
<property>
  <name>mapred.jobtracker.plugins</name>
  <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
  <description>Comma-separated list of jobtracker plug-ins to be activated.</description>
</property>

3. $ cd /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue
   $ cp desktop/libs/hadoop/java-lib/hue-plugins-*.jar /home/hadoop/hadoop-2.0.0-cdh4.1.2/share/hadoop/mapreduce/lib
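   If Hue is installed on a different host from the Hadoop cluster master, copy the jar with scp instead.
   Hue uses this plugin jar to communicate with the JobTracker.

4. Restart the Hadoop cluster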

5. config oozie for hue
oozie-site.xml:
<property>
    <name>oozie.service.ProxyUserService.proxyuser.hadoop.hosts</name>
    <value>*</value>
</property>
<property>
    <name>oozie.service.ProxyUserService.proxyuser.hadoop.groups</name>
    <value>*</value>
</property>
<property>
    <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
    <value>*</value>
</property>
<property>
  <name>oozie.service.AuthorizationService.security.enabled</name>
  <value>true</value>
</property>

6. Restart Oozie

7. Make sure the firewall is disabled (the Hue server listens on port 8888 by default)



9. /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/desktop/conf/hue.ini
[desktop]
http_host=0.0.0.0
http_port=8888
[[database]]
engine=mysql
host=cup-master-1
port=3306
user=hue
password=hue
name=hue
[[hdfs_clusters]]
fs_defaultfs=hdfs://cup-master-1:9000
webhdfs_url=http://cup-master-1:50070/webhdfs/v1
hadoop_hdfs_home=/home/hadoop/hadoop-2.0.0-cdh4.1.2
hadoop_bin=/home/hadoop/hadoop-2.0.0-cdh4.1.2/bin/hadoop
hadoop_conf_dir=/home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop
[[mapred_clusters]]
jobtracker_host=cup-master-1
jobtracker_port=8021
thrift_port=9290
hadoop_mapred_home=/home/hadoop/hadoop-2.0.0-cdh4.1.2
hadoop_bin=/home/hadoop/hadoop-2.0.0-cdh4.1.2/bin/hadoop
hadoop_conf_dir=/home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop
[[yarn_clusters]]
resourcemanager_host=cup-master-1
resourcemanager_port=8032
hadoop_mapred_home=/home/hadoop/hadoop-2.0.0-cdh4.1.2
hadoop_bin=/home/hadoop/hadoop-2.0.0-cdh4.1.2/bin/hadoop
hadoop_conf_dir=/home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop
[liboozie]
oozie_url=http://cup-master-1:11000/oozie
[beeswax]
hive_home_dir=/home/hadoop/hive-0.9.0-cdh4.1.2
hive_conf_dir=/home/hadoop/hive-0.9.0-cdh4.1.2/conf


By default Hue uses an SQLite database:
  [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, or sqlite3
    #
    # Note that for sqlite3, 'name', below is a filename;
    # for other backends, it is the database name.
    engine=sqlite3
    ## host=
    ## port=
    ## user=
    ## password=
    name=/home/cup/hue-2.2.0-cdh4.2.1-bin/hue/desktop/desktop.db

10. Initialization

   Restart MySQL
   $ mysql -u root -p    (enter the password: 123)
   mysql>
  
   mysql> create database hue;
   ## grant select on <database>.* to <user>@<login host> identified by "<password>"
   mysql> grant all on hue.* to 'hue'@'localhost' identified by 'hue';


    Back up the existing data to /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/hue_dump.json:
    $ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/hue dumpdata > /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/hue_dump.json
   

    $ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/hue syncdb --noinput
    $ mysql -u hue -p hue -e "DELETE FROM hue.django_content_type;"

    Load the previously backed-up data:
    $ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/hue loaddata /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/hue_dump.json

11. .bash_profile:
HIVE_HOME=/home/hadoop/hive-0.9.0-cdh4.1.2
HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase classpath`
HADOOP_CLASSPATH=/home/hadoop/hive-0.9.0-cdh4.1.2/lib:$HADOOP_CLASSPATH:$CLASSPATH:$HADOOP_HOME/bin



12. Start
    $ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/supervisor

    Hue starts Hive along with it.

    *** To stop it, kill the RunJar process as root; if it is killed as the cup user,
    it is always restarted automatically.

13. Verify
    http://192.168.101.122:8888  hue/hue  hadoop/hadoop



5. Hue shell configuration
   Find the Hue supervisor process: $ps -f -u cup

[cup@cup-master-1 ~]$ ps -f -u cup
UID        PID  PPID  C STIME TTY          TIME CMD
cup       7597  7594  0 17:18 ?        00:00:00 sshd: cup@pts/1 
cup       7598  7597  0 17:18 pts/1    00:00:00 -bash
cup       7777  7598  0 17:19 pts/1    00:00:00 vim hive-site.xml
cup       7943  7940  0 17:21 ?        00:00:00 sshd: cup@pts/5 
cup       7944  7943  0 17:21 pts/5    00:00:00 -bash
cup       9860  9857  0 17:32 ?        00:00:00 sshd: cup@pts/9 
cup       9861  9860  0 17:32 pts/9    00:00:00 -bash
cup      10560 10558  0 17:36 ?        00:00:01 sshd: cup@pts/2 
cup      10561 10560  0 17:36 pts/2    00:00:00 -bash
cup      10780 10560  0 17:38 ?        00:00:00 /usr/libexec/openssh/sftp-server
cup      11683 10561  0 17:47 pts/2    00:00:00 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 ./supervisor
cup      11687 11683  0 17:47 pts/2    00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/hue runspawningserver
cup      11689 11683  2 17:47 pts/2    00:00:17 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup      11743 11687  0 17:47 pts/2    00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 -c import sys; from spawning import spawning_child; spawning_child.main() 11687 3 15 s
cup      11874 11873  0 17:49 pts/1    00:00:00 bash
cup      11896  7944  9 17:49 pts/5    00:00:44 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup      12147 11874  4 17:50 pts/1    00:00:21 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup      12351 11874  0 17:54 pts/1    00:00:00 vim hive-site.xml
cup      12748 10561  4 17:57 pts/2    00:00:00 ps -f -u cup
cup      24208     1  2 Jul09 ?        00:30:54 /usr/jdk6/jdk1.6.0_32/bin/java -Dproc_namenode -Xmx2000m -Djava.net.preferIPv4Stack=true -Xmx128m -Xmx128m -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-
cup      24660     1  0 Jul09 ?        00:02:07 /usr/jdk6/jdk1.6.0_32/bin/java -Dproc_zkfc -Xmx2000m -Djava.net.preferIPv4Stack=true -Xmx128m -Xmx128m -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4
cup      24842     1  0 Jul09 ?        00:11:10 /usr/jdk6/jdk1.6.0_32/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dyarn.log.dir=/home/cup/hado
cup      25394     1  1 Jul09 ?        00:14:32 /usr/jdk6/jdk1.6.0_32/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx24000m -Xms24g -Xmx32g -XX:NewSize=1g -XX:MaxNewSize=1g -XX:NewRatio=3 -XX:Sur
cup      41822 41819  0 13:45 ?        00:00:00 sshd: cup       
cup      51570 51568  0 Jul08 ?        00:00:00 sshd: cup@pts/3 
cup      51571 51570  0 Jul08 pts/3    00:00:00 -bash
cup      56534 56531  0 Jul08 ?        00:00:01 sshd: cup@notty 
cup      56535 56534  0 Jul08 ?        00:00:00 /usr/libexec/openssh/sftp-server
cup      58691 58688  0 09:46 ?        00:00:00 sshd: cup@pts/0 
cup      58692 58691  0 09:46 pts/0    00:00:00 -bash



Of these,
cup      11683 10561  0 17:47 pts/2    00:00:00 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 ./supervisor
cup      11687 11683  0 17:47 pts/2    00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/hue runspawningserver
cup      11689 11683  2 17:47 pts/2    00:00:17 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup      11743 11687  0 17:47 pts/2    00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 -c import sys; from spawning import spawning_child; spawning_child.main() 11687 3 15 s
are the Hue-related processes.
To stop Hue, first run kill -9 11689 (the RunJar process),
then stop 11687 (runspawningserver) and 11683 (supervisor).

If 11689 (the Hue RunJar process) is left running, the next Hue startup fails because the sockets on ports 8002 and 8003 cannot be created.
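
As a rough aid, the stop order described above can be derived from the ps -f output itself. The sketch below is an assumption-laden illustration, not a Hue tool: parse_ps and hue_stop_order are hypothetical names, the supervisor is recognised only because its command line ends in "supervisor", and the Java child is recognised only by "/bin/java" in its command. It prints PIDs and does not kill anything.

```python
# Hypothetical helper: derive the Hue stop order from `ps -f -u cup` output.
PS_SAMPLE = """\
cup      11683 10561  0 17:47 pts/2    00:00:00 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 ./supervisor
cup      11687 11683  0 17:47 pts/2    00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/hue runspawningserver
cup      11689 11683  2 17:47 pts/2    00:00:17 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup      11743 11687  0 17:47 pts/2    00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 -c import sys; from spawning import spawning_child; spawning_child.main() 11687 3 15 s
"""


def parse_ps(text):
    """Split each `ps -f` row into pid, ppid and command (columns 2, 3 and 8)."""
    procs = []
    for line in text.splitlines():
        parts = line.split(None, 7)
        if len(parts) == 8 and parts[1].isdigit() and parts[2].isdigit():
            procs.append({"pid": int(parts[1]), "ppid": int(parts[2]), "cmd": parts[7]})
    return procs


def hue_stop_order(procs):
    """Java (RunJar) children first, then other children, then the supervisor."""
    order = []
    for sup in procs:
        if not sup["cmd"].split()[-1].endswith("supervisor"):
            continue
        children = [p for p in procs if p["ppid"] == sup["pid"]]
        order += [p["pid"] for p in children if "/bin/java" in p["cmd"]]
        order += [p["pid"] for p in children if "/bin/java" not in p["cmd"]]
        order.append(sup["pid"])
    return order


print(hue_stop_order(parse_ps(PS_SAMPLE)))  # [11689, 11687, 11683]
```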










HBase tuning parameters:

hbase-env.sh:
export HBASE_HEAPSIZE=4000

hbase-site.xml:

hbase.client.write.buffer: 20MB
hbase.regionserver.handler.count: 100
hbase.hregion.memstore.flush.size: 384MB
hbase.hregion.max.filesize: 2GB
hbase.hstore.compactionThreshold: 3
hbase.hstore.blockingStoreFiles: 10
hbase.hstore.flush.thread: 20
hbase.hstore.compaction.thread: 15


zoo.cfg:
# The number of milliseconds of each tick
tickTime=30000


Set the various HBase timeout parameters within the range [2*tickTime, 20*tickTime].
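
A quick worked check of that range, as a sketch: with tickTime=30000 from zoo.cfg, ZooKeeper's default session bounds (minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime) come out at 60000 ms and 600000 ms, so the 540000 ms used for the timeouts in hbase-site.xml fits. The function names are hypothetical, and the clamping assumes minSessionTimeout and maxSessionTimeout have not been overridden in zoo.cfg.

```python
TICK_TIME_MS = 30000  # tickTime from zoo.cfg


def session_timeout_bounds(tick_ms):
    """Default ZooKeeper bounds: 2 ticks minimum, 20 ticks maximum."""
    return 2 * tick_ms, 20 * tick_ms


def granted_timeout(requested_ms, tick_ms):
    """Timeout the server would grant for a client request under the default bounds."""
    low, high = session_timeout_bounds(tick_ms)
    return max(low, min(requested_ms, high))


print(session_timeout_bounds(TICK_TIME_MS))    # (60000, 600000)
print(granted_timeout(540000, TICK_TIME_MS))   # 540000, inside the range
print(granted_timeout(900000, TICK_TIME_MS))   # 600000, capped at 20 ticks
```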
hbase-site.xml:
<property>
<name>hbase.rootdir</name>
<value>hdfs://cup-master-1:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>cup-master-1:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>cup-master-1,cup-slave-1,cup-slave-2,cup-slave-3,cup-slave-4</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
</property>
<property>
<name>hbase.master.maxclockskew</name>
<value>180000</value>
<description>Time difference of regionserver from master</description>
</property>


<property>
<name>hbase.rpc.timeout</name>
<value>540000</value>
<description></description>
</property>
<property>
<name>ipc.socket.timeout</name>
<value>540000</value>
<description></description>
</property>
<property>
<name>hbase.regionserver.lease.period</name>
<value>540000</value>
<description>HRegion server lease period in milliseconds. Default is
      60 seconds. Clients must report in within this period else they are
      considered dead.
    </description>
</property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>540000</value>
    <description>ZooKeeper session timeout.
      HBase passes this to the zk quorum as suggested maximum time for a
      session.  See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
      "The client sends a requested timeout, the server responds with the
      timeout that it can give the client. "
      In milliseconds.
    </description>
  </property>
<property>
<name>hbase.regionserver.restart.on.zk.expire</name>
<value>true</value>
<description>When the ZooKeeper session times out, the regionserver is restarted instead of being shut down</description>
</property>  


  <property>
    <name>hbase.client.write.buffer</name>
    <value>20971520</value>  <!--20MB-->
    <description>Default size of the HTable client write buffer in bytes.
    A bigger buffer takes more memory -- on both the client and server
    side since server instantiates the passed write buffer to process
    it -- but a larger buffer size reduces the number of RPCs made.
    For an estimate of server-side memory-used, evaluate
    hbase.client.write.buffer * hbase.regionserver.handler.count
    </description>
  </property>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
    <description>Count of RPC Server instances spun up on RegionServers
    Same property is used by the Master for count of master handlers.
    Default is 10.
    </description>
  </property> 
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>402653184</value> <!--384MB-->
    <description>
    Memstore will be flushed to disk if size of the memstore
    exceeds this number of bytes.  Value is checked by a thread that runs
    every hbase.server.thread.wakefrequency.
    </description>
  </property> 
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>2147483648</value> <!--2GB-->
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.
    Default: 256M.
    </description>
  </property> 
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>3</value>
    <description>
    If more than this number of HStoreFiles in any one HStore
    (one HStoreFile is written per flush of memstore) then a compaction
    is run to rewrite all HStoreFiles files as one.  Larger numbers
    put off compaction but when it runs, it takes longer to complete.
    </description>
  </property> 
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>10</value>
    <description>
    If more than this number of StoreFiles in any one Store
    (one StoreFile is written per flush of MemStore) then updates are
    blocked for this HRegion until a compaction is completed, or
    until hbase.hstore.blockingWaitTime has been exceeded.
    </description>
  </property> 
 
  <property>
    <name>hbase.hstore.flush.thread</name>
    <value>20</value>
  </property>   
  <property>
    <name>hbase.hstore.compaction.thread</name>
    <value>15</value>
  </property>
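
The byte values in the properties above can be checked with a few lines of arithmetic. This is a sketch only; the helper name mb is hypothetical. It also evaluates the server-side memory estimate mentioned in the hbase.client.write.buffer description (write buffer * handler count), which can be compared against HBASE_HEAPSIZE=4000 (MB).

```python
MB = 1024 * 1024


def mb(n):
    """Convert a size in MB (2**20 bytes) to bytes."""
    return n * MB


write_buffer = mb(20)         # hbase.client.write.buffer
flush_size = mb(384)          # hbase.hregion.memstore.flush.size
max_filesize = mb(2 * 1024)   # hbase.hregion.max.filesize (2GB)
handler_count = 100           # hbase.regionserver.handler.count

print(write_buffer, flush_size, max_filesize)
# 20971520 402653184 2147483648

# Estimate from the property description: write buffer * handler count.
server_side_estimate = write_buffer * handler_count
print(server_side_estimate // MB, "MB")  # 2000 MB
```

With these settings the estimate is about half of the 4000 MB heap configured in hbase-env.sh.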
 
 
 
 
 
 
 
 
 
 
 
HADOOP2.0 HA (NO NN Federation)

1. Configure passwordless SSH login
2. Edit the Hadoop configuration files (cup-master-1,cup-slave-1,cup-slave-2,cup-slave-3,cup-slave-4)

The configuration files are as follows:
vi core-site.xml:

<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>   <!--hdfs://cup-master-1:9000-->
</property>
<property>
   <name>ha.zookeeper.quorum</name>
   <value>cup-master-1:2181,cup-slave-1:2181,cup-slave-2:2181,cup-slave-3:2181,cup-slave-4:2181</value>
</property>
</configuration>


vi hdfs-site.xml
<configuration>
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
   <name>dfs.ha.namenodes.mycluster</name>
   <value>nn1,nn2</value>
</property>
<property>
   <name>dfs.namenode.rpc-address.mycluster.nn1</name>
   <value>cup-master-1:9000</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>cup-master-2:9000</value>
</property>
<property>
     <name>dfs.namenode.http-address.mycluster.nn1</name>
     <value>cup-master-1:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>cup-master-2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://cup-master-1:8485;cup-slave-1:8485;cup-slave-2:8485;cup-slave-3:8485;cup-slave-4:8485/mycluster</value>
</property>
<property>
   <name>dfs.journalnode.edits.dir</name>
   <value>/home/hadoop/hadoopworkspace/dfs/jn</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>


<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
<!-- or, alternatively: -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/exampleuser/.ssh/id_rsa</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
       <description>
          SSH connection timeout, in milliseconds, to use with the builtin
        sshfence fencer.
         </description>
</property>




<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
            <description>
                 Specifies the maximum number of threads to use for transferring data
                     in and out of the DN.
                  </description>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>

<property>
   <name>dfs.namenode.name.dir</name>
   <value>/home/hadoop/hadoopworkspace/dfs/name</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>/home/hadoop/hadoopworkspace/dfs/data</value>
</property>
  
</configuration>

[root@HA2kerberos conf]# vim slaves
cup-slave-1
cup-slave-2
cup-slave-3
cup-slave-4

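3. Copy the Hadoop directory from master-1 to master-2 with scp
4. Start ZooKeeper on every ZooKeeper node
5. On one of the master nodes, run hdfs zkfc -formatZK to create the namespace in ZooKeeper

6. On each node listed in dfs.namenode.shared.edits.dir
   (qjournal://cup-master-1:8485;cup-slave-1:8485;cup-slave-2:8485;cup-slave-3:8485;cup-slave-4:8485/mycluster)
   start the journal daemon with ./hadoop-daemon.sh start journalnode
  
7. On the primary namenode, run hadoop namenode -format to format the namenode and journalnode directories

8. On the primary namenode, start the namenode process: ./hadoop-daemon.sh start namenode  (./start-dfs.sh)

9. On the standby namenode, run hdfs namenode -bootstrapStandby;
   this formats the standby namenode's directory and copies the metadata over from the primary namenode.
  
   Then start the namenode process with ./hadoop-daemon.sh start namenode
  
10. ./hadoop-daemon.sh start zkfc        on both the primary and the standby namenode
11. ./hadoop-daemon.sh start datanode    on every datanode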
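
To see which hosts the journalnode step applies to, the qjournal URI from dfs.namenode.shared.edits.dir can be split into its parts. The following is a minimal sketch with a hypothetical function name (parse_qjournal); it handles only the host:port;host:port/journal-id form shown above and assumes port 8485 when none is given. The quorum line reflects that the quorum journal manager needs a majority of journal nodes to accept each write.

```python
URI = ("qjournal://cup-master-1:8485;cup-slave-1:8485;cup-slave-2:8485;"
       "cup-slave-3:8485;cup-slave-4:8485/mycluster")


def parse_qjournal(uri, default_port=8485):
    """Return ([(host, port), ...], journal_id) for a qjournal:// URI."""
    prefix = "qjournal://"
    if not uri.startswith(prefix):
        raise ValueError("not a qjournal URI: %r" % uri)
    authority, _, journal_id = uri[len(prefix):].partition("/")
    nodes = []
    for item in authority.split(";"):
        host, _, port = item.partition(":")
        nodes.append((host, int(port) if port else default_port))
    return nodes, journal_id


nodes, journal_id = parse_qjournal(URI)
print(journal_id)                      # mycluster
print([host for host, _ in nodes])     # the hosts that must run a journalnode
print("quorum:", len(nodes) // 2 + 1)  # quorum: 3
```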


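If you start the namenode first and zkfc afterwards, the namenode cannot become active until zkfc has been started.
The order above must not be changed: I started zkfc first, and as a result the namenode would not come up.
The automatic startup below shows the same order: zkfc is started last.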
[hadoop@ClouderaHA1 sbin]$ ./start-dfs.sh
Starting namenodes on [ClouderaHA1 ClouderaHA2]
ClouderaHA1: starting namenode, logging to /app/hadoop/logs/hadoop-hadoop-namenode-ClouderaHA1.out
ClouderaHA2: starting namenode, logging to /app/hadoop/logs/hadoop-hadoop-namenode-ClouderaHA2.out
ClouderaHA3: starting datanode, logging to /app/hadoop/logs/hadoop-hadoop-datanode-ClouderaHA3.out
ClouderaHA1: starting datanode, logging to /app/hadoop/logs/hadoop-hadoop-datanode-ClouderaHA1.out
ClouderaHA2: starting datanode, logging to /app/hadoop/logs/hadoop-hadoop-datanode-ClouderaHA2.out
Starting ZK Failover Controllers on NN hosts [ClouderaHA1 ClouderaHA2]
ClouderaHA1: starting zkfc, logging to /app/hadoop/logs/hadoop-hadoop-zkfc-ClouderaHA1.out
ClouderaHA2: starting zkfc, logging to /app/hadoop/logs/hadoop-hadoop-zkfc-ClouderaHA2.out




A. First start the journalnode on each node
   hadoop-daemon.sh start journalnode
  
B. On the primary master node, run start-dfs.sh and start-yarn.sh

[hadoop@cup-master-1 ~]$ start-dfs.sh
Starting namenodes on [cup-master-1 cup-master-2]
hadoop@cup-master-1's password: cup-master-2: starting namenode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-namenode-cup-master-2.out

cup-master-1: starting namenode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-namenode-cup-master-1.out
cup-slave-4: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-4.out
cup-slave-1: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-1.out
cup-slave-3: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-3.out
cup-slave-2: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-2.out
Starting ZK Failover Controllers on NN hosts [cup-master-1 cup-master-2]
hadoop@cup-master-1's password: cup-master-2: starting zkfc, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-zkfc-cup-master-2.out

cup-master-1: starting zkfc, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-zkfc-cup-master-1.out
[hadoop@cup-master-1 ~]$
[hadoop@cup-master-1 ~]$ jps
30939 NameNode
28526 QuorumPeerMain
29769 JournalNode
31283 Jps
31207 DFSZKFailoverController
[hadoop@cup-master-1 ~]$

[hadoop@cup-master-2 ~]$ jps
13197 DFSZKFailoverController
12305 NameNode
15106 Jps
[hadoop@cup-master-2 ~]$



[hadoop@cup-master-1 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-resourcemanager-cup-master-1.out
cup-slave-4: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-4.out
cup-slave-1: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-1.out
cup-slave-3: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-3.out
cup-slave-2: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-2.out
[hadoop@cup-master-1 ~]$
[hadoop@cup-master-1 ~]$ jps
30939 NameNode
28526 QuorumPeerMain
29769 JournalNode
31628 Jps
31207 DFSZKFailoverController
31365 ResourceManager
[hadoop@cup-master-1 ~]$

[hadoop@cup-master-2 ~]$ jps
13197 DFSZKFailoverController
12305 NameNode
17092 Jps

This shows that HA here applies only to HDFS and has nothing to do with MR2.

[hadoop@cup-slave-1 ~]$ jps
30692 JournalNode
31453 NodeManager
31286 DataNode
30172 QuorumPeerMain
31562 Jps
[hadoop@cup-slave-1 ~]$





HBASE HA CONF:

1. hbase-site.xml

<property>
<name>hbase.rootdir</name>
<value>hdfs://mycluster/hbase</value>    <!-- hdfs://cup-master-1:9000 -->
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>cup-master-1:60000</value>
</property>

2. Copy core-site.xml and hdfs-site.xml into hbase_home\conf\;
   otherwise HBase cannot start because it does not recognize hdfs://mycluster





Recovering from a failed HA setup

To reset the cluster you must:
1. Empty the directories
On the NNs: /home/hadoop/cdh42/cdhworkspace/dfs/name
On the DNs: /home/hadoop/cdh42/cdhworkspace/dfs/data
On the JNs: /home/hadoop/cdh42/cdhworkspace/dfs/jn

2. Format
On the NNs: hdfs namenode -format


Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /home/hadoop/cdh42/cdhworkspace/dfs/name state: NOT_FORMATTED

On the NNs: hdfs namenode -format
Formatting requires the ZK and JN processes to be running:
zkServer.sh start
hadoop-daemon.sh start journalnode



Incompatible namespaceID for journal Storage Directory /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster: NameNode has nsId 264369592 but storage has nsId 1178230309

Edit the namespaceID in /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster/current/VERSION


Incompatible clusterID for journal Storage Directory /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster: NameNode has clusterId 'CID-34eabdd9-ca2c-48ff-9127-b6df81aded90' but storage has clusterId 'CID-c1012f1d-e2f1-4a0b-89f6-cafabef1cf7e'

Edit the clusterID in /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster/current/VERSION


Incompatible clusterIDs in /home/hadoop/cdh42/cdhworkspace/dfs/data: namenode clusterID = CID-34eabdd9-ca2c-48ff-9127-b6df81aded90; datanode clusterID = CID-c1012f1d-e2f1-4a0b-89f6-cafabef1cf7e

Edit the clusterID in /home/hadoop/cdh42/cdhworkspace/dfs/data/current/VERSION

Cause: every format generates a new namespaceID and clusterID,
while the values stored under cdhworkspace/dfs/name, cdhworkspace/dfs/data and cdhworkspace/dfs/jn are still the old ones.
So before formatting, empty these directories on all machines:
On the NNs: /home/hadoop/cdh42/cdhworkspace/dfs/name
On the DNs: /home/hadoop/cdh42/cdhworkspace/dfs/data
On the JNs: /home/hadoop/cdh42/cdhworkspace/dfs/jn
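
Before editing any VERSION file by hand, it can help to compare the IDs directly. The sketch below is illustrative only: parse_version and id_mismatches are hypothetical names, it assumes VERSION is a plain key=value file containing the keys namespaceID and clusterID, and the two sample texts are reduced excerpts built from the IDs in the error messages above rather than complete files.

```python
NAMENODE_VERSION = """\
#illustrative excerpt, other keys omitted
namespaceID=264369592
clusterID=CID-34eabdd9-ca2c-48ff-9127-b6df81aded90
"""

JOURNAL_VERSION = """\
namespaceID=1178230309
clusterID=CID-c1012f1d-e2f1-4a0b-89f6-cafabef1cf7e
"""


def parse_version(text):
    """Read key=value lines, ignoring blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def id_mismatches(reference, other, keys=("namespaceID", "clusterID")):
    """Map each differing key to (reference value, other value)."""
    return {k: (reference.get(k), other.get(k))
            for k in keys if reference.get(k) != other.get(k)}


nn = parse_version(NAMENODE_VERSION)
jn = parse_version(JOURNAL_VERSION)
for key, (expected, found) in sorted(id_mismatches(nn, jn).items()):
    print("%s: namenode has %s but storage has %s" % (key, expected, found))
```

An empty result means the storage directory already matches the namenode and needs no change.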




HBase: raise limits
ulimit -a  the "open files" limit needs to be raised



dfs.replication.interval
dfs.datanode.handler.count
dfs.namenode.handler.count


Hive/HBase integration requires copying the HBase configuration file into Hadoop:
hbase->hadoop:
copy hbase-0.94.2-cdh4.2.0/conf/hbase-site.xml to hadoop-2.0.0-cdh4.2.0/etc/hadoop/






Mounting an ISO image:
mount -t iso9660 -o loop /*/*.iso /mnt


[contrib1]
name=Server
baseurl=file:///mnt/Server
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release








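1. I looked into this; the prevailing recommendation is a rowkey of 10-100 B, i.e. keep the rowkey length between 10 and 100 bytes.
   An overly long rowkey reduces memstore lookup efficiency and HFile storage efficiency, with no benefit at all.
2. Given our scenario and data model, I recommend the following lengths:
    recommended: 8 B = 64 bits, 16 B = 128 bits, 24 B = 192 bits, 32 B = 256 bits; do not exceed 32 bytes.
    That is 8, 16, 24 or 32 bytes, always a multiple of 8, because memory allocation on 64-bit machines is aligned to multiples of 8 bytes.

3. Quantitative analysis:
8 B  = 64 bits:  2^64  = 1.844674407371 * 10^19     -- at most a 20-digit integer
16 B = 128 bits: 2^128 = 3.4028236692094 * 10^38    -- at most a 39-digit integer
24 B = 192 bits: 2^192 = 6.2771017353867 * 10^57    -- at most a 58-digit integer
32 B = 256 bits: 2^256 = 1.1579208923732 * 10^77    -- at most a 78-digit integer

Under our design, the ROWKEY of the call-detail-record table is organized as follows:
6156911095 8534567490 11000 45000 1111111111111111111
reversed phone number: 10 digits
inverted timestamp: 10 digits
cell dimension: 10 digits
terminal dimension: 19 digits
That is 49 digits in total, so I suggest adopting this scheme directly: use 24 bytes for the ROWKEY, which holds up to a 58-digit integer, of which 57 digits are used.
This still leaves 8 spare digits; if they are not needed, they are simply zero-filled when converting to bytes.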
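
The digit counts above can be reproduced, and the 24-byte layout tried out, with a short sketch. The names FIELDS, max_digits and pack_rowkey are hypothetical, and packing the concatenated decimal digits as one big-endian unsigned integer is only one possible encoding, assumed here for illustration; the notes do not specify the byte encoding.

```python
# Field widths in decimal digits, taken from the rowkey layout above.
FIELDS = [("reversed_phone", 10), ("inverted_time", 10), ("cell", 10), ("terminal", 19)]


def max_digits(num_bytes):
    """Decimal digits of the largest unsigned integer that fits in num_bytes."""
    return len(str(2 ** (8 * num_bytes) - 1))


def pack_rowkey(parts, num_bytes=24):
    """Zero-pad each field to its width, join them, store as a big-endian integer."""
    digits = "".join(str(value).zfill(width)
                     for value, (_, width) in zip(parts, FIELDS))
    return int(digits).to_bytes(num_bytes, "big")


for size in (8, 16, 24, 32):
    print(size, "bytes ->", max_digits(size), "digits")
# 8 bytes -> 20 digits
# 16 bytes -> 39 digits
# 24 bytes -> 58 digits
# 32 bytes -> 78 digits

key = pack_rowkey(["6156911095", "8534567490", "1100045000", "1111111111111111111"])
print(len(key), sum(width for _, width in FIELDS))  # 24 49
```

Not every 58-digit number fits in 24 bytes, which is consistent with the notes budgeting 57 digits: 49 used plus 8 spare.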








CDH2.0 native lib compiling
Dependencies:
maven
apr-1.4.6.tar.gz
apr-util-1.5.1.tar.gz
httpd-2.2.23.tar.gz
php-5.3.18.tar.gz
rrdtool-1.4.7.tar.gz
pcre-8.31.tar.gz
libconfuse-2.6-2.el5.rf.x86_64.rpm
libconfuse-devel-2.6-2.el5.rf.x86_64.rpm
libxml2-devel rpmbuild glib2-devel dbus-devel freetype-devel fontconfig-devel
gcc-c++ expat-devel python-devel libXrender-devel
yum -y install apr-devel apr-util check-devel cairo-devel pango-devel

pcre-devel
tcl-devel
zlib-devel
bzip2-devel
libX11-devel
readline-devel   
libXt-devel  
tk-devel
tetex-latex

rhbase:
libboost-dev libboost-test-dev libboost-program-options-dev libevent-dev
automake libtool flex bison pkg-config g++ libssl-dev


1. install lzo and lzo-devel, zlib-devel, openssl-devel
   dependencies: lzo-devel  zlib-devel  gcc autoconf automake libtool

2. install ProtocolBuffers: http://wiki.apache.org/hadoop/HowToContribute
3. $cd /home/hadoop/protobuf-2.5.0/   ## as root
   $./configure
   $make
   $make install

4. $cd /home/hadoop/protobuf-2.5.0/java ## as the hadoop user
   $mvn compile
   $mvn install

5. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common
   modify pom.xml: add
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>  <!-- add the version number, otherwise the package cannot be found -->
</dependency>  

6. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
   $mvn clean install -DskipTests -P native


****************** Note: hadoop-common-project/hadoop-common contains the snappy compression code,
so it is best to install snappy (e.g. snappy-1.1.0) before compiling the common native library; otherwise using snappy compression reports:
this version of libhadoop was built without snappy support
snappy-1.1.0.tar.

http://code.google.com/p/hadoop-snappy/
$ mvn package [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR]
$mvn clean install -DskipTests -P native package -Dsnappy.prefix=SNAPPY_INSTALLATION_DIR 
$mvn clean install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0

## Without -Dsnappy.prefix=/root/snappy-1.1.0
you get the messages: snappy native library was compiled without snappy support
this version of libhadoop was built without snappy support
See the notes at http://code.google.com/p/hadoop-snappy/


copy to hadoop-common-project/hadoop-common----------------------------

7. copy /home/hadoop/protobuf-2.5.0/java/target/generated-sources/com/google/protobuf/DescriptorProtos.java to
   /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/generated-sources/java/com/google/protobuf/

8. copy /home/hadoop/protobuf-2.5.0/java/src/main/java/com/google/protobuf/*.java to
   /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/generated-sources/java/com/google/protobuf/

9. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
   $mvn install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0
   Note: there is no clean here, otherwise the copied Java files would be deleted

main:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.427s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [0.986s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [0.933s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [0.852s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.246s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [0.645s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [0.827s]
[INFO] Apache Hadoop Common .............................. FAILURE [49.566s]
[INFO] Apache Hadoop Common Project ...................... SKIPPED
[INFO] Apache Hadoop HDFS ................................ SKIPPED
[INFO] Apache Hadoop HttpFS .............................. SKIPPED
[INFO] Apache Hadoop HDFS Project ........................ SKIPPED
[INFO] hadoop-yarn ....................................... SKIPPED
[INFO] hadoop-yarn-api ................................... SKIPPED
[INFO] hadoop-yarn-common ................................ SKIPPED
[INFO] hadoop-yarn-server ................................ SKIPPED
[INFO] hadoop-yarn-server-common ......................... SKIPPED
[INFO] hadoop-yarn-server-nodemanager .................... SKIPPED
[INFO] hadoop-yarn-server-web-proxy ...................... SKIPPED
[INFO] hadoop-yarn-server-resourcemanager ................ SKIPPED
[INFO] hadoop-yarn-server-tests .......................... SKIPPED
[INFO] hadoop-yarn-client ................................ SKIPPED
[INFO] hadoop-yarn-applications .......................... SKIPPED
[INFO] hadoop-yarn-applications-distributedshell ......... SKIPPED
[INFO] hadoop-mapreduce-client ........................... SKIPPED
[INFO] hadoop-mapreduce-client-core ...................... SKIPPED
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SKIPPED
[INFO] hadoop-yarn-site .................................. SKIPPED
[INFO] hadoop-yarn-project ............................... SKIPPED
[INFO] hadoop-mapreduce-client-common .................... SKIPPED
[INFO] hadoop-mapreduce-client-shuffle ................... SKIPPED
[INFO] hadoop-mapreduce-client-app ....................... SKIPPED
[INFO] hadoop-mapreduce-client-hs ........................ SKIPPED
[INFO] hadoop-mapreduce-client-jobclient ................. SKIPPED
[INFO] Apache Hadoop MapReduce Examples .................. SKIPPED
[INFO] hadoop-mapreduce .................................. SKIPPED
[INFO] Apache Hadoop MapReduce Streaming ................. SKIPPED
[INFO] Apache Hadoop Distributed Copy .................... SKIPPED
[INFO] Apache Hadoop Archives ............................ SKIPPED
[INFO] Apache Hadoop Rumen ............................... SKIPPED
[INFO] Apache Hadoop Gridmix ............................. SKIPPED
[INFO] Apache Hadoop Data Join ........................... SKIPPED
[INFO] Apache Hadoop Extras .............................. SKIPPED
[INFO] Apache Hadoop Pipes ............................... SKIPPED
[INFO] Apache Hadoop Tools Dist .......................... SKIPPED
[INFO] Apache Hadoop Tools ............................... SKIPPED
[INFO] Apache Hadoop Distribution ........................ SKIPPED
[INFO] Apache Hadoop Client .............................. SKIPPED
[INFO] Apache Hadoop Mini-Cluster ........................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 58.143s
[INFO] Finished at: Tue Apr 09 14:31:49 CST 2013
[INFO] Final Memory: 67M/1380M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake" (in directory "/home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/native"): java.io.IOException: error=2, No such file or directory -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-common



10. install cmake  ##as the root user
    $tar xvf cmake-*.*.*.tar.gz
    $cd cmake-*.*.*
    $./bootstrap
    $make
    $make install
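
The build failure above was caused by cmake not being on the PATH. A pre-flight check along the following lines could catch that before starting the long Maven build. This is a sketch under assumptions: the function name missing_tools is invented here, and the tool list in the example call (cmake, make, mvn) is simply the set of commands this walkthrough invokes, not an official list of Hadoop build requirements.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption).
# Prints each named tool that cannot be found on PATH, one per line.
# Returns 0 if all tools were found, 1 if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call (left commented out):
# missing_tools cmake make mvn || echo "install the tools listed above first"
```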

11. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
    $mvn install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0
    Note: do not run clean. The following directory is generated only after this step has run:
    /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/target/generated-sources


copy to hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common----------------------------

12. copy /home/hadoop/protobuf-2.5.0/java/target/generated-sources/com/google/protobuf/DescriptorProtos.java to
   /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/target/generated-sources/proto/

13. copy /home/hadoop/protobuf-2.5.0/java/src/main/java/com/google/protobuf/*.java to
   /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/target/generated-sources/proto/

14. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
   $mvn install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0
   Note: do not run clean.


[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.302s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [0.861s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [0.765s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [1.010s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.230s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [0.614s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [0.741s]
[INFO] Apache Hadoop Common .............................. SUCCESS [23.666s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.075s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [31.895s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [2.411s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.076s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.265s]
[INFO] hadoop-yarn-api ................................... SUCCESS [6.371s]
[INFO] hadoop-yarn-common ................................ SUCCESS [1.907s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.107s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [1.211s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [2.975s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [0.324s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [0.634s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [0.367s]
[INFO] hadoop-yarn-client ................................ SUCCESS [0.194s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.108s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [0.344s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.098s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [1.496s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [0.231s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.200s]
[INFO] hadoop-yarn-project ............................... SUCCESS [0.172s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [6.503s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [0.391s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [3.133s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [1.250s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [3.092s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [0.900s]
[INFO] hadoop-mapreduce .................................. SUCCESS [0.105s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [0.706s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [1.513s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [0.828s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [1.201s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [1.040s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [0.409s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [0.545s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [9.772s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [0.467s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.059s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [0.228s]
[INFO] Apache Hadoop Client .............................. SUCCESS [0.624s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.247s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:56.489s
[INFO] Finished at: Tue Apr 09 15:28:54 CST 2013
[INFO] Final Memory: 87M/744M
[INFO] ------------------------------------------------------------------------



15. Compiled native files:
    /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/native/target/usr/local/lib/



[hadoop@cup-master-1 src]$ find . -name '*.a'
./hadoop-hdfs-project/hadoop-hdfs/target/native/libposix_util.a
./hadoop-hdfs-project/hadoop-hdfs/target/native/libnative_mini_dfs.a
./hadoop-hdfs-project/hadoop-hdfs/target/native/target/usr/local/lib/libhdfs.a
./hadoop-common-project/hadoop-common/target/native/target/usr/local/lib/libhadoop.a
./hadoop-tools/hadoop-pipes/target/native/libhadooputils.a
./hadoop-tools/hadoop-pipes/target/native/libhadooppipes.a
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/libcontainer.a
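
The manual find above can be turned into a quick check that the expected libraries were actually produced. The following is a minimal sketch: the function name check_native_libs is made up, and which libraries to require is the reader's choice (the example call uses libhadoop.a and libhdfs.a from the listing above).

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption).
# $1: root of the source tree to search; remaining arguments: library file names.
# Returns 1 and reports the first library that is not found anywhere under $1.
check_native_libs() {
    root=$1
    shift
    for lib in "$@"; do
        # Quote the pattern so the shell does not expand it before find sees it.
        if [ -z "$(find "$root" -type f -name "$lib" 2>/dev/null)" ]; then
            echo "missing: $lib" >&2
            return 1
        fi
    done
    return 0
}

# Example call (left commented out):
# check_native_libs /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src libhadoop.a libhdfs.a
```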


