In a non-HA Hadoop cluster, the namenode and resourcemanager each run on a single node, so both are single points of failure. To guard against this, high-availability configurations are needed for both the namenode and the resourcemanager. In Hadoop 2.x, HA is an active/standby two-node setup: two namenodes and two resourcemanagers are configured, with automatic failover when one fails. NameNode failover uses ZKFC; ResourceManager failover uses no separate ZKFC and is handled by the resourcemanager itself.
1. Preparation
1.1 Environment
Hostname | IP Address | Role |
---|---|---|
lizhiyong-pc | 192.168.2.42 | namenode |
lizhiyong-vb | 192.168.2.23 | namenode, journalnode |
lizhiyong-old | 192.168.2.35 | datanode |
zookeeper-0 | 10.42.0.167 | zookeeper |
zookeeper-1 | 10.42.0.163 | zookeeper |
zookeeper-2 | 10.42.0.164 | zookeeper |
Configure /etc/hosts on every machine so the hosts can reach one another by hostname. The three zookeeper hosts actually run on the lizhiyong-pc machine, installed through rancher, so port mappings must be set up on that host:
sudo iptables -t nat -A PREROUTING -d 192.168.2.42 -p tcp --dport 32181 -j DNAT --to-destination 10.42.0.167:2181
sudo iptables -t nat -A POSTROUTING -d 10.42.0.167 -p tcp --dport 2181 -j SNAT --to 192.168.2.42
sudo iptables -t nat -A PREROUTING -d 192.168.2.42 -p tcp --dport 32182 -j DNAT --to-destination 10.42.0.163:2181
sudo iptables -t nat -A POSTROUTING -d 10.42.0.163 -p tcp --dport 2181 -j SNAT --to 192.168.2.42
sudo iptables -t nat -A PREROUTING -d 192.168.2.42 -p tcp --dport 32183 -j DNAT --to-destination 10.42.0.164:2181
sudo iptables -t nat -A POSTROUTING -d 10.42.0.164 -p tcp --dport 2181 -j SNAT --to 192.168.2.42
Other machines can then point their zookeeper connections at zookeeper-0:32181, zookeeper-1:32182, and zookeeper-2:32183.
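For reference, a minimal /etc/hosts sketch matching the table above. Mapping all three zookeeper names to the rancher host (192.168.2.42) is an assumption that follows from the port mapping, since the container IPs are not directly routable from the other machines:
```
# Append to /etc/hosts on each Hadoop host (a sketch, not verified on this cluster).
# zookeeper-* resolve to the rancher host; the mapped ports 32181-32183
# distinguish the three ensemble members (assumption, see the NAT rules above).
cat <<'EOF' | sudo tee -a /etc/hosts
192.168.2.42  lizhiyong-pc zookeeper-0 zookeeper-1 zookeeper-2
192.168.2.23  lizhiyong-vb
192.168.2.35  lizhiyong-old
EOF
```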
1.2 Packages
Component | Package |
---|---|
zookeeper | zookeeper-3.4.5-cdh5.15.0.tar.gz |
hadoop | hadoop-2.6.0-cdh5.15.0.tar.gz |
1.3 Installation Preparation
Set up zookeeper on the three zookeeper hosts. The actual installation was done through rancher; for the installation configuration see: https://github.com/lizhiyong2000/docker-k8s/blob/master/k8s/zookeeper/zookeeper.yaml
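Before going further it is worth confirming that the mapped endpoints answer. A quick sketch using the `ruok` four-letter command, assuming `nc` is installed and four-letter words are enabled (the default in zookeeper 3.4):
```
# Each mapped port should answer "imok" if the ensemble member is healthy.
echo ruok | nc 192.168.2.42 32181
echo ruok | nc 192.168.2.42 32182
echo ruok | nc 192.168.2.42 32183
```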
In addition, the components run as the hdfs, yarn, and mapred users respectively, so these users, their groups, and every directory referenced in the configuration must be created in advance.
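A sketch of that preparation, with user names and paths taken from the configuration below (the `hadoop` group name and the exact ownership split are assumptions):
```
# Shared group plus one user per component (group name is an assumption).
sudo groupadd hadoop
for u in hdfs yarn mapred; do sudo useradd -m -g hadoop "$u"; done

# Data directories from core-site.xml / hdfs-site.xml, plus the PID and
# log directories from the env scripts below.
sudo mkdir -p /opt/data/hadoop/{tmp,namenode,datanode,edit}
sudo mkdir -p /var/run/{hadoop,yarn,mapred} /var/log/{hadoop,yarn,mapred}
sudo chown -R hdfs:hadoop /opt/data/hadoop /var/run/hadoop /var/log/hadoop
sudo chown -R yarn:hadoop /var/run/yarn /var/log/yarn
sudo chown -R mapred:hadoop /var/run/mapred /var/log/mapred
```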
2. Component Configuration
2.1 Basic Configuration
- hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_PID_DIR=/var/run/hadoop
export HADOOP_LOG_DIR=/var/log/hadoop
- yarn-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export YARN_PID_DIR=/var/run/yarn
export YARN_LOG_DIR=/var/log/yarn
- mapred-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_MAPRED_PID_DIR=/var/run/mapred
export HADOOP_MAPRED_LOG_DIR=/var/log/mapred
- slaves:
lizhiyong-old
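These env scripts and the XML files that follow live in $HADOOP_HOME/etc/hadoop and must stay in sync across hosts. One hedged way to distribute them from lizhiyong-pc:
```
# Push the config directory to the other nodes. yarn.resourcemanager.ha.id
# must still be changed to rm2 on lizhiyong-vb afterwards (see yarn-site.xml below).
for h in lizhiyong-vb lizhiyong-old; do
  rsync -av "$HADOOP_HOME/etc/hadoop/" "$h:$HADOOP_HOME/etc/hadoop/"
done
```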
2.2 Key Configuration
2.2.1 core-site.xml
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<!-- fs.defaultFS points at the cluster nameservice; "test" is the nameservice name -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://test</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/data/hadoop/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
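With fs.defaultFS set to the nameservice, clients never name a specific namenode. For example:
```
# Both forms go through the "test" nameservice and are routed to the
# active namenode by the failover proxy provider (configured below).
hdfs dfs -ls /
hdfs dfs -ls hdfs://test/
```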
2.2.2 hdfs-site.xml
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<property>
<name>dfs.namenode.handler.count</name>
<value>500</value>
</property>
<!-- directory configuration -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/data/hadoop/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/data/hadoop/datanode</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/data/hadoop/edit</value>
</property>
<!-- HA configuration starts here -->
<!-- nameservice definition -->
<property>
<name>dfs.nameservices</name>
<value>test</value>
</property>
<!-- namenodes that make up the nameservice -->
<property>
<name>dfs.ha.namenodes.test</name>
<value>nn1,nn2</value>
</property>
<!-- namenode nn1 -->
<property>
<name>dfs.namenode.http-address.test.nn1</name>
<value>lizhiyong-pc:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.test.nn1</name>
<value>lizhiyong-pc:9000</value>
</property>
<!-- namenode nn2 -->
<property>
<name>dfs.namenode.http-address.test.nn2</name>
<value>lizhiyong-vb:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.test.nn2</name>
<value>lizhiyong-vb:9000</value>
</property>
<!-- journalnode connection -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://lizhiyong-vb:8485/journalCluster</value>
</property>
<!-- automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>zookeeper-0:32181,zookeeper-1:32182,zookeeper-2:32183</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<!-- client-side configuration -->
<property>
<name>dfs.client.failover.proxy.provider.test</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
</configuration>
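Once hdfs-site.xml is in place on each host, `hdfs getconf` offers a quick sanity check of the HA settings:
```
# Values should match the configuration above on every node.
hdfs getconf -confKey dfs.nameservices       # expect: test
hdfs getconf -confKey dfs.ha.namenodes.test  # expect: nn1,nn2
```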
2.2.3 yarn-site.xml
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>test</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- set to rm2 on the other machine -->
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>lizhiyong-pc</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>lizhiyong-vb</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>lizhiyong-pc:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>lizhiyong-vb:8088</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>zookeeper-0:32181,zookeeper-1:32182,zookeeper-2:32183</value>
</property>
</configuration>
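Since yarn.resourcemanager.ha.id is the only value that differs between the two hosts, one hedged way to patch it on lizhiyong-vb after copying the file over:
```
# Switch the local id from rm1 to rm2 on lizhiyong-vb only. The match is
# safe here because the rm-ids value is "rm1,rm2" and so does not collide.
sed -i 's|<value>rm1</value>|<value>rm2</value>|' \
  "$HADOOP_HOME/etc/hadoop/yarn-site.xml"
```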
2.2.4 mapred-site.xml
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/opt/data/hadoop/tmp/hadoop-yarn/staging/history/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/opt/data/hadoop/tmp/hadoop-yarn/staging/history/done_intermediate</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>lizhiyong-pc:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>lizhiyong-pc:19888</value>
</property>
</configuration>
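The jobhistory addresses above only serve anything once the history server is running; in Hadoop 2.x it is started on lizhiyong-pc (as the mapred user) with:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver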
3. Testing the Configuration
3.1 HDFS NameNode HA Test
- Start zookeeper
- Start the cluster journalnode(s)
$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode
- Initialize the HA state in zookeeper
$HADOOP_HOME/bin/hdfs zkfc -formatZK
Running this command creates a /hadoop-ha/test/ znode in zookeeper, which holds the automatic-failover data.
```
[zk: localhost:2181(CONNECTED) 0] ls /hadoop-ha
[test]
[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha/test
[ActiveBreadCrumb, ActiveStandbyElectorLock]
```
- Start zkfc (ZooKeeper Failover Controller). The zkfc daemon must be started on both nn1 and nn2.
$HADOOP_HOME/sbin/hadoop-daemon.sh start zkfc
- Start the namenodes
Start the namenode on nn1:
$HADOOP_HOME/bin/hdfs namenode -format
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
Start the namenode on nn2:
$HADOOP_HOME/bin/hdfs namenode -bootstrapStandby
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
With zkfc and namenode running on both machines, the namenode on nn1 is in the active state and the namenode on nn2 is in the standby state. After killing the namenode process on nn1, the zkfc log on nn2 shows its namenode switching to active.
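The states can also be confirmed from the command line with hdfs haadmin, without relying on the web UI:
```
$ hdfs haadmin -getServiceState nn1
active
$ hdfs haadmin -getServiceState nn2
standby
```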
3.2 YARN ResourceManager HA Test
YARN HA does not use zkfc; the resourcemanagers handle failover themselves. Start both resourcemanagers and kill the active one to watch the switchover, though it is slower than namenode failover.
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
Check the resourcemanager state with:
$ yarn rmadmin -getServiceState rm1
active
$ yarn rmadmin -getServiceState rm2
standby
The resourcemanager state can also be viewed in the web UI on port 8088. After the active resourcemanager is killed, the standby resourcemanager transitions to the active state.
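A hedged sketch of the kill test itself, using jps to locate the process since the pid-file name depends on the local setup:
```
# On the host running the active resourcemanager:
kill -9 $(jps | awk '/ResourceManager/ {print $1}')

# From any node, watch the standby take over (may take a little while):
yarn rmadmin -getServiceState rm2
```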