Problems Using DNS with Hadoop

Lately I've been putting most of my energy into Docker.

Working in big data, you inevitably end up building a small test cluster yourself. New hires at the company usually spend a long time fumbling around before they get an environment running, so I wrote a pile of Dockerfiles for them.

Development machines these days usually have 16 GB of RAM, so spinning up full virtual machines can be sluggish, but Docker handles testing just fine. A distributed environment generally requires ip-to-hostname mappings, and unfortunately Docker has some gaps here; the cleanest solution is to run a DNS service. The redis, zookeeper, and kafka test containers all work fine with DNS; only Hadoop ran into trouble.
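For context, the containers are pointed at the DNS server roughly like this (a sketch; the DNS container's address, the hostnames, and the image name are made up for illustration):

# hypothetical addresses, hostnames, and image name, for illustration only
docker run -d --dns 172.17.0.2 --hostname nn.hadoop.local my/hadoop
docker run -d --dns 172.17.0.2 --hostname dn1.hadoop.local my/hadoop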

First, the environment:

  • alpine 3.3
  • java 1.8
  • hadoop 2.6.4

As an aside, Alpine really is a pleasure to use: the base image is under 5 MB.
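For reference, bringing HDFS up follows the stock Hadoop 2.6 steps (a sketch; assumes $HADOOP_HOME/bin and $HADOOP_HOME/sbin are on the PATH):

hdfs namenode -format    # one-time, on the namenode
start-dfs.sh             # starts NameNode, DataNodes, SecondaryNameNode
jps                      # check that the daemons came up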

Formatting the namenode went fine, and when HDFS started all the processes came up, but the web UI showed no DataNodes. Checking the logs turned up the following error:

2016-03-16 15:35:22,758 WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=172.17.0.5, hostname=172.17.0.5)
2016-03-16 15:35:22,758 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.registerDatanode from 172.17.0.5:36126 Call#1 Retry#0
org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode because hostname cannot be resolved (ip=172.17.0.5, hostname=172.17.0.5): DatanodeRegistration(0.0.0.0, datanodeUuid=52080502-69b5-4912-b4dd-455e1503ab91, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-76676a96-0b38-48d4-8878-cfdbc7238560;nsid=1376123452;c=0)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:889)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:5048)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1142)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:27329)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:975)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)

The gist: the DataNode requests registration, passing the slave's IP, and the log says hostname cannot be resolved. A quick ping showed the DNS service resolving names normally. I eventually found a relevant parameter in the official documentation, dfs.namenode.datanode.registration.ip-hostname-check:

If true (the default), then the namenode requires that a connecting datanode’s address must be resolved to a hostname. If necessary, a reverse DNS lookup is performed. All attempts to register a datanode from an unresolvable address are rejected. It is recommended that this setting be left on to prevent accidental registration of datanodes listed by hostname in the excludes file during a DNS outage. Only set this to false in environments where there is no infrastructure to support reverse DNS lookup.

https://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

When a DataNode registers, the NameNode takes its IP and does a reverse DNS lookup to get the hostname. If the DNS service does not support reverse DNS lookups, you are out of luck.
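You can check this directly (a sketch; assumes nslookup is available in the container, and uses the IP from the log above plus a hypothetical hostname). Forward resolution is what ping exercises; the reverse (PTR) lookup is what the NameNode actually performs:

nslookup dn1.hadoop.local    # forward lookup: hostname -> ip (hypothetical name)
nslookup 172.17.0.5          # reverse lookup: ip -> hostname; this is the one that fails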

The fix: add the following to hdfs-site.xml

<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>

and that's it; after a restart the DataNodes register normally.
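For completeness, restarting along these lines picks up the change (stock Hadoop 2.6 scripts):

stop-dfs.sh
start-dfs.sh
# the DataNodes should now appear in the NameNode web UI (port 50070)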

PS: Surprisingly, the Hadoop native libraries did not need to be recompiled on Alpine, which saved some work. The Docker host is Ubuntu Server 14.04.

