Hadoop集群部署

环境说明

主机列表

IP地址 主机名 操作系统
192.168.159.130 node01 debian 9.6
192.168.159.131 node02 debian 9.6
192.168.159.132 node03 debian 9.6

注意,此处的主机名不能包含 -_

JDK

因为Oracle变更了Java的使用授权方式,所以使用OpenJDK

openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)

Hadoop

Hadoop版本:hadoop-2.9.2

Java环境

在三台主机执行以下操作

$ mkdir ~/Downloads
cd ~/Downloads
wget https://download.java.net/java/GA/jdk11/13/GPL/openjdk-11.0.1_linux-x64_bin.tar.gz
sudo tar xf openjdk-11.0.1_linux-x64_bin.tar.gz -C /usr/local/

修改/etc/profile文件,追加以下内容,并注销后重新登陆

export JAVA_HOME=/usr/local/jdk-11.0.1
export CLASSPATH=.:$JAVA_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH

修改hosts

在三台主机执行以下操作

sudo vim /etc/hosts

追加以下内容

192.168.159.130	node01
192.168.159.131	node02
192.168.159.132	node03

安装SSH服务并配置SSH Key登陆

安装SSH服务

在三台主机执行以下操作

sudo apt-get update
sudo apt-get install openssh-server     # 一般都安装过了
ssh-keygen -t rsa -P ""                 # 此步需要输入一次回车确认SSH Key生成路径

分发公钥

分别在各主机使用scp命令将生成的SSH Key复制到其它机器

本例如下:

node01

scp ~/.ssh/id_rsa.pub cloud@node02:~/.ssh/id_rsa_master.pub
scp ~/.ssh/id_rsa.pub cloud@node03:~/.ssh/id_rsa_master.pub

node02

scp ~/.ssh/id_rsa.pub cloud@node01:~/.ssh/id_rsa_slave_01.pub
scp ~/.ssh/id_rsa.pub cloud@node03:~/.ssh/id_rsa_slave_01.pub

node03

scp ~/.ssh/id_rsa.pub cloud@node01:~/.ssh/id_rsa_slave_02.pub
scp ~/.ssh/id_rsa.pub cloud@node02:~/.ssh/id_rsa_slave_02.pub

生成authorized_keys

分别在各主机使用执行

node01

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa_slave_01.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa_slave_02.pub >> ~/.ssh/authorized_keys

node02

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa_master.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa_slave_02.pub >> ~/.ssh/authorized_keys

node03

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa_master.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa_slave_01.pub >> ~/.ssh/authorized_keys

测试SSH Key登陆

node01

ssh node01
ssh node02
ssh node03

node02

ssh node01
ssh node02
ssh node03

node03

ssh node01
ssh node02
ssh node03

这里一般会有如下输出

The authenticity of host 'node0x (192.168.159.13x)' can't be established.
ECDSA key fingerprint is SHA256:YX+xBXQ615FPJ/Fzn051BbaelveeLC5Y+B91AjeYgLk.
Are you sure you want to continue connecting (yes/no)? 

输入yes回车即可

下载Hadoop并配置环境变量

在三台主机执行以下操作

下载操作可以在一台主机下载,然后使用scp拷贝到其它主机

cd ~/Downloads
wget http://mirrors.shu.edu.cn/apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
sudo mv hadoop-2.9.2.tar.gz /opt/
sudo tar xf /opt/hadoop-2.9.2.tar.gz -C /opt/
sudo chown -R 用户名:用户组 /opt/hadoop-2.9.2/

修改/etc/profile文件,追加以下内容,并注销后重新登陆

export HADOOP_HOME=/opt/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

修改配置文件

在三台主机执行以下操作

此操作可以在一个主机修改,完成后使用scp命令分发

core-site.xml

vim /opt/hadoop-2.9.2/etc/hadoop/core-site.xml

configuration节点下添加以下内容

	<property>
		<name>hadoop.tmp.dir</name>
		<value>file:/data/hadoop/tmp</value>
		<description>Abase for other temporary directories.</description>
	</property>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://node01:9000</value>
	</property>

fs.defaultFS

指定HDFS中NameNode的地址

hadoop.tmp.dir

指定hadoop运行时产生文件的存储目录

hadoop-env.sh

vim /opt/hadoop-2.9.2/etc/hadoop/hadoop-env.sh

注释掉export JAVA_HOME=${JAVA_HOME}(约26行),在下面增加 export JAVA_HOME=/usr/local/jdk-11.0.1

# The java implementation to use.
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/local/jdk-11.0.1

hdfs-site.xml

vi /opt/hadoop-2.9.2/etc/hadoop/hdfs-site.xml

configuration节点下添加以下内容

	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/data/hadoop/name</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/data/hadoop/data</value>
	</property>

dfs.replication

设置dfs副本数,不设置默认是3个

slaves

vim /opt/hadoop-2.9.2/etc/hadoop/slaves 

写入以下内容

node02
node03

mapred-site.xml

cp /opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml.template /opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml
vim /opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml

configuration节点下添加以下内容

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

mapreduce.framework.name

指定mr运行在yarn上

yarn-env.sh

vim /opt/hadoop-2.9.2/etc/hadoop/yarn-env.sh

# export JAVA_HOME=/home/y/libexec/jdk1.6.0/下(约24行)添加export JAVA_HOME=/usr/local/jdk-11.0.1

# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/usr/local/jdk-11.0.1
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=$JAVA_HOME
fi

yarn-site.xml

vim /opt/hadoop-2.9.2/etc/hadoop/yarn-site.xml 

configuration节点下添加以下内容

     <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
     </property>
     <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node01</value>
     </property>

分发配置

scp /opt/hadoop-2.9.2/etc/hadoop/core-site.xml cloud@node02:/opt/hadoop-2.9.2/etc/hadoop/core-site.xml
scp /opt/hadoop-2.9.2/etc/hadoop/core-site.xml cloud@node03:/opt/hadoop-2.9.2/etc/hadoop/core-site.xml

scp /opt/hadoop-2.9.2/etc/hadoop/hadoop-env.sh cloud@node02:/opt/hadoop-2.9.2/etc/hadoop/hadoop-env.sh
scp /opt/hadoop-2.9.2/etc/hadoop/hadoop-env.sh cloud@node03:/opt/hadoop-2.9.2/etc/hadoop/hadoop-env.sh

scp /opt/hadoop-2.9.2/etc/hadoop/hdfs-site.xml cloud@node02:/opt/hadoop-2.9.2/etc/hadoop/hdfs-site.xml
scp /opt/hadoop-2.9.2/etc/hadoop/hdfs-site.xml cloud@node03:/opt/hadoop-2.9.2/etc/hadoop/hdfs-site.xml

scp /opt/hadoop-2.9.2/etc/hadoop/slaves cloud@node02:/opt/hadoop-2.9.2/etc/hadoop/slaves
scp /opt/hadoop-2.9.2/etc/hadoop/slaves cloud@node03:/opt/hadoop-2.9.2/etc/hadoop/slaves

scp /opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml cloud@node02:/opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml
scp /opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml cloud@node03:/opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml

scp /opt/hadoop-2.9.2/etc/hadoop/yarn-env.sh cloud@node02:/opt/hadoop-2.9.2/etc/hadoop/yarn-env.sh
scp /opt/hadoop-2.9.2/etc/hadoop/yarn-env.sh cloud@node03:/opt/hadoop-2.9.2/etc/hadoop/yarn-env.sh

scp /opt/hadoop-2.9.2/etc/hadoop/yarn-site.xml cloud@node02:/opt/hadoop-2.9.2/etc/hadoop/yarn-site.xml 
scp /opt/hadoop-2.9.2/etc/hadoop/yarn-site.xml cloud@node03:/opt/hadoop-2.9.2/etc/hadoop/yarn-site.xml 

启动集群

创建HDFS临时目录

在三台主机执行以下操作

mkdir -p /data/hadoop/tmp
mkdir -p /data/hadoop/data
mkdir -p /data/hadoop/name

格式化namenode

hdfs namenode -format

此处如果有异常,请自行查找错误

任意一台主机上执行

start-dfs.sh 
start-yarn.sh

输出

start-dfs.sh

$ start-dfs.sh 
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/opt/hadoop-2.9.2/share/hadoop/common/lib/hadoop-auth-2.9.2.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Starting namenodes on [node01]
node01: starting namenode, logging to /opt/hadoop-2.9.2/logs/hadoop-cloud-namenode-node01.out
node03: starting datanode, logging to /opt/hadoop-2.9.2/logs/hadoop-cloud-datanode-node03.out
node02: starting datanode, logging to /opt/hadoop-2.9.2/logs/hadoop-cloud-datanode-node02.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:YX+xBXQ615FPJ/Fzn051BbaelveeLC5Y+B91AjeYgLk.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.9.2/logs/hadoop-cloud-secondarynamenode-node01.out
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/opt/hadoop-2.9.2/share/hadoop/common/lib/hadoop-auth-2.9.2.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

The authenticity of host ‘0.0.0.0 (0.0.0.0)’ can’t be established. ECDSA key fingerprint is SHA256:YX+xBXQ615FPJ/Fzn051BbaelveeLC5Y+B91AjeYgLk. Are you sure you want to continue connecting (yes/no)? yes 如果出现类似的内容,输入yes

start-yarn.sh

$ start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.9.2/logs/yarn-cloud-resourcemanager-node01.out
node02: starting nodemanager, logging to /opt/hadoop-2.9.2/logs/yarn-cloud-nodemanager-node02.out
node03: starting nodemanager, logging to /opt/hadoop-2.9.2/logs/yarn-cloud-nodemanager-node03.out

测试

可以使用浏览器访问 Master的50070端口,查看集群状态


停止集群

任意一台主机上执行

stop-yarn.sh
stop-dfs.sh 

输出

stop-yarn.sh

$ stop-yarn.sh 

stop-dfs.sh

$ stop-dfs.sh 

上面的警告是因为JDK版本过高(也是有点醉),具体的原理还没有研究,不影响正常运行,就是看着很不舒服,处女座的可以降级JDK到1.8解决这个问题