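## 1. Introduction

Many big-data students spend most of their time on course and graduation projects (often around 80%) fighting environment errors. Typical problems include incompatible versions, daemons that exit right after starting, DataNodes that fail to start, Hive metastore errors, HMaster shutting itself down, and Spark failing to connect to the cluster. Every kind of big-data project (log analysis, user profiling, a recommender system, a visualization dashboard) first needs a complete, working cluster. This article walks through the whole setup from start to finish, with every command and configuration file. It points out the most common beginner pitfalls along the way, so it can serve as the base environment for most big-data course and graduation projects.

Who this is for: students doing big-data course or graduation projects, beginners with no big-data background, and anyone who needs to stand up a cluster quickly for project development.

Versions used (a stable combination as of 2026):

- Operating system: CentOS 7.9
- JDK: 1.8.0_371
- Hadoop: 3.3.6
- ZooKeeper: 3.8.3
- Hive: 3.1.3
- HBase: 2.5.5
- Spark: 3.4.1
- MySQL: 8.0.33 (stores the Hive metastore)

## 2. Cluster Architecture (three nodes)

| Node | IP | Hostname | Core roles |
|---|---|---|---|
| Master | 192.168.88.100 | master | NameNode, ResourceManager, HMaster, Spark Master, ZooKeeper |
| Slave1 | 192.168.88.101 | slave1 | DataNode, NodeManager, RegionServer, ZooKeeper |
| Slave2 | 192.168.88.102 | slave2 | DataNode, NodeManager, RegionServer, ZooKeeper |

ZooKeeper runs on all three nodes (`server.1` to `server.3` in zoo.cfg). Leader election at startup decides which node becomes the leader, so no node is fixed in that role. The Hadoop `workers` file in section 5 also lists master, so master runs a DataNode and a NodeManager as well. The SecondaryNameNode runs on slave1.

## 3. Base Environment (run on all three machines)

### 3.1 Hostnames and the hosts file

Set the hostname on each machine (one command per machine):

```shell
hostnamectl set-hostname master   # on 192.168.88.100
hostnamectl set-hostname slave1   # on 192.168.88.101
hostnamectl set-hostname slave2   # on 192.168.88.102
```

Add the same entries to `/etc/hosts` on all three machines (`vim /etc/hosts`):

```text
192.168.88.100 master
192.168.88.101 slave1
192.168.88.102 slave2
```

### 3.2 Disable the firewall and SELinux

```shell
systemctl stop firewalld
systemctl disable firewalld
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config   # permanent (after reboot)
setenforce 0                                                         # current session
```

### 3.3 Passwordless SSH (run on master only)

```shell
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2
```

### 3.4 JDK (same on every node)

Remove the OpenJDK packages shipped with the OS. Review the output of `rpm -qa | grep java` first, because this removes every matching package:

```shell
rpm -qa | grep java | xargs -r rpm -e --nodeps 2>/dev/null
```

Extract the JDK to `/opt/module/` and add the environment variables to `/etc/profile`:

```shell
export JAVA_HOME=/opt/module/jdk1.8.0_371
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
```

Apply and verify:

```shell
source /etc/profile
java -version
```

## 4. ZooKeeper Cluster (set this up first)

### 4.1 Extract and set environment variables

```shell
tar -zxvf /opt/software/apache-zookeeper-3.8.3-bin.tar.gz -C /opt/module/
mv /opt/module/apache-zookeeper-3.8.3-bin /opt/module/zookeeper-3.8.3
```

Add to `/etc/profile`:

```shell
export ZK_HOME=/opt/module/zookeeper-3.8.3
export PATH=$ZK_HOME/bin:$PATH
```

### 4.2 Core configuration: zoo.cfg

Create `$ZK_HOME/conf/zoo.cfg` (for example by copying `zoo_sample.cfg`). Then create the data and log directories with `mkdir -p $ZK_HOME/zkData $ZK_HOME/zkLog`:

```properties
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/module/zookeeper-3.8.3/zkData
dataLogDir=/opt/module/zookeeper-3.8.3/zkLog
clientPort=2181
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
```

After configuring master, copy `/opt/module/zookeeper-3.8.3` to slave1 and slave2 (for example with `scp -r`). Add the same environment variables to `/etc/profile` on those nodes.

### 4.3 Per-node myid (key pitfall)

The number in each node's `myid` file must match that node's `server.N` line in zoo.cfg:

```shell
echo -n 1 > /opt/module/zookeeper-3.8.3/zkData/myid   # on master
echo -n 2 > /opt/module/zookeeper-3.8.3/zkData/myid   # on slave1
echo -n 3 > /opt/module/zookeeper-3.8.3/zkData/myid   # on slave2
```

### 4.4 Start and verify

Run on each of the three nodes:

```shell
zkServer.sh start
zkServer.sh status
```

Expected result: one of the three nodes reports `Mode: leader` and the other two report `Mode: follower`.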
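A node that refuses to join the quorum often has a `myid` that does not match its `server.N` line. One fix is to derive the id from zoo.cfg itself instead of typing it by hand on each node. The sketch below does this under a few assumptions. The helper names `zk_myid_for_host` and `write_myid` are made up for illustration. It also assumes the zoo.cfg entries are written exactly as `server.N=host:2888:3888`, with short hostnames and no spaces:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): derive a node's ZooKeeper myid from zoo.cfg
# so that myid and the server.N lines can never disagree.
# Assumes lines like "server.1=master:2888:3888" (no spaces, no dots in hostnames).

# Print N for the server.N entry whose host equals $2; exit status 1 if none matches.
zk_myid_for_host() {
  local cfg="$1" host="$2"
  # Splitting on '.', '=' and ':' turns "server.1=master:2888:3888"
  # into fields: server | 1 | master | 2888 | 3888
  awk -F'[.=:]' -v h="$host" '
    $1 == "server" && $3 == h { print $2; found = 1; exit }
    END { exit !found }
  ' "$cfg"
}

# Write the id for host $3 (default: this machine) into <data_dir>/myid.
write_myid() {
  local cfg="$1" data_dir="$2" host="${3:-$(hostname)}" id
  if ! id=$(zk_myid_for_host "$cfg" "$host"); then
    echo "host '$host' has no server.N entry in $cfg" >&2
    return 1
  fi
  mkdir -p "$data_dir"
  printf '%s\n' "$id" > "$data_dir/myid"
}

# Example (run on each node):
#   write_myid /opt/module/zookeeper-3.8.3/conf/zoo.cfg /opt/module/zookeeper-3.8.3/zkData
```

The same command then works unchanged on master, slave1 and slave2. Each node picks up its own number from the shared zoo.cfg.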
## 5. Hadoop 3.3.6 Cluster

### 5.1 Environment variables

Add to `/etc/profile`:

```shell
export HADOOP_HOME=/opt/module/hadoop-3.3.6
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```

### 5.2 The six core configuration files (can be used as-is)

All of the files below live in `$HADOOP_HOME/etc/hadoop/`.

core-site.xml

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-3.3.6/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
```

hdfs-site.xml

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/module/hadoop-3.3.6/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/module/hadoop-3.3.6/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:50090</value>
  </property>
</configuration>
```

yarn-site.xml

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
```

mapred-site.xml

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
```

workers

```text
master
slave1
slave2
```

hadoop-env.sh (key settings)

```shell
export JAVA_HOME=/opt/module/jdk1.8.0_371
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```

Before formatting, copy the configured `/opt/module/hadoop-3.3.6` directory to slave1 and slave2, and add the same environment variables to their `/etc/profile`.

### 5.3 Format and start the cluster

```shell
hdfs namenode -format    # on master, and only once (see section 11)
start-dfs.sh
start-yarn.sh
mapred --daemon start historyserver
```

Web UIs:

- HDFS: http://192.168.88.100:50070
- YARN: http://192.168.88.100:8088
- JobHistory: http://192.168.88.100:19888

Hadoop 3's default NameNode web port is 9870. Port 50070 works here only because hdfs-site.xml sets `dfs.namenode.http-address` explicitly.
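Sometimes DataNodes or NodeManagers on some hosts never appear in these web UIs. One thing worth checking, ideally before formatting, is that every host in the `workers` file has an entry in `/etc/hosts`. The sketch below runs that check. It reads only hosts-style files and does not consult DNS, and the helper name `missing_workers` is made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print every hostname listed in a Hadoop
# workers file that has no entry in an /etc/hosts-style file.
# DNS is not consulted; only the given hosts file is read.

missing_workers() {
  local workers="$1" hosts="$2" w
  while IFS= read -r w || [ -n "$w" ]; do
    w="${w%%#*}"             # drop comments
    w="${w//[[:space:]]/}"   # drop whitespace
    [ -z "$w" ] && continue  # skip blank lines
    # hosts lines are "IP name [alias...]"; skip commented-out lines
    if ! awk -v h="$w" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }
      ' "$hosts"; then
      echo "$w"
    fi
  done < "$workers"
}

# Example:
#   missing_workers "$HADOOP_HOME/etc/hadoop/workers" /etc/hosts
```

Empty output means every worker is listed. Any name it prints needs an `/etc/hosts` entry on every node, as set up in section 3.1.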
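## 6. MySQL 8.0 (Hive metastore database)

Install on master:

```shell
wget https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
rpm -ivh mysql80-community-release-el7-3.noarch.rpm
yum install -y mysql-community-server
systemctl start mysqld
systemctl enable mysqld
```

Get the temporary root password from the log, then log in with it:

```shell
grep 'temporary password' /var/log/mysqld.log
mysql -uroot -p
```

Until the temporary password is changed, MySQL can reject other statements (ERROR 1820). So first set a password that meets the default policy (`Root@123456` below is only an example). Then relax the policy and create the metastore database and the `hive` user:

```sql
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Root@123456';
SET GLOBAL validate_password.policy=LOW;
SET GLOBAL validate_password.length=6;
ALTER USER 'root'@'localhost' IDENTIFIED BY '123456';
CREATE DATABASE hive_metastore DEFAULT CHARACTER SET utf8mb4;
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive123';
GRANT ALL PRIVILEGES ON hive_metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;
```

## 7. Hive 3.1.3 Data Warehouse

### 7.1 Extract and set environment variables

```shell
tar -zxvf /opt/software/apache-hive-3.1.3-bin.tar.gz -C /opt/module/
mv /opt/module/apache-hive-3.1.3-bin /opt/module/hive-3.1.3
# add the following two lines to /etc/profile
export HIVE_HOME=/opt/module/hive-3.1.3
export PATH=$HIVE_HOME/bin:$PATH
```

Two jar changes are needed before initializing the metastore:

1. Put the MySQL JDBC driver (Connector/J, for example `mysql-connector-j-8.0.33.jar`) into `$HIVE_HOME/lib`.
2. Replace Hive's bundled Guava with the newer one shipped by Hadoop: `rm $HIVE_HOME/lib/guava-19.0.jar`, then `cp $HADOOP_HOME/share/hadoop/common/lib/guava-*.jar $HIVE_HOME/lib/`. Otherwise `schematool` fails with a `NoSuchMethodError` in `com.google.common.base.Preconditions`.

### 7.2 Complete hive-site.xml

Create `$HIVE_HOME/conf/hive-site.xml`. In XML the `&` between JDBC URL parameters must be written as `&amp;`:

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive_metastore?useSSL=false&amp;serverTimezone=Asia/Shanghai&amp;createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive123</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
</configuration>
```

### 7.3 Initialize the metastore and start HiveServer2

HDFS and MySQL must already be running:

```shell
schematool -dbType mysql -initSchema
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod -R 777 /user/hive/warehouse
hdfs dfs -mkdir -p /tmp
hdfs dfs -chmod -R 777 /tmp
nohup hiveserver2 > /dev/null 2>&1 &
```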
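The `&amp;` rule in hive-site.xml applies to any value in these XML configuration files. If configuration snippets are generated from scripts, a bare `&` produces an XML parse error about an entity reference when Hive or Hadoop loads the file. The sketch below shows a small escaping helper that avoids this. The function names `xml_escape` and `site_property` are made up for illustration and are not part of Hadoop or Hive:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): escape values for Hadoop/Hive *-site.xml
# files and print a <property> block in the usual name/value layout.

# Replace &, < and > with XML entities. "&" is handled first so the
# entities produced by the later substitutions are not escaped twice.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print one <property> element with an escaped name and value.
site_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
    "$(xml_escape "$1")" "$(xml_escape "$2")"
}

# Example:
#   site_property javax.jdo.option.ConnectionURL \
#     'jdbc:mysql://master:3306/hive_metastore?useSSL=false&serverTimezone=Asia/Shanghai'
```

The helper escapes only what element content needs (`&`, `<`, `>`). Values are never placed inside XML attributes here, so quotes are left alone.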
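## 8. HBase 2.5.5 Distributed Cluster

### 8.1 Core configuration: hbase-site.xml

```xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/module/zookeeper-3.8.3/zkData</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>
```

`hbase.rootdir` must use the same host and port as `fs.defaultFS` in core-site.xml. `hbase.zookeeper.property.dataDir` only matters when HBase manages its own ZooKeeper, so it has no effect with the setting below.

Key changes in hbase-env.sh:

```shell
export JAVA_HOME=/opt/module/jdk1.8.0_371
export HBASE_MANAGES_ZK=false   # use the external ZooKeeper cluster from section 4
```

List the RegionServer hosts (slave1 and slave2, per the architecture table) in `conf/regionservers`. Then copy the HBase installation directory to slave1 and slave2.

### 8.2 Start the cluster

```shell
start-hbase.sh
```

## 9. Spark 3.4.1 Cluster (with Hive integration)

### 9.1 Core configuration: spark-env.sh

```shell
export JAVA_HOME=/opt/module/jdk1.8.0_371
export HADOOP_HOME=/opt/module/hadoop-3.3.6
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_HOST=master
export SPARK_MASTER_PORT=7077
export HIVE_HOME=/opt/module/hive-3.1.3
export SPARK_WORKER_MEMORY=1024m
export SPARK_WORKER_CORES=1
```

Copy the Hive configuration so that Spark SQL can use the Hive metastore:

```shell
cp /opt/module/hive-3.1.3/conf/hive-site.xml /opt/module/spark-3.4.1/conf/
```

This hive-site.xml points Spark directly at the MySQL metastore, so the MySQL JDBC driver jar also has to be in `/opt/module/spark-3.4.1/jars/`. For a multi-node standalone cluster, list the worker hostnames in `/opt/module/spark-3.4.1/conf/workers`. Without that file, Spark starts a worker on `localhost` only.

### 9.2 Start the Spark cluster

Use the full path here. Hadoop's `sbin` directory is on `PATH` and also contains a `start-all.sh`.

```shell
/opt/module/spark-3.4.1/sbin/start-all.sh
```

Spark Web UI: http://192.168.88.100:8080

## 10. One-Click Start/Stop Scripts

Save both scripts on master and make them executable with `chmod +x start-all-cluster.sh stop-all-cluster.sh`. Non-interactive SSH sessions do not read `/etc/profile`, so the remote ZooKeeper commands source it explicitly. Without this, they fail with `zkServer.sh: command not found`.

start-all-cluster.sh

```shell
#!/bin/bash
source /etc/profile

echo "Starting ZooKeeper"
for host in master slave1 slave2; do
  ssh "$host" "source /etc/profile; zkServer.sh start"
done
sleep 3

echo "Starting HDFS and YARN"
start-dfs.sh
start-yarn.sh
mapred --daemon start historyserver
sleep 5

echo "Starting HiveServer2"
nohup hiveserver2 > /dev/null 2>&1 &
sleep 3

echo "Starting HBase"
start-hbase.sh
sleep 5

echo "Starting Spark"
/opt/module/spark-3.4.1/sbin/start-all.sh
echo "All cluster services started"
```

stop-all-cluster.sh

```shell
#!/bin/bash
source /etc/profile

/opt/module/spark-3.4.1/sbin/stop-all.sh
stop-hbase.sh
# Kill HiveServer2 by matching its process command line
ps -ef | grep hiveserver2 | grep -v grep | awk '{print $2}' | xargs -r kill -9
mapred --daemon stop historyserver
stop-yarn.sh
stop-dfs.sh
for host in master slave1 slave2; do
  ssh "$host" "source /etc/profile; zkServer.sh stop"
done
echo "All cluster services stopped"
```

## 11. Common Errors and Fixes

- DataNode fails to start: the NameNode was formatted more than once, so the clusterID on the DataNodes no longer matches. Stop the cluster, delete `dfs/data` and `dfs/name` on every node, and format again. This erases all data in HDFS.
- Hive metastore initialization fails: the `hive` user has no remote access in MySQL. Make sure privileges were granted to `'hive'@'%'` (section 6).
- HMaster exits shortly after starting: HBase's built-in ZooKeeper was not disabled. Set `HBASE_MANAGES_ZK=false`.
- Spark cannot see the Hive databases and tables: `hive-site.xml` was not copied into Spark's `conf` directory.
- YARN containers are killed for "running beyond virtual memory limits": disable the virtual memory check with `yarn.nodemanager.vmem-check-enabled=false`.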
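For the first error above, you can confirm a clusterID mismatch before deleting anything. Compare the `clusterID=` line in the NameNode's `VERSION` file with the one in the DataNode's. The sketch below does that, assuming the `dfs.namenode.name.dir` and `dfs.datanode.data.dir` paths from hdfs-site.xml. The function names `cluster_id` and `check_cluster_id` are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): compare the clusterID recorded by the
# NameNode with the one recorded by a DataNode. Both VERSION files are
# simple key=value files containing a "clusterID=CID-..." line.

# Print the clusterID value from a VERSION file (empty if absent).
cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# Return 0 if both IDs match, 1 on a mismatch, 2 if either ID is missing.
check_cluster_id() {
  local nn_version="$1" dn_version="$2" nn dn
  nn=$(cluster_id "$nn_version")
  dn=$(cluster_id "$dn_version")
  if [ -z "$nn" ] || [ -z "$dn" ]; then
    echo "clusterID not found in one of the VERSION files" >&2
    return 2
  fi
  if [ "$nn" != "$dn" ]; then
    echo "clusterID mismatch: NameNode=$nn DataNode=$dn" >&2
    return 1
  fi
  echo "clusterID OK: $nn"
}

# Example (on a DataNode host):
#   scp master:/opt/module/hadoop-3.3.6/dfs/name/current/VERSION /tmp/nn_VERSION
#   check_cluster_id /tmp/nn_VERSION /opt/module/hadoop-3.3.6/dfs/data/current/VERSION
```

A mismatch confirms the diagnosis. Only then is it worth going through the destructive delete-and-reformat procedure.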
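## 12. Resources and Tutoring

This article covers building the full big-data cluster from scratch. It includes complete configuration files, one-click start/stop scripts, fixes for common errors, and ways to verify the cluster. It is intended as the base environment tutorial for big-data graduation and course projects.

The full resource bundle (all configuration files, the scripts, the setup document, and a troubleshooting manual) is available by private message on CSDN.

For big-data course and graduation projects, I also offer one-on-one technical support covering:

- environment setup and debugging
- big-data project development
- data cleaning and analysis
- visualization development
- thesis planning
- defense preparation

Send me a CSDN private message if you are interested.

Tags: #BigDataEnvironmentSetup #Hadoop3.3.6 #SparkClusterSetup #HiveDataWarehouse #HBaseCluster #BigDataGraduationProject #ComputerCourseProject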