PostgreSQL两节点用keepalived实现主备的高可用架构
使用keepalived实现PostgreSQL数据库两节点主备的高可用架构部署详解环境配置和规划部署PostgreSQL的主备流复制架构keepalived介绍安装部署keepalived数据库配置配置keepalived相关参数文件启动keepalived模拟故障切换问题记录实践建议看腻了就来听听视频演示吧持续更新中https://www.bilibili.com/video/BV1K1421z7Kp/环境配置和规划hostnameOSPostgreSQLkeepalivedIPVIPdb06CentOS 7.916.22.2.8172.16.107.156172.16.107.159db07CentOS 7.916.22.2.8172.16.107.157同上部署PostgreSQL的主备流复制架构极简步骤echo172.16.107.156 db06/etc/hosts echo172.16.107.157 db07/etc/hosts systemctl stop firewalld systemctldisablefirewalld setenforce0sed-ris/SELINUXenforcing/SELINUXdisabled//etc/selinux/config echopostgres soft nproc unlimited/etc/security/limits.conf echopostgres hard nproc unlimited/etc/security/limits.conf echopostgres soft nofile 1024000/etc/security/limits.conf echopostgres hard nofile 1024000/etc/security/limits.conf echopostgres soft stack unlimited/etc/security/limits.conf echopostgres hard stack unlimited/etc/security/limits.conf echopostgres soft core unlimited/etc/security/limits.conf echopostgres hard core unlimited/etc/security/limits.conf echopostgres soft memlock unlimited/etc/security/limits.conf echopostgres hard memlock unlimited/etc/security/limits.conf yum install-y gcc zlib-devel libaio libuuid readline-devel krb5-libs libicu libicu-devel libnl libnl-devel libnl-3libnl3-devel libxslt tcl perl openldap pam openssl openssl-devel libxml2 bzip2 useradd-rmU postgres echopostgres|passwd postgres--stdinsu-postgres mkdir {pghome,pgdata,pg_archive,pg_log} chmod-R700{pghome,pgdata,pg_archive,pg_log} echoexport PGHOME/home/postgres/pghome~/.bashrc echoexport PGDATA/home/postgres/pgdata~/.bashrc echoexport PGPORT54321~/.bashrc echoexport PGDATABASEpostgres~/.bashrc echoexport LD_LIBRARY_PATH$PGHOME/lib:$LD_LIBRARY_PATH~/.bashrc echoexport PATH$PGHOME/bin:$PATH~/.bashrc source~/.bashrc tar-xzvf postgresql-16.2.tar.gz cd postgresql-16.2./configure--prefix/home/postgres/pghome/gmake world gmake install-world主节点操作initdb-D/home/postgres/pgdata/--data-checksums -E UTF8 --localezh_CN.UTF-8vi $PGDATA/postgresql.conf listen_addresses0.0.0.0port54321max_connections1024wal_buffers32MB work_mem4MB superuser_reserved_connections10max_locks_per_transaction64max_prepared_transactions0max_replication_slots10max_wal_senders50max_worker_processes8track_commit_timestampofflogging_collectoronlog_directory/home/postgres/pg_loglog_checkpointsonlog_lock_waitsonlog_statementddl log_truncate_on_rotationonlog_rotation_age1440log_rotation_size40MB wal_levelreplicawal_log_hintsonidle_in_transaction_session_timeout30minidle_session_timeout30minhba_file/home/postgres/pgdata/pg_hba.confident_file/home/postgres/pgdata/pg_ident.conf# archivearchive_modeonarchive_commandscp %p 172.16.107.156:/home/postgres/pg_archive/%farchive_timeout1800srestore_commandscp 172.16.107.156:/home/postgres/pg_archive/%f %precovery_target_timelinelatestarchive_cleanup_commandpg_archivecleanup /home/postgres/pg_archive %r# replicationcluster_namePGDBClusterhot_standbyonmax_standby_streaming_delay30s wal_receiver_status_interval10s hot_standby_feedbackonsynchronous_commitonsynchronous_standby_names# * sync; enpty asyncprimary_conninfoapplication_namepgdb01 host172.16.107.157 port54321 userreplicator passwordreplicatorvi $PGDATA/pg_hba.conf hostreplicationreplicator172.16.107.156/32scram-sha-256hostreplicationreplicator172.16.107.157/32scram-sha-256hostallall0.0.0.0/0scram-sha-256pg_ctlstartpsql-U postgres-cCREATE USER replicator REPLICATION ENCRYPTED PASSWORD replicator;备节点操作pg_basebackup-D $PGDATA-Fp-Xs-v-P-h172.16.107.156-p54321-U replicator vi $PGDATA/postgresql.conf primary_conninfoapplication_namepgdb02 host172.16.107.156 port54321 userreplicator passwordreplicatorechostandby_mode on$PGDATA/standby.signal pg_ctlstart主备状态查看pg_controldata|grep cluster# 主节点查看select*frompg_stat_replication\gx# 备节点查看select*frompg_stat_wal_receiver\gxkeepalived介绍vrrp协议的软件实现原生设计目的就是为了高可用ipvs服务keepalived官方文档https://www.keepalived.org/documentation.html功能基于vrrp协议完成地址漂移为vip地址所在的节点生成ipvs规则在配置文件中预先定义为ipvs集群的各RS做健康状态检测基于脚本调用接口完成脚本中定义的功能进而影响集群事务以此支持nginx、haproxy等服务。用户空间核心组件– vrrp stackVIP消息通告– checkers监测real server– system call实现vrrp协议状态转换时调用脚本的功能– SMTP邮件组件– IPVS wrapper生成IPVS规则– Netlink Reflector网络接口– WatchDog监控进程控制组件提供keepalived.conf的解析器完成keepalived配置IO复用器针对网络目的而优化的自己的线程抽象内存管理组件为某些通用的内存管理功能例如分配重新分配发布等提供访问权限安装部署keepalived下载keepalived选择最新版本https://www.keepalived.org/download.html安装keepalived# 安装依赖包yum install-y openssl*popt-*# 解压安装tar-xzvf keepalived-2.2.8.tar.gz cd keepalived-2.2.8./configure--prefix/usr/local/keepalivedmake make install设置keepalived为系统服务ln-s/usr/local/keepalived/sbin/keepalived/usr/sbin/cp keepalived/etc/init.d/keepalived/etc/init.d/cp/usr/local/keepalived/etc/sysconfig/keepalived/etc/sysconfig/service keepalivedstatus数据库配置创建数据库和用户createuserkeepalivedwithpasswordTest123;createdatabasekeepalived;alterdatabasekeepalived ownertokeepalived;创建同步表\c keepalived keepalived-- 创建表createtablesr_delay(idint,last_alivetimestamp(0)withouttimezonedefaultnow());-- sr_delay表插入一条数据insertintosr_delayvalues(1,now());后续keepalived会每隔指定时间探测postgresql数据库存活并且以keepalived用户登录keepalived数据库刷新这张表配置keepalived相关参数文件keepalived.conf文件mkdir-p/etc/keepalived/log/vi/etc/keepalived/keepalived.conf!ConfigurationFileforkeepalived global_defs { router_id DB1_PG_HA# 节点名各节点不同推荐使用主机名} vrrp_script check_pg_alived { script/etc/keepalived/pg_monitor.shinterval10# 脚本探测间隔每隔10sfall3# 失败尝试次数} vrrp_instance VI_1 { stateBACKUPnopreempt# 非抢占模式interface ens192# 网卡名virtual_router_id10# 虚拟路由ID各节点配置必须一致priority100# 节点优先级抢占模式与权重weight组合使用advert_int1authentication { auth_type PASS auth_pass Test123# 加入集群密码需保持一致} track_script { check_pg_alived } virtual_ipaddress {# VIP网卡信息172.16.107.159/24dev ens192 label ens192:vip }# 节点升主时执行的脚本notify_master/etc/keepalived/pg_active_standby.sh# 节点故障时执行的脚本#notify_fault /etc/keepalived/pg_fault.sh}以上是keepalived主节点配置keepalived备节点的priority修改为90节点优先级其余参数配置一样。上面程序分为以下三块global_defs通知模块定义邮件列表当keepalived发生事件时发送邮件vrrp_script定义本机检测模块每10秒执行脚本pg_monitor.shfall表示重试3次vrrp_instancevrrp实例定义模块定义了实例名称和实例路由ID实例状态定义为backup同时设置非抢占模式nopreempt当节点启动时不会抢占VIP。备节点的priority需要设置比主节点低这样高的会成为keepalived的主节点同时设置了VIP使用的网络设备是ens192。– notify_master当keepalived角色从备转成主时触发脚本– notify_fault当keepalived角色发生故障时执行的脚本监控pg_monitor.sh文件vi/etc/keepalived/pg_monitor.sh#!/bin/bash# env settingexport PGPORT54321export PGUSERkeepalived export PGDBNAMEkeepalived export PGDATA/home/postgres/pgdata export LANGen_US.utf8 export PGHOME/home/postgres/pghome export LD_LIBRARY_PATH$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib export PATH$PGHOME/bin:$PATH: MONITOR_LOG/etc/keepalived/log/pg_monitor.logSQL1update sr_delay set last_alivenow();SQL2select 1;# 脚本不检查备库存活状态如果是备库直接退出standby_flgpsql -p $PGPORT -U postgres -At -c select pg_is_in_recovery();if[${standby_flg}t];thenecho-edate %F\ %T: This is a standby database, exit!\n$MONITOR_LOGexit0fi# 主库更新sr_delay表psql-At-d $PGDBNAME-p $PGPORT-U $PGUSER-c$SQL2if[$?-eq0];thenecho $SQL1|psql-At-d $PGDBNAME-p $PGPORT-U $PGUSER echo-edate %F\ %T: Primary db is health.$MONITOR_LOGexit0elseecho-edate %F\ %T: Attention: Primary db is not health.$MONITOR_LOGexit1fi切换pg_avtive_standby.sh脚本文件发生failover故障切换脚本内容vi/etc/keepalived/pg_active_standby.sh#!/bin/bash#env settingexport PGPORT54321export PGUSERkeepalived export PG_OS_USERpostgres export PGDBNAMEkeepalived export PGDATA/home/postgres/pgdata export LANGen_US.utf8 export PGHOME/home/postgres/pghome export LD_LIBRARY_PATH$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib export PATH$PGHOME/bin:$PATH:.# variables settingLAG_MINUTES60HOST_IPhostname -iNOTICE_EMAILxxooqq.comFAILOVE_LOG/etc/keepalived/log/pg_failover.logSQL1select this_is_standby as cluster_role from (select pg_is_in_recovery() as std ) t where t.std is true;SQL2select standby_in_allowed_lag as cluster_lag from sr_delay where now() - last_alive interval $LAG_MINUTES SECONDS;# 配置对端远程管理卡ip地址、用户名、密码FENCE_IP172.16.107.157FENCE_USERroot FENCE_PWDroot# VIP 发生漂移纪录到日志echo-edate %F\ %T: keepalived VIP switchover!$FAILOVE_LOG# pg_failover函数当主库故障时激活pg_failover(){ su-$PG_OS_USER-cpg_ctl promote -D ${PGDATA}if[$?-eq0];thenecho-edate %F\ %T: hostname promote standby success.$FAILOVE_LOG fi }# 检查备库是否正常STANDBY_CNTecho $SQL1 | psql -At -d $PGDBNAME -p $PGPORT -U $PGUSER -f - | grep -c this_is_standbyecho-eSTANDBY_CNT: $STANDBY_CNT$FAILOVE_LOGif[$STANDBY_CNT-ne1];thenecho-edate %F\ %T: hostname is not standby database, failover not allow!$FAILOVE_LOGexit1fiif[$STANDBY_CNT-eq1];thenpg_failover$FAILOVE_LOG fi脚本授权chmodx/etc/keepalived/pg_*.sh启动keepalivedservice keepalivedstart模拟故障切换# 主节点pg_ctl stop# 观察日志打印信息和VIP的漂移情况tail-20f/etc/keepalived/log/pg_monitor.log tail-20f/etc/keepalived/log/pg_failover.log ip a# 原主节点已新备节点加入新集群echostandby_mode on$PGDATA/standby.signal pg_ctlstart问题记录问题一节点健康语句检查返回# 问题返回$? 只保存上一个命令的退出状态0成功1失败echoselect 1;|psql-At-p54321-U keepalived-d keepalived echo $? echoselect 1 from dual;|psql-At-p54321-U keepalived-d keepalived echo $?# 推荐使用psql-At-p54321-U keepalived-d keepalived-cselect 1;echo $? psql-At-p54321-U keepalived-d keepalived-cselect 1 from dual;echo $?问题二Keepalived默认参数配置路径Keepalived默认的配置文件路径在/etc/keepalived/keepalived.confKeepalived默认的日志文件保存在OS日志/var/log/messages[rootdb06~]# cat /usr/local/keepalived/etc/sysconfig/keepalived# Options for keepalived. See keepalived --help output and keepalived(8) and# keepalived.conf(5) man pages for a list of all options. Here are the most# common ones :## --vrrp -P Only run with VRRP subsystem.# --check -C Only run with Health-checker subsystem.# --dont-release-vrrp -V Dont remove VRRP VIPs VROUTEs on daemon stop.# --dont-release-ipvs -I Dont remove IPVS topology on daemon stop.# --dump-conf -d Dump the configuration data.# --log-detail -D Detailed log messages.# --log-facility -S 0-7 Set local syslog facility (defaultLOG_DAEMON)#KEEPALIVED_OPTIONS-D“-D”输出日志的选项-f修改默认配置文件路径-S修改日志文件的默认路径-S 0 表示local0.* 具体路径看/etc/rsyslog.conf文件配置vi/usr/local/keepalived/etc/sysconfig/keepalived KEEPALIVED_OPTIONS-f /etc/keepalived/keepalived.conf -D -S 0echolocal0.* /etc/keepalived/log/keepalived.log/etc/rsyslog.conf systemctl restart rsyslog systemctl restart keepalived问题三Keepalived主节点故障发生漂移Keepalived主节点故障此时VIP会发生漂移备库会被提升为主库此时数据库会出现双主现象# 模拟主节点故障systemctl stop keepalived# 若此时停原主库当备库重新加入集群会失败pg_ctl stop echostandby_mode on$PGDATA/standby.signal pg_ctlstart# 数据库日志打印报错消息2024-04-2215:11:02.412CST[8653]FATAL: couldnotstartWAL streaming: ERROR: requestedstartingpoint0/8A000000ontimeline4isnotinthis servers history DETAIL: This servers history forkedfromtimeline4at0/89028940.2024-04-2215:11:03.015CST[8467]LOG: new timeline5forkedoffcurrentdatabasesystem timeline4beforecurrentrecoverypoint0/8A0000A0# 此时可以查看数据库的时间线发现当新的备库时间线确实与新主库不一致pg_controldata|grep TimeLineID# 可用pg_rewind来同步时间线pg_ctl stop[postgresdb07~]$ pg_rewind--target-pgdata/home/postgres/pgdata --source-serverhost172.16.107.156 port54321 userpostgres dbnamepostgres passwordpostgrespg_rewind: servers diverged at WAL location0/89028940ontimeline4pg_rewind: rewindingfromlastcommoncheckpointat0/89027978ontimeline4pg_rewind: Done!# 修复后调整最后一行流复制的参数配置再重新拉起同上面搭建备库步骤一样vi $PGDATA/postgresql.conf primary_conninfoapplication_namepgdb02 host172.16.107.156 port54321 userreplicator passwordreplicatorechostandby_mode on$PGDATA/standby.signal pg_ctlstartpg_rewind使用要求目标服务器wal_log_hintsonfull_page_writesoninitdb初始化库时启用了数据校验checksums问题四数据库编译时报错数据库编译时缺少icu依赖# 处理方式一安装icu依赖yum install-y libicu libicu-devel# 处理方式二指定不编译icu模块./configure--prefix/home/postgres/pghome/ --without-icu问题五keepalive编译时报错keepalive编译时告警缺少icu依赖libnl依赖# 处理方式一安装libnl依赖yum install-y libnl libnl-devel libnl-3libnl3-devel# 处理方式二忽略IPV6的相关功能影响不大!实践建议配置SMTP邮件告警https://www.modb.pro/db/1785122224978071552做好备份冗余发现了一个前沿巨牛的宝藏人工智能学习网站通俗易懂风趣幽默忍不住给大家分享一下。戳一下跳转到学习