@zhangyy 2021-07-23T23:02:29.000000Z

Greenplum fault handling

Greenplum series



1: Handling a standby master failure in a Greenplum distributed cluster

  1. Simulate a failure of the standby master: delete its data directory, then reboot
  2. rm -rf /greenplum/gpdata/master/*

  1. Normal state
  2. gpstate -f



  1. rm -rf /greenplum/gpdata/master/*
  2. reboot



  1. On the master node:
  2. gpinitstandby -r   (removes the standby node)



  1. Add a standby
  2. gpinitstandby -s node02.flyfish.cn
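The remove-and-re-add sequence above can be wrapped in a small script. This is a minimal sketch. It assumes the script runs as gpadmin on the active master with the Greenplum environment (greenplum_path.sh) already sourced. `run`, `DRY_RUN` and `rebuild_standby` are hypothetical helpers, not Greenplum utilities. The script adds `-a` so that `gpinitstandby` does not stop to ask for confirmation.

```shell
#!/usr/bin/env bash
# Minimal sketch: remove a failed standby master and re-create it.
# Assumes: run as gpadmin on the active master, Greenplum environment sourced.

# Hypothetical helper: with DRY_RUN=1 the command is printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: rebuild_standby <standby-host>
rebuild_standby() {
  local standby_host="$1"
  if [ -z "$standby_host" ]; then
    echo "usage: rebuild_standby <standby-host>" >&2
    return 2
  fi
  # Drop the failed standby from the cluster configuration (-a: no prompt).
  run gpinitstandby -r -a || return 1
  # Initialize a new standby master on the given host.
  run gpinitstandby -a -s "$standby_host" || return 1
  # Show the standby master details to confirm it is back.
  run gpstate -f
}
```

`DRY_RUN=1 rebuild_standby node02.flyfish.cn` prints the three commands without running them. This is a cheap way to review the sequence before touching the cluster.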



2: The Greenplum master goes down

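  1. Simulate a master failure: delete the master data directory, then reboot
  2. rm -rf /greenplum/gpdata/master/*
  3. reboot

If only the standby node has failed, run the following on the master instead:

  1. Remove the failed standby node
  2. gpinitstandby -r -a
  3. Re-create and re-synchronize the standby node
  4. gpinitstandby -s node02.flyfish.cn

When the master itself has failed:

  1. First promote the standby to master
  2. Activate the standby (run this on the standby host):
  3. gpactivatestandby -d /greenplum/gpdata/master/gpseg-1
  4. gpstate -f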
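A sketch of the failover step. It assumes the script runs as gpadmin on the standby host with the Greenplum environment sourced. `promote_standby` and `run` are hypothetical names, and the default directory is the one used in this article. `-a` skips the confirmation prompt of `gpactivatestandby`.

```shell
#!/usr/bin/env bash
# Minimal sketch: promote the standby master after the master host has failed.
# Assumes: run as gpadmin on the standby host, Greenplum environment sourced.

# Hypothetical helper: with DRY_RUN=1 the command is printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: promote_standby [standby-master-data-directory]
promote_standby() {
  # Default to the standby master directory used in this article.
  local datadir="${1:-/greenplum/gpdata/master/gpseg-1}"
  # Turn the standby into the acting master (-a: no confirmation prompt).
  run gpactivatestandby -a -d "$datadir" || return 1
  # Management utilities locate the master through MASTER_DATA_DIRECTORY.
  export MASTER_DATA_DIRECTORY="$datadir"
  # Show standby details; right after promotion no standby is configured yet.
  run gpstate -f
}
```

Until a new standby is added, the cluster runs without master redundancy. The `gpinitstandby -s` step that follows in this article closes that gap.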


  1. psql -c "select * from gp_segment_configuration order by content asc,dbid;"
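The `gp_segment_configuration` query above lists every instance. A small filter can pick out the instances marked down (`status = 'd'`). This sketch assumes the query is run with `psql -At`, which prints one row per line with `|` between fields. `down_segments` and `SEG_QUERY` are hypothetical names.

```shell
#!/usr/bin/env bash
# Minimal sketch: list segment instances marked down in gp_segment_configuration.

# Query for the master; run it with: psql -At -c "$SEG_QUERY"
SEG_QUERY="select content, role, preferred_role, status, hostname from gp_segment_configuration order by content, dbid;"

# Hypothetical helper: reads psql -At output ('|'-separated rows) on stdin and
# prints one line per instance whose status column (4th field) is 'd' (down).
down_segments() {
  awk -F'|' '$4 == "d" { printf "content=%s role=%s preferred=%s host=%s\n", $1, $2, $3, $5 }'
}

# Usage on the master:
#   psql -At -c "$SEG_QUERY" | down_segments
```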


  1. Then add a standby again
  2. gpinitstandby -s node01.flyfish.cn
  3. This way the old master (node01.flyfish.cn) becomes the standby
  4. and node02.flyfish.cn becomes the master



  1. Switch the master back
  2. On node02.flyfish.cn, stop the acting master:
  3. gpstop -m -a


  1. On node01.flyfish.cn, promote it to master again:
  2. gpactivatestandby -d /greenplum/gpdata/master/gpseg-1


  1. On node02.flyfish.cn, remove the old master directory and reboot:
  2. rm -rf /greenplum/gpdata/master/gpseg-1
  3. reboot



  1. On node01.flyfish.cn:
  2. add node02.flyfish.cn back as the standby
  3. gpinitstandby -a -s node02.flyfish.cn
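The switch-back steps run on two hosts. The sketch below drives them from one shell over SSH. It assumes passwordless SSH as gpadmin, and that a non-interactive SSH session on each host picks up the Greenplum environment (for example, greenplum_path.sh and MASTER_DATA_DIRECTORY set from ~/.bashrc). Host names and the path are the ones used in this article. `switch_back` and `run` are hypothetical. Unlike the manual steps, it does not reboot node02.flyfish.cn.

```shell
#!/usr/bin/env bash
# Minimal sketch: move the master role back from node02 to node01.
# Assumes passwordless SSH as gpadmin and that non-interactive SSH sessions
# have the Greenplum environment (greenplum_path.sh, MASTER_DATA_DIRECTORY).

# Hypothetical helper: with DRY_RUN=1 the command is printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

ORIGINAL_MASTER="node01.flyfish.cn"  # original master, currently the standby
ACTING_MASTER="node02.flyfish.cn"    # promoted standby, currently the master
MASTER_DIR="/greenplum/gpdata/master/gpseg-1"

# Hypothetical helper: switch_back
switch_back() {
  # 1. Stop only the master instance on the acting master (-m), no prompt (-a).
  run ssh "$ACTING_MASTER" "gpstop -m -a" || return 1
  # 2. Promote the standby on the original master host.
  run ssh "$ORIGINAL_MASTER" "gpactivatestandby -a -d $MASTER_DIR" || return 1
  # 3. Remove the stale master directory so it can be re-created as a standby.
  run ssh "$ACTING_MASTER" "rm -rf $MASTER_DIR" || return 1
  # 4. From the restored master, add node02 back as the standby.
  run ssh "$ORIGINAL_MASTER" "gpinitstandby -a -s $ACTING_MASTER"
}
```

Each step stops the sequence on failure. Use `DRY_RUN=1 switch_back` first to check the commands and host names.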



3: Segment host failure

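  1. Simulate a segment host failure
  2. A host in a cluster can fail because of excessive load, a bad CPU, bad memory, or a bad disk
  3. ---
  4. Example: the segment host node03.flyfish.cn fails
  5. Buy a new server:
  6. Install the OS and configure it, set up passwordless SSH, and give it the same IP address and the same hostname; the preparation is the same as when adding a new host to the cluster.
  7. ---
  8. Simulate the failure by deleting the segment data on node03.flyfish.cn:
  9. rm -rf /greenplum/gpdata/primary/*
  10. rm -rf /greenplum/gpdata/mirror/*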
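Before recovering onto the replacement server, confirm two of the requirements listed above from the master: passwordless SSH and the hostname. This is a sketch. `check_replacement_host` and the `SSH` override variable are hypothetical. It assumes the host's `hostname` output is exactly the name the cluster uses (node03.flyfish.cn here). It does not check the IP address.

```shell
#!/usr/bin/env bash
# Minimal sketch: pre-check a replacement segment host before gprecoverseg.
# Assumes the host's `hostname` output equals the name the cluster uses.

# Hypothetical helper: check_replacement_host <host>
# The SSH variable can point at another command (e.g. a test stub).
check_replacement_host() {
  local host="$1"
  local reported
  # BatchMode=yes makes ssh fail instead of asking for a password, so a
  # success here also shows that key-based (passwordless) login works.
  if ! reported="$("${SSH:-ssh}" -o BatchMode=yes "$host" hostname)"; then
    echo "FAIL: passwordless SSH to $host does not work" >&2
    return 1
  fi
  if [ "$reported" != "$host" ]; then
    echo "FAIL: $host reports hostname '$reported'" >&2
    return 1
  fi
  echo "OK: $host"
}

# Usage on the master:
#   check_replacement_host node03.flyfish.cn
```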



  1. gpstate -f
  2. node03.flyfish.cn is down


  1. On node01.flyfish.cn (the master), generate a recovery configuration file:
  2. gprecoverseg -o ./recover


  1. gprecoverseg -i ./recover   (recover the failed segments listed in the file)
  2. gprecoverseg -i ./recover -F   (full recovery: copy all data from the active segments)
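The generate/review/apply cycle can be scripted as below. This is a sketch. It assumes the script runs as gpadmin on the master with the Greenplum environment sourced. `recover_segments` and `run` are hypothetical.

Without `-F`, `gprecoverseg` performs an incremental recovery. Incremental recovery needs the failed segment's data directory to still be usable. When the directories were wiped, as in this example, a full recovery (`-F`) is the appropriate choice. `-a` suppresses the confirmation prompt.

```shell
#!/usr/bin/env bash
# Minimal sketch: generate, review, and apply a segment recovery configuration.
# Assumes: run as gpadmin on the master, Greenplum environment sourced.

# Hypothetical helper: with DRY_RUN=1 the command is printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: recover_segments [config-file] [incremental|full]
recover_segments() {
  local conf="${1:-./recover}"
  local mode="${2:-incremental}"
  # Write a recovery configuration covering all failed segments.
  run gprecoverseg -o "$conf" || return 1
  # Show the file so the failed/recovery locations can be reviewed.
  run cat "$conf" || return 1
  if [ "$mode" = "full" ]; then
    # -F: full copy from the active segment instead of an incremental resync.
    run gprecoverseg -a -i "$conf" -F
  else
    run gprecoverseg -a -i "$conf"
  fi
}
```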



  1. Simulate damage to a single segment
  2. cd /greenplum/gpdata/mirror/gpseg3
  3. rm -rf pg_tblspc/
  4. Kill this segment's process (93213 was its PID in this environment)
  5. kill -9 93213
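The PID 93213 belonged to the author's environment, so hard-coding it will not carry over. It can be read from the segment's data directory instead: as in PostgreSQL, each running Greenplum instance keeps its postmaster PID on the first line of `postmaster.pid`. `segment_pid` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Minimal sketch: read a segment's postmaster PID from its data directory.
# A running instance writes its postmaster PID on the first line of
# <datadir>/postmaster.pid.

# Hypothetical helper: segment_pid <segment-data-directory>
segment_pid() {
  local datadir="$1"
  local pidfile="$datadir/postmaster.pid"
  if [ ! -r "$pidfile" ]; then
    echo "no postmaster.pid in $datadir (segment not running?)" >&2
    return 1
  fi
  head -n 1 "$pidfile"
}

# Usage on the segment host (simulates a crash, as in the steps above):
#   kill -9 "$(segment_pid /greenplum/gpdata/mirror/gpseg3)"
```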



  1. Run the recovery: first write the recovery configuration file and inspect it
  2. gprecoverseg -o ./recover1
  3. cat ./recover1


  1. Recover:
  2. gprecoverseg -i recover1 -F



  1. gprecoverseg -r
  2. Rebalance the segments (return each segment to its preferred primary/mirror role)
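`gprecoverseg -r` is only needed when some segments are not in their preferred role. A sketch of that check follows. It assumes `psql` connects with your default connection settings, as with the `gp_segment_configuration` query earlier in this article. `needs_rebalance` and the `PSQL` override variable are hypothetical. The function returns 0 when a rebalance is needed.

```shell
#!/usr/bin/env bash
# Minimal sketch: check whether any segment is outside its preferred role.
# The PSQL variable can point at another command (e.g. a test stub).

# Hypothetical helper: returns 0 if a rebalance is needed, 1 if not,
# 2 if the query failed or returned something unexpected.
needs_rebalance() {
  local n
  n="$("${PSQL:-psql}" -At -c "select count(*) from gp_segment_configuration where role <> preferred_role;")" || return 2
  case "$n" in
    ''|*[!0-9]*) return 2 ;;
  esac
  if [ "$n" -gt 0 ]; then
    echo "$n segment(s) not in preferred role; run gprecoverseg -r"
    return 0
  fi
  echo "all segments in preferred role"
  return 1
}
```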


  1. Restart the cluster:
  2. gpstop -r -a