@zhangyy
2020-10-26T14:49:51.000000Z
字数 4183
阅读 158
greenplum系列
数据量的增加,磁盘空间不足
查询慢,性能跟不上,CPU,磁盘IO
网络传输量增加,网卡限制
扩容一般指扩segment:
横向扩展:增加segment 机器,3台主机:3seg >>> 6 台主机 3 个seg,一般是物理主机性能不够,数据库性能足够/不足
纵向扩展:扩单个主机上面的segment服务,3台主机:3 seg >> 3 台 主机:6 seg 一般是 物理主机性能足够,数据库性能不足
纵向扩展,加机器,整体流程
1. 新机器环境准备,信任关系
2. 新机器需要安装GP数据库软件
3. 原集群的一些操作准备
4. 扩容操作(6.x 之后可以在线扩容)
5. 数据重新分布
6. 统计信息需要更新
7.元数据一致性验证
1.数据准备
create role itpux with login password 'itpux';
alter role itpux superuser createdb createrole replication;
create database itpuxdb WITH OWNER itpux ENCODING ='UTF8';
---
2. 导入数据验证:
psql -U itpux -d itpuxdb -h192.168.100.11 < itpuxdb-itpux_yg.sql
\c itpuxdb
\du+
\dt+
select gp_segment_id,count(*) from itpux_yg group by gp_segment_id;
select * from pg_indexes where tablename='itpux_yg';
select count(*) from itpux_yg;
存在索引
创建主机文件
all_hosts.txt 原来的所有主机
---
node01.vpc.flyfish.cn
node02.vpc.flyfish.cn
node03.vpc.flyfish.cn
node04.vpc.flyfish.cn
node05.vpc.flyfish.cn
----
all_seg.txt 所有的segment 的 主机
---
node03.vpc.flyfish.cn
node04.vpc.flyfish.cn
node05.vpc.flyfish.cn
---
新主机文件:
all_hosts_20201023.txt
-------
node01.vpc.flyfish.cn
node02.vpc.flyfish.cn
node03.vpc.flyfish.cn
node04.vpc.flyfish.cn
node05.vpc.flyfish.cn
node06.vpc.flyfish.cn
node07.vpc.flyfish.cn
-------
all_seg_20201023.txt
---
node03.vpc.flyfish.cn
node04.vpc.flyfish.cn
node05.vpc.flyfish.cn
node06.vpc.flyfish.cn
node07.vpc.flyfish.cn
----
seg_host_20201023.txt
---
node06.vpc.flyfish.cn
node07.vpc.flyfish.cn
---
cd /usr/local/greenplum-db/config
创建all_hosts_20201023.txt all_seg_20201023.txt seg_host_20201023.txt
三个文件将内容 填进去
打开新增加的主机
node06.vpc.flyfish.cn
node07.vpc.flyfish.cn
两台主机
node06.vpc.flyfish.cn 环境初始化:同安装时候一样
node07.vpc.flyfish.cn 环境初始化:同安装时候一样
一直到:
.......
----
chown -R gpadmin:gpadmin /usr/local/greenplum*
chmod -R 775 /usr/local/greenplum*
chmod -R 775 /greenplum
chown -R gpadmin:gpadmin /greenplum
----
然后做gpadmin无密钥登录:
增加 新增节点的的密钥之后测试:
gpssh -f /usr/local/greenplum-db/config/all_hosts_20201023.txt -e 'ls -ls /greenplum'
gpssh -f /usr/local/greenplum-db/config/seg_hosts_20201023.txt -e 'ls -ls /greenplum'
gpssh -f /usr/local/greenplum-db/config/all_hosts_20201023.txt -e 'date'
元数据监测
gpcheckcat -O -v
唯一建的处理
select * from pg_indexes where indexdef like 'CREATE UNIQUE INDEX%' and tablename not
like 'pg_%' and tablename not like 'gp_%';
倒出唯一建
cd /greenplum/soft/
pg_dump -s -f itpuxdb-index.sql --gp-syntax itpuxdb
cat itpuxdb-index.sql
删掉 索引
drop index indx_itpux_yg_jobid;
备份数据
pg_dump / pgbackup
pg_restore / pgrestore
扩容操作:
cd /usr/local/greenplum-db/config
gpexpand -f seg_hosts_20201026.txt (蒋增加的节点写进去)
cat gpexpand_inputfile_20201026_120304
在线扩容避免业务使用
可以改pg_hba.conf
host all all 0.0.0.0/0 reject
然后重启 greenplum
gpexpand -i gpexpand_inputfile_20201026_120304
psql -c "select hostname from gp_segment_configuration group by hostname;"
psql -c "select * from gp_segment_configuration where status != 'u' or role !=
preferred_role or mode != 's';"
psql -c "select * from gp_segment_configuration order by content asc,dbid;"
select * from gpexpand.expansion_progress;
select status,count(1),round(sum(source_bytes/1024/1024/1024),2) as size_gb from gpexpand.status_detail group by status order by 1;
select fq_name,expansion_started,expansion_finished,source_bytes from gpexpand.status_detail where status='IN PROGRESS';
select gp_segment_id,count(*) from itpux_yg group by gp_segment_id;
重新分布
gpexpand -a -n 2 -d 00:05:00
gpexpand -a -n 2 -d 05:30:00 >> gpexpand20200502.log &
生产环境数据比较大要很多个小时
select gp_segment_id,count(*) from itpux_yg group by gp_segment_id;
查看平衡进度
select * from gpexpand.expansion_progress;
select status,count(1),round(sum(source_bytes/1024/1024/1024),2) as size_gb from gpexpand.status_detail group by status order by 1;
select fq_name,expansion_started,expansion_finished,source_bytes from gpexpand.status_detail where status='IN PROGRESS';
select * from gpexpand.status;
select * from gp_distribution_policy;
select gp_segment_id,count(*) from itpux_yg group by gp_segment_id;
psql -d itpuxdb -h 192.168.1.11 < itpuxdb-index.sql 将数据索引导入进去
在重新检查数据
gpcheckcat -O -v
清掉扩容信息
gpextend -c
5个主segment主机,3p+3M=6个*5=30 segment
5个主segment主机,个扩展1P+1M, 4P+4M=8个*5=40 segment
psql -c "select * from gp_segment_configuration order by content asc,dbid;"
segment 所有主机
cat all_seg_20201026.txt
gpexpand -f all_seg_20201026.txt
gpssh -f /usr/local/greenplum-db/config/all_seg_20201026.txt -e 'ls -ls /greenplum/gpdata/primary'
psql -c "select * from gp_segment_configuration order by content asc,dbid;"
数据重新分布
gpexpand -a -n 2 -d 00:05:00
select gp_segment_id,count(*) from itpux_yg group by gp_segment_id;
将配置信息清掉
gpexpand -c