@zhangyy
2020-12-28T05:28:43.000000Z
Kubernetes series
- 1: Introduction to Kubernetes high availability
- 2: Deploying Kubernetes high availability
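High-availability architecture (scaling out to a multi-Master architecture)

As a container cluster system, Kubernetes gives Pods self-healing through health checks plus restart policies, uses its scheduling algorithm to spread Pods across nodes while keeping the desired replica count, and automatically brings Pods up on other Nodes when a Node fails. Together these provide high availability at the application layer.

For the Kubernetes cluster itself, high availability also has to cover two more layers: the Etcd database and the Kubernetes Master components. Etcd is already highly available because we built it as a 3-node cluster, so this section explains and implements high availability for the Master node.

The Master node acts as the control center: it keeps the whole cluster healthy by continuously communicating with the Kubelet on each worker node. If the Master node fails, no cluster management is possible through kubectl or the API.

The Master node runs three main services: kube-apiserver, kube-controller-manager and kube-scheduler. kube-controller-manager and kube-scheduler already achieve high availability on their own through leader election, so Master high availability is mainly about kube-apiserver. Because that component serves an HTTP API, making it highly available is much like doing so for a web server: put a load balancer in front of it, after which it can also be scaled out horizontally.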


Deploy Docker on the node04.flyfish node

2.1 Unpack the binary package

tar zxvf docker-19.03.9.tgz
mv docker/* /usr/bin


2.2 Manage Docker with systemd

cat > /usr/lib/systemd/system/docker.service << "EOF"
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target
EOF
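
When a unit file is written through a heredoc, the way the delimiter is quoted decides whether `$MAINPID` reaches the file literally. systemd substitutes `$MAINPID` itself when it runs `ExecReload`, so the literal text is what should land in the file. The following is a small stand-alone demonstration, not part of the deployment; the variable names `unquoted` and `quoted` are illustrative only.

```shell
#!/bin/bash
# An ordinary login shell has no MAINPID; model that here as an empty value.
MAINPID=""

# Unquoted delimiter: the shell expands $MAINPID before cat sees the text.
unquoted=$(cat << EOF
ExecReload=/bin/kill -s HUP $MAINPID
EOF
)

# Quoted delimiter: the text is passed through untouched.
quoted=$(cat << "EOF"
ExecReload=/bin/kill -s HUP $MAINPID
EOF
)

printf 'unquoted: [%s]\n' "$unquoted"
printf 'quoted:   [%s]\n' "$quoted"
```

After writing the real unit file, `grep MAINPID /usr/lib/systemd/system/docker.service` is a quick way to confirm the variable name survived.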
2.3 Create the configuration file

mkdir /etc/docker
cat > /etc/docker/daemon.json << EOF
{
  "registry-mirrors": ["https://b9pmyelo.mirror.aliyuncs.com"]
}
EOF

registry-mirrors: the Alibaba Cloud registry mirror (image accelerator).

2.4 Start Docker and enable it at boot

systemctl daemon-reload
systemctl start docker
systemctl enable docker

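Deploy the Master2 node (192.168.100.14)

Every operation on Master2 is the same as on the already deployed Master1, so we only need to copy all K8s files from Master1, change the server IP and hostname, and start the services.

1. Create the etcd certificate directory

On Master2, create the etcd certificate directory:

mkdir -p /opt/etcd/ssl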

2. Copy files (run on Master1)

Copy all K8s files and the etcd certificates from Master1 to Master2:

scp -r /opt/kubernetes root@192.168.100.14:/opt
scp -r /opt/cni/ root@192.168.100.14:/opt
scp -r /opt/etcd/ssl root@192.168.100.14:/opt/etcd
scp /usr/lib/systemd/system/kube* root@192.168.100.14:/usr/lib/systemd/system
scp /usr/bin/kubectl root@192.168.100.14:/usr/bin

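3. Delete certificate files

On Master2, delete the kubelet certificates and kubeconfig file. They were copied from Master1, and the kubelet on Master2 has to request its own (approved in step 7):

rm -f /opt/kubernetes/cfg/kubelet.kubeconfig
rm -f /opt/kubernetes/ssl/kubelet*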

4. Change the IP and hostname in the configuration files

Change the apiserver, kubelet and kube-proxy configuration files to use the local IP and hostname:

vim /opt/kubernetes/cfg/kube-apiserver.conf
...
--bind-address=192.168.100.14 \
--advertise-address=192.168.100.14 \
...

vim /opt/kubernetes/cfg/kubelet.conf
--hostname-override=node04.flyfish

vim /opt/kubernetes/cfg/kube-proxy-config.yml
hostnameOverride: node04.flyfish
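
The same edits can be scripted instead of made by hand in vim. The following is a minimal sketch, assuming GNU sed and that each flag appears as `--flag=value` followed by whitespace, as in the excerpts above. The function names `set_apiserver_ip`, `set_kubelet_hostname` and `set_proxy_hostname` are hypothetical helpers, not part of any Kubernetes tooling.

```shell
#!/bin/bash
# set_apiserver_ip FILE IP
# Rewrites the values of --bind-address and --advertise-address in FILE.
# Assumes GNU sed (-i, -E) and a value terminated by whitespace.
set_apiserver_ip() {
    local file=$1 ip=$2
    sed -i -E \
        -e "s#(--bind-address=)[^[:space:]]+#\1${ip}#" \
        -e "s#(--advertise-address=)[^[:space:]]+#\1${ip}#" \
        "$file"
}

# set_kubelet_hostname FILE NAME
# Rewrites the value of --hostname-override in FILE.
set_kubelet_hostname() {
    local file=$1 name=$2
    sed -i -E "s#(--hostname-override=)[^[:space:]]+#\1${name}#" "$file"
}

# set_proxy_hostname FILE NAME
# Rewrites the top-level "hostnameOverride:" key of the kube-proxy YAML file.
set_proxy_hostname() {
    local file=$1 name=$2
    sed -i -E "s#^(hostnameOverride:[[:space:]]*).*#\1${name}#" "$file"
}

# Example (run on Master2):
#   set_apiserver_ip     /opt/kubernetes/cfg/kube-apiserver.conf 192.168.100.14
#   set_kubelet_hostname /opt/kubernetes/cfg/kubelet.conf node04.flyfish
#   set_proxy_hostname   /opt/kubernetes/cfg/kube-proxy-config.yml node04.flyfish
```

It is worth running `grep -n 'address\|hostname' /opt/kubernetes/cfg/*` afterwards to review the result before starting the services.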



5. Start the services and enable them at boot

systemctl daemon-reload
systemctl start kube-apiserver
systemctl start kube-controller-manager
systemctl start kube-scheduler
systemctl start kubelet
systemctl start kube-proxy
systemctl enable kube-apiserver
systemctl enable kube-controller-manager
systemctl enable kube-scheduler
systemctl enable kubelet
systemctl enable kube-proxy

6. Check the cluster component status

kubectl get cs

7. Approve the kubelet certificate request

On the node01.flyfish node, list the requests and approve the new one:

kubectl get csr
kubectl certificate approve node-csr-fyeyjxpS4JMpC2QvfmLOyeBbYUiMoYTSTGQETWVlqD4
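
When several nodes join at once, copying each request name by hand gets tedious. The following is a minimal sketch that picks out the pending requests from `kubectl get csr --no-headers`, relying on the condition being the last column. `pending_csrs` is a hypothetical helper name, and the approve command is left commented out so that nothing is approved without review.

```shell
#!/bin/bash
# pending_csrs
# Reads the output of `kubectl get csr --no-headers` on stdin and prints the
# name (first column) of every request whose last column is exactly "Pending".
pending_csrs() {
    awk '$NF == "Pending" { print $1 }'
}

# Example (review the list first, then approve):
#   kubectl get csr --no-headers | pending_csrs
#   kubectl get csr --no-headers | pending_csrs | xargs -r -n1 kubectl certificate approve
```

Requests that are already handled show a condition such as `Approved,Issued` and are skipped by the exact match on `Pending`.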


kubectl get node

kube-apiserver high-availability architecture diagram:

Deploy nginx and keepalived on node05.flyfish and node07.flyfish

Note: VMware Harbor is deployed on node06.flyfish.

yum install epel-release -y
yum install nginx keepalived -y
cat > /etc/nginx/nginx.conf << "EOF"
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

# Layer-4 load balancing for the apiserver component of the two Masters
stream {

    log_format  main  '$remote_addr $upstream_addr - [$time_local] $status $upstream_bytes_sent';

    access_log  /var/log/nginx/k8s-access.log  main;

    upstream k8s-apiserver {
        server 192.168.100.11:6443;   # Master1 APISERVER IP:PORT
        server 192.168.100.14:6443;   # Master2 APISERVER IP:PORT
    }

    server {
        listen 6443;
        proxy_pass k8s-apiserver;
    }
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    server {
        listen       80 default_server;
        server_name  _;

        location / {
        }
    }
}
EOF
keepalived configuration on the master node (node05.flyfish):

cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id NGINX_MASTER
}

vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
}

vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 51 # VRRP router ID; unique per instance
    priority 100         # priority; set to 90 on the backup server
    advert_int 1         # VRRP advertisement (heartbeat) interval, default 1 second
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    # virtual IP
    virtual_ipaddress {
        192.168.100.100/24
    }
    track_script {
        check_nginx
    }
}
EOF
vrrp_script: specifies the script that checks whether nginx is working (failover is decided from nginx's state)
virtual_ipaddress: the virtual IP (VIP)
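nginx status check script:

cat > /etc/keepalived/check_nginx.sh << "EOF"
#!/bin/bash
# Count nginx processes, excluding the grep itself and this script's own PID
count=$(ps -ef |grep nginx |egrep -cv "grep|$$")

if [ "$count" -eq 0 ];then
    exit 1
else
    exit 0
fi
EOF
chmod +x /etc/keepalived/check_nginx.sh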
keepalived configuration on the backup node (node07.flyfish):

cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id NGINX_BACKUP
}

vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51 # VRRP router ID; unique per instance
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.100.100/24
    }
    track_script {
        check_nginx
    }
}
EOF
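The nginx status check script referenced in the configuration above (create it on the backup node as well):

cat > /etc/keepalived/check_nginx.sh << "EOF"
#!/bin/bash
# Count nginx processes, excluding the grep itself and this script's own PID
count=$(ps -ef |grep nginx |egrep -cv "grep|$$")

if [ "$count" -eq 0 ];then
    exit 1
else
    exit 0
fi
EOF
chmod +x /etc/keepalived/check_nginx.sh

Note: keepalived decides whether to fail over based on the script's exit code (0 means healthy, non-zero means unhealthy).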
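
The exit-code contract can also be met without parsing `ps` output. The following sketch is an alternative to the author's script, not a replacement taken from the source: it reads the PID file that nginx.conf declares (`pid /run/nginx.pid;`) and probes that process with `kill -0`. `nginx_running` is a hypothetical function name, and the sketch assumes the check runs as a user allowed to signal the nginx master process (for example root).

```shell
#!/bin/bash
# nginx_running [PIDFILE]
# Returns 0 if PIDFILE holds the PID of a live process, non-zero otherwise.
# The default path matches the `pid /run/nginx.pid;` directive in nginx.conf.
nginx_running() {
    local pidfile=${1:-/run/nginx.pid}
    local pid
    [ -r "$pidfile" ] || return 1
    pid=$(head -n1 "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    # kill -0 sends no signal; it only reports whether the process can be signalled.
    kill -0 "$pid" 2>/dev/null
}

# In a keepalived check script the last line would be:
#   nginx_running || exit 1
```

One trade-off: a stale PID file whose number has been reused by another process would be reported as healthy, which the `ps`-based count does not suffer from.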
5. Start the services and enable them at boot

systemctl daemon-reload
systemctl start nginx
systemctl start keepalived
systemctl enable nginx
systemctl enable keepalived


6. Check the keepalived working state

ip addr

The virtual IP (VIP) is bound on node05.flyfish.
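
Reading the full `ip addr` output by eye works, but the check is easy to script for the failover test. The following is a minimal sketch that filters the one-line-per-address output of `ip -o -4 addr show`; `holds_vip` is a hypothetical helper name.

```shell
#!/bin/bash
# holds_vip VIP
# Reads `ip -o -4 addr show` output on stdin and returns 0 if some interface
# carries VIP (with any prefix length), non-zero otherwise.
holds_vip() {
    local vip=$1
    # The leading " inet " and trailing "/" keep 192.168.100.10 from
    # matching inside 192.168.100.100.
    grep -qF " inet ${vip}/"
}

# Example (run on node05.flyfish or node07.flyfish):
#   ip -o -4 addr show | holds_vip 192.168.100.100 && echo "this node holds the VIP"
```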

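7. Nginx+Keepalived high-availability test

Stop Nginx on the master node and test whether the VIP drifts to the backup server. Kill nginx on node05.flyfish:

pkill nginx

Then check whether the floating IP has moved to the node07.flyfish node.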

As shown, the floating VIP has moved to the node07.flyfish host.

From any k8s node, check whether kube-apiserver information can be retrieved through the VIP:

curl -k https://192.168.100.100:6443/version

Check the nginx log (the stream access log defined in nginx.conf is /var/log/nginx/k8s-access.log)

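Although we have added Master2 and the load balancer, we scaled out from a single-Master architecture, which means every Node component still connects to Master1. Unless they are switched to the VIP so that traffic goes through the load balancer, the Master remains a single point of failure. The next step is therefore to change the configuration files of all Node components from 192.168.100.11 to 192.168.100.100 (the VIP):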

Run on all Node machines:

sed -i 's#192.168.100.11:6443#192.168.100.100:6443#' /opt/kubernetes/cfg/*
systemctl restart kubelet
systemctl restart kube-proxy
kubectl get node
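
Because a missed file would leave that component pinned to Master1, it helps to verify the rewrite before restarting anything. The following is a minimal sketch, assuming GNU sed and endpoint strings without `#` characters; `switch_endpoint` is a hypothetical helper name.

```shell
#!/bin/bash
# switch_endpoint DIR OLD NEW
# Replaces OLD with NEW in every regular file directly under DIR, then
# returns 0 only if no file under DIR still mentions OLD.
switch_endpoint() {
    local dir=$1 old=$2 new=$3
    # Escape dots so the address is matched literally, not as a regex wildcard.
    local pattern=${old//./\\.}
    local f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        sed -i "s#${pattern}#${new}#g" "$f"
    done
    ! grep -rqF -- "$old" "$dir"
}

# Example (run on each Node):
#   switch_endpoint /opt/kubernetes/cfg 192.168.100.11:6443 192.168.100.100:6443 \
#       && systemctl restart kubelet kube-proxy
```

Including the `:6443` port in the old value keeps the match away from lines that mention the bare IP for other purposes.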

At this point, the multi-Master k8s cluster configuration is complete.