@tony-yin
2017-08-20T08:44:58.000000Z
Magicloud
PID file: /var/run/mcs3-smart-monitor.pid
Daemon location: /etc/init.d/
Log file: /var/log/mccloudstor/mcs3-disk-mon.log
Usage:

    service mcs3-smart-monitor start
    service mcs3-smart-monitor stop
    service mcs3-smart-monitor restart

Changes

Any change to the daemon code takes effect only after the daemon is restarted.
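Once the daemon has been restarted, one way to confirm it is actually running is to check the PID stored in its pid file. The following is a minimal Python 3 sketch, assuming the pid file holds a single integer PID (the file format is not described here); `daemon_is_running` is a hypothetical helper, not part of the daemon or its init script.

```python
import os


def daemon_is_running(pid_file="/var/run/mcs3-smart-monitor.pid"):
    """Return True if the PID recorded in pid_file belongs to a live process.

    Hypothetical helper; assumes the pid file contains one integer PID.
    """
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pid file: treat as not running
        return False
    if pid <= 0:
        return False  # 0 or negative would address a process group, not one process
    try:
        # Signal 0 only performs the existence/permission check; nothing is delivered
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user (e.g. root)
        return True
    return True
```

Running it as the same user that reads the pid file is enough; a PermissionError from `os.kill` still proves the process exists.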
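1. Avoid sending duplicate emails within a time window: a global variable sent_mail_time records when the last notification was sent. It is initialized to one hour before the current time, so the first warning goes out immediately, and it is updated to the current time as soon as an email has been sent.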
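    import datetime
    import socket

    # logger and utils are defined elsewhere in the daemon module

    # Start one hour in the past so the first notification is sent immediately
    sent_mail_time = datetime.datetime.now() - datetime.timedelta(0, 3600)  # 1 hour


    def send_disk_status_notification(disk_status):
        global sent_mail_time
        now = datetime.datetime.now()
        # Skip if a notification was already sent during the last hour
        if now < sent_mail_time + datetime.timedelta(0, 3600):
            logger.info("Notification sent within one hour before. System will not send again.")
            return
        host = socket.gethostname()
        title = "Host {} Disk Health Status Warning!".format(host)
        message = disk_status
        try:
            utils.send_notification(title, message)
            # Only a successful send starts a new one-hour window
            sent_mail_time = datetime.datetime.now()
        except Exception as e:
            logger.error(str(e))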
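The throttling logic above depends on a global variable and the wall clock, which makes it awkward to test. A minimal sketch of the same idea as a small class with an injectable clock follows; `NotificationThrottle`, `try_send`, and the `send` callback are hypothetical names, not the daemon's code. Instead of starting one hour in the past, it uses `None` to mean "nothing sent yet", and like the original it records the send time only after a successful send, so a failed send is retried on the next check.

```python
import datetime
import logging

logger = logging.getLogger(__name__)


class NotificationThrottle:
    """Allow at most one successful notification per `interval` (hypothetical helper)."""

    def __init__(self, interval=datetime.timedelta(hours=1)):
        self.interval = interval
        self.last_sent = None  # None: nothing sent yet, so the first call sends

    def try_send(self, send, now=None):
        """Call send() unless one already succeeded within the interval.

        Returns True only if send() was called and did not raise.
        """
        if now is None:
            now = datetime.datetime.now()
        if self.last_sent is not None and now < self.last_sent + self.interval:
            logger.info("Notification sent within the last interval; skipping.")
            return False
        try:
            send()
        except Exception:
            # last_sent is left unchanged, so the next check retries
            logger.exception("Sending notification failed")
            return False
        self.last_sent = now
        return True
```

In the daemon this could be one module-level `throttle = NotificationThrottle()` with `throttle.try_send(lambda: utils.send_notification(title, message))` inside `send_disk_status_notification`.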
2. An error from one command stopped the rest of the code from running. Commands are executed in the following places:
Line 48:

    output = utils.do_cmd("zpool status|grep state", force=True)

Lines 71-75:

    VDSTATE1 = do_cmd(MEGACLI_BIN + " -cfgdsply -aALL -NoLog | grep State")
    VDSTATE2 = do_cmd(MEGACLI_BIN + " -AdpAllInfo -aALL -NoLog | grep Degraded")
    VDSTATE3 = do_cmd(MEGACLI_BIN + " -AdpAllInfo -aALL -NoLog | grep Offline")
    PDSTATE1 = do_cmd(MEGACLI_BIN + " -AdpAllInfo -aALL -NoLog | grep \"Critical Disks\"")
    PDSTATE2 = do_cmd(MEGACLI_BIN + " -AdpAllInfo -aALL -NoLog | grep \"Failed Disks\"")
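Temporary workaround: wrap each of these calls in its own try/except block, so that one failing command no longer stops the rest of the code.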
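A minimal Python 3 sketch of that workaround as one reusable helper, assuming the commands can be run through `subprocess` instead of the daemon's own `do_cmd` (whose error behavior is not shown here); `run_cmd_safely` and its parameters are hypothetical. One thing worth checking: `grep` exits with status 1 when it finds no match, so a pipeline such as `... | grep Degraded` reports failure on a healthy system. If `do_cmd` raises on a non-zero exit status, that alone would interrupt the monitor, which is why this sketch logs a non-zero status but still returns the output.

```python
import logging
import subprocess

logger = logging.getLogger(__name__)


def run_cmd_safely(cmd, default="", timeout=60):
    """Run a shell command; if it cannot run or times out, log and return `default`.

    A non-zero exit status is only logged and stdout is still returned,
    because `... | grep PATTERN` exits with 1 when nothing matches.
    """
    try:
        result = subprocess.run(
            cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            universal_newlines=True, timeout=timeout,
        )
    except (OSError, subprocess.SubprocessError) as e:
        # SubprocessError covers TimeoutExpired
        logger.error("command %r failed to run: %s", cmd, e)
        return default
    if result.returncode != 0:
        logger.warning("command %r exited with %d: %s",
                       cmd, result.returncode, result.stderr.strip())
    return result.stdout
```

Each MegaCli call could then read `VDSTATE2 = run_cmd_safely(MEGACLI_BIN + " -AdpAllInfo -aALL -NoLog | grep Degraded")`, so a failure in one call no longer prevents the others from running.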
3. Getting SSD disk information
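Read a disk's wear indicator and serial number through the MegaRAID controller with smartctl (the first {} is the disk's number on the controller, the second is the device path, e.g. /dev/sda):

    smartctl -a -d megaraid,{} {}|grep 'Media_Wearout_Indicator'
    smartctl -a -d megaraid,{} {}|grep 'Serial Number'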
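Both values come from the same `smartctl -a` output, so a single call can be parsed for both instead of running smartctl twice. A minimal sketch, assuming SATA disks whose output contains the ATA attribute table (`ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH ...`) and a `Serial Number:` line; `parse_smart_output` is a hypothetical helper, and `SAMPLE_OUTPUT` is made-up text for illustration, not real device data. It returns the normalized VALUE column, which for this attribute typically starts at 100 and decreases as the flash wears.

```python
def parse_smart_output(text):
    """Extract the serial number and Media_Wearout_Indicator VALUE from smartctl -a output."""
    info = {"serial": None, "wearout": None}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Serial Number":
            info["serial"] = value.strip()
            continue
        # Attribute columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "Media_Wearout_Indicator":
            info["wearout"] = int(fields[3])  # normalized VALUE column
    return info


# Made-up excerpt in the layout of smartctl -a output (illustration only)
SAMPLE_OUTPUT = """\
Serial Number:    EXAMPLE-SERIAL-0001
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
233 Media_Wearout_Indicator 0x0032   098   098   000    Old_age   Always       -       0
"""
```

On a real host, the text would be the stdout of `smartctl -a -d megaraid,N /dev/sdX` for each disk number N.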
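To tell whether a disk is an SSD, read its rotational flag from sysfs: 1 means a mechanical hard disk, 0 means an SSD.

    cat /sys/block/{}/queue/rotational  # {} is the device name, e.g. sda or sdb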
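The same check in Python, as a minimal sketch; `is_ssd` and its `sysfs_root` parameter are hypothetical (the parameter only exists so the function can be pointed at a test directory). For a logical drive exposed by a RAID controller, this flag reflects what the controller reports to the kernel, which may not match the physical disks behind it.

```python
import os


def is_ssd(dev, sysfs_root="/sys/block"):
    """Return True if the kernel reports `dev` (e.g. "sda") as non-rotational.

    <sysfs_root>/<dev>/queue/rotational holds "1" for a spinning disk
    and "0" for a non-rotational device such as an SSD.
    """
    path = os.path.join(sysfs_root, dev, "queue", "rotational")
    with open(path) as f:
        return f.read().strip() == "0"
```

For example, `is_ssd("sda")` reads `/sys/block/sda/queue/rotational` on the running host.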
