@tony-yin
2017-08-20T16:44:58.000000Z
Magicloud
pid
File path: /var/run/mcs3-smart-monitor.pid
daemon
Location: /etc/init.d/
log
Log file: /var/log/mccloudstor/mcs3-disk-mon.log
Usage:
service mcs3-smart-monitor start
service mcs3-smart-monitor stop
service mcs3-smart-monitor restart
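Changes
Changes to daemon-related code take effect only after the daemon is restarted.
1. Avoid sending duplicate emails within a time window: declare a global variable sent_mail_time, initialized to one hour before the current time, and set it to the current time as soon as an email is sent.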
import datetime
import socket

# Start one hour in the past so the first notification is never suppressed.
sent_mail_time = datetime.datetime.now() - datetime.timedelta(hours=1)

def send_disk_status_notification(disk_status):
    global sent_mail_time
    now = datetime.datetime.now()
    # Skip sending if the last notification went out less than an hour ago.
    if now < sent_mail_time + datetime.timedelta(hours=1):
        logger.info("Notification sent within one hour before. System will not send again.")
        return
    host = socket.gethostname()
    title = "Host {} Disk Health Status Warning!".format(host)
    message = disk_status
    try:
        utils.send_notification(title, message)
        # Update the timestamp only after a successful send.
        sent_mail_time = datetime.datetime.now()
    except Exception as e:
        logger.error(str(e))
2. A command that raises an error prevents the rest of the code from running. Commands are executed in the following places:
Line 48:
output = utils.do_cmd("zpool status|grep state", force=True)
Lines 71-75:
VDSTATE1 = do_cmd(MEGACLI_BIN + " -cfgdsply -aALL -NoLog | grep State")
VDSTATE2 = do_cmd(MEGACLI_BIN + " -AdpAllInfo -aALL -NoLog | grep Degraded")
VDSTATE3 = do_cmd(MEGACLI_BIN + " -AdpAllInfo -aALL -NoLog | grep Offline")
PDSTATE1 = do_cmd(MEGACLI_BIN + " -AdpAllInfo -aALL -NoLog | grep \"Critical Disks\"")
PDSTATE2 = do_cmd(MEGACLI_BIN + " -AdpAllInfo -aALL -NoLog | grep \"Failed Disks\"")
Temporary workaround: wrap each of these calls in its own try/except block.
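The workaround could be sketched as a small wrapper, so that one failing command does not stop the remaining checks. This is a minimal sketch, not the daemon's actual code: `safe_cmd` and `run_shell` are hypothetical names, and `run_shell` is only a stand-in for the project's `do_cmd`, whose exact behavior is not shown here. Note that a pipeline ending in `grep` exits with status 1 when nothing matches, which a strict runner reports as an error.

```python
import logging
import subprocess

logger = logging.getLogger("disk-mon-sketch")


def run_shell(cmd):
    """Hypothetical stand-in for do_cmd: run a shell command and return its output."""
    return subprocess.check_output(cmd, shell=True, stderr=subprocess.STDOUT).decode()


def safe_cmd(cmd, runner=run_shell, default=""):
    """Run cmd through runner; on any failure, log it and return default."""
    try:
        return runner(cmd)
    except Exception as e:
        # Log and continue so later checks still run.
        logger.error("command failed: %s (%s)", cmd, e)
        return default
```

Each call site would then become something like `VDSTATE1 = safe_cmd(MEGACLI_BIN + " -cfgdsply -aALL -NoLog | grep State")`, with an empty string meaning "no output or command failed".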
3. Retrieving SSD disk information
smartctl -a -d megaraid,{} {}|grep 'Media_Wearout_Indicator'
smartctl -a -d megaraid,{} {}|grep 'Serial Number'
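Once the smartctl output is captured, the two fields could be extracted in Python rather than with grep. This is a minimal sketch: the function names are hypothetical, the sample text is illustrative rather than real output from this system, and it assumes smartctl's usual attribute table layout, in which the normalized VALUE is the fourth column.

```python
def parse_wearout(smart_output):
    """Return the normalized Media_Wearout_Indicator value, or None if absent."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows look like: ID NAME FLAG VALUE WORST THRESH TYPE ...
        if len(fields) >= 4 and fields[1] == "Media_Wearout_Indicator":
            try:
                return int(fields[3])
            except ValueError:
                return None
    return None


def parse_serial(smart_output):
    """Return the text after 'Serial Number:', or None if the line is missing."""
    for line in smart_output.splitlines():
        if line.strip().startswith("Serial Number:"):
            return line.split(":", 1)[1].strip()
    return None


# Illustrative sample only, not captured from a real disk.
SAMPLE = (
    "Serial Number:    EXAMPLE123\n"
    "233 Media_Wearout_Indicator 0x0032   098   098   000    Old_age   Always       -       0\n"
)
print(parse_serial(SAMPLE), parse_wearout(SAMPLE))
```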
To tell the disk type, read the rotational flag: 1 means a mechanical hard disk, 0 means an SSD.
cat /sys/block/{}/queue/rotational  # {} is a device name such as sda or sdb
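Reading the rotational flag from Python could look like the following. A minimal sketch: `is_ssd` is a hypothetical helper, and the `sys_block` parameter exists only so the function can be pointed at a different directory for testing.

```python
import os


def is_ssd(device, sys_block="/sys/block"):
    """Return True if the rotational flag for device (e.g. 'sda') is 0."""
    path = os.path.join(sys_block, device, "queue", "rotational")
    with open(path) as f:
        return f.read().strip() == "0"
```

For example, `is_ssd("sda")` reads `/sys/block/sda/queue/rotational` and raises an `IOError`/`OSError` if the device does not exist, so a caller in the daemon would likely want to wrap it in try/except as well.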