Nagios keeps its own log, too! I opened it right away and found quite a few warnings. Here is one of them:
[1217166816] HOST NOTIFICATION: sery;mail-server;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 10 seconds
[1217166816] Warning: Attempting to execute the command "/usr/bin/printf "%b" "***** Nagios 2.9 *****\n\nNotification Type: PROBLEM\nHost: mail-server\nState: DOWN\nAddress: 211.155.115.66\nInfo: CRITICAL - Plugin timed out after 10 seconds\n\nDate/Time: Sun Jul 27 13:53:36 UTC 2008\n" | /bin/mail -s "Host DOWN alert for mail-server!" ABC@163.com" resulted in a return code of 127. Make sure the script or binary you are trying to execute actually exists...
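Cause: the path to the mail program is wrong.
The other entries are similar. The most useful part is the end of the warning (the return code of 127 and "Make sure the script or binary you are trying to execute actually exists..."): the shell could not find one of the binaries it was asked to run. Only two executables appear in this entry, printf and mail, so I ran each one by hand exactly as written:
(1) /usr/bin/printf "%b" "***** Nagios 2.9 *****\n" prints ***** Nagios 2.9 *****, which is the expected result.
(2) /bin/mail -s "Host DOWN alert for mail-server!" sery@163.com prints su: /bin/mail: No such file or directory, so nothing exists at that path. But I had sent mail by hand earlier, so the mail client is clearly installed! The path is probably the Linux location of mail. Looking up the FreeBSD location with find / -name shows that on FreeBSD mail lives at /usr/bin/mail.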
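The two manual checks above can be condensed into a small script. This is only a sketch: check_bins is a helper name I made up, and /nonexistent/bin/mail with nobody@example.invalid is a deliberately missing, hypothetical command line, used so that no real mail is sent. It also shows where the 127 in the Nagios warning comes from: a pipeline returns the exit status of its last command, and the shell reports 127 when that command cannot be found.

```shell
#!/bin/sh
# Report whether each path given as an argument is an executable file.
check_bins() {
    for bin in "$@"; do
        if [ -f "$bin" ] && [ -x "$bin" ]; then
            echo "ok $bin"
        else
            echo "missing $bin"
        fi
    done
}

# The two binaries named in the failing command, plus the candidate fix.
check_bins /usr/bin/printf /bin/mail /usr/bin/mail

# Reproduce the failure mode without sending mail: the last command of
# the pipeline does not exist, so the pipeline's exit status is 127.
status=0
sh -c 'printf "%b" "test\n" | /nonexistent/bin/mail -s test nobody@example.invalid' 2>/dev/null || status=$?
echo "return code: $status"
```

On the FreeBSD box described here, the first call would be expected to flag /bin/mail as missing and the other two paths as ok; the last line prints return code: 127.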
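Now we know the root cause of the missing emails. Next, in the Nagios configuration file commands.cfg, I replaced "/bin/mail" with "/usr/bin/mail" in the host-notify-by-email and service-notify-by-email definitions. The complete definitions are: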
# 'host-notify-by-email' command definition
define command{
command_name host-notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios 2.9 *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "Host $HOSTSTATE$ alert for $HOSTNAME$!" $CONTACTEMAIL$
}
# 'service-notify-by-email' command definition
define command{
command_name service-notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios 2.9 *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
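I made this change by hand in an editor, but the same replacement could be scripted. The following is a minimal sketch, not what I actually ran: fix_mail_path is a hypothetical helper, and the commands.cfg location in the final comment is an assumption about the install layout. It goes through a temporary file rather than sed -i, because that flag takes different arguments on FreeBSD and Linux.

```shell
#!/bin/sh
# Rewrite a standalone "/bin/mail" to "/usr/bin/mail" in a config file.
# Requiring whitespace on both sides leaves an existing "/usr/bin/mail"
# untouched, so running the function twice does no harm.
fix_mail_path() {
    cfg=$1
    tmp="$cfg.new.$$"
    # Keep one backup of the original file; do not overwrite it on reruns.
    [ -e "$cfg.bak" ] || cp "$cfg" "$cfg.bak" || return 1
    sed 's#\([[:space:]]\)/bin/mail\([[:space:]]\)#\1/usr/bin/mail\2#g' \
        "$cfg" > "$tmp" || return 1
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}

# Example call (the path is an assumption; adjust it to your install):
# fix_mail_path /usr/local/nagios/etc/commands.cfg
```

Because the pattern only matches /bin/mail with whitespace on both sides, /usr/bin/printf and an already corrected /usr/bin/mail are left alone.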
After editing commands.cfg, I restarted Nagios and checked its log again. The "Make sure the script or binary you are trying to execute actually exists..." error is gone, and the log now records an alert email being sent:
[root@nagios /usr/local/nagios/var]# tail -f nagios.log
[1217170467] SERVICE ALERT: mail-server;check_tcp 995;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[1217170534] Auto-save of retention data completed successfully.
[1217170577] HOST ALERT: mail-server;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
[1217170587] HOST ALERT: mail-server;DOWN;SOFT;2;CRITICAL - Plugin timed out after 10 seconds
[1217170597] HOST ALERT: mail-server;DOWN;SOFT;3;CRITICAL - Plugin timed out after 10 seconds
[1217170607] HOST ALERT: mail-server;DOWN;SOFT;4;CRITICAL - Plugin timed out after 10 seconds
[1217170607] HOST ALERT: mail-server;UP;SOFT;5;PING OK - Packet loss = 0%, RTA = 111.63 ms
[1217170607] SERVICE ALERT: mail-server;check_tcp 995;CRITICAL;SOFT;2;CRITICAL - Socket timeout after 10 seconds
[1217170687] SERVICE ALERT: mail-server;check_tcp 995;OK;SOFT;3;TCP OK - 3.137 second response time on port 995
[1217171057] SERVICE NOTIFICATION: sery;fav-0;check_tcp 443;CRITICAL;service-notify-by-email;CRITICAL - Socket timeout after 10 seconds
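Instead of watching tail -f, the same check can be done with a few greps. This is a rough sketch with helper names of my own choosing (count_exec_failures, show_notifications, epoch_to_utc). The number in brackets at the start of each entry is a Unix timestamp; FreeBSD's date converts it with -r and GNU date with -d @SECONDS, so the helper tries one form and then the other.

```shell
#!/bin/sh
# Count log entries where Nagios could not run a notification command.
count_exec_failures() {
    grep -c 'resulted in a return code of 127' "$1"
}

# Print the host and service notification entries from a log file.
show_notifications() {
    grep -E '(HOST|SERVICE) NOTIFICATION:' "$1"
}

# Convert a bracketed epoch timestamp to UTC. BSD date takes -r SECONDS;
# GNU date treats -r as a file name, fails, and the -d @SECONDS form runs.
epoch_to_utc() {
    date -u -r "$1" '+%Y-%m-%d %H:%M:%S' 2>/dev/null ||
        date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Timestamp of the failed notification quoted earlier in this post.
epoch_to_utc 1217166816
```

The last line prints 2008-07-27 13:53:36, the same moment as the Date/Time field inside that warning. After the fix, count_exec_failures nagios.log should stop increasing, while show_notifications nagios.log keeps listing new entries.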
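Time to check my mail, and I could hardly wait. Haha, the long-awaited alert has arrived in my 163 mailbox. Looking back at the mail log /var/log/maillog, the delivery is recorded there as well.
Lesson learned: logs are invaluable when troubleshooting. In day-to-day work, check the system logs and the relevant application logs regularly, look for clues in their entries, and work out the solution from them.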
posted on 2009-05-28 21:50
Blog of JoJo