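An active alarm system typically collects alarm information in five main ways:
1. Ping each device from the alarm server to determine whether it is alive and to measure its packet loss rate.
2. Receive the system logs (syslog) that devices send, and match them against a rule library (regular expressions) to decide whether to raise an alarm.
3. Receive the SNMP Trap messages that devices send, and decide whether to raise an alarm.
4. Extract alarm information from the network management system.
5. Fetch the value of the relevant OID over SNMP, and decide whether to raise an alarm.

What is SNMP:
Simple Network Management Protocol (SNMP) provides a set of "simple" operations that make it easier to monitor and manage network devices such as routers, switches, servers, and printers. Through SNMP you can monitor many things, for example port traffic, the temperature inside a router, or CPU utilization. Learning SNMP is actually not especially simple; consult other material for the details, particularly concepts such as MIB and OID. The book Essential SNMP, 2nd Edition is recommended.

How to collect data:
The examples below use NET-SNMP, whose RPM packages and source code are available from http://net-snmp.sourceforge.net/. After downloading and unpacking the source, build and install it:
su -
cd ucd-snmp-4.2.3
./configure --prefix=/usr    <-- the default is /usr/local
make clean all
make install

Then query a device:
snmpget <target> public system.sysDescr.0
You should see a short description of the system, something like:
system.sysDescr.0 = Sun SNMP Agent, Ultra-60
The "public" in the command above can be thought of as the SNMP agent's password; the technical term is "community string". Many network devices and operating systems use "public" as the default community string, which is a potential security problem, so the default should be changed.
The command can also be written as:
snmpget <target> public .1.3.6.1.2.1.1.1.0
"system.sysDescr.0" is just another notation for ".1.3.6.1.2.1.1.1.0"; in the end it is converted to the numeric form of the OID (object identifier).
snmpget returns a single value, whose type may be a number, a string, and so on. There is also an snmpwalk operation, which roughly speaking returns an array of results. This system is implemented in Java, using an open-source SNMP implementation downloaded from the Internet. Assume the following utility class: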
import java.io.IOException;
import java.util.Map;

public class Poller
{
    public Poller( String host, String community, int version )
        throws IOException
    {
        // implementation omitted
    }

    // snmpget: fetch the single value stored at the given OID
    public String get( String oid )
            throws IOException
    {
        // implementation omitted
        return null;
    }

    // snmpwalk: fetch a set of values under the given base OID
    public Map<String, String> walk( String base, int startIndex,
            int indexCount )
    {
        // implementation omitted
        return null;
    }

    public void close()
    {
    }

    public static void main( String[] args )
            throws IOException
    {
        Poller poller = new Poller( ); // arguments omitted; the device at this IP is a cisco-6509

        // 1. CPU alarm
        String cpuStr = poller.get( "1.3.6.1.4.1.9.9.109.1.1.1.1.5.9" ); // CPU utilization of the cisco-6509
        long cpu = Long.parseLong( cpuStr );

        if ( cpu > 85 )
        {
            System.out.println( "Alarm! cisco-6509 CPU utilization is above 85%" ) ;
        }

        // 2. Module alarm
        String statusStr = poller.get( "1.3.6.1.4.1.9.5.1.3.1.1.10.1" ); // status of the first module of the cisco-6509
        long status = Long.parseLong( statusStr );

        if ( status != 2 && status != 1 ) // 1: unknown  2: normal  3: minorFault  4: majorFault
        {
            System.out.println( "Alarm! The first module of the cisco-6509 is not in a normal state" ) ;
        }

        // 3. Traffic alarm
        String octetStr = poller.get( "ifHCInOctets.10" ); // inbound traffic of interface 10 of the cisco-6509, in bytes
        long value = Long.parseLong( octetStr );
        long time = System.currentTimeMillis()/1000;
        long lastValue = getLastValue( ); // previous traffic value, read from a database or file
        long lastTime = getLastTime( ); // time of the previous sample, read from a database or file

        if ( (value-lastValue)/(time-lastTime)*8>800000000 ) // traffic is usually expressed in bit/s, hence the factor of 8
        {
            System.out.println( "Alarm! Inbound traffic on interface 10 of the cisco-6509 is above 800M" ) ;
        }


        poller.close();
    }
}
With the main function above we can already implement basic SNMP alarming, but it is quite inflexible: everything is hard-coded, and every new SNMP alarm requires a new block of code.
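One detail of the traffic check is worth isolating: the rate is a counter delta divided by elapsed seconds, so it breaks when two samples fall in the same second or when the counter restarts. The following is only a sketch of that calculation with guards added. The class name TrafficRate, the method bitsPerSecond, and the choice of -1 as a "no rate available" marker are assumptions made for illustration and are not part of the original system.

```java
public class TrafficRate {
    /**
     * Average rate in bit/s between two samples of a byte counter.
     * Returns -1 when no meaningful rate can be computed.
     */
    public static long bitsPerSecond(long value, long time, long lastValue, long lastTime) {
        long seconds = time - lastTime;
        long bytes = value - lastValue;
        if (seconds <= 0 || bytes < 0) {
            // Same-second sample, clock moved backwards, or the counter was reset.
            return -1;
        }
        // Multiply before dividing so integer division truncates only once.
        return bytes * 8 / seconds;
    }

    public static void main(String[] args) {
        // 30,000,000,000 bytes in 300 seconds
        long rate = bitsPerSecond(30_000_000_000L, 1300, 0, 1000);
        System.out.println(rate > 800_000_000L ? "alarm" : "ok: " + rate + " bit/s");
    }
}
```

A caller would skip the alarm check whenever the method returns -1 and simply store the new sample for the next cycle.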
After some analysis, most SNMP polling alarms follow the same process:
1. Obtain the object ID (OID) of interest on the device.
2. Fetch the value of that OID over SNMP and assign it to the variable value.
3. Take the current time (in seconds) and assign it to the variable time.
4. Load the value and time of the previous sample and assign them to lastValue and lastTime respectively.
5. Based on what the OID's value means, construct an expression. The expression may use only the four variables value, time, lastValue, and lastTime (not necessarily all of them) and must return a boolean; if it is true, an alarm is raised.
6. Save value and time as lastValue and lastTime, for use in the next polling cycle.

Now things are clearer: if a dynamic language or script can run inside the Java environment, SNMP alarms can be implemented flexibly. There is no need to hard-code every alarm case; adding or editing an alarm expression in the UI is enough. A search on http://www.open-open.com and http://java-source.net turned up the BeanShell project, whose official site is http://www.beanshell.org/
BeanShell is a small, free, downloadable, embeddable Java source interpreter, written in Java, with object scripting language features. BeanShell executes standard Java statements and expressions, and adds some scripting commands and syntax. It supports scripted objects as simple method closures, like those in Perl and JavaScript.
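Before bringing in BeanShell, the six-step loop can be sketched in plain Java with the alarm rule passed in as a lambda. This is only an illustration of the control flow: RuleLoopSketch, AlarmRule, and the in-memory map are hypothetical stand-ins, whereas the real system stores the previous sample in a database and takes the rule as a string.

```java
import java.util.HashMap;
import java.util.Map;

public class RuleLoopSketch {
    /** Hypothetical stand-in for an alarm expression over the four variables. */
    public interface AlarmRule {
        boolean shouldWarn(long value, long time, long lastValue, long lastTime);
    }

    /** Previous sample per OID, stored as {value, time} (step 6). */
    private final Map<String, long[]> lastSamples = new HashMap<>();

    /** Runs steps 4 to 6 for one freshly polled value; returns true when the rule fires. */
    public boolean check(String oid, long value, long time, AlarmRule rule) {
        long[] last = lastSamples.get(oid);
        boolean warn = false;
        if (last != null) {
            // Only evaluate the rule once a previous sample exists.
            warn = rule.shouldWarn(value, time, last[0], last[1]);
        }
        lastSamples.put(oid, new long[] { value, time });
        return warn;
    }

    public static void main(String[] args) {
        RuleLoopSketch loop = new RuleLoopSketch();
        AlarmRule cpuHigh = (value, time, lastValue, lastTime) -> value > 85;
        System.out.println(loop.check("cpu", 90, 1000, cpuHigh)); // first sample, rule not evaluated
        System.out.println(loop.check("cpu", 90, 1060, cpuHigh)); // second sample, rule evaluated
    }
}
```

A lambda still has to be compiled into the program. A scripting interpreter replaces it with an expression string that can be edited at run time, which is the flexibility the text is after.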
Below is the SNMP alarm module rewritten with BeanShell:
package com.kelefa.warnlet.job;

import java.io.IOException;
import java.util.Date;

import org.apache.log4j.Logger;
import org.hibernate.HibernateException;
import org.hibernate.classic.Session;

import bsh.EvalError;
import bsh.Interpreter;

import com.kelefa.common.util.HibernateUtil;
import com.kelefa.warnlet.dao.WarningDAO;
import com.kelefa.warnlet.interpreter.SimpleInterpreter;
import com.kelefa.warnlet.snmp.Poller;
import com.kelefa.warnlet.vo.Device;
import com.kelefa.warnlet.vo.SnmpObject;
import com.kelefa.warnlet.vo.Warning;

public class SnmpTask
        implements Runnable
{
    private final static Logger log = Logger.getLogger( SnmpTask.class );

    private SnmpObject snmpObject;

    private WarningDAO warningDAO;

    private static final String BSH = "bsh://";

    public SnmpTask( SnmpObject snmpObject, WarningDAO warningDAO )
    {
        this.snmpObject = snmpObject;
        this.warningDAO = warningDAO;
    }

    public void run()
    {
        log.debug( "----snmpObject.id=" + snmpObject.getId() );
        try
        {
            Session session = HibernateUtil.currentSession();
            HibernateUtil.beginTransaction();
            session.refresh( snmpObject );

            doSnmpTask();

            HibernateUtil.commitTransaction();
        }
        catch ( Exception ex )
        {
            HibernateUtil.rollbackTransaction();
            log.warn( ex.getMessage() );
        }
        finally
        {
            HibernateUtil.closeSession();
        }
        log.debug( "++++snmpObject.id=" + snmpObject.getId() );
    }

    /**
     * Runs the SNMP task:
     * 1. Fetch the value of the OID over SNMP; stop if the network fails or the OID is misconfigured.
     * 2. Stop if the returned string is not a number.
     * 3. Evaluate the alarm expression with BSH; stop if the expression is invalid.
     * 4. If the alarm expression returns true, raise an alarm.
     * 5. Update the last time and value.
     */
    private void doSnmpTask()
    {
        Device device = snmpObject.getDevice();

        String valueStr;
        try
        {
            valueStr = snmpget( device.getIp(), device.getCommunity(), device
                    .getSnmpVersion(), snmpObject.getOid() );
        }
        catch ( IOException e )
        { // 1. stop if the network fails or the OID is misconfigured
            log.warn( e.getMessage() );
            return;
        }

        if ( valueStr == null || valueStr.trim().length() == 0 )
            return;

        Long value = null;
        try
        {
            // trim first, so surrounding whitespace does not fail the parse
            value = Long.valueOf( valueStr.trim() );
        }
        catch ( NumberFormatException ex )
        {// 2. stop if the returned string is not a number
            log.warn( "NumberFormatException: " + ex.getMessage() + "\t"
                    + device.getCommunity() + "@" + device.getIp() + ": "
                    + snmpObject.getOid() );
            return;
        }

        Date now = new Date();
        // current time in seconds, rounded to the nearest second
        Long time = new Long( (now.getTime() + 500) / 1000 );

        if ( snmpObject.getLastValue() > 0 && snmpObject.getLastTime() > 0 )
        { // the BSH script is not run on the first sample
            Long lastValue = new Long( snmpObject.getLastValue() );
            Long lastTime = new Long( snmpObject.getLastTime() );

            boolean doWarn = false;
            try
            { // 3. evaluate the alarm expression with BSH
                doWarn = evalExpr( value, time, lastValue, lastTime );
            }
            catch ( EvalError ex )
            {
                log.warn( ex.getMessage(), ex );
                updateSnmpObject( value, time );
                return;
            }

            if ( log.isDebugEnabled() )
            {
                logResult( time, value, lastValue, lastTime, doWarn );
            }

            if ( doWarn )
            { // 4. the alarm expression returned true, so raise an alarm
                Warning warning = newWarning( now, time, value, lastValue, lastTime );

                try
                {
                    warningDAO.insertWarning( warning );
                }
                catch ( Exception ex )
                {
                    throw new HibernateException( ex.getMessage() );
                }
            }
        }

        // 5. update the last time and value
        updateSnmpObject( value, time );
    }

    /**
     * Updates the monitored object's last execution time (lastTime) and latest value (lastValue).
     *
     * @param value
     * @param time
     */
    private void updateSnmpObject( Long value, Long time )
    {
        snmpObject.setLastTime( time.longValue() );
        snmpObject.setLastValue( value.longValue() );
    }

    /**
     * Evaluates the dynamic BSH expression and returns its result.
     *
     * @param value
     * @param time
     * @param lastValue
     * @param lastTime
     * @return
     * @throws EvalError
     */
    private boolean evalExpr( Long value, Long time, Long lastValue, Long lastTime )
            throws EvalError
    {
        Interpreter bsh = new Interpreter();

        bsh.set( "value", value );
        bsh.set( "time", time );
        bsh.set( "lastValue", lastValue );
        bsh.set( "lastTime", lastTime );

        // run the BSH script; a result of true means an alarm is needed
        Boolean doWarn = (Boolean) bsh.eval( snmpObject.getWarnExpr() );

        return doWarn.booleanValue();
    }

    /**
     * Fetches the value for the given oid using snmpget or snmpwalk. The oid may be a
     * single OID, for example 1.3.6.1.4.5, or an expression containing functions such as
     * sum, count, max, min, and avg. For a single OID, the snmpget value is returned as is;
     * for a composite expression, snmpwalk is used, the expression is computed, and the
     * final result is returned.
     *
     * @param ip
     * @param community
     * @param snmpversion
     * @param oid
     *          a single OID or a function expression
     * @return
     * @throws IOException
     */
    public static String snmpget( final String ip, final String community,
            final int snmpversion, final String oid )
            throws IOException
    {
        String valueStr = null;
        Poller poller = null;
        try
        {
            poller = new Poller( ip, community, snmpversion, 100 );

            log.debug( "polling " + oid );

            if ( oid.indexOf( '(' ) == -1 )
            {// a single OID
                valueStr = poller.get( oid );
                if ( log.isDebugEnabled() )
                    log.debug( "snmpget(" + oid + ")=" + valueStr );
            }
            else
            {// an expression containing functions such as sum, count, max, min, avg, for example:
                // sum(ippoolSize)*100/sum(ippoolUse)
                SimpleInterpreter si = new SimpleInterpreter( poller, oid );
                Long result = si.interprete();
                if ( log.isDebugEnabled() )
                    log.debug( oid + "=" + result );
                if ( result != null )
                    valueStr = result.toString();
            }
        }
        finally
        {
            if ( poller != null )
                poller.close();
        }

        return valueStr;
    }

    private Warning newWarning( Date now, Long time, Long value, Long lastValue,
            Long lastTime )
    {
        Warning warning = new Warning();
        warning.setDeviceID( snmpObject.getDeviceID() );
        warning.setWarnType( snmpObject.getWarnType() );
        warning.setWarnLevel( snmpObject.getWarnLevel() );
        warning.setPrimarykey( snmpObject.getOid() );

        String sms = snmpObject.getWarnSms();
        sms = getBshWarnMsg( sms, value, lastValue, time, lastTime );
        if ( sms == null || sms.trim().length() == 0 )
            warning.setWarnSms( snmpObject.getWarnType() );
        else
            warning.setWarnSms( sms.trim() );

        String email = snmpObject.getWarnEmail();
        email = getBshWarnMsg( email, value, lastValue, time, lastTime );
        if ( email == null || email.trim().length() == 0 )
            warning.setWarnEmail( snmpObject.getWarnType() );
        else
            warning.setWarnEmail( email.trim() );

        warning.setWarnTTS( snmpObject.getWarnTTS() );

        warning.setFirstTime( now );
        warning.setLastTime( now );
        warning.setSuggestion( snmpObject.getSuggestion() );
        return warning;
    }

    private void logResult( Long time, Long value, Long lastValue, Long lastTime,
            boolean doWarn )
    {
        StringBuffer buf = new StringBuffer();
        buf.append( "OID=" ).append( snmpObject.getOid() );
        buf.append( ",time=" ).append( time );
        buf.append( ",value=" ).append( value );
        buf.append( ",lastTime=" ).append( snmpObject.getLastTime() );
        buf.append( ",lastValue=" ).append( snmpObject.getLastValue() );
        buf.append( "\n\t" ).append( snmpObject.getWarnExpr() ).append( "=" )
                .append( doWarn );

        if ( snmpObject.getWarnExpr().indexOf( "(value-lastValue)/(time-lastTime)" ) > -1 )
        {
            buf.append( "\n\t(value-lastValue)/(time-lastTime)=" ).append(
                    ((value - lastValue) / (time - lastTime)) );
        }

        log.debug( buf.toString() );
    }

    /**
     * If the argument starts with "bsh://", evaluates it with BSH as a string expression
     * and returns the result; otherwise returns it unchanged.
     * The expression may use value, lastValue, time, and lastTime, for example:
     * bsh://"Port 45 traffic above 800M: "+((value-lastValue)/(time-lastTime)*8/1000000)+"M"
     *
     * @param msgExpr
     *          the string expression
     * @return String
     */
    private static String getBshWarnMsg( String msgExpr, Long value,
            Long lastValue, Long time, Long lastTime )
    {
        if ( msgExpr == null || !msgExpr.startsWith( BSH ) )
            return msgExpr;

        msgExpr = msgExpr.substring( BSH.length() );
        try
        {
            Interpreter bsh = new Interpreter();

            bsh.set( "value", value );
            bsh.set( "time", time );
            bsh.set( "lastValue", lastValue );
            bsh.set( "lastTime", lastTime );

            // run the BSH script; the result is the actual alarm message
            msgExpr = (String) bsh.eval( msgExpr );
        }
        catch ( EvalError ex )
        {
            log.warn( ex.getMessage() );
        }

        return msgExpr;
    }
}
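The post does not show SimpleInterpreter, the class that handles expressions such as sum(ippoolSize)*100/sum(ippoolUse). As a rough illustration only, and not the author's implementation, the aggregation step over an snmpwalk result might look like the sketch below. It assumes the walk result is a Map<String, String> like the one Poller.walk returns; the names WalkAggregate and aggregate are hypothetical, and parsing the surrounding arithmetic is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WalkAggregate {
    /** Applies sum, count, max, min, or avg to the numeric values of a walk result. */
    public static long aggregate(String function, Map<String, String> walkResult) {
        long sum = 0;
        long count = 0;
        long max = Long.MIN_VALUE;
        long min = Long.MAX_VALUE;
        for (String raw : walkResult.values()) {
            if (raw == null) {
                continue;
            }
            long v;
            try {
                v = Long.parseLong(raw.trim());
            } catch (NumberFormatException ex) {
                continue; // skip rows that are not numeric
            }
            sum += v;
            count++;
            max = Math.max(max, v);
            min = Math.min(min, v);
        }
        if (function.equals("count")) {
            return count;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no numeric values to aggregate");
        }
        switch (function) {
            case "sum": return sum;
            case "max": return max;
            case "min": return min;
            case "avg": return sum / count; // integer average, truncated
            default: throw new IllegalArgumentException("unknown function: " + function);
        }
    }

    public static void main(String[] args) {
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("1", "10");
        rows.put("2", "20");
        System.out.println("sum=" + aggregate("sum", rows) + ", avg=" + aggregate("avg", rows));
    }
}
```

Returning a single long keeps the result compatible with the rest of the task, which converts the polled value to a Long before evaluating the alarm expression.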
Comments:
-
# re: Active alarm system for network devices: implementing SNMP alarms
Posted @ 2007-01-18 12:56
Very good, I learned a lot.
-
# re: Active alarm system for network devices: implementing SNMP alarms
Posted @ 2007-05-21 23:01
Very good, really good.
-
# re: Active alarm system for network devices: implementing SNMP alarms [not logged in]
Posted @ 2007-05-25 23:52
Yes, I got a lot out of this.
-
# re: Active alarm system for network devices: implementing SNMP alarms
Posted @ 2007-09-27 17:10
This is really great. I learned a lot.
-
# re: Active alarm system for network devices: implementing SNMP alarms
Posted @ 2007-10-18 09:27
Huge thumbs up, this is great. You are a lifesaver!
-
# re: Active alarm system for network devices: implementing SNMP alarms
Posted @ 2007-11-07 15:32
Great.
Can all alarms for network devices be implemented this way?
If so, it would be even better if you could post them.
Thanks.
-
# re: Active alarm system for network devices: implementing SNMP alarms
Posted @ 2008-10-13 13:10
Please post the following code as well, otherwise this cannot run:
import com.kelefa.common.util.HibernateUtil;
import com.kelefa.warnlet.dao.WarningDAO;
import com.kelefa.warnlet.interpreter.SimpleInterpreter;
import com.kelefa.warnlet.snmp.Poller;
import com.kelefa.warnlet.vo.Device;
import com.kelefa.warnlet.vo.SnmpObject;
import com.kelefa.warnlet.vo.Warning;