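Our company runs several hundred servers, and many of them sit behind LVS: the same application is deployed on many different servers, with LVS layered in front. In this setup, when one or a few back-end servers go down, the front-end application keeps working, so monitoring the application's URL does not reveal the problem.
Last weekend, among the machines we host at Shenzhen Telecom, a rack of 9 servers dropped offline at the same time. After some investigation, the cause turned out to be a faulty external network switch. The front-end application was still working throughout, and the monitoring sent no alert. Yesterday I added a new feature to the monitoring: it now goes past LVS and checks the back-end servers directly.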
Our server monitoring comes in two kinds.
1: Monitoring the server through SNMP.
2: Monitoring the server through the application's URL.
SNMP monitoring mainly tracks the running state of the server itself.
URL monitoring mainly tracks the real-time running state of the application.
Enough talk. The code that probes an application at a specific IP is as follows:

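import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.MalformedURLException;
import java.net.Proxy;
import java.net.URL;

// Returns the response time in milliseconds for urlAddress as served by the
// back-end server at ip. Sentinel values: "50000" followed by the status code
// for a non-200 response, 60000000 for a malformed URL or an I/O failure.
public static Long getResponseTimeByIp(String urlAddress, String ip)
{
    StringBuilder sb = new StringBuilder();
    HttpURLConnection conn = null;
    Long responseTime = 0L;

    try
    {
        long openTime = System.currentTimeMillis();
        // urlAddress is the application URL, e.g. "http://m.easou.com/"
        URL url = new URL(urlAddress);
        // Use the back-end server itself as the HTTP proxy (port 80), so the
        // request goes to that machine instead of through LVS.
        // buildInetAddress(ip) is a helper that is not shown in this post.
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(buildInetAddress(ip), 80));
        conn = (HttpURLConnection) url.openConnection(proxy);
        conn.setConnectTimeout(50000);
        conn.setReadTimeout(50000);
        conn.setRequestMethod("GET");
        // Check the status before reading the body: getInputStream() throws
        // an IOException for 4xx/5xx responses, which would hide the code.
        int code = conn.getResponseCode();

        if (code == 200)
        {
            // Read the whole body so the timing covers the full response.
            try (BufferedReader bReader = new BufferedReader(new InputStreamReader(conn.getInputStream())))
            {
                String temp;
                while ((temp = bReader.readLine()) != null)
                {
                    sb.append(temp);
                }
            }
            responseTime = System.currentTimeMillis() - openTime;
        } else
        {
            // Encode the status code in the result, e.g. 404 -> 50000404.
            responseTime = Long.valueOf("50000" + code);
        }
    } catch (MalformedURLException e)
    {
        e.printStackTrace();
        responseTime = 60000000L;
    } catch (IOException e)
    {
        // Connection refused, timeout, or other network failure.
        e.printStackTrace();
        responseTime = 60000000L;
    } finally
    {
        if (null != conn)
        {
            conn.disconnect();
        }
    }
    return responseTime;
}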
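
The method above packs three different outcomes into a single Long: a real response time, "50000" followed by the HTTP status code, or 60000000 for a failure. Below is a minimal sketch of how a caller might turn that value back into something readable. The class name ProbeResultDecoder, the method describe, and the message wording are assumptions made for illustration, not part of the monitoring system described here.

```java
// Hypothetical helper: decodes the value returned by getResponseTimeByIp.
class ProbeResultDecoder {
    // Returned on a malformed URL or an I/O failure.
    static final long FAILURE = 60_000_000L;
    // "50000" + a three-digit status code equals 50,000,000 + code.
    static final long STATUS_BASE = 50_000_000L;

    static String describe(long value) {
        if (value == FAILURE) {
            return "unreachable (bad URL or I/O error)";
        }
        long status = value - STATUS_BASE;
        if (status >= 100 && status <= 599) {
            return "HTTP " + status;
        }
        // Anything else is an elapsed time in milliseconds.
        return "OK in " + value + " ms";
    }

    public static void main(String[] args) {
        System.out.println(describe(132L));
        System.out.println(describe(50_000_404L));
        System.out.println(describe(60_000_000L));
    }
}
```

The decoding relies on the sentinel values being far larger than any plausible response time, so a genuine timing is never mistaken for an error code.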

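With this code, you can do real-time URL monitoring of servers that sit behind a load balancer.
The alerts it sends identify which server is currently in the worst shape, so they are more targeted and make it easier for the system team to deal with server faults.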
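
As a rough sketch of how such an alert could pick out the worst server, the example below probes a list of back-end IPs with the same URL and reports the one with the largest result. The class BackendSweep, its methods, and the 192.0.2.x addresses are hypothetical. The probe parameter stands in for getResponseTimeByIp and is replaced by canned values here so the example runs on its own.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sweep over the back-end servers behind one LVS address.
class BackendSweep {
    // Probe every back-end IP with the same URL.
    // probe stands in for getResponseTimeByIp(urlAddress, ip).
    static Map<String, Long> probeAll(String url, List<String> ips,
                                      BiFunction<String, String, Long> probe) {
        Map<String, Long> results = new LinkedHashMap<>();
        for (String ip : ips) {
            results.put(ip, probe.apply(url, ip));
        }
        return results;
    }

    // The largest value is the worst: the error sentinels (50000xxx and
    // 60000000) are far larger than any plausible response time.
    static String worst(Map<String, Long> results) {
        String worstIp = null;
        long worstValue = -1L;
        for (Map.Entry<String, Long> entry : results.entrySet()) {
            if (entry.getValue() > worstValue) {
                worstValue = entry.getValue();
                worstIp = entry.getKey();
            }
        }
        return worstIp;
    }

    public static void main(String[] args) {
        // Canned results in place of real network probes.
        Map<String, Long> canned = Map.of(
                "192.0.2.11", 120L,
                "192.0.2.12", 60_000_000L,
                "192.0.2.13", 50_000_503L);
        Map<String, Long> results = probeAll(
                "http://m.easou.com/",
                List.of("192.0.2.11", "192.0.2.12", "192.0.2.13"),
                (url, ip) -> canned.get(ip));
        System.out.println("worst server: " + worst(results));
    }
}
```

Passing the probe in as a function keeps the sweep logic testable without touching the network. In a real deployment the lambda would simply call getResponseTimeByIp.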