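Our company runs several hundred servers, and many of them sit behind LVS. The same application is deployed on many different servers, with LVS placed in front of them. In this setup, when one or several back-end servers go down, the front-end application still responds normally, so monitoring the application URL alone cannot reveal the problem.
Last weekend, on our machines hosted at Shenzhen Telecom, all 9 servers in one rack went offline at the same time. After investigation, the cause turned out to be a faulty external-network switch. The front-end application kept working normally during the outage, and the monitoring system raised no alert. Yesterday I added a new feature to the monitoring system: it goes past LVS and checks each back-end server directly.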
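Server monitoring here falls into two kinds.
1: Monitoring the server through SNMP.
2: Monitoring the server through the application's URL.
SNMP mainly monitors the running state of the server itself.
URL monitoring mainly monitors the real-time running state of the application.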
Enough talk. The code that probes an application URL through a specific server IP is as follows:

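import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.MalformedURLException;
import java.net.Proxy;
import java.net.URL;

/**
 * Fetches urlAddress (for example http://m.easou.com/) from one specific
 * back-end server and returns the elapsed time in milliseconds.
 * Error values: 50000 followed by the HTTP status (for example 50000404)
 * for a non-200 response, 60000000 for a malformed URL or an I/O failure.
 */
public static Long getResponseTimeByIp(String urlAddress, String ip)
{
    HttpURLConnection conn = null;
    long responseTime;

    try
    {
        long openTime = System.currentTimeMillis();
        URL url = new URL(urlAddress);
        // Treat the back-end server as an HTTP proxy: the request keeps the
        // application's host name but is sent straight to the given IP,
        // so it never passes through LVS.
        // buildInetAddress(ip) is a helper defined elsewhere (not shown here).
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(buildInetAddress(ip), 80));
        conn = (HttpURLConnection) url.openConnection(proxy);
        conn.setConnectTimeout(50000);
        conn.setReadTimeout(50000);
        conn.setRequestMethod("GET");

        // Check the status before reading: getInputStream() throws an
        // IOException on 4xx/5xx, which would hide the real status code.
        int code = conn.getResponseCode();

        if (code == 200)
        {
            // Read the whole body so the timing covers the full response.
            try (BufferedReader bReader = new BufferedReader(new InputStreamReader(conn.getInputStream())))
            {
                while (bReader.readLine() != null)
                {
                    // The content is not needed, only the time to receive it.
                }
            }
            responseTime = System.currentTimeMillis() - openTime;

        } else
        {
            // Same value as "50000" followed by the three-digit status code.
            responseTime = 50000000L + code;
        }

    } catch (MalformedURLException e)
    {
        e.printStackTrace();
        responseTime = 60000000L;

    } catch (IOException e)
    {
        // Connection refused, timeout, or a broken response.
        e.printStackTrace();
        responseTime = 60000000L;

    } finally
    {
        if (null != conn)
        {
            conn.disconnect();
        }
    }
    return responseTime;
}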
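
The method above packs three different outcomes into one number: a real response time, "50000" followed by the HTTP status, and 60000000 for a failure. A minimal sketch of how a caller might turn that number back into a readable message is shown below. The class name ProbeResultDecoder and the message wording are hypothetical and are not part of the original monitoring code.

```java
// Hypothetical helper: decodes the value returned by getResponseTimeByIp.
final class ProbeResultDecoder {
    // Returned when the URL is malformed or an I/O error occurs.
    static final long FAILURE = 60000000L;
    // "50000" followed by a three-digit HTTP status, e.g. 50000404.
    static final long STATUS_BASE = 50000000L;

    static String describe(long value) {
        if (value == FAILURE) {
            return "unreachable";
        }
        if (value >= STATUS_BASE + 100 && value <= STATUS_BASE + 599) {
            return "HTTP " + (value - STATUS_BASE);
        }
        // Anything else is treated as an elapsed time in milliseconds.
        return "OK in " + value + " ms";
    }

    public static void main(String[] args) {
        System.out.println(describe(120L));
        System.out.println(describe(50000404L));
        System.out.println(describe(60000000L));
    }
}
```

In practice a genuine response time stays far below 50000000 ms, so the three ranges should not collide.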

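With this code, servers behind a load balancer can be monitored by URL in real time.
The alerts that are sent out identify which server is currently in worse condition, so they are more targeted and make it easier for the system team to handle server failures.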
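
To illustrate how an alert could single out the worst servers, here is a rough sketch that probes a list of back-end IPs and returns those above a threshold, worst first. The class BackendRanker, the threshold parameter, and the pluggable probe function are assumptions made for this example. In the real monitor the probe would presumably be something like ip -> getResponseTimeByIp(url, ip).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical helper: ranks back-end servers by their probe result.
final class BackendRanker {
    /**
     * Probes every IP and returns the ones whose value exceeds thresholdMs,
     * sorted so that the worst (largest) value comes first.
     */
    static List<Map.Entry<String, Long>> worstFirst(
            List<String> ips, ToLongFunction<String> probe, long thresholdMs) {
        List<Map.Entry<String, Long>> slow = new ArrayList<>();
        for (String ip : ips) {
            long value = probe.applyAsLong(ip);
            if (value > thresholdMs) {
                slow.add(new AbstractMap.SimpleEntry<>(ip, Long.valueOf(value)));
            }
        }
        slow.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return slow;
    }

    public static void main(String[] args) {
        // Fake probe results standing in for real network calls.
        Map<String, Long> fake = new HashMap<>();
        fake.put("10.0.0.1", 120L);       // healthy
        fake.put("10.0.0.2", 60000000L);  // unreachable
        fake.put("10.0.0.3", 50000503L);  // HTTP 503
        List<String> ips = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        for (Map.Entry<String, Long> e : worstFirst(ips, ip -> fake.get(ip), 3000L)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Because the error values are much larger than any normal response time, failed servers naturally sort ahead of slow ones.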