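The company runs several hundred servers, and many of them sit behind LVS: the same application is deployed on many different servers, with LVS in front of them. In this setup, when one or several backend servers go down, the front-end application keeps working normally, so monitoring the front-end URL cannot reveal the problem.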

 

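  Last weekend, on the machines hosted at Shenzhen Telecom, all 9 servers in one rack dropped offline at the same time. After some investigation, the cause turned out to be a fault in the external network switch. Throughout the outage the front-end application stayed normal and the monitoring sent no alert. Yesterday I added a new feature to the monitoring: it bypasses LVS and monitors the backend servers directly.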

 

   Server monitoring here comes in two kinds.

  1: Monitoring the server through SNMP.

  2: Monitoring the server through the application's URL.

 

  SNMP mainly monitors the running state of the server itself.

  URL monitoring mainly monitors the real-time running state of the application.

 

  Without further ado, here is the code that probes an application URL on a specific backend IP:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.MalformedURLException;
import java.net.Proxy;
import java.net.URL;

// Returns the response time in milliseconds for urlAddress as served by the backend at ip.
// Failures are encoded in the return value: "50000" followed by the HTTP status for a
// non-200 reply (e.g. 50000404), and 60000000 for a malformed URL or an I/O error.
// buildInetAddress(ip) is a helper that is not shown in this post.
public static Long getResponseTimeByIp(String urlAddress, String ip) {
    HttpURLConnection conn = null;
    long responseTime;
    try {
        long openTime = System.currentTimeMillis();
        // e.g. urlAddress = "http://m.easou.com/"
        URL url = new URL(urlAddress);
        // Use the backend server itself as the HTTP proxy, so the request goes
        // straight to that IP on port 80 instead of through the LVS virtual IP.
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(buildInetAddress(ip), 80));
        conn = (HttpURLConnection) url.openConnection(proxy);
        conn.setConnectTimeout(50000);
        conn.setReadTimeout(50000);
        conn.setRequestMethod("GET");
        // Check the status first: getInputStream() throws on 4xx/5xx replies,
        // which would hide the status code behind the generic I/O error value.
        int code = conn.getResponseCode();
        if (code == 200) {
            // Read the whole body so the measured time covers the full response.
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                while (reader.readLine() != null) {
                    // The body content is not needed, only the time to receive it.
                }
            }
            responseTime = System.currentTimeMillis() - openTime;
        } else {
            responseTime = Long.parseLong("50000" + code);
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
        responseTime = 60000000L;
    } catch (IOException e) {
        e.printStackTrace();
        responseTime = 60000000L;
    } finally {
        if (conn != null) {
            conn.disconnect();
        }
    }
    return responseTime;
}
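
The probe above calls buildInetAddress(ip), whose body is not included in the post. As a rough sketch only, and assuming the helper simply turns a dotted-quad IPv4 string into an InetAddress without a DNS lookup, it could look something like this. The class name BackendAddress and the validation rules are assumptions made for illustration, not the original implementation.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class BackendAddress {
    /**
     * Hypothetical stand-in for the buildInetAddress helper used by the probe.
     * Parses a dotted-quad IPv4 string such as "10.0.0.12" into an InetAddress.
     */
    public static InetAddress buildInetAddress(String ip) {
        // The -1 limit keeps trailing empty parts, so "1.2.3.4." is rejected.
        String[] parts = ip.trim().split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not a dotted-quad IPv4 address: " + ip);
        }
        byte[] raw = new byte[4];
        for (int i = 0; i < 4; i++) {
            // NumberFormatException (an IllegalArgumentException) covers non-numeric parts.
            int octet = Integer.parseInt(parts[i]);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range in: " + ip);
            }
            raw[i] = (byte) octet;
        }
        try {
            // getByAddress builds the address from raw bytes and performs no name lookup.
            return InetAddress.getByAddress(raw);
        } catch (UnknownHostException e) {
            // Only thrown for an illegal byte-array length, which cannot happen here.
            throw new IllegalArgumentException(e);
        }
    }
}
```

Building the address from raw bytes keeps the probe independent of DNS, which seems desirable when the goal is to test one specific machine.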
  

With this code, servers that sit behind a load balancer can be monitored by URL in real time.
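
To cover every machine behind one LVS virtual IP, the same URL would presumably be probed once per backend. The following is a minimal sketch of such a sweep. The class name BackendSweep, the method probeAll, and the idea of passing the probe in as a function are all assumptions for illustration; the post does not show how its monitor iterates over the backends.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongBiFunction;

public class BackendSweep {
    /**
     * Probes the same URL on every backend IP and returns ip -> result,
     * in the order the IPs were given. The probe takes (urlAddress, ip)
     * and returns a response time or an encoded failure value.
     */
    public static Map<String, Long> probeAll(String urlAddress,
                                             List<String> backendIps,
                                             ToLongBiFunction<String, String> probe) {
        Map<String, Long> results = new LinkedHashMap<>();
        for (String ip : backendIps) {
            results.put(ip, probe.applyAsLong(urlAddress, ip));
        }
        return results;
    }
}
```

In real use the probe argument would be a lambda such as `(url, ip) -> getResponseTimeByIp(url, ip)`, qualified with whatever class holds that method. Taking the probe as a parameter also lets the sweep be exercised with a fake probe, without touching the network.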

 

The alerts that are sent now identify which server is currently in the worst state. That makes them more targeted and makes it easier for the system team to deal with server failures.
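
Because the probe encodes failures as values far above any real response time (the digits 50000 followed by the HTTP status for a non-200 reply, 60000000 for a connection or URL error), the largest value in a set of results points at the server in the worst state. Here is a small sketch of how an alert could decode and rank those values. The class ProbeReport and its methods are hypothetical names, and the alerting logic in the post may well differ.

```java
import java.util.Map;

public class ProbeReport {
    // Values taken from the encoding used by getResponseTimeByIp.
    static final long CONNECT_FAILED = 60000000L;
    static final long STATUS_BASE = 50000000L;

    /** Turns a probe result into a short human-readable description. */
    public static String describe(long value) {
        if (value == CONNECT_FAILED) {
            return "unreachable";
        }
        if (value >= STATUS_BASE + 100 && value <= STATUS_BASE + 599) {
            return "HTTP " + (value - STATUS_BASE);
        }
        return value + " ms";
    }

    /** Returns the IP with the largest result, or null if the map is empty. */
    public static String worst(Map<String, Long> results) {
        String worstIp = null;
        long worstValue = -1L;
        for (Map.Entry<String, Long> entry : results.entrySet()) {
            if (entry.getValue() > worstValue) {
                worstValue = entry.getValue();
                worstIp = entry.getKey();
            }
        }
        return worstIp;
    }
}
```

With the 50000 ms connect and read timeouts used in the probe, a successful measurement stays far below 50000100, so the three ranges should not overlap.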