Posted on 2006-12-11 15:55
Fisher
Category: Java applications
While using Java's HttpURLConnection to download web pages, I found that requests to Google's site were rejected by Google.
        // First attempt: no request method or headers are set explicitly.
        try
        {
            url = new URL(urlStr);
            httpConn = (HttpURLConnection) url.openConnection();
            HttpURLConnection.setFollowRedirects(true);
            // logger.info(httpConn.getResponseMessage());
            in = httpConn.getInputStream();
            out = new FileOutputStream(new File(outPath));
            chByte = in.read();
            while (chByte != -1)
            {
                out.write(chByte);
                chByte = in.read();
            }
        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
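After some investigation and reading, I found the cause: the code above leaves out information that the server expects in the request. Setting the request properties explicitly fixes it: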
            httpConn.setRequestMethod("GET");
            httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
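To see the effect of the header without sending traffic to Google, here is a minimal sketch that runs against a local stand-in server. It assumes the rejection is keyed on the default agent string that HttpURLConnection sends (`Java/<version>`); the exact rule Google applies is not known here. The class and method names (`UserAgentDemo`, `startPickyServer`, `fetchStatus`) are made up for illustration, and the server is the JDK's built-in `com.sun.net.httpserver.HttpServer` (Java 8 or later for the lambda).

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class UserAgentDemo {

    /** Starts a local stand-in server that answers 403 to the default "Java/..." agent (assumed rule). */
    static HttpServer startPickyServer() {
        try {
            // Port 0 lets the OS pick a free port.
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String agent = exchange.getRequestHeaders().getFirst("User-Agent");
                boolean rejected = agent == null || agent.startsWith("Java/");
                exchange.sendResponseHeaders(rejected ? 403 : 200, -1); // -1: no response body
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends a GET and returns the status code; a null userAgent keeps the JDK default. */
    static int fetchStatus(String urlStr, String userAgent) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
            conn.setRequestMethod("GET");
            if (userAgent != null) {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            try {
                // getResponseCode() reports 4xx codes instead of throwing like getInputStream().
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startPickyServer();
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
        try {
            System.out.println("default agent: " + fetchStatus(url, null));
            System.out.println("browser agent: " + fetchStatus(url,
                    "Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)"));
        } finally {
            server.stop(0);
        }
    }
}
```

Against this stand-in server the first request should come back with 403 and the second with 200. Calling `getResponseCode()` before `getInputStream()` is also a handy way to find out why a real request was refused.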
The complete code is as follows:
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;

    public static void DownLoadPages(String urlStr, String outPath)
    {
        int chByte = 0;
        URL url = null;
        HttpURLConnection httpConn = null;
        InputStream in = null;
        FileOutputStream out = null;
        try
        {
            url = new URL(urlStr);
            httpConn = (HttpURLConnection) url.openConnection();
            // Static setting: applies to every HttpURLConnection (true is already the default).
            HttpURLConnection.setFollowRedirects(true);
            httpConn.setRequestMethod("GET");
            // Identify as a browser; without this header the request is rejected.
            httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");

            // logger.info(httpConn.getResponseMessage());
            in = httpConn.getInputStream();
            out = new FileOutputStream(new File(outPath));
            // Copy the response body to the output file one byte at a time.
            chByte = in.read();
            while (chByte != -1)
            {
                out.write(chByte);
                chByte = in.read();
            }
        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        finally
        {
            // Close each resource on its own, skipping any that was never opened,
            // so that one failure does not prevent the remaining cleanup.
            try
            {
                if (out != null)
                {
                    out.close();
                }
            }
            catch (IOException ex)
            {
                ex.printStackTrace();
            }
            try
            {
                if (in != null)
                {
                    in.close();
                }
            }
            catch (IOException ex)
            {
                ex.printStackTrace();
            }
            if (httpConn != null)
            {
                httpConn.disconnect();
            }
        }
    }
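For readers on a newer JDK, the same download can be written more compactly. The following is only a sketch under stated assumptions, not the code from this post: it needs Java 8 or later (try-with-resources and lambdas did not exist when this was written), it copies through a buffer instead of byte by byte, and it reports failures as exceptions rather than printing them. The names `BufferedDownload`, `downloadPage`, and `roundTrip` are hypothetical, and the local `HttpServer` is there only so the example runs without network access.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferedDownload {

    static final String USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)";

    /** Downloads urlStr to outPath; any failure surfaces as an IOException. */
    static void downloadPage(String urlStr, Path outPath) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
        conn.setInstanceFollowRedirects(true); // per-connection, unlike the static setter
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", USER_AGENT);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + urlStr);
            }
            // Both streams are closed automatically, even if the copy fails part-way.
            try (InputStream in = conn.getInputStream();
                 OutputStream out = Files.newOutputStream(outPath)) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Serves a fixed page from a local stand-in server, downloads it, and returns the saved text. */
    static String roundTrip() {
        byte[] body = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            Path out = Files.createTempFile("page", ".html");
            try {
                downloadPage("http://127.0.0.1:" + server.getAddress().getPort() + "/", out);
                return new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
            } finally {
                server.stop(0);
                Files.deleteIfExists(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

Checking the status code first means a rejected request produces a clear error message instead of an exception thrown from `getInputStream()`.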
There is also a second way to access Google's site: use HttpClient, a tool from Apache, to imitate a browser when making the request.
        Document document = null;
        HttpClient httpClient = new HttpClient();

        GetMethod getMethod = new GetMethod(url);
        getMethod.setFollowRedirects(true);
        try
        {
            int statusCode = httpClient.executeMethod(getMethod);

            if (statusCode == HttpStatus.SC_OK)
            {
                InputStream in = getMethod.getResponseBodyAsStream();
                InputSource is = new InputSource(in);
                DOMParser domParser = new DOMParser();   // NekoHTML: convert the fetched page into a DOM
                domParser.parse(is);
                document = domParser.getDocument();

                System.out.println(getMethod.getURI());
            }
        }
        finally
        {
            // Release the connection whether or not the request and parse succeeded.
            getMethod.releaseConnection();
        }
        return document;
I recommend the first approach: HttpURLConnection is more lightweight, and it is also faster than the second approach with HttpClient.