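1. Out-of-memory errors when parsing large files with Dom4j
The problem: when I used dom4j to parse an XML file of a few dozen MB, it failed with an out-of-memory error. The exact file size at which this happens depends on your machine's resources. dom4j is widely regarded as one of the best-performing DOM-style parsers. Even the well-known Hibernate uses dom4j to parse its XML configuration files.
The cause is that dom4j's SAXReader reads the whole XML file in one go and builds the complete document tree in memory. A sufficiently large file therefore triggers an out-of-memory error. SAXReader is driven by a SAX parser that reads the input incrementally, but it still parses the entire document before returning. If the XML file holds tens of thousands of records, all of those element objects have to stay in memory once parsing finishes.
The usual way to parse a file with Dom4j: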
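import java.io.FileInputStream;
import java.io.InputStream;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

try (InputStream is = new FileInputStream(filePath)) {
    SAXReader reader = new SAXReader();
    Document doc = reader.read(is); // builds the entire XML file into one in-memory Document
    Element root = doc.getRootElement(); // get the root element
    for (Object obj : root.elements()) { // iterate over each child element
        Element e = (Element) obj; // process the current element
    }
}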
Solution: parse the file with an ElementHandler
The dom4j API includes an ElementHandler interface. Its Javadoc describes it as follows:
ElementHandler interface defines a handler of Element objects. It is used primarily in event based processing models such as for
processing large XML documents as they are being parsed rather than waiting until the whole document is parsed.
This is exactly what we need. Implementing the following two methods is enough to meet our requirement:
onEnd(ElementPath elementPath)
Called by an event based processor when an element's closing tag is encountered.
onStart(ElementPath elementPath)
Called by an event based processor when an element's opening tag is encountered.
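Here is the code:
import java.io.FileInputStream;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

FileInputStream fis = new FileInputStream(addPath);
SAXReader reader = new SAXReader();
ElementHandler addHandler = new MyElementHandler(); // create a MyElementHandler instance
reader.addHandler("/root/test1", addHandler); // call the handler for each /root/test1 element
reader.addHandler("/root/test2", addHandler); // call the handler for each /root/test2 element
reader.read(fis);
fis.close();
...
class MyElementHandler implements ElementHandler {
    @Override
    public void onStart(ElementPath ep) {}

    @Override
    public void onEnd(ElementPath ep) {
        Element e = ep.getCurrent(); // get the current element
        // process the element...
        e.detach(); // once processed, prune the element from the DOM tree
    }
}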
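Each element is removed from the DOM tree as soon as it has been processed. As a result, the in-memory tree stays small regardless of how many records the file contains, and the out-of-memory error does not occur.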
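For comparison, here is a minimal sketch of the same process-then-discard idea. It uses the JDK's built-in SAX API (javax.xml.parsers) instead of dom4j, so it needs no extra library. The class name StreamingRecords and the method forEachRecord are hypothetical. The handler keeps only the text of the record it is currently reading and hands that text to a callback at the closing tag. It keeps no reference to earlier records, so memory use does not grow with the record count.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: a JDK-only stand-in for the dom4j ElementHandler approach.
class StreamingRecords {
    /** Passes the text of each <recordName> element to the callback as soon as its closing tag is read. */
    static void forEachRecord(InputStream xml, String recordName, Consumer<String> callback) {
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inRecord = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(recordName)) {
                    inRecord = true;
                    text.setLength(0); // start a fresh buffer for this record
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inRecord) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(recordName)) {
                    callback.accept(text.toString()); // process this record...
                    inRecord = false;                 // ...then forget it, like detach() above
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(xml, handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Usage is along the lines of `StreamingRecords.forEachRecord(new FileInputStream(path), "test1", text -> ...)`. Only the record currently being read is buffered.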
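The code above leaves out some business logic. If anything is unclear, or if you know a better approach, feel free to contact me on QQ: 34174409.
2. The BOM header problem
An exception is thrown when an XML file is read through a java.io.Reader and then parsed: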
org.dom4j.DocumentException: Error on line 1 of document : Content is not allowed in prolog.
Nested exception:
org.xml.sax.SAXParseException: Content is not allowed in prolog.
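Cause: the UTF-8 encoded file starts with a BOM (byte order mark, the bytes EF BB BF), and the Reader class does not handle it. Java's UTF-8 decoder does not strip the BOM, so the parser sees the character U+FEFF in front of the XML declaration.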
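A minimal sketch that reproduces the cause using only the JDK. The class name BomDemo and the method firstCharViaReader are hypothetical. The code decodes a byte array that begins with a BOM through an InputStreamReader and shows that the first character returned is U+FEFF, not '<'.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that a UTF-8 Reader passes the BOM through as U+FEFF.
class BomDemo {
    /** Returns the first char a UTF-8 Reader yields for the given bytes, or -1 at end of stream. */
    static int firstCharViaReader(byte[] bytes) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            return r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        // The decoder keeps the BOM as U+FEFF, which the XML parser then treats
        // as "content" before the prolog.
        System.out.printf("U+%04X%n", firstCharViaReader(withBom)); // prints U+FEFF
    }
}
```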
Solutions:
(1). Delete the BOM manually with a hex editor
This one is left to you.
(2). Read the first bytes from the InputStream to check for a BOM, and discard them if one is present
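import java.io.IOException;
import java.io.PushbackInputStream;

// The pushback buffer must hold 2 bytes; with the default size of 1 the second unread() fails
PushbackInputStream pis = new PushbackInputStream(in, 2);
int ch = pis.read();
if (ch != 0xEF) {
    if (ch != -1) {
        pis.unread(ch); // no BOM: push the first byte back
    }
} else if ((ch = pis.read()) != 0xBB) {
    if (ch != -1) {
        pis.unread(ch); // not a BOM: push both bytes back, last byte first
    }
    pis.unread(0xEF);
} else if ((ch = pis.read()) != 0xBF) {
    throw new IOException("wrong format");
} else {
    // EF BB BF consumed: the BOM is skipped and pis now starts at the XML content
}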
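The same check can be packaged as a reusable helper. This is a minimal sketch that assumes Java 9+ (for readNBytes and readAllBytes). The names BomStripper, skipUtf8Bom and decode are hypothetical. Unlike the snippet above, a stream that starts with EF BB but not BF is passed through unchanged instead of being rejected.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class wrapping the BOM check from approach (2).
class BomStripper {
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    /** Returns a stream positioned after a leading UTF-8 BOM, or at the start if there is none. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pis = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = pis.readNBytes(head, 0, head.length); // fewer than 3 bytes only at end of stream
        boolean isBom = n == 3
                && (head[0] & 0xFF) == UTF8_BOM[0]
                && (head[1] & 0xFF) == UTF8_BOM[1]
                && (head[2] & 0xFF) == UTF8_BOM[2];
        if (!isBom && n > 0) {
            pis.unread(head, 0, n); // not a BOM: put the bytes back unchanged
        }
        return pis;
    }

    /** Convenience for testing: strips a BOM from a byte array and decodes the rest as UTF-8. */
    static String decode(byte[] bytes) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(bytes))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With dom4j you would then parse through a Reader built on the cleaned stream, for example `new SAXReader().read(new InputStreamReader(BomStripper.skipUtf8Bom(in), StandardCharsets.UTF_8))`.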
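(3). Read the whole file through an InputStream, strip the BOM, and write the file back
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;

FileInputStream fin = new FileInputStream(fileName);
// getInputStream() is a helper not shown here; presumably it applies the BOM check from (2)
InputStream in = getInputStream(fin);
// write the BOM-free content to a temporary file
String tmpFileName = fileName + ".tmp";
FileOutputStream out = new FileOutputStream(tmpFileName);
byte[] b = new byte[4096];
int len;
// read until read() returns -1; available() is not a reliable end-of-file test
while ((len = in.read(b, 0, b.length)) != -1) {
    out.write(b, 0, len);
}
in.close();
fin.close();
out.close();
// the temporary file is complete; copy it back over the original file
in = new FileInputStream(tmpFileName);
System.out.println("[" + fileName + "]"); // log which file is being rewritten
out = new FileOutputStream(fileName);
while ((len = in.read(b, 0, b.length)) != -1) {
    out.write(b, 0, len);
}
in.close();
out.close();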
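On Java 7+ the same in-place fix can be written more compactly with java.nio.file.Files. This is a minimal sketch, and BomFileFixer and stripBomInPlace are hypothetical names. It reads the whole file into memory, so it suits configuration-sized files rather than the very large files from section 1. It also avoids leaving a .tmp file behind.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: removes a leading UTF-8 BOM from a file in place.
class BomFileFixer {
    /** Returns true if a BOM was found and removed. Loads the whole file into memory. */
    static boolean stripBomInPlace(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        if (hasBom) {
            // overwrite the file with everything after the 3 BOM bytes
            Files.write(file, Arrays.copyOfRange(data, 3, data.length));
        }
        return hasBom;
    }
}
```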
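3. Illegal XML characters
Parsing an XML file fails with an illegal-character exception, even when the character is inside a CDATA section: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xb) was found in the CDATA section.
Cause: under the W3C XML 1.0 specification, the following control characters may not appear anywhere in an XML document: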
0x00 - 0x08
0x0b - 0x0c
0x0e - 0x1f
The parser throws an error as soon as it encounters one of these characters.
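To reproduce the failure without dom4j, here is a minimal sketch using the JDK's built-in DOM parser (javax.xml.parsers). IllegalCharDemo and parses are hypothetical names. The parser rejects a raw 0x0B both in element text and inside a CDATA section, while it accepts a tab (0x09). The exact wording of the exception message may vary between JDK versions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: checks whether the JDK parser accepts a given XML string.
class IllegalCharDemo {
    /** Returns true if the document parses, false if the parser reports a fatal error. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            return false; // e.g. "An invalid XML character (Unicode: 0xb) was found ..."
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```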
Solution:
For XML files that may contain such characters, filter them out before parsing.
public static String stripNonValidXMLChars(String str) {
if (str == null || "".equals(str)) {
return str;
}
return str.replaceAll("[\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]", "");
}
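stripNonValidXMLChars works on a String, so the whole file must be in memory before it can run. That is the situation section 1 tries to avoid. Below is a minimal sketch of a streaming alternative, assuming the input is read through a java.io.Reader. XmlCharFilterReader is a hypothetical name. It drops the same three character ranges as the regex as the characters pass through, so it can wrap the Reader handed to the parser. Like the regex, it only removes these control characters. Other characters that XML 1.0 disallows, such as U+FFFE and U+FFFF, are not handled.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical Reader wrapper: drops 0x00-0x08, 0x0B-0x0C and 0x0E-0x1F on the fly.
class XmlCharFilterReader extends FilterReader {
    XmlCharFilterReader(Reader in) {
        super(in);
    }

    private static boolean isForbidden(int c) {
        return (c >= 0 && c <= 0x08) || c == 0x0B || c == 0x0C || (c >= 0x0E && c <= 0x1F);
    }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = super.read();
        } while (c != -1 && isForbidden(c)); // skip forbidden chars until a legal one or EOF
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        while (true) {
            int n = super.read(cbuf, off, len);
            if (n <= 0) {
                return n; // -1 at end of stream (or 0 when len == 0)
            }
            int kept = 0;
            for (int i = 0; i < n; i++) {
                char c = cbuf[off + i];
                if (!isForbidden(c)) {
                    cbuf[off + kept++] = c; // compact the allowed chars to the front
                }
            }
            if (kept > 0) {
                return kept; // if the whole chunk was forbidden, read the next chunk
            }
        }
    }
}
```

With dom4j this would be used as, for example, `new SAXReader().read(new XmlCharFilterReader(new InputStreamReader(in, StandardCharsets.UTF_8)))`.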
----------------------------------------
by Chen Yuzhe (陈于喆)
QQ:34174409
Mail: chenyz@corp.netease.com
