<?xml version="1.0" encoding="utf-8" standalone="yes"?> http://m.tkk7.com/wangxinsh55/category/54532.html zh-cn Mon, 02 Mar 2015 13:45:56 GMT Mon, 02 Mar 2015 13:45:56 GMT 60 Storm-Kafka integration programming model http://m.tkk7.com/wangxinsh55/archive/2015/03/01/423114.html SIMONE SIMONE Sun, 01 Mar 2015 07:47:00 GMT http://m.tkk7.com/wangxinsh55/archive/2015/03/01/423114.html http://m.tkk7.com/wangxinsh55/comments/423114.html http://m.tkk7.com/wangxinsh55/archive/2015/03/01/423114.html#Feedback 0 http://m.tkk7.com/wangxinsh55/comments/commentRss/423114.html http://m.tkk7.com/wangxinsh55/services/trackbacks/423114.html Read the full post

SIMONE 2015-03-01 15:47 Post a comment
]]>
<item><title>Hadoop job tuning parameters and how they work</title><link>http://m.tkk7.com/wangxinsh55/archive/2014/11/19/420297.html</link><dc:creator>SIMONE</dc:creator><author>SIMONE</author><pubDate>Wed, 19 Nov 2014 05:42:00 GMT</pubDate><guid>http://m.tkk7.com/wangxinsh55/archive/2014/11/19/420297.html</guid><wfw:comment>http://m.tkk7.com/wangxinsh55/comments/420297.html</wfw:comment><comments>http://m.tkk7.com/wangxinsh55/archive/2014/11/19/420297.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://m.tkk7.com/wangxinsh55/comments/commentRss/420297.html</wfw:commentRss><trackback:ping>http://m.tkk7.com/wangxinsh55/services/trackbacks/420297.html</trackback:ping><description><![CDATA[<div>http://www.linuxidc.com/Linux/2012-01/51615.htm</div><br /><div><h2><span>1 </span>Map side tuning <span style="font-family: 宋体">parameters</span> </h2> <h3><span>1.1 </span>MapTask <span style="font-family: 宋体">internals</span> </h3> <p><img alt="" src="http://www.linuxidc.com/upload/2012_01/120116103468161.gif" /> <br /></p> <p style="text-indent: 21pt; line-height: normal;" align="left"><span style="color: black; font-family: 宋体">When a map task starts running and produces intermediate data, the intermediate results are not simply written straight to disk. The process in between is fairly involved: an in-memory buffer caches the results produced so far, and some pre-sorting is done in that buffer to improve overall map performance. As the figure above shows, each map has its own memory buffer (MapOutputBuffer, the "buffer in memory" in the figure), and the map first writes the results it has produced into this buffer. The buffer is 100 MB by default, but its size can be adjusted through a parameter set when the job is submitted:</span> <strong><span style="color: red; font-family: 宋体">io.sort.mb</span> </strong><span style="color: black; font-family: 宋体">. When a map produces a very large amount of data, raising io.sort.mb reduces the number of spills over the course of the map's computation, so the map task touches the disk less often. If the map tasks are disk-bound, this adjustment can greatly improve map performance. The memory layout the map uses for sort and spill is shown below:</span> </p> <p style="line-height: normal;" align="left"><span style="color: black; font-family: 宋体"><img alt="" src="http://www.linuxidc.com/upload/2012_01/120116103468162.gif" height="404" 
width="575" /> <br /></span></p> <p style="text-indent: 21pt; line-height: normal;" align="left"><span style="color: black; font-family: 宋体">While it runs, the map keeps writing computed results into this buffer, but the buffer cannot necessarily hold all of the map output. Once the map output exceeds a certain threshold (for example 100 MB), the map has to write the buffered data to disk; in mapreduce this step is called a spill. The map does not wait until the buffer is completely full before spilling, because if it did, the map's computation would stall waiting for the buffer to free up space. Instead, the map starts spilling once the buffer is filled to a certain level (for example 80%). This threshold is also controlled by a job configuration parameter, namely</span> <strong><span style="color: red; font-family: 宋体">io.sort.spill.percent</span> </strong><span style="color: black; font-family: 宋体">, which defaults to 0.80 (80%). This parameter likewise affects how often spills happen, and therefore how often the map task reads and writes the disk over its lifetime. Outside of special cases it usually needs no manual tuning; adjusting io.sort.mb is more convenient for users.</span> </p>
<p style="text-indent: 21pt; line-height: normal;" align="left"><span style="color: black; font-family: 宋体">When the computation part of a map task has finished and the map has produced output, one or more spill files will have been generated; these files are the map's output. Before the map exits normally it must merge these spills into a single file, so there is one more merge step before the map ends. One parameter adjusts the behavior of this merge:</span> <strong><span style="color: red; font-family: 宋体">io.sort.factor</span> </strong><span style="color: black; font-family: 宋体">. It defaults to 10 and specifies the maximum number of parallel streams that can feed the merge file when spill files are merged. For example, if the map produces so much data that there are more than 10 spill files while io.sort.factor stays at the default of 10, the merge at the end of the map cannot combine all the spill files in one pass; it takes several passes of at most 10 streams each. In other words, when the map's intermediate results are very large, raising io.sort.factor helps reduce the number of merge passes and thus how often the map reads and writes the disk, which may speed up the job.</span> </p>
<p style="text-indent: 21pt; line-height: normal;" align="left"><span style="color: black; font-family: 宋体">When a job specifies a combiner, the map results are combined on the map side after the map finishes, using the function the combiner defines. The combiner function may run either before or after the merge completes, and this timing is controlled by one parameter, namely</span> <strong><span style="color: red; font-family: 
宋体">min.num.spills.for.combine</span> </strong><span style="color: black; font-family: 宋体">(default 3). When the job has a combiner set and there are at least 3 spills, the combiner function runs before the merge produces the result file. That way, when there are many spills to merge and a lot of data to combine, less data is written to the on-disk file; again the goal is fewer disk reads and writes, which may speed up the job.</span> </p>
<p style="text-indent: 21pt; line-height: normal;" align="left"><span style="color: black; font-family: 宋体">These are not the only ways to reduce the volume of intermediate results going to and from disk; compression is another. Both the map's spills and the final merged result file can be compressed. The benefit is that less data is written to and read from disk, which is especially useful for jobs whose intermediate results are very large and whose map execution is bottlenecked by disk speed. The parameter that controls whether map intermediate results are compressed is</span> <strong><span style="color: red; font-family: 宋体">mapred.compress.map.output</span> </strong><span style="color: black; font-family: 宋体">(true/false)</span> <span style="color: black; font-family: 宋体">. When it is set to true, the map compresses intermediate data before writing it to disk and decompresses it when reading it back. The consequence is that less intermediate data is written to disk, at the cost of some CPU spent on compression and decompression. This approach therefore suits jobs with very large intermediate results where the bottleneck is disk I/O rather than CPU; put plainly, it trades CPU for I/O. From observation, CPU is not the bottleneck for most jobs unless the computation logic is unusually complex, so compressing intermediate results usually pays off. Below is a comparison, for a wordcount job, of the volume of map intermediate data read from and written to local disk with and without compression:</span> </p>
<p style="line-height: normal;" align="left"><strong><span style="color: black; font-family: 宋体">map</span> </strong><strong><span style="color: black; font-family: 宋体">intermediate results, uncompressed:</span> </strong></p> <p style="line-height: normal;" align="left"><span style="color: black; font-family: 宋体"><img alt="" src="http://www.linuxidc.com/upload/2012_01/120116103468163.gif" /> <br /></span></p> <p style="line-height: normal;" align="left"><strong><span style="color: black; font-family: 宋体">map</span> </strong><strong><span style="color: black; font-family: 宋体">intermediate results, compressed:</span> </strong></p> <p style="line-height: normal;" align="left"><strong><span style="color: black; font-family: 宋体"><img alt="" 
src="http://www.linuxidc.com/upload/2012_01/120116103468164.gif" /> <br /></span></strong></p> <p style="text-indent: 21pt; line-height: normal;" align="left"><span style="color: black; font-family: 宋体">As the figures show, for the same job and the same data, compression shrinks the map intermediate results nearly 10-fold; if the map is disk-bound, the job's performance gain will be substantial.</span> </p>
<p style="text-indent: 21pt; line-height: normal;" align="left"><span style="color: black; font-family: 宋体">When map intermediate results are compressed, the user can also choose which compression format is used. The formats <a target="_blank" title="Hadoop">Hadoop</a> currently supports include GzipCodec, LzoCodec, BZip2Codec, LzmaCodec and others. Generally, LzoCodec is a good fit for a reasonable balance between CPU cost and compression ratio, though it depends on the specifics of the job. To choose the compression algorithm for intermediate results, set the configuration parameter</span> <strong><span style="color: red">mapred.map.output.compression.codec</span> </strong>=org.apache.hadoop.io.compress.DefaultCodec <span style="font-family: 宋体">or whichever codec the user prefers.</span> </p></div><br /><div><h3><span>1.2 </span>Map side <span style="font-family: 宋体">parameter tuning</span> </h3> <table style="border-right: medium none; border-top: medium none; border-left: medium none; border-bottom: medium none; border-collapse: collapse" border="1" cellpadding="0" cellspacing="0"> <tbody> <tr> <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: black 1pt solid; padding-left: 5.4pt; background: #c4bc96; padding-bottom: 0cm; border-left: black 1pt solid; width: 175.3pt; padding-top: 0cm; border-bottom: black 1pt solid" valign="top" width="234"> <p style="line-height: normal"><strong><span style="font-family: 宋体">Option</span> </strong></p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; background: #c4bc96; padding-bottom: 0cm; width: 73.8pt; 
padding-top: 0cm" valign="top" width="98"> <p style="line-height: normal"><strong><span style="font-family: 宋体">Type</span> </strong></p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; background: #c4bc96; padding-bottom: 0cm; width: 111.4pt; padding-top: 0cm" valign="top" width="149"> <p style="line-height: normal"><strong><span style="font-family: 宋体">Default</span> </strong></p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; background: #c4bc96; padding-bottom: 0cm; width: 65.6pt; padding-top: 0cm" valign="top" width="87"> <p style="line-height: normal"><strong><span style="font-family: 宋体">Description</span> </strong></p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">io.sort.mb </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 73.8pt; padding-top: 0cm" valign="top" width="98"> <p style="line-height: normal">int </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 111.4pt; padding-top: 0cm" valign="top" width="149"> <p style="line-height: normal">100 </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 65.6pt; padding-top: 0cm" valign="top" width="87"> <p style="line-height: normal">Size of the buffer that caches map intermediate results (in MB) </p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">io.sort.record.percent </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 73.8pt; padding-top: 0cm" valign="top" width="98"> <p style="line-height: normal">float </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; 
width: 111.4pt; padding-top: 0cm" valign="top" width="149"> <p style="line-height: normal">0.05 </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 65.6pt; padding-top: 0cm" valign="top" width="87"> <p style="line-height: normal">Fraction of io.sort.mb used to store map output record boundaries; the rest of the buffer stores the data </p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">io.sort.spill.percent </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 73.8pt; padding-top: 0cm" valign="top" width="98"> <p style="line-height: normal">float </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 111.4pt; padding-top: 0cm" valign="top" width="149"> <p style="line-height: normal">0.80 </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 65.6pt; padding-top: 0cm" valign="top" width="87"> <p style="line-height: normal">Buffer fill level at which the map starts to spill </p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">io.sort.factor </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 73.8pt; padding-top: 0cm" valign="top" width="98"> <p style="line-height: normal">int </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 111.4pt; padding-top: 0cm" valign="top" width="149"> <p style="line-height: normal">10 </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 65.6pt; padding-top: 0cm" valign="top" width="87"> <p style="line-height: normal"><span 
style="font-family: 宋体">Maximum number of streams merged at the same time</span> </p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">min.num.spills.for.combine </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 73.8pt; padding-top: 0cm" valign="top" width="98"> <p style="line-height: normal">int </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 111.4pt; padding-top: 0cm" valign="top" width="149"> <p style="line-height: normal">3 </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 65.6pt; padding-top: 0cm" valign="top" width="87"> <p style="line-height: normal">Minimum number of spills for the combiner function to run before the merge </p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">mapred.compress.map.output </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 73.8pt; padding-top: 0cm" valign="top" width="98"> <p style="line-height: normal">boolean </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 111.4pt; padding-top: 0cm" valign="top" width="149"> <p style="line-height: normal">false </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 65.6pt; padding-top: 0cm" valign="top" width="87"> <p style="line-height: normal">Whether map intermediate results are compressed </p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: 
normal">mapred.map.output.compression.codec </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 73.8pt; padding-top: 0cm" valign="top" width="98"> <p style="line-height: normal">class name </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 111.4pt; padding-top: 0cm" valign="top" width="149"> <p style="line-height: normal;" align="left">org.apache.hadoop.io. </p> <p style="line-height: normal">compress.DefaultCodec </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 65.6pt; padding-top: 0cm" valign="top" width="87"> <p style="line-height: normal">Compression format for map intermediate results </p></td></tr></tbody></table> <p>  </p> <h2><span>2 </span>Reduce side tuning <span style="font-family: 宋体">parameters</span> </h2> <h3><span>2.1 </span>ReduceTask <span style="font-family: 宋体">internals</span> </h3> <p><img alt="" src="http://www.linuxidc.com/upload/2012_01/120116103478481.gif" /> <br /></p>
<p style="text-indent: 21pt">A reduce runs in three phases: copy-&gt;sort-&gt;reduce. Every map of the job splits its output into n partitions according to the number of reduces (n), so the intermediate results of any map may contain part of the data that each reduce needs to process. To shorten reduce execution time, hadoop therefore has all reduces start trying to download their own partition's data from finished maps as soon as the job's first map completes. This process is what is usually called shuffle, i.e. the copy phase. </p>
<p><span>       When a reduce task</span> shuffles, it is really downloading its own share of the data from the various maps that have already finished. Since there are usually many maps, a single reduce can download from several maps in parallel, and this degree of parallelism is adjustable through the parameter <strong><span style="color: red">mapred.reduce.parallel.copies</span> </strong>(default 5). By default each reduce has only 5 parallel download threads fetching data from maps: if 100 or more of the job's maps finish within some period, the reduce can still download from at most 5 maps at a time. This parameter is therefore worth raising for jobs with many maps that finish quickly, so that each reduce obtains its share of the data sooner. </p>
<p><span>       Each</span> download thread of a reduce may fail while fetching the data of some map, because the machine holding that map's intermediate results has an error, the intermediate result file is lost, the network drops briefly, and so on. A download thread therefore does not wait indefinitely: if the download still fails after a certain amount of time, the thread gives up on it and later tries to download from somewhere else (because the map may have been rerun in the meantime). 
The maximum time a reduce download thread keeps trying is adjustable through the parameter <strong><span style="color: red">mapred.reduce.copy.backoff</span> </strong>(default 300 seconds). If the cluster's network is itself a bottleneck, raising this parameter keeps reduce download threads from being wrongly judged as failed. With a healthy network there is no need to change it; a professionally run cluster network should not have major problems, so this parameter rarely needs tuning. </p>
<p><span>       When a reduce</span> downloads map results to local storage it also has to merge them, so the io.sort.factor setting affects the reduce's merge behavior as well; the parameter is described in detail above. If a reduce shows very high iowait during the shuffle phase, raising this parameter may increase the concurrent throughput of each merge and improve reduce efficiency. </p>
<p><span>       During</span> the shuffle phase, a reduce does not write downloaded map data to disk right away; it first caches the data in memory and flushes to disk only once memory usage reaches a certain amount. Unlike on the map side, this memory size is not set through io.sort.mb but through another parameter: <strong><span style="color: red">mapred.job.shuffle.input.buffer.percent </span></strong>(default 0.7). The parameter is a fraction: shuffle data held in reduce memory may use at most 0.7 × maxHeap of reduce task. That is, a fixed proportion of the reduce task's maximum heap (usually set through mapred.child.java.opts, for example -Xmx1024m) is used to cache data. By default a reduce uses 70% of its heap size to cache data in memory. If the reduce heap is enlarged for business reasons, the cache grows accordingly, which is why the reduce cache parameter is a percentage rather than a fixed value. </p>
<p style="text-indent: 21.2pt">Suppose mapred.job.shuffle.input.buffer.percent is 0.7 and the reduce task's max heap size is 1 GB; the memory used to cache downloaded data is then roughly 700 MB. As on the map side, this 700 MB is not flushed to disk only when completely full: flushing starts once usage reaches a certain limit (a percentage). This limit can also be set through a job parameter: <strong><span style="color: red">mapred.job.shuffle.merge.percent</span> </strong>(default 0.66). If downloads are fast and quickly fill the in-memory cache, adjusting this parameter may help reduce performance. </p>
<p style="text-indent: 21.2pt">Once a reduce has downloaded its partition's data from all the maps, the real reduce computation phase begins (the sort phase in between is usually very short and finishes in a few seconds, because the whole download phase already sorts and merges as it downloads). When the reduce task actually enters the computation phase of the reduce function, one more parameter adjusts its behavior: <strong><span style="color: red">mapred.job.reduce.input.buffer.percent</span> </strong>(default 0.0). The reduce computation itself certainly consumes memory, and reading the data the reduce needs also requires memory as a buffer; this parameter controls what percentage of memory serves as the buffer from which the reduce reads already-sorted data. The default is 0, meaning that by default the reduce reads all of the data it processes from disk. If the parameter is greater than 0, a certain amount of data is kept cached in memory and fed to the reduce. When the reduce logic itself uses little memory, part of the memory can be given over to caching data, since it would otherwise sit idle. </p>
<h3><span>2.2 </span>Reduce side <span style="font-family: 宋体">parameter tuning</span> </h3> <table style="border-right: medium none; border-top: medium none; border-left: medium none; border-bottom: medium none; border-collapse: collapse" border="1" cellpadding="0" cellspacing="0"> <tbody> <tr> <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: black 1pt solid; padding-left: 5.4pt; background: #c4bc96; padding-bottom: 0cm; border-left: black 1pt solid; width: 175.3pt; padding-top: 0cm; border-bottom: black 1pt solid" valign="top" width="234"> <p style="line-height: normal"><strong><span style="font-family: 宋体">Option</span> 
</strong></p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; background: #c4bc96; padding-bottom: 0cm; width: 42.75pt; padding-top: 0cm" valign="top" width="57"> <p style="line-height: normal"><strong><span style="font-family: 宋体">Type</span> </strong></p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; background: #c4bc96; padding-bottom: 0cm; width: 49.6pt; padding-top: 0cm" valign="top" width="66"> <p style="line-height: normal"><strong><span style="font-family: 宋体">Default</span> </strong></p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; background: #c4bc96; padding-bottom: 0cm; width: 158.45pt; padding-top: 0cm" valign="top" width="211"> <p style="line-height: normal"><strong><span style="font-family: 宋体">Description</span> </strong></p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">mapred.reduce.parallel.copies </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 42.75pt; padding-top: 0cm" valign="top" width="57"> <p style="line-height: normal">int </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 49.6pt; padding-top: 0cm" valign="top" width="66"> <p style="line-height: normal">5 </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 158.45pt; padding-top: 0cm" valign="top" width="211"> <p style="line-height: normal">Maximum number of threads each reduce uses to download map results in parallel </p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">mapred.reduce.copy.backoff </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 42.75pt; padding-top: 
0cm" valign="top" width="57"> <p style="line-height: normal">int </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 49.6pt; padding-top: 0cm" valign="top" width="66"> <p style="line-height: normal">300 </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 158.45pt; padding-top: 0cm" valign="top" width="211"> <p style="line-height: normal">Maximum time a reduce download thread waits (in sec) </p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">io.sort.factor </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 42.75pt; padding-top: 0cm" valign="top" width="57"> <p style="line-height: normal">int </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 49.6pt; padding-top: 0cm" valign="top" width="66"> <p style="line-height: normal">10 </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 158.45pt; padding-top: 0cm" valign="top" width="211"> <p style="line-height: normal">Same as on the map side: maximum number of streams merged at the same time </p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">mapred.job.shuffle.input.buffer.percent </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 42.75pt; padding-top: 0cm" valign="top" width="57"> <p style="line-height: normal">float </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 49.6pt; padding-top: 0cm" valign="top" width="66"> <p style="line-height: normal">0.7 </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 158.45pt; 
padding-top: 0cm" valign="top" width="211"> <p style="line-height: normal">Fraction of the reduce task heap used to cache shuffle data </p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">mapred.job.shuffle.merge.percent </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 42.75pt; padding-top: 0cm" valign="top" width="57"> <p style="line-height: normal">float </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 49.6pt; padding-top: 0cm" valign="top" width="66"> <p style="line-height: normal">0.66 </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 158.45pt; padding-top: 0cm" valign="top" width="211"> <p style="line-height: normal">Fill level of the shuffle cache at which the merge starts </p></td></tr>
<tr> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 175.3pt; padding-top: 0cm" valign="top" width="234"> <p style="line-height: normal">mapred.job.reduce.input.buffer.percent </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 42.75pt; padding-top: 0cm" valign="top" width="57"> <p style="line-height: normal">float </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 49.6pt; padding-top: 0cm" valign="top" width="66"> <p style="line-height: normal">0.0 </p></td> <td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 158.45pt; padding-top: 0cm" valign="top" width="211"> <p style="line-height: normal">Fraction of memory used to cache data during the reduce computation phase, after the sort completes 
</p></td></tr></tbody></table></div><img src ="http://m.tkk7.com/wangxinsh55/aggbug/420297.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://m.tkk7.com/wangxinsh55/" target="_blank">SIMONE</a> 2014-11-19 13:42 <a href="http://m.tkk7.com/wangxinsh55/archive/2014/11/19/420297.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>mapreduce job: have each file processed by exactly one map</title><link>http://m.tkk7.com/wangxinsh55/archive/2014/09/16/417971.html</link><dc:creator>SIMONE</dc:creator><author>SIMONE</author><pubDate>Tue, 16 Sep 2014 01:28:00 GMT</pubDate><guid>http://m.tkk7.com/wangxinsh55/archive/2014/09/16/417971.html</guid><wfw:comment>http://m.tkk7.com/wangxinsh55/comments/417971.html</wfw:comment><comments>http://m.tkk7.com/wangxinsh55/archive/2014/09/16/417971.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://m.tkk7.com/wangxinsh55/comments/commentRss/417971.html</wfw:commentRss><trackback:ping>http://m.tkk7.com/wangxinsh55/services/trackbacks/417971.html</trackback:ping><description><![CDATA[<div><div> <p><div>http://www.rigongyizu.com/mapreduce-job-one-map-process-one-file/</div><br /></p><p>While processing a batch of data with a hadoop mapreduce job, a business requirement was that each file be handled by exactly one map; if two or more maps processed the same file, problems could arise. At first I thought of controlling the number of maps by setting dfs.blocksize or the mapreduce.input.fileinputformat.split.minsize/maxsize parameters, but then realized it need not be that complicated: a custom InputFormat can simply tell the framework not to split files.</p> <div nogutter="" "="" id="highlighter_665528"><div><div alt1"=""><table><tbody><tr><td><code>public</code> <code>class</code> <code>CustemDocInputFormat </code><code>extends</code> <code>TextInputFormat 
{</code></td></tr></tbody></table></div><div alt2"=""><table><tbody><tr><td> </td></tr></tbody></table></div><div alt1"=""><table><tbody><tr><td><code>    </code><code>@Override</code></td></tr></tbody></table></div><div alt2"=""><table><tbody><tr><td><code>    </code><code>public</code> <code>RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {</code></td></tr></tbody></table></div><div alt1"=""><table><tbody><tr><td><code>        </code><code>DocRecordReader reader = </code><code>null</code><code>;</code></td></tr></tbody></table></div><div alt2"=""><table><tbody><tr><td><code>        </code><code>try</code> <code>{</code></td></tr></tbody></table></div><div alt1"=""><table><tbody><tr><td><code>            </code><code>reader = </code><code>new</code> <code>DocRecordReader(); </code><code>// 鑷畾涔夌殑reader</code></td></tr></tbody></table></div><div alt2"=""><table><tbody><tr><td><code>        </code><code>} </code><code>catch</code> <code>(IOException e) {</code></td></tr></tbody></table></div><div alt1"=""><table><tbody><tr><td><code>            </code><code>e.printStackTrace();</code></td></tr></tbody></table></div><div alt2"=""><table><tbody><tr><td><code>        </code><code>}</code></td></tr></tbody></table></div><div alt1"=""><table><tbody><tr><td><code>        </code><code>return</code> <code>reader;</code></td></tr></tbody></table></div><div alt2"=""><table><tbody><tr><td><code>    </code><code>}</code></td></tr></tbody></table></div><div alt1"=""><table><tbody><tr><td> </td></tr></tbody></table></div><div alt2"=""><table><tbody><tr><td><code>    </code><code>@Override</code></td></tr></tbody></table></div><div alt1"=""><table><tbody><tr><td><code>    </code><code>protected</code> <code>boolean</code> <code>isSplitable(JobContext context, Path file) {</code></td></tr></tbody></table></div><div alt2"=""><table><tbody><tr><td><code>        </code><code>return</code> 
<code>false</code><code>;</code></td></tr></tbody></table></div><div alt1"=""><table><tbody><tr><td><code>    </code><code>}</code></td></tr></tbody></table></div><div alt2"=""><table><tbody><tr><td><code>}</code></td></tr></tbody></table></div></div></div> <p>榪欐牱錛岃緭鍏ユ枃浠舵湁澶氬皯涓紝job灝變細鍚姩澶氬皯涓猰ap浜嗐?/p> <div wp_rp_plain"="" id="wp_rp_first"><div><h3>鐩稿叧鏂囩珷</h3><ul wp_rp"=""><li data-position="0" data-poid="in-1093" data-post-type="none"><small>2014騫?鏈?9鏃?/small> <a >Hadoop : 涓涓洰褰曚笅鐨勬暟鎹彧鐢變竴涓猰ap澶勭悊</a></li><li data-position="1" data-poid="in-1074" data-post-type="none"><small>2014騫?鏈?7鏃?/small> <a >涓涓狧adoop紼嬪簭鐨勪紭鍖栬繃紼?– 鏍規嵁鏂囦歡瀹為檯澶у皬瀹炵幇CombineFileInputFormat</a></li><li data-position="2" data-poid="in-818" data-post-type="none"><small>2013騫?鏈?3鏃?/small> <a >hadoop鐢∕ultipleInputs/MultiInputFormat瀹炵幇涓涓猰apreduce job涓鍙栦笉鍚屾牸寮忕殑鏂囦歡</a></li><li data-position="3" data-poid="in-289" data-post-type="none"><small>2012騫?鏈?鏃?/small> <a >hadoop mapreduce鍜宧ive涓嬌鐢⊿equeceFile+lzo鏍煎紡鏁版嵁</a></li><li data-position="4" data-poid="in-939" data-post-type="none"><small>2014騫?鏈?1鏃?/small> <a >hadoop闆嗙兢DataNode璧蜂笉鏉ワ細“DiskChecker$DiskErrorException: Invalid volume failure config value: 1”</a></li></ul></div></div> </div></div><img src ="http://m.tkk7.com/wangxinsh55/aggbug/417971.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://m.tkk7.com/wangxinsh55/" target="_blank">SIMONE</a> 2014-09-16 09:28 <a href="http://m.tkk7.com/wangxinsh55/archive/2014/09/16/417971.html#Feedback" target="_blank" style="text-decoration:none;">鍙戣〃璇勮</a></div>]]></description></item><item><title>hadoop鐢∕ultipleInputs/MultiInputFormat瀹炵幇涓涓猰apreduce job涓鍙栦笉鍚屾牸寮忕殑鏂囦歡http://m.tkk7.com/wangxinsh55/archive/2014/09/16/417969.htmlSIMONESIMONETue, 16 Sep 2014 01:27:00 
GMThttp://m.tkk7.com/wangxinsh55/archive/2014/09/16/417969.htmlhttp://m.tkk7.com/wangxinsh55/comments/417969.htmlhttp://m.tkk7.com/wangxinsh55/archive/2014/09/16/417969.html#Feedback0http://m.tkk7.com/wangxinsh55/comments/commentRss/417969.htmlhttp://m.tkk7.com/wangxinsh55/services/trackbacks/417969.htmlhttp://www.rigongyizu.com/use-multiinputformat-read-different-files-in-one-job/

Hadoop provides MultiOutputFormat for writing result data to different directories, and FileInputFormat can read data from several directories at once. By default, however, a job can call job.setInputFormatClass only once, so a single InputFormat handles a single data format. To read files of different formats from different directories in the same job, you would have to implement your own MultiInputFormat (it turns out Hadoop already provides MultipleInputs for this).

For example, suppose a MapReduce job has to read two kinds of data at once: ordinary text files, read line by line with LineRecordReader, and pseudo-XML files, read with a custom AJoinRecordReader.

I implemented a simple MultiInputFormat as follows:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 
public class MultiInputFormat extends TextInputFormat {
 
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
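        RecordReader reader = null;
        try {
            String inputfile = ((FileSplit) split).getPath().toString();
            // Both prefixes are expected in the job Configuration (for example via -D)
            String xmlpath = context.getConfiguration().get("xml_prefix");
            String textpath = context.getConfiguration().get("text_prefix");
 
            // Null checks: a prefix that was never set would otherwise cause a NullPointerException
            if (xmlpath != null && inputfile.contains(xmlpath)) {
                reader = new AJoinRecordReader();
            } else if (textpath != null && inputfile.contains(textpath)) {
                reader = new LineRecordReader();
            } else {
                // Fallback for unmatched paths: read them as plain text too
                reader = new LineRecordReader();
            }
        } catch (IOException e) {
            // do something ...
        }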
 
        return reader;
    }
}

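The idea is simple: in createRecordReader, call ((FileSplit) split).getPath().toString() to get the name of the file about to be processed, match it against known path patterns, and pick the corresponding RecordReader. xml_prefix and text_prefix can be passed to the Configuration with -D when the program starts (this works when the driver parses its arguments through ToolRunner or GenericOptionsParser).

For example, one run printed the following values: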

inputfile=hdfs://test042092.sqa.cm4:9000/test/input_xml/common-part-00068
xmlpath_prefix=hdfs://test042092.sqa.cm4:9000/test/input_xml
textpath_prefix=hdfs://test042092.sqa.cm4:9000/test/input_txt

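Here the match is a simple comparison of the file path against an identifying prefix. More elaborate rules are possible, such as matching on the file name or the file extension.

The map class can then branch on the same file-name patterns and process each kind of input differently: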
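The path-matching rule can be tried outside Hadoop. The following is a minimal sketch in plain Java, not the post's actual code: `ReaderSelector`, `ReaderKind`, and `select` are hypothetical names introduced for illustration, and the enum values only stand in for AJoinRecordReader and LineRecordReader. It assumes that a missing xml_prefix (null) should fall back to the line reader, and it uses `startsWith` rather than `indexOf` because the setting is named as a prefix.

```java
import java.util.Objects;

public class ReaderSelector {

    /** Hypothetical labels standing in for the two RecordReader classes. */
    enum ReaderKind { XML, TEXT }

    /**
     * Picks a reader kind from the input file path.
     * xmlPrefix may be null when the setting was not supplied.
     */
    static ReaderKind select(String inputFile, String xmlPrefix) {
        Objects.requireNonNull(inputFile, "inputFile");
        if (xmlPrefix != null && inputFile.startsWith(xmlPrefix)) {
            return ReaderKind.XML;
        }
        // Text paths and unmatched paths are both read line by line
        return ReaderKind.TEXT;
    }

    public static void main(String[] args) {
        // Sample values taken from the run shown in the post
        String inputFile = "hdfs://test042092.sqa.cm4:9000/test/input_xml/common-part-00068";
        String xmlPrefix = "hdfs://test042092.sqa.cm4:9000/test/input_xml";
        System.out.println(select(inputFile, xmlPrefix)); // prints "XML"
    }
}
```

Note that a plain prefix test would also accept a sibling directory such as `input_xml_old`; appending a trailing `/` to the configured prefix is one way to avoid that.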

@Override
public void map(LongWritable offset, Text inValue, Context context)
        throws IOException {
 
    String inputfile = ((FileSplit) context.getInputSplit()).getPath()
            .toString();
 
    if (-1 != inputfile.indexOf(textpath)) {
        ......
    } else if (-1 != inputfile.indexOf(xmlpath)) {
        ......
    } else {
        ......
    }
}

This approach is rather crude. It turns out Hadoop already provides MultipleInputs, which assigns an InputFormat and a corresponding map class to each input directory.

MultipleInputs.addInputPath(conf, new Path("/foo"), TextInputFormat.class,
   MapClass.class);
MultipleInputs.addInputPath(conf, new Path("/bar"),
   KeyValueTextInputFormat.class, MapClass2.class);


SIMONE 2014-09-16 09:27 Post a comment
]]>
Optimizing a Hadoop program: implementing CombineFileInputFormat based on actual file sizes
http://m.tkk7.com/wangxinsh55/archive/2014/09/16/417968.html
SIMONE, Tue, 16 Sep 2014 01:25:00 GMT
http://www.rigongyizu.com/hadoop-job-optimize-combinefileinputformat/

One day I took over a program written by a colleague that copies data from one Hadoop cluster to another. The program is a job running on the Hadoop cluster. It has only a map phase, which reads the data under an HDFS directory and writes it to the other cluster.

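Clearly, the program had not been designed for large data volumes: if the input directory holds many files or a lot of data, the job ends up with a very large number of maps. One data source we actually needed to copy was multiple terabytes in size, and the job started more than ten thousand maps, which immediately used up the resources of the whole queue. Some parameters can be tuned to influence the number of maps (that is, the concurrency), but they cannot control it exactly, and they have to be reconfigured for every new data source.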

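The first improved version added a reduce phase, in the hope of controlling concurrency through the number of reducers. This does control concurrency precisely, but it adds a shuffle. In practice the input data turned out to be skewed (and the partition key could not be changed for business reasons), so the network on some machines was saturated, which affected other applications in the cluster. Limiting the shuffle with the mapred.reduce.parallel.copies parameter only treats the symptom. The gratuitous shuffle wastes a lot of network bandwidth and I/O.

The ideal, of course, is a map-only job whose concurrency can still be controlled precisely.

So the second optimized version was born. This job is map-only and uses CombineFileInputFormat, which packs several small files into one InputSplit for a single map, so that a large number of small files does not launch a large number of maps. The mapred.max.split.size parameter gives rough control over concurrency. I thought this would solve the problem, but data skew showed up again: this coarse way of dividing splits leaves some maps with little data and others with a lot. A few straggling maps more than doubled the job's actual running time.

It seemed that only giving every map the same amount of data would solve the problem properly.

So a third version was born. This time I subclassed CombineFileInputFormat and implemented the getSplits method myself. Because the input data is in SequenceFile format, a SequenceFileRecordReaderWrapper class is also needed.

The implementation is as follows:
CustomCombineSequenceFileInputFormat.java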

import java.io.IOException;
 
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReaderWrapper;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
 
/**
 * Input format that is a <code>CombineFileInputFormat</code>-equivalent for
 * <code>SequenceFileInputFormat</code>.
 *
 * @see CombineFileInputFormat
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class CustomCombineSequenceFileInputFormat<K, V> extends MultiFileInputFormat<K, V> {
    @SuppressWarnings({"rawtypes", "unchecked"})
    public RecordReader<K, V> createRecordReader(InputSplit split, TaskAttemptContext context)
            throws IOException {
        return new CombineFileRecordReader((CombineFileSplit) split, context,
                SequenceFileRecordReaderWrapper.class);
    }
 
    /**
     * A record reader that may be passed to <code>CombineFileRecordReader</code> so that it can be
     * used in a <code>CombineFileInputFormat</code>-equivalent for
     * <code>SequenceFileInputFormat</code>.
     *
     * @see CombineFileRecordReader
     * @see CombineFileInputFormat
     * @see SequenceFileInputFormat
     */
    private static class SequenceFileRecordReaderWrapper<K, V>
            extends CombineFileRecordReaderWrapper<K, V> {
        // this constructor signature is required by CombineFileRecordReader
        public SequenceFileRecordReaderWrapper(CombineFileSplit split, TaskAttemptContext context,
                Integer idx) throws IOException, InterruptedException {
            super(new SequenceFileInputFormat<K, V>(), split, context, idx);
        }
    }
}

MultiFileInputFormat.java

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
 
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
 
/**
 * multiple files can be combined in one InputSplit so that InputSplit number can be limited!
 */
public abstract class MultiFileInputFormat<K, V> extends CombineFileInputFormat<K, V> {
 
    private static final Log LOG = LogFactory.getLog(MultiFileInputFormat.class);
    public static final String CONFNAME_INPUT_SPLIT_MAX_NUM = "multifileinputformat.max_split_num";
    public static final Integer DEFAULT_MAX_SPLIT_NUM = 50;
 
    public static void setMaxInputSplitNum(Job job, Integer maxSplitNum) {
        job.getConfiguration().setInt(CONFNAME_INPUT_SPLIT_MAX_NUM, maxSplitNum);
    }
 
    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        // get all the files in input path
        List<FileStatus> stats = listStatus(job);
        List<InputSplit> splits = new ArrayList<InputSplit>();
        if (stats.size() == 0) {
            return splits;
        }
        // Compute the average split length
        long totalLen = 0;
        for (FileStatus stat : stats) {
            totalLen += stat.getLen();
        }
        int maxSplitNum = job.getConfiguration().getInt(CONFNAME_INPUT_SPLIT_MAX_NUM, DEFAULT_MAX_SPLIT_NUM);
        int expectSplitNum = maxSplitNum < stats.size() ? maxSplitNum : stats.size();
        long averageLen = totalLen / expectSplitNum;
        LOG.info("Prepare InputSplit : averageLen(" + averageLen + ") totalLen(" + totalLen
                + ") expectSplitNum(" + expectSplitNum + ") ");
        // Build the input splits
        List<Path> pathLst = new ArrayList<Path>();
        List<Long> offsetLst = new ArrayList<Long>();
        List<Long> lengthLst = new ArrayList<Long>();
        long currentLen = 0;
        for (int i = 0; i < stats.size(); i++) {
            FileStatus stat = stats.get(i);
            pathLst.add(stat.getPath());
            offsetLst.add(0L);
            lengthLst.add(stat.getLen());
            currentLen += stat.getLen();
            if (splits.size() < expectSplitNum - 1   && currentLen > averageLen) {
                Path[] pathArray = new Path[pathLst.size()];
                CombineFileSplit thissplit = new CombineFileSplit(pathLst.toArray(pathArray),
                    getLongArray(offsetLst), getLongArray(lengthLst), new String[0]);
                LOG.info("combineFileSplit(" + splits.size() + ") fileNum(" + pathLst.size()
                        + ") length(" + currentLen + ")");
                splits.add(thissplit);
                // Reset the accumulators for the next split
                pathLst.clear();
                offsetLst.clear();
                lengthLst.clear();
                currentLen = 0;
            }
        }
        if (pathLst.size() > 0) {
            Path[] pathArray = new Path[pathLst.size()];
            CombineFileSplit thissplit =
                    new CombineFileSplit(pathLst.toArray(pathArray), getLongArray(offsetLst),
                            getLongArray(lengthLst), new String[0]);
            LOG.info("combineFileSplit(" + splits.size() + ") fileNum(" + pathLst.size()
                    + ") length(" + currentLen + ")");
            splits.add(thissplit);
        }
        return splits;
    }
 
    private long[] getLongArray(List<Long> lst) {
        long[] rst = new long[lst.size()];
        for (int i = 0; i < lst.size(); i++) {
            rst[i] = lst.get(i);
        }
        return rst;
    }
}

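With the multifileinputformat.max_split_num parameter, the number of maps can be controlled fairly accurately, and the amount of data each map processes turns out to be quite even. With that, the problem was finally solved.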
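To see how the grouping in getSplits behaves, the same loop can be run on plain file sizes without a cluster. This is a minimal sketch, assuming only the logic visible in MultiFileInputFormat above: `SplitPlanner` and `plan` are hypothetical names, each inner list stands in for one CombineFileSplit, and the early return for a non-positive maximum is an added guard that the original does not have.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPlanner {

    /** Groups file sizes into splits using the same rule as the getSplits loop. */
    static List<List<Long>> plan(long[] fileSizes, int maxSplitNum) {
        List<List<Long>> splits = new ArrayList<>();
        if (fileSizes.length == 0 || maxSplitNum <= 0) {
            return splits; // added guard (assumption): avoids dividing by zero
        }
        long totalLen = 0;
        for (long size : fileSizes) {
            totalLen += size;
        }
        int expectSplitNum = Math.min(maxSplitNum, fileSizes.length);
        long averageLen = totalLen / expectSplitNum;

        List<Long> current = new ArrayList<>();
        long currentLen = 0;
        for (long size : fileSizes) {
            current.add(size);
            currentLen += size;
            // Close a split once it exceeds the average, keeping the last split open
            if (splits.size() < expectSplitNum - 1 && currentLen > averageLen) {
                splits.add(current);
                current = new ArrayList<>();
                currentLen = 0;
            }
        }
        if (!current.isEmpty()) {
            splits.add(current); // remaining files form the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long[] sizes = {10, 20, 30, 40, 50, 60};
        System.out.println(plan(sizes, 3)); // prints [[10, 20, 30, 40], [50, 60]]
    }
}
```

In this run the average is 70, yet the two groups hold 100 and 110, because a split is closed only after it has already passed the average. The job can therefore produce fewer splits than the configured maximum, and since files are never divided, a single very large file still lands entirely in one map.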



SIMONE 2014-09-16 09:25 Post a comment
]]>