<?xml version="1.0" encoding="utf-8" standalone="yes"?>
http://m.tkk7.com/Skynet/category/41197.html zh-cn Fri, 11 Dec 2009 14:39:25 GMT

Recommendation: notes from a discussion with the business side
http://m.tkk7.com/Skynet/archive/2009/12/11/305591.html
刘凯毅, Fri, 11 Dec 2009 08:20:00 GMT
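Definitions:
Grey sheep   (the flock of users with no opinions of their own)
Black sheep  (users who know clearly what they need; we usually call them expert users)

1. Separate the grey flock (no opinions of their own) from the black sheep.

2.
user session association  # association relations are maintained per user session ID (the same user in a different mood should already fall into a different class in the data)
user recommendation       # the products recommended, however, are still tied to the user's unique ID
# Recommendation therefore has to describe the user from several angles.

3.
Brute-force recommendation (all data, i.e. the data after initial cleaning) suits product association.
Data from the later cleaning stages (including multi-dimensional user descriptions) suits user association.


4.
Expert-following recommendation
Description:
  Classify users to find the black sheep.
  Find the association between a flock of grey sheep and one black sheep.
  Let the flock of grey sheep see what the black sheep does.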
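
The expert-following idea in item 4 can be sketched roughly as below. This is only an illustration under stated assumptions: the post does not say how the association between grey and black sheep is measured, so Jaccard similarity over item histories is an assumed stand-in, and the names `jaccard`, `follow_expert`, `grey_history` and `expert_histories` are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two item collections: |intersection| / |union| (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def follow_expert(grey_history, expert_histories, top_n=3):
    """Pick the expert (black sheep) closest to a grey-sheep user and
    suggest the expert's items that the grey-sheep user has not seen yet."""
    best = max(expert_histories,
               key=lambda name: jaccard(grey_history, expert_histories[name]))
    seen = set(grey_history)
    suggestions = [item for item in expert_histories[best] if item not in seen]
    return best, suggestions[:top_n]


if __name__ == "__main__":
    experts = {"x": ["a", "b", "c", "d"], "y": ["e", "f"]}
    print(follow_expert(["a", "b"], experts))
```

Keying the histories by session ID instead of user ID would follow the remark in item 2 that the same user in a different mood may belong to a different class.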





]]>
File storage - data structures ( py )
http://m.tkk7.com/Skynet/archive/2009/11/04/301072.html
刘凯毅, Wed, 04 Nov 2009 07:16:00 GMT

However, once I had picked up the basics of data mining, I found that a relational database is nowhere near enough for storing and querying data after ETL.

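For example: I want to sort raw log data by one field. Simple, isn't it?
  Some say: import the data into a database (load into table ...), then select ... order by, and so on.
  Others say: linux sort -n ...

Fine! Now let's run this simple operation on 1 TB of data -- and we are stuck!
   About mining: in less than half a year of learning it, I have already run into TB-scale data 3-4 times.
Solution:
For this problem, what I want is a big linked list (too big to fit in memory).
  The struct stored in each list node holds:
   >> the file that the sort key belongs to
   >> the start position - end position, within that file, of the whole record the sort key belongs to
   >> the rank in the sort order (as a linked list: the list position of the neighbouring key in sort order)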

For example:
  1. Contents of file 1 (文件1) =>
Legend:
full record : start - end position of this record in the file (obtained by a program, of course; marked here for convenience)
..c.  0 - 22
..a.  23 - 55
..b.  56 - 76
..d.  77 - 130
..f.  131 - 220
..e.  221 - 243

  2. Reserve 100 bytes for each node of the data structure
  3. What gets stored in the file: # I won't explain how to sort a linked list, it is the most basic data-structure skill; it only means updating each node's pointer to its neighbour in sort order
      Here is the result:
{ /tmp/文件1, 0-22 ,  300 }   # c : at list position 0
{ /tmp/文件1, 23-55 , 200 }   # a : 100
{ /tmp/文件1, 56-76 , 0 }     # b : 200
{ /tmp/文件1, 77-130 , 500 }  # d : 300
{ /tmp/文件1, 131-220 ,  }    # f : 400
{ /tmp/文件1, 221-243 , 400 } # e : 500

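4. Output in order, from smallest to largest
     Assume the pre-stored smallest entry is list position 100
     Look it up, then open /tmp/文件1
       and use seek to move the file cursor to 23-55 and fetch  ..a...
       Follow the pointer 200 stored in the list, seek to 56-76 and fetch ..b...
       and so on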
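
The steps above can be sketched in Python roughly as follows. This is a minimal sketch under assumptions, not the author's implementation: the node layout (`<qqq`: start, end, slot of the next node, with -1 for "none"), the single data file, and the helper names `build_nodes`, `write_files` and `read_sorted` are all hypothetical. It also sorts in memory to build the pointers, which a real TB-scale version could not do.

```python
import os
import struct
import tempfile

SLOT = 100                    # bytes reserved per node (step 2)
NODE = struct.Struct("<qqq")  # assumed layout: start, end, slot of next node (-1 = none)


def build_nodes(records):
    """Return (nodes, head_slot) for records stored back to back in one file."""
    spans, pos = [], 0
    for rec in records:
        spans.append((pos, pos + len(rec) - 1))
        pos += len(rec)
    # Indices of the records in ascending order of their content.
    order = sorted(range(len(records)), key=lambda i: records[i])
    next_slot = {i: -1 for i in order}
    for cur, nxt in zip(order, order[1:]):
        next_slot[cur] = nxt * SLOT
    nodes = [(start, end, next_slot[i]) for i, (start, end) in enumerate(spans)]
    return nodes, order[0] * SLOT


def write_files(records, data_path, index_path):
    """Write the raw records and the on-disk linked list; return the head slot."""
    nodes, head = build_nodes(records)
    with open(data_path, "wb") as data:
        data.write(b"".join(records))
    with open(index_path, "wb") as idx:
        for i, node in enumerate(nodes):
            idx.seek(i * SLOT)
            idx.write(NODE.pack(*node))
    return head


def read_sorted(data_path, index_path, head):
    """Walk the list from the head slot, seeking into the data file per node."""
    out = []
    with open(data_path, "rb") as data, open(index_path, "rb") as idx:
        slot = head
        while slot != -1:
            idx.seek(slot)
            start, end, slot = NODE.unpack(idx.read(NODE.size))
            data.seek(start)
            out.append(data.read(end - start + 1))
    return out


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    data_path = os.path.join(tmp, "data.bin")
    index_path = os.path.join(tmp, "index.bin")
    recs = [b"c-rec", b"a-record", b"b", b"d-rec", b"f-record", b"e"]
    head = write_files(recs, data_path, index_path)
    print(head, read_sorted(data_path, index_path, head))
```

With records written in the order c, a, b, d, f, e, the pointers come out in the same pattern as the listing in step 3, and only the small index has to be walked to read the large file in sorted order.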

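Of course, for the data structure above
  you could also use a doubly linked list, a btree, a red-black tree, Fibonacci ... (data structures finally feel useful; the software certification exam I sat was not wasted!)

With that explained, here are some technical details (py) you may need. Criticism of any shortcomings is welcome!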

1. Binary file: structured write and modify
# overwrite the content at byte offset 190 (10 records of 19 bytes each)
import os
from struct import *
fd = os.open("pack1.txt", os.O_RDWR | os.O_CREAT)

ss = pack('ii11s', 3, 4, 'google')
os.lseek(fd, len(ss) * 10, 0)
os.write(fd, ss)
os.fsync(fd)

#os.close( fd )



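2. Seek to a given position and read structured data

from struct import *
file_object = open('pack1.txt', 'rb')

# si is the record index, ss the record size in bytes (19 for 'ii11s')
def ts(si, ss=calcsize('ii11s')):
    file_object.seek(si * ss)
    chunk = file_object.read(ss)
    a, b, c = unpack('ii11s', chunk)
    print a, b, c

ts(10)
# output:
3 4 google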





3. Use from other languages
The layout is defined like a C struct. In python, use the struct package; data serialized to a file this way can be read by other languages too.
 Reference: http://www.pythonid.com/bbs/archiver/?tid-285.html
pack1.py
from struct import *

# i is an int (4 bytes); 11s reserves 11 bytes for a string
# so one record of this type takes 19 bytes
ss = pack('ii11s', 1, 2, 'hello world')

f = open("pack1.txt", "wb")
f.write(ss)
f.close()


The code above writes data laid out like a C struct: two ints and one string.
pack1.c
#include <stdio.h>
#include <string.h>

struct AA
{
    int a;
    int b;
    char    c[64];
};

int main()
{
    struct AA   aa;
    FILE    *fp;
    int     size, readsize;

    memset(&aa, 0, sizeof(struct AA));

    fp = fopen("pack1.txt", "rb");
    if (NULL == fp) {
        printf("open file error!\n");
        return 0;
    }

    /* read up to one whole struct; a shorter file yields a short read */
    readsize = sizeof(struct AA);
    printf("readsize: %d\n", readsize);

    size = fread(&aa, 1, readsize, fp);
    printf("read: %d\n", size);
    printf("a=%d, b=%d, c=%s\n", aa.a, aa.b, aa.c);

    fclose(fp);

    return 0;
}

Output:
C:\Documents and Settings\lky\桌面\dataStructure>a
readsize: 72
read: 57
a=1, b=2, c=hello word
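
The `readsize: 72` above is the size of `struct AA` (two ints plus `char c[64]`), whereas pack1.py writes 19-byte `ii11s` records, so the two layouts do not have the same size. A hedged sketch of packing a record with the same size as the C struct follows. The format string `ii64s` is my assumed Python mirror of `struct AA` under native alignment, and `pack_for_c` / `unpack_from_c` are hypothetical helper names. It is written for Python 3, so strings are passed as bytes.

```python
import struct

PY_LAYOUT = "ii11s"  # layout used by pack1.py
C_LAYOUT = "ii64s"   # assumed mirror of: struct AA { int a; int b; char c[64]; }


def pack_for_c(a, b, text):
    """Pack one record with the size of struct AA; 's' pads with NUL bytes."""
    return struct.pack(C_LAYOUT, a, b, text)


def unpack_from_c(blob):
    """Unpack one struct AA record and cut the string at the first NUL byte."""
    a, b, raw = struct.unpack(C_LAYOUT, blob)
    return a, b, raw.split(b"\0", 1)[0]


if __name__ == "__main__":
    # Sizes of the two layouts, then a round trip through the C-sized layout.
    print(struct.calcsize(PY_LAYOUT), struct.calcsize(C_LAYOUT))
    print(unpack_from_c(pack_for_c(1, 2, b"hello world")))
```

Writing fixed-size records that match `sizeof(struct AA)` lets the C side `fread` whole records at `index * sizeof(struct AA)`, the same seek arithmetic as the Python example in step 2.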



   
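A final word:
  Once you can use your own data structures, a lot can be customised to your own logic and storage becomes convenient. You are no longer bound by the limits of a relational database, a key-value database or mapreduce.

References:
http://docs.python.org/library/struct.html#module-struct    # official documentation of the struct package
http://blog.csdn.net/JGood/archive/2009/06/22/4290158.aspx  # notes left by an earlier user of struct
http://www.tutorialspoint.com/python/os_lseek.htm # a small demo
Python天天美味(17) - open读写文件  # reading and writing files with open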










]]>
Data mining: a brief introduction to the process</title><link>http://m.tkk7.com/Skynet/archive/2009/11/03/300946.html</link><dc:creator>刘凯毅</dc:creator><author>刘凯毅</author><pubDate>Tue, 03 Nov 2009 09:44:00 GMT</pubDate><guid>http://m.tkk7.com/Skynet/archive/2009/11/03/300946.html</guid><wfw:comment>http://m.tkk7.com/Skynet/comments/300946.html</wfw:comment><comments>http://m.tkk7.com/Skynet/archive/2009/11/03/300946.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://m.tkk7.com/Skynet/comments/commentRss/300946.html</wfw:commentRss><trackback:ping>http://m.tkk7.com/Skynet/services/trackbacks/300946.html</trackback:ping><description><![CDATA[We will use churned-user analysis, the most common task in enterprise data mining, as the running example:<br /> <br />
Data mining process:<br />
1. Define the topic: <strong>Good grief, what am I doing!</strong> (this stage is done mostly by subjective judgement, with a little objective validation)<br />
  1.1 Establish how the target users are distributed across user groups - the share of churned users in each group<br />
    Analyse subjectively how churn differs between customer groups, e.g. by channel, software version, page layout, features.<br />
    List in detail the factors that affect churn the most, e.g. probability distributions, the effect of page layout changes.<br />
  1.2 Establish the characteristics of the target users - the characteristics of churned users<br />
     Fields that strongly affect churn, e.g. amount spent, software version (lacking the most needed feature), how long customer service takes to handle a problem<br />
 <br /> <br />
2. Select the data: <strong>The voters you have decide the president you get</strong><br />
   One thing is hard to get right in this stage: the more dimensions, the more precisely the data is defined, but also the more complex it becomes.<br />
   You probably do not want to spend 3 days working out who churned 2 days ago! :)<br />
   2.1 Collect by partition<br />
       In churn analysis, if the collection interval is too long, the customer may already be gone by the time churn is detected; if collection is too frequent or in real time, the capacity of the operator's existing systems has to be considered. Setting the collection interval is therefore especially important.<br />
   2.2 Reduce noise in the data<br />
   2.3 Remove some of the redundant data<br />
       Note that in churn analysis the main purpose of collecting data from the data warehouse is to examine how customer information changes. Drop data that is not needed.<br />
<br /> <br />
3. Analyse the data: <strong>Warming up matters!</strong><br />
   3.1 Data sampling<br />
       Enough said: in this age of information explosion, do not load hundreds of TB into the application analysis database!<br />
   3.2 Data transformation<br />
       For time, for example: encode morning as 1, noon as 2 and so on, which makes analysis easier<br />
   3.3 Handling missing data<br />
   3.4 Sample generation<br />
        Modelling sample: prepared for the next stage<br />
        Test sample: used to correct and validate the model<br />
<br />
4. Build the model: <strong>Find one you get along with and spend your life with it!</strong><br />
  Analyse the data and use the various data mining techniques and methods to find the best among several candidate models. This is an iterative loop.<br />
  Models are usually built by data analysts working together with business experts.<br />
  4.1 Commonly used churn models are mainly decision trees / Bayesian networks / neural networks, etc.<br />
<br /> <br />
5. Evaluate and validate the model: <strong>It blossoms!</strong><br />
<br />
6. Apply the model: <strong>At last, it bears good fruit (results)!</strong><br />
<br /> <br /> <br /> <br />
$> Issues to watch for in churn analysis<br />
 <br />
>> Oversampling<br />
      The monthly customer churn rate of telecom companies in China is generally in the single digits, from around 1%. Applying a model directly (e.g. a decision tree or an artificial neural network) may fail because the event is too rare in the data.<br />
      So the share of churned customers in the total sample has to be increased, but such oversampling must be done with great care and its negative effects fully considered.<br />
  <br />
>> Validity of the model<br />
   A prediction that arrives after the user has already churned is useless; the main thing to watch is the time span of the sampling.<br />
  <br />
>> Post-churn analysis<br />
  Data mining in customer churn management should cover not only early warning of churn but also analysis after the customer has left. Break customers down along different dimensions of customer information, find the groups most prone to churn, and, working with the business departments and supported by surveys, try to find the root cause. This part is often neglected because too much attention goes to how well the mining model fits, at the expense of the practical value of churn management.<br />
<br />
<strong>Thanks to a colleague for the guidance. These are his own words, reposted here for everyone to learn from:</strong><br />
0. I think the biggest difference between BI and engineering is this:<br />
    BI is data-driven; requirements rank below data<br />
<br />
1. Without data, a requirement is a non-starter<br />
2. Engineering is requirement-driven; as long as there is a requirement, engineering can basically always deliver it<br />
3. Loading, processing and cleaning data is called ETL, and it is actually very close to what you do now<br />
4. ETL is a very important part of mining<br />
<br /> <br />
Reference: <strong>Applications of data mining in telecom customer churn analysis (数据挖掘在电信客户流失分析中的应用)</strong><br />
http://www.teleinfocn.com/html/2007-02-12/3448.html<br />
]]></description></item><item><title>Data mining: research content and essence (repost)
http://m.tkk7.com/Skynet/archive/2009/10/22/299411.html
刘凯毅, Thu, 22 Oct 2009 10:05:00 GMT
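Research content and essence of data mining

As DMKD research has deepened, the study of data mining and knowledge discovery has come to rest on three strong technical pillars: databases, artificial intelligence and mathematical statistics. For this reason the KDD conference program committee was once chaired jointly by leading figures from these three disciplines. The main research topics of DMKD today include basic theory, discovery algorithms, data warehouses, visualization techniques, qualitative-quantitative conversion models, knowledge representation, maintenance and reuse of discovered knowledge, knowledge discovery in semi-structured and unstructured data, and web data mining.

The knowledge found by data mining most commonly falls into the following four classes:

- Generalized knowledge (Generalization)
  Generalized knowledge is a summarizing description of the features of a class. From the micro-level properties of the data it derives knowledge that is representative, general, at a higher conceptual level, meso- or macro-scale. It reflects the common nature of similar things and is a summary, refinement and abstraction of the data.
  There are many methods and techniques for discovering generalized knowledge, such as data cubes and attribute-oriented induction. The data cube has other names, such as "multidimensional database", "materialized view" and "OLAP". The basic idea is to materialize the computation of commonly used, expensive aggregate functions such as count, sum, average and maximum, and to store these materialized views in a multidimensional database. Since many aggregates have to be recomputed often, storing precomputed results in a multidimensional data cube guarantees fast response and flexibly provides views of the data from different angles and at different levels of abstraction. Another method is attribute-oriented induction, proposed at Simon Fraser University in Canada. It expresses data mining queries in an SQL-like language, collects the relevant data set from the database, and then applies a series of generalization techniques to it, including attribute removal, concept tree ascension, attribute threshold control, and propagation of counts and other aggregate functions.

- Association knowledge (Association)
  It reflects the dependence or association between one event and other events. If two or more attributes are associated, the value of one can be predicted from the values of the others. The best-known association rule discovery method is the Apriori algorithm proposed by R. Agrawal. Association rule discovery takes two steps. The first iteratively identifies all frequent itemsets, requiring their support to be no lower than a user-defined minimum. The second constructs, from the frequent itemsets, rules whose confidence is no lower than a user-defined minimum. Identifying all frequent itemsets is the core of association rule discovery and also its most computation-heavy part.

- Classification knowledge (Classification & Clustering)
  It reflects characteristic knowledge about the common nature of similar things and discriminating knowledge about the differences between different things. The most typical classification method is based on decision trees. It builds a decision tree from a set of examples and is a supervised learning method. The method first forms a decision tree from a training subset (also called the window). If the tree cannot classify all objects correctly, some exceptions are added to the window and the process is repeated until a correct decision set is formed. The final result is a tree whose leaf nodes are class names and whose internal nodes are attributes with branches, each branch corresponding to one possible value of the attribute. The most typical decision tree learning system is ID3, which uses a top-down strategy without backtracking and is guaranteed to find a simple tree. The algorithms C4.5 and C5.0 are both extensions of ID3 that extend classification from categorical attributes to numeric attributes.
  Other classification methods include statistics and rough sets (RoughSet). Linear regression and linear discriminant analysis are typical statistical models. To reduce the cost of building decision trees, an interval classifier has also been proposed. More recently, neural network methods have been studied for classification and rule extraction in databases.

- Predictive knowledge (Prediction)
  From time-series data it infers future data from historical and current data. It can also be seen as association knowledge with time as the key attribute.
  Current time-series forecasting methods include classical statistics, neural networks and machine learning. In 1968 Box and Jenkins proposed a fairly complete theory and methodology for time-series modelling. These classical mathematical methods forecast by building stochastic models such as the autoregressive model, the autoregressive moving average model, the autoregressive integrated moving average model and the seasonal adjustment model. Because a great many time series are non-stationary, their characteristic parameters and data distribution change over time. Training a single neural network forecasting model on one stretch of historical data is therefore not enough for accurate forecasting. For this reason, statistics-based and accuracy-based retraining methods have been proposed: when the existing model is found to no longer fit the current data, it is retrained to obtain new weights and a new model. Many systems also exploit the computational advantages of parallel algorithms for time-series forecasting.

- Deviation knowledge (Deviation)
  Other types of knowledge can be discovered too, such as deviation knowledge, which describes differences and extreme special cases and reveals anomalies where things depart from the norm, e.g. special cases outside the standard classes or outliers outside the data clusters. All of this knowledge can be discovered at different conceptual levels and, as the conceptual level rises, from micro to meso to macro, to meet the decision needs of different users at different levels.

Functions of data mining

Data mining makes proactive, knowledge-based decisions by predicting future trends and behaviour. Its goal is to discover implicit, meaningful knowledge in databases. It has mainly the following five classes of functions.

- Automatic prediction of trends and behaviour
  Data mining automatically searches large databases for predictive information. Problems that used to need a great deal of manual analysis can now be answered quickly and directly from the data. A typical example is market prediction: data mining uses past promotion data to find the users with the highest return on future investment. Other predictable problems include forecasting bankruptcy and identifying the groups most likely to respond to a given event.

- Association analysis
  Data association is an important class of discoverable knowledge in databases. If there is some regularity between the values of two or more variables, it is called an association. Associations can be divided into simple, temporal and causal associations. The purpose of association analysis is to find the hidden network of associations in a database. Sometimes the association function of the data is unknown, and even when known it is uncertain, so the rules produced by association analysis carry a confidence value.

- Clustering
  Records in a database can be divided into a series of meaningful subsets, i.e. clusters. Clustering strengthens our understanding of objective reality and is a prerequisite for concept description and deviation analysis. Clustering techniques mainly include traditional pattern recognition methods and mathematical taxonomy. In the early 1980s Michalski proposed conceptual clustering, whose key point is that when partitioning objects one considers not only the distance between objects but also requires that the resulting classes have some intensional description, which avoids some of the one-sidedness of traditional techniques.

- Concept description
  Concept description describes the intension of a class of objects and summarizes its relevant features. It is divided into characteristic description and discriminant description: the former describes the common features of a class of objects, the latter the differences between classes. Generating a characteristic description of a class involves only what all objects of that class have in common. There are many ways to generate discriminant descriptions, such as decision trees and genetic algorithms.

- Deviation detection
  Data in a database often contains anomalous records, and detecting these deviations is valuable. Deviations carry much potential knowledge, such as abnormal instances in classification, special cases that do not satisfy the rules, differences between observations and model predictions, and changes of a quantity over time. The basic method of deviation detection is to look for meaningful differences between observed results and reference values.

Common data mining techniques

- Artificial neural networks
  Non-linear predictive models that imitate the structure of biological neural networks and perform pattern recognition through learning.
- Decision trees
  Tree structures that represent sets of decisions.
- Genetic algorithms
  Optimization techniques based on the theory of evolution that use design methods such as genetic combination, genetic mutation and natural selection.
- Nearest-neighbour algorithms
  A method that classifies each record in a data set.
- Rule induction
  Searching for and deriving "if-then" rules from the data in a statistical sense.

Some specialised analysis tools using these techniques have about ten years of history behind them, but the volumes of data they handled were usually small. Today these techniques are integrated directly into many large, industry-standard data warehouses and online analysis systems.

Excerpted from 《数据挖掘讨论组》 (Data Mining Discussion Group)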
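
The two steps of association rule discovery described above (frequent itemsets first, then rules filtered by confidence) can be sketched in Python as follows. This is an unoptimised illustration of the level-wise Apriori idea, not a reference implementation; the function names `frequent_itemsets` and `rules` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search for itemsets with support >= min_support."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    current = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in tx if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join: unions of frequent (k-1)-itemsets that contain exactly k items.
        current = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in current
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Step 2: build rules lhs -> rhs with confidence >= min_confidence."""
    out = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), confidence))
    return out


if __name__ == "__main__":
    baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
               {"bread", "eggs"}, {"milk", "eggs"}]
    found = frequent_itemsets(baskets, 0.5)
    for lhs, rhs, conf in rules(found, 0.6):
        print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

Counting support by rescanning all transactions at each level is the expensive part, which matches the remark that finding the frequent itemsets dominates the computation.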

]]>
A quick try of hadoop streaming ( hadoop + perl )
http://m.tkk7.com/Skynet/archive/2009/09/25/296420.html
刘凯毅, Fri, 25 Sep 2009 06:33:00 GMT

   http://hadoop.apache.org/common/docs/r0.15.2/streaming.html

Note
  At present streaming does not support linux pipes (that is, pipelines such as cat | wc -l), but that does not stop us from using perl or python one-liners!
  The original wording is:
  Can I use UNIX pipes? For example, will -mapper "cut -f1 | sed s/foo/bar/g" work?
    Currently this does not work and gives an "java.io.IOException: Broken pipe" error.
    This is probably a bug that needs to be investigated.
  But if you are a die-hard fan of linux shell pipes, see the following:
  $> perl -e 'open( my $fh, "grep -v null tt |sed -n 1,5p |");while ( <$fh> ) {print;} '
     # although I could not get this one to pass my test!

Environment: hadoop-0.18.3
$> find . -type f -name "*streaming*.jar"
./contrib/streaming/hadoop-0.18.3-streaming.jar


Test data:
-bash-3.00$ head tt 
null    false    3702    208100
6005100    false    70    13220
6005127    false    24    4640
6005160    false    25    4820
6005161    false    20    3620
6005164    false    14    1280
6005165    false    37    7080
6005168    false    104    20140
6005169    false    35    6680
6005240    false    169    32140
......


Run:
c1="  perl -ne  'if(/.*\t(.*)/){\$sum+=\$1;}END{print \"\$sum\";}'  "
# Note: here $ has to be written as \$ and " as \"
echo $c1; # prints  perl -ne 'if(/.*\t(.*)/){$sum+=$1;}END{print $sum;}'
hadoop jar hadoop-0.18.3-streaming.jar \
   -input file:///data/hadoop/lky/jar/tt \
   -mapper   "/bin/cat" \
   -reducer "$c1" \
   -output file:///tmp/lky/streamingx8


Result:
cat /tmp/lky/streamingx8/*
1166480

Output of a local run:
perl -ne 'if(/.*\t(.*)/){$sum+=$1;}END{print $sum;}' < tt
1166480

The results match!
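
Since the note above says python one-liners and scripts work with streaming just as perl does, here is a hedged Python counterpart of the perl reducer: it sums the last tab-separated column of every input line. The function name `sum_last_column` and the script name `sum_last.py` used below are hypothetical, and the sketch is written for Python 3.

```python
import sys


def sum_last_column(lines):
    """Sum the last tab-separated field of each line, skipping lines
    without a tab or with a non-numeric last field."""
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # the perl regex also requires a tab before the value
        try:
            total += int(fields[-1])
        except ValueError:
            continue
    return total


if __name__ == "__main__":
    # As a streaming reducer, the records arrive on standard input.
    print(sum_last_column(sys.stdin))
```

Saved as, say, `sum_last.py`, it could presumably be shipped with the `-file` option listed in the built-in help and named in `-reducer`; I have not run that combination on hadoop-0.18.3, so treat it as an assumption.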


Built-in help of the command:
-bash-3.00$ hadoop jar hadoop-0.18.3-streaming.jar -info
09/09/25 14:50:12 ERROR streaming.StreamJob: Missing required option -input
Usage: $HADOOP_HOME/bin/hadoop [--config dir] jar \
          $HADOOP_HOME/hadoop-streaming.jar [options]
Options:
  -input    <path>     DFS input file(s) for the Map step
  -output   <path>     DFS output directory for the Reduce step
  -mapper   <cmd|JavaClassName>      The streaming command to run
  -combiner <JavaClassName> Combiner has to be a Java class
  -reducer  <cmd|JavaClassName>      The streaming command to run
  -file     <file>     File/dir to be shipped in the Job jar file
  -dfs    <h:p>|local  Optional. Override DFS configuration
  -jt     <h:p>|local  Optional. Override JobTracker configuration
  -additionalconfspec specfile  Optional.
  -inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName Optional.
  -outputformat TextOutputFormat(default)|JavaClassName  Optional.
  -partitioner JavaClassName  Optional.
  -numReduceTasks <num>  Optional.
  -inputreader <spec>  Optional.
  -jobconf  <n>=<v>    Optional. Add or override a JobConf property
  -cmdenv   <n>=<v>    Optional. Pass env.var to streaming commands
  -mapdebug <path>  Optional. To run this script when a map task fails
  -reducedebug <path>  Optional. To run this script when a reduce task fails
  -cacheFile fileNameURI
  -cacheArchive fileNameURI
  -verbose




]]>
hadoop jython join ( 1 )
http://m.tkk7.com/Skynet/archive/2009/09/08/294261.html
刘凯毅, Tue, 08 Sep 2009 02:39:00 GMT

First of all: the hadoop join in this post is of no use in real development!
In real development, use cascading groupby to do a hadoop join;
this post only prepares the ground for understanding how cascading implements it.

Of course, if anyone has done a hadoop join, please get in touch so we can compare notes!

References this post may rely on:
hadoop jython ( windows )
jython, compiling jython, and jar packaging
a little linux shell


This post tests the join interface that hadoop may use. Already consulted:
How to implement an Inner Join with Hadoop [from Taobao]: http://labs.chinamobile.com/groups/58_547

After the tests below, this is roughly how I understand the way hadoop join works (a guess):
data 1 ; data 2
job1.map( data 1 ) =(temp file 1)>  file tag 1 + join column   data
job2.map( data 2 ) =(temp file 2)>  file tag 2 + join column   data

The temp files are combined through mapred.join.expr
job3.map ->
file tag 1 + join column : data
file tag 2 + join column : data
......
job3.Combiner ->
join column : file tag 1 + data
join column : file tag 2 + data
job3.Reducer ->
join column : built with a java list >
  file 2 - column x [ data, data ... ]
  file 1 - column x [ data, data ... ]
After that, the left join, inner join or xxx join logic is up to you.
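
The guessed flow above (tag each row with its source file, group by the join column, then apply the join logic in the reducer) can be simulated in plain Python. This is a local sketch of that guess, not hadoop code; `tag`, `reduce_side_join` and the sample rows are hypothetical.

```python
from collections import defaultdict


def tag(source_id, rows, key_index):
    """Map phase: emit (join key, (source id, row)) for every row."""
    for row in rows:
        yield row[key_index], (source_id, row)


def reduce_side_join(left_rows, right_rows, key_index=0, how="inner"):
    """Group the tagged rows by key, then pair left and right rows per key."""
    groups = defaultdict(lambda: {1: [], 2: []})
    tagged = list(tag(1, left_rows, key_index)) + list(tag(2, right_rows, key_index))
    for key, (source_id, row) in tagged:
        groups[key][source_id].append(row)
    out = []
    for key in sorted(groups):
        lefts, rights = groups[key][1], groups[key][2]
        if how == "left" and lefts and not rights:
            rights = [None]  # keep unmatched left rows
        out.extend((key, l, r) for l in lefts for r in rights)
    return out


if __name__ == "__main__":
    left = [("1", "a"), ("2", "b"), ("3", "c")]
    right = [("2", "x"), ("3", "y"), ("3", "z"), ("4", "w")]
    for row in reduce_side_join(left, right, how="left"):
        print(row)
```

Only the last loop differs between inner and left joins, which is the point of the closing remark: once the reducer has both lists per key, the join type is ordinary application logic.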

Data sets
[root@localhost python]# cat /home/megajobs/del/jobs/tools/hadoop-0.18.3/data/090907/1
1
2
3
4
5
[root@localhost python]# cat /home/megajobs/del/jobs/tools/hadoop-0.18.3/data/090907/2
2
4
3
1

Modify ..../hadoop-0.18.3/src/examples/python/compile
#!/usr/bin/env bash

export HADOOP_HOME=/home/xx/del/jobs/tools/hadoop-0.18.3
export CASCADING_HOME=/home/xx/del/jobs/tools/cascading-1.0.16-hadoop-0.18.3
export JYTHON_HOME=/home/xx/del/jobs/tools/jython2.2.1

export CLASSPATH="$HADOOP_HOME/hadoop-0.18.3-core.jar"

# so that filenames w/ spaces are handled correctly in loops below
IFS=

# add libs to CLASSPATH

for f in $HADOOP_HOME/lib/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

for f in $HADOOP_HOME/lib/jetty-ext/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

for f in $CASCADING_HOME/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

for f in $CASCADING_HOME/lib/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done


for f in $JYTHON_HOME/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

# restore ordinary behaviour
unset IFS

/home/xx/del/jobs/tools/jython2.2.1/jythonc -p org.apache.hadoop.examples --j $1.jar  -c $1.py 
/home/xx/del/jobs/tools/hadoop-0.18.3/bin/hadoop jar $1.jar $2 $3 $4 $5 $6 $7 $8 $9 


Simple concatenation of the data:
from org.apache.hadoop.fs import Path
from org.apache.hadoop.io import *
from org.apache.hadoop.mapred.lib import *
from org.apache.hadoop.mapred.join  import *
from org.apache.hadoop.mapred import *
import sys
import getopt

class tMap(Mapper, MapReduceBase):
        # identity-style mapper: emit (byte offset, line) as Text pairs
        def map(self, key, value, output, reporter):
                output.collect( Text( str(key) ) , Text( value.toString() ))


def main(args):
        conf = JobConf(tMap)
        conf.setJobName("wordcount")

        conf.setMapperClass( tMap )

        # every argument except the last is an input path; the last is the output path
        FileInputFormat.setInputPaths(conf,[ Path(sp) for sp in args[1:-1]])
        conf.setOutputKeyClass( Text )
        conf.setOutputValueClass( Text )

        conf.setOutputPath(Path(args[-1]))

        JobClient.runJob(conf)

if __name__ == "__main__":main(sys.argv)

Run
./compile test file:///home/xx/del/jobs/tools/hadoop-0.18.3/data/090907/1 file:///home/xx/del/jobs/tools/hadoop-0.18.3/data/090907/2   file:///home/xx/del/jobs/tools/hadoop-0.18.3/tmp/wc78
Result:
[xx@localhost wc78]$ cat ../wc78/part-00000
0    1
0    2
2    4
2    2
4    3
4    3
6    1
6    4
8    5


Simple join of the data:
from org.apache.hadoop.fs import Path
from org.apache.hadoop.io import *
from org.apache.hadoop.mapred.lib import *
from org.apache.hadoop.mapred.join  import *
from org.apache.hadoop.mapred import *
import sys
import getopt

class tMap(Mapper, MapReduceBase):
        # identity-style mapper: emit (key, value) as Text pairs
        def map(self, key, value, output, reporter):
                output.collect( Text( str(key) ) , Text( value.toString() ))

def main(args):
        conf = JobConf(tMap)
        conf.setJobName("wordcount")
        conf.setMapperClass( tMap )

        # join expression over all input paths, using the "override" operation
        conf.set("mapred.join.expr", CompositeInputFormat.compose("override",TextInputFormat, args[1:-1] ) )
        conf.setOutputKeyClass( Text )
        conf.setOutputValueClass( Text )

        conf.setInputFormat(CompositeInputFormat)

        conf.setOutputPath(Path(args[-1]))

        JobClient.runJob(conf)

if __name__ == "__main__":main(sys.argv)
        

Run and result:
./compile test file:///home/xx/del/jobs/tools/hadoop-0.18.3/data/090907/1 file:///home/xx/del/jobs/tools/hadoop-0.18.3/data/090907/2   file:///home/xx/del/jobs/tools/hadoop-0.18.3/tmp/wc79
[xx@localhost wc78]$ cat ../wc79/part-00000
0    2
2    4
4    3
6    1
8    5
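
The output above can be reproduced locally under one assumption about what "override" does: for each key, the value from the rightmost input that contains the key wins. The sketch below mimics that reading; the key is the byte offset of each line, as TextInputFormat produces. `offset_records` and `override_join` are hypothetical helper names, and this is a simulation rather than the hadoop implementation.

```python
def offset_records(text):
    """Mimic TextInputFormat: key = byte offset of the line, value = the line."""
    pos, out = 0, []
    for line in text.splitlines(True):
        out.append((pos, line.rstrip("\n")))
        pos += len(line.encode("utf-8"))
    return out


def override_join(*sources):
    """For every key keep the value from the rightmost source that has it
    (assumed meaning of the "override" join operation)."""
    merged = {}
    for records in sources:  # later sources overwrite earlier ones
        merged.update(records)
    return sorted(merged.items())


if __name__ == "__main__":
    file1 = "1\n2\n3\n4\n5\n"
    file2 = "2\n4\n3\n1\n"
    for key, value in override_join(offset_records(file1), offset_records(file2)):
        print(key, value)
```

For the two test files this gives the same five key/value pairs as part-00000: keys 0 to 6 take their value from file 2, and key 8 exists only in file 1.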













]]>
hadoop jython ( windows )
http://m.tkk7.com/Skynet/archive/2009/09/04/293914.html
刘凯毅, Fri, 04 Sep 2009 09:14:00 GMT

After setting up hadoop on Windows, and because I like Python's syntax, I had long wanted to write my hadoop jobs in jython.
This time I finally got it working on my own machine. The steps follow:

Test environment:
Still windows + cygwin
hadoop 0.18  # C:/cygwin/home/lky/tools/java/hadoop-0.18.3
jython 2.2.1 # C:/jython2.2.1

Reference: PythonWordCount

Start hadoop and change to hadoop_home

# create the input directory in the cloud environment (DFS)
$>bin/hadoop dfs -mkdir input

# copy NOTICE.txt from the hadoop package into the input directory
$>bin/hadoop dfs -copyFromLocal c:/cygwin/home/lky/tools/java/hadoop-0.18.3/NOTICE.txt  hdfs:///user/lky/input

$>cd src/examples/python

# create a script that does everything in one step ( jy->jar->hd run )
# of course, a script written on linux would look nicer than this, heh!
$>vim run.bat
"C:\Program Files\Java\jdk1.6.0_11\bin\java.exe"  -classpath "C:\jython2.2.1\jython.jar;%CLASSPATH%" org.python.util.jython C:\jython2.2.1\Tools\jythonc\jythonc.py   -p org.apache.hadoop.examples -d -j wc.jar -c %1

sh C:\cygwin\home\lky\tools\java\hadoop-0.18.3\bin\hadoop jar wc.jar  %2 %3 %4 %5 %6 %7 %8 %9

# modify the jythonc packaging environment: + hadoop jars
$>vim C:\jython2.2.1\Tools\jythonc\jythonc.py
# Copyright (c) Corporation for National Research Initiatives
# Driver script for jythonc2.  See module main.py for details
import sys,os,glob

for fn in glob.glob('c:/cygwin/home/lky/tools/java/hadoop-0.18.3/*.jar') :sys.path.append(fn)
for fn in glob.glob('c:/jython2.2.1/*.jar') :sys.path.append(fn)
for fn in glob.glob('c:/cygwin/home/lky/tools/java/hadoop-0.18.3/lib/*.jar') :sys.path.append(fn)

import main
main.main()

import os
os._exit(0)


# run
C:/cygwin/home/lky/tools/java/hadoop-0.18.3/src/examples/python>
  run.bat WordCount.py  hdfs:///user/lky/input  file:///c:/cygwin/home/lky/tools/java/hadoop-0.18.3/tmp2




Output:
cat c:/cygwin/home/lky/tools/java/hadoop-0.18.3/tmp2/part-00000
(http://www.apache.org/).       1
Apache  1
Foundation      1
Software        1
The     1
This    1
by      1
developed       1
includes        1
product 1
software        1

Now for the main event (the concise jy hadoop code):
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from org.apache.hadoop.fs import Path
from org.apache.hadoop.io import *
from org.apache.hadoop.mapred import *

import sys
import getopt

class WordCountMap(Mapper, MapReduceBase):
    one = IntWritable(1)
    def map(self, key, value, output, reporter):
        for w in value.toString().split():
            output.collect(Text(w), self.one)

class Summer(Reducer, MapReduceBase):
    def reduce(self, key, values, output, reporter):
        sum = 0
        while values.hasNext():
            sum += values.next().get()
        output.collect(key, IntWritable(sum))

def printUsage(code):
    print "wordcount [-m <maps>] [-r <reduces>] <input> <output>"
    sys.exit(code)

def main(args):
    conf = JobConf(WordCountMap);
    conf.setJobName("wordcount");
 
    conf.setOutputKeyClass(Text);
    conf.setOutputValueClass(IntWritable);
    
    conf.setMapperClass(WordCountMap);        
    conf.setCombinerClass(Summer);
    conf.setReducerClass(Summer);
    try:
        flags, other_args = getopt.getopt(args[1:], "m:r:")
    except getopt.GetoptError:
        printUsage(1)
    if len(other_args) != 2:
        printUsage(1)
    
    for f,v in flags:
        if f == "-m":
            conf.setNumMapTasks(int(v))
        elif f == "-r":
            conf.setNumReduceTasks(int(v))
    conf.setInputPath(Path(other_args[0]))
    conf.setOutputPath(Path(other_args[1]))
    JobClient.runJob(conf);

if __name__ == "__main__":
    main(sys.argv)
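
To see what the job computes without a cluster, the map, shuffle and reduce steps of WordCount can be simulated in plain Python 3. This is only a local sketch: `map_words`, `reduce_counts` and `word_count` are hypothetical names, and the sample sentence is reconstructed from the words in the output listing above, so its exact wording and line breaks are an assumption.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for word in line.split():
        yield word, 1


def reduce_counts(pairs):
    """Shuffle + reduce step: sum the counts per word, sorted by key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def word_count(lines):
    """Run map over every line, then reduce the emitted pairs."""
    return reduce_counts(pair for line in lines for pair in map_words(line))


if __name__ == "__main__":
    notice = ["This product includes software developed by",
              "The Apache Software Foundation (http://www.apache.org/)."]
    for word, count in word_count(notice):
        print(word, count, sep="\t")
```

Python's `sorted` orders strings by code point, which puts `(` and uppercase letters before lowercase ones, the same ordering visible in the part-00000 listing.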







]]>