Reference:
http://hadoop.apache.org/common/docs/r0.15.2/streaming.html
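Note
Streaming currently does not support Linux pipes (that is, pipelines such as cat | wc -l), but that does not stop us from using line-oriented Perl or Python commands!!
The original wording is: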
Can I use UNIX pipes? For example, will -mapper "cut -f1 | sed s/foo/bar/g" work?
Currently this does not work and gives an "java.io.IOException: Broken pipe" error.
This is probably a bug that needs to be investigated.
But if you are a die-hard fan of Linux shell pipes, see the following:
$> perl -e 'open( my $fh, "grep -v null tt |sed -n 1,5p |");while ( <$fh> ) {print;} '
# However, I could not get this to pass my tests!!
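Another way to keep a pipeline, sketched here under assumptions, is to put it in a small wrapper script so that streaming only has to launch a single command. The script name pipe_mapper.sh and the sample rows are hypothetical, the rows are assumed to be tab-separated, and the script is only checked locally here, not on hadoop-0.18.3.

```shell
# Work in a scratch directory so the current directory is left untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# pipe_mapper.sh is a hypothetical name; the whole pipeline lives in the file.
cat > pipe_mapper.sh <<'EOF'
#!/bin/sh
# Drop rows containing "null", then keep only the first five rows.
grep -v null | sed -n 1,5p
EOF
chmod +x pipe_mapper.sh

# Three tab-separated sample rows in the same shape as the test data.
printf 'null\tfalse\t3702\t208100\n' >  sample.txt
printf '6005100\tfalse\t70\t13220\n' >> sample.txt
printf '6005127\tfalse\t24\t4640\n'  >> sample.txt

# Local check: the script reads stdin and writes stdout, as streaming expects.
pipe_out=$(sh ./pipe_mapper.sh < sample.txt)
echo "$pipe_out"
```

Under streaming, such a script would presumably be passed as -mapper pipe_mapper.sh together with -file pipe_mapper.sh, so that it is shipped with the job; that step is an assumption and is not verified here.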
Environment: hadoop-0.18.3
$> find . -type f -name "*streaming*.jar"
./contrib/streaming/hadoop-0.18.3-streaming.jar
Test data:
-bash-3.00$ head tt
null false 3702 208100
6005100 false 70 13220
6005127 false 24 4640
6005160 false 25 4820
6005161 false 20 3620
6005164 false 14 1280
6005165 false 37 7080
6005168 false 104 20140
6005169 false 35 6680
6005240 false 169 32140
......
Run:
c1=" perl -ne 'if(/.*\t(.*)/){\$sum+=\$1;}END{print \"\$sum\";}' "
# Note: inside the double quotes, $ must be written as \$ and " as \"
echo $c1; # prints: perl -ne 'if(/.*\t(.*)/){$sum+=$1;}END{print "$sum";}'
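To see why the escaping matters, here is a minimal sketch that compares an escaped and an unescaped string. The shell variable sum and its value are hypothetical and exist only to make the expansion visible; single quotes nested inside double quotes do not protect $ from the shell.

```shell
# Hypothetical shell variable, only here to make the expansion visible.
sum="SHELL_VALUE"

# Without the backslash, the shell replaces $sum before Perl ever sees it.
unescaped=" perl -ne 'END{print \"$sum\";}' "

# With \$ the dollar sign is kept, so Perl receives its own $sum variable.
escaped=" perl -ne 'END{print \"\$sum\";}' "

echo "$unescaped"
echo "$escaped"
```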
hadoop jar hadoop-0.18.3-streaming.jar \
  -input file:///data/hadoop/lky/jar/tt \
  -mapper "/bin/cat" \
  -reducer "$c1" \
  -output file:///tmp/lky/streamingx8
Result:
cat /tmp/lky/streamingx8/*
1166480
Output of a local run:
perl -ne 'if(/.*\t(.*)/){$sum+=$1;}END{print $sum;}' < tt
1166480
The result is correct!!!!
Built-in command documentation:
-bash-3.00$ hadoop jar hadoop-0.18.3-streaming.jar -info
09/09/25 14:50:12 ERROR streaming.StreamJob: Missing required option -input
Usage: $HADOOP_HOME/bin/hadoop [--config dir] jar \
$HADOOP_HOME/hadoop-streaming.jar [options]
Options:
  -input    <path>                DFS input file(s) for the Map step
  -output   <path>                DFS output directory for the Reduce step
  -mapper   <cmd|JavaClassName>   The streaming command to run
  -combiner <JavaClassName>       Combiner has to be a Java class
  -reducer  <cmd|JavaClassName>   The streaming command to run
  -file     <file>                File/dir to be shipped in the Job jar file
  -dfs      <h:p>|local           Optional. Override DFS configuration
  -jt       <h:p>|local           Optional. Override JobTracker configuration
  -additionalconfspec specfile    Optional.
  -inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName Optional.
  -outputformat TextOutputFormat(default)|JavaClassName Optional.
  -partitioner JavaClassName      Optional.
  -numReduceTasks <num>           Optional.
  -inputreader <spec>             Optional.
  -jobconf  <n>=<v>               Optional. Add or override a JobConf property
  -cmdenv   <n>=<v>               Optional. Pass env.var to streaming commands
  -mapdebug <path>                Optional. To run this script when a map task fails
  -reducedebug <path>             Optional. To run this script when a reduce task fails
  -cacheFile fileNameURI
  -cacheArchive fileNameURI
  -verbose
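As an illustration of how several of the options listed above might be combined, the following dry run assembles a command line without executing it. The reducer script name sum_reducer.sh and the output directory are hypothetical placeholders, and the command itself has not been run against a cluster; only flags that appear in the usage text are used.

```shell
# Hypothetical names: sum_reducer.sh and the output directory are placeholders.
streaming_jar="hadoop-0.18.3-streaming.jar"

# Assemble the argument list as positional parameters (dry run, nothing is executed).
set -- hadoop jar "$streaming_jar" \
    -input file:///data/hadoop/lky/jar/tt \
    -output file:///tmp/lky/streaming_sum \
    -mapper /bin/cat \
    -reducer sum_reducer.sh \
    -file sum_reducer.sh \
    -numReduceTasks 1

dry_run="$*"
echo "$dry_run"
```

Shipping the reducer as a file with -file avoids the \$ and \" escaping that an inline command needs, and -numReduceTasks 1 makes explicit that a single reducer sees all rows; with more than one reducer, each would print only a partial sum.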
Compiled by m.tkk7.com/Good-Game