Posted on 2012-04-15 16:37 by zljpp
hadoop job web UI
There are web-based interfaces to both the JobTracker (MapReduce master) and the NameNode (HDFS master) that display status pages for the state of the entire system. By default, these are located at http://job.tracker.addr:50030/ and http://name.node.addr:50070/.
hadoop monitoring
Use Nagios for alerting and Ganglia for monitoring graphs.
status of 255 error
Error:
java.io.IOException: Task process exit with nonzero status of 255.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)
Cause:
Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to higher values; by default, both are 24 hours. These may be the reason for the failure, though this is not certain.
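A minimal sketch of raising both settings through the old JobConf API (the 48-hour values and the MyJob driver class are illustrative, not from the original note):

JobConf conf = new JobConf(MyJob.class); // MyJob is a placeholder driver class
// mapred.jobtracker.retirejob.interval is specified in milliseconds.
conf.setLong("mapred.jobtracker.retirejob.interval", 48L * 60 * 60 * 1000);
// mapred.userlog.retain.hours is specified in hours.
conf.setInt("mapred.userlog.retain.hours", 48);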
split size
FileInputFormat input splits (see The Definitive Guide, p. 190):
mapred.min.split.size: default = 1, the smallest valid size in bytes for a file split.
mapred.max.split.size: default = Long.MAX_VALUE, the largest valid size.
dfs.block.size: default = 64 MB, set to 128 MB on our system.
If the minimum split size is set larger than the block size, the split size grows beyond a single block, so each split spans multiple blocks and a map task may have to fetch some of them from other nodes (the original note offers this as a guess).
If the maximum split size is set smaller than the block size, each block is broken into multiple splits.
split size = max(minimumSize, min(maximumSize, blockSize));
where minimumSize < blockSize < maximumSize.
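A minimal, self-contained sketch of the formula above (the class and method names are illustrative, not Hadoop's own):

public class SplitSize {
    // split size = max(minimumSize, min(maximumSize, blockSize))
    static long computeSplitSize(long minimumSize, long maximumSize, long blockSize) {
        return Math.max(minimumSize, Math.min(maximumSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // a 128 MB block
        // Defaults (min = 1, max = Long.MAX_VALUE): split size equals the block size.
        System.out.println(computeSplitSize(1L, Long.MAX_VALUE, blockSize));
        // Minimum split size raised above the block size: splits grow to 256 MB.
        System.out.println(computeSplitSize(256L * 1024 * 1024, Long.MAX_VALUE, blockSize));
    }
}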
sort by value
Hadoop provides no direct sort-by-value mechanism, because it would hurt MapReduce performance.
Sort-by-value can be achieved with a composite-key approach; see The Definitive Guide, p. 250, for the full implementation.
Basic idea (a sketch of the supporting classes follows the list):
1. Combine the old key and old value into a new composite key;
2. Override the partitioner so it partitions on the old key only:
conf.setPartitionerClass(FirstPartitioner.class);
3. Define a custom key comparator that sorts by the old key first, then by the old value:
conf.setOutputKeyComparatorClass(KeyComparator.class);
4. Override the grouping comparator so it also groups on the old key only:
conf.setOutputValueGroupingComparator(GroupComparator.class);
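A minimal sketch of the three classes above, assuming the composite key is a Text holding the old key and old value separated by a tab (this encoding is illustrative; the book's example uses a custom writable pair instead):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class SecondarySort {
    // The part of the composite key before the tab is the old key
    // (this sketch assumes every composite key contains exactly one tab).
    static String oldKey(Text composite) {
        return composite.toString().split("\t", 2)[0];
    }

    // Step 2: partition on the old key only, so every composite key sharing
    // an old key reaches the same reducer.
    public static class FirstPartitioner implements Partitioner<Text, Text> {
        public void configure(JobConf job) {}
        public int getPartition(Text key, Text value, int numPartitions) {
            return (oldKey(key).hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Step 3: sort by the old key first, then by the old value (plain string order).
    public static class KeyComparator extends WritableComparator {
        protected KeyComparator() { super(Text.class, true); }
        public int compare(WritableComparable a, WritableComparable b) {
            String[] x = ((Text) a).toString().split("\t", 2);
            String[] y = ((Text) b).toString().split("\t", 2);
            int cmp = x[0].compareTo(y[0]);
            return cmp != 0 ? cmp : x[1].compareTo(y[1]);
        }
    }

    // Step 4: group reducer input on the old key only, ignoring the value part.
    public static class GroupComparator extends WritableComparator {
        protected GroupComparator() { super(Text.class, true); }
        public int compare(WritableComparable a, WritableComparable b) {
            return oldKey((Text) a).compareTo(oldKey((Text) b));
        }
    }
}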
handling small input files
Using a long list of small files as input lowers Hadoop's efficiency.
There are three ways to consolidate small files:
1. Merge the small files into one SequenceFile to speed up MapReduce (a sketch appears at the end of this section).
See WholeFileInputFormat and SmallFilesToSequenceFileConverter in The Definitive Guide, p. 194.
2. Use CombineFileInputFormat, which extends FileInputFormat (the author has not tried this);
3. Use Hadoop Archives (similar to packing files together) to reduce the NameNode memory consumed by small-file metadata. (This may not always work, so it is not recommended here.)
Usage:
Archive the /my/files directory and its subdirectories into files.har, placed under /my:
bin/hadoop archive -archiveName files.har /my/files /my
List the files in the archive:
bin/hadoop fs -lsr har:///my/files.har
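A minimal sketch of option 1 above: packing every file in one directory into a single SequenceFile of (file name, file bytes) records. This is a simplified stand-in for the book's SmallFilesToSequenceFileConverter, not a copy of it; the class name and path handling are illustrative:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inDir = new Path(args[0]);   // directory of small files
        Path outFile = new Path(args[1]); // output SequenceFile

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, outFile, Text.class, BytesWritable.class);
        try {
            for (FileStatus status : fs.listStatus(inDir)) {
                if (status.isDir()) continue; // skip subdirectories in this sketch
                byte[] content = new byte[(int) status.getLen()];
                FSDataInputStream in = fs.open(status.getPath());
                try {
                    in.readFully(content);
                } finally {
                    IOUtils.closeStream(in);
                }
                // key = original file name, value = the whole file's bytes
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(content));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}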
skip bad records
// Assumes the author's own ProductMR driver, Map, Reduce, and Product classes.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.SkipBadRecords;

JobConf conf = new JobConf(ProductMR.class);
conf.setJobName("ProductMR");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Product.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setMapOutputCompressorClass(DefaultCodec.class);
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
String objpath = "abc1";
SequenceFileInputFormat.addInputPath(conf, new Path(objpath));
// Allow skipping any number of bad records per task...
SkipBadRecords.setMapperMaxSkipRecords(conf, Long.MAX_VALUE);
// ...and enter skip mode on the first failed attempt.
SkipBadRecords.setAttemptsToStartSkipping(conf, 0);
// Skipped records are written here for later inspection.
SkipBadRecords.setSkipOutputPath(conf, new Path("data/product/skip/"));
String output = "abc";
SequenceFileOutputFormat.setOutputPath(conf, new Path(output));
JobClient.runJob(conf);
To tolerate a fraction of failed map tasks instead of skipping records, try mapred.max.map.failures.percent.
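A minimal sketch of setting that threshold through the old JobConf API (the 10% value is illustrative):

// Let the job succeed even if up to 10% of its map tasks fail.
conf.setMaxMapTaskFailuresPercent(10);
// The reduce-side counterpart, shown for completeness.
conf.setMaxReduceTaskFailuresPercent(10);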
restarting a single datanode
If a DataNode runs into trouble, after the fix it can rejoin the cluster without restarting the whole cluster. On that node, run:
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker