Asking for help with a Hadoop streaming C++ MapReduce job

2016-07-06 13:31:20 +08:00
 HeartJ

I have recently been writing MapReduce programs in C++. A simple word-lookup demo ran fine,

but the MR program I need for actual work fails. The error output is as follows:

map 100% reduce 26%
INFO mapreduce.Job: Task Id : attempt_1467765498879_0091_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 134
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
map 100% reduce 100%

INFO mapreduce.Job: Job job_1467765498879_0091 failed with state FAILED due to: Task failed task_1467765498879_0091_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1

INFO mapreduce.Job: Counters: 39
  File System Counters
    FILE: Number of bytes read=0
    FILE: Number of bytes written=3181370
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=1202484763
    HDFS: Number of bytes written=0
    HDFS: Number of read operations=27
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=0
  Job Counters
    Failed reduce tasks=4
    Killed map tasks=1
    Launched map tasks=10
    Launched reduce tasks=4
    Data-local map tasks=9
    Rack-local map tasks=1
    Total time spent by all maps in occupied slots (ms)=223454
    Total time spent by all reduces in occupied slots (ms)=19038
    Total time spent by all map tasks (ms)=111727
    Total time spent by all reduce tasks (ms)=19038
    Total vcore-milliseconds taken by all map tasks=111727
    Total vcore-milliseconds taken by all reduce tasks=19038
    Total megabyte-milliseconds taken by all map tasks=228816896
    Total megabyte-milliseconds taken by all reduce tasks=19494912
  Map-Reduce Framework
    Map input records=18446925
    Map output records=75252
    Map output bytes=2037977
    Map output materialized bytes=2188535
    Input split bytes=954
    Combine input records=0
    Spilled Records=75252
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=613
    CPU time spent (ms)=80580
    Physical memory (bytes) snapshot=2821369856
    Virtual memory (bytes) snapshot=25988837376
    Total committed heap usage (bytes)=2422210560
  File Input Format Counters
    Bytes Read=1202483809
16/07/06 13:31:06 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

While debugging I have hit both "subprocess failed with code 134" and "subprocess failed with code 139".

Does anyone know what causes this?

Any pointers would be appreciated, thanks.

4449 clicks
Node    Hadoop
5 replies
yaoyuan7571
2016-07-06 15:08:41 +08:00
Try checking whether a code or data bug in the reduce program is making it crash. Reduce the number of reducers, add some debug logging in the code, then inspect the logs in the jobtracker web UI to pinpoint where the program crashes. I would recommend developing in Java.
HeartJ
2016-07-06 16:15:55 +08:00
@yaoyuan7571 Thanks. It probably should be written in Java; the streaming approach does have its inconveniences.
ooonme
2016-07-07 09:01:54 +08:00
Why not just use Spark?
cljnnn
2016-07-28 10:58:12 +08:00
@ooonme Spark supports C++ now?
ooonme
2016-08-06 00:23:13 +08:00
@cljnnn Why use C++ in the JVM world? It used to be Java calling C; now it's C calling Java, and then native C on top 😜 Personally I only use Scala.


https://www.v2ex.com/t/290619
