MapReduce series (3): submitting an MR job to a remote cluster from a Windows client
2017-11-23 13:26
In the previous post we got a local-mode MapReduce program running on Windows without any problems.
But going a step further, I now want to run the program in my IDEA directly on the Linux cluster, so that my local machine effectively becomes a MapReduce client.
Following this idea, we set up the conf as follows:
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.hostname", "mini01");
conf.set("fs.defaultFS", "hdfs://mini01:9000/");
Running it, we see the following error:
17/03/17 19:02:22 INFO client.RMProxy: Connecting to ResourceManager at mini01/192.168.153.11:8032
17/03/17 19:02:22 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/03/17 19:02:22 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
17/03/17 19:02:22 INFO input.FileInputFormat: Total input paths to process : 1
17/03/17 19:02:22 INFO mapreduce.JobSubmitter: number of splits:1
17/03/17 19:02:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1489496419130_0002
17/03/17 19:02:37 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
17/03/17 19:05:18 INFO impl.YarnClientImpl: Submitted application application_1489496419130_0002
17/03/17 19:09:19 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002
Exception in thread "main" java.io.IOException: Failed to run job : Application application_1489496419130_0002 failed 2 times due to AM Container for appattempt_1489496419130_0002_000002 exited with exitCode: 1
For more detailed output, check application tracking page: http://mini01:8088/proxy/application_1489496419130_0002/ Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1489496419130_0002_02_000001
Exit code: 1
Exception message: /bin/bash: line 0: fg: no job control
Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:241)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1315)
    at wc.WordCountRunner.main(WordCountRunner.java:78)
From this we can tell the error comes from the shell command that YARNRunner generates.
Single-stepping through the source into YARNRunner's submitJob(), we reach:
// Construct necessary information to start the MR AM
ApplicationSubmissionContext appContext =
    createApplicationSubmissionContext(conf, jobSubmitDir, ts);
The appContext contains the following:
application_id { id: 2 cluster_timestamp: 1489496419130 }
application_name: "N/A"
queue: "default"
am_container_spec {
  localResources {
    key: "jobSubmitDir/job.splitmetainfo"
    value {
      resource { scheme: "hdfs" host: "mini01" port: 9000 file: "/tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002/job.splitmetainfo" }
      size: 27 timestamp: 1489698635869 type: FILE visibility: APPLICATION
    }
  }
  localResources {
    key: "jobSubmitDir/job.split"
    value {
      resource { scheme: "hdfs" host: "mini01" port: 9000 file: "/tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002/job.split" }
      size: 112 timestamp: 1489698635836 type: FILE visibility: APPLICATION
    }
  }
  localResources {
    key: "job.xml"
    value {
      resource { scheme: "hdfs" host: "mini01" port: 9000 file: "/tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002/job.xml" }
      size: 88715 timestamp: 1489698636066 type: FILE visibility: APPLICATION
    }
  }
  tokens: "HDTS\000\000\001\025MapReduceShuffleToken\b\213\023`\302+\213\302`"
  environment { key: "HADOOP_CLASSPATH" value: "%PWD%;job.jar/job.jar;job.jar/classes/;job.jar/lib/*;%PWD%/*;null" }
  environment { key: "SHELL" value: "/bin/bash" }
  environment { key: "CLASSPATH" value: "%PWD%;%HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%/share/hadoop/common/*;%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*;%HADOOP_MAPRED_HOME%\\share\\hadoop\\mapreduce\\*;%HADOOP_MAPRED_HOME%\\share\\hadoop\\mapreduce\\lib\\*;job.jar/job.jar;job.jar/classes/;job.jar/lib/*;%PWD%/*" }
  environment { key: "LD_LIBRARY_PATH" value: "%PWD%" }
  command: "%JAVA_HOME%/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr "
  application_ACLs { accessType: APPACCESS_VIEW_APP acl: " " }
  application_ACLs { accessType: APPACCESS_MODIFY_APP acl: " " }
}
cancel_tokens_when_complete: true
maxAppAttempts: 2
resource { memory: 1536 virtual_cores: 1 }
applicationType: "MAPREDUCE"
Notice that the Windows-style "%" signs, ";" separators, and backslashes in the paths are passed through to Linux unchanged. The fix is to copy this class into your own project and rewrite those paths there (the package name and file path must not change at all, so your copy shadows the one in the Hadoop jar).
Copy org.apache.hadoop.mapred.YARNRunner.java verbatim into your own src directory; again, the package name and file path must stay exactly the same.
Then modify the following places:
First change:
// Setup the command to run the AM
List<String> vargs = new ArrayList<String>(8);
// TODO: original line commented out
// vargs.add(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME) + "/bin/java");
// TODO: modified by tianjun -- use the Linux-style $JAVA_HOME reference instead
System.out.println(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME) + "/bin/java");
System.out.println("$JAVA_HOME/bin/java");
vargs.add("$JAVA_HOME/bin/java");
Second change:
// TODO: modified by tianjun -- rewrite every environment value into Linux form
for (String key : environment.keySet()) {
    String org = environment.get(key);
    String linux = getLinux(org);
    environment.put(key, linux);
}

// Setup ContainerLaunchContext for AM container
ContainerLaunchContext amContainer =
    ContainerLaunchContext.newInstance(localResources, environment,
        vargsFinal, null, securityTokens, acls);
Then add the getLinux() helper used above:
// TODO: added by tianjun
// Converts a Windows-style value to Linux form:
//   %VAR% -> $VAR  (opening % becomes $, closing % is dropped)
//   ;     -> :     (path-list separator)
//   \     -> /     (path separator)
private String getLinux(String org) {
    StringBuilder sb = new StringBuilder();
    int c = 0;
    for (int i = 0; i < org.length(); i++) {
        if (org.charAt(i) == '%') {
            c++;
            if (c % 2 == 1) {
                sb.append("$");
            }
        } else {
            switch (org.charAt(i)) {
                case ';':
                    sb.append(":");
                    break;
                case '\\':
                    sb.append("/");
                    break;
                default:
                    sb.append(org.charAt(i));
                    break;
            }
        }
    }
    return sb.toString();
}
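To convince yourself the conversion does what you expect before patching YARNRunner, the same logic can be exercised standalone. This is just a sketch for sanity-checking; the class name `LinuxPathConverter` is my own, and the real method stays a private member of your copied YARNRunner:

```java
public class LinuxPathConverter {

    // Same conversion as getLinux(): %VAR% -> $VAR, ';' -> ':', '\' -> '/'
    static String toLinux(String org) {
        StringBuilder sb = new StringBuilder();
        int c = 0;
        for (int i = 0; i < org.length(); i++) {
            char ch = org.charAt(i);
            if (ch == '%') {
                c++;
                if (c % 2 == 1) {
                    sb.append('$'); // opening % becomes $, closing % is dropped
                }
            } else if (ch == ';') {
                sb.append(':');     // Windows path-list separator -> Linux
            } else if (ch == '\\') {
                sb.append('/');     // backslash -> forward slash
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A fragment of the CLASSPATH value seen in the appContext dump above
        String win = "%PWD%;%HADOOP_CONF_DIR%;job.jar/job.jar";
        System.out.println(toLinux(win));
        // prints: $PWD:$HADOOP_CONF_DIR:job.jar/job.jar
    }
}
```

Running this on the CLASSPATH fragment from the dump shows exactly the rewrite the cluster side needs: the `%VAR%` references become `$VAR` and the separators become Linux-style.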
One more point that deserves close attention:
In the driver class you must call setJar() with the jar's absolute path. setJarByClass() ultimately relies on the script behind the `hadoop jar` command to resolve the jar's location, but our client now runs on Windows rather than on the Linux cluster, so setJarByClass() fails with a "mapper class not found" error.
wcjob.setJar("F:/myWorkPlace/java/dubbo/demo/dubbo-demo/mr-demo1/target/mr.demo-1.0-SNAPSHOT.jar");
// Resolving the jar from the local classpath does not work here; setJar is required
// wcjob.setJarByClass(WordCountRunner.class);
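If you would rather not hardcode the absolute path, one common trick (my own addition, not from the original workflow) is to resolve the jar location of the driver class at runtime via its code source, and pass that string to setJar():

```java
import java.io.File;
import java.net.URISyntaxException;

public class JarLocator {

    // Resolve the file-system location of the code source that contains cls.
    // When the class was loaded from a jar this is the jar's absolute path,
    // which is exactly what Job#setJar(String) expects; when run from an IDE
    // it is the classes output directory instead, so only rely on this after
    // building the jar.
    static String locate(Class<?> cls) throws URISyntaxException {
        return new File(cls.getProtectionDomain()
                           .getCodeSource()
                           .getLocation()
                           .toURI()).getAbsolutePath();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(locate(JarLocator.class));
    }
}
```

In the driver this would become `wcjob.setJar(JarLocator.locate(WordCountRunner.class));`, with the caveat above: the client must actually be launched from the built jar, not from IDEA's classes directory.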