How to run a Spark job using Oozie workflow
2016-04-26 00:00
What is Oozie
Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions.
Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system-specific jobs (such as Java programs and shell scripts).
Oozie is a scalable, reliable and extensible system.
Prepare two Spark jobs (the two Spark jobs do the same thing):
package org.apache.oozie.example

import org.apache.spark.{SparkConf, SparkContext}

object Spark1 {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: SparkTachyon <file> <file>")
      System.exit(1)
    }
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    // Read the input file, duplicate every line, and write the result out
    val lines = sc.textFile(args(0))
    val result = lines.map(line => line + line)
    result.saveAsTextFile(args(1))
    sc.stop()
  }
}
package org.apache.oozie.example

import org.apache.spark.{SparkConf, SparkContext}

object Spark2 {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: SparkTachyon <file> <file>")
      System.exit(1)
    }
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    // Same logic as Spark1: duplicate every line of the input
    val lines = sc.textFile(args(0))
    val result = lines.map(line => line + line)
    result.saveAsTextFile(args(1))
    sc.stop()
  }
}
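The two objects are then compiled and packaged into oozie-examples.jar, the JAR that goes into the lib/ directory. The article does not show its build setup; the following is a minimal sbt sketch that could produce such a JAR, where the Spark version, Scala version and project metadata are assumptions:

```scala
// build.sbt -- hypothetical sketch, not taken from the article.
// Spark 1.6.1 and Scala 2.10 are assumptions plausible for early 2016.
name := "oozie-examples"

version := "1.0"

scalaVersion := "2.10.6"

// "provided" because the Oozie Spark action supplies Spark at runtime,
// so spark-core should not be bundled into the application JAR.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
```

Running `sbt package` against such a build file would produce the JAR under `target/scala-2.10/`; rename or copy it to oozie-examples.jar.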
Prepare a workflow description file (workflow.xml)
<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements. See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership. The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License. You may obtain a copy of the License at
-->
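Only the license header of workflow.xml survives here. The job status output in this article shows two actions, spark-node and spark-node2, executed in sequence, so the definition likely resembled the following sketch; the master placeholder, schema versions, and the exact argument wiring are assumptions, not taken from the article:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="SparkCalc">
    <start to="spark-node"/>

    <!-- First Spark action: runs org.apache.oozie.example.Spark1 -->
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>Spark1</name>
            <class>org.apache.oozie.example.Spark1</class>
            <jar>${nameNode}/user/${wf:user()}/sparkTachyon/lib/oozie-examples.jar</jar>
            <arg>${input}</arg>
            <arg>${output}</arg>
        </spark>
        <ok to="spark-node2"/>
        <error to="fail"/>
    </action>

    <!-- Second Spark action: runs org.apache.oozie.example.Spark2 -->
    <action name="spark-node2">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>Spark2</name>
            <class>org.apache.oozie.example.Spark2</class>
            <jar>${nameNode}/user/${wf:user()}/sparkTachyon/lib/oozie-examples.jar</jar>
            <arg>${input}</arg>
            <arg>${output2}</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```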
Prepare a job.properties file
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#
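The body of job.properties is likewise cut off after the license header. A sketch of the properties the submission would need, using the JobTracker/NameNode addresses and the application path that appear elsewhere in this article; the property names and the master/queue values are assumptions modeled on the stock Oozie Spark example:

```properties
nameNode=hdfs://ers2.analytics.net:8020
jobTracker=ers2.analytics.net:8050
master=yarn-cluster
queueName=default

oozie.use.system.libpath=true
# HDFS directory that holds workflow.xml and lib/ (matches the App Path
# shown in the job status output in this article)
oozie.wf.application.path=${nameNode}/user/${user.name}/examples/apps/sparkTachyon
```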
Put the JAR and the two files in a folder:
+---sparkTachyon
|   +---lib
|   |   +---alluxio-core-client-1.1.0-SNAPSHOT-jar-with-dependencies.jar
|   |   +---guava-11.0.2.jar
|   |   +---oozie-examples.jar
|   +---job.properties
|   \---workflow.xml
The sparkTachyon/ directory must be copied to the user HOME directory in HDFS:

$ hadoop fs -put sparkTachyon/ sparkTachyon
Running the example
Add Oozie bin/ to the environment PATH. The example assumes the JobTracker is ers2.analytics.net:8050 and the NameNode is hdfs://ers2.analytics.net:8020. If the actual values are different, the job properties files in the examples directory must be edited to the correct values.
The example assumes that Alluxio has been started locally (see Introduction to Alluxio (formerly Tachyon)).
The input for the example is in the http://ers2:19999/browse?path=%2FLICENSE directory (you can choose to use the HDFS filesystem instead).
The example creates output under the http://ers2:19999/browse?path=%2FLICENSE3 directory.
Note: the job.properties file needs to be a local file during submission, not an HDFS path.
How to run an example application:

$ oozie job -oozie http://localhost:11000/oozie -config sparkTachyon/job.properties -run
job: 0000048-160415102017608-oozie-oozi-W
Check the workflow job status:
$ oozie job -oozie http://localhost:11000/oozie -info 0000048-160415102017608-oozie-oozi-W

Job ID : 0000048-160415102017608-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : SparkCalc
App Path      : hdfs://ers2.analytics.net:8020/user/ambari-qa/examples/apps/sparkTachyon
Status        : SUCCEEDED
Run           : 0
User          : ambari-qa
Group         : -
Created       : 2016-04-19 16:37 GMT
Started       : 2016-04-19 16:37 GMT
Last Modified : 2016-04-19 16:38 GMT
Ended         : 2016-04-19 16:38 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                Status    Ext ID                  Ext Status  Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000048-160415102017608-oozie-oozi-W@:start:      OK        -                       OK          -
------------------------------------------------------------------------------------------------------------------------------------
0000048-160415102017608-oozie-oozi-W@spark-node   OK        job_1460569303137_0082  SUCCEEDED   -
------------------------------------------------------------------------------------------------------------------------------------
0000048-160415102017608-oozie-oozi-W@spark-node2  OK        job_1460569303137_0083  SUCCEEDED   -
------------------------------------------------------------------------------------------------------------------------------------
0000048-160415102017608-oozie-oozi-W@end          OK        -                       OK          -
------------------------------------------------------------------------------------------------------------------------------------
To check the workflow job status via the Oozie web console, go with a browser to http://localhost:11000/oozie/
Reference Links

https://oozie.apache.org/
https://oozie.apache.org/docs/3.3.1/DG_Examples.html#Local_Oozie_Example
https://support.pivotal.io/hc/en-us/articles/203355837-How-to-run-a-MapReduce-jar-using-Oozie-workflow