How to run a Spark job using Oozie workflow

2016-04-26
Summary: Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

What is Oozie

Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions.

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.

Oozie is integrated with the rest of the Hadoop stack and supports several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system-specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.

Prepare two Spark jobs (the two jobs do the same thing)

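package org.apache.oozie.example

import org.apache.spark.{SparkConf, SparkContext}

object Spark1 {
  def main(args: Array[String]): Unit = {
    // Expect two arguments: the input path and the output path.
    if (args.length < 2) {
      System.err.println("Usage: SparkTachyon <file> <file>")
      System.exit(1)
    }

    // No master or app name is set here; the Oozie spark action
    // supplies them through its <master> and <name> elements.
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val lines = sc.textFile(args(0))
    // Concatenate every line with itself.
    val result = lines.map(line => line + line)
    result.saveAsTextFile(args(1))
    sc.stop()
  }
}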
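
package org.apache.oozie.example

import org.apache.spark.{SparkConf, SparkContext}

object Spark2 {
  def main(args: Array[String]): Unit = {
    // Expect two arguments: the input path and the output path.
    if (args.length < 2) {
      System.err.println("Usage: SparkTachyon <file> <file>")
      System.exit(1)
    }

    // No master or app name is set here; the Oozie spark action
    // supplies them through its <master> and <name> elements.
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val lines = sc.textFile(args(0))
    // Concatenate every line with itself.
    val result = lines.map(line => line + line)
    result.saveAsTextFile(args(1))
    sc.stop()
  }
}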
Then compile the code into a jar named oozie-examples.jar.
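
Both jobs apply the same per-line step, line => line + line. As a rough local preview of what that step produces (no Spark or cluster involved), the sketch below imitates it with awk. The function name double_lines is made up for this illustration and is not part of the example code.

```shell
# Hypothetical helper: imitates the jobs' map step (line => line + line)
# on standard input, printing each input line concatenated with itself.
double_lines() {
    awk '{ print $0 $0 }'
}
```

Piping a file through double_lines twice mirrors running Spark1 and then Spark2 on it, so each original line ends up repeated four times in a row.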

Prepare a workflow description file

<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<workflow-app xmlns='uri:oozie:workflow:0.5' name='Spark'>
    <start to='spark-node'/>
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>Spark</name>
            <class>org.apache.oozie.example.Spark1</class>
            <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/apps/sparkTachyon/lib/oozie-examples.jar</jar>
            <arg>alluxio://localhost:19998/LICENSE</arg>
            <arg>alluxio://localhost:19998/LICENSE2</arg>
        </spark>
        <ok to="spark-node2"/>
        <error to="fail"/>
    </action>

    <action name='spark-node2'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>Spark2</name>
            <class>org.apache.oozie.example.Spark2</class>
            <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/apps/sparkTachyon/lib/oozie-examples.jar</jar>
            <arg>alluxio://localhost:19998/LICENSE2</arg>
            <arg>alluxio://localhost:19998/LICENSE3</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end'/>
</workflow-app>
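The job DAG looks like this: start -> spark-node -> spark-node2 -> end, with either action transitioning to the fail node on error.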

Prepare a job.properties file

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
nameNode=hdfs://ers2.analytics.net:8020
jobTracker=ers2.analytics.net:8050
master=local[*]
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/sparkTachyon
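
The oozie.wf.application.path property decides where Oozie looks for workflow.xml in HDFS, so it helps to see what the placeholders expand to. The sketch below only imitates that substitution with printf. The helper name resolve_app_path is hypothetical and not part of Oozie, and the user ambari-qa is taken from the job output shown later in this post.

```shell
# Hypothetical helper: mimics how
#   ${nameNode}/user/${user.name}/${examplesRoot}/apps/sparkTachyon
# expands. Arguments: NAME_NODE USER EXAMPLES_ROOT
resolve_app_path() {
    name_node=$1
    user=$2
    examples_root=$3
    printf '%s/user/%s/%s/apps/sparkTachyon\n' "$name_node" "$user" "$examples_root"
}

# Example call with the values from job.properties and the user ambari-qa.
resolve_app_path hdfs://ers2.analytics.net:8020 ambari-qa examples
```

The printed path is the HDFS directory that has to contain workflow.xml and the lib/ folder.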

Put the jars and the two files in a folder

+---sparkTachyon
|   +---lib
|   |   +---alluxio-core-client-1.1.0-SNAPSHOT-jar-with-dependencies.jar
|   |   +---guava-11.0.2.jar
|   |   +---oozie-examples.jar
|   +---job.properties
|   \---workflow.xml
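The sparkTachyon/ directory contains the application XML file (workflow.xml), the job.properties file used to submit the job, and the JAR files the example needs.
The sparkTachyon/ directory must be copied into HDFS at the location named by oozie.wf.application.path, which is examples/apps/sparkTachyon under the user's HOME directory:
$ hadoop fs -mkdir -p examples/apps
$ hadoop fs -put sparkTachyon examples/apps/sparkTachyon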
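
The folder layout above can be assembled by hand, but a small script makes it repeatable. This is a minimal sketch that only stages the files locally. The function name stage_app and its argument order are assumptions made for this illustration, and the locations of your jars and configuration files will differ.

```shell
# Hypothetical helper: builds the local application folder.
# Usage: stage_app APP_DIR WORKFLOW_XML JOB_PROPERTIES JAR...
stage_app() {
    app_dir=$1
    workflow=$2
    props=$3
    shift 3
    # lib/ holds every jar the spark action needs at run time.
    mkdir -p "$app_dir/lib" || return 1
    cp "$workflow" "$app_dir/workflow.xml" || return 1
    cp "$props" "$app_dir/job.properties" || return 1
    # Remaining arguments are jar files.
    for jar in "$@"; do
        cp "$jar" "$app_dir/lib/" || return 1
    done
}
```

A call such as stage_app sparkTachyon workflow.xml job.properties oozie-examples.jar guava-11.0.2.jar produces the tree shown above (minus any jars you did not pass), ready to be uploaded with hadoop fs -put.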


Running the example

Add Oozie bin/ to the environment PATH.
The example assumes the JobTracker is ers2.analytics.net:8050 and the NameNode is hdfs://ers2.analytics.net:8020. If the actual values are different, the job.properties file must be edited to the correct values.
The example assumes Alluxio is already running locally (see "Introduction to Alluxio (formerly Tachyon)").
The input for the example is at /LICENSE in Alluxio (http://ers2:19999/browse?path=%2FLICENSE). You can choose to use the HDFS file system instead.
The example writes its output under /LICENSE3 in Alluxio (http://ers2:19999/browse?path=%2FLICENSE3).
Note: The job.properties file needs to be a local file during submission, not an HDFS path.

How to run an example application:

$ oozie job -oozie http://localhost:11000/oozie -config sparkTachyon/job.properties -run

job: 0000048-160415102017608-oozie-oozi-W
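
When the submission is scripted, the job ID has to be pulled out of the "job: ..." line printed by the run command. A minimal sketch, assuming the output keeps that shape. The function name parse_job_id is made up for this illustration.

```shell
# Hypothetical helper: reads the output of `oozie job ... -run` on stdin
# and prints only the job ID from the line that starts with "job:".
parse_job_id() {
    sed -n 's/^job:[[:space:]]*//p'
}

# Intended use (requires the oozie CLI, so it is left commented out):
# job_id=$(oozie job -oozie http://localhost:11000/oozie \
#            -config sparkTachyon/job.properties -run | parse_job_id)
```

The captured ID can then be passed to the -info command used in the next step.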


Check the workflow job status:

$ oozie job -oozie http://localhost:11000/oozie -info 0000048-160415102017608-oozie-oozi-W


Job ID : 0000048-160415102017608-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : SparkCalc
App Path      : hdfs://ers2.analytics.net:8020/user/ambari-qa/examples/apps/sparkTachyon
Status        : SUCCEEDED
Run           : 0
User          : ambari-qa
Group         : -
Created       : 2016-04-19 16:37 GMT
Started       : 2016-04-19 16:37 GMT
Last Modified : 2016-04-19 16:38 GMT
Ended         : 2016-04-19 16:38 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                  Status    Ext ID                  Ext Status  Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000048-160415102017608-oozie-oozi-W@:start:        OK        -                       OK          -
------------------------------------------------------------------------------------------------------------------------------------
0000048-160415102017608-oozie-oozi-W@spark-node     OK        job_1460569303137_0082  SUCCEEDED   -
------------------------------------------------------------------------------------------------------------------------------------
0000048-160415102017608-oozie-oozi-W@spark-node2    OK        job_1460569303137_0083  SUCCEEDED   -
------------------------------------------------------------------------------------------------------------------------------------
0000048-160415102017608-oozie-oozi-W@end            OK        -                       OK          -
------------------------------------------------------------------------------------------------------------------------------------
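
Rather than re-running the -info command by hand, a script can read the "Status" line and wait until the workflow reaches a final state. This is a sketch under the assumption that the output looks like the listing above. The function names extract_status, is_terminal and wait_for_job are hypothetical, and the 10-second default interval is an arbitrary choice.

```shell
# Hypothetical helper: prints the workflow-level status from
# `oozie job -info` output read on stdin (first line starting with "Status").
extract_status() {
    sed -n 's/^Status[[:space:]]*:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -n 1
}

# Returns 0 for workflow states that do not change any more.
is_terminal() {
    case $1 in
        SUCCEEDED|KILLED|FAILED) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: wait_for_job OOZIE_URL JOB_ID [INTERVAL_SECONDS]
# Polls until the job is finished, then prints the final status.
# Requires the oozie CLI on the PATH.
wait_for_job() {
    while :; do
        status=$(oozie job -oozie "$1" -info "$2" | extract_status)
        if is_terminal "$status"; then
            echo "$status"
            return 0
        fi
        sleep "${3:-10}"
    done
}
```

For the run above, wait_for_job http://localhost:11000/oozie 0000048-160415102017608-oozie-oozi-W would return as soon as the status line reads SUCCEEDED, KILLED or FAILED.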

To check the workflow job status via the Oozie web console, open http://localhost:11000/oozie/ in a browser.

Reference Links

https://oozie.apache.org/
https://oozie.apache.org/docs/3.3.1/DG_Examples.html#Local_Oozie_Example
https://support.pivotal.io/hc/en-us/articles/203355837-How-to-run-a-MapReduce-jar-using-Oozie-workflow