Packaging a Scala program into a JAR with sbt from the shell command line, and deploying it to a cluster for testing
2018-02-22 01:36
1. Installing sbt
Download sbt and unpack the binary package on Linux (download page: https://www.scala-sbt.org/download.html). After unpacking, change into the directory and run ./bin/sbt. On the first run, sbt downloads the JARs it depends on from the network and caches them under ~/.sbt:
[elon@hadoop ~]$ tar xf sbt-1.1.1.tgz
[elon@hadoop ~]$ cd sbt/
[elon@hadoop sbt]$ ./bin/sbt
For convenience, add sbt to the PATH:
[elon@hadoop sbt]$ mv bin/ conf/ ~/
[elon@hadoop sbt]$ export PATH="$PATH:$HOME/bin"

Note that export only affects the current shell session; append the same line to ~/.bashrc to make the change permanent.
2. Writing the sbt build file
A typical sbt build file for a Spark program looks like the following; here it is named wordCount.sbt:

name := "WordCount"
version := "0.1"
scalaVersion := "2.11.8"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
wordCount.sbt is itself Scala code. If the program uses other components, they can be added to libraryDependencies in the same way, as in the example below.
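For instance, a project that also uses Spark SQL would declare a second dependency. This line is an illustrative addition, not part of the original build; the version is simply matched to the spark-core dependency above:

// Hypothetical extra dependency: Spark SQL, version matched to spark-core above
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"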
For sbt to work correctly, the build file and the program source must be laid out in a particular directory structure, like this:
[elon@hadoop wordcount]$ find .
.
./src
./src/main
./src/main/scala
./src/main/scala/WordCount.scala
./wordCount.sbt
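The article never shows WordCount.scala itself. For reference, here is a minimal sketch that is consistent with the call sites named in the run log below (textFile, map, reduceByKey, collect, and the "wordCounts:" marker); it is a plausible reconstruction, not necessarily the author's exact source:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    // Input path taken from the HadoopRDD line in the run log below.
    val textFile = sc.textFile("file:///home/elon/spark/README.md")
    // Split each line into words, pair each word with 1, then sum per word.
    val wordCounts = textFile
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // The run log shows a "wordCounts:" marker followed by the collected pairs.
    println("wordCounts:")
    wordCounts.collect().foreach(println)
    sc.stop()
  }
}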
3. Compiling, linking, and packaging
The command and its output are as follows:

[elon@hadoop wordcount]$ sbt package
[info] Updated file /home/elon/workspace/wordcount/project/build.properties: set sbt.version to 1.1.1
[info] Loading project definition from /home/elon/workspace/wordcount/project
[info] Updating ProjectRef(uri("file:/home/elon/workspace/wordcount/project/"), "wordcount-build") ...
[info] Compiling 1 Scala source to /home/elon/workspace/wordcount/target/scala-2.11/classes ...
[info] Done compiling.
[info] Packaging /home/elon/workspace/wordcount/target/scala-2.11/wordcount_2.11-0.1.jar ...
[info] Done packaging.
[success] Total time: 62 s, completed 2018-2-22 1:26:33
As with sbt itself, the first run is slow because the Spark JARs the build depends on have to be downloaded. Note that sbt package bundles only the project's own classes into the JAR; the Spark libraries are supplied by spark-submit at runtime, which is why the resulting JAR stays small.
4. Submitting the application
Once packaged, the JAR can be submitted to the cluster to run. The basic form of the submit command is shown below; --master local runs the job in a single local JVM, and for a real cluster you would pass the cluster's master URL instead (e.g. --master spark://host:7077 or --master yarn):

[elon@hadoop spark]$ ./bin/spark-submit \
  --class WordCount \
  --master local \
  ~/workspace/wordcount/target/scala-2.11/wordcount_2.11-0.1.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/02/22 01:33:27 INFO SparkContext: Running Spark version 2.2.1
18/02/22 01:33:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/22 01:33:28 INFO SparkContext: Submitted application: WordCount
18/02/22 01:33:28 INFO SecurityManager: Changing view acls to: elon
18/02/22 01:33:28 INFO SecurityManager: Changing modify acls to: elon
18/02/22 01:33:28 INFO SecurityManager: Changing view acls groups to:
18/02/22 01:33:28 INFO SecurityManager: Changing modify acls groups to:
18/02/22 01:33:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(elon); groups with view permissions: Set(); users with modify permissions: Set(elon); groups with modify permissions: Set()
18/02/22 01:33:29 INFO Utils: Successfully started service 'sparkDriver' on port 40258.
18/02/22 01:33:29 INFO SparkEnv: Registering MapOutputTracker
18/02/22 01:33:29 INFO SparkEnv: Registering BlockManagerMaster
18/02/22 01:33:29 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/02/22 01:33:29 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/02/22 01:33:30 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-618a12c1-361a-4bd1-b81e-4b523ebad17e
18/02/22 01:33:30 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
18/02/22 01:33:30 INFO SparkEnv: Registering OutputCommitCoordinator
18/02/22 01:33:30 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/02/22 01:33:31 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.115:4040
18/02/22 01:33:31 INFO SparkContext: Added JAR file:/home/elon/workspace/wordcount/target/scala-2.11/wordcount_2.11-0.1.jar at spark://192.168.1.115:40258/jars/wordcount_2.11-0.1.jar with timestamp 1519234411259
18/02/22 01:33:31 INFO Executor: Starting executor ID driver on host localhost
18/02/22 01:33:31 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39799.
18/02/22 01:33:31 INFO NettyBlockTransferService: Server created on 192.168.1.115:39799
18/02/22 01:33:31 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/02/22 01:33:31 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.115, 39799, None)
18/02/22 01:33:31 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.115:39799 with 413.9 MB RAM, BlockManagerId(driver, 192.168.1.115, 39799, None)
18/02/22 01:33:31 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.115, 39799, None)
18/02/22 01:33:31 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.115, 39799, None)
18/02/22 01:33:34 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 236.5 KB, free 413.7 MB)
18/02/22 01:33:35 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 413.7 MB)
18/02/22 01:33:35 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.115:39799 (size: 22.9 KB, free: 413.9 MB)
18/02/22 01:33:35 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:9
18/02/22 01:33:35 INFO FileInputFormat: Total input paths to process : 1
wordCounts:
18/02/22 01:33:35 INFO SparkContext: Starting job: collect at WordCount.scala:14
18/02/22 01:33:36 INFO DAGScheduler: Registering RDD 3 (map at WordCount.scala:11)
18/02/22 01:33:36 INFO DAGScheduler: Got job 0 (collect at WordCount.scala:14) with 1 output partitions
18/02/22 01:33:36 INFO DAGScheduler: Final stage: ResultStage 1 (collect at WordCount.scala:14)
18/02/22 01:33:36 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/02/22 01:33:36 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/02/22 01:33:36 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:11), which has no missing parents
18/02/22 01:33:36 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.7 KB, free 413.7 MB)
18/02/22 01:33:36 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.7 KB, free 413.7 MB)
18/02/22 01:33:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.115:39799 (size: 2.7 KB, free: 413.9 MB)
18/02/22 01:33:36 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/02/22 01:33:36 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:11) (first 15 tasks are for partitions Vector(0))
18/02/22 01:33:36 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
18/02/22 01:33:36 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 4839 bytes)
18/02/22 01:33:36 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/02/22 01:33:36 INFO Executor: Fetching spark://192.168.1.115:40258/jars/wordcount_2.11-0.1.jar with timestamp 1519234411259
18/02/22 01:33:36 INFO TransportClientFactory: Successfully created connection to /192.168.1.115:40258 after 105 ms (0 ms spent in bootstraps)
18/02/22 01:33:37 INFO Utils: Fetching spark://192.168.1.115:40258/jars/wordcount_2.11-0.1.jar to /tmp/spark-dec8c0aa-5af4-43fb-ab29-a3bd6273e561/userFiles-0ea2f549-1a97-4276-acc6-cf3c07948ccb/fetchFileTemp4497605578776580420.tmp
18/02/22 01:33:37 INFO Executor: Adding file:/tmp/spark-dec8c0aa-5af4-43fb-ab29-a3bd6273e561/userFiles-0ea2f549-1a97-4276-acc6-cf3c07948ccb/wordcount_2.11-0.1.jar to class loader
18/02/22 01:33:37 INFO HadoopRDD: Input split: file:/home/elon/spark/README.md:0+3809
18/02/22 01:33:38 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1197 bytes result sent to driver
18/02/22 01:33:38 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1510 ms on localhost (executor driver) (1/1)
18/02/22 01:33:38 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/02/22 01:33:38 INFO DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:11) finished in 1.594 s
18/02/22 01:33:38 INFO DAGScheduler: looking for newly runnable stages
18/02/22 01:33:38 INFO DAGScheduler: running: Set()
18/02/22 01:33:38 INFO DAGScheduler: waiting: Set(ResultStage 1)
18/02/22 01:33:38 INFO DAGScheduler: failed: Set()
18/02/22 01:33:38 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:12), which has no missing parents
18/02/22 01:33:38 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 413.7 MB)
18/02/22 01:33:38 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1957.0 B, free 413.7 MB)
18/02/22 01:33:38 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.115:39799 (size: 1957.0 B, free: 413.9 MB)
18/02/22 01:33:38 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/02/22 01:33:38 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:12) (first 15 tasks are for partitions Vector(0))
18/02/22 01:33:38 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
18/02/22 01:33:38 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 0, ANY, 4621 bytes)
18/02/22 01:33:38 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
18/02/22 01:33:38 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
18/02/22 01:33:38 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 31 ms
18/02/22 01:33:38 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 7866 bytes result sent to driver
18/02/22 01:33:38 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 318 ms on localhost (executor driver) (1/1)
18/02/22 01:33:38 INFO DAGScheduler: ResultStage 1 (collect at WordCount.scala:14) finished in 0.316 s
18/02/22 01:33:38 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/02/22 01:33:38 INFO DAGScheduler: Job 0 finished: collect at WordCount.scala:14, took 3.083995 s
(package,1) (For,3) (Programs,1) (processing.,1) (Because,1) (The,1) (page](http://spark.apache.org/documentation.html).,1) (cluster.,1) (its,1) ([run,1) (than,1) (APIs,1) (have,1) (Try,1) (computation,1) (through,1) (several,1) (This,2) (graph,1) (Hive,2) (storage,1) (["Specifying,1) (To,2) ("yarn",1) (Once,1) (["Useful,1) (prefer,1) (SparkPi,2) (engine,1) (version,1) (file,1) (documentation,,1) (processing,,1) (the,24) (are,1) (systems.,1) (params,1) (not,1) (different,1) (refer,2) (Interactive,2) (R,,1) (given.,1) (if,4) (build,4) (when,1) (be,2) (Tests,1) (Apache,1) (thread,1) (programs,,1) (including,4) (./bin/run-example,2) (Spark.,1) (package.,1) (1000).count(),1) (Versions,1) (HDFS,1) (Data.,1) (>>>,1) (Maven,1) (programming,1) (Testing,1) (module,,1) (Streaming,1) (environment,1) (run:,1) (Developer,1) (clean,1) (1000:,2) (rich,1) (GraphX,1) (Please,4) (is,6) (guide](http://spark.apache.org/contributing.html),1) (run,7) (URL,,1) (threads.,1) (same,1) (MASTER=spark://host:7077,1) (on,7) (built,1) (against,1) ([Apache,1) (tests,2) (examples,2) (at,2) (optimized,1) (3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3).,1) (usage,1) (development,1) (Maven,,1) (graphs,1) (talk,1) (Shell,2) (class,2) (abbreviated,1) (using,5) (directory.,1) (README,1) (computing,1) (overview,1) (`examples`,2) (example:,1) (##,9) (N,1) (set,2) (use,3) (Hadoop-supported,1) (running,1) (find,1) (contains,1) (project,1) (Pi,1) (need,1) (or,3) (Big,1) (high-level,1) (Java,,1) (uses,1) (<class>,1) (Hadoop,,2) (available,1) (requires,1) ((You,1) (more,1) (see,3) (Documentation,1) (of,5) (tools,1) (using:,1) (cluster,2) (must,1) (supports,2) (built,,1) (tests](http://spark.apache.org/developer-tools.html#individual-tests).,1) (system,1) (build/mvn,1) (Hadoop,3) (this,1) (Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version),1) (particular,2) (Python,2) (Spark,16) (general,3) (YARN,,1) (pre-built,1) ([Configuration,1) (locally,2) (library,1) (A,1) (locally.,1) (sc.parallelize(1,1) (only,1) (Configuration,1) (following,2) (basic,1) (#,1) (changed,1) (More,1) (which,2) (learning,,1) (first,1) (./bin/pyspark,1) (also,4) (info,1) (should,2) (for,12) ([params]`.,1) (documentation,3) ([project,1) (mesos://,1) (Maven](http://maven.apache.org/).,1) (setup,1) (<http://spark.apache.org/>,1) (latest,1) (your,1) (MASTER,1) (example,3) (["Parallel,1) (scala>,1) (DataFrames,,1) (provides,1) (configure,1) (distributions.,1) (can,7) (About,1) (instructions.,1) (do,2) (easiest,1) (no,1) (project.,1) (how,3) (`./bin/run-example,1) (started,1) (Note,1) (by,1) (individual,1) (spark://,1) (It,2) (tips,,1) (Scala,2) (Alternatively,,1) (an,4) (variable,1) (submit,1) (-T,1) (machine,1) (thread,,1) (them,,1) (detailed,2) (stream,1) (And,1) (distribution,1) (review,1) (return,2) (Thriftserver,1) (developing,1) (./bin/spark-shell,1) ("local",1) (start,1) (You,4) (Spark](#building-spark).,1) (one,3) (help,1) (with,4) (print,1) (Spark"](http://spark.apache.org/docs/latest/building-spark.html).,1) (data,1)
(Contributing,1) (in,6) (-DskipTests,1) (downloaded,1) (versions,1) (online,1) (Guide](http://spark.apache.org/docs/latest/configuration.html),1) (builds,1) (comes,1) (Tools"](http://spark.apache.org/developer-tools.html).,1) ([building,1) (Python,,2) (Many,1) (building,2) (Running,1) (from,1) (way,1) (Online,1) (site,,1) (other,1) (Example,1) ([Contribution,1) (analysis.,1) (sc.parallelize(range(1000)).count(),1) (you,4) (runs.,1) (Building,1) (higher-level,1) (protocols,1) (guidance,2) (a,8) (guide,,1) (name,1) (fast,1) (SQL,2) (that,2) (will,1) (IDE,,1) (to,17) (get,1) (,71) (information,1) (core,1) (web,1) ("local ",1) (programs,2) (option,1) (MLlib,1) (["Building,1) (contributing,1) (shell:,2) (instance:,1) (Scala,,1) (and,9) (command,,2) (package.),1) (./dev/run-tests,1) (sample,1)
18/02/22 01:33:38 INFO SparkContext: Invoking stop() from shutdown hook
18/02/22 01:33:38 INFO SparkUI: Stopped Spark web UI at http://192.168.1.115:4040
18/02/22 01:33:38 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/02/22 01:33:38 INFO MemoryStore: MemoryStore cleared
18/02/22 01:33:38 INFO BlockManager: BlockManager stopped
18/02/22 01:33:38 INFO BlockManagerMaster: BlockManagerMaster stopped
18/02/22 01:33:38 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/02/22 01:33:38 INFO SparkContext: Successfully stopped SparkContext
18/02/22 01:33:38 INFO ShutdownHookManager: Shutdown hook called
18/02/22 01:33:38 INFO ShutdownHookManager: Deleting directory /tmp/spark-dec8c0aa-5af4-43fb-ab29-a3bd6273e561
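One caveat when moving from this local test to real cluster data: collect() pulls the entire result set back to the driver, which only works here because README.md is tiny. A sketch of writing the result to a distributed filesystem instead (the output path is a made-up example):

// Alternative to collect(): write the (word, count) pairs out in parallel.
// The HDFS path below is hypothetical; adjust it to your cluster.
wordCounts.saveAsTextFile("hdfs:///user/elon/wordcount-output")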