Getting started with Spark installation
2015-11-04 16:30
Download:
http://spark.apache.org/downloads.html
Download the build bundled with Hadoop: spark-1.5.1-bin-hadoop2.6.tgz
Start the Scala shell:
./bin/spark-shell
scala> sc.parallelize(1 to 1000).count()
Test Scala:
scala> System.getenv()
scala> System.out.print(new java.io.File(".").getAbsolutePath())
Enter the Python shell:
./bin/pyspark
>>> sc.parallelize(range(1000)).count()
Test Python:
>>> type(10)
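Both shell examples build an RDD from a range and count it. As a rough mental model (this is an illustrative sketch, not Spark's actual implementation), `sc.parallelize(range(1000)).count()` splits the data into partitions, counts each partition, and sums the results:

```python
# Plain-Python sketch (no Spark needed) of what
# sc.parallelize(range(1000)).count() conceptually computes:
# split into partitions, count each partition, combine the counts.

def parallelize(data, num_slices=2):
    """Split `data` into `num_slices` roughly equal partitions."""
    data = list(data)
    size = len(data)
    return [data[i * size // num_slices:(i + 1) * size // num_slices]
            for i in range(num_slices)]

def count(partitions):
    """Count per partition, then combine - the shape of an RDD action."""
    return sum(len(p) for p in partitions)

print(count(parallelize(range(1000), num_slices=4)))  # 1000
```

In real Spark the partitions live on different executors and the per-partition counts are combined on the driver; the shape of the computation is the same.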
## Example Programs
./bin/run-example SparkPi
will run the Pi example locally.
Submit to a cluster for computation:
MASTER=spark://host:7077 ./bin/run-example SparkPi
Docs:
https://taiwansparkusergroup.gitbooks.io/spark-programming-guide-zh-tw/content/quick-start/using-spark-shell.html
Examples:
http://spark.apache.org/examples.html
Scala demo:
val textFile = sc.textFile("README.md")
textFile.count()
textFile.first()
val linesWithSpark = textFile.filter(line => line.contains("Spark"))
textFile.filter(line => line.contains("Spark")).count()
textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)
import java.lang.Math
textFile.map(line => line.split(" ").size).reduce((a, b) => Math.max(a, b))
// Spark can express MapReduce very easily
val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
wordCounts.collect()
// Caching
linesWithSpark.cache()
linesWithSpark.count()
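The flatMap/map/reduceByKey word count above is easier to interpret with a plain-Python sketch of the same three stages (the sample `lines` below are made up for illustration):

```python
# Plain-Python sketch of what the Spark word count computes.

lines = ["Spark is fast", "Spark is fun"]  # stand-in for README.md lines

# flatMap: split every line into words, flattening into one list
words = [word for line in lines for word in line.split(" ")]

# map: pair each word with a count of 1
pairs = [(word, 1) for word in words]

# reduceByKey: sum the counts for each distinct key (word)
word_counts = {}
for word, n in pairs:
    word_counts[word] = word_counts.get(word, 0) + n

print(word_counts)  # {'Spark': 2, 'is': 2, 'fast': 1, 'fun': 1}
```

In Spark, the reduceByKey step additionally shuffles pairs so that all counts for the same word land on the same partition before being summed.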
Steps to develop a Spark (Scala) program in IDEA:
Create an sbt project named spark,
then create a Non-SBT module named simple.
Note: make sure the JDK and the Scala SDK are configured.
Create the src/main/scala directory
and add a new SimpleApp.scala file there.
Installing and using the sbt build tool:
http://niweiwei.iteye.com/blog/1879374
Scala is a compiled language: the source must be compiled to class files and packaged into a jar before Spark can run it,
whereas .py and .R files can be submitted directly.
Launching applications with spark-submit / spark-shell:
https://aiyanbo.gitbooks.io/spark-programming-guide-zh-cn/content/deploying/submitting-applications.html
Running a Scala application with Spark (note: spark-submit takes the packaged jar, not the .scala source; the jar path is whatever your build produced, e.g.):
bin/spark-submit --class "SimpleApp" target/scala-2.10/simple-project_2.10-1.0.jar
A .py file can be submitted directly:
./bin/spark-submit examples/src/main/python/pi.py
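The bundled pi.py (like the Java demo below) estimates pi by Monte Carlo sampling. The core idea can be sketched in plain Python (this illustrates the estimation technique, it is not the bundled script itself, which spreads the sampling across the cluster):

```python
# Monte Carlo pi: sample random points in the square [-1, 1) x [-1, 1)
# and count how many fall inside the unit circle. The hit ratio
# approximates (circle area) / (square area) = pi / 4.
import random

def estimate_pi(samples=100000, seed=42):
    random.seed(seed)  # fixed seed so the sketch is reproducible
    inside = 0
    for _ in range(samples):
        x = random.random() * 2 - 1
        y = random.random() * 2 - 1
        if x * x + y * y < 1:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi())  # roughly 3.14
```

Spark parallelizes exactly this loop: each partition samples its share of points, and a reduce sums the hit counts.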
Start a Spark standalone master:
./sbin/start-master.sh
The master web UI defaults to http://localhost:8080.
Java demo:
Once packaged, run:
./bin/spark-submit --class com.jiepu.spark.MyJavaSpark bin/spark-0.0.1-SNAPSHOT.jar
package com.jiepu.spark;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

import java.util.ArrayList;
import java.util.List;

public final class MyJavaSpark {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
        int n = 100000 * slices;
        List<Integer> l = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            l.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);

        // Monte Carlo estimate: map each element to 1 if a random point
        // lands inside the unit circle, then sum with reduce.
        int count = dataSet.map(new Function<Integer, Integer>() {
            @Override
            public Integer call(Integer integer) {
                double x = Math.random() * 2 - 1;
                double y = Math.random() * 2 - 1;
                return (x * x + y * y < 1) ? 1 : 0;
            }
        }).reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer integer, Integer integer2) {
                return integer + integer2;
            }
        });

        System.out.println("Pi is roughly " + 4.0 * count / n);

        jsc.stop();
    }
}
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.jiepu</groupId>
    <artifactId>spark</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>spark</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.2.0</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <!-- compiler plugin: set the JDK version -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <encoding>utf-8</encoding>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
The source of the Maven + Scala Spark example program is attached.