
Getting Started with Spark Installation

2015-11-04
Download:

http://spark.apache.org/downloads.html

Download the version pre-built with Hadoop: spark-1.5.1-bin-hadoop2.6.tgz

Launch:

./bin/spark-shell

This drops you into the Scala shell:

scala> sc.parallelize(1 to 1000).count()

Test Scala:

scala> System.getenv()

scala> System.out.print(new java.io.File(".").getAbsolutePath())

Enter the Python shell:

./bin/pyspark

>>> sc.parallelize(range(1000)).count()

Test Python:

>>> type(10)

Example programs:

./bin/run-example SparkPi

This runs the Pi example locally.

To submit the computation to a cluster:

MASTER=spark://host:7077 ./bin/run-example SparkPi

Documentation:

https://taiwansparkusergroup.gitbooks.io/spark-programming-guide-zh-tw/content/quick-start/using-spark-shell.html

Examples:

http://spark.apache.org/examples.html

Scala demo:

val textFile = sc.textFile("README.md")
textFile.count()
textFile.first()
val linesWithSpark = textFile.filter(line => line.contains("Spark"))
textFile.filter(line => line.contains("Spark")).count()
textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)
import java.lang.Math
textFile.map(line => line.split(" ").size).reduce((a, b) => Math.max(a, b))
// Spark makes it easy to express MapReduce
val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
wordCounts.collect()

// Caching
linesWithSpark.cache()
linesWithSpark.count()


Steps to develop a Spark (Scala) program in IDEA:

Create an SBT project named spark.

Then create a Non-SBT module named simple.

Note: make sure both the JDK and the Scala SDK are specified.

Create the \src\main\scala directory.

Create a new SimpleApp.scala file, for example:
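
A minimal SimpleApp.scala along the lines of the official quick-start example (YOUR_SPARK_HOME/README.md is a placeholder for any text file on your machine):

/* SimpleApp.scala */
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // any local text file works
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    // Count lines containing "a" and lines containing "b"
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}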

Installing and using the sbt build tool:
http://niweiwei.iteye.com/blog/1879374
Scala is a compiled language: the source must be compiled and packaged into a jar before Spark can run it (a minimal build definition is sketched below).

Python and R files, by contrast, can be submitted and run directly.
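
A minimal build.sbt for the SimpleApp project might look like this (the project name, version, and Scala version are illustrative; the pre-built Spark 1.5.1 downloads are compiled against Scala 2.10):

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"

Running sbt package then produces a jar under target/scala-2.10/ that can be handed to spark-submit.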

Launch applications with spark-submit / spark-shell:
https://aiyanbo.gitbooks.io/spark-programming-guide-zh-cn/content/deploying/submitting-applications.html
Submit the compiled Scala application (spark-submit runs the packaged jar, not the .scala source; the jar name below follows the sketched build definition above and will differ if your project name or version differ):

bin/spark-submit --class "SimpleApp" target/scala-2.10/simple-project_2.10-1.0.jar

Python scripts can be run directly:

./bin/spark-submit examples/src/main/python/pi.py

Start a standalone Spark cluster (master):

./sbin/start-master.sh

The master web UI defaults to http://localhost:8080.
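
Once the master is running, an application can target it either with --master spark://host:7077 on spark-submit or by setting the master in code. A minimal Scala sketch (ClusterSmokeTest and the spark://host:7077 URL are illustrative placeholders; use the URL shown on the master web UI):

import org.apache.spark.{SparkConf, SparkContext}

object ClusterSmokeTest {
  def main(args: Array[String]): Unit = {
    // spark://host:7077 is a placeholder; copy the real URL from the master web UI
    val conf = new SparkConf()
      .setAppName("ClusterSmokeTest")
      .setMaster("spark://host:7077")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 1000).count()) // simple sanity check executed on the cluster
    sc.stop()
  }
}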

Java demo:

Once packaged, run it with:

./bin/spark-submit --class com.jiepu.spark.MyJavaSpark bin/spark-0.0.1-SNAPSHOT.jar

package com.jiepu.spark;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

import java.util.ArrayList;
import java.util.List;

// Monte Carlo estimation of Pi: throw random points into the unit square
// and count how many land inside the unit circle.
public final class MyJavaSpark {

    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // The number of partitions (slices) can be passed as the first argument.
        int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
        int n = 100000 * slices;
        List<Integer> l = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            l.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);

        // Map each element to 1 if a random point falls inside the unit circle,
        // 0 otherwise, then sum the hits with reduce.
        int count = dataSet.map(new Function<Integer, Integer>() {
            @Override
            public Integer call(Integer integer) {
                double x = Math.random() * 2 - 1;
                double y = Math.random() * 2 - 1;
                return (x * x + y * y < 1) ? 1 : 0;
            }
        }).reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer integer, Integer integer2) {
                return integer + integer2;
            }
        });

        System.out.println("Pi is roughly " + 4.0 * count / n);

        jsc.stop();
    }
}


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.jiepu</groupId>
    <artifactId>spark</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>spark</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.2.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- compiler plugin: set the JDK version -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <encoding>utf-8</encoding>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
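
With this pom, running mvn package produces target/spark-0.0.1-SNAPSHOT.jar; that is the jar submitted with the spark-submit command above (the example command assumes the jar was copied into Spark's bin/ directory).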


