Running Spark on Mesos [Installation and Setup]
2013-01-23 20:17
Installing Scala
Unpack the archive: tar -zxvf scala-2.9.2.tgz. Then add the following lines to ~/.bashrc or ~/.profile (adjusting SCALA_HOME to wherever you extracted Scala):
export SCALA_HOME="/opt/scala"
export PATH="${SCALA_HOME}/bin:${JAVA_HOME}/bin:${PATH}"
Then reload your shell configuration: $ source ~/.bashrc
To verify that Scala was installed correctly, start the interpreter:
$ scala
Installing Spark 0.6.1
Spark requires Scala 2.9.2. You will need to have Scala's bin directory in your PATH, or set the SCALA_HOME environment variable to point to where you installed Scala. Scala must also be accessible through one of these methods on the slave nodes of your cluster.
Spark uses the Simple Build Tool (sbt), which is bundled with it. To compile the code, go into the top-level Spark directory and run:
sbt/sbt package
Testing the Build
Spark comes with a number of sample programs in the examples directory. To run one of the samples, use ./run <class> <params> in the top-level Spark directory (the run script sets up the appropriate paths and launches that program). For example, ./run spark.examples.SparkPi will run a sample program that estimates Pi. Each of the examples prints usage help if no params are given.
Note that all of the sample programs take a <master> parameter specifying the cluster URL to connect to. This can be a URL for a distributed cluster, or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing.
Finally, Spark can be used interactively from a modified version of the Scala interpreter that you can start through ./spark-shell. This is a great way to learn Spark.
Running Spark on Mesos
Spark can run on private clusters managed by the Apache Mesos resource manager. Follow the steps below to install Mesos and Spark:
Download and build Spark using the instructions above.
Download Mesos 0.9.0-incubating from an Apache mirror.
Configure Mesos using the configure script, passing the location of your JAVA_HOME using --with-java-home. Mesos comes with "template" configure scripts for different platforms, such as configure.macosx, that you can run. See the README file in Mesos for other options. Note: if you want to run Mesos without installing it into the default paths on your system (e.g. if you don't have administrative privileges to install it), you should also pass the --prefix option to configure to tell it where to install, for example --prefix=/home/user/mesos. By default the prefix is /usr/local.
Build Mesos using make, and then install it using make install.
Create a file called spark-env.sh in Spark's conf directory by copying conf/spark-env.sh.template, and add the following lines to it:
export MESOS_NATIVE_LIBRARY=<path to libmesos.so>. This path is usually <prefix>/lib/libmesos.so, where the prefix is /usr/local by default. On Mac OS X, the library is called libmesos.dylib instead of .so.
export SCALA_HOME=<path to Scala directory>.
Copy Spark and Mesos to the same paths on all the nodes in the cluster (or, for Mesos, run make install on every node).
Configure Mesos for deployment:
On your master node, edit <prefix>/var/mesos/deploy/masters to list your master and <prefix>/var/mesos/deploy/slaves to list the slaves, where <prefix> is the location where you installed Mesos (/usr/local by default).
On all nodes, edit <prefix>/var/mesos/conf/mesos.conf and add the line master=HOST:5050, where HOST is your master node.
Run <prefix>/sbin/mesos-start-cluster.sh on your master to start Mesos. If all goes well, you should see Mesos's web UI on port 8080 of the master machine.
See Mesos’s README file for more information on deploying it.
To run a Spark job against the cluster, when you create your SparkContext, pass the string mesos://HOST:5050 as the first parameter, where HOST is the machine running your Mesos master. In addition, pass the location of Spark on your nodes as the third parameter, and a list of JAR files containing your job's code as the fourth (these will automatically get copied to the workers). For example:
new SparkContext("mesos://HOST:5050", "My Job Name", "/home/user/spark", List("my-job.jar"))
Running the SparkKMeans Algorithm on Mesos
Start the Mesos service on every node and check in the web UI that all the slaves have registered. Start Hadoop alongside it, upload kmeansdata.txt to HDFS, then on the master enter the Spark directory and run the KMeans algorithm:
./run spark.examples.SparkKMeans 192.168.1.130:5050 hdfs://master:9000/user/liu/testdata/kmeansdata.txt 8 2.0
Note: remember to set the following environment variables:
export JAVA_HOME=$HOME/jdk1.7.0_05
export HADOOP_VERSION=1.0.4
export HADOOP_HOME=$HOME/hadoop-$HADOOP_VERSION
export SCALA_HOME=$HOME/scala-2.9.2
export MESOS_HOME=$HOME/mesos-0.9.0
export MESOS_NATIVE_LIBRARY=$MESOS_HOME/src/.libs/libmesos.so
export SPARK_HOME=$HOME/spark-0.6.1
export LD_LIBRARY_PATH=$MESOS_HOME/src/.libs
export CLASSPATH=/home/hadoop/spark-0.6.1/core/target/spark-core-assembly-0.6.1.jar:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin
Note:
Step 8 above is not very specific, so here we take the SparkKMeans.scala bundled with Spark as an example of how to compile and run a program. The following steps only need to be performed on the master node. See also the Spark programming guide.
First, build Spark and its dependencies into a single jar (core/target/spark-core-assembly-0.6.1.jar):
sbt/sbt assembly
Add this jar to the CLASSPATH:
export CLASSPATH=/home/hadoop/spark-0.6.1/core/target/spark-core-assembly-0.6.1.jar:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
Add the following statements to your Scala program file:
import spark.SparkContext
import SparkContext._
Compile the Scala program:
scalac SparkKMeans.scala
Run the compiled SparkKMeans program:
scala spark.examples.SparkKMeans mesos://192.168.1.130:5050 hdfs://192.168.1.130:9000/dataset/Square-10m.txt 8 2.0
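For context, the two numeric arguments are K (the number of clusters, 8 here) and the convergence threshold (2.0). A simplified sketch of the k-means iteration, not the exact bundled source, looks like this:

import spark.SparkContext
import SparkContext._

object KMeansSketch {
  // Squared Euclidean distance between two points.
  def squaredDist(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def main(args: Array[String]) {
    val Array(master, file, k, convergeDist) = args
    val sc = new SparkContext(master, "KMeansSketch")
    // Each input line holds one whitespace-separated point.
    val points = sc.textFile(file).map(_.split(' ').map(_.toDouble)).cache()
    // Naive initialization: use the first K points as starting centers.
    val centers = points.take(k.toInt)
    var moved = Double.MaxValue
    while (moved > convergeDist.toDouble) {
      // Assign every point to its nearest center, then average each cluster.
      val updated = points
        .map(p => (centers.indices.minBy(i => squaredDist(p, centers(i))), (p, 1)))
        .reduceByKey { case ((s1, n1), (s2, n2)) =>
          (s1.zip(s2).map { case (x, y) => x + y }, n1 + n2)
        }
        .map { case (i, (sum, n)) => (i, sum.map(_ / n)) }
        .collect()
      // Total distance the centers moved in this iteration.
      moved = updated.map { case (i, c) => squaredDist(centers(i), c) }.sum
      for ((i, c) <- updated) centers(i) = c
    }
    centers.foreach(c => println(c.mkString(" ")))
  }
}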
How to Write a Spark Program
The first thing a Spark program must do is to create a SparkContext object, which tells Spark how to access a cluster. This is done through the following constructor:
new SparkContext(master, jobName, [sparkHome], [jars])
The master parameter is a string specifying a Mesos cluster to connect to, or a special "local" string to run in local mode, as described below. jobName is a name for your job, which will be shown in the Mesos web UI when running on a cluster. Finally, the last two parameters are needed to deploy your code to a cluster if running in distributed mode, as described later.
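For example (job names here are placeholders), in local mode the two optional parameters can be omitted, while on Mesos they tell the workers where to find Spark and your compiled code:

// Local mode for testing: sparkHome and jars are not needed.
val sc = new SparkContext("local[4]", "My Test Job")

// Distributed mode: connect to a Mesos master.
val sc2 = new SparkContext("mesos://HOST:5050", "My Job Name",
  "/home/user/spark", List("my-job.jar"))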
In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called sc. Making your own SparkContext will not work. You can set which master the context connects to using the MASTER environment variable. For example, to run on four cores, use:
$ MASTER=local[4] ./spark-shell
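Inside the shell, sc is then ready to use directly. For example, a quick sanity check that doubles the numbers 1 to 1000 and sums them across the local cores:

scala> sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
res0: Int = 1001000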