Running Spark on Mesos [Installation and Setup]
2013-01-23 20:17
Installing Scala
Unpack the archive: tar -zxvf scala-2.9.2.tgz. Then add the following lines to ~/.bashrc or ~/.profile (adjusting SCALA_HOME to wherever you extracted Scala):
export SCALA_HOME="/opt/scala"
export PATH="${SCALA_HOME}/bin:${JAVA_HOME}/bin:${PATH}"
Then reload your shell configuration: $ source ~/.bashrc
To verify that Scala was installed correctly, start the interpreter:
$ scala
Installing Spark 0.6.1
Spark requires Scala 2.9.2. You will need to have Scala's bin directory in your PATH, or set the SCALA_HOME environment variable to point to where you installed Scala. Scala must also be accessible through one of these methods on the slave nodes of your cluster.
Spark uses the Simple Build Tool (sbt), which is bundled with it. To compile the code, go into the top-level Spark directory and run:
sbt/sbt package
Testing the Build
Spark comes with a number of sample programs in the examples directory. To run one of the samples, use ./run <class> <params> in the top-level Spark directory (the run script sets up the appropriate paths and launches that program). For example, ./run spark.examples.SparkPi will run a sample program that estimates Pi. Each of the examples prints usage help if no params are given.
Note that all of the sample programs take a <master> parameter specifying the cluster URL to connect to. This can be a URL for a distributed cluster, or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing.
Finally, Spark can be used interactively from a modified version of the Scala interpreter that you can start through ./spark-shell. This is a great way to learn Spark.
Running Spark on Mesos
Spark can run on private clusters managed by the Apache Mesos resource manager. Follow the steps below to install Mesos and Spark:
Download and build Spark using the instructions above.
Download Mesos 0.9.0-incubating from an Apache mirror.
Configure Mesos using the configure script, passing the location of your JAVA_HOME using --with-java-home. Mesos comes with "template" configure scripts for different platforms, such as configure.macosx, that you can run. See the README file in Mesos for other options. Note: if you want to run Mesos without installing it into the default paths on your system (e.g. if you don't have administrative privileges to install it), you should also pass the --prefix option to configure to tell it where to install, for example --prefix=/home/user/mesos. By default the prefix is /usr/local.
Build Mesos using make, and then install it using make install.
Create a file called spark-env.sh in Spark's conf directory by copying conf/spark-env.sh.template, and add the following lines to it:
export MESOS_NATIVE_LIBRARY=<path to libmesos.so>. This path is usually <prefix>/lib/libmesos.so, where the prefix is /usr/local by default. On Mac OS X, the library is called libmesos.dylib instead of .so.
export SCALA_HOME=<path to Scala directory>.
Copy Spark and Mesos to the same paths on all the nodes in the cluster (or, for Mesos, run make install on every node).
Configure Mesos for deployment:
On your master node, edit <prefix>/var/mesos/deploy/masters to list your master and <prefix>/var/mesos/deploy/slaves to list the slaves, where <prefix> is the location where you installed Mesos (/usr/local by default).
On all nodes, edit <prefix>/var/mesos/conf/mesos.conf and add the line master=HOST:5050, where HOST is your master node.
Run <prefix>/sbin/mesos-start-cluster.sh on your master to start Mesos. If all goes well, you should see Mesos's web UI on port 8080 of the master machine.
See Mesos’s README file for more information on deploying it.
To run a Spark job against the cluster, when you create your SparkContext, pass the string mesos://HOST:5050 as the first parameter, where HOST is the machine running your Mesos master. In addition, pass the location of Spark on your nodes as the third parameter, and a list of JAR files containing your job's code as the fourth (these will automatically get copied to the workers). For example:
new SparkContext("mesos://HOST:5050", "My Job Name", "/home/user/spark", List("my-job.jar"))
Running the SparkKMeans Algorithm on Mesos
Start the Mesos service on every node and check in the web UI that all the slaves have registered. Start Hadoop alongside it, upload kmeansdata.txt to HDFS, then on the master enter the Spark directory and run the KMeans algorithm:
./run spark.examples.SparkKMeans 192.168.1.130:5050 hdfs://master:9000/user/liu/testdata/kmeansdata.txt 8 2.0
Note: remember to set the following environment variables:
export JAVA_HOME=$HOME/jdk1.7.0_05
export HADOOP_VERSION=1.0.4
export HADOOP_HOME=$HOME/hadoop-$HADOOP_VERSION
export SCALA_HOME=$HOME/scala-2.9.2
export MESOS_HOME=$HOME/mesos-0.9.0
export MESOS_NATIVE_LIBRARY=$MESOS_HOME/src/.libs/libmesos.so
export SPARK_HOME=$HOME/spark-0.6.1
export LD_LIBRARY_PATH=$MESOS_HOME/src/.libs
export CLASSPATH=/home/hadoop/spark-0.6.1/core/target/spark-core-assembly-0.6.1.jar:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin
Note:
Step 8 above is not very specific, so here we take the SparkKMeans.scala bundled with Spark as an example of how to compile and run a program. The following steps only need to be performed on the master node. See also the Spark programming guide.
First, build Spark and its dependencies into a single jar (core/target/spark-core-assembly-0.6.1.jar):
sbt/sbt assembly
Add this jar to the CLASSPATH:
export CLASSPATH=/home/hadoop/spark-0.6.1/core/target/spark-core-assembly-0.6.1.jar:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
Add the following statements to your Scala program file:
import spark.SparkContext
import SparkContext._
Compile the Scala program:
scalac SparkKMeans.scala
Run the compiled SparkKMeans program:
scala spark.examples.SparkKMeans mesos://192.168.1.130:5050 hdfs://192.168.1.130:9000/dataset/Square-10m.txt 8 2.0
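For context, the two numeric arguments are K (the number of clusters, 8 here) and the convergence threshold (2.0). A simplified sketch of the k-means iteration, not the exact bundled source, looks like this:

import spark.SparkContext
import SparkContext._

object KMeansSketch {
  // Squared Euclidean distance between two points.
  def squaredDist(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def main(args: Array[String]) {
    val Array(master, file, k, convergeDist) = args
    val sc = new SparkContext(master, "KMeansSketch")
    // Each input line holds one whitespace-separated point.
    val points = sc.textFile(file).map(_.split(' ').map(_.toDouble)).cache()
    // Naive initialization: use the first K points as starting centers.
    val centers = points.take(k.toInt)
    var moved = Double.MaxValue
    while (moved > convergeDist.toDouble) {
      // Assign every point to its nearest center, then average each cluster.
      val updated = points
        .map(p => (centers.indices.minBy(i => squaredDist(p, centers(i))), (p, 1)))
        .reduceByKey { case ((s1, n1), (s2, n2)) =>
          (s1.zip(s2).map { case (x, y) => x + y }, n1 + n2)
        }
        .map { case (i, (sum, n)) => (i, sum.map(_ / n)) }
        .collect()
      // Total distance the centers moved in this iteration.
      moved = updated.map { case (i, c) => squaredDist(centers(i), c) }.sum
      for ((i, c) <- updated) centers(i) = c
    }
    centers.foreach(c => println(c.mkString(" ")))
  }
}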
How to Write a Spark Program
The first thing a Spark program must do is to create a SparkContext object, which tells Spark how to access a cluster. This is done through the following constructor:
new SparkContext(master, jobName, [sparkHome], [jars])
The master parameter is a string specifying a Mesos cluster to connect to, or a special "local" string to run in local mode, as described below. jobName is a name for your job, which will be shown in the Mesos web UI when running on a cluster. Finally, the last two parameters are needed to deploy your code to a cluster if running in distributed mode, as described later.
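For example (job names here are placeholders), in local mode the two optional parameters can be omitted, while on Mesos they tell the workers where to find Spark and your compiled code:

// Local mode for testing: sparkHome and jars are not needed.
val sc = new SparkContext("local[4]", "My Test Job")

// Distributed mode: connect to a Mesos master.
val sc2 = new SparkContext("mesos://HOST:5050", "My Job Name",
  "/home/user/spark", List("my-job.jar"))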
In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called sc. Making your own SparkContext will not work. You can set which master the context connects to using the MASTER environment variable. For example, to run on four cores, use:
$ MASTER=local[4] ./spark-shell
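Inside the shell, sc is then ready to use directly. For example, a quick sanity check that doubles the numbers 1 to 1000 and sums them across the local cores:

scala> sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
res0: Int = 1001000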