
Running WordCount with Spark on Hadoop (without installing Scala)

2016-04-29 23:20
Running WordCount on Spark and HDFS:

1. Install Spark on a single machine, single node:

(1) Unpack the Spark tarball.

(2) In the conf directory, copy the spark-env.sh template to spark-env.sh, fill in the required paths, then start the daemons from sbin (a sketch follows this step):
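As a minimal sketch of step (2), assuming the standard template name spark-env.sh.template; the JAVA_HOME and HADOOP_CONF_DIR values below are taken from paths that appear later in this machine's logs, so adjust them for your own layout:

cd /usr/lib/spark/spark-1.6.1-bin-hadoop1/conf
cp spark-env.sh.template spark-env.sh
# append to spark-env.sh (values assumed from this machine's classpath):
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45
export HADOOP_CONF_DIR=/usr/lib/hadoop/hadoop-1.2.1/etc/hadoop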

[root@localhost spark-1.6.1-bin-hadoop1]# cd sbin

[root@localhost sbin]# ls

slaves.sh start-slaves.sh

spark-config.sh start-thriftserver.sh

spark-daemon.sh stop-all.sh

spark-daemons.sh stop-history-server.sh

start-all.sh stop-master.sh

start-history-server.sh stop-mesos-dispatcher.sh

start-master.sh stop-mesos-shuffle-service.sh

start-mesos-dispatcher.sh stop-shuffle-service.sh

start-mesos-shuffle-service.sh stop-slave.sh

start-shuffle-service.sh stop-slaves.sh

start-slave.sh stop-thriftserver.sh

[root@localhost sbin]# start-all.sh

starting org.apache.spark.deploy.master.Master, logging to /usr/lib/spark/spark-1.6.1-bin-hadoop1/logs/spark-root-org.apache.spark.deploy.master.Master-1-localhost.localdomain.out

localhost: starting org.apache.spark.deploy.worker.Worker, logging to /usr/lib/spark/spark-1.6.1-bin-hadoop1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out

localhost: failed to launch org.apache.spark.deploy.worker.Worker:

localhost: Spark Command: /usr/lib/jvm/jdk1.7.0_45/bin/java -cp /usr/lib/spark/spark-1.6.1-bin-hadoop1/conf/:/usr/lib/spark/spark-1.6.1-bin-hadoop1/lib/spark-assembly-1.6.1-hadoop1.2.1.jar:/usr/lib/spark/spark-1.6.1-bin-hadoop1/lib/datanucleus-rdbms-3.2.9.jar:/usr/lib/spark/spark-1.6.1-bin-hadoop1/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark/spark-1.6.1-bin-hadoop1/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/hadoop/hadoop-1.2.1/etc/hadoop -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://localhost.localdomain:7077

localhost: ========================================

localhost: full log in /usr/lib/spark/spark-1.6.1-bin-hadoop1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out

Despite the "failed to launch" message above, jps shows that both the Master and the Worker are in fact running:

[root@localhost sbin]# jps

4035

5268 Master

5354 Jps

5322 Worker

2. Next, start the Hadoop daemons. Note that this time start-all.sh resolves to Hadoop's script on the PATH, not Spark's:

[root@localhost sbin]# cd ..

[root@localhost spark-1.6.1-bin-hadoop1]# start-all.sh

starting namenode, logging to /usr/lib/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-localhost.localdomain.out

localhost: starting datanode, logging to /usr/lib/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-localhost.localdomain.out

localhost: starting secondarynamenode, logging to /usr/lib/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out

starting jobtracker, logging to /usr/lib/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-localhost.localdomain.out

localhost: starting tasktracker, logging to /usr/lib/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-localhost.localdomain.out

[root@localhost spark-1.6.1-bin-hadoop1]# jps

4035

6364 TaskTracker

6432 Jps

6244 JobTracker

6064 DataNode

5268 Master

6166 SecondaryNameNode

5959 NameNode

5322 Worker

3. Upload the input file (README.txt) to HDFS:

[root@localhost hadoop-1.2.1]# hadoop fs -copyFromLocal README.txt input
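To verify the upload, the input directory can be listed with the standard Hadoop 1.x command:

[root@localhost hadoop-1.2.1]# hadoop fs -ls input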

4. Run WordCount without a separate Scala installation

Scala is not installed on this machine, and the goal is to run WordCount on top of a plain Java environment. This works because spark-shell ships with its own Scala runtime (version 2.10.5 here, as the startup banner shows), so only a JDK is required.

(1) Launch spark-shell from the bin directory:

[root@localhost bin]# spark-shell

log4j:WARN No appenders could be found for logger (org.apache.hadoop.security.Groups).

log4j:WARN Please initialize the log4j system properly.

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties

To adjust logging level use sc.setLogLevel("INFO")

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) Client VM, Java 1.7.0_45)

Type in expressions to have them evaluated.

Type :help for more information.

16/04/29 08:03:24 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.86.135 instead (on interface eth0)

16/04/29 08:03:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

Spark context available as sc.

16/04/29 08:03:57 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)

16/04/29 08:04:06 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)

16/04/29 08:04:28 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0

16/04/29 08:04:30 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException

16/04/29 08:05:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

16/04/29 08:05:23 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)

16/04/29 08:05:26 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)

SQL context available as sqlContext.

scala>

(2) Load the input file from HDFS; the hdfs:// URI must match the NameNode address configured in fs.default.name (port 9000 here):

scala> val file=sc.textFile("hdfs://192.168.86.135:9000/user/root/input/README.txt")

16/04/29 08:07:31 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes

file: org.apache.spark.rdd.RDD[String] = hdfs://192.168.86.135:9000/user/root/input/README.txt MapPartitionsRDD[1] at textFile at <console>:27

(3) Run the computation and shape the output: split each line into words, map each word to a (word, 1) pair, and sum the counts per word with reduceByKey:

scala> val count=file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)

16/04/29 08:12:31 WARN LoadSnappy: Snappy native library not loaded

count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:29
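If the chain of transformations is hard to follow, the same pipeline can be traced on a tiny in-memory RDD. This is an illustrative snippet, and the order of the resulting pairs may vary:

scala> sc.parallelize(Seq("a b a")).flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).collect()
// expected: Array((a,2), (b,1)) -- "a" appears twice, "b" once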

(4) Output: collect() brings the result back to the driver, and the REPL prints it:

scala> count.collect()

res0: Array[(String, Int)] = Array((Hadoop,1), (Commodity,1), (For,1), (this,3), (country,1), (under,1), (it,1), (The,4), (Jetty,1), (Software,2), (Technology,1), (<http://www.wassenaar.org/>,1), (have,1), (http://wiki.apache.org/hadoop/,1), (BIS,1), (classified,1),
(This,1), (following,1), (which,2), (security,1), (See,1), (encryption,3), (Number,1), (export,1), (reside,1), (for,3), ((BIS),,1), (any,1), (at:,2), (software,2), (makes,1), (algorithms.,1), (re-export,2), (latest,1), (your,1), (SSL,1), (the,8), (Administration,1),
(includes,2), (import,,2), (provides,1), (Unrestricted,1), (country's,1), (if,1), (740.13),1), (Commerce,,1), (country,,1), (software.,2), (concerning,1), (laws,,1), (source,1), (possession,,2), (Apache,1), (our,2), (written,1), (as,1), (License,1), (regulations,...

scala>
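Note that collect() pulls the whole result to the driver, which is fine for a small README.txt but not for large inputs. As a follow-up sketch, the counts can instead be sorted and written back to HDFS; the output path below is an assumption, and the directory must not already exist:

scala> count.sortBy(_._2, false).take(5)
scala> count.saveAsTextFile("hdfs://192.168.86.135:9000/user/root/output")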