A First SparkR Test: Computing Spark Pi
2015-10-09 09:36
Installing SparkR was quite a struggle. The various installation methods found online all worked poorly in my testing, possibly because some sites are blocked in China.
For example, install_github("amplab-extras/SparkR-pkg", subdir="pkg") stalls at the SBT step, and even downloading and installing SBT manually did not help. The error message was: "Invalid or corrupt jarfile sbt/sbt-launch-0.13.6.jar".
Building directly from the Spark source tree also failed. A command like R -e "devtools::install('.')" does produce the SparkR package, but testing then fails at the network connection. The root cause is that sparkr-assembly-0.1.jar was never generated; without that jar, nothing can be wired up end to end.
Once the build finally succeeded, I ran a first test from RStudio: computing Spark Pi with sparkR. It passed, which was very satisfying.
Questions and discussion are welcome. Email: sparkexpert@sina.com
Below is the output produced in SparkR.
library(SparkR)
[SparkR] Initializing with classpath /home/ndscbigdata/R/x86_64-pc-linux-gnu-library/3.2/SparkR/sparkr-assembly-0.1.jar
>
> #args <- commandArgs(trailing = TRUE)
>
> #if (length(args) < 1) {
> # print("Usage: pi <master> [<slices>]")
> # q("no")
> #}
>
> #sc <- sparkR.init(master="spark://ubuntu-bigdata-5:7077","PiR");
> sc <- sparkR.init(master="local", "PiR")
Launching java with command /usr/lib/jvm/java-8-oracle/bin/java -Xmx512m -cp '/home/ndscbigdata/R/x86_64-pc-linux-gnu-library/3.2/SparkR/sparkr-assembly-0.1.jar:' edu.berkeley.cs.amplab.sparkr.SparkRBackend /tmp/RtmpGq7K9F/backend_port4ae6710a585b
15/10/09 09:31:27 INFO Slf4jLogger: Slf4jLogger started
>
> slices <- ifelse(length(args) > 1, as.integer(args[[2]]), 2)
>
> n <- 100000 * slices
>
> piFunc <- function(elem) {
+ rands <- runif(n = 2, min = -1, max = 1)
+ val <- ifelse((rands[1]^2 + rands[2]^2) < 1, 1.0, 0.0)
+ val
+ }
>
>
> piFuncVec <- function(elems) {
+ message(length(elems))
+ rands1 <- runif(n = length(elems), min = -1, max = 1)
+ rands2 <- runif(n = length(elems), min = -1, max = 1)
+ val <- ifelse((rands1^2 + rands2^2) < 1, 1.0, 0.0)
+ sum(val)
+ }
>
> rdd <- parallelize(sc, 1:n, slices)
> count <- reduce(lapplyPartition(rdd, piFuncVec), sum)
15/10/09 09:31:28 WARN TaskSetManager: Stage 0 contains a task of very large size (391 KB). The maximum recommended task size is 100 KB.
100000
15/10/09 09:31:29 INFO RRDD: Times: boot = 0.768 s, init = 0.003 s, broadcast = 0.000 s, read-input = 0.001 s, compute = 0.066 s, write-output = 0.000 s, total = 0.838 s
100000
15/10/09 09:31:29 INFO RRDD: Times: boot = 0.004 s, init = 0.002 s, broadcast = 0.000 s, read-input = 0.001 s, compute = 0.062 s, write-output = 0.000 s, total = 0.069 s
> cat("Pi is roughly", 4.0 * count / n, "\n")
Pi is roughly 3.14792
> cat("Num elements in RDD ", count(rdd), "\n")
15/10/09 09:31:29 WARN TaskSetManager: Stage 1 contains a task of very large size (391 KB). The maximum recommended task size is 100 KB.
15/10/09 09:31:29 INFO RRDD: Times: boot = 0.005 s, init = 0.002 s, broadcast = 0.000 s, read-input = 0.001 s, compute = 0.000 s, write-output = 0.000 s, total = 0.008 s
15/10/09 09:31:29 INFO RRDD: Times: boot = 0.004 s, init = 0.002 s, broadcast = 0.000 s, read-input = 0.001 s, compute = 0.000 s, write-output = 0.000 s, total = 0.007 s
Num elements in RDD 200000
15/10/09 09:31:29 WARN TaskSetManager: Stage 2 contains a task of very large size (391 KB). The maximum recommended task size is 100 KB.
15/10/09 09:31:29 INFO RRDD: Times: boot = 0.004 s, init = 0.002 s, broadcast = 0.000 s, read-input = 0.001 s, compute = 0.000 s, write-output = 0.000 s, total = 0.007 s
15/10/09 09:31:29 INFO RRDD: Times: boot = 0.006 s, init = 0.002 s, broadcast = 0.000 s, read-input = 0.001 s, compute = 0.000 s, write-output = 0.000 s, total = 0.009 s
15/10/09 09:31:29 WARN TaskSetManager: Stage 3 contains a task of very large size (391 KB). The maximum recommended task size is 100 KB.
15/10/09 09:31:29 INFO RRDD: Times: boot = 0.004 s, init = 0.001 s, broadcast = 0.000 s, read-input = 0.001 s, compute = 0.000 s, write-output = 0.000 s, total = 0.006 s
15/10/09 09:31:29 INFO RRDD: Times: boot = 0.004 s, init = 0.002 s, broadcast = 0.000 s, read-input = 0.001 s, compute = 0.000 s, write-output = 0.001 s, total = 0.008 s
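The vectorized estimator above (piFuncVec) is ordinary dart-throwing Monte Carlo: sample points uniformly in the square [-1, 1]^2, count how many fall inside the unit circle, and multiply the fraction by 4. As a sanity check outside Spark, here is a minimal plain-Python sketch of the same logic (the function names are my own, chosen for illustration; only the standard library is used):

```python
import random

def pi_func_vec(n):
    """Count how many of n uniform random points in [-1, 1]^2 land
    inside the unit circle -- the same test as piFuncVec above."""
    hits = 0
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:
            hits += 1
    return hits

def estimate_pi(n=200_000, seed=42):
    """Circle area / square area = pi/4, so pi is roughly 4 * hits / n."""
    random.seed(seed)
    return 4.0 * pi_func_vec(n) / n

print(estimate_pi())
```

In the SparkR run, parallelize splits 1:n into slices partitions and lapplyPartition applies the vectorized counting function once per partition; reduce(..., sum) then totals the per-partition hit counts. Because the estimate is statistical, the printed value (3.14792 for n = 200000) is close to, but not exactly, pi.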