
RHadoop Example: WordCount Test

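2015-10-18 16:58
After installing RHadoop, the obvious next step is to run an example as a test. There are quite a few WordCount examples online, but some of them are rather vague, so here is a record of the code I downloaded and the result of running it: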

Questions and discussion are welcome: sparkexpert@sina.com

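library(rmr2)   # MapReduce interface for R
library(rhdfs)  # HDFS access from R

# Initialise the rhdfs connection
hdfs.init()

# Use the local backend so the job runs without a Hadoop cluster
rmr.options(backend = "local")

# Word count --------------------------------------------------------------

# Path to the input text file (adjust to your own machine)
ebookLocation <- "/home/ndscbigdata/wofile.txt"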

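m <- mapreduce(input = ebookLocation,
               input.format = "text",
               map = function(k, v){
                 # Split the text on whitespace and punctuation
                 words <- unlist(strsplit(v, split = "[[:space:][:punct:]]"))
                 words <- tolower(words)
                 # Strip digits, then drop the empty strings left behind
                 words <- gsub("[0-9]", "", words)
                 words <- words[words != ""]
                 # Emit one (word, count) pair per distinct word
                 wordcount <- table(words)
                 keyval(
                   key = names(wordcount),
                   val = as.numeric(wordcount)
                 )
               },
               reduce = function(k, counts){
                 # Sum the partial counts collected for each word
                 keyval(key = k,
                        val = sum(counts))
               }
)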
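To make the map and reduce steps easier to follow, here is a minimal sketch that imitates them in base R on a few in-memory lines, with no Hadoop or rmr2 involved. The names text_lines, map_line and emitted are made up for this illustration, and the input is toy data. In rmr2 the map function may receive many lines at once; here each line is handled separately only to keep the flow visible.

```r
# Toy input standing in for the text file (hypothetical data).
text_lines <- c("The quick brown fox", "the lazy dog, the end", "Fox 2 dog")

# Map step: the same cleaning as in the script, applied to a single line.
# Returns the emitted (key, val) pairs as a data frame.
map_line <- function(v) {
  words <- unlist(strsplit(v, split = "[[:space:][:punct:]]"))
  words <- tolower(words)
  words <- gsub("[0-9]", "", words)
  words <- words[words != ""]
  wordcount <- table(words)
  data.frame(key = names(wordcount),
             val = as.numeric(wordcount),
             stringsAsFactors = FALSE)
}

# Run the map step on every line and stack all emitted pairs.
emitted <- do.call(rbind, lapply(text_lines, map_line))

# Shuffle + reduce: group the values by key and sum each group.
counts <- tapply(emitted$val, emitted$key, sum)

print(sort(counts, decreasing = TRUE))
```

The word "the" is emitted once by the first line with a count of 1 and once by the second line with a count of 2; the reduce step is what adds those partial counts together.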

# Retrieve results and prepare to plot ------------------------------------

x <- from.dfs(m)

dat <- data.frame(
  word = keys(x),
  count = values(x),
  stringsAsFactors = FALSE
)

# Sort by descending frequency
dat <- dat[order(dat$count, decreasing = TRUE), ]

head(dat, 50)

# Bar chart of the 25 most frequent words
with(head(dat, 25), barplot(count, names.arg = word, las = 2))
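The sorting and plotting step can be tried on its own without running a job. The sketch below uses a plain list with key and val elements as a hypothetical stand-in for the object returned by from.dfs(); the words and counts are toy values, not real output. It uses barplot() because that function takes its bar labels through names.arg.

```r
# Hypothetical stand-in for the result of from.dfs(m): toy keys and values.
x <- list(key = c("hadoop", "the", "r", "of"),
          val = c(2, 5, 3, 4))

dat <- data.frame(word = x$key, count = x$val, stringsAsFactors = FALSE)

# Sort by descending count.
dat <- dat[order(dat$count, decreasing = TRUE), ]

# Keep the most frequent words (3 here because the toy data is tiny).
top <- head(dat, 3)

# One bar per word; las = 2 rotates the labels so long words stay readable.
barplot(top$count, names.arg = top$word, las = 2)
```

When run non-interactively with Rscript, the chart goes to the default graphics device rather than a window.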

The result, as displayed in RStudio, looks like this:
