
RHadoop : Reading CSV using rhdfs

2014-03-12 18:49


[RHadoop:#104] How to read hdfs file into data frame

http://grokbase.com/t/gg/rhadoop/125qyh30m3/104-how-to-read-hdfs-file-into-data-frame



Here is a small code snippet showing how to read CSV data from HDFS using rhdfs (part of RHadoop).

rhdfs uses rJava, so the read buffer lives on the JVM heap and its size is limited by the heap size. By default, rhdfs sets the buffer size to 5 MB. The source code for rhdfs is available from the RHadoop project.
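When a file is too large for a single buffer, one alternative is to read it in fixed-size chunks and rebuild lines that straddle chunk boundaries. The base-R sketch below shows only that stitching step. `read_lines_chunked` and `make_reader` are hypothetical helpers, and `read_chunk` stands in for repeated `hdfs.read(f)` calls, assuming each call returns a raw vector and returns NULL (or an empty vector) once the file is exhausted.

```r
# Hypothetical helper: collect complete text lines from a sequence of raw
# chunks. `read_chunk` is ASSUMED to return a raw vector per call and
# NULL (or raw(0)) when there is nothing left to read.
read_lines_chunked <- function(read_chunk) {
  lines <- character(0)
  carry <- raw(0)                      # bytes of an unfinished last line
  repeat {
    chunk <- read_chunk()
    if (is.null(chunk) || length(chunk) == 0) break
    buf <- c(carry, chunk)
    nl <- which(buf == as.raw(10L))    # positions of "\n" bytes
    if (length(nl) == 0) {             # no line ends yet: keep accumulating
      carry <- buf
      next
    }
    last <- nl[length(nl)]
    complete <- rawToChar(buf[seq_len(last)])
    lines <- c(lines, strsplit(complete, "\n", fixed = TRUE)[[1]])
    carry <- buf[-seq_len(last)]       # bytes after the last newline
  }
  if (length(carry) > 0) lines <- c(lines, rawToChar(carry))  # no final "\n"
  lines
}

# In-memory stand-in for an HDFS connection, for illustration only.
make_reader <- function(chunks) {
  i <- 0L
  function() {
    i <<- i + 1L
    if (i > length(chunks)) NULL else chunks[[i]]
  }
}

# Usage: lines are split across chunk boundaries on purpose.
chunks <- list(charToRaw("id,val\n1,"), charToRaw("a\n2,b\n"), charToRaw("3,c"))
lines <- read_lines_chunked(make_reader(chunks))
df <- read.table(text = lines, sep = ",", header = TRUE)
```

Splitting on the newline byte rather than on decoded text keeps multi-byte UTF-8 characters intact, because byte 0x0A never occurs inside a UTF-8 multi-byte sequence.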

The HADOOP_CMD environment variable must point to the hadoop executable.
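Since the code sets HADOOP_CMD before calling library(rhdfs), it can help to verify the value first. The sketch below is a hypothetical base-R helper, `check_hadoop_cmd`; it only checks that the variable is set and that the path exists, not that it is a working hadoop binary.

```r
# Hypothetical helper: fail early with a clear message if HADOOP_CMD is
# unset or points to a path that does not exist. It does NOT verify that
# the file is actually a usable hadoop executable.
check_hadoop_cmd <- function() {
  cmd <- Sys.getenv("HADOOP_CMD")
  if (!nzchar(cmd)) {
    stop("HADOOP_CMD is not set; set it before calling library(rhdfs)")
  }
  if (!file.exists(cmd)) {
    stop("HADOOP_CMD points to a missing file: ", cmd)
  }
  invisible(cmd)  # return the path so callers can log it if they wish
}
```

Call it between Sys.setenv() and library(rhdfs). If hadoop is on the PATH, `Sys.setenv(HADOOP_CMD = unname(Sys.which("hadoop")))` is one way to avoid hard-coding the path.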

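# HADOOP_CMD must point to the hadoop executable; adjust the path
# to match your installation (for example, the output of `which hadoop`).
Sys.setenv(HADOOP_CMD="/bin/hadoop")

library(rhdfs)
hdfs.init()

# Open the file with a 100 MB buffer (104857600 bytes) instead of the
# 5 MB default, so that a file up to that size can be read in one call.
f = hdfs.file("fulldata.csv", "r", buffersize=104857600)
m = hdfs.read(f)      # raw vector with the bytes read from HDFS
hdfs.close(f)
txt = rawToChar(m)    # named txt rather than c, which would mask base::c()

# Parse the text as comma-separated values; add header = TRUE
# if the first row holds column names.
data = read.table(textConnection(txt), sep = ",")

## Alternatively, you can use hdfs.line.reader()

reader = hdfs.line.reader("fulldata.csv")

x = reader$read()     # a batch of lines as a character vector
typeof(x)
## [1] "character"
reader$close()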
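reader$read() returns one batch of lines per call, so reading a whole file means calling it repeatedly. The sketch below assumes the reader exposes read() and close(), and that read() returns an empty character vector at end of file. `read_csv_from_reader` and the in-memory `make_line_reader` stand-in are hypothetical names, used here to illustrate and test the loop without a Hadoop cluster.

```r
# Hypothetical helper: drain a line reader that is ASSUMED to expose
# read() (a batch of lines, empty at end of file) and close(), then parse
# the collected lines as comma-separated values.
read_csv_from_reader <- function(reader, header = TRUE) {
  on.exit(reader$close())           # release the reader even on error
  batches <- list()
  repeat {
    batch <- reader$read()
    if (length(batch) == 0) break
    batches[[length(batches) + 1L]] <- batch
  }
  lines <- unlist(batches, use.names = FALSE)
  read.table(text = lines, sep = ",", header = header,
             stringsAsFactors = FALSE)
}

# In-memory stand-in for hdfs.line.reader(), returning n lines per read().
make_line_reader <- function(all_lines, n = 2L) {
  pos <- 0L
  list(
    read = function() {
      if (pos >= length(all_lines)) return(character(0))
      idx <- (pos + 1L):min(pos + n, length(all_lines))
      pos <<- max(idx)
      all_lines[idx]
    },
    close = function() invisible(NULL)
  )
}
```

Under the same assumption, the call against HDFS would be `read_csv_from_reader(hdfs.line.reader("fulldata.csv"))`. Collecting batches in a list and calling unlist() once avoids repeatedly copying a growing vector inside the loop; read.table() signals an error if no lines were read.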