RHadoop : Reading CSV using rhdfs
2014-03-12 18:49
[RHadoop:#104] How to read hdfs file into data frame
http://grokbase.com/t/gg/rhadoop/125qyh30m3/104-how-to-read-hdfs-file-into-data-frame
Here is a small code snippet showing how to read CSV data from HDFS using rhdfs (RHadoop).
rhdfs uses rJava, so the buffer size is limited by the JVM heap size. By default, rhdfs sets the buffer size to 5 MB. The source code for rhdfs can be found here.
The HADOOP_CMD environment variable should point to the hadoop executable.
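Since the buffer lives in the JVM heap, a larger buffer may need a larger heap. The sketch below relies on rJava's documented behaviour of reading the java.parameters option when the JVM starts. It assumes rhdfs starts the JVM through rJava without overriding that option, and the 1 GB value is only an illustrative choice, not a recommendation.

```r
# Must run before rJava (and therefore rhdfs) starts the JVM;
# once the JVM is up, the heap limit cannot be changed.
# "-Xmx1g" (1 GB maximum heap) is an illustrative value only.
options(java.parameters = "-Xmx1g")

# Confirm the option is set for this R session.
getOption("java.parameters")
```

Setting the option after library(rhdfs) and hdfs.init() have already run has no effect on the running JVM, so it belongs at the very top of the script.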
Sys.setenv(HADOOP_CMD="/bin/hadoop")
library(rhdfs)
hdfs.init()

## Open the file for reading with a 100 MB buffer (104857600 bytes)
f = hdfs.file("fulldata.csv","r",buffersize=104857600)
m = hdfs.read(f)
c = rawToChar(m)
data = read.table(textConnection(c), sep = ",")

## Alternatively, you can use hdfs.line.reader()
reader = hdfs.line.reader("fulldata.csv")
x = reader$read()
typeof(x)
## [1] "character"
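The conversion step (raw bytes to character to data frame) can be tried without a Hadoop cluster. In this minimal sketch a hand-made raw vector stands in for the result of hdfs.read(); the CSV content is invented for illustration. It also uses the text argument of read.table instead of textConnection(), which is an alternative way to parse an in-memory string.

```r
# Stand-in for the raw vector returned by hdfs.read(f);
# the CSV rows below are made up for illustration.
m <- charToRaw("1,alice,3.5\n2,bob,4.25\n")

# Decode the bytes into a single character string.
txt <- rawToChar(m)

# Parse the string as comma-separated data with no header row.
data <- read.table(text = txt, sep = ",", header = FALSE,
                   stringsAsFactors = FALSE)

# Inspect the column types that read.table inferred.
str(data)
```

Note that rawToChar() fails on embedded NUL bytes, so this approach assumes the file is plain text.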