Several ways to load a file in Spark
2016-01-27 20:24
There are several ways to load a file in Spark:
1. Load a local file directly (instead of from HDFS) by using the file:// prefix: sc.textFile("file:///path/to/the/file")
e.g. sc.textFile("file:///home/spark/Desktop/README.md")
Note:
When HADOOP_CONF_DIR is set, i.e. a cluster environment is configured, a bare call like sc.textFile("path/README.md")
resolves the path against the default filesystem, so it becomes: hdfs://master:9000/user/spark/README.md
If the file is not in HDFS, Spark reports "input path does not exist".
2. An explicit hdfs:// path also works.
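The three path forms above can be sketched in spark-shell (sc is the SparkContext the shell provides; the hostname, port, and paths are the examples from this post and depend on your cluster):

```scala
// Sketch for spark-shell, assuming fs.defaultFS is hdfs://master:9000
// and the shell user is "spark".

// 1. Local file: the file:// scheme bypasses the default filesystem.
//    The file must exist at this path on the node(s) where tasks run.
val localLines = sc.textFile("file:///home/spark/Desktop/README.md")

// 2. Bare relative path: resolved against the default filesystem when
//    HADOOP_CONF_DIR is set, i.e. hdfs://master:9000/user/spark/README.md here.
val implicitHdfsLines = sc.textFile("README.md")

// 3. Explicit HDFS URI: unambiguous regardless of configuration.
val explicitHdfsLines = sc.textFile("hdfs://master:9000/user/spark/README.md")

println(localLines.count()) // number of lines in the local file
```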
Related discussion:
1. Spark Quick Start - call to open README.md needs explicit fs prefix
Good catch; the Spark cluster on EC2 is configured to use HDFS as its default filesystem, so
it can’t find this file. The quick start was written to run on a single machine with an
out-of-the-box install. If you’d like to upload this file to the HDFS cluster on EC2, use
the following command:
2.
This has been discussed on the Spark mailing list; please refer to that thread.
You should use hadoop fs -put <localsrc> ... <dst> to copy the file into HDFS:
${HADOOP_COMMON_HOME}/bin/hadoop fs -put /path/to/README.md README.md
So I ran /bin/hadoop -fs -put /home/spark/Desktop/README.md README.md
but no matter how I tried, this failed with "no such file or directory"; still investigating.
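For what it's worth, the command as typed has two likely problems: /bin/hadoop is probably not where the hadoop binary actually lives (hence "no such file or directory"), and -fs with a dash is not a valid subcommand; the subcommand is fs, as in the mailing-list example. A corrected sketch, assuming HADOOP_COMMON_HOME points at the Hadoop installation:

```shell
# Copy the local README.md into the current user's HDFS home directory
# (hdfs://<namenode>/user/<user>/README.md).
${HADOOP_COMMON_HOME}/bin/hadoop fs -put /home/spark/Desktop/README.md README.md

# Verify the upload:
${HADOOP_COMMON_HOME}/bin/hadoop fs -ls README.md
```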