
Several ways to load a file in Spark

2016-01-27 20:24

There are a few ways to load a file into Spark:

1. Load a local file directly, rather than from HDFS

sc.textFile("file:///path to the file/")

For example: sc.textFile("file:///home/spark/Desktop/README.md")

Note:

When HADOOP_CONF_DIR is set, i.e. when a cluster environment is configured, a bare call like sc.textFile("path/README.md") resolves the relative path against the default filesystem, so the path automatically becomes hdfs://master:9000/user/spark/README.md. If the file is not actually in HDFS, the job fails with "input path does not exist".
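
A minimal spark-shell sketch of both behaviors (the paths and the hdfs://master:9000 default are the ones from this post; adjust them to your setup):

// Explicit local path: read from the driver machine's local filesystem.
val local = sc.textFile("file:///home/spark/Desktop/README.md")
println(local.count())   // number of lines in the local file

// No scheme: with HADOOP_CONF_DIR set, the relative path resolves against
// the default filesystem, e.g. hdfs://master:9000/user/spark/README.md.
val fromHdfs = sc.textFile("README.md")
println(fromHdfs.count())   // fails with "input path does not exist" if the file is not in HDFS

One caveat with file:// on a real cluster: the file has to exist at the same path on every worker node, otherwise the tasks that read the missing pieces will fail.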

2. Passing an HDFS path also works
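
For example, with a fully qualified URI (the master:9000 namenode address is the one from the note above, an assumption about your cluster):

// A fully qualified HDFS URI works regardless of the default filesystem.
val rdd = sc.textFile("hdfs://master:9000/user/spark/README.md")
println(rdd.first())   // print the first line as a quick sanity check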

Related material:

1.

Spark Quick Start - call to open README.md needs explicit fs prefix

Good catch; the Spark cluster on EC2 is configured to use HDFS as its default filesystem, so it can't find this file. The quick start was written to run on a single machine with an out-of-the-box install. If you'd like to upload this file to the HDFS cluster on EC2, use the following command:
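
The command itself was lost in the copy; judging from item 2 below, it would have been a hadoop fs -put along these lines (the paths are placeholders, not the original reply's exact text):

${HADOOP_COMMON_HOME}/bin/hadoop fs -put README.md README.md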

2.

This has been discussed on the Spark mailing list; please refer to that thread.

You should use hadoop fs -put <localsrc> ... <dst> to copy the file into HDFS:

${HADOOP_COMMON_HOME}/bin/hadoop fs -put /path/to/README.md README.md
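
Once the file is in HDFS, the relative path from section 1 resolves correctly. A quick check from spark-shell (assuming the HDFS home directory is /user/spark, as above):

// The bare relative path now resolves to
// hdfs://master:9000/user/spark/README.md, so the read succeeds.
val readme = sc.textFile("README.md")
println(readme.count())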

So I ran /bin/hadoop -fs -put /home/spark/Desktop/README.md README.md, but no matter what I tried it failed with "no such file or directory", and I'm still looking into it. (Two likely culprits: /bin/hadoop is probably not where the hadoop binary actually lives, in which case the shell itself prints "no such file or directory"; and -fs should be the fs subcommand, not a flag.)
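
For reference, a corrected invocation (assuming HADOOP_COMMON_HOME is set and your HDFS home directory exists; if it does not, create it first with hadoop fs -mkdir -p /user/spark):

# fs is a subcommand, not a flag, and the hadoop binary normally lives
# under ${HADOOP_COMMON_HOME}/bin rather than /bin.
${HADOOP_COMMON_HOME}/bin/hadoop fs -put /home/spark/Desktop/README.md README.md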
Tags: hdfs spark