Steps for Setting Up Hadoop on GHC
2014-02-26 07:24
After spending some time on it, I finally got the Hadoop setup running on GHC, so I would like to share the steps with those who are still struggling with it.
1. Log in: ssh andrew_id@ghc09.ghc.andrew.cmu.edu
2. Set up your .bashrc:
$ ls -a
$ vim .bashrc (then copy the settings from http://curtis.ml.cmu.edu/w/courses/index.php/Hadoop_cluster_information)
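The wiki page above has the exact values; as a rough sketch, the entries are environment exports along these lines (the paths below are assumptions, use the ones from the course page):

```shell
# Hypothetical sketch of the Hadoop-related .bashrc entries;
# take the real values from the course wiki page linked above.
export JAVA_HOME=/usr/lib/jvm/default-java      # assumed path
export HADOOP_HOME=/usr/local/hadoop            # matches the streaming jar path used below
export PATH=$PATH:$HADOOP_HOME/bin
```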
3. Enter bash and run the job:
$ bash
$ hadoop fs -copyFromLocal nb.jar /user/andrew_id (This copies the file from your local disk into HDFS; make sure you have uploaded the jar to the cluster first, e.g. with scp.)
$ export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:./nb.jar
$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.0.1.jar -input RCV1.small_test.txt -file nb.jar -output output -mapper "/usr/bin/java -cp ./lib/nb.jar NBTrainMapper" -reducer "/usr/bin/java -cp ./lib/nb.jar NBTrainReducer" (Here the input file is either your small test file or the full dataset; the output folder must not already exist, just like on AWS.)
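Before submitting to the cluster, you can sanity-check your mapper/reducer logic locally, since Hadoop Streaming just feeds input lines to the mapper on stdin, sorts the mapper's key/value output, and pipes it to the reducer. The awk one-liners below are hypothetical word-count stand-ins for NBTrainMapper/NBTrainReducer, not the actual classes:

```shell
# Simulate the streaming map -> shuffle(sort) -> reduce pipeline locally.
# Mapper: emit "word<TAB>1" per token; Reducer: sum counts per key.
printf 'a b a\nb c\n' \
  | awk '{for (i = 1; i <= NF; i++) print $i "\t" 1}' \
  | sort \
  | awk -F'\t' '{count[$1] += $2} END {for (k in count) print k "\t" count[k]}' \
  | sort
```

In a real run you would replace the two awk commands with your java -cp invocations; the sort in the middle plays the role of Hadoop's shuffle phase.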
Then you can watch the job run. You can also browse everything in the HDFS web console at http://ghc03.ghc.andrew.cmu.edu:50075/browseDirectory.jsp?dir=/user&namenodeInfoPort=50070; find your user name there, and after copying a file to HDFS you will see it listed.
Hope it helps.