Steps for Setting Up Hadoop on GHC
2014-02-26 07:24
After spending some time on it, I finally got the Hadoop setup running on GHC, so I would like to share the steps with those who are still struggling with it.
1. Log in: ssh andrew_id@ghc09.ghc.andrew.cmu.edu
2. Set up your .bashrc:
$ ls -a
$ vim .bashrc (then copy the settings from http://curtis.ml.cmu.edu/w/courses/index.php/Hadoop_cluster_information)
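The wiki page above has the exact values; as a rough sketch, the entries are environment exports along these lines (the paths below are assumptions, use the ones from the course page):

```shell
# Hypothetical sketch of the Hadoop-related .bashrc entries;
# take the real values from the course wiki page linked above.
export JAVA_HOME=/usr/lib/jvm/default-java      # assumed path
export HADOOP_HOME=/usr/local/hadoop            # matches the streaming jar path used below
export PATH=$PATH:$HADOOP_HOME/bin
```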
3. Enter bash and run the job:
$ bash
$ hadoop fs -copyFromLocal nb.jar /user/andrew_id (This copies the file from your local disk into HDFS; make sure you have uploaded the jar to the cluster first, e.g. with scp.)
$ export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:./nb.jar
$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.0.1.jar -input RCV1.small_test.txt -file nb.jar -output output -mapper "/usr/bin/java -cp ./lib/nb.jar NBTrainMapper" -reducer "/usr/bin/java -cp ./lib/nb.jar NBTrainReducer" (Here the input file is either your small test file or the full dataset; the output folder must not already exist, just like on AWS.)
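Before submitting to the cluster, you can sanity-check your mapper/reducer logic locally, since Hadoop Streaming just feeds input lines to the mapper on stdin, sorts the mapper's key/value output, and pipes it to the reducer. The awk one-liners below are hypothetical word-count stand-ins for NBTrainMapper/NBTrainReducer, not the actual classes:

```shell
# Simulate the streaming map -> shuffle(sort) -> reduce pipeline locally.
# Mapper: emit "word<TAB>1" per token; Reducer: sum counts per key.
printf 'a b a\nb c\n' \
  | awk '{for (i = 1; i <= NF; i++) print $i "\t" 1}' \
  | sort \
  | awk -F'\t' '{count[$1] += $2} END {for (k in count) print k "\t" count[k]}' \
  | sort
```

In a real run you would replace the two awk commands with your java -cp invocations; the sort in the middle plays the role of Hadoop's shuffle phase.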
Then you can watch the job run. You can also browse everything in the HDFS web console at http://ghc03.ghc.andrew.cmu.edu:50075/browseDirectory.jsp?dir=/user&namenodeInfoPort=50070; find your user name there, and after copying a file to HDFS you will see it listed.
Hope it helps.