
Running the Hadoop version of CF in Mahout 0.6

2013-03-28 14:14
1 Prepare the dataset: this uses the MovieLens dataset.

Download: http://www.grouplens.org/node/73 (the 1M dataset is used here).

After downloading, a small program is still needed to convert the "::"-delimited format to CSV before the file can serve as Hadoop input.

package com.dataset.format.convert;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.Date;

public class MovieLensToCSV {

    // 1m dataset (used in this article)
    private static final String srcPath = "/home/hadoop/下载/DataSets/MovieLens/ml-1m/ml-1m/ratings.dat";
    private static final String desPath = "/home/hadoop/下载/DataSets/MovieLens/ratings_1m_csv.dat";

    // 10m dataset
    //private static final String srcPath = "/home/hadoop/下载/DataSets/MovieLens/ml-10M100K/ratings.dat";
    //private static final String desPath = "/home/hadoop/下载/DataSets/MovieLens/ratings_10m_csv.dat";

    public static void main(String[] args) {
        System.out.println("Start convert " + new Date());
        try {
            File outFile = new File(desPath);
            if (!outFile.exists())
                outFile.createNewFile();
            FileWriter writer = new FileWriter(outFile, true);
            BufferedReader reader = new BufferedReader(new FileReader(new File(srcPath)));
            String line = null;
            int num = 0;
            StringBuilder buffer = new StringBuilder();
            while ((line = reader.readLine()) != null) {
                // Drop the trailing "::Timestamp" field
                int index = line.lastIndexOf("::");
                line = line.substring(0, index);
                // replaceAll returns a new string, so the result must be assigned back
                line = line.replaceAll(":+", ",");
                buffer.append(line).append("\n");
                num++;
                // Flush to disk every 100 lines
                if (num == 100) {
                    writer.write(buffer.toString());
                    writer.flush();
                    num = 0;
                    buffer = new StringBuilder();
                }
            }
            // Write out any remaining buffered lines
            if (num != 0) {
                writer.write(buffer.toString());
                writer.flush();
            }

            reader.close();
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("End convert " + new Date());
    }
}
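To see what the converter does to a single ratings.dat line, here is a minimal self-contained sketch of the same transformation (the class name RatingLineDemo is illustrative; the sample line follows MovieLens' UserID::MovieID::Rating::Timestamp format):

```java
public class RatingLineDemo {

    // Same logic as in MovieLensToCSV: drop the timestamp, turn "::" into ","
    static String toCsv(String line) {
        int index = line.lastIndexOf("::");   // position of the final "::Timestamp"
        line = line.substring(0, index);      // drop the timestamp field
        return line.replaceAll(":+", ",");    // "::" delimiters become commas
    }

    public static void main(String[] args) {
        // A sample ratings.dat line: UserID::MovieID::Rating::Timestamp
        System.out.println(toCsv("1::1193::5::978300760")); // prints 1,1193,5
    }
}
```

The result is the three-column userID,itemID,rating CSV that Mahout's Hadoop RecommenderJob expects as input.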


2 Upload the data to HDFS
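The upload can be done with the HDFS shell; a sketch, assuming the converted file sits at the local path produced by the converter above and that /input/movieLen/1m is the input directory used in step 3 (this requires a running Hadoop cluster):

```shell
# Create the input directory on HDFS (must match the -i argument in step 3)
hadoop dfs -mkdir /input/movieLen/1m

# Upload the converted CSV ratings file
hadoop dfs -put /home/hadoop/下载/DataSets/MovieLens/ratings_1m_csv.dat /input/movieLen/1m
```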

3 Run the recommender:

hadoop jar $MAHOUT_HOME/mahout-core-0.6-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -i /input/movieLen/1m -o /output/movieLen/1m -s org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity

4 View the results

hadoop dfs -cat /output/movieLen/1m/part-r-00000

To save the results to a local file instead: hadoop dfs -cat /output/movieLen/1m/part-r-00000 > /home/hadoop/1.txt

Each output line maps a user ID to a list of recommended itemID:preference pairs.