
Mahout Recommenders 8: Evaluating Precision and Recall with Boolean Data

2014-08-04 12:48
Let's go straight to the code:

package mahout;

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class IRSBoolean {

    public static void main(String[] args) throws Exception {
        // Boolean DataModel: the preference values are dropped
        DataModel dataModel = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(new FileDataModel(
                        new File("data/ua.base"))));
        // Evaluator
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        // Recommender builder; it should build the same recommender that is used in practice
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

            @Override
            public Recommender buildRecommender(DataModel model)
                    throws TasteException {
                // User similarity: log-likelihood instead of Pearson
                UserSimilarity userSimilarity = new LogLikelihoodSimilarity(
                        model);
                // User neighborhood: the 10 nearest users
                UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(
                        10, userSimilarity, model);
                return new GenericUserBasedRecommender(model, userNeighborhood,
                        userSimilarity);
                //return new GenericBooleanPrefUserBasedRecommender(model,userNeighborhood,userSimilarity);
            }
        };
        // DataModel builder: the evaluator's training data must also be Boolean
        DataModelBuilder modelBuilder = new DataModelBuilder() {

            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> map) {
                return new GenericBooleanPrefDataModel(
                        GenericBooleanPrefDataModel.toDataMap(map));
            }
        };
        // Evaluate: no rescorer, precision/recall at 10,
        // automatically chosen relevance threshold, 100% of the users
        IRStatistics stats = evaluator.evaluate(recommenderBuilder,
                modelBuilder, dataModel, null, 10,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

        System.out.println("Precision: " + stats.getPrecision());
        System.out.println("Recall: " + stats.getRecall());
    }
}
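The builder above uses LogLikelihoodSimilarity because Boolean data has no preference values for Pearson correlation to work with. To give a feel for what a log-likelihood similarity measures, here is a minimal plain-Java sketch. It is not Mahout's code: the class and method names are hypothetical, returning 0.0 for users with no common items is a choice made for this sketch, and the mapping 1 - 1/(1 + LLR) onto [0, 1) is my understanding of the usual convention, so treat it as an assumption.

```java
class LogLikelihoodSketch {

    // x * ln(x), with the usual convention that 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a list of counts: N ln N - sum(k ln k).
    static double entropy(long... counts) {
        long total = 0;
        double sumXLogX = 0.0;
        for (long count : counts) {
            total += count;
            sumXLogX += xLogX(count);
        }
        return xLogX(total) - sumXLogX;
    }

    // Log-likelihood ratio for a 2x2 contingency table:
    // k11 = items both users have, k12 = only user A, k21 = only user B, k22 = neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Similarity in [0, 1): 0 means the overlap is what chance alone would produce.
    static double similarity(int itemsA, int itemsB, int itemsBoth, int totalItems) {
        if (itemsBoth == 0) {
            return 0.0; // sketch's choice for users with nothing in common
        }
        double llr = logLikelihoodRatio(
                itemsBoth,
                itemsA - itemsBoth,
                itemsB - itemsBoth,
                totalItems - itemsA - itemsB + itemsBoth);
        return 1.0 - 1.0 / (1.0 + llr);
    }

    public static void main(String[] args) {
        // Made-up counts: two users with 10 items each, out of 100 items overall.
        System.out.println("overlap of 5: " + similarity(10, 10, 5, 100));
        System.out.println("overlap of 1: " + similarity(10, 10, 1, 100));
    }
}
```

With these made-up counts, an overlap of 1 item is exactly what two independent users would be expected to share (10 * 10 / 100), so its score is close to 0, while an overlap of 5 scores much higher. Only which items each user has matters; no rating value appears anywhere.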


Resulting precision and recall

Output (the run prints many log lines, most of them omitted here):

....................
14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 942 in 31ms
14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.2549125168236878 / 0.2549125168236878 / 0.004461601695666552 / 0.24390219904521424 / 1.0
14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 943 in 31ms
14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.25497311827957 / 0.25497311827957 / 0.004461238812697198 / 0.2439398499255423 / 1.0
Precision: 0.25497311827957
Recall: 0.25497311827957


The book reports about 24.7%, so my result (about 25.5%) does not quite match.
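Precision and recall are identical in the output above. A likely explanation, which I have not checked against the Mahout source and which should be read as an assumption, is that the evaluator treats 10 held-out items per user as relevant and also asks for 10 recommendations, so both ratios end up with the same denominator. The plain-Java sketch below illustrates the two definitions for a single user; the class name, method names, and item IDs are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PrecisionRecallAtN {

    // Number of recommended items that are also in the relevant set.
    static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long itemId : recommended) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    // Precision: fraction of the recommended items that are relevant.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty()
                ? 0.0
                : (double) hits(recommended, relevant) / recommended.size();
    }

    // Recall: fraction of the relevant items that were recommended.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty()
                ? 0.0
                : (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Made-up data: 10 relevant items, 10 recommendations, 3 of them hits.
        Set<Long> relevant = new HashSet<>(
                Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L));
        List<Long> recommended =
                Arrays.asList(1L, 2L, 3L, 11L, 12L, 13L, 14L, 15L, 16L, 17L);

        System.out.println("Precision: " + precision(recommended, relevant));
        System.out.println("Recall: " + recall(recommended, relevant));
    }
}
```

Here both values are 3/10. If the recommender returned fewer than 10 items, or fewer than 10 items were relevant, the two numbers would diverge.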

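Now switch to a different recommender:

//return new GenericBooleanPrefUserBasedRecommender(model,userNeighborhood,userSimilarity);

Uncomment this line so that it replaces the GenericUserBasedRecommender return, and see what happens: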

.................................
14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 942 in 31ms
14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.17321668909825047 / 0.17321668909825047 / 0.004950798268872743 / 0.1803236393639469 / 1.0
14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 943 in 32ms
14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.1731182795698926 / 0.1731182795698926 / 0.004951387547485665 / 0.1801745904157921 / 1.0
Precision: 0.1731182795698926
Recall: 0.1731182795698926


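The book reports 22.9% here, so this time my result (about 17.3%) is clearly lower, whereas the first one was slightly higher than the book's figure. Why the difference? Perhaps the data set has changed since the book was written.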

Other DataModel implementations have Boolean variants too, such as MySQLBooleanPrefDataModel.
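Whatever the backing store, a Boolean model keeps only which user is associated with which item and ignores the rating. The sketch below shows that idea in plain Java, without Mahout: it reads lines of the form userID, itemID, rating, timestamp (tab or comma separated) and keeps just the user-to-items sets. The class name, method name, and sample lines are hypothetical, and this is not how Mahout implements its Boolean models.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BooleanPrefsSketch {

    // Builds userID -> set of itemIDs, discarding the rating and timestamp fields.
    static Map<Long, Set<Long>> toBooleanData(List<String> lines) {
        Map<Long, Set<Long>> itemsByUser = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("[\t,]");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            itemsByUser.computeIfAbsent(userId, id -> new HashSet<>()).add(itemId);
        }
        return itemsByUser;
    }

    public static void main(String[] args) {
        // Made-up sample lines: userID, itemID, rating, timestamp.
        List<String> lines = Arrays.asList(
                "1\t10\t5\t874965758",
                "1\t20\t3\t876893171",
                "2\t10\t1\t878542960");
        System.out.println(toBooleanData(lines));
    }
}
```

After the conversion, a 5-star and a 1-star rating look exactly the same: each is just an association between a user and an item.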