Mahout Recommendation 8: Evaluating Precision and Recall with Boolean Data
2014-08-04 12:48
Let's go straight to the code:
    package mahout;

    import java.io.File;

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.DataModelBuilder;
    import org.apache.mahout.cf.taste.eval.IRStatistics;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
    import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
    import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
    import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.model.PreferenceArray;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class IRSBoolean {

        public static void main(String[] args) throws Exception {
            // DataModel without preference values
            DataModel dataModel = new GenericBooleanPrefDataModel(
                    GenericBooleanPrefDataModel.toDataMap(
                            new FileDataModel(new File("data/ua.base"))));

            // Evaluator
            RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

            // Recommender builder; it must build the same recommender that is used in practice
            RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
                @Override
                public Recommender buildRecommender(DataModel model) throws TasteException {
                    // User similarity: log-likelihood instead of Pearson
                    UserSimilarity userSimilarity = new LogLikelihoodSimilarity(model);
                    // User neighborhood: the 10 nearest users
                    UserNeighborhood userNeighborhood =
                            new NearestNUserNeighborhood(10, userSimilarity, model);
                    return new GenericUserBasedRecommender(model, userNeighborhood, userSimilarity);
                    //return new GenericBooleanPrefUserBasedRecommender(model,userNeighborhood,userSimilarity);
                }
            };

            // Data model builder
            DataModelBuilder modelBuilder = new DataModelBuilder() {
                @Override
                public DataModel buildDataModel(FastByIDMap<PreferenceArray> map) {
                    return new GenericBooleanPrefDataModel(
                            GenericBooleanPrefDataModel.toDataMap(map));
                }
            };

            // Evaluate precision and recall at 10
            IRStatistics stats = evaluator.evaluate(recommenderBuilder, modelBuilder,
                    dataModel, null, 10,
                    GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
            System.out.println("Precision: " + stats.getPrecision());
            System.out.println("Recall: " + stats.getRecall());
        }
    }
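The first step above throws away the rating values and keeps only which user is associated with which item. As a rough illustration of that idea (plain Java, not Mahout's own implementation), the sketch below assumes the input is tab-separated `userID, itemID, rating, timestamp` rows; the class name `BooleanPrefSketch`, the helper `toBooleanData`, and the sample rows are all made up for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BooleanPrefSketch {

    // Turns "userID<TAB>itemID<TAB>rating<TAB>timestamp" lines into
    // userID -> set of item IDs, discarding the rating and the timestamp.
    // Assumption: every line has at least two numeric, tab-separated fields.
    static Map<Long, Set<Long>> toBooleanData(List<String> lines) {
        Map<Long, Set<Long>> data = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            long userId = Long.parseLong(fields[0]);
            long itemId = Long.parseLong(fields[1]);
            data.computeIfAbsent(userId, k -> new HashSet<>()).add(itemId);
        }
        return data;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<String> lines = Arrays.asList(
                "1\t10\t5\t874965758",
                "1\t20\t3\t876893171",
                "2\t10\t4\t888550871");
        System.out.println(toBooleanData(lines));
    }
}
```

After this conversion a rating of 5 and a rating of 3 look the same: both are simply "the user has this item", which is the situation the Boolean data model represents.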
The resulting precision and recall
Output (there is a lot of log output; only the tail is shown):
    ...
    14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 942 in 31ms
    14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.2549125168236878 / 0.2549125168236878 / 0.004461601695666552 / 0.24390219904521424 / 1.0
    14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 943 in 31ms
    14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.25497311827957 / 0.25497311827957 / 0.004461238812697198 / 0.2439398499255423 / 1.0
    Precision: 0.25497311827957
    Recall: 0.25497311827957
The book reports about 24.7%, so my result of about 25.5% is slightly different.
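Precision and recall are identical in the output above. By definition the two ratios share a numerator (the number of recommended items that are relevant) and differ only in the denominator, so they coincide whenever the number of recommendations equals the number of held-out relevant items. Whether that is exactly what happens inside the evaluator here is my assumption, not something I checked in the Mahout source. A minimal sketch of the two ratios for a single user, in plain Java with made-up item IDs (`PrecisionRecallSketch` and its methods are hypothetical names, not Mahout API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PrecisionRecallSketch {

    // Number of recommended items that are also in the relevant set.
    // Assumes the recommendation list contains no duplicates.
    static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long itemId : recommended) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    // Precision: fraction of the recommended items that are relevant.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? 0.0
                : (double) hits(recommended, relevant) / recommended.size();
    }

    // Recall: fraction of the relevant items that were recommended.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? 0.0
                : (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Hypothetical data: 4 held-out relevant items and 4 recommendations.
        Set<Long> relevant = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        List<Long> recommended = Arrays.asList(2L, 9L, 4L, 7L);
        System.out.println("precision = " + precision(recommended, relevant));
        System.out.println("recall    = " + recall(recommended, relevant));
    }
}
```

With equally sized lists the two numbers match; shrink the relevant set to two items and they diverge, because only the recall denominator changes.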
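Now switch to a different recommender:
//return new GenericBooleanPrefUserBasedRecommender(model,userNeighborhood,userSimilarity);
Uncomment this line (and comment out the GenericUserBasedRecommender return above it), then see what the result looks like: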
    ...
    14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 942 in 31ms
    14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.17321668909825047 / 0.17321668909825047 / 0.004950798268872743 / 0.1803236393639469 / 1.0
    14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 943 in 32ms
    14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.1731182795698926 / 0.1731182795698926 / 0.004951387547485665 / 0.1801745904157921 / 1.0
    Precision: 0.1731182795698926
    Recall: 0.1731182795698926
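The book reports 22.9%, yet I get only about 17.3%. Why is mine so much lower? Perhaps the data set has changed.
Other DataModel implementations have similar Boolean variants, such as MySQLBooleanPrefDataModel.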