Item-Based Recommendations with Hadoop
2015-11-12 15:58
Mahout implements item-based collaborative filtering on top of MapReduce; here I try running it.
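To make the idea concrete before running anything, here is a minimal in-memory sketch of item-based collaborative filtering. It is not Mahout's MapReduce implementation. It assumes the ratings fit in a dict mapping user → {item: rating}, uses plain co-occurrence counts (how many users rated both items) as the item similarity, and scores every item the user has not rated as a similarity-weighted average of that user's own ratings. The function names `cooccurrence` and `recommend` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(prefs):
    """Count, for every ordered pair of items, how many users rated both."""
    counts = defaultdict(int)
    for items in prefs.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(prefs, user, n=25):
    """Return the top-n (item, score) pairs for `user`, skipping items already rated."""
    sims = cooccurrence(prefs)
    rated = prefs[user]
    candidates = {i for items in prefs.values() for i in items} - set(rated)
    scores = []
    for candidate in candidates:
        num = den = 0.0
        for item, rating in rated.items():
            s = sims.get((candidate, item), 0)
            num += s * rating
            den += s
        if den > 0:  # only score items that co-occur with something the user rated
            scores.append((candidate, num / den))
    # Highest score first; ties broken by item ID so the order is deterministic.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:n]


if __name__ == "__main__":
    prefs = {
        1: {101: 5.0, 102: 3.0},
        2: {101: 4.0, 102: 2.0, 103: 5.0},
        3: {102: 4.0, 103: 3.0, 104: 1.0},
    }
    print(recommend(prefs, user=1))  # prints [(103, 3.6666666666666665), (104, 3.0)]
```

The weighted average is just one common way to turn item similarities into a score; the Mahout job lets you swap the similarity measure with `-s` and does the pairwise counting as distributed MapReduce steps instead of in one dict.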
1. Install Hadoop
2. Download Mahout and unpack it
3. Prepare the data
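Download the 1 Million MovieLens Dataset and unpack it to get ratings.dat, then convert it into the required format with:
sed 's/::\([0-9]\{1,\}\)::\([0-9]\{1\}\)::[0-9]\{1,\}$/,\1,\2/' ratings.dat
The command turns each UserID::MovieID::Rating::Timestamp line into UserID,MovieID,Rating (the timestamp is dropped) and prints the result to standard output, so redirect it to a file and use that file as the job input.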
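If sed is not at hand, or you want to double-check the conversion, the same transformation can be sketched in Python. This assumes well-formed UserID::MovieID::Rating::Timestamp lines with a one-digit rating, mirroring the sed pattern. One difference: sed passes non-matching lines through unchanged, while this sketch drops them. The names `convert_line` and `convert_file` are hypothetical.

```python
import re

# Same pattern as the sed command: keep UserID, MovieID and the one-digit
# rating, drop the trailing timestamp.
RATING_LINE = re.compile(r"^(\d+)::(\d+)::(\d)::\d+$")


def convert_line(line):
    """Turn 'UserID::MovieID::Rating::Timestamp' into 'UserID,MovieID,Rating'.

    Returns None for lines that do not match the expected format.
    """
    match = RATING_LINE.match(line.strip())
    if match is None:
        return None
    return ",".join(match.groups())


def convert_file(src, dst):
    """Write the converted lines of `src` (e.g. ratings.dat) to `dst`."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            converted = convert_line(line)
            if converted is not None:
                fout.write(converted + "\n")


if __name__ == "__main__":
    print(convert_line("1::1193::5::978300760"))  # prints 1,1193,5
```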
4. Run
mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i /path/to/input/file -o /path/to/desired/output -n 25
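The `-s SIMILARITY_LOGLIKELIHOOD` option compares two items with a log-likelihood ratio (a G-test) on a 2×2 table of user counts: users who rated both items, only one of them, or neither. The sketch below is modeled on my understanding of Mahout's `LogLikelihood` helper and of how its distributed log-likelihood similarity maps the ratio into [0, 1) as 1 - 1/(1 + LLR); treat that mapping as an assumption and check the Mahout source if you need exact agreement. The function names are hypothetical.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a list of counts: N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G-test statistic for the 2x2 contingency table [[k11, k12], [k21, k22]]."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row_entropy + col_entropy - mat_entropy))


def loglikelihood_similarity(n_a, n_b, n_ab, n_users):
    """Similarity of items A and B from user counts, mapped into [0, 1).

    n_a / n_b: users who rated A / B; n_ab: users who rated both;
    n_users: total number of users.
    """
    k11 = n_ab                        # rated both
    k12 = n_a - n_ab                  # rated A only
    k21 = n_b - n_ab                  # rated B only
    k22 = n_users - n_a - n_b + n_ab  # rated neither
    llr = log_likelihood_ratio(k11, k12, k21, k22)
    return 1.0 - 1.0 / (1.0 + llr)


if __name__ == "__main__":
    # Independent items: 50 of 100 users rated each, 25 rated both -> close to 0.
    print(loglikelihood_similarity(50, 50, 25, 100))
    # Always rated together: 10 users rated both, nobody rated just one -> close to 1.
    print(loglikelihood_similarity(10, 10, 10, 100))
```

In this sketch only the counts matter, not the rating values, so two items look similar when they are rated by the same users more often than chance would predict.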
Parameters:
MAHOUT-JOB: /home/laxe/apple/mahout/mahout-examples-0.11.0-job.jar
Job-Specific Options:
  --input (-i) input    Path to job input directory.
  --output (-o) output    The directory pathname for output.
  --numRecommendations (-n) numRecommendations    Number of recommendations per user.
  --usersFile usersFile    File of users to recommend for.
  --itemsFile itemsFile    File of items to recommend for.
  --filterFile (-f) filterFile    File containing comma-separated userID,itemID pairs. Used to exclude the item from the recommendations for that user (optional).
  --userItemFile (-uif) userItemFile    File containing comma-separated userID,itemID pairs (optional). Used to include only these items into recommendations. Cannot be used together with usersFile or itemsFile.
  --booleanData (-b) booleanData    Treat input as without pref values.
  --maxPrefsPerUser (-mxp) maxPrefsPerUser    Maximum number of preferences considered per user in final recommendation phase.
  --minPrefsPerUser (-mp) minPrefsPerUser    Ignore users with less preferences than this in the similarity computation (default: 1).
  --maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem    Maximum number of similarities considered per item.
  --maxPrefsInItemSimilarity (-mpiis) maxPrefsInItemSimilarity    Max number of preferences to consider per user or item in the item similarity computation phase, users or items with more preferences will be sampled down (default: 500).
  --similarityClassname (-s) similarityClassname    Name of distributed similarity measures class to instantiate, alternatively use one of the predefined similarities ([SIMILARITY_COOCCURRENCE, SIMILARITY_LOGLIKELIHOOD, SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK, SIMILARITY_COSINE, SIMILARITY_PEARSON_CORRELATION, SIMILARITY_EUCLIDEAN_DISTANCE])
  --threshold (-tr) threshold    Discard item pairs with a similarity value below this.
  --outputPathForSimilarityMatrix (-opfsm) outputPathForSimilarityMatrix    Write the item similarity matrix to this path (optional).
  --randomSeed randomSeed    Use this seed for sampling.
  --sequencefileOutput    Write the output into a Sequence File instead of a text file.
  --help (-h)    Print out help.
  --tempDir tempDir    Intermediate output directory.
  --startPhase startPhase    First phase to run.
  --endPhase endPhase    Last phase to run.
Specify HDFS directories while running on hadoop; else specify local file system directories.
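Without --sequencefileOutput the recommendations are written as text files in the output directory. Assuming each line has the form userID<TAB>[itemID:score,itemID:score,...] (an assumption, so check it against your own output files), a small Python reader for a file copied out of HDFS could look like this. `parse_line` and `read_recommendations` are hypothetical names.

```python
def parse_line(line):
    """Parse 'userID<TAB>[item:score,item:score,...]' into (user, [(item, score), ...])."""
    user, _, recs = line.rstrip("\r\n").partition("\t")
    recs = recs.strip()
    if recs.startswith("[") and recs.endswith("]"):
        recs = recs[1:-1]  # strip the surrounding brackets
    pairs = []
    for entry in recs.split(","):
        if not entry:
            continue  # an empty list "[]" leaves one empty entry
        item, _, score = entry.partition(":")
        pairs.append((int(item), float(score)))
    return int(user), pairs


def read_recommendations(path):
    """Read one text output file into a dict {user: [(item, score), ...]}."""
    result = {}
    with open(path) as f:
        for line in f:
            if line.strip():
                user, pairs = parse_line(line)
                result[user] = pairs
    return result


if __name__ == "__main__":
    print(parse_line("3\t[103:3.5,102:3.25]\n"))  # prints (3, [(103, 3.5), (102, 3.25)])
```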
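Related articles
- Remote Desktop Connection Manager
- Installing and uninstalling software packages on Linux
- Installing vsftpd (FTP) on CentOS
- sudo and samba file example
- A roundup of good learning websites
- The Linux awk command (personal examples)
- The simplest loopback camera & microphone example in WebRTC
- What a Java EE architect needs to know
- The Linux awk command explained
- The open-source wavplay player on Linux
- Installing Python+Django+uwsgi+nginx from source on CentOS (6.6/7.1), up to deploying the nginx environment (Part 1)
- Tomcat access log configuration
- OpenNI2 and OpenCV2: getting 3D point cloud data and displaying it with OpenGL
- Fixing incompatibility problems when installing MySQL on Linux
- XAMPP installation notes: Apache
- Chapter 1: Setting up and installing SolrCloud 4.9 + ZooKeeper on CentOS
- spark-submit [options]
- Compiling live555 on Linux
- Linux command: scp
- What to do if you forget your MySQL password (Windows, Linux)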