Moving to MapReduce: Generating User Vectors
2016-02-22 10:20
The work splits into two parts:
The mapper parses each line of the Wikipedia link file and emits one user-item pair per link:

    /**
     * @author YangXin
     * @date 2016/2/21
     * @info Mahout-based Mapper that parses the Wikipedia link file
     */
    package unitSix;

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.mahout.math.VarLongWritable;

    public class WikipediaToItemPrefsMapper
            extends Mapper<LongWritable, Text, VarLongWritable, VarLongWritable> {

        private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            Matcher m = NUMBERS.matcher(line);
            // The first number on the line is the user ID; skip lines with no numbers
            if (!m.find()) {
                return;
            }
            VarLongWritable userID = new VarLongWritable(Long.parseLong(m.group()));
            VarLongWritable itemID = new VarLongWritable();
            while (m.find()) {
                itemID.set(Long.parseLong(m.group()));
                // Emit one user-item pair for each item ID
                context.write(userID, itemID);
            }
        }
    }
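To see what the mapper emits without a Hadoop cluster, the same regex logic can be tried in plain Java. This is only a minimal sketch: the class name `LinkLineSketch`, the method `parsePairs`, and the sample line are hypothetical, the input is assumed to look like `sourceID: targetID1 targetID2 ...`, and `long[]` pairs stand in for the `VarLongWritable` key and value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch of the mapper's parsing step (no Hadoop needed).
final class LinkLineSketch {
    private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

    // Returns {userID, itemID} pairs; the first number on the line is the user ID.
    static List<long[]> parsePairs(String line) {
        List<long[]> pairs = new ArrayList<>();
        Matcher m = NUMBERS.matcher(line);
        if (!m.find()) {
            return pairs; // no numbers on this line, so nothing to emit
        }
        long userID = Long.parseLong(m.group());
        while (m.find()) {
            pairs.add(new long[] {userID, Long.parseLong(m.group())});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Assumed sample line: one source ID followed by its link targets.
        for (long[] pair : parsePairs("98955: 590 22 9059")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Each pair corresponds to one `context.write(userID, itemID)` call in the mapper, so a line with one source ID and three targets yields three pairs.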
The reducer then collects each user's item preferences into a single Vector:

    /**
     * @author YangXin
     * @info Mahout-based Reducer that builds a Vector from a user's item preferences
     */
    package unitSix;

    import java.io.IOException;

    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.VarLongWritable;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    public class WikipediaToUserVectorReducer
            extends Reducer<VarLongWritable, VarLongWritable, VarLongWritable, VectorWritable> {

        @Override
        public void reduce(VarLongWritable userID, Iterable<VarLongWritable> itemPrefs, Context context)
                throws IOException, InterruptedException {
            // Sparse vector indexed by item ID, with an initial capacity of 100 entries
            Vector userVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
            for (VarLongWritable itemPref : itemPrefs) {
                // Mark every item the user is linked to with a preference of 1.0
                userVector.set((int) itemPref.get(), 1.0f);
            }
            context.write(userID, new VectorWritable(userVector));
        }
    }
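The grouping that the framework and the reducer perform together can be imitated in memory. The following is a rough sketch under stated assumptions, not the Mahout implementation: `UserVectorSketch` and `buildUserVectors` are hypothetical names, and a `Map<Integer, Float>` stands in for `RandomAccessSparseVector`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the shuffle and reduce steps (no Hadoop or Mahout needed).
final class UserVectorSketch {
    // Groups {userID, itemID} pairs by user; each inner map plays the role of a sparse vector.
    static Map<Long, Map<Integer, Float>> buildUserVectors(List<long[]> pairs) {
        Map<Long, Map<Integer, Float>> vectors = new TreeMap<>();
        for (long[] pair : pairs) {
            vectors.computeIfAbsent(pair[0], id -> new TreeMap<Integer, Float>())
                   .put((int) pair[1], 1.0f); // same narrowing cast as the reducer
        }
        return vectors;
    }

    public static void main(String[] args) {
        // Made-up pairs: user 1 links to items 10 and 30, user 2 links to item 10.
        List<long[]> pairs = Arrays.asList(
            new long[] {1L, 10L}, new long[] {2L, 10L}, new long[] {1L, 30L});
        System.out.println(buildUserVectors(pairs));
    }
}
```

As in the reducer, the item ID is narrowed from `long` to `int` before it is used as a vector index, so this approach only suits item IDs that fit in an `int`.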