
Using HBase Together with HDFS

2016-05-04 09:49
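By default, a MapReduce cluster has no access to the HBase configuration or classes. So we need to put hbase-site.xml into the conf directory of the Hadoop installation and add the HBase JARs to the lib folder under the Hadoop directory, then copy these changes across the cluster; or edit hadoop-env.sh and add these changes to HADOOP_CLASSPATH (not recommended).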
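
HBaseConfiguration.create() picks up hbase-site.xml from the classpath, so one quick way to check whether the setup above took effect is to ask the class loader if the file is visible. The following is only a minimal sketch in plain Java; the class name ClasspathCheck and the helper isOnClasspath are hypothetical names introduced for illustration, not part of HBase or Hadoop.

```java
import java.net.URL;

// Hypothetical diagnostic helper, not part of HBase or Hadoop.
final class ClasspathCheck {

    /** Returns true if the named resource is visible to the current class loader. */
    static boolean isOnClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheck.class.getClassLoader();
        }
        // Fall back to the system class loader if no other loader is available.
        URL url = (loader != null)
                ? loader.getResource(resourceName)
                : ClassLoader.getSystemResource(resourceName);
        return url != null;
    }

    public static void main(String[] args) {
        // Defaults to hbase-site.xml; another resource name can be passed as an argument.
        String name = args.length > 0 ? args[0] : "hbase-site.xml";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Running this with the same classpath the job uses gives a rough indication of whether the job will see the HBase configuration; it does not verify that the HBase JARs themselves are present.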

Next, let's look at two simple demos:

This example will count the number of distinct instances of a value in a table and write those summarized counts in another table.

Configuration config = HBaseConfiguration.create();
Job job = Job.getInstance(config, "ExampleSummary");   // replaces the deprecated new Job(config, name)
job.setJarByClass(MySummaryJob.class);     // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,        // input table
    scan,               // Scan instance to control CF and attribute selection
    MyMapper.class,     // mapper class
    Text.class,         // mapper output key
    IntWritable.class,  // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    targetTable,             // output table
    MyTableReducer.class,    // reducer class
    job);
job.setNumReduceTasks(1);   // at least one, adjust as required

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
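
The job above references MyMapper and MyTableReducer without showing them. To make the computation itself concrete, here is a minimal in-memory sketch in plain Java with no HBase or Hadoop dependencies: the map step emits (value, 1) for each cell value and the reduce step sums the ones per distinct value. The class SummaryCountSketch and its methods are hypothetical names used only for illustration; a real mapper would read the values from the source table and a real reducer would write each count to the target table.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the map and reduce steps of the summary job.
final class SummaryCountSketch {

    /** Map step: emit a (value, 1) pair for every cell value that was read. */
    static List<Map.Entry<String, Integer>> map(List<String> cellValues) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String value : cellValues) {
            emitted.add(new AbstractMap.SimpleEntry<>(value, 1));
        }
        return emitted;
    }

    /** Reduce step: sum the emitted ones per distinct value. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "a", "c", "a");
        System.out.println(reduce(map(values)));
    }
}
```

In the actual job, the shuffle between the two steps groups the pairs by key across the cluster; the sketch collapses that into a single in-process loop.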


This example uses HBase as the MapReduce source but HDFS as the sink.

Configuration config = HBaseConfiguration.create();
Job job = Job.getInstance(config, "ExampleSummaryToFile");   // replaces the deprecated new Job(config, name)
job.setJarByClass(MySummaryFileJob.class);     // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,        // input table
    scan,               // Scan instance to control CF and attribute selection
    MyMapper.class,     // mapper class
    Text.class,         // mapper output key
    IntWritable.class,  // mapper output value
    job);
job.setReducerClass(MyReducer.class);    // reducer class
job.setNumReduceTasks(1);    // at least one, adjust as required
FileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile"));  // adjust directories as required

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
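
Since the job above does not set an output format, Hadoop falls back to its default TextOutputFormat, which writes one line per record as the key, a tab, and the value. Assuming MyReducer (not shown here) emits a Text key with an IntWritable count, the lines of the resulting file could be read back roughly as follows. This is a plain-Java sketch; SummaryFileSketch and parse are hypothetical names, and the sample lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for lines in the "key<TAB>count" layout.
final class SummaryFileSketch {

    /** Parses "key<TAB>count" lines into a map, preserving the order of the lines. */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // Split on the last tab so that keys containing tabs stay intact.
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("not a key<TAB>value line: " + line);
            }
            String key = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(key, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for the contents of an output file.
        System.out.println(parse(Arrays.asList("apple\t3", "banana\t1")));
    }
}
```

In practice the lines would come from the part files under the output directory set with FileOutputFormat.setOutputPath.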