Using HBase with HDFS
2016-05-04 09:49
By default, a MapReduce cluster has no access to the HBase configuration or classes. To fix this, copy hbase-site.xml into the conf directory of the Hadoop installation and add the HBase JARs to Hadoop's lib directory, then push these changes out to every node in the cluster. Alternatively, edit hadoop-env.sh and add the HBase classpath to HADOOP_CLASSPATH (not recommended).
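The setup above can be sketched as the following commands. The installation paths are assumptions; adjust them to your own layout.

```shell
# Assumed installation paths -- adjust as needed.
export HBASE_HOME=/usr/local/hbase
export HADOOP_HOME=/usr/local/hadoop

# Option 1: copy the HBase config and JARs into Hadoop's directories,
# then sync these files to every node in the cluster.
cp "$HBASE_HOME/conf/hbase-site.xml" "$HADOOP_HOME/conf/"
cp "$HBASE_HOME"/lib/hbase*.jar "$HADOOP_HOME/lib/"

# Option 2 (not recommended): append HBase to the classpath in hadoop-env.sh,
# using the output of `hbase classpath`.
echo "export HADOOP_CLASSPATH=\$HADOOP_CLASSPATH:$(hbase classpath)" \
  >> "$HADOOP_HOME/conf/hadoop-env.sh"
```

With option 2, the change only takes effect on nodes whose hadoop-env.sh is edited, which is why distributing the files cluster-wide is preferred.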
Next, let's look at two simple demos.
The first example counts the number of distinct instances of a value in a table and writes those summarized counts to another table, using HBase as both the MapReduce source and the sink.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleSummary");
job.setJarByClass(MySummaryJob.class);  // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,         // input table
    scan,                // Scan instance to control CF and attribute selection
    MyMapper.class,      // mapper class
    Text.class,          // mapper output key
    IntWritable.class,   // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    targetTable,         // output table
    MyTableReducer.class,  // reducer class
    job);
job.setNumReduceTasks(1);  // at least one, adjust as required

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
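The MyMapper and MyTableReducer classes referenced above are not shown in the original post. A sketch following the standard TableMapper/TableReducer pattern might look like the following (these are nested inside the job class; the `cf`/`attr1`/`count` column names are placeholders):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public static class MyMapper extends TableMapper<Text, IntWritable> {
    public static final byte[] CF = "cf".getBytes();        // column family (placeholder)
    public static final byte[] ATTR1 = "attr1".getBytes();  // qualifier (placeholder)

    private final IntWritable ONE = new IntWritable(1);
    private Text text = new Text();

    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // Emit the cell value as the key with a count of 1.
        String val = new String(value.getValue(CF, ATTR1));
        text.set(val);
        context.write(text, ONE);
    }
}

public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
    public static final byte[] CF = "cf".getBytes();
    public static final byte[] COUNT = "count".getBytes();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int i = 0;
        for (IntWritable val : values) {
            i += val.get();
        }
        // Write the total back to the target HBase table, one row per distinct value.
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(CF, COUNT, Bytes.toBytes(i));
        context.write(null, put);
    }
}
```

Note that TableMapReduceUtil sets the input/output formats for you, which is why the job setup above never calls setInputFormatClass or setOutputFormatClass.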
The second example also uses HBase as the MapReduce source, but writes the summary to a file on HDFS instead.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleSummaryToFile");
job.setJarByClass(MySummaryFileJob.class);  // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,         // input table
    scan,                // Scan instance to control CF and attribute selection
    MyMapper.class,      // mapper class
    Text.class,          // mapper output key
    IntWritable.class,   // mapper output value
    job);
job.setReducerClass(MyReducer.class);  // reducer class
job.setNumReduceTasks(1);              // at least one, adjust as required
FileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile"));  // adjust directories as required

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
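Because the sink is HDFS rather than HBase, the reducer here is a plain MapReduce Reducer instead of a TableReducer. A minimal sketch of the MyReducer referenced above, assuming the same Text/IntWritable mapper output as before:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int i = 0;
        for (IntWritable val : values) {
            i += val.get();
        }
        // With the default TextOutputFormat, each line of the output file
        // under /tmp/mr/mySummaryFile is "value<TAB>count".
        context.write(key, new IntWritable(i));
    }
}
```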