Using HBase with HDFS
2016-05-04 09:49
By default, a MapReduce cluster has no access to the HBase configuration or classes. To fix this, copy hbase-site.xml into the conf directory of the Hadoop installation and add the HBase JARs to Hadoop's lib directory, then push these changes out to every node in the cluster. Alternatively, edit hadoop-env.sh and add the HBase classpath to HADOOP_CLASSPATH (not recommended).
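The setup above can be sketched as the following commands. The installation paths are assumptions; adjust them to your own layout.

```shell
# Assumed installation paths -- adjust as needed.
export HBASE_HOME=/usr/local/hbase
export HADOOP_HOME=/usr/local/hadoop

# Option 1: copy the HBase config and JARs into Hadoop's directories,
# then sync these files to every node in the cluster.
cp "$HBASE_HOME/conf/hbase-site.xml" "$HADOOP_HOME/conf/"
cp "$HBASE_HOME"/lib/hbase*.jar "$HADOOP_HOME/lib/"

# Option 2 (not recommended): append HBase to the classpath in hadoop-env.sh,
# using the output of `hbase classpath`.
echo "export HADOOP_CLASSPATH=\$HADOOP_CLASSPATH:$(hbase classpath)" \
  >> "$HADOOP_HOME/conf/hadoop-env.sh"
```

With option 2, the change only takes effect on nodes whose hadoop-env.sh is edited, which is why distributing the files cluster-wide is preferred.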
Next, let's look at two simple demos.
The first example counts the number of distinct instances of a value in a table and writes those summarized counts to another table, using HBase as both the MapReduce source and the sink.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleSummary");
job.setJarByClass(MySummaryJob.class);  // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,         // input table
    scan,                // Scan instance to control CF and attribute selection
    MyMapper.class,      // mapper class
    Text.class,          // mapper output key
    IntWritable.class,   // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    targetTable,         // output table
    MyTableReducer.class,  // reducer class
    job);
job.setNumReduceTasks(1);  // at least one, adjust as required

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
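The MyMapper and MyTableReducer classes referenced above are not shown in the original post. A sketch following the standard TableMapper/TableReducer pattern might look like the following (these are nested inside the job class; the `cf`/`attr1`/`count` column names are placeholders):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public static class MyMapper extends TableMapper<Text, IntWritable> {
    public static final byte[] CF = "cf".getBytes();        // column family (placeholder)
    public static final byte[] ATTR1 = "attr1".getBytes();  // qualifier (placeholder)

    private final IntWritable ONE = new IntWritable(1);
    private Text text = new Text();

    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // Emit the cell value as the key with a count of 1.
        String val = new String(value.getValue(CF, ATTR1));
        text.set(val);
        context.write(text, ONE);
    }
}

public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
    public static final byte[] CF = "cf".getBytes();
    public static final byte[] COUNT = "count".getBytes();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int i = 0;
        for (IntWritable val : values) {
            i += val.get();
        }
        // Write the total back to the target HBase table, one row per distinct value.
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(CF, COUNT, Bytes.toBytes(i));
        context.write(null, put);
    }
}
```

Note that TableMapReduceUtil sets the input/output formats for you, which is why the job setup above never calls setInputFormatClass or setOutputFormatClass.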
The second example also uses HBase as the MapReduce source, but writes the summary to a file on HDFS instead.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleSummaryToFile");
job.setJarByClass(MySummaryFileJob.class);  // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,         // input table
    scan,                // Scan instance to control CF and attribute selection
    MyMapper.class,      // mapper class
    Text.class,          // mapper output key
    IntWritable.class,   // mapper output value
    job);
job.setReducerClass(MyReducer.class);  // reducer class
job.setNumReduceTasks(1);              // at least one, adjust as required
FileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile"));  // adjust directories as required

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
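Because the sink is HDFS rather than HBase, the reducer here is a plain MapReduce Reducer instead of a TableReducer. A minimal sketch of the MyReducer referenced above, assuming the same Text/IntWritable mapper output as before:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int i = 0;
        for (IntWritable val : values) {
            i += val.get();
        }
        // With the default TextOutputFormat, each line of the output file
        // under /tmp/mr/mySummaryFile is "value<TAB>count".
        context.write(key, new IntWritable(i));
    }
}
```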