[Apache HBase Series] A Guide to GORA, an ORM Framework for HBase
2013-11-21 17:09
The open-source Apache Gora framework provides an in-memory data model and persistence for big data.
Gora supports column stores, key-value stores, document stores, and relational database management systems, with broad Apache Hadoop MapReduce support for analyzing the data.
Steps for using GORA:
1. Configure the gora.properties file
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
gora.datastore.autocreateschema=true
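Gora reads this file as standard Java properties (key=value pairs). As a quick sanity check outside of Gora itself, the same format can be loaded with the JDK's java.util.Properties; this is just a stdlib sketch of the file format, not Gora's own configuration loader:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class GoraPropsCheck {
    /** Loads key=value text in gora.properties format into a Properties object. */
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text)); // one key=value setting per line
        return props;
    }

    public static void main(String[] args) throws IOException {
        // The same two settings as in gora.properties above
        String conf = "gora.datastore.default=org.apache.gora.hbase.store.HBaseStore\n"
                    + "gora.datastore.autocreateschema=true\n";
        Properties props = load(conf);
        // The default datastore class Gora will instantiate:
        System.out.println(props.getProperty("gora.datastore.default"));
    }
}
```

With autocreateschema=true, the store creates the HBase table itself, which is what makes the later init() step work without any manual DDL.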
2. Define the data bean in JSON format. Create a JSON file with the following content:
{
  "type": "record",
  "name": "Pageview",
  "namespace": "org.apache.gora.tutorial.log.generated",
  "fields": [
    {"name": "url", "type": "string"},
    {"name": "timestamp", "type": "long"},
    {"name": "ip", "type": "string"},
    {"name": "httpMethod", "type": "string"},
    {"name": "httpStatusCode", "type": "int"},
    {"name": "responseSize", "type": "int"},
    {"name": "referrer", "type": "string"},
    {"name": "userAgent", "type": "string"}
  ]
}
3. Apache Gora uses the Avro framework for its ORM-mapped entities. You can use Gora's bundled compiler to compile the JSON schema file into the entity class you need.
$ bin/gora goracompiler
The compiler's usage is as follows:
Usage: GoraCompiler <schema file> <output dir> [-license <id>]
  <schema file>   - individual avsc file to be compiled, or a directory path containing avsc files
  <output dir>    - output directory for generated Java files
  [-license <id>] - the preferred license header to add to the generated Java file. Current options include:
    ASLv2   (Apache Software License v2.0)
    AGPLv3  (GNU Affero General Public License)
    CDDLv1  (Common Development and Distribution License v1.0)
    FDLv13  (GNU Free Documentation License v1.3)
    GPLv1   (GNU General Public License v1.0)
    GPLv2   (GNU General Public License v2.0)
    GPLv3   (GNU General Public License v3.0)
    LGPLv21 (GNU Lesser General Public License v2.1)
    LGPLv3  (GNU Lesser General Public License v3.0)
Example:
$ bin/gora goracompiler gora-tutorial/src/main/avro/pageview.json gora-tutorial/src/main/java/
4. Define the datastore mapping: gora-hbase-mapping.xml
After completing the three steps above, what remains is to configure the mapping between the entity and the table.
For example:
<!-- This is gora-sql-mapping.xml
<gora-orm>
  <class name="org.apache.gora.tutorial.log.generated.Pageview" keyClass="java.lang.Long" table="AccessLog">
    <primarykey column="line"/>
    <field name="url" column="url" length="512" primarykey="true"/>
    <field name="timestamp" column="timestamp"/>
    <field name="ip" column="ip" length="16"/>
    <field name="httpMethod" column="httpMethod" length="6"/>
    <field name="httpStatusCode" column="httpStatusCode"/>
    <field name="responseSize" column="responseSize"/>
    <field name="referrer" column="referrer" length="512"/>
    <field name="userAgent" column="userAgent" length="512"/>
  </class>
  ...
</gora-orm>
-->
<gora-orm>
  <table name="Pageview"> <!-- optional descriptors for tables -->
    <family name="common"/> <!-- This can also have params like compression, bloom filters -->
    <family name="http"/>
    <family name="misc"/>
  </table>
  <class name="org.apache.gora.tutorial.log.generated.Pageview" keyClass="java.lang.Long" table="AccessLog">
    <field name="url" family="common" qualifier="url"/>
    <field name="timestamp" family="common" qualifier="timestamp"/>
    <field name="ip" family="common" qualifier="ip"/>
    <field name="httpMethod" family="http" qualifier="httpMethod"/>
    <field name="httpStatusCode" family="http" qualifier="httpStatusCode"/>
    <field name="responseSize" family="http" qualifier="responseSize"/>
    <field name="referrer" family="misc" qualifier="referrer"/>
    <field name="userAgent" family="misc" qualifier="userAgent"/>
  </class>
  ...
</gora-orm>
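To make the mapping concrete: each <field> element assigns one Avro field to an HBase column family and qualifier. The following sketch (plain JDK DOM parsing, not part of Gora) pulls those assignments out of a trimmed-down mapping snippet:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MappingInspector {
    /** Returns field name -> "family:qualifier" from a gora-hbase-mapping snippet. */
    static Map<String, String> fieldColumns(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> out = new LinkedHashMap<>();
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element f = (Element) fields.item(i);
            // Each field is stored under family:qualifier in the HBase table
            out.put(f.getAttribute("name"),
                    f.getAttribute("family") + ":" + f.getAttribute("qualifier"));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<gora-orm>"
                + "<class name=\"Pageview\" keyClass=\"java.lang.Long\" table=\"AccessLog\">"
                + "<field name=\"url\" family=\"common\" qualifier=\"url\"/>"
                + "<field name=\"httpMethod\" family=\"http\" qualifier=\"httpMethod\"/>"
                + "</class></gora-orm>";
        System.out.println(fieldColumns(xml)); // {url=common:url, httpMethod=http:httpMethod}
    }
}
```

Grouping related fields into the same family (common/http/misc above) matters in HBase, since each family is stored and tuned separately.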
5. API
1) Initialize and create the HBaseStore object
private void init() throws IOException {
  dataStore = DataStoreFactory.getDataStore(Long.class, Pageview.class);
}
Here GORA will create the corresponding HBase table for you, based on the entity class you compiled above and on gora-hbase-mapping.xml.
2) Storing data
/** Stores the pageview object with the given key */
private void storePageview(long key, Pageview pageview) throws IOException {
  dataStore.put(key, pageview);
}
3) Reading data
/** Fetches a single pageview object and prints it */
private void get(long key) throws IOException {
  Pageview pageview = dataStore.get(key);
  printPageview(pageview);
}
4) Querying
/** Queries and prints pageview objects that have keys between startKey and endKey */
private void query(long startKey, long endKey) throws IOException {
  Query<Long, Pageview> query = dataStore.newQuery();
  // set the properties of the query
  query.setStartKey(startKey);
  query.setEndKey(endKey);
  Result<Long, Pageview> result = query.execute();
  printResult(result);
}
Iterating over the results:
private void printResult(Result<Long, Pageview> result) throws IOException {
  while (result.next()) { // advances the Result object; false at the end
    long resultKey = result.getKey();       // obtain the current key
    Pageview resultPageview = result.get(); // obtain the current value object
    // print the results
    System.out.println(resultKey + ":");
    printPageview(resultPageview);
  }
  System.out.println("Number of pageviews from the query: " + result.getOffset());
}
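The startKey/endKey scan above behaves much like a range view over a sorted map. As a stdlib analogy only (not Gora code; this assumes both bounds are inclusive — the exact end-key semantics depend on the backend), the same iterate-and-count pattern looks like this:

```java
import java.util.Map;
import java.util.TreeMap;

public class RangeScanDemo {
    /** Prints and counts entries whose keys fall in [startKey, endKey]. */
    static int scan(TreeMap<Long, String> store, long startKey, long endKey) {
        int count = 0;
        // subMap(start, true, end, true): a sorted range view, both bounds inclusive here
        for (Map.Entry<Long, String> e : store.subMap(startKey, true, endKey, true).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> store = new TreeMap<>();
        for (long i = 0; i < 10; i++) store.put(i, "pageview-" + i); // keys 0..9
        System.out.println("Number of pageviews from the query: " + scan(store, 3, 6)); // 4
    }
}
```

Because keys are scanned in sorted order, range queries over the row key are cheap — which is why the tutorial uses the log line number as the key.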
5) Deleting data
/** Deletes the pageview with the given line number */
private void delete(long lineNum) throws Exception {
  dataStore.delete(lineNum);
  dataStore.flush(); // changes may need to be flushed before they are committed
}

/** This method illustrates a delete-by-query call */
private void deleteByQuery(long startKey, long endKey) throws IOException {
  // Constructs a query from the dataStore. The rows matching this query will be deleted
  Query<Long, Pageview> query = dataStore.newQuery();
  // set the properties of the query
  query.setStartKey(startKey);
  query.setEndKey(endKey);
  dataStore.deleteByQuery(query);
}
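deleteByQuery removes every row whose key falls in the query's range. The same idea, sketched with a TreeMap range view standing in for the store (an analogy only, not Gora code; end-key inclusivity is assumed):

```java
import java.util.TreeMap;

public class RangeDeleteDemo {
    /** Deletes all entries with keys in [startKey, endKey]; returns how many were removed. */
    static int deleteRange(TreeMap<Long, String> store, long startKey, long endKey) {
        int before = store.size();
        // subMap is a live view: clearing it removes the entries from the backing map
        store.subMap(startKey, true, endKey, true).clear();
        return before - store.size();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> store = new TreeMap<>();
        for (long i = 0; i < 10; i++) store.put(i, "pageview-" + i); // keys 0..9
        System.out.println(deleteRange(store, 3, 6) + " rows deleted"); // 4 rows deleted
        System.out.println("remaining: " + store.size());               // remaining: 6
    }
}
```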
6) MapReduce support
Job:
public Job createJob(DataStore<Long, Pageview> inStore,
    DataStore<String, MetricDatum> outStore, int numReducer) throws IOException {
  Job job = new Job(getConf());
  job.setJobName("Log Analytics");
  job.setNumReduceTasks(numReducer);
  job.setJarByClass(getClass());

  /* Mappers are initialized with GoraMapper.initMapper() or
   * GoraInputFormat.setInput() */
  GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class,
      LogAnalyticsMapper.class, true);

  /* Reducers are initialized with GoraReducer#initReducer().
   * If the output is not to be persisted via Gora, any reducer
   * can be used instead. */
  GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);

  return job;
}
Mapper:
private TextLong tuple;

protected void map(Long key, Pageview pageview, Context context)
    throws IOException, InterruptedException {
  Utf8 url = pageview.getUrl();
  long day = getDay(pageview.getTimestamp());
  tuple.getKey().set(url.toString());
  tuple.getValue().set(day);
  context.write(tuple, one);
}
Reducer:
protected void reduce(TextLong tuple, Iterable<LongWritable> values, Context context)
    throws IOException, InterruptedException {
  long sum = 0L; // sum up the values
  for (LongWritable value : values) {
    sum += value.get();
  }
  String dimension = tuple.getKey().toString();
  long timestamp = tuple.getValue().get();
  metricDatum.setMetricDimension(new Utf8(dimension));
  metricDatum.setTimestamp(timestamp);
  String key = metricDatum.getMetricDimension().toString();
  metricDatum.setMetric(sum);
  context.write(key, metricDatum);
}
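Stripped of the Hadoop and Gora plumbing, the mapper/reducer pair amounts to a group-by-and-count: the mapper emits a (url, day) tuple with a count of 1, and the reducer sums the counts per tuple. A self-contained sketch of that aggregation (plain Java, no Hadoop; the names are illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogAnalyticsSketch {
    /** Groups (url, day) pairs and counts occurrences, like the mapper + reducer above. */
    static Map<String, Long> countByUrlAndDay(List<String[]> pageviews) {
        Map<String, Long> counts = new HashMap<>();
        for (String[] pv : pageviews) {
            String key = pv[0] + "\t" + pv[1]; // mapper: emit (url, day) with count 1
            counts.merge(key, 1L, Long::sum);  // reducer: sum the counts per key
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> views = List.of(
                new String[]{"/index", "2013-11-21"},
                new String[]{"/index", "2013-11-21"},
                new String[]{"/about", "2013-11-21"});
        System.out.println(countByUrlAndDay(views).get("/index\t2013-11-21")); // 2
    }
}
```

In the real job this per-key sum becomes a MetricDatum that GoraReducer persists back into an output datastore.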
Besides HBase, GORA also supports SQL stores (MySQL, HSQLDB), DynamoDB, Cassandra, and Accumulo. Feel free to try the other backends if you need them; their usage is similar to what is shown above.