Hive Data Warehouse: HiveQL Views and Indexes
2016-10-10 19:43
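Views
Let's create a view of high earners. From what I tried, a Hive view does not store the data returned by the query it represents. You can think of it as shorthand for a complex statement: it is a logical view, not a materialized view, and it does not appear to improve query performance. A view fixes the columns it exposes at creation time, but it does not fix the data, so if you drop a column that the view references from the underlying table, queries against the view will fail.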
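The point about dropped columns can be reproduced with a small sketch. The table `demo_salaries` and the view `demo_high` are hypothetical names used only for illustration, and the sketch assumes a table stored with Hive's default (native) SerDe, so that `REPLACE COLUMNS` is permitted.

```sql
-- Hypothetical scratch table; the names here are illustrative only.
CREATE TABLE demo_salaries (playerid STRING, salary DOUBLE);

-- A logical view over the scratch table.
CREATE VIEW demo_high AS
SELECT playerid, salary FROM demo_salaries WHERE salary > 500000;

-- Remove the salary column by redefining the table's column list.
ALTER TABLE demo_salaries REPLACE COLUMNS (playerid STRING);

-- This query should now fail, because the view definition still
-- references the salary column that no longer exists.
SELECT * FROM demo_high;
```

Recreating the view against the new column list (or restoring the column) should make it queryable again.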
CREATE VIEW syntax
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...) ]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ...
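As a sketch of how the optional clauses fit together, the statement below names the view's columns, attaches comments, and sets a table property. The view name `salaries_high_by_year`, the comments, and the `creator` property are assumptions made for illustration; only `salaries_external`, `yearid`, and `salary` come from this post.

```sql
-- Illustrative only: view name, comments, and property are made up.
CREATE VIEW IF NOT EXISTS salaries_high_by_year (
  yearid COMMENT 'season year',
  salary COMMENT 'salary above the threshold'
)
COMMENT 'High earners, restricted to two columns'
TBLPROPERTIES ('creator' = 'example')
AS SELECT yearid, salary
FROM salaries_external
WHERE salary > 500000;
```

With `IF NOT EXISTS`, running the statement a second time is a no-op rather than an error.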
Creating a view
hive>
> create view salaries_high as
> select * from salaries_external where salary > 500000;
OK
Time taken: 1.227 seconds
hive> select * from salaries_high limit 10;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1475147088438_0007, Tracking URL = http://hadoopwy1:8088/proxy/application_1475147088438_0007/
Kill Command = /usr/local/hadoop2/bin/hadoop job -kill job_1475147088438_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-09-29 05:37:02,617 Stage-1 map = 0%, reduce = 0%
2016-09-29 05:37:10,092 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.46 sec
MapReduce Total cumulative CPU time: 1 seconds 460 msec
Ended Job = job_1475147088438_0007
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.46 sec HDFS Read: 4422 HDFS Write: 310 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 460 msec
OK
1985 BAL AL murraed02 1472819.0
1985 BAL AL lynnfr01 1090000.0
1985 BAL AL ripkeca01 800000.0
1985 BAL AL lacyle01 725000.0
1985 BAL AL flanami01 641667.0
1985 BAL AL boddimi01 625000.0
1985 BAL AL stewasa01 581250.0
1985 BAL AL martide01 560000.0
1985 BAL AL roeniga01 558333.0
1985 BAL AL mcgresc01 547143.0
Time taken: 26.702 seconds, Fetched: 10 row(s)
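The MapReduce job in the output above suggests that the view's query is executed on every read. One way to check this, sketched here against the `salaries_high` view created above, is to inspect the stored definition and the query plan.

```sql
-- Print the CREATE VIEW statement that Hive stored for the view.
SHOW CREATE TABLE salaries_high;

-- Show the view's metadata; for a view, the table type is reported
-- as VIRTUAL_VIEW and the original query text is listed.
DESCRIBE FORMATTED salaries_high;

-- Show the execution plan; the table scan should be on
-- salaries_external, since the view holds no data of its own.
EXPLAIN SELECT * FROM salaries_high LIMIT 10;
```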
Dropping a view
hive> drop view if exists salaries_high;
OK
Time taken: 1.043 seconds
Indexes
CREATE INDEX syntax
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"]
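To illustrate a few of the optional clauses, here is a sketch that creates a bitmap index with index properties and a comment. The index name `salary_bitmap_idx`, the `creator` property, and the comment are hypothetical. It also assumes a Hive release before 3.0, since index support was removed in Hive 3.0.

```sql
-- Illustrative only: index name, property, and comment are made up.
-- 'BITMAP' is the built-in shorthand for the bitmap index handler.
CREATE INDEX salary_bitmap_idx
ON TABLE salaries_external (salary)
AS 'BITMAP'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator' = 'example')
COMMENT 'illustrative bitmap index';
```

Because of `WITH DEFERRED REBUILD`, the index stays empty until it is rebuilt with `ALTER INDEX ... REBUILD`.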
Creating an index
With an explicitly named index table:
hive> create index yearindex on table salaries_external(yearid) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild in table salaries_external_index;
OK
Time taken: 0.475 seconds
With the index only (Hive generates the index table name):
hive> create index index_test on table salaries_external(yearid) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD ;
OK
Time taken: 0.278 seconds
Listing indexes
hive>
> show index on salaries_external;
OK
yearindex salaries_external yearid salaries_external_index compact
index_test salaries_external yearid default__salaries_external_index_test__ compact
Time taken: 0.077 seconds, Fetched: 2 row(s)
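For a more readable listing, and to see what the index table itself looks like, something along these lines may help. This is a sketch that assumes the two indexes listed above still exist.

```sql
-- Same listing as SHOW INDEX, but with column headers.
SHOW FORMATTED INDEX ON salaries_external;

-- The index is an ordinary table; for a compact index it typically
-- holds the indexed column plus _bucketname and _offsets columns.
DESCRIBE salaries_external_index;
```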
Altering (rebuilding) an index
hive> alter index index_test on salaries_external rebuild;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1475147088438_0009, Tracking URL = http://hadoopwy1:8088/proxy/application_1475147088438_0009/
Kill Command = /usr/local/hadoop2/bin/hadoop job -kill job_1475147088438_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-09-29 06:44:34,287 Stage-1 map = 0%, reduce = 0%
2016-09-29 06:45:02,611 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.2 sec
2016-09-29 06:45:18,538 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.85 sec
MapReduce Total cumulative CPU time: 3 seconds 850 msec
Ended Job = job_1475147088438_0009
Loading data to table default.default__salaries_external_index_test__
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://hadoopnodeservice1/user/hive/warehouse/default__salaries_external_index_test__
Table default.default__salaries_external_index_test__ stats: [numFiles=1, numRows=58, totalSize=321107, rawDataSize=321049]
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.85 sec HDFS Read: 1354022 HDFS Write: 321214 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 850 msec
OK
Time taken: 58.187 seconds
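Both indexes in this post were created `WITH DEFERRED REBUILD`, so an index holds no data until it has been rebuilt. The sketch below rebuilds the other index, `yearindex`, and then asks Hive to consider indexes when filtering. The value `1985` is taken from the sample rows earlier in the post, and the type of `yearid` is an assumption. Whether the optimizer actually uses the index depends on the Hive version and on table size.

```sql
-- Populate the index that was created with a named index table.
ALTER INDEX yearindex ON salaries_external REBUILD;

-- Allow the optimizer to use indexes for filter predicates.
SET hive.optimize.index.filter=true;
-- Lower the minimum input size so that small test tables qualify.
SET hive.optimize.index.filter.compact.minsize=0;

-- A filter on the indexed column is a candidate for index use.
SELECT * FROM salaries_external WHERE yearid = 1985 LIMIT 10;
```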
Dropping an index
hive>
> drop index index_test on salaries_external;
OK
Time taken: 0.188 seconds
hive> show index on salaries_external;
OK
yearindex salaries_external yearid salaries_external_index compact
Time taken: 0.065 seconds, Fetched: 1 row(s)
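The remaining index can be removed the same way. As a small sketch, `IF EXISTS` avoids an error when the index is already gone, and the last statement checks that the index table was dropped along with the index.

```sql
-- Drop the remaining index; this also drops its index table.
DROP INDEX IF EXISTS yearindex ON salaries_external;

-- The listing should now be empty.
SHOW INDEX ON salaries_external;

-- The index table should no longer appear.
SHOW TABLES 'salaries_external_index';
```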
Please credit the source when reposting: Hive Data Warehouse: HiveQL Views and Indexes