hive update delete
2015-07-09 14:29
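Hive has supported transactions, and with them UPDATE and DELETE operations, since version 0.14. Transactional operations come with strict requirements. Version 1.1.0, which was used when this article was written, has the following limitations: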
BEGIN, COMMIT, and ROLLBACK are not yet supported. All language operations are auto-commit. The plan is to support these in a future release.
Only ORC file format is supported in this first release. The feature has been built such that transactions can be used by any storage format that can determine how updates or deletes apply to base records (basically, that has an explicit or implicit row id), but so far the integration work has only been done for ORC.
By default transactions are configured to be off. See the Configuration section below for a discussion of which values need to be set to configure it.
Tables must be bucketed to make use of these features. Tables in the same system not using transactions and ACID do not need to be bucketed.
At this time only snapshot level isolation is supported. When a given query starts it will be provided with a consistent snapshot of the data. There is no support for dirty read, read committed, repeatable read, or serializable. With the introduction of BEGIN the intention is to support snapshot isolation for the duration of transaction rather than just a single query. Other isolation levels may be added depending on user requests.
The existing ZooKeeper and in-memory lock managers are not compatible with transactions. There is no intention to address this issue. See Basic Design below for a discussion of how locks are stored for transactions.
It took a number of attempts before this finally worked. The steps are as follows:
1. Set the required parameters
set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;
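To confirm that the session picked up these values, each property can be echoed back. This is a minimal sketch, assuming the Hive CLI or Beeline, where `set` with a property name and no value prints the current setting. Note that the Hive Transactions page listed in the references describes the two `hive.compactor.*` properties as metastore-side settings, so setting them only in a client session may not be enough for compaction to actually run.

```sql
-- Echo the current value of each transaction-related setting.
-- "set <name>;" with no "=value" prints name=value for this session.
set hive.support.concurrency;
set hive.enforce.bucketing;
set hive.exec.dynamic.partition.mode;
set hive.txn.manager;
set hive.compactor.initiator.on;
set hive.compactor.worker.threads;
```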
2. Create the table
use tmp;
drop table cuixz;
create table cuixz (
  log_id int,
  agg_status tinyint
)
CLUSTERED BY (log_id) INTO 1 BUCKETS
stored as orc
TBLPROPERTIES ("transactional"="true", "NO_AUTO_COMPACTION"="true"); -- currently case-sensitive
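Before loading data it can help to check that the table really was created as a transactional, bucketed ORC table. A short sketch using standard Hive metadata commands against the `cuixz` table defined above:

```sql
use tmp;

-- List all table properties; the transactional flag set at creation time is shown here.
show tblproperties cuixz;

-- Look up a single property by name.
show tblproperties cuixz("transactional");

-- Full metadata, including the bucket count, bucket columns and storage formats.
describe formatted cuixz;
```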
3. Test
hive> insert into cuixz select log_id, agg_status from stg_dp.s_log_file;
hive> select * from cuixz where log_id = 3480224;
OK
3480224 NULL
Time taken: 0.213 seconds, Fetched: 1 row(s)
hive> update cuixz set agg_status = 1 where log_id = 3480224;
hive> select * from cuixz where log_id = 3480224;
OK
3480224 1
Time taken: 0.144 seconds, Fetched: 1 row(s)
hive> delete from cuixz where log_id = 3480224;
hive> select * from cuixz where log_id = 3480224;
OK
Time taken: 0.351 seconds
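Updates and deletes on a transactional table are stored as delta files rather than by rewriting the base data, and the table above sets `NO_AUTO_COMPACTION`, so those deltas are not merged automatically. The following is a sketch of how the transaction state could be inspected and a compaction requested by hand. It assumes `DbTxnManager` is active as in step 1 and that compactor worker threads are running on the metastore; without them the request would simply stay queued.

```sql
use tmp;

-- Open and aborted transactions known to the metastore.
show transactions;

-- Locks currently held or waited on.
show locks;

-- Queue a major compaction, which rewrites the base and delta files into a new base.
alter table cuixz compact 'major';

-- Check the state of queued, running and finished compactions.
show compactions;
```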
Note: the WHERE expression of an UPDATE or DELETE statement does not support subqueries.
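One possible way around this limitation, on newer releases only, is the `MERGE` statement, which joins the target table to a source table instead of using a subquery. The sketch below assumes Hive 2.2 or later (not the 1.1.0 release used in this article), and `cuixz_keys` is a hypothetical staging table invented for illustration that holds the `log_id` values to change.

```sql
use tmp;

-- Hypothetical staging table with the keys to modify (not part of the original setup).
create table cuixz_keys (log_id int);

-- Update every row of cuixz whose log_id appears in the staging table.
merge into cuixz as t
using cuixz_keys as k
on t.log_id = k.log_id
when matched then update set agg_status = 1;

-- Delete variant: remove the matching rows instead of updating them.
merge into cuixz as t
using cuixz_keys as k
on t.log_id = k.log_id
when matched then delete;
```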
References
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable