An example of a Hive bug triggered by CASE WHEN
2013-12-16 10:27
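Today I found that Hive has a bug with CASE WHEN ... THEN ... ELSE ... END expressions. Here is what happens.
Table: t_aa_pc_log, which has a column named channel. Requirement: when channel is 'NA' or 'EMPTY', map it to 'A'; map every other value to 'B'; then output the first 10 records whose mapped channel is 'A'.
Query 1: the SQL written for this requirement: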
select a.channel
from
(
select case when channel = 'NA' or channel = 'EMPTY' then 'A' else 'B' end as channel
from t_aa_pc_log where pt = '2012-04-10-00'
)a where a.channel='A' limit 10;
The query returns no rows:
hive> select a.channel
> from
> (
> select case when channel = 'NA' or channel = 'EMPTY' then 'A' else 'B' end as channel
> from t_aa_pc_log where pt = '2012-04-10-00'
> )a where a.channel='A' limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201205162059_1490941, Tracking URL = http://jt.dc.sh-wgq.sdo.com:50030/jobdetails.jsp?jobid=job_201205162059_1490941
Kill Command = /home/hdfs/hadoop-current/bin/hadoop job -Dmapred.job.tracker=10.133.10.103:50020 -kill job_201205162059_1490941
2012-07-05 14:00:10,528 Stage-1 map = 0%, reduce = 0%
2012-07-05 14:00:14,669 Stage-1 map = 100%, reduce = 0%
2012-07-05 14:00:15,731 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201205162059_1490941
OK
Time taken: 9.974 seconds
hive>
Query 2: remove the WHERE condition from the outer query:
select a.channel
from
(
select case when channel = 'NA' or channel = 'EMPTY' then 'A' else 'B' end as channel
from t_aa_pc_log where pt = '2012-04-10-00'
)a limit 10;
The query returns rows:
hive> select a.channel
> from
> (
> select case when channel = 'NA' or channel = 'EMPTY' then 'A' else 'B' end as channel
> from t_aa_pc_log where pt = '2012-04-10-00'
> )a limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201205162059_1491035, Tracking URL = http://jt.dc.sh-wgq.sdo.com:50030/jobdetails.jsp?jobid=job_201205162059_1491035
Kill Command = /home/hdfs/hadoop-current/bin/hadoop job -Dmapred.job.tracker=10.133.10.103:50020 -kill job_201205162059_1491035
2012-07-05 14:03:55,864 Stage-1 map = 0%, reduce = 0%
2012-07-05 14:03:59,913 Stage-1 map = 20%, reduce = 0%
2012-07-05 14:04:00,923 Stage-1 map = 60%, reduce = 0%
2012-07-05 14:04:01,932 Stage-1 map = 80%, reduce = 0%
2012-07-05 14:04:07,019 Stage-1 map = 100%, reduce = 0%
2012-07-05 14:04:09,213 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201205162059_1491035
OK
A
A
A
A
A
A
A
A
A
A
Time taken: 19.339 seconds
Query 3: remove the OR condition from the CASE WHEN:
select a.channel
from
(
select case when channel = 'NA' then 'A' else 'B' end as channel
from t_aa_pc_log where pt = '2012-04-10-00'
)a where a.channel='A' limit 10;
The query returns rows:
hive> select a.channel
> from
> (
> select case when channel = 'NA' then 'A' else 'B' end as channel
> from t_aa_pc_log where pt = '2012-04-10-00'
> )a where a.channel='A' limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201205162059_1491066, Tracking URL = http://jt.dc.sh-wgq.sdo.com:50030/jobdetails.jsp?jobid=job_201205162059_1491066
Kill Command = /home/hdfs/hadoop-current/bin/hadoop job -Dmapred.job.tracker=10.133.10.103:50020 -kill job_201205162059_1491066
2012-07-05 14:05:19,557 Stage-1 map = 0%, reduce = 0%
2012-07-05 14:05:22,579 Stage-1 map = 20%, reduce = 0%
2012-07-05 14:05:23,736 Stage-1 map = 60%, reduce = 0%
2012-07-05 14:05:25,768 Stage-1 map = 80%, reduce = 0%
2012-07-05 14:05:26,779 Stage-1 map = 100%, reduce = 0%
2012-07-05 14:05:27,855 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201205162059_1491066
OK
A
A
A
A
A
A
A
A
A
A
Time taken: 15.219 seconds
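The three queries above suggest that the problem occurs when the CASE WHEN condition uses OR and the outer query has a WHERE condition. Would the result differ if the WHERE condition is on a column other than the CASE WHEN column? Let's test that:
Query 4: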
select a.channel
from
(
select deviceid, case when channel = 'NA' or channel = 'EMPTY' then 'A' else 'B' end as channel
from t_aa_pc_log where pt = '2012-04-10-00'
)a where deviceid like '%a%' limit 10;
The query returns rows:
hive> select a.channel
> from
> (
> select deviceid, case when channel = 'NA' or channel = 'EMPTY' then 'A' else 'B' end as channel
> from t_aa_pc_log where pt = '2012-04-10-00'
> )a where deviceid like '%a%' limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201205162059_1491636, Tracking URL = http://jt.dc.sh-wgq.sdo.com:50030/jobdetails.jsp?jobid=job_201205162059_1491636
Kill Command = /home/hdfs/hadoop-current/bin/hadoop job -Dmapred.job.tracker=10.133.10.103:50020 -kill job_201205162059_1491636
2012-07-05 14:38:37,209 Stage-1 map = 0%, reduce = 0%
2012-07-05 14:38:40,241 Stage-1 map = 40%, reduce = 0%
2012-07-05 14:38:41,250 Stage-1 map = 100%, reduce = 0%
2012-07-05 14:38:43,271 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201205162059_1491636
OK
A
A
A
A
A
A
A
A
A
A
Time taken: 11.774 seconds
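From these four queries we can conclude that when a column is computed with a CASE WHEN whose condition uses OR, and the outer WHERE clause filters on that same column, the query returns wrong results (in Query 1, no rows at all).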
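As a sanity check on what a correct engine should return, the sketch below runs the four query shapes against an in-memory SQLite database through Python's standard sqlite3 module. SQLite only stands in for Hive here, and the table contents (the deviceid values, the 'web' channel, the second partition) are invented sample data, not the article's data. The logic carries over, though: Query 2 shows that the partition does contain rows mapped to 'A', so Query 1 must return at least one row, and its empty result in Hive is wrong rather than a data issue.

```python
import sqlite3

# Invented sample rows: (deviceid, channel, pt). Only the partition string
# matches the article's queries; everything else is illustrative.
ROWS = [
    ("dev_a1", "NA", "2012-04-10-00"),
    ("dev_b2", "EMPTY", "2012-04-10-00"),
    ("dev_c3", "web", "2012-04-10-00"),
    ("dev_a4", "NA", "2012-04-10-01"),  # other partition, excluded by pt
]

INNER_OR = ("case when channel = 'NA' or channel = 'EMPTY' "
            "then 'A' else 'B' end as channel")
INNER_NA = "case when channel = 'NA' then 'A' else 'B' end as channel"
PART = "from t_aa_pc_log where pt = '2012-04-10-00'"

# The four query shapes from the article.
QUERIES = {
    "query1": f"select a.channel from (select {INNER_OR} {PART}) a "
              "where a.channel = 'A' limit 10",
    "query2": f"select a.channel from (select {INNER_OR} {PART}) a limit 10",
    "query3": f"select a.channel from (select {INNER_NA} {PART}) a "
              "where a.channel = 'A' limit 10",
    "query4": f"select a.channel from (select deviceid, {INNER_OR} {PART}) a "
              "where deviceid like '%a%' limit 10",
}


def run_reference(queries, rows):
    """Run each query on an in-memory SQLite copy of the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t_aa_pc_log (deviceid text, channel text, pt text)")
    conn.executemany("insert into t_aa_pc_log values (?, ?, ?)", rows)
    results = {name: [r[0] for r in conn.execute(sql)]
               for name, sql in queries.items()}
    conn.close()
    return results


if __name__ == "__main__":
    for name, values in run_reference(QUERIES, ROWS).items():
        print(name, values)
```

On a correct engine, Query 1 and Query 2 differ only in dropping the 'B' rows; neither can be empty when Query 2 shows 'A' values.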
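I then analyzed the execution plan of the SQL but found nothing wrong. In the end I found the surface cause: when the query returns no data, the mapred.input.dir value set when the job is submitted is wrong. When the query does return data, this input path is one of the table's partition paths:
Normal (query returns rows):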
mapred.input.dir hdfs://nn.dc.sh-wgq.sdo.com/group/p_sdo_data/p_sdo_data_etl/aa/pc_log/2012-04-10-00
Abnormal (query returns no rows):
mapred.input.dir hdfs://nn.dc.sh-wgq.sdo.com/group/p_sdo_data/user/p_sdo_data_etl/meta/hive-exec/hive_2012-07-05_14-50-55_734_7476312896857938653/-mr-10002/1
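My guess is that in the failing case something goes wrong while Hive analyzes the SQL, so the input path it computes is not the table's partition path.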
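To catch this symptom without eyeballing the job configuration, a small check like the following compares a submitted mapred.input.dir value with the partition's location. The function name input_dir_is_partition is hypothetical, the partition location is assumed to be known in advance (for example from the table metadata), and only the path part of each URI is compared, not the host. mapred.input.dir can hold several comma-separated paths, so each entry is checked. This is a sketch, not part of Hive or Hadoop.

```python
import posixpath
from urllib.parse import urlparse


def input_dir_is_partition(input_dir: str, partition_location: str) -> bool:
    """Hypothetical helper: True if every comma-separated entry of
    mapred.input.dir lies at or under partition_location (paths only)."""
    base = posixpath.normpath(urlparse(partition_location).path)
    for entry in input_dir.split(","):
        path = posixpath.normpath(urlparse(entry.strip()).path)
        # Accept the partition directory itself or anything beneath it.
        if path != base and not path.startswith(base + "/"):
            return False
    return True
```

Fed with the two values above, the normal path passes and the abnormal one, which points into a hive-exec scratch directory, fails.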
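To work around the problem in Query 1, rewrite the CASE expression: instead of the searched form CASE WHEN ... OR ... THEN ... ELSE ... END, use the simple form CASE channel WHEN ... THEN ... WHEN ... THEN ... ELSE ... END, giving each value its own WHEN branch.
The modified SQL: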
select a.channel
from
(
select case channel when 'NA' then 'A' when 'EMPTY' then 'A' else 'B' end as channel
from t_aa_pc_log where pt = '2012-04-10-00'
)a where channel ='A' limit 10;
Result:
hive> select a.channel
> from
> (
> select case channel when 'NA' then 'A' when 'EMPTY' then 'A' else 'B' end as channel
> from t_aa_pc_log where pt = '2012-04-10-00'
> )a where channel ='A' limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201205162059_1492108, Tracking URL = http://jt.dc.sh-wgq.sdo.com:50030/jobdetails.jsp?jobid=job_201205162059_1492108
Kill Command = /home/hdfs/hadoop-current/bin/hadoop job -Dmapred.job.tracker=10.133.10.103:50020 -kill job_201205162059_1492108
2012-07-05 15:02:32,005 Stage-1 map = 0%, reduce = 0%
2012-07-05 15:02:35,032 Stage-1 map = 20%, reduce = 0%
2012-07-05 15:02:36,042 Stage-1 map = 40%, reduce = 0%
2012-07-05 15:02:37,054 Stage-1 map = 100%, reduce = 0%
2012-07-05 15:02:38,067 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201205162059_1492108
OK
A
A
A
A
A
A
A
A
A
A
Time taken: 10.628 seconds
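The rewrite is only a workaround and should not change the result. One way to check that the two CASE forms agree, including for NULL and empty channel values, is to evaluate both side by side. This sketch again uses SQLite through Python's sqlite3 module as a stand-in for Hive, with made-up channel values; it checks standard SQL CASE semantics, not Hive's implementation.

```python
import sqlite3

# The two CASE forms from the article: searched (with OR) and simple.
SEARCHED = ("case when channel = 'NA' or channel = 'EMPTY' "
            "then 'A' else 'B' end")
SIMPLE = "case channel when 'NA' then 'A' when 'EMPTY' then 'A' else 'B' end"


def compare_case_forms(channels):
    """Return (channel, searched_result, simple_result) for each input."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (channel text)")
    conn.executemany("insert into t values (?)", [(c,) for c in channels])
    rows = conn.execute(
        f"select channel, {SEARCHED}, {SIMPLE} from t order by rowid"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # None becomes SQL NULL: both forms fall through to ELSE 'B'.
    for row in compare_case_forms(["NA", "EMPTY", "web", "", None]):
        print(row)
```

In both forms a NULL channel never matches a WHEN branch, so it lands in ELSE; that is why the rewrite is safe for this requirement.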