MySQL Table Joins in Detail (with the Hive Data Warehouse on Hadoop)

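2016-07-22 12:13
While working on the Heima (黑马) Hadoop log-analysis project, I had to join several tables. This post uses Hive to walk through MySQL-style table joins step by step.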

1. Count the daily pv (page views)

hive> create table hmbbs_pv
> as select count(1) as pv from hmbbs_table;


Check the result:

hive> describe hmbbs_pv;
OK
pv      bigint
Time taken: 0.102 seconds

hive> select pv from hmbbs_pv;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1469064014798_0058, Tracking URL = http://hadoop22:8088/proxy/application_1469064014798_0058/
Kill Command = /usr/local/hadoop/bin/hadoop job  -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0058
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-07-22 10:29:35,920 Stage-1 map = 0%,  reduce = 0%
2016-07-22 10:29:42,164 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.84 sec
2016-07-22 10:29:43,211 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.84 sec
2016-07-22 10:29:44,254 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.84 sec
MapReduce Total cumulative CPU time: 1 seconds 840 msec
Ended Job = job_1469064014798_0058
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.84 sec   HDFS Read: 204 HDFS Write: 7 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 840 msec
OK
169857
Time taken: 13.668 seconds


2. Count the daily register (number of registering users)

hive> create table hmbbs_register
> as select count(1) as register
> from hmbbs_table
> where instr(urllog,'member.php?mod=register') > 0;


Check the result:

hive> describe hmbbs_register;
OK
register        bigint
Time taken: 0.098 seconds

hive> select register from hmbbs_register;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1469064014798_0061, Tracking URL = http://hadoop22:8088/proxy/application_1469064014798_0061/
Kill Command = /usr/local/hadoop/bin/hadoop job  -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0061
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-07-22 10:37:48,848 Stage-1 map = 0%,  reduce = 0%
2016-07-22 10:37:54,047 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.04 sec
2016-07-22 10:37:55,094 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.04 sec
MapReduce Total cumulative CPU time: 1 seconds 40 msec
Ended Job = job_1469064014798_0061
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.04 sec   HDFS Read: 206 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 40 msec
OK
28
Time taken: 12.342 seconds
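
The register count relies on instr(), which returns the 1-based position of a substring, or 0 when the substring is absent, so "> 0" simply means "the URL contains this text". As a minimal sketch for trying the filter without a cluster, the snippet below runs the same WHERE clause in Python's built-in sqlite3, used here only as a stand-in for Hive. The sample rows are made up for illustration; only the table and column names come from the post.

```python
import sqlite3

# Hypothetical sample rows (ip, url); the values are invented for illustration.
rows = [
    ("10.0.0.1", "/forum.php"),
    ("10.0.0.2", "/member.php?mod=register"),
    ("10.0.0.3", "/member.php?mod=logging&action=login"),
    ("10.0.0.2", "/member.php?mod=register&step=2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT)")
conn.executemany("INSERT INTO hmbbs_table VALUES (?, ?)", rows)

# Same filter as the Hive statement: keep rows whose URL contains the
# registration path, then count them.
register = conn.execute(
    "SELECT COUNT(1) FROM hmbbs_table "
    "WHERE instr(urllog, 'member.php?mod=register') > 0"
).fetchone()[0]

print(register)  # 2: only the two registration URLs match
```

Note that this counts matching log lines, not distinct users, so one user who loads the registration page twice is counted twice.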


3. Count the daily unique IPs

hive> create table hmbbs_ip as
> select count(distinct iplog)  as ip
> from hmbbs_table;


Check the result:

hive> describe hmbbs_ip;
OK
ip      bigint
Time taken: 0.097 seconds

hive> select ip from hmbbs_ip;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1469064014798_0063, Tracking URL = http://hadoop22:8088/proxy/application_1469064014798_0063/
Kill Command = /usr/local/hadoop/bin/hadoop job  -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0063
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-07-22 10:42:12,811 Stage-1 map = 0%,  reduce = 0%
2016-07-22 10:42:18,055 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.27 sec
2016-07-22 10:42:19,113 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.27 sec
2016-07-22 10:42:20,155 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.27 sec
MapReduce Total cumulative CPU time: 1 seconds 270 msec
Ended Job = job_1469064014798_0063
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.27 sec   HDFS Read: 203 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 270 msec
OK
10411
Time taken: 13.477 seconds
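
The difference between pv and ip is just COUNT(1) versus COUNT(DISTINCT iplog): the first counts every log line, the second counts each IP once. A small sketch of that contrast, again using Python's sqlite3 as a stand-in for Hive and with invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT)")
# Hypothetical sample data: 10.0.0.1 appears twice, the other IPs once.
conn.executemany(
    "INSERT INTO hmbbs_table VALUES (?, ?)",
    [
        ("10.0.0.1", "/a"),
        ("10.0.0.1", "/b"),
        ("10.0.0.2", "/a"),
        ("10.0.0.3", "/c"),
    ],
)

# COUNT(1) counts rows (page views); COUNT(DISTINCT ...) counts unique IPs.
pv, ip = conn.execute(
    "SELECT COUNT(1), COUNT(DISTINCT iplog) FROM hmbbs_table"
).fetchone()

print(pv, ip)  # 4 3
```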


4. Count the daily jumpers (bounces: IPs that appear in the log exactly once)

hive> CREATE TABLE hmbbs_jumper AS
> SELECT COUNT(1) AS jumper
> FROM (SELECT COUNT(iplog) AS times
>       FROM hmbbs_table
>       GROUP BY iplog
>       HAVING times = 1) e;


Check the result:

hive> describe hmbbs_jumper;
OK
jumper  bigint
Time taken: 0.096 seconds

hive> select jumper from hmbbs_jumper;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1469064014798_0066, Tracking URL = http://hadoop22:8088/proxy/application_1469064014798_0066/
Kill Command = /usr/local/hadoop/bin/hadoop job  -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0066
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-07-22 10:49:40,450 Stage-1 map = 0%,  reduce = 0%
2016-07-22 10:49:46,697 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.65 sec
2016-07-22 10:49:47,742 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.65 sec
MapReduce Total cumulative CPU time: 1 seconds 650 msec
Ended Job = job_1469064014798_0066
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.65 sec   HDFS Read: 206 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 650 msec
OK
3749
Time taken: 13.463 seconds
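
The jumper statement works in two layers: the inner query groups the log by IP and keeps only the groups with a single row, and the outer query counts how many such groups remain. The result is therefore a count of single-visit IPs, not a ratio. A rough sketch with invented data, using Python's sqlite3 in place of Hive; it writes HAVING COUNT(iplog) = 1 instead of reusing the alias "times", which should be equivalent and is a little more portable across SQL engines:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT)")
# Hypothetical sample data: .1 visits three pages, .4 two pages,
# .2 and .3 one page each.
conn.executemany(
    "INSERT INTO hmbbs_table VALUES (?, ?)",
    [
        ("10.0.0.1", "/a"), ("10.0.0.1", "/b"), ("10.0.0.1", "/c"),
        ("10.0.0.2", "/a"),
        ("10.0.0.3", "/b"),
        ("10.0.0.4", "/a"), ("10.0.0.4", "/c"),
    ],
)

# Inner query: one row per IP that has exactly one log line.
# Outer query: count those rows.
jumper = conn.execute("""
    SELECT COUNT(1) AS jumper
    FROM (SELECT COUNT(iplog) AS times
          FROM hmbbs_table
          GROUP BY iplog
          HAVING COUNT(iplog) = 1) e
""").fetchone()[0]

print(jumper)  # 2: only 10.0.0.2 and 10.0.0.3 appear once
```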


At this point, each of the four tables above holds its result:

hive> show tables;
OK
hmbbs_ip
hmbbs_jumper
hmbbs_pv
hmbbs_register
hmbbs_table
Time taken: 0.081 seconds
hive> select * from hmbbs_ip;
OK
10411
Time taken: 0.111 seconds
hive> select * from hmbbs_jumper;
OK
3749
Time taken: 0.107 seconds
hive> select * from hmbbs_pv;
OK
169857
Time taken: 0.108 seconds
hive> select * from hmbbs_register;
OK
28
Time taken: 0.107 seconds


Next, join the tables:

Join, step 1 (skeleton only: the select list and the join conditions are filled in by the later steps)

select from hmbbs_pv
join hmbbs_register on
join hmbbs_ip       on
join hmbbs_jumper   on

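Join, step 2 (each table holds exactly one row, so there is no key to match on; use the always-true condition 1=1)

select * from hmbbs_pv
join hmbbs_register on  1=1
join hmbbs_ip       on  1=1
join hmbbs_jumper   on  1=1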
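
A join whose condition is always true pairs every row on the left with every row on the right, that is, a Cartesian product. With single-row tables that product is exactly one row, which is why 1=1 is harmless here; with larger tables the row count multiplies. A minimal sketch of both cases in Python's sqlite3 (a stand-in for Hive; the table names one_a, one_b, many_a and many_b are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one_a (x INTEGER);
    CREATE TABLE one_b (y INTEGER);
    CREATE TABLE many_a (x INTEGER);
    CREATE TABLE many_b (y INTEGER);
    INSERT INTO one_a VALUES (169857);
    INSERT INTO one_b VALUES (28);
    INSERT INTO many_a VALUES (1), (2), (3);
    INSERT INTO many_b VALUES (10), (20);
""")

# One row joined with one row on an always-true condition: one row.
single = conn.execute("SELECT * FROM one_a JOIN one_b ON 1=1").fetchall()

# Three rows joined with two rows on the same condition: 3 * 2 = 6 rows.
product = conn.execute("SELECT * FROM many_a JOIN many_b ON 1=1").fetchall()

print(single)        # [(169857, 28)]
print(len(product))  # 6
```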

Join, step 3 (give each table an alias: hmbbs_pv a, hmbbs_register b, hmbbs_ip c, hmbbs_jumper d)

select * from hmbbs_pv  a
join hmbbs_register   b  on  1=1
join hmbbs_ip         c  on  1=1
join hmbbs_jumper     d  on  1=1

Join, step 4 (select the specific column from each table)

select  a.pv,b.register,c.ip,d.jumper
from hmbbs_pv  a
join hmbbs_register   b  on  1=1
join hmbbs_ip         c  on  1=1
join hmbbs_jumper     d  on  1=1

Join, step 5 (add a constant date column, giving five columns in total)

select '2013_05_30',a.pv,b.register,c.ip,d.jumper
from hmbbs_pv  a
join hmbbs_register   b  on  1=1
join hmbbs_ip         c  on  1=1
join hmbbs_jumper     d  on  1=1
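
To see what the final query returns, the sketch below loads the four values reported earlier in this post into single-row tables and runs the same join in Python's sqlite3, used only as a stand-in for Hive. The alias "logdate" is an assumption added for readability; the query above leaves the constant column unnamed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One single-row table per metric, filled with the values shown in this post.
conn.executescript("""
    CREATE TABLE hmbbs_pv (pv INTEGER);
    CREATE TABLE hmbbs_register (register INTEGER);
    CREATE TABLE hmbbs_ip (ip INTEGER);
    CREATE TABLE hmbbs_jumper (jumper INTEGER);
    INSERT INTO hmbbs_pv VALUES (169857);
    INSERT INTO hmbbs_register VALUES (28);
    INSERT INTO hmbbs_ip VALUES (10411);
    INSERT INTO hmbbs_jumper VALUES (3749);
""")

# "logdate" is a hypothetical alias for the constant date column.
summary = conn.execute("""
    SELECT '2013_05_30' AS logdate, a.pv, b.register, c.ip, d.jumper
    FROM hmbbs_pv a
    JOIN hmbbs_register b ON 1=1
    JOIN hmbbs_ip c ON 1=1
    JOIN hmbbs_jumper d ON 1=1
""").fetchall()

print(summary)  # [('2013_05_30', 169857, 28, 10411, 3749)]
```

The four single-row tables collapse into one summary row for the day, which is the shape needed if the result is later exported to a reporting table.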


If you spot any problems, corrections are welcome!