MySQL Table Joins Explained (with the Hive Data Warehouse in Hadoop)
2016-07-22 12:13
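While working on the Heima (黑马) Hadoop log-analysis project, I needed to join several tables. This post uses the Hive data warehouse to explain MySQL-style table joins in detail. It first builds four single-value statistics tables from the log table hmbbs_table, then joins them into one result row.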
1. Count the daily PV (page views)
hive> create table hmbbs_pv
    > as select count(1) as pv from hmbbs_table;
Check the result:
hive> describe hmbbs_pv;
OK
pv bigint
Time taken: 0.102 seconds
hive> select pv from hmbbs_pv;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1469064014798_0058, Tracking URL = http://hadoop22:8088/proxy/application_1469064014798_0058/
Kill Command = /usr/local/hadoop/bin/hadoop job -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0058
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-07-22 10:29:35,920 Stage-1 map = 0%, reduce = 0%
2016-07-22 10:29:42,164 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.84 sec
2016-07-22 10:29:43,211 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.84 sec
2016-07-22 10:29:44,254 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.84 sec
MapReduce Total cumulative CPU time: 1 seconds 840 msec
Ended Job = job_1469064014798_0058
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.84 sec HDFS Read: 204 HDFS Write: 7 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 840 msec
OK
169857
Time taken: 13.668 seconds
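To see why `count(1)` yields the page-view total, here is a minimal local sketch that uses Python's built-in `sqlite3` module in place of Hive. It assumes each row of hmbbs_table is one logged request. The two-column table (only `iplog` and `urllog`, the columns the queries in this post reference) and the sample rows are hypothetical; the real table may have more columns.

```python
import sqlite3

# Hypothetical stand-in for hmbbs_table: one row per logged request,
# keeping only the iplog and urllog columns used in this post.
ROWS = [
    ("1.1.1.1", "forum.php"),
    ("1.1.1.1", "member.php?mod=register"),
    ("2.2.2.2", "forum.php"),
    ("3.3.3.3", "home.php"),
    ("3.3.3.3", "forum.php"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT)")
conn.executemany("INSERT INTO hmbbs_table VALUES (?, ?)", ROWS)

# Same shape as the Hive statement: materialize the aggregate into its own table.
conn.execute("CREATE TABLE hmbbs_pv AS SELECT count(1) AS pv FROM hmbbs_table")
pv = conn.execute("SELECT pv FROM hmbbs_pv").fetchone()[0]
print(pv)  # 5: every request row counts as one page view
```

Storing the single number in its own table is what makes the later join possible: each statistic becomes a one-row, one-column table.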
2. Count the daily registrations (number of registered users)
hive> create table hmbbs_register
    > as select count(1) as register
    > from hmbbs_table
    > where instr(urllog,'member.php?mod=register') > 0;
Check the result:
hive> describe hmbbs_register;
OK
register bigint
Time taken: 0.098 seconds
hive> select register from hmbbs_register;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1469064014798_0061, Tracking URL = http://hadoop22:8088/proxy/application_1469064014798_0061/
Kill Command = /usr/local/hadoop/bin/hadoop job -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0061
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-07-22 10:37:48,848 Stage-1 map = 0%, reduce = 0%
2016-07-22 10:37:54,047 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.04 sec
2016-07-22 10:37:55,094 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.04 sec
MapReduce Total cumulative CPU time: 1 seconds 40 msec
Ended Job = job_1469064014798_0061
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.04 sec HDFS Read: 206 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 40 msec
OK
28
Time taken: 12.342 seconds
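The filter relies on `instr` returning the 1-based position of the substring, or 0 when it is absent, so `> 0` keeps only URLs that contain the registration path. The sketch below reproduces this with `sqlite3`, whose `instr` follows the same convention. The URLs are hypothetical; note that the query counts matching requests, which the post treats as registrations.

```python
import sqlite3

# Hypothetical request log; only urllog matters for this count.
URLS = [
    "forum.php",
    "member.php?mod=register",
    "member.php?mod=register&inajax=1",
    "member.php?mod=logging",
    "home.php",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hmbbs_table (urllog TEXT)")
conn.executemany("INSERT INTO hmbbs_table VALUES (?)", [(u,) for u in URLS])

# instr() gives the 1-based position of the substring, or 0 if it is missing,
# so "> 0" keeps only requests whose URL contains the registration path.
conn.execute(
    "CREATE TABLE hmbbs_register AS "
    "SELECT count(1) AS register FROM hmbbs_table "
    "WHERE instr(urllog, 'member.php?mod=register') > 0"
)
register = conn.execute("SELECT register FROM hmbbs_register").fetchone()[0]
print(register)  # 2
```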
3. Count the daily unique IPs
hive> create table hmbbs_ip as
    > select count(distinct iplog) as ip
    > from hmbbs_table;
Check the result:
hive> describe hmbbs_ip;
OK
ip bigint
Time taken: 0.097 seconds
hive> select ip from hmbbs_ip;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1469064014798_0063, Tracking URL = http://hadoop22:8088/proxy/application_1469064014798_0063/
Kill Command = /usr/local/hadoop/bin/hadoop job -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0063
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-07-22 10:42:12,811 Stage-1 map = 0%, reduce = 0%
2016-07-22 10:42:18,055 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.27 sec
2016-07-22 10:42:19,113 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.27 sec
2016-07-22 10:42:20,155 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.27 sec
MapReduce Total cumulative CPU time: 1 seconds 270 msec
Ended Job = job_1469064014798_0063
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.27 sec HDFS Read: 203 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 270 msec
OK
10411
Time taken: 13.477 seconds
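`count(distinct iplog)` collapses repeated IPs before counting. The sketch below, again with `sqlite3` and hypothetical data, also runs a GROUP BY subquery form that produces the same number. That rewrite is a common alternative on large Hive tables, where it can spread the de-duplication over more reducers; treat it as an optional variant, not the author's method.

```python
import sqlite3

# Hypothetical log: 5 requests from 3 distinct client IPs.
IPS = ["1.1.1.1", "1.1.1.1", "2.2.2.2", "3.3.3.3", "3.3.3.3"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hmbbs_table (iplog TEXT)")
conn.executemany("INSERT INTO hmbbs_table VALUES (?)", [(ip,) for ip in IPS])

# The query from the post: count each IP only once.
conn.execute(
    "CREATE TABLE hmbbs_ip AS "
    "SELECT count(DISTINCT iplog) AS ip FROM hmbbs_table"
)
ip = conn.execute("SELECT ip FROM hmbbs_ip").fetchone()[0]

# Equivalent form: group by IP first, then count the groups.
alt = conn.execute(
    "SELECT count(1) FROM (SELECT iplog FROM hmbbs_table GROUP BY iplog) t"
).fetchone()[0]

print(ip, alt)  # 3 3, versus 5 for count(1)
```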
4. Count the daily bounces (the number of IPs that made exactly one request)
hive> CREATE TABLE hmbbs_jumper AS SELECT COUNT(1) AS jumper FROM (SELECT COUNT(iplog) AS times FROM hmbbs_table GROUP BY iplog HAVING times=1) e ;
Check the result:
hive> describe hmbbs_jumper;
OK
jumper bigint
Time taken: 0.096 seconds
hive> select jumper from hmbbs_jumper;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1469064014798_0066, Tracking URL = http://hadoop22:8088/proxy/application_1469064014798_0066/
Kill Command = /usr/local/hadoop/bin/hadoop job -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0066
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-07-22 10:49:40,450 Stage-1 map = 0%, reduce = 0%
2016-07-22 10:49:46,697 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
2016-07-22 10:49:47,742 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
MapReduce Total cumulative CPU time: 1 seconds 650 msec
Ended Job = job_1469064014798_0066
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.65 sec HDFS Read: 206 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 650 msec
OK
3749
Time taken: 13.463 seconds
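The bounce query works in two levels. The inner query produces one row per IP with its request count and keeps only IPs seen once; the outer query counts those rows. This `sqlite3` sketch uses hypothetical data. It repeats the aggregate in HAVING instead of the alias `times`: Hive and MySQL accept the alias, but repeating the aggregate is the portable form.

```python
import sqlite3

# Hypothetical log: 1.1.1.1 and 3.3.3.3 made several requests,
# 2.2.2.2 and 4.4.4.4 made exactly one request each.
IPS = ["1.1.1.1", "1.1.1.1", "2.2.2.2", "3.3.3.3", "3.3.3.3", "3.3.3.3", "4.4.4.4"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hmbbs_table (iplog TEXT)")
conn.executemany("INSERT INTO hmbbs_table VALUES (?)", [(ip,) for ip in IPS])

# Inner query: per-IP request counts, filtered to IPs seen exactly once.
# Outer query: count how many such IPs there are.
conn.execute(
    "CREATE TABLE hmbbs_jumper AS "
    "SELECT COUNT(1) AS jumper FROM ("
    "  SELECT COUNT(iplog) AS times FROM hmbbs_table"
    "  GROUP BY iplog HAVING COUNT(iplog) = 1"
    ") e"
)
jumper = conn.execute("SELECT jumper FROM hmbbs_jumper").fetchone()[0]
print(jumper)  # 2
```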
At this point, all four tables hold their results:
hive> show tables;
OK
hmbbs_ip
hmbbs_jumper
hmbbs_pv
hmbbs_register
hmbbs_table
Time taken: 0.081 seconds
hive> select * from hmbbs_ip;
OK
10411
Time taken: 0.111 seconds
hive> select * from hmbbs_jumper;
OK
3749
Time taken: 0.107 seconds
hive> select * from hmbbs_pv;
OK
169857
Time taken: 0.108 seconds
hive> select * from hmbbs_register;
OK
28
Time taken: 0.107 seconds
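hmbbs_jumper stores a count of bounced IPs, not a rate. Assuming the bounce rate is meant as bounced IPs divided by unique IPs (a common definition, but an assumption here), the stored values give roughly 36%:

```python
# Values read from hmbbs_jumper and hmbbs_ip in the output above.
JUMPER = 3749
IP = 10411


def bounce_rate(jumper: int, unique_ips: int) -> float:
    """Share of unique IPs that made only one request (assumed definition)."""
    if unique_ips == 0:
        return 0.0
    return jumper / unique_ips


rate = bounce_rate(JUMPER, IP)
print(f"{rate:.2%}")  # 36.01%
```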
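Next, join the tables:
Join, step 1 (skeleton: chain the four tables with join; the ON conditions are still empty):
select * from hmbbs_pv join hmbbs_register on join hmbbs_ip on join hmbbs_jumper on
Join, step 2 (fill every ON condition with 1=1, which is always true, so the joins form a Cartesian product; because each table holds exactly one row, the result is a single row):
select * from hmbbs_pv
join hmbbs_register on 1=1
join hmbbs_ip on 1=1
join hmbbs_jumper on 1=1
Join, step 3 (give each table an alias: hmbbs_pv a, hmbbs_register b, hmbbs_ip c, hmbbs_jumper d):
select * from hmbbs_pv a
join hmbbs_register b on 1=1
join hmbbs_ip c on 1=1
join hmbbs_jumper d on 1=1
Join, step 4 (select the specific column from each table):
select a.pv,b.register,c.ip,d.jumper
from hmbbs_pv a
join hmbbs_register b on 1=1
join hmbbs_ip c on 1=1
join hmbbs_jumper d on 1=1
Join, step 5 (add a date column, giving five columns in total):
select '2013_05_30',a.pv,b.register,c.ip,d.jumper
from hmbbs_pv a
join hmbbs_register b on 1=1
join hmbbs_ip c on 1=1
join hmbbs_jumper d on 1=1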
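The final five-column query can be checked locally with Python's `sqlite3` module standing in for Hive. This sketch loads the four single-row tables with the values shown earlier and runs the query unchanged. Because `1=1` is always true, each join is a Cartesian product, and four one-row tables yield exactly one row. For comparison, it also runs the same query written with explicit `CROSS JOIN`.

```python
import sqlite3

# Single-row result tables with the values from the Hive session above.
RESULTS = {
    "hmbbs_pv": ("pv", 169857),
    "hmbbs_register": ("register", 28),
    "hmbbs_ip": ("ip", 10411),
    "hmbbs_jumper": ("jumper", 3749),
}

conn = sqlite3.connect(":memory:")
for table, (column, value) in RESULTS.items():
    conn.execute(f"CREATE TABLE {table} ({column} INTEGER)")
    conn.execute(f"INSERT INTO {table} VALUES (?)", (value,))

# The query from the post: ON 1=1 is always true, so each join is a
# Cartesian product; 1 x 1 x 1 x 1 rows gives a single summary row.
JOIN_SQL = """
select '2013_05_30', a.pv, b.register, c.ip, d.jumper
from hmbbs_pv a
join hmbbs_register b on 1=1
join hmbbs_ip c on 1=1
join hmbbs_jumper d on 1=1
"""
rows = conn.execute(JOIN_SQL).fetchall()

# The same result written with explicit CROSS JOIN.
CROSS_SQL = """
select '2013_05_30', a.pv, b.register, c.ip, d.jumper
from hmbbs_pv a
cross join hmbbs_register b
cross join hmbbs_ip c
cross join hmbbs_jumper d
"""
cross_rows = conn.execute(CROSS_SQL).fetchall()

print(rows)  # [('2013_05_30', 169857, 28, 10411, 3749)]
```

The `on 1=1` trick only yields one row because every input table has exactly one row; if any table held several rows, the output would multiply accordingly.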
If you spot any problems, corrections are welcome!