[Repost] Hive/Beeline usage notes
2015-07-22 09:59
FROM : http://www.7mdm.com/1407.html
Hive:
Connecting to Hive with SQuirreL SQL
Add Driver -> fill in Name and Example URL (jdbc:hive2://xxx:10000) -> Extra Class Path -> Add the following jars:
hive/lib/hive-common-*.jar
hive/lib/hive-contrib-*.jar
hive/lib/hive-jdbc-*.jar
hive/lib/libthrift-*.jar
hive/lib/hive-service-*.jar
hive/lib/httpclient-*.jar
hive/lib/httpcore-*.jar
hadoop/share/hadoop/common/hadoop-common-*.jar
hadoop/share/hadoop/common/lib/commons-configuration-*.jar
hadoop/share/hadoop/common/lib/log4j-*.jar
hadoop/share/hadoop/common/lib/slf4j-api-*.jar
hadoop/share/hadoop/common/lib/slf4j-log4j12-*.jar
-> List Drivers (wait a moment; the class name is then set automatically to org.apache.hive.jdbc.HiveDriver) -> OK -> Add Alias -> choose the Hive driver -> done
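Before clicking through the driver dialog it can help to check that the jars actually exist on the machine. The following is only a rough sketch: the function name find_driver_jars and the install root passed to it are made up for illustration, and only two of the jar patterns from the list are shown.

```python
import glob
import os


def find_driver_jars(root, patterns):
    """Expand each jar pattern under root.

    Returns (found, missing): found is the list of matching jar paths,
    missing is the list of patterns that matched nothing.
    """
    found, missing = [], []
    for pattern in patterns:
        matches = sorted(glob.glob(os.path.join(root, pattern)))
        if matches:
            found.extend(matches)
        else:
            missing.append(pattern)
    return found, missing


if __name__ == "__main__":
    # "/opt" is an assumed install root, not something the notes specify.
    jars, not_found = find_driver_jars(
        "/opt", ["hive/lib/hive-jdbc-*.jar", "hive/lib/libthrift-*.jar"]
    )
    print("found:", jars)
    print("missing patterns:", not_found)
```

Any pattern reported as missing points at a jar that still has to be located before it can be added to the Extra Class Path.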
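Hive data migration
1. Export the table
EXPORT TABLE <table_name> TO 'path/to/hdfs';
2. Copy the data to the other HDFS cluster
hadoop distcp hdfs://:8020/path/to/hdfs hdfs:///path/to/hdfs
(host names are omitted here: the source NameNode goes before :8020, the destination NameNode after the second hdfs://)
3. Import the table
IMPORT TABLE <table_name> FROM 'path/to/another/hdfs';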
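The three steps above can be written down once as a small generator, which makes it easier to repeat the migration for several tables. This is a sketch only: migration_steps is a made-up helper, and the host names and table name in the usage part are placeholders, not values from these notes. It only builds the command strings and leaves running them to you.

```python
def migration_steps(table, export_path, src_uri, dst_uri, import_path):
    """Return the export / copy / import steps as (tool, command) pairs."""
    return [
        # Step 1: run in Hive on the source cluster.
        ("hive", "EXPORT TABLE {0} TO '{1}';".format(table, export_path)),
        # Step 2: run in a shell; copies the exported directory between clusters.
        ("shell", "hadoop distcp {0} {1}".format(src_uri, dst_uri)),
        # Step 3: run in Hive on the destination cluster.
        ("hive", "IMPORT TABLE {0} FROM '{1}';".format(table, import_path)),
    ]


if __name__ == "__main__":
    # Hypothetical hosts and paths, for illustration only.
    steps = migration_steps(
        "my_table",
        "/tmp/export/my_table",
        "hdfs://nn-src.example.com:8020/tmp/export/my_table",
        "hdfs://nn-dst.example.com:8020/tmp/export/my_table",
        "/tmp/export/my_table",
    )
    for tool, command in steps:
        print("[{0}] {1}".format(tool, command))
```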
Hive: writing query results to a file
To a local file:
insert overwrite local directory './test-04'
row format delimited
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
select * from src;
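A file written with the delimiters above can be read back with plain string splitting. The sketch below assumes a hypothetical three-column layout (a string id, an array of strings, a map of string to string); the real columns of src are not given in these notes, so adjust the unpacking to the actual table.

```python
def parse_row(line):
    """Split one output line using the delimiters from the statement above.

    Assumed layout (hypothetical): id <TAB> array items joined by ','
    <TAB> map entries joined by ',' with ':' between key and value.
    """
    ident, tags, props = line.rstrip("\n").split("\t")
    tag_list = tags.split(",") if tags else []
    prop_map = dict(item.split(":", 1) for item in props.split(",")) if props else {}
    return ident, tag_list, prop_map


if __name__ == "__main__":
    print(parse_row("1\ta,b\tk1:v1,k2:v2\n"))
```

Note that this breaks down if the data itself contains a tab, comma or colon, which is one reason to pick delimiters that cannot appear in the values.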
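To HDFS:
Writing to HDFS does not seem to support ROW FORMAT, so a workaround is needed:
INSERT OVERWRITE DIRECTORY '/outputable.txt'
select concat(col1, ',', col2, ',', col3) from myoutputtable;
Without the concat, the default field delimiter is \001.
To process the output files directly, pipe them through stdin, e.g.
hadoop fs -cat ../000000_0 | python doSomeThing.py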
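#!/usr/bin/env python
import sys

for line in sys.stdin:
    # Fields are separated by \001 (Ctrl-A); this assumes exactly three columns.
    # rstrip('\n') drops only the newline, so whitespace inside fields is kept.
    (a, b, c) = line.rstrip('\n').split('\001')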
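The loop above fails as soon as a row does not have exactly three fields, and it treats NULLs as ordinary text. A slightly more defensive variant might look like the sketch below. It assumes NULL values show up as the two characters \N in the text output, which is Hive's usual default but worth checking against your own files; parse_hive_line is a made-up name.

```python
import sys


def parse_hive_line(line, delimiter="\x01", null_marker="\\N"):
    """Split one line of Hive text output into a list of fields.

    Fields equal to null_marker (assumed to be Hive's NULL marker)
    are returned as None.
    """
    fields = line.rstrip("\n").split(delimiter)
    return [None if field == null_marker else field for field in fields]


if __name__ == "__main__":
    # Same usage as before: hadoop fs -cat ../000000_0 | python doSomeThing.py
    for raw in sys.stdin:
        print(parse_hive_line(raw))
```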
Hive syntax:
Hive does not seem to support select distinct col1 as col1 from table group by col1; use grouping sets instead:
select col1 as col1 from table group by col1 grouping sets((col1))
Beeline:
Documentation: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
Connecting to Hive over JDBC (defined here as a shell alias so that it can be called as hive2):
alias hive2='JAVA_HOME=/opt/java7 HADOOP_HOME=/opt/hadoop /opt/hive/bin/beeline -u jdbc:hive2://n1.hd2.host.dxy:10000 -n hadoop -p fake -d org.apache.hive.jdbc.HiveDriver --color=true --silent=false --fastConnect=false --verbose=true'
To run several commands through Beeline's JDBC connection, repeat the -e option:
hive2 -e "xxx" -e "yyy" -e...
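When the statements come from a script rather than from the keyboard, the same repeated -e pattern can be built as an argument list. This is a sketch under a few assumptions: beeline_command is a made-up helper, the default beeline path is simply the one used in the alias above, and only the -u, -n, -p, -d and -e options from these notes are used.

```python
def beeline_command(url, user, password, statements,
                    beeline="/opt/hive/bin/beeline"):
    """Build a beeline argument list with one -e option per statement."""
    cmd = [beeline, "-u", url, "-n", user, "-p", password,
           "-d", "org.apache.hive.jdbc.HiveDriver"]
    for statement in statements:
        cmd += ["-e", statement]
    return cmd


if __name__ == "__main__":
    cmd = beeline_command(
        "jdbc:hive2://n1.hd2.host.dxy:10000", "hadoop", "fake",
        ["show databases;", "show tables;"],
    )
    # Print only; pass the list to subprocess.call(cmd) to actually run it.
    print(" ".join(cmd))
```

Passing the list to subprocess instead of joining it into one shell string avoids quoting problems when a statement contains quotes or spaces.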