Processing CSV-Format Data in Hive

2014-12-28 15:57
Several third-party SerDes are available for handling CSV-format data in Hive. This article uses CSVSerde.

1. Add the third-party SerDe

First, add the third-party SerDe JAR to the Hive classpath with the following command:

hive> add jar /home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/lib/csv-serde-1.1.2.jar;
Added /home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/lib/csv-serde-1.1.2.jar to class path
Added resource: /home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/lib/csv-serde-1.1.2.jar


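The JAR file csv-serde-1.1.2.jar is available for download. The remaining steps walk through the process using a sample CSV file.

2. Format of the sample CSV log file: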

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!air, moon roof, loaded",4799.00
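Fields are separated by commas and represent, in order: year, manufacturer, model, description, and price. A field that itself contains a comma is wrapped in double quotes, and a literal double quote inside a quoted field is written as two double quotes.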
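
To make the quoting rules concrete, here is a minimal sketch that parses the four sample lines with Python's standard `csv` module. It is only an illustration of how the format splits into fields, not the SerDe's own implementation. The column names are taken from the table definition used in this article; `parse_sample` is a hypothetical helper name.

```python
import csv
import io

# The four sample lines from the log file, verbatim.
SAMPLE = '''1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!air, moon roof, loaded",4799.00
'''

# Column names as declared in the Hive table in this article.
COLUMNS = ["year", "company", "type", "description", "value"]


def parse_sample(text):
    """Parse CSV text into a list of dicts keyed by column name (hypothetical helper)."""
    # csv.reader defaults: ',' as separator, '"' as quote character,
    # and a doubled quote ("") inside a quoted field means one literal quote.
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, row)) for row in reader]


rows = parse_sample(SAMPLE)
for row in rows:
    print(row)
```

Each line yields exactly five fields: the commas inside quoted fields do not split the field, and the empty quoted description (`""`) becomes an empty string.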

3. Create the Hive table

hive> CREATE TABLE serde_csv(year STRING,company STRING,type STRING,description STRING,value STRING)
> ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
> STORED AS TEXTFILE ;
OK
Time taken: 0.072 seconds

4. Load the data

hive> LOAD DATA LOCAL INPATH "/home/hadoopUser/data/csv_serde.txt" INTO TABLE serde_csv;
Copying data from file:/home/hadoopUser/data/csv_serde.txt
Copying file: file:/home/hadoopUser/data/csv_serde.txt
Loading data to table hive.serde_csv
Table hive.serde_csv stats: [numFiles=1, numRows=0, totalSize=259, rawDataSize=0]
OK
Time taken: 0.389 seconds


5. View the imported CSV data in Hive

hive> select * from serde_csv;
OK
1997    Ford    E350    ac, abs, moon   3000.00
1999    Chevy   Venture "Extended Edition"              4900.00
1999    Chevy   Venture "Extended Edition, Very Large"          5000.00
1996    Jeep    Grand Cherokee  MUST SELL!air, moon roof, loaded        4799.00
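
Because every column of serde_csv is declared STRING, fields such as year and value are still text when a client reads them. The following is a minimal sketch, assuming a Python consumer that receives each row as five strings in table column order; the `Car` record type and the `to_car` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Car:
    """Hypothetical typed view of one serde_csv row."""
    year: int
    company: str
    type: str
    description: str
    value: Decimal


def to_car(fields):
    """Convert five string fields, in table column order, into typed values."""
    year, company, type_, description, value = fields
    # Decimal keeps the price exact instead of introducing float rounding.
    return Car(int(year), company, type_, description, Decimal(value))


# First row of the query result above, as strings.
car = to_car(["1997", "Ford", "E350", "ac, abs, moon", "3000.00"])
print(car)
```

The same conversion could instead be done inside the query with Hive's CAST; the sketch simply shows where the string-to-number step has to happen if it is left to the client.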


Reference: http://ogrodnek.github.io/csv-serde/