Pig Getting-Started Examples
2014-08-07 14:41
The test data lives in: /home/hadoop/luogankun/workspace/sync_data/pig
person.txt uses comma-separated fields:
1,zhangsan,112
2,lisi,113
3,wangwu,114
4,zhaoliu,115
score.txt uses tab-separated fields:
1	20
2	30
3	40
5	50
Pig can only operate on files in HDFS, so the files must be uploaded to HDFS first.
cd /home/hadoop/luogankun/workspace/sync_data/pig
hadoop fs -put person.txt input/pig/person.txt
hadoop fs -put score.txt input/pig/score.txt
[b]Load the files (from HDFS)[/b]
a = load 'input/pig/person.txt' using PigStorage(',') as (id:int, name:chararray, age:int);
b = load 'input/pig/score.txt' using PigStorage('\t') as (id:int, score:int);
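For intuition only, the two load statements behave like splitting each line on a delimiter and casting the fields to the declared schema. A minimal Python sketch (data taken from person.txt and score.txt above; the function name is my own, not a Pig API):

```python
# Sketch of what `load ... using PigStorage(delim) as (schema)` does:
# split each line on the delimiter and cast fields to the declared types.
def load(lines, delim, types):
    return [tuple(cast(field) for cast, field in zip(types, line.split(delim)))
            for line in lines]

person_lines = ["1,zhangsan,112", "2,lisi,113", "3,wangwu,114", "4,zhaoliu,115"]
score_lines = ["1\t20", "2\t30", "3\t40", "5\t50"]

a = load(person_lines, ",", (int, str, int))  # (id, name, age)
b = load(score_lines, "\t", (int, int))       # (id, score)
print(a[0], b[-1])  # (1, 'zhangsan', 112) (5, 50)
```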
[b]Inspect the schema[/b]
describe a
a: {id: int,name: chararray,age: int}
describe b
b: {id: int,score: int}
[b]Inspect the data[/b]
dump a
(1,zhangsan,112)
(2,lisi,113)
(3,wangwu,114)
(4,zhaoliu,115)
dump b
(1,20)
(2,30)
(3,40)
(5,50)
dump launches a MapReduce job.
[b]Filtering[/b]
Select the persons whose id is less than 4:
aa = filter a by id < 4;
dump aa;
(1,zhangsan,112)
(2,lisi,113)
(3,wangwu,114)
Equality in Pig is written ==, for example: aa = filter a by id == 4;
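In plain Python terms, filter keeps the tuples for which the predicate holds. A rough equivalent of the two filters above (tuple layout assumed from the dump output; variable names are mine):

```python
a = [(1, "zhangsan", 112), (2, "lisi", 113), (3, "wangwu", 114), (4, "zhaoliu", 115)]

# `filter a by id < 4`: keep only tuples whose id field satisfies the predicate.
aa = [t for t in a if t[0] < 4]
# `filter a by id == 4`: equality uses ==, just as in the Pig example.
eq = [t for t in a if t[0] == 4]

print(len(aa), eq)  # 3 [(4, 'zhaoliu', 115)]
```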
[b]Joins[/b]
c = join a by id left, b by id;
describe c
c: {a::id: int,a::name: chararray,a::age: int,b::id: int,b::score: int}
-- two colons between relation name and field name, one colon between field name and field type
dump c
(1,zhangsan,112,1,20)
(2,lisi,113,2,30)
(3,wangwu,114,3,40)
(4,zhaoliu,115,,)
Because this is a left join, there are only four rows, and the fourth row has no score.
[b]Iterating over the data[/b]
d = foreach c generate a::id as id, a::name as name, b::score as score, a::age as age;
describe d;
d: {id: int,name: chararray,score: int,age: int}
dump d
(1,zhangsan,20,112)
(2,lisi,30,113)
(3,wangwu,40,114)
(4,zhaoliu,,115)
Note: when using foreach there must be at least one space either before or after the equals sign; with no space on either side, Pig reports an error.
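To make the join-plus-projection concrete, here is a hedged Python sketch of what d ends up containing; None stands in for Pig's null on the unmatched row (variable names are mine):

```python
a = [(1, "zhangsan", 112), (2, "lisi", 113), (3, "wangwu", 114), (4, "zhaoliu", 115)]
b = [(1, 20), (2, 30), (3, 40), (5, 50)]
scores = dict(b)  # id -> score lookup

# The left join keeps every row of a; ids absent from b (here id 4) get a
# null score. The foreach projection then emits (id, name, score, age).
d = [(pid, name, scores.get(pid), age) for pid, name, age in a]
print(d[-1])  # (4, 'zhaoliu', None, 115)
```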
[b]Store the result on HDFS[/b]
store d into 'output/pig/person_score' using PigStorage(','); -- fields in the exported HDFS file are comma-separated
hadoop fs -ls output/pig/person_score
hadoop fs -cat output/pig/person_score/part-r-00000
1,zhangsan,20,112
2,lisi,30,113
3,wangwu,40,114
4,zhaoliu,,115
hadoop fs -rmr output/pig/person_score
store d into 'output/pig/person_score'; -- without PigStorage, the exported file defaults to tab-separated fields
hadoop fs -ls output/pig/person_score
hadoop fs -cat output/pig/person_score/part-r-00000
1	zhangsan	20	112
2	lisi	30	113
3	wangwu	40	114
4	zhaoliu		115
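Note how the null score is written as an empty field, which is why zhaoliu's row contains two consecutive delimiters. A small sketch of that serialization (this mirrors the observed output, it does not reimplement PigStorage):

```python
rows = [(1, "zhangsan", 20, 112), (4, "zhaoliu", None, 115)]

# A null field is written as an empty string, producing ",," in zhaoliu's row.
lines = [",".join("" if f is None else str(f) for f in row) for row in rows]
print(lines)  # ['1,zhangsan,20,112', '4,zhaoliu,,115']
```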
[b]Running a Pig script file[/b]
Put all of the Pig statements above into a single script file and run it:
/home/hadoop/luogankun/workspace/shell/pig/person_score.pig
a = load 'input/pig/person.txt' using PigStorage(',') as (id:int, name:chararray, age:int);
b = load 'input/pig/score.txt' using PigStorage('\t') as (id:int, score:int);
c = join a by id left, b by id;
d = foreach c generate a::id as id, a::name as name, b::score as score, a::age as age;
store d into 'output/pig/person_score2' using PigStorage(',');
Run the person_score.pig script:
cd /home/hadoop/luogankun/workspace/shell/pig
pig person_score.pig
[b]Passing parameters to a Pig script[/b]
Script location: /home/hadoop/luogankun/workspace/shell/pig/mulit_params_demo01.pig
log = LOAD '$input' AS (user:chararray, time:long, query:chararray);
lmt = LIMIT log $size;
DUMP lmt;
Upload the data to HDFS:
cd /home/hadoop/luogankun/workspace/shell/pig
hadoop fs -put excite-small.log input/pig/excite-small.log
Method 1: pass each parameter individually
pig -param input=input/pig/excite-small.log -param size=4 mulit_params_demo01.pig
Method 2: keep the parameters in a text file
/home/hadoop/luogankun/workspace/shell/pig/mulit_params.txt
input=input/pig/excite-small.log
size=5
pig -param_file mulit_params.txt mulit_params_demo01.pig
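Both -param and -param_file end up substituting the $name placeholders in the script text before it runs. A rough Python sketch of that substitution (a deliberate simplification of Pig's preprocessor, not its actual implementation):

```python
import re

# Parse key=value lines, shaped like mulit_params.txt above.
param_lines = ["input=input/pig/excite-small.log", "size=5"]
params = dict(line.split("=", 1) for line in param_lines)

script = "log = LOAD '$input' AS (user:chararray, time:long, query:chararray); lmt = LIMIT log $size;"
# Replace each $name placeholder with its parameter value.
resolved = re.sub(r"\$(\w+)", lambda m: params[m.group(1)], script)
print(resolved)
```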