
Counting lines by column value and removing duplicates in the shell

2014-03-28 14:32
The test file is /shell/test.log, with the following contents:

abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
afdsaff adfgaga 63542 fdasg
sdfasfd sdafadf 12345 asdga
jfaldjf sdfasfs 63542 sdfad
abcddfg higdfmn 12345 fuck!
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
sdfasfd sdafadf 67890 asdga
jfaldjf sdfasfs 67890 sdfad
abcddfg higdfmn 63542 fuck!
afdscff adfgada 67890 fdasg
sdfagfd sdavadf 67890 asdga
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
afdscff adfgada 12345 fdasg
sdfagfd sdavadf 12345 asdga


1: First, view the log file in sorted order:

[root@master ~]# cat /shell/test.log | sort -n
abcddfg higdfmn 12345 fuck!
abcddfg higdfmn 63542 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
afdsaff adfgaga 63542 fdasg
afdscff adfgada 12345 fdasg
afdscff adfgada 67890 fdasg
jfaldjf sdfasfs 63542 sdfad
jfaldjf sdfasfs 67890 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
sdfagfd sdavadf 12345 asdga
sdfagfd sdavadf 67890 asdga
sdfasfd sdafadf 12345 asdga
sdfasfd sdafadf 67890 asdga


2: Count how many lines carry each distinct value in the third column:

      Use the awk command: awk '{a[$3]++}END{for(i in a)print i,a[i]}' /shell/test.log
[root@master ~]# awk '{a[$3]++}END{for(i in a)print i,a[i]}' /shell/test.log
63542 4
67890 9
12345 11
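
The order in which awk walks an array with for (i in a) is not specified, so the three result lines may come out in a different order on another awk implementation. A minimal sketch of a stable variant, assuming a small hypothetical sample in the same four-column format (not the real /shell/test.log), pipes the result through sort:

```shell
#!/bin/sh
# Hypothetical six-line sample in the same format as /shell/test.log.
sample=$(mktemp)
cat > "$sample" <<'EOF'
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
afdsaff adfgaga 63542 fdasg
sdfasfd sdafadf 12345 asdga
afdsaff adfgaga 63542 fdasg
jfaldjf sdfasfs 67890 sdfad
EOF

# Count lines per third-column value, then sort numerically on that value
# so the output no longer depends on awk's array traversal order.
counts=$(awk '{a[$3]++} END{for (i in a) print i, a[i]}' "$sample" | sort -n)
printf '%s\n' "$counts"

rm -f "$sample"
```

To order by frequency instead, sort -k2,2nr on the second field should work the same way.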

3: List the distinct values that occur in a column:

      awk '{if(!a[$3]++) print $3}' /shell/test.log

[root@master ~]# awk '{if(!a[$3]++) print $3}' /shell/test.log
12345
63542
67890
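
If only the number of distinct values is needed rather than the values themselves, the same first-occurrence test can drive a counter. The following is a sketch on a hypothetical sample file; the array name seen and the counter n are illustrative choices:

```shell
#!/bin/sh
# Hypothetical six-line sample in the same format as /shell/test.log.
sample=$(mktemp)
cat > "$sample" <<'EOF'
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
afdsaff adfgaga 63542 fdasg
sdfasfd sdafadf 12345 asdga
afdsaff adfgaga 63542 fdasg
jfaldjf sdfasfs 67890 sdfad
EOF

# seen[$3]++ is 0 (false) only the first time a value is met,
# so n is incremented once per distinct third-column value.
# n+0 forces a numeric 0 when the input is empty.
distinct=$(awk '!seen[$3]++ {n++} END{print n+0}' "$sample")
echo "$distinct"

rm -f "$sample"
```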


4: For each distinct value in a column, print the first line in which it appears:

      awk '{if(!($3 in a)){a[$3];print}}' /shell/test.log

[root@master ~]# awk '{if(!($3 in a)){a[$3];print}}' /shell/test.log
abcdefg higklmn 12345 fuck!
afdsaff adfgaga 63542 fdasg
abcdefg higklmn 67890 fuck!
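
The same "first line per value" filter can be written more compactly and with the column number passed in as a variable. This is only a sketch: the variable name col and the sample data are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical six-line sample in the same format as /shell/test.log.
sample=$(mktemp)
cat > "$sample" <<'EOF'
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
afdsaff adfgaga 63542 fdasg
sdfasfd sdafadf 12345 asdga
afdsaff adfgaga 63542 fdasg
jfaldjf sdfasfs 67890 sdfad
EOF

# A pattern with no action prints the line when the pattern is true.
# !seen[$col]++ is true only the first time a value of column col appears.
first_lines=$(awk -v col=3 '!seen[$col]++' "$sample")
printf '%s\n' "$first_lines"

rm -f "$sample"
```

Changing col=3 to another column number selects a different key without editing the awk program.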


5: The uniq command removes duplicate lines, but only when the duplicates are adjacent.

[root@master ~]# uniq /shell/test.log | wc -l
16


[root@master ~]# uniq /shell/test.log | sort -n
abcddfg higdfmn 12345 fuck!
abcddfg higdfmn 63542 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
afdsaff adfgaga 63542 fdasg
afdscff adfgada 12345 fdasg
afdscff adfgada 67890 fdasg
jfaldjf sdfasfs 63542 sdfad
jfaldjf sdfasfs 67890 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 67890 sdfad
sdfagfd sdavadf 12345 asdga
sdfagfd sdavadf 67890 asdga
sdfasfd sdafadf 12345 asdga
sdfasfd sdafadf 67890 asdga


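      Not every duplicate was removed: the line "afdsaff adfgaga 63542 fdasg" still appears twice, because its two occurrences are not adjacent in the original file.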
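
Because uniq only compares neighbouring lines, the usual workaround is to sort first so that identical lines end up next to each other. A small sketch on a hypothetical sample shows the difference in line counts:

```shell
#!/bin/sh
# Hypothetical six-line sample in the same format as /shell/test.log.
sample=$(mktemp)
cat > "$sample" <<'EOF'
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
afdsaff adfgaga 63542 fdasg
sdfasfd sdafadf 12345 asdga
afdsaff adfgaga 63542 fdasg
jfaldjf sdfasfs 67890 sdfad
EOF

# uniq alone collapses only the adjacent pair on lines 1 and 2.
adjacent_only=$(uniq "$sample" | wc -l | tr -d ' ')

# Sorting first makes every duplicate adjacent, so uniq removes them all.
# (sort -u "$sample" gives the same set of lines in one step.)
sorted_first=$(sort "$sample" | uniq | wc -l | tr -d ' ')

echo "uniq only: $adjacent_only, sort then uniq: $sorted_first"

rm -f "$sample"
```

The trade-off is that sorting discards the original line order.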

6: An awk script can remove duplicate lines completely, adjacent or not:

[root@master ~]# awk '{if(!(a[$0]++)){a[$0];print}}' /shell/test.log | wc -l
15
[root@master ~]# awk '{if(!(a[$0]++)){a[$0];print}}' /shell/test.log | sort -n
abcddfg higdfmn 12345 fuck!
abcddfg higdfmn 63542 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
afdscff adfgada 12345 fdasg
afdscff adfgada 67890 fdasg
jfaldjf sdfasfs 63542 sdfad
jfaldjf sdfasfs 67890 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 67890 sdfad
sdfagfd sdavadf 12345 asdga
sdfagfd sdavadf 67890 asdga
sdfasfd sdafadf 12345 asdga
sdfasfd sdafadf 67890 asdga
      All duplicates are removed.

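      The results show that uniq leaves 16 lines while awk leaves 15: the line that uniq left in twice ("afdsaff adfgaga 63542 fdasg") now appears only once.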
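
To see which lines account for such a difference, one option is to list the lines that are still duplicated after a plain uniq pass. The sketch below uses a hypothetical sample and relies on uniq -d, which prints one copy of each line that is repeated in its (sorted) input:

```shell
#!/bin/sh
# Hypothetical six-line sample in the same format as /shell/test.log.
sample=$(mktemp)
cat > "$sample" <<'EOF'
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
afdsaff adfgaga 63542 fdasg
sdfasfd sdafadf 12345 asdga
afdsaff adfgaga 63542 fdasg
jfaldjf sdfasfs 67890 sdfad
EOF

# First pass: plain uniq, which removes adjacent duplicates only.
# Second pass: sort the survivors and print those that still repeat.
leftover=$(uniq "$sample" | sort | uniq -d)
printf '%s\n' "$leftover"

rm -f "$sample"
```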
Tags: linux, shell, awk command