
Linux Shell Programming in Practice: Counting Word Frequencies in a Specific File

2017-09-08 11:52
Method 1: using sed

Shell>cat a1.txt
123 a123,555 456.333 566。555!88,this is a good boy.
Shell>cat a1.txt|sed 's/[[:space:]|[:punct:]]/\n/g'|sed '/^$/d'|sort|uniq -c|sort -n -k1 -r
      2 555
      1 this
      1 is
      1 good
      1 boy
      1 a123
      1 a
      1 88
      1 566
      1 456
      1 333
      1 123
Shell>
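sed 's/[[:space:]|[:punct:]]/\n/g': in a regular expression, [] denotes a bracket expression, that is, a set of characters. [:space:] is the class of whitespace characters (space, tab, newline, and so on) and [:punct:] is the class of punctuation characters, so [[:space:]|[:punct:]] matches any single whitespace or punctuation character. The | inside the brackets is not alternation; it is just one more literal character in the set (and is punctuation anyway), so [[:space:][:punct:]] matches exactly the same characters. s/[[:space:]|[:punct:]]/\n/g replaces every such character with a newline, leaving one word per line (\n in the replacement text is a GNU sed extension).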
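To see what the substitution step produces on its own, here is a minimal sketch. split_words is a hypothetical helper name; the block assumes GNU sed and runs it under LC_ALL=C, so the character classes cover ASCII only.

```shell
# split_words (hypothetical helper): turn every whitespace or punctuation
# character read from stdin into a newline. Assumes GNU sed, which accepts
# \n in the replacement text. LC_ALL=C limits the classes to ASCII bytes.
split_words() {
    LC_ALL=C sed 's/[[:space:][:punct:]]/\n/g'
}

# ", " is two separators in a row, so an empty line appears after "a".
printf 'this is a, good boy\n' | split_words

# "|" is punctuation, so it becomes a line break as well.
printf 'x|y\n' | split_words
```

The empty line in the first example is exactly what the next stage of the pipeline has to remove.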
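sed '/^$/d' deletes empty lines. They appear wherever two separators are adjacent or a line ends with a separator, such as the final period in a1.txt. The rest of the pipeline sorts the words so identical ones are adjacent, counts each run with uniq -c, and sorts the result numerically on the count (-n -k1) in descending order (-r).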
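The whole method 1 pipeline can be wrapped in a reusable function. This is a sketch, not the author's script: word_freq_sed is a hypothetical name, it assumes GNU sed, and it replaces the final sort -n -k1 -r with sort -k1,1nr -k2,2, which still lists the highest counts first but breaks ties alphabetically so the output is reproducible. Because it runs under LC_ALL=C, the sample line uses an ASCII comma where a1.txt has 。.

```shell
# word_freq_sed (hypothetical helper): count word frequencies on stdin.
# Assumes GNU sed; any whitespace or punctuation character separates words.
word_freq_sed() {
    # LC_ALL=C keeps the character classes and the sort order byte-based,
    # so the result is the same on every machine.
    LC_ALL=C sed 's/[[:space:][:punct:]]/\n/g' |
        LC_ALL=C sed '/^$/d' |           # drop empty lines left by adjacent separators
        LC_ALL=C sort |                  # make identical words adjacent
        uniq -c |                        # prefix each distinct word with its count
        LC_ALL=C sort -k1,1nr -k2,2      # highest count first, ties alphabetically
}

printf '123 a123,555 456.333 566,555!88,this is a good boy.\n' | word_freq_sed
```

With this input, "555" is the only word counted twice, so it comes out on the first line.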
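Method 2: using awk

#!/bin/bash
# word_freq.sh: print every word in the file named by $1 with its count
filename=$1
cat "$filename" | awk '{
    # split the current line on whitespace/punctuation into array a
    split($0, a, /[[:space:]|[:punct:]]/);
    for (i in a) {
        word = a[i];
        b[word]++;                 # tally each word
    }
}
END {
    printf("%-14s%s\n", "Word", "Count");
    # pipe the rows through sort so the highest counts come first
    for (i in b) {
        printf("%-14s%d\n", i, b[i]) | "sort -r -n -k2";
    }
}'

Sample run:

[root@Test01awk]# cat a1.txt
123 a123,555 456.333 566。555!88,this is a good boy.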
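[root@Test01awk]# ./word_freq.sh a1.txt
Word          Count
555           2
this          1
is            1
good          1
boy           1
a123          1
a             1
88            1
566           1
456           1
333           1
123           1
              1
[root@Test01awk]#

The last row has an empty word: split() returns an empty string after the trailing period of a1.txt, and unlike method 1, this script does not filter empty words out.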
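The awk script above does not skip empty fields: split() returns an empty string after a trailing separator, and that string is counted like a word. The sketch below is a variant rather than the author's script. word_freq_awk is a hypothetical name, the separator regex gains a + so a run of separators is one split point, file names go straight to awk instead of through cat, and sorting happens after awk exits. It assumes an awk that supports POSIX bracket classes, such as gawk.

```shell
# word_freq_awk (hypothetical helper): count words in the files named as
# arguments, or on stdin when there are none, skipping empty strings.
word_freq_awk() {
    LC_ALL=C awk '
    {
        # "+" makes a run of separators act as a single split point
        n = split($0, parts, /[[:space:][:punct:]]+/)
        for (i = 1; i <= n; i++)
            if (parts[i] != "")   # a leading or trailing separator still yields ""
                count[parts[i]]++
    }
    END {
        for (w in count)
            printf "%d %s\n", count[w], w
    }' "$@" | LC_ALL=C sort -k1,1nr -k2,2
}

printf '123 a123,555 456.333 566,555!88,this is a good boy.\n' | word_freq_awk
```

Called as word_freq_awk a1.txt, it reads the file directly; the order of "for (w in count)" is unspecified in awk, which is why the external sort is needed.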
Method 3: using tr

[root@Test01awk]# cat a1.txt
123 a123,555 456.333 566i 555!88,this is a good boy.
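[root@Test01awk]# cat a1.txt |tr '[:space:]|[:punct:]' '\n'|tr -s '\n'|sort|uniq -c|sort -n -k1 -r
      2 555
      1 this
      1 is
      1 good
      1 boy
      1 a123
      1 a
      1 88
      1 566i
      1 456
      1 333
      1 123
[root@Test01awk]#

The first tr replaces every whitespace or punctuation character (plus the literal |) with a newline; its second set is shorter than the first, and GNU tr pads it by repeating its last character. tr -s '\n' then squeezes each run of newlines into one, which does the job of sed '/^$/d' in method 1.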
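The two tr calls can also be merged. When tr gets -s together with two sets, it squeezes repeated characters of the last set, so tr -s '[:space:][:punct:]' '\n' maps each separator to a newline and collapses runs of them in a single pass. This sketch assumes GNU tr and uses a hypothetical function name, word_freq_tr. GNU tr works on single bytes, so multibyte punctuation such as 。 is not recognized as punctuation.

```shell
# word_freq_tr (hypothetical helper): method 3 with both tr steps combined.
# Assumes GNU tr: the second set ("\n") is padded to the length of the
# first, and -s squeezes runs of the resulting newlines into one.
word_freq_tr() {
    LC_ALL=C tr -s '[:space:][:punct:]' '\n' |
        LC_ALL=C sort |
        uniq -c |
        LC_ALL=C sort -k1,1nr -k2,2   # highest count first, ties alphabetically
}

printf '123 a123,555 456.333 566,555!88,this is a good boy.\n' | word_freq_tr
```

If the input can begin with whitespace or punctuation, one empty line survives the squeeze; adding sed '/^$/d' after tr removes it.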