
Counting the occurrences of each word in a text

2017-03-22 19:41
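import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrapper class (name is arbitrary) so the method compiles on its own
public class WordCounter {

    /**
     * Counts how many times each word occurs in a text.
     *
     * @param text
     *            the text to scan
     * @param ignoreCase
     *            whether to ignore case (if true, words are lowercased before counting)
     * @return a map from each word to its number of occurrences, in order of first appearance
     */
    public static Map<String, Integer> countEachWorld(String text,
            boolean ignoreCase) {
        // A "word" is a run of word characters, i.e. [a-zA-Z_0-9]
        Matcher m = Pattern.compile("\\w+").matcher(text);
        // LinkedHashMap keeps the words in the order they first appear
        Map<String, Integer> map = new LinkedHashMap<>();
        while (m.find()) {
            String matchStr = m.group();
            // Locale.ROOT avoids locale-specific lowercasing (e.g. the Turkish dotless i)
            matchStr = ignoreCase ? matchStr.toLowerCase(Locale.ROOT) : matchStr;
            Integer count = map.get(matchStr);
            map.put(matchStr, count != null ? count + 1 : 1);
        }
        return map;
    }
}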
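On Java 9 or later, the same count can also be expressed with streams. This is an alternative sketch, not the original author's code: the class name StreamWordCounter and the method name countWords are made up for illustration, and the counts come back as Long because Collectors.counting() produces Long values.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class: a stream-based variant of the loop above
public class StreamWordCounter {
    // Same token definition as the loop version: runs of [a-zA-Z_0-9]
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Matcher.results() requires Java 9+
    public static Map<String, Long> countWords(String text, boolean ignoreCase) {
        return WORD.matcher(text)
                .results()                          // one MatchResult per token
                .map(MatchResult::group)            // the matched word itself
                .map(w -> ignoreCase ? w.toLowerCase(Locale.ROOT) : w)
                // LinkedHashMap factory keeps first-appearance order, like the loop version
                .collect(Collectors.groupingBy(Function.identity(),
                        LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countWords("Java and java, JAVA!", true));  // {java=3, and=1}
        System.out.println(countWords("Java and java, JAVA!", false)); // {Java=1, and=1, java=1, JAVA=1}
    }
}
```

Passing LinkedHashMap::new to groupingBy matters here: the default groupingBy collector returns a HashMap, which would not preserve the order shown in the results below.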

Input text:

Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to the Perl programming language and very easy to learn.

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. They can be used to search, edit, or manipulate text and data.


Results:

1. Ignoring case

countEachWorld(text, true);


{java=3, provides=1, the=2, util=1, regex=1, package=1, for=1, pattern=2, matching=1, with=1, regular=3, expressions=2, are=1, very=2, similar=1, to=3, perl=1, programming=1, language=1, and=2, easy=1, learn=1, a=4, expression=1, is=1, special=1, sequence=1, of=2, characters=1, that=1, helps=1, you=1, match=1, or=3, find=1, other=1, strings=2, sets=1, using=1, specialized=1, syntax=1, held=1, in=1, they=1, can=1, be=1, used=1, search=1, edit=1, manipulate=1, text=1, data=1}
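These counts depend on what \w+ treats as a word. In Java, \w matches only [a-zA-Z_0-9] by default, so the dots in java.util.regex act as separators (which is why java=3 and util=1 appear above), an apostrophe splits a contraction, and accented letters are dropped unless the Pattern.UNICODE_CHARACTER_CLASS flag is set. The small helper below (TokenDemo.tokens is a hypothetical name) just lists the tokens so this behavior is visible:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows which tokens the word pattern extracts
public class TokenDemo {
    // Returns, in order, the tokens matched by the word pattern; flags go to Pattern.compile
    public static List<String> tokens(String text, int flags) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+", flags).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // '.' is not a word character, so the package name splits into three tokens
        System.out.println(tokens("the java.util.regex package", 0)); // [the, java, util, regex, package]
        // Digits and '_' are word characters; the apostrophe is not
        System.out.println(tokens("don't use_2 tabs", 0));            // [don, t, use_2, tabs]
        // By default the word class is ASCII-only, so the accented letter is dropped
        System.out.println(tokens("caf\u00e9", 0));                    // [caf]
        // With UNICODE_CHARACTER_CLASS the whole word is kept, accent included
        System.out.println(tokens("caf\u00e9", Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```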


2. Case-sensitive

countEachWorld(text, false);


{Java=2, provides=1, the=2, java=1, util=1, regex=1, package=1, for=1, pattern=2, matching=1, with=1, regular=3, expressions=2, are=1, very=2, similar=1, to=3, Perl=1, programming=1, language=1, and=2, easy=1, learn=1, A=1, expression=1, is=1, a=3, special=1, sequence=1, of=2, characters=1, that=1, helps=1, you=1, match=1, or=3, find=1, other=1, strings=2, sets=1, using=1, specialized=1, syntax=1, held=1, in=1, They=1, can=1, be=1, used=1, search=1, edit=1, manipulate=1, text=1, data=1}
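The two results agree with each other: adding up the case-sensitive counts of words that differ only in case gives the case-insensitive counts (Java=2 plus java=1 gives java=3, and A=1 plus a=3 gives a=4). A minimal sketch of that folding step, using a hypothetical CaseFold.foldCase helper and a few entries copied from the case-sensitive output:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: merges keys that differ only in case
public class CaseFold {
    // Sums the counts of keys that lowercase to the same string,
    // keeping the order in which each lowercased key first appears
    public static Map<String, Integer> foldCase(Map<String, Integer> caseSensitive) {
        Map<String, Integer> folded = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : caseSensitive.entrySet()) {
            folded.merge(e.getKey().toLowerCase(Locale.ROOT), e.getValue(), Integer::sum);
        }
        return folded;
    }

    public static void main(String[] args) {
        // A few entries taken from the case-sensitive result
        Map<String, Integer> sensitive = new LinkedHashMap<>();
        sensitive.put("Java", 2);
        sensitive.put("java", 1);
        sensitive.put("A", 1);
        sensitive.put("a", 3);
        sensitive.put("They", 1);
        System.out.println(foldCase(sensitive)); // {java=3, a=4, they=1}
    }
}
```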