Counting the occurrences of each word in a text
2017-03-22 19:41
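```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Counts the occurrences of each word in a text.
 *
 * @param text       the text to scan
 * @param ignoreCase whether to ignore case (words are lower-cased before counting)
 * @return a map from each word to its number of occurrences, in order of first appearance
 */
public static Map<String, Integer> countEachWorld(String text, boolean ignoreCase) {
    // \w+ matches each maximal run of word characters [a-zA-Z_0-9]
    Matcher m = Pattern.compile("\\w+").matcher(text);
    // LinkedHashMap keeps the words in the order they first appear
    Map<String, Integer> map = new LinkedHashMap<>();
    while (m.find()) {
        String matchStr = m.group();
        if (ignoreCase) {
            // Locale.ROOT avoids locale-specific lower-casing rules
            matchStr = matchStr.toLowerCase(Locale.ROOT);
        }
        Integer count = map.get(matchStr);
        map.put(matchStr, count != null ? count + 1 : 1);
    }
    return map;
}
```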
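For comparison, the same counting can be written with Java 8 streams. This is a minimal sketch, not the original author's code; the class name `WordCountStreams` and the method name `countEachWord` are illustrative. Instead of iterating a `Matcher`, it splits on `\W+` (the complement of `\w`), and it uses `Collectors.groupingBy` with a `LinkedHashMap` factory so that words keep their first-appearance order.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WordCountStreams {
    // Splitting on runs of non-word characters yields the same tokens that \w+ matches
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    public static Map<String, Integer> countEachWord(String text, boolean ignoreCase) {
        return NON_WORD.splitAsStream(text)
                // a separator at the very start produces one empty token; drop it
                .filter(w -> !w.isEmpty())
                .map(w -> ignoreCase ? w.toLowerCase(Locale.ROOT) : w)
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        LinkedHashMap::new,            // keep first-appearance order
                        Collectors.summingInt(w -> 1)));
    }

    public static void main(String[] args) {
        String text = "Java provides the java.util.regex package.";
        // prints {java=2, provides=1, the=1, util=1, regex=1, package=1}
        System.out.println(countEachWord(text, true));
        // prints {Java=1, provides=1, the=1, java=1, util=1, regex=1, package=1}
        System.out.println(countEachWord(text, false));
    }
}
```

The explicit loop in the original is easier to step through in a debugger. The stream version trades that for a single expression.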
Input text:
Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to the Perl programming language and very easy to learn. A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. They can be used to search, edit, or manipulate text and data.
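In Java, `\w` matches only `[a-zA-Z_0-9]` by default. Punctuation therefore acts as a separator, so `java.util.regex` in the sample text becomes three words: `java`, `util` and `regex`. This explains why `java` appears three times in the case-insensitive result. The following is a minimal sketch for listing the tokens that `\w+` produces. The class `TokenDemo` and its method `tokens` are hypothetical helpers for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenDemo {
    // Returns every maximal run of word characters, in the order found
    public static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Java, provides, the, java, util, regex, package]
        System.out.println(tokens("Java provides the java.util.regex package"));
    }
}
```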
Results:
1. Ignoring case
countEachWorld(text, true);
{java=3, provides=1, the=2, util=1, regex=1, package=1, for=1, pattern=2, matching=1, with=1, regular=3, expressions=2, are=1, very=2, similar=1, to=3, perl=1, programming=1, language=1, and=2, easy=1, learn=1, a=4, expression=1, is=1, special=1, sequence=1, of=2, characters=1, that=1, helps=1, you=1, match=1, or=3, find=1, other=1, strings=2, sets=1, using=1, specialized=1, syntax=1, held=1, in=1, they=1, can=1, be=1, used=1, search=1, edit=1, manipulate=1, text=1, data=1}
2. Case-sensitive
countEachWorld(text, false);
{Java=2, provides=1, the=2, java=1, util=1, regex=1, package=1, for=1, pattern=2, matching=1, with=1, regular=3, expressions=2, are=1, very=2, similar=1, to=3, Perl=1, programming=1, language=1, and=2, easy=1, learn=1, A=1, expression=1, is=1, a=3, special=1, sequence=1, of=2, characters=1, that=1, helps=1, you=1, match=1, or=3, find=1, other=1, strings=2, sets=1, using=1, specialized=1, syntax=1, held=1, in=1, They=1, can=1, be=1, used=1, search=1, edit=1, manipulate=1, text=1, data=1}
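The two results are consistent with each other. Summing the case-sensitive counts of keys that differ only in case gives the case-insensitive counts, for example `Java=2` + `java=1` → `java=3` and `A=1` + `a=3` → `a=4`. The following is a minimal sketch of that folding step. The class `FoldCase` and its method `foldCase` are hypothetical and not part of the original post.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class FoldCase {
    // Merges keys that differ only in case by summing their counts;
    // LinkedHashMap keeps the order in which each lower-cased key first appears
    public static Map<String, Integer> foldCase(Map<String, Integer> caseSensitive) {
        Map<String, Integer> folded = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : caseSensitive.entrySet()) {
            folded.merge(e.getKey().toLowerCase(Locale.ROOT), e.getValue(), Integer::sum);
        }
        return folded;
    }

    public static void main(String[] args) {
        Map<String, Integer> sensitive = new LinkedHashMap<>();
        sensitive.put("Java", 2);
        sensitive.put("java", 1);
        sensitive.put("A", 1);
        sensitive.put("a", 3);
        System.out.println(foldCase(sensitive)); // prints {java=3, a=4}
    }
}
```

Applied to the full case-sensitive map above, this should reproduce the case-insensitive map, including its key order.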