NLP01 - A small example of a Chinese word cloud with Python's wordcloud
2017-10-25 14:30
The image above was generated from the following poem:

"When You Are Old" by William Butler Yeats

When you are old and grey and full of sleep,
And nodding by the fire, take down this book,
And slowly read, and dream of the soft look
Your eyes had once, and of their shadows deep;

How many loved your moments of glad grace,
And loved your beauty with love false or true,
But one man loved the pilgrim soul in you,
And loved the sorrows of your changing face;

And bending down beside the glowing bars,
Murmur, a little sadly, how love fled
And paced upon the mountains overhead
And hid his face amid a crowd of stars.

Summary: this post only covers installing wordcloud and a test run. It may help beginners.
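1. Installation
There are many ways to build a word cloud. In my opinion, Python's wordcloud package is the most capable, and it can shape the cloud with a custom image. Official site: https://amueller.github.io/word_cloud/
GitHub: https://github.com/amueller/word_cloud
Install: pip install wordcloud
Or download the package from http://www.lfd.uci.edu/~gohlke/pythonlibs/#wordcloud and install it.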
2. The API
The most important class in the API is WordCloud:

class wordcloud.WordCloud(font_path=None, width=400, height=200, margin=2, ranks_only=None, prefer_horizontal=0.9, mask=None, scale=1, color_func=None, max_words=200, min_font_size=4, stopwords=None, random_state=None, background_color='black', max_font_size=None, font_step=1, mode='RGB', relative_scaling=0.5, regexp=None, collocations=True, colormap=None, normalize_plurals=True)

- font_path : string. Path to the font that will be used (OTF or TTF). Defaults to the DroidSansMono path on a Linux machine. If you are on another OS or don't have this font, you need to adjust this path. [On Windows 7 this must be set to a font that contains Chinese glyphs. Otherwise the Chinese text comes out garbled.]
- width : int (default=400). Width of the canvas.
- height : int (default=200). Height of the canvas.
- prefer_horizontal : float (default=0.90). The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating the word if it doesn't fit. (There is currently no built-in way to get only vertical words.)
- mask : nd-array or None (default=None). If not None, gives a binary mask on where to draw words. All pure-white entries (#FF or #FFFFFF) are masked out, and the other entries are free to draw on. If a mask is given, width and height are ignored and the shape of the mask is used instead.
- scale : float (default=1). Scaling between computation and drawing. For large word-cloud images, using scale instead of a larger canvas size is significantly faster, but might lead to a coarser fit for the words.
- min_font_size : int (default=4). Smallest font size to use. Will stop when there is no more room in this size.
- font_step : int (default=1). Step size for the font. font_step > 1 might speed up computation but give a worse fit.
- max_words : number (default=200). The maximum number of words shown.
- stopwords : set of strings or None. The words that will be eliminated. If None, the built-in STOPWORDS list will be used. That list is English, so Chinese stopwords have to be supplied yourself.
- background_color : color value (default="black"). Background color for the word cloud image.
- max_font_size : int or None (default=None). Maximum font size for the largest word. If None, the height of the image is used.
- mode : string (default="RGB"). A transparent background will be generated when mode is "RGBA" and background_color is None.
- relative_scaling : float (default=.5). Importance of relative word frequencies for font size. With relative_scaling=0, only word ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, relative_scaling around .5 often looks good.
- color_func : callable (default=None). Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word. Overrides colormap. See colormap for specifying a matplotlib colormap instead.
- regexp : string or None (optional). Regular expression used to split the input text into tokens in process_text. If None is specified, r"\w[\w']+" is used.
- collocations : bool (default=True). Whether to include collocations (bigrams) of two words.
- colormap : string or matplotlib colormap (default="viridis"). Matplotlib colormap to randomly draw colors from for each word. Ignored if color_func is specified.
- normalize_plurals : bool (default=True). Whether to remove a trailing "s" from words. If True and a word appears both with and without a trailing "s", the form with the trailing "s" is removed and its counts are added to the form without it, unless the word ends with "ss".
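The following simplified Python model makes the relative_scaling description concrete. It is an assumption written from the description above, not the library's code. Words are taken in descending frequency order. Each word's size is the previous word's size multiplied by relative_scaling * (f_i / f_prev) + (1 - relative_scaling). The real layout step also shrinks a word whenever it no longer fits on the canvas, and this sketch ignores that. The function name ideal_font_sizes is hypothetical.

```python
def ideal_font_sizes(freqs, max_font_size, relative_scaling=0.5):
    """Idealized font sizes for frequencies sorted in descending order.

    Hypothetical model (not wordcloud's code): each size is the previous
    size times rs * (f_i / f_prev) + (1 - rs). Fitting on the canvas,
    which can shrink words further, is ignored.
    """
    sizes = []
    size = max_font_size
    prev = None
    for f in freqs:
        if prev is not None:
            ratio = relative_scaling * (f / prev) + (1 - relative_scaling)
            size = int(round(ratio * size))
        sizes.append(size)
        prev = f
    return sizes


print(ideal_font_sizes([10, 5, 5, 1], 80, relative_scaling=1))    # [80, 40, 40, 8]
print(ideal_font_sizes([10, 5, 5, 1], 80, relative_scaling=0))    # [80, 80, 80, 80]
print(ideal_font_sizes([10, 5, 5, 1], 80, relative_scaling=0.5))  # [80, 60, 60, 36]
```

With relative_scaling=1, the word with frequency 5 is half the size of the word with frequency 10, which matches "twice as frequent, twice the size". With 0, frequencies stop affecting size. In the real library, only placement order (rank) and the remaining space then shrink the words.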
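The color_func parameter takes a callable. Below is a minimal sketch of one, using the parameter names from that description. The hypothetical size_shaded_color ignores randomness and maps the font size to a lightness value, so large words come out dark blue and small ones pale blue. It returns an "hsl(...)" string, which PIL accepts as a color. The **kwargs is a precaution in case extra keyword arguments are passed.

```python
def size_shaded_color(word, font_size, position, orientation,
                      font_path=None, random_state=None, **kwargs):
    """Hypothetical color_func: bigger words get a darker blue.

    Font sizes are clamped to 4..80 and mapped linearly to a
    lightness of 70% (smallest) down to 25% (largest).
    """
    size = max(4, min(font_size, 80))
    lightness = 70 - int((size - 4) * 45 / 76)
    return "hsl(210, 80%, {}%)".format(lightness)


print(size_shaded_color("love", 80, (0, 0), None))   # hsl(210, 80%, 25%)
print(size_shaded_color("book", 4, (10, 10), None))  # hsl(210, 80%, 70%)
```

It would be passed as WordCloud(color_func=size_shaded_color, ...) or applied to an existing cloud with word_cloud.recolor(color_func=size_shaded_color). As the description says, it then takes precedence over colormap.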
3. Image
The image file is named mask_png.png.
4. Test Chinese document
Title: 脚抽筋怎么办 ("What to do about foot cramps"). URL: http://health.china.com/html/jiankang/jijiuzhinan/richangjijiu/201603/26-328450.html
5. Code
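# -*- coding: utf-8 -*-
import jieba
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud


def doWordcloud():
    # Read the article text (UTF-8)
    with open('test.txt', 'r', encoding='UTF-8') as f:
        comment_text = f.read()
    # Chinese has no spaces between words: segment with jieba,
    # then join the tokens with spaces so WordCloud can split them
    cut_text = " ".join(jieba.cut(comment_text))
    # Load the mask image as an array (scipy.misc.imread was removed in SciPy 1.2)
    color_mask = np.array(Image.open("mask_png.png"))
    cloud = WordCloud(
        # Set a font with Chinese glyphs, otherwise the text is garbled.
        # On Windows 7 the fonts are in C:\Windows\Fonts;
        # on other systems give the full path to such a font
        font_path="simsun.ttc",
        mask=color_mask,  # with a mask, width/height are ignored
        max_words=200,
        max_font_size=80,
        width=1000,
        height=1000
    )
    word_cloud = cloud.generate(cut_text)  # generate the word cloud
    # word_cloud.to_file("pic.jpg")  # save the image
    plt.imshow(word_cloud)
    plt.axis('off')
    plt.show()


if __name__ == "__main__":
    doWordcloud()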
Notes: test.txt contains the text of the article 脚抽筋怎么办.
mask_png.png is the little-girl picture shown above.
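Chinese has no spaces between words, so the code above first segments the text with jieba and joins the tokens with spaces. WordCloud then tokenizes that string with its regexp. The sketch below approximates that step using only re and collections.Counter, with the regexp quoted in the API list (r"\w[\w']+", the default in the version documented there; newer releases may use a different default). It is an assumption, not the library's code, and it skips plural normalization, collocations and case handling. The token string is typed by hand rather than produced by jieba, and the helper name count_tokens is hypothetical. It shows one practical consequence: a token needs at least two word characters, so single-character Chinese words such as 脚 disappear from the cloud.

```python
import re
from collections import Counter

# Regexp quoted in the API list (the default in the documented version)
DEFAULT_REGEXP = r"\w[\w']+"


def count_tokens(space_joined_text, regexp=DEFAULT_REGEXP, stopwords=()):
    """Rough, hypothetical approximation of WordCloud's tokenizing step."""
    tokens = re.findall(regexp, space_joined_text)
    return Counter(t for t in tokens if t not in stopwords)


# Hand-written stand-in for " ".join(jieba.cut(...)) output
text = "脚 抽筋 怎么办 ， 腿部 抽筋 时 要 注意 保暖"

print(count_tokens(text))  # 抽筋 counted twice; 脚, 时, 要 are dropped
print(count_tokens(text, regexp=r"\w+", stopwords={"要", "时"}))  # keeps 脚
```

If single characters matter, one option is to pass regexp=r"\w+" to WordCloud. Another is to build the counts yourself and call WordCloud.generate_from_frequencies, which takes a word-to-frequency dict and skips tokenization. The built-in STOPWORDS list is English, so Chinese stopwords have to be supplied as your own set.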
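A mask only has to be an array whose pure-white pixels mark the area that stays empty. mask_png.png plays that role in the code above. The sketch below builds a circular mask programmatically instead, assuming numpy is installed (wordcloud itself depends on it). The hypothetical helper circle_mask sets pixels to 255 (white) outside the circle and 0 inside, so words would be drawn only inside the circle.

```python
import numpy as np


def circle_mask(size=300, radius=120):
    """Hypothetical helper: square uint8 mask, white (255) outside a circle.

    WordCloud treats pure-white entries as masked out, so words would
    only be placed where the value is 0 (inside the circle).
    """
    y, x = np.ogrid[:size, :size]          # row and column index grids
    center = (size - 1) / 2
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    mask = np.zeros((size, size), dtype=np.uint8)
    mask[outside] = 255
    return mask


mask = circle_mask()
print(mask.shape)                   # (300, 300)
print(mask[0, 0], mask[150, 150])   # 255 0
```

It would be passed as WordCloud(mask=circle_mask(), font_path=..., background_color="white"). WordCloud then ignores width and height and uses the array's 300 x 300 shape as the canvas size. An image loaded as an array works the same way, provided its background is pure white.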
6. Result
[Author: happyprince; http://blog.csdn.net/ld326/article/details/78341147]