Looking Up English Words with Python
2011-01-11 12:41
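An English exam is coming up and there is a big pile of vocabulary to learn, so I decided to let Python fetch the Chinese meanings of the words from the web automatically.
For now the script just grabs the text blindly. While writing it, I finally realized how serious encoding problems can be. Next time I will definitely use the chardet library; it is quick and convenient...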
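The encoding trouble mentioned above comes down to not knowing which encoding a page's bytes are in. chardet is a third-party library and is not shown here; as a rough stand-in that uses only the standard library, one can try a short list of candidate encodings in order. This is only a sketch: the name `decode_with_fallback` and the candidate list are assumptions made for illustration, and the code targets Python 3 rather than the Python 2 used in the script below.

```python
def decode_with_fallback(raw, candidates=("utf-8-sig", "gb18030")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    `candidates` is an assumed list: UTF-8 (with or without BOM) first,
    then GB18030, which is a superset of GB2312 and GBK.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and replace the bytes that do not fit.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"
```

Calling `decode_with_fallback(page_bytes)` on the raw bytes of a downloaded page gives back the decoded text together with the encoding that worked, so the choice can be logged instead of guessed. Unlike chardet, this does no statistical detection; it simply stops at the first encoding that raises no error.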
# Python 2 script.
# http://dict.youdao.com/search?q=hello&tab=chn&keyfrom=dict.result can't be used here,
# because of a Python bug
import urllib
from BeautifulSoup import BeautifulSoup
import sys
# `file` (the output file) is a module-level name assigned in the __main__ block
def getWebContent(url, word):
    # Fetch the page and re-encode it from GB2312 to UTF-8
    html = urllib.urlopen(url).read()
    #html = html.decode("gb2312","ignore").encode("utf-8","ignore")
    html = unicode(html, "gb2312", "ignore").encode("utf-8", "ignore")
    soup = BeautifulSoup(html)
    # Filter 1: keep only the <div class="explain"> block
    data = str(soup.find("div", {"class": "explain"}))
    #strContent = data.renderContents()+"\n"  # by default the string is treated as ASCII,
    #                                         # but the data is really UTF-8, because
    #                                         # BeautifulSoup uses it...
    # Filter 2: re-parse that block and keep only its text nodes
    soup = BeautifulSoup(data)
    # BeautifulSoup generators: http://www.crummy.com/software/BeautifulSoup/documentation.zh.html#Generators
    outtext = ''.join([element for element in soup.recursiveChildGenerator() if isinstance(element, unicode)])
    # Simple rendering: start a new line before every digit 1-9
    for item in range(1, 10):
        outtext = outtext.replace(str(item), "\n%s" % str(item))
    outtext = outtext.replace(" ", "\n")
    outtext = word + ":\n" + outtext + "\n"
    file.write(outtext)
    # outtext is already unicode here, so encode it directly for a GBK console
    print outtext.encode("gbk", "ignore")
def word_FromFile():
    file = open("F:/Whu/EnghlishWords.txt", "r")
    for word in file.readlines():
        word = word.strip()  # drop the trailing newline so it does not end up in the URL
        if not word:
            continue  # skip blank lines
        print isinstance(word, unicode)
        print word.decode("utf-8")
        # Be careful here!
        # Notepad stores UTF-8 text with a 3-byte BOM (byte order mark)
        # at the start of the file, which has to be stripped:
        # if data[:3] == codecs.BOM_UTF8:
        #     data = data[3:]
        # print data.decode("utf-8")
        url = "http://dict.baidu.com/s?wd=%s" % word
        getWebContent(url, word)
    file.close()
if __name__ == '__main__':
    # Python 2 hack: make UTF-8 the default for implicit str/unicode conversions
    reload(sys)
    sys.setdefaultencoding('utf-8')
    file = open("F:/Whu/EnghlishWords_translate.txt", 'w')
    word_FromFile()
    file.flush()
    file.close()
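The BOM caveat in the comments of word_FromFile can be handled without slicing bytes by hand. As a hedged sketch in Python 3 (not the Python 2 of the script above), the standard `utf-8-sig` codec drops a leading BOM while decoding. The helper names `read_words` and `build_url` are hypothetical; the URL template is the one used in the script, and whether that address still answers is not assumed here.

```python
import io
from urllib.parse import quote


def read_words(binary_stream):
    """Decode a UTF-8 byte stream, dropping a leading BOM, and return its words.

    One word per line is assumed; blank lines are skipped.
    """
    text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8-sig")
    return [line.strip() for line in text_stream if line.strip()]


def build_url(word, template="http://dict.baidu.com/s?wd=%s"):
    """Insert the percent-encoded word into the query URL template."""
    return template % quote(word)
```

With a real word list, `read_words(open(path, "rb"))` would play the role of the `readlines()` loop, and `build_url(word)` replaces the plain `%` formatting so that spaces or non-ASCII characters in a word do not break the query string.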
This article comes from the “咖啡时间” blog; please keep this attribution: http://tuoxie174.blog.51cto.com/1446064/476486