
Looking Up English Words with Python

2011-01-11 12:41
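An English exam is coming up and there is a big pile of vocabulary to learn, so I decided to let Python look up the Chinese meanings of the words online for me.
For now the script just scrapes the definitions blindly. While writing it I finally understood how serious encoding problems can be. Next time I will definitely use the chardet library, which is quick and convenient...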
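As a rough illustration of the chardet idea mentioned above, the sketch below decodes raw page bytes by asking chardet for a guess and, when chardet is not installed or its guess does not work, by trying a short list of candidate encodings. It is written for Python 3 rather than the Python 2 used by the script in this post. The helper names `decode_with_candidates` and `detect_and_decode` and the candidate list are assumptions made for this example, not part of the original script.

```python
def decode_with_candidates(raw, candidates=("utf-8", "gb2312")):
    """Return raw bytes decoded with the first candidate encoding that fits."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway, replacing bytes that cannot be decoded.
    return raw.decode(candidates[0], errors="replace")


def detect_and_decode(raw):
    """Decode raw bytes using chardet's guess when chardet is available."""
    try:
        import chardet  # third-party and optional in this sketch
    except ImportError:
        return decode_with_candidates(raw)
    guess = chardet.detect(raw)  # dict with "encoding" and "confidence" keys
    encoding = guess.get("encoding")
    if encoding:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            pass  # the guess was unusable, fall through to the candidates
    return decode_with_candidates(raw)
```

In a scraper, the bytes returned by the HTTP request would be passed to `detect_and_decode` before parsing, instead of hard-coding "gb2312". Detection is a guess, so very short inputs may still be decoded with the wrong encoding.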


# http://dict.youdao.com/search?q=hello&tab=chn&keyfrom=dict.result can't be used here,
# because of a Python bug
import codecs
import sys
import urllib

from BeautifulSoup import BeautifulSoup


def getWebContent(url, word, outfile):
    html = urllib.urlopen(url).read()
    #html = html.decode("gb2312","ignore").encode("utf-8","ignore")
    html = unicode(html, "gb2312", "ignore").encode("utf-8", "ignore")
    soup = BeautifulSoup(html)

    # filter 1: keep only the <div class="explain"> block
    data = str(soup.find("div", {"class": "explain"}))
    #strContent = data.renderContents()+"\n"  # by default the string is treated as ASCII,
    #                                         # but it is really UTF-8, because
    #                                         # BeautifulSoup uses it...

    # filter 2: re-parse that block and collect its text nodes
    soup = BeautifulSoup(data)
    # BeautifulSoup generators: http://www.crummy.com/software/BeautifulSoup/documentation.zh.html#Generators
    outtext = ''.join([element for element in soup.recursiveChildGenerator() if isinstance(element, unicode)])

    # some rendering: start a new line before each digit 1-9 and at each space
    for item in range(1, 10):
        outtext = outtext.replace(str(item), "\n%s" % str(item))
    outtext = outtext.replace(" ", "\n")
    outtext = word + ":\n" + outtext + "\n"

    outfile.write(outtext)
    print outtext.decode("utf-8").encode("gbk")


def word_FromFile(outfile):
    infile = open("F:/Whu/EnghlishWords.txt", "r")
    for word in infile.readlines():
        # Be careful: the word list is saved as UTF-8 in Notepad, which
        # adds a 3-byte BOM at the start of the file. Strip it before use.
        if word[:3] == codecs.BOM_UTF8:
            word = word[3:]
        word = word.strip()  # drop the trailing newline so the URL stays valid
        if not word:
            continue
        print isinstance(word, unicode)
        print word.decode("utf-8")

        url = "http://dict.baidu.com/s?wd=%s" % urllib.quote(word)
        getWebContent(url, word, outfile)
    infile.close()


if __name__ == '__main__':
    # Python 2 workaround: make UTF-8 the default codec for implicit
    # str/unicode conversions
    reload(sys)
    sys.setdefaultencoding('utf-8')

    outfile = open("F:/Whu/EnghlishWords_translate.txt", 'w')
    word_FromFile(outfile)
    outfile.flush()
    outfile.close()
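
The BOM caveat noted in the comments of word_FromFile can be sketched in Python 3 as follows. This is a minimal example under assumptions, not the script's own code: `strip_utf8_bom` and `read_words` are hypothetical helper names, and the word list is assumed to hold one word per line. The first helper removes the three BOM bytes by hand; the second relies on the standard "utf-8-sig" codec, which skips a BOM when one is present.

```python
import codecs


def strip_utf8_bom(raw):
    """Remove a leading UTF-8 BOM (3 bytes) from raw bytes, if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def read_words(path):
    """Read one word per line; 'utf-8-sig' skips a BOM when there is one."""
    with open(path, encoding="utf-8-sig") as handle:
        # Strip newlines and drop empty lines.
        return [line.strip() for line in handle if line.strip()]
```

Using "utf-8-sig" is usually the simpler choice, because it also works for files saved without a BOM.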


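The rendering step in getWebContent starts a new line before every digit from 1 to 9, which also splits multi-digit numbers. A possible alternative, sketched here in Python 3, uses a regular expression instead. It assumes that each sense in the scraped text is introduced by a number followed by a period (for example "1."), which the original page may or may not do; `format_senses` is a hypothetical name.

```python
import re


def format_senses(text):
    """Start a new line before each numbered sense such as '1.' or '12.'."""
    # \s* swallows the whitespace before the number; (?<!\d) makes sure the
    # match begins at the first digit, so '12.' is kept in one piece.
    return re.sub(r"\s*(?<!\d)(\d+\.)", r"\n\1", text).strip()
```

Note that this sketch would also break a decimal such as "3.5" onto a new line, so it only suits text where numbers appear as sense markers.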

This article comes from the “咖啡时间” blog. Please keep this attribution: http://tuoxie174.blog.51cto.com/1446064/476486