
[Python Crawler] 2. Douban Books Top 250

2017-06-24 21:34
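# Douban Books Top 250

import requests
from bs4 import BeautifulSoup

# Assumption: Douban tends to reject the default requests User-Agent,
# so a browser-like one is sent; adjust the value if needed.
headers = {'User-Agent': 'Mozilla/5.0'}

# Open the output file once instead of reopening it for every book.
with open('E:/豆瓣图书Top250.txt', 'a+', encoding='utf-8') as f:
    for page in range(10):  # 10 pages x 25 books per page = 250 books
        url = 'https://book.douban.com/top250?start={}'.format(page * 25)
        r = requests.get(url, headers=headers, timeout=10).text
        bsObj = BeautifulSoup(r, 'html.parser')
        # Each book sits in a <td valign="top"> that has no width attribute.
        td_tags = bsObj.find_all('td', {'valign': 'top', 'width': None})
        #print(td_tags)
        for td_tag in td_tags:
            try:
                name = td_tag.find('a').get_text().replace('\n', '').replace(' ', '')
                info = td_tag.find('p', {'class': 'pl'}).get_text()
                rating_nums = td_tag.find('div', {'class': 'star clearfix'}).get_text().replace('\n', '').replace(' ', '')
                jianjie = td_tag.find('span', {'class': 'inq'}).get_text()
            except AttributeError:
                # find() returned None: the cell lacks one of the expected tags.
                # Note that books without a one-line quote (span.inq) are skipped too.
                continue
            dd = name + '\n' + info + '\n' + rating_nums + '\n' + jianjie + '\n'
            #print(dd)
            f.write(dd + '\n')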
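
The paging and the record layout used by the script can be isolated into two small helpers. This is a minimal sketch, not part of the original post: `BASE_URL`, `page_urls` and `format_record` are hypothetical names, and writing an empty line for a missing one-line quote is an assumption (the script above skips such books instead).

```python
# Hypothetical constant: the listing URL with its `start` offset left open.
BASE_URL = 'https://book.douban.com/top250?start={}'


def page_urls(pages=10, page_size=25):
    """Return the listing URLs; `start` is the offset of the first book on each page."""
    return [BASE_URL.format(page * page_size) for page in range(pages)]


def format_record(name, info, rating, quote=None):
    """Join one book's fields with newlines, in the same layout the script writes.

    Assumption: a missing quote becomes an empty line rather than dropping the book.
    """
    return '\n'.join([name, info, rating, quote or '']) + '\n'


if __name__ == '__main__':
    # Print the first two listing URLs without making any network request.
    for url in page_urls(pages=2):
        print(url)
```

With the defaults, `page_urls()` yields ten URLs whose offsets run from 0 to 225 in steps of 25, which matches the `range(10)` loop in the script.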