Python Crawler: Scraping the Douban Books Top 250
2017-10-30 17:23
# -*- coding:utf-8 -*-
# author: yukun
import requests
from bs4 import BeautifulSoup

# Fetch the HTML source of a page
def get_html(url):
    # Pretend to be a browser
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
    resp = requests.get(url, headers=headers).text
    return resp

# Parse each page and extract the data
def html_parse():
    # Iterate over every page URL
    for url in all_page():
        # Parse with BeautifulSoup
        soup = BeautifulSoup(get_html(url), 'lxml')
        # Book titles
        alldiv = soup.find_all('div', class_='pl2')
        names = [a.find('a')['title'] for a in alldiv]
        # Authors
        allp = soup.find_all('p', class_='pl')
        authors = [p.get_text() for p in allp]
        # Ratings
        starspan = soup.find_all('span', class_='rating_nums')
        scores = [s.get_text() for s in starspan]
        # One-line summaries
        sumspan = soup.find_all('span', class_='inq')
        summaries = [i.get_text() for i in sumspan]
        # Caveat: zip() stops at the shortest list, so if any book on the
        # page has no summary ('inq' span) the columns fall out of
        # alignment and trailing books are dropped.
        for name, author, score, summary in zip(names, authors, scores, summaries):
            data = ('书名:' + name + '\n'
                    + '作者:' + author + '\n'
                    + '评分:' + score + '\n'
                    + '简介:' + summary + '\n')
            # Save the record
            f.writelines(data + '=======================' + '\n')

# Build the list of all page URLs
def all_page():
    base_url = 'https://book.douban.com/top250?start='
    urllist = []
    # start runs from 0 to 225 in steps of 25
    for page in range(0, 250, 25):
        allurl = base_url + str(page)
        urllist.append(allurl)
    return urllist

# Output file
filename = '豆瓣图书Top250.txt'
f = open(filename, 'w', encoding='utf-8')
# Run the crawl
html_parse()
f.close()
print('保存成功。')
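As the caveat in the code notes, collecting four independent lists and zipping them misaligns the fields whenever a book has no one-line summary. A more robust approach is to parse each book's own container element, so missing fields stay attached to the right book. The sketch below illustrates this on a small inline HTML sample; the `tr.item` wrapper and the `parse_books` helper are assumptions for illustration, not part of the original post, and the markup of the real page may differ.

```python
from bs4 import BeautifulSoup

# A miniature stand-in for one page of results; the second book has no summary.
SAMPLE = """
<table>
<tr class="item"><td>
  <div class="pl2"><a title="活着">活着</a></div>
  <p class="pl">余华 / 作家出版社</p>
  <span class="rating_nums">9.4</span>
  <span class="inq">生的苦难与伟大</span>
</td></tr>
<tr class="item"><td>
  <div class="pl2"><a title="小王子">小王子</a></div>
  <p class="pl">圣埃克苏佩里</p>
  <span class="rating_nums">9.0</span>
</td></tr>
</table>
"""

def parse_books(html):
    soup = BeautifulSoup(html, 'html.parser')
    books = []
    # Walk book by book instead of building parallel lists,
    # so a missing field cannot shift later books' data.
    for item in soup.find_all('tr', class_='item'):
        inq = item.find('span', class_='inq')
        books.append({
            'name': item.find('div', class_='pl2').find('a')['title'],
            'author': item.find('p', class_='pl').get_text(),
            'score': item.find('span', class_='rating_nums').get_text(),
            'summary': inq.get_text() if inq else '',  # tolerate missing summary
        })
    return books

for book in parse_books(SAMPLE):
    print(book['name'], book['score'], book['summary'])
```

With this structure, the second book simply gets an empty summary instead of stealing the next book's, and no trailing books are dropped.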