
Question: when debugging in PyCharm, how do I step into a function that is invoked through a callback?

Posted 2018-02-10 22:14, 483 views
import scrapy
import re
from scrapy.http import Request
from urllib import parse

class JobboleSpider(scrapy.Spider):
    name = 'jobbole'
    allowed_domains = ['blog.jobbole.com']
    start_urls = ['http://python.jobbole.com/all-posts/']

    def parse(self, response):
        """
        1. Extract the article URLs from the list page and hand them to scrapy to download and parse
        2. Extract the next-page URL and hand it to scrapy; once downloaded it goes back to parse
        """
        # Collect every article URL on the current page and schedule it for download
        post_nodes = response.css("#archive .floated-thumb .post-thumb a")
        for post_node in post_nodes:
            image_url = post_node.css("img::attr(src)").extract_first("")
            post_url = post_node.css("::attr(href)").extract_first("")
            yield Request(url=parse.urljoin(response.url, post_url), meta={"front_image_url": image_url},
                          callback=self.parse_detail)

        # Extract the next-page URL
        next_page = response.css(".next.page-numbers::attr(href)").extract_first()
        if next_page:
            yield Request(url=parse.urljoin(response.url, next_page), callback=self.parse)

    def parse_detail(self, response):
        """
        1. Parse the content of the article page
        2. Plan to download it
        """
        # Title, creation date, upvote count
        title = response.css(".entry-header > h1::text").extract_first()
        create_data = response.css(".entry-meta-hide-on-mobile::text").extract_first("").strip().replace('·', '').strip()
        praise_num = response.css(".vote-post-up h10::text").extract_first()
        # Bookmark count (\d+ so that multi-digit numbers are captured in full)
        fav_num = response.css(".bookmark-btn::text").extract_first("").strip()
        fav_num_re = re.match(r".*?(\d+).*", fav_num)
        if fav_num_re:
            fav_num = int(fav_num_re.group(1))
        else:
            fav_num = 0
        # Comment count
        comment_num = response.css('a[href="#article-comment"] span::text').extract_first("")
        comment_num_re = re.match(r".*?(\d+).*", comment_num)
        if comment_num_re:
            comment_num = int(comment_num_re.group(1))
        else:
            comment_num = 0
        content = response.css('div.entry').extract_first("")
        # Tags (drop the "N 评论" comments link, which is not a tag)
        tag_list = response.css('.entry-meta-hide-on-mobile a::text').extract()
        tag_list = [element for element in tag_list if not element.strip().endswith('评论')]
        tags = ",".join(tag_list)
        pass
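
A side note on the bookmark and comment counts in parse_detail: the capture group needs \d+ rather than \d, otherwise only the first digit of a multi-digit number is kept. The sketch below is standalone and only illustrative. The helper name extract_int and the sample strings are hypothetical and not part of the original post, and it uses re.search in place of the re.match pattern above as a simplification.

```python
import re


def extract_int(text, default=0):
    """Return the first run of digits in `text` as an int, or `default` if none.

    Hypothetical helper for illustration; not part of the original spider.
    """
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else default


# A single \d captures only one digit of a multi-digit count.
one_digit = re.match(r".*?(\d).*", " 12 收藏").group(1)

# \d+ captures the whole number.
full_number = extract_int(" 12 收藏")

# Text without any digits falls back to the default.
no_number = extract_int(" 收藏")
```

Both the bookmark count and the comment count could go through the same helper, which would also keep the result type consistent (always an int).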
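I'm learning to use scrapy to crawl the information of every article on jobbole, and I want to check that the article detail pages are being parsed correctly. When I single-step in the debugger, I cannot get from the line yield Request(url=parse.urljoin(response.url, post_url), meta={"front_image_url": image_url}, callback=self.parse_detail) into the parse_detail function. I also set a breakpoint inside parse_detail, but execution just keeps going around the for loop at the Request line and never enters parse_detail. Does anyone know a good way to handle this?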
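
The behavior described above is probably expected: yield Request(...) only creates a request object and hands it to the framework, and the callback is invoked later by framework code, after the page has been downloaded. The sketch below is a toy model of that idea in plain Python. It is not Scrapy's real engine (which is asynchronous and built on Twisted); the Request class, the run function, and the URLs here are simplified stand-ins invented for illustration.

```python
from collections import deque


class Request:
    """Toy stand-in for scrapy.http.Request: stores only a URL and a callback."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


calls = []  # records the order in which things happen


def parse(response):
    # Mirrors the for loop in the spider: each iteration only yields a request.
    for i in (1, 2):
        calls.append(f"yield request {i}")
        yield Request(url=f"detail-{i}", callback=parse_detail)


def parse_detail(response):
    calls.append(f"parse_detail({response})")
    return []  # no follow-up requests


def run(start_url, start_callback):
    """Toy engine: a callback runs only when its request is taken off the queue."""
    queue = deque([Request(start_url, start_callback)])
    while queue:
        request = queue.popleft()
        response = request.url  # pretend the download simply returns the URL
        for new_request in request.callback(response) or []:
            queue.append(new_request)


run("list-page", parse)
print(calls)
```

In this toy run both yield lines execute before either parse_detail call, which matches the loop seen in the debugger. Stepping into the yield line therefore cannot reach parse_detail; a breakpoint inside parse_detail should be hit only if the crawl itself runs under the PyCharm debugger, for example from a small script that calls scrapy.cmdline.execute(["scrapy", "crawl", "jobbole"]), and only if the detail requests are actually downloaded. One thing that may be worth checking, as an assumption about this site: allowed_domains is blog.jobbole.com while start_urls uses python.jobbole.com, so if the article links point to python.jobbole.com, Scrapy's offsite filtering may drop them and the callback would never run.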
Tags: help request