URLError: <urlopen error [Errno 10051] >
2016-09-06 14:58
While writing a simple little crawler, I ran into the following error when running it from the command line:
Traceback (most recent call last):
  File "E:\Anaconda2\lib\site-packages\boto\utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "E:\Anaconda2\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "E:\Anaconda2\lib\urllib2.py", line 449, in _open
    '_open', req)
  File "E:\Anaconda2\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "E:\Anaconda2\lib\urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "E:\Anaconda2\lib\urllib2.py", line 1197, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10051] >
After some searching I found the cause: that particular error message is generated by boto (boto 2.38.0 py27_0), which Scrapy uses to connect to Amazon S3. Scrapy doesn't have this enabled by default. (Errno 10051 is the Winsock error WSAENETUNREACH, "network is unreachable", raised here while boto tries to fetch AWS instance metadata.)
Fix: add the following to settings.py:

DOWNLOAD_HANDLERS = {'s3': None,}
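To illustrate why this one line helps: in Scrapy's DOWNLOAD_HANDLERS setting, mapping a scheme to None disables the handler for that scheme, so boto is never loaded for s3:// URLs. The sketch below mimics that merge-and-disable semantics with placeholder handler names (these are not Scrapy's real internal classes):

```python
# Minimal sketch of "scheme -> None disables the handler" semantics.
# Handler names are placeholders, not Scrapy's actual classes.
DOWNLOAD_HANDLERS = {'s3': None}  # the fix from settings.py

def enabled_schemes(defaults, overrides):
    """Merge user overrides into the defaults; a scheme mapped to
    None counts as disabled and is dropped from the result."""
    merged = dict(defaults)
    merged.update(overrides)
    return sorted(s for s, handler in merged.items() if handler is not None)

defaults = {
    'http': 'HTTPDownloadHandler',
    'https': 'HTTPDownloadHandler',
    's3': 'S3DownloadHandler',
}
print(enabled_schemes(defaults, DOWNLOAD_HANDLERS))  # ['http', 'https']
```

With the override applied, the s3 scheme simply has no handler, so nothing ever imports boto and the startup error disappears.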
Here is also a very simple crawler for haitou.cc (海投网, a campus-recruiting site):
items.py is as follows:

import scrapy
from scrapy.item import Item, Field

class XuanjianghuiItem(Item):
    # define the fields for your item here, like:
    title = Field()
    holdTime = Field()

settings.py is as follows:
BOT_NAME = 'XuanJiangHui'

SPIDER_MODULES = ['XuanJiangHui.spiders']
NEWSPIDER_MODULE = 'XuanJiangHui.spiders'

DOWNLOAD_HANDLERS = {'s3': None,}

ITEM_PIPELINES = {
    'XuanJiangHui.pipelines.XuanjianghuiPipeline': 300,
}
pipelines.py is as follows:

import codecs

class XuanjianghuiPipeline(object):
    def __init__(self):
        self.file = codecs.open('F://XuanJiangHui.txt', 'wb', encoding='utf-8')

    def process_item(self, item, spider):
        title = item['title'].strip()
        holdTime = item['holdTime']
        self.file.write(title + '\n' + holdTime)
        self.file.write('\r\n')
        self.file.write('\r\n')
        return item

XuanJiangHui.py is as follows:
# -*- coding:utf-8 -*-
from scrapy.spider import Spider
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from XuanJiangHui.items import XuanjianghuiItem

class XuanjianghuiSpider(Spider):
    name = "XuanJiangHui"
    download_delay = 1  # originally misspelled "download_deplay", which Scrapy silently ignores
    start_urls = [
        "http://xjh.haitou.cc/wh/uni-1",
        "http://xjh.haitou.cc/bj/uni-13",
        "http://xjh.haitou.cc/cd/uni-147",
        "http://xjh.haitou.cc/hf/uni-47",
        "http://xjh.haitou.cc/gz/uni-32",
        "http://xjh.haitou.cc/gz/uni-34",
        "http://xjh.haitou.cc/gz/uni-36"
    ]
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

    def parse(self, response):
        sel = HtmlXPathSelector(response)
        for tr in sel.xpath('//div[@id="w0"]//tbody/tr'):
            # create a fresh item per row (the original reused one item
            # instance, so every yielded item shared mutable state)
            item = XuanjianghuiItem()
            title = tr.xpath('./td[@class="cxxt-title"]/a/@title')
            holdTime = tr.xpath('./td[@class="text-left cxxt-holdtime"]/span[@class="hold-ymd"]/text()')
            item['title'] = title.extract()[0]
            item['holdTime'] = holdTime.extract()[0]
            yield item
        urls = sel.xpath('//*[@id="w0"]/ul/li[@class="next"]/a/@href').extract()
        for url in urls:
            url = "http://xjh.haitou.cc" + url
            yield Request(url, headers=self.header, callback=self.parse)
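The spider follows pagination by prefixing the site root onto the relative href of the "next" link. As a small aside, the standard library's urljoin does the same concatenation more robustly, since it also handles hrefs that are already absolute; the href value below is hypothetical, just for illustration:

```python
# Building the next-page URL: manual prefixing vs. urljoin.
try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2, as used in the post

href = "/wh/uni-1?page=2"              # hypothetical next-page href
manual = "http://xjh.haitou.cc" + href
robust = urljoin("http://xjh.haitou.cc", href)
print(manual)  # http://xjh.haitou.cc/wh/uni-1?page=2
print(robust)  # http://xjh.haitou.cc/wh/uni-1?page=2
```

For root-relative hrefs like these the two agree, but urljoin won't produce a malformed URL if the site ever emits an absolute link.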