
Fixing a Scrapy installation error on Mac OS X 10.10: Failed building wheel for lxml


Fixing a Scrapy installation error on Mac OS X 10.10

Scrapy is a Python-based web crawling framework. It is concise and cross-platform, and well suited to the rapid development of crawler software.

The official Scrapy installation documentation says to install it with pip:

pip install Scrapy

However, running the command above on OS X 10.10 fails because the lxml module cannot be compiled. The error message is:

Failed building wheel for lxml


The more detailed error output looks something like:

'CC' can't be find


This error occurs because installing Scrapy with pip pulls in the lxml module as a dependency, and pip's default behavior is to download lxml's source code and compile it. Since the Mac terminal has no environment variable pointing at a C compiler, the build fails.
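If you do want to build lxml from source, one common remedy (mentioned here only as an aside, not something this post relies on) is to install the Xcode Command Line Tools so that a C compiler becomes available, and then rerun pip:

xcode-select --install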

If you have no need to compile C code on your Mac, you can simply install a pre-built (binary) lxml instead. The steps are as follows:

Download and install MacPorts

Download the Yosemite build of MacPorts from the MacPorts website and install it.

Install the binary lxml package by running the following command in a terminal:

sudo port install py27-lxml
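
To confirm that the binary lxml is importable before moving on, you can try loading it from the MacPorts Python. This assumes MacPorts has put its python2.7 on your PATH, which the installer normally arranges:

python2.7 -c "import lxml.etree; print lxml.etree.LXML_VERSION"

It should print lxml's version tuple rather than raise an ImportError.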

Install Scrapy:

sudo pip install Scrapy

Wait until the terminal reports Done; the installation is then complete. You can run the following command to perform a quick Scrapy performance test:

scrapy bench

If the installation succeeded, the output should look something like this:

White-Knight:~ zhengcai$ scrapy bench
2015-05-27 19:54:07+0800 [scrapy] INFO: Scrapy 0.24.6 started (bot: scrapybot)
2015-05-27 19:54:07+0800 [scrapy] INFO: Optional features available: ssl, http11
2015-05-27 19:54:07+0800 [scrapy] INFO: Overridden settings: {'CLOSESPIDER_TIMEOUT': 10, 'LOG_LEVEL': 'INFO', 'LOGSTATS_INTERVAL': 1}
2015-05-27 19:54:07+0800 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-05-27 19:54:07+0800 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-05-27 19:54:07+0800 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-05-27 19:54:07+0800 [scrapy] INFO: Enabled item pipelines:
2015-05-27 19:54:07+0800 [follow] INFO: Spider opened
2015-05-27 19:54:07+0800 [follow] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:09+0800 [follow] INFO: Crawled 106 pages (at 6360 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:09+0800 [follow] INFO: Crawled 195 pages (at 5340 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:11+0800 [follow] INFO: Crawled 298 pages (at 6180 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:11+0800 [follow] INFO: Crawled 394 pages (at 5760 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:12+0800 [follow] INFO: Crawled 490 pages (at 5760 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:13+0800 [follow] INFO: Crawled 586 pages (at 5760 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:14+0800 [follow] INFO: Crawled 674 pages (at 5280 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:16+0800 [follow] INFO: Crawled 770 pages (at 5760 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:17+0800 [follow] INFO: Crawled 850 pages (at 4800 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:18+0800 [follow] INFO: Crawled 939 pages (at 5340 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:18+0800 [follow] INFO: Closing spider (closespider_timeout)
2015-05-27 19:54:18+0800 [follow] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 345797,
 'downloader/request_count': 954,
 'downloader/request_method_count/GET': 954,
 'downloader/response_bytes': 1905484,
 'downloader/response_count': 954,
 'downloader/response_status_count/200': 954,
 'dupefilter/filtered': 1642,
 'finish_reason': 'closespider_timeout',
 'finish_time': datetime.datetime(2015, 5, 27, 11, 54, 18, 253375),
 'log_count/INFO': 17,
 'request_depth_max': 49,
 'response_received_count': 954,
 'scheduler/dequeued': 954,
 'scheduler/dequeued/memory': 954,
 'scheduler/enqueued': 17437,
 'scheduler/enqueued/memory': 17437,
 'start_time': datetime.datetime(2015, 5, 27, 11, 54, 7, 972465)}
2015-05-27 19:54:18+0800 [follow] INFO: Spider closed (closespider_timeout)

Once Scrapy has been installed successfully, you can start writing crawlers. Happy crawling!
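
If you want a slightly more concrete starting point than the benchmark, a minimal spider looks roughly like the sketch below. The file name, spider name, and URL are placeholders of my own choosing rather than anything prescribed by Scrapy; the API follows the scrapy.Spider style used by the 0.24-era documentation.

# title_spider.py -- a minimal example spider (all names and the URL are illustrative only)
import scrapy

class TitleSpider(scrapy.Spider):
    name = "title_example"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # log the page title extracted with an XPath selector
        for title in response.xpath("//title/text()").extract():
            self.log("page title: %s" % title)

Save it to a file and run it with scrapy runspider followed by the file name; the spider fetches each URL in start_urls and passes the response to parse.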