Installing the Scrapy crawler framework on Lubuntu 14.04 (Ubuntu)
2014-10-20 20:22
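Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing. Part of its appeal is that it is a framework, so anyone can adapt it to their own needs. It also provides base classes for several kinds of spiders, such as BaseSpider and sitemap spiders, and the latest version adds support for web 2.0 crawling.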
Prerequisites
Python 2.5, 2.6, 2.7 (3.x is not yet supported) (most Linux distributions ship with Python 2.7 by default)
Twisted 2.5.0, 8.0 or above (Windows users: you’ll need to install Zope.Interface and maybe pywin32 because of this Twisted bug)
w3lib
lxml or libxml2 (if using libxml2, version 2.6.28 or above is highly recommended)
simplejson (not required if using Python 2.6 or above)
python-dev (important: without it, installing pyopenssl fails because Python.h cannot be found)
pyopenssl (for HTTPS support. Optional, but highly recommended)
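Before installing anything, it can help to confirm that the interpreter falls inside the range listed above. The following is a minimal sketch, not part of the original instructions: the names MIN_VERSION, MAX_VERSION and is_supported are made up for illustration, and the range is simply copied from the prerequisites.

```python
import sys

# Range taken from the prerequisites above: Python 2.5 through 2.7.
MIN_VERSION = (2, 5)
MAX_VERSION = (2, 7)


def is_supported(version_info):
    """Return True if (major, minor) falls within the listed range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    if is_supported(sys.version_info):
        status = "inside"
    else:
        status = "outside"
    print("Python %d.%d is %s the range listed for this Scrapy release"
          % (sys.version_info[0], sys.version_info[1], status))
```

Running it with the system python on Ubuntu 14.04 should report that the interpreter is inside the range, assuming the default Python 2.7 is the one on PATH.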
---------------------------------------------
Installing Twisted
sudo apt-get install python-twisted python-libxml2 python-simplejson
Once that finishes, start python and check that Twisted was installed correctly.
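The post does not show the check itself. One possible way to do it, as a small sketch that runs as a script or line by line in the interpreter, is to try the import and print the version. The helper name twisted_version is hypothetical.

```python
def twisted_version():
    """Return the installed Twisted version string, or None if it cannot be imported."""
    try:
        import twisted
    except ImportError:
        return None
    return twisted.__version__


version = twisted_version()
if version is None:
    print("Twisted is not importable; re-run the apt-get step above")
else:
    print("Twisted %s is installed" % version)
```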
Installing python-dev
sudo apt-get install python-dev
Installing pyOpenSSL
wget http://pypi.python.org/packages/source/p/pyOpenSSL/pyOpenSSL-0.13.tar.gz#md5=767bca18a71178ca353dff9e10941929
tar -zxvf pyOpenSSL-0.13.tar.gz
cd pyOpenSSL-0.13
sudo python setup.py install
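The #md5=... fragment on the download URL carries the expected checksum, but none of the commands above compares it with the downloaded file. A minimal sketch of that comparison follows, assuming Python 2.6 or later (for hashlib and the with statement); the function names are made up for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at path, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare the file's digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For the tarball above, the call would be matches_md5("pyOpenSSL-0.13.tar.gz", "767bca18a71178ca353dff9e10941929"), run from the directory the file was downloaded into before unpacking it. The same helper applies to the pycrypto and Scrapy tarballs with the checksums from their URLs.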
Installing pycrypto
wget http://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.5.tar.gz#md5=783e45d4a1a309e03ab378b00f97b291
tar -zxvf pycrypto-2.5.tar.gz
cd pycrypto-2.5
sudo python setup.py install
Testing the installation
$ python
>>> import Crypto
>>> import twisted.conch.ssh.transport
>>> print Crypto.PublicKey.RSA
<module 'Crypto.PublicKey.RSA' from '/usr/python/lib/python2.5/site-packages/Crypto/PublicKey/RSA.pyc'>
>>> import OpenSSL
>>> import twisted.internet.ssl
>>> twisted.internet.ssl
<module 'twisted.internet.ssl' from '/usr/python/lib/python2.5/site-packages/Twisted-10.1.0-py2.5-linux-i686.egg/twisted/internet/ssl.pyc'>
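If you see output like this, the pyOpenSSL module is installed correctly. Otherwise, go back over the steps above (note that pycrypto must be installed as well).
Installing w3lib
First install python-setuptools
sudo apt-get install python-setuptools
Then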
sudo easy_install -U w3lib
Installing Scrapy
wget http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.3.tar.gz#md5=59f1225f7692f28fa0f78db3d34b3850
tar -zxvf Scrapy-0.14.3.tar.gz
cd Scrapy-0.14.3
sudo python setup.py install
Verifying the Scrapy installation
With the steps above done, Scrapy is installed. We can verify it from the command line:
$ scrapy
Scrapy 0.14.3 - no active project
Usage:
scrapy <command> [options] [args]
Available commands:
fetch Fetch a URL using the Scrapy downloader
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
Use "scrapy <command> -h" to see more info about a command
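The same check can be scripted, for example as part of a setup script. This is a rough sketch that relies only on the version command listed in the output above; command_output is a hypothetical helper, and it returns None when the executable is missing or exits with an error.

```python
import subprocess


def command_output(argv):
    """Run argv and return its decoded stdout, or None if it cannot be run or fails."""
    try:
        raw = subprocess.check_output(argv)
    except (OSError, subprocess.CalledProcessError):
        return None
    return raw.decode("utf-8", "replace").strip()


output = command_output(["scrapy", "version"])
if output is None:
    print("scrapy was not found on PATH or exited with an error")
else:
    print(output)
```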
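At this point Scrapy is installed successfully on Linux.
Time to start learning how to write crawlers.
I never got the installation working on Windows, which was frustrating: building pyOpenSSL failed every time with a complaint about a missing openssl/aes.h. None of the fixes I found online helped, and OpenSSL itself would not compile either. If anyone has run into the same problem, I would be glad to compare notes.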