
Python 3 web scraping: urllib + BeautifulSoup

2018-02-05 22:49

urllib

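In Python 2, HTTP requests could be sent with either of two libraries, urllib or urllib2. Python 3 no longer has urllib2. Its functionality was consolidated into a single urllib package, which consists of four modules: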

urllib.request for opening and reading URLs

urllib.error containing the exceptions raised by urllib.request

urllib.parse for parsing URLs

urllib.robotparser for parsing robots.txt files
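Here is a rough offline illustration of two of these modules. The sketch below builds a query URL with urllib.parse and checks paths against robots.txt rules with urllib.robotparser. The base URL, query parameters and robots.txt rules are hypothetical and chosen only for demonstration. Nothing is fetched over the network, because the rules are passed to RobotFileParser.parse() directly instead of being downloaded with read().

```python
from urllib.parse import urlencode, urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical site, used only for illustration
BASE = "http://www.example.com/books/"

# urllib.parse: resolve a relative link, then append a percent-encoded
# query string (non-ASCII text is encoded as UTF-8 by default)
search_url = urljoin(BASE, "search") + "?" + urlencode({"q": "三体", "page": 2})
parts = urlparse(search_url)
print(parts.netloc, parts.path, parts.query)

# urllib.robotparser: feed hypothetical robots.txt lines directly
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rules.can_fetch("*", "http://www.example.com/books/1_1094/"))  # → True
print(rules.can_fetch("*", "http://www.example.com/private/data"))    # → False
```

On a real site you would call set_url() and read() instead of parse(), so that the live robots.txt is consulted before crawling.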

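import urllib.request
from bs4 import BeautifulSoup

# Fetch the chapter index page; the site serves GBK-encoded HTML
response = urllib.request.urlopen("http://www.biqukan.com/1_1094/")
html = response.read().decode("gbk")
# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
div_bf = BeautifulSoup(html, "html.parser")
# The chapter list is inside <div class="listmain">
div = div_bf.find_all('div', class_='listmain')
# Search the first matching div directly instead of re-parsing its string form
a = div[0].find_all('a')
# Print each chapter title together with its link
for each in a:
    print(each.string, each.get('href'))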
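The script above calls urlopen with its defaults and hard-codes GBK. A slightly more defensive variant could send an explicit User-Agent header, set a timeout, prefer the charset declared in the response's Content-Type header, and report failures through the urllib.error exceptions. The following is a minimal sketch under those assumptions. The function name fetch_html, the User-Agent string and the 10-second timeout are hypothetical choices. The usage lines pass an RFC 2397 data: URL, which urlopen also accepts, so the example runs without network access.

```python
import base64
import urllib.error
import urllib.request

# Hypothetical User-Agent; some sites reject urllib's default "Python-urllib/3.x"
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}


def fetch_html(url, fallback_encoding="gbk", timeout=10):
    """Download url and return its text, or None if the request fails."""
    request = urllib.request.Request(url, headers=HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            raw = response.read()
            # Prefer the charset from the Content-Type header when one is sent
            charset = response.headers.get_content_charset() or fallback_encoding
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it must be caught first
        print("HTTP error:", err.code, err.reason)
        return None
    except urllib.error.URLError as err:
        print("Network error:", err.reason)
        return None
    # Replace undecodable bytes instead of raising UnicodeDecodeError
    return raw.decode(charset, errors="replace")


# Offline usage: a data: URL carrying GBK-encoded HTML
page = "<h1>第一章</h1>".encode("gbk")
data_url = "data:text/html;charset=gbk;base64," + base64.b64encode(page).decode("ascii")
print(fetch_html(data_url))
```

When the header carries no charset, the function falls back to GBK, which matches the original script's assumption about this site.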