
Python 3: Scraping Selected Information with requests and BeautifulSoup

2017-09-03 14:49
import os
import uuid
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

imgPath = r'D:\Users\Quincy_C\PycharmProjects\S6\bs模块\汽车图片'
# Create the folder up front so that the first image is saved as well
os.makedirs(imgPath, exist_ok=True)

response = requests.get(url='http://www.autohome.com.cn/news/')
response.encoding = response.apparent_encoding
bs = BeautifulSoup(response.text, features='html.parser')
bs_obj = bs.find(id="auto-channel-lazyload-article")
li_list = bs_obj.find_all('li')
for i in li_list:
    a = i.find('a')
    if a:
        txt = a.find('h3').text
        # src may be relative or protocol-relative (//host/...),
        # so resolve it against the page URL before downloading
        src = urljoin(response.url, a.find('img').attrs.get('src'))
        print(txt, src)
        # response.content holds the raw bytes of the response body
        imgContent = requests.get(src).content
        # A random UUID gives each image a unique file name
        imgUrl = str(uuid.uuid4()) + '.jpg'
        with open(os.path.join(imgPath, imgUrl), 'wb') as f:
            f.write(imgContent)


To save the images into a specific folder, you can do this:

with open(os.path.join(imgPath, imgUrl), 'wb') as f:
    f.write(imgContent)


Or switch the working directory to the folder first, after which a bare file name is enough in open():

os.chdir(imgPath)


Either works. I had done this before and then forgot it, so I am writing it down here.
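The two approaches can be compared side by side in a small sketch. The helper names `save_with_join` and `save_with_chdir` are hypothetical and do not come from the code above, and the bytes passed in stand for whatever `requests.get(...).content` returned.

```python
import os
import uuid


def save_with_join(img_content: bytes, folder: str) -> str:
    """Write the bytes into `folder` by building the full path explicitly."""
    os.makedirs(folder, exist_ok=True)        # create the folder if it is missing
    file_name = str(uuid.uuid4()) + '.jpg'    # random, practically unique name
    full_path = os.path.join(folder, file_name)
    with open(full_path, 'wb') as f:
        f.write(img_content)
    return full_path


def save_with_chdir(img_content: bytes, folder: str) -> str:
    """Switch the working directory to `folder`, then open by bare file name."""
    os.makedirs(folder, exist_ok=True)
    previous_dir = os.getcwd()
    os.chdir(folder)                          # affects every later relative path
    try:
        file_name = str(uuid.uuid4()) + '.jpg'
        with open(file_name, 'wb') as f:
            f.write(img_content)
        return os.path.abspath(file_name)
    finally:
        os.chdir(previous_dir)                # restore the original directory
```

`os.path.join` leaves the process state alone, which is probably the safer default. `os.chdir` changes how every later relative path is resolved, so the sketch restores the previous directory afterwards.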

Summary:

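requests

requests.get('url', headers=headers): send a GET request

response.encoding = response.apparent_encoding: set the encoding to the one detected from the response body

requests.get('url').text: the page content as a decoded string

requests.get('url').content: the raw bytes, for example of an image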
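The requests calls listed here can be wrapped in one small helper. This is only a sketch: the name `fetch`, the 10-second `timeout` default, and the `raise_for_status()` check are additions for illustration and are not part of the original script.

```python
import requests


def fetch(url, headers=None, timeout=10):
    """GET a URL and make .text decode with the detected encoding."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()   # raise an exception on 4xx / 5xx answers
    # apparent_encoding is guessed from the body itself; assigning it avoids
    # garbled text when the server declares no charset or a wrong one
    response.encoding = response.apparent_encoding
    return response
```

A call could look like `page = fetch('http://www.autohome.com.cn/news/', headers={'User-Agent': 'Mozilla/5.0'})`, where the User-Agent value is just a placeholder. After that, `page.text` is a `str` for BeautifulSoup and `page.content` is `bytes`, which is what gets written to disk for images.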

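BeautifulSoup

bs = BeautifulSoup(requests.get('url').text, features='html.parser')

bs.find('div', id=''): the first matching tag, or None if nothing matches

bs.find_all('div', id=''): a list of all matching tags

bs.find_all('div', class_=''): match by CSS class (the keyword is class_ with a trailing underscore, because class is a reserved word in Python)

a.attrs: a dictionary of the tag's attributes

a.attrs.get(''): the value of a single attribute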
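The BeautifulSoup calls in this summary can be tried without any network access on a small inline document. The markup below is made up for illustration and only loosely resembles a news list; the id `article-list`, the class `ad`, and the URLs are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not the real page
html = """
<div id="article-list">
  <ul>
    <li><a href="/news/1.html"><img src="//img.example.com/1.jpg"><h3>First car</h3></a></li>
    <li><a href="/news/2.html"><img src="//img.example.com/2.jpg"><h3>Second car</h3></a></li>
    <li class="ad"></li>
  </ul>
</div>
"""

bs = BeautifulSoup(html, features='html.parser')

container = bs.find('div', id='article-list')   # first match, or None
items = container.find_all('li')                # list of every <li> inside it
ads = bs.find_all('li', class_='ad')            # class_ because class is a keyword

results = []
for li in items:
    a = li.find('a')
    if a is None:                               # the ad <li> has no link
        continue
    title = a.find('h3').text
    src = a.find('img').attrs.get('src')        # .get returns None if src is absent
    results.append((title, src))

first_link = container.find('a')
link_attrs = first_link.attrs                   # plain dict of the tag's attributes
```

Checking for `None` before chaining further calls matters because `find` returns `None` when nothing matches, and calling `.find` or `.text` on `None` raises `AttributeError`.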