## Getting Started with Python Crawlers: A Detailed Look at the requests Library ##

2018-03-30 08:58
In short: `response.text` returns the body as a Unicode `str`,
while `response.content` returns it as raw `bytes` (binary data).
If you want text, use `response.text`.
If you want an image or another file, use `response.content`.
(`response.json()` parses the body as JSON and returns the resulting object.)
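The relationship between the three can be sketched in plain Python without a live request: `text` is just `content` decoded with the response's encoding, and `json()` is just `json.loads` applied to that text. The byte string below is a made-up stand-in for a response body:

```python
import json

# Pretend raw body and detected encoding, as requests would see them:
content = '{"name": "ljy", "age": 18}'.encode("utf-8")
encoding = "utf-8"

text = content.decode(encoding)   # roughly what .text gives you
data = json.loads(text)           # roughly what .json() gives you

print(type(content))  # <class 'bytes'>
print(type(text))     # <class 'str'>
print(data["name"])   # ljy
```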
```python
import requests

response = requests.get(url)
print(type(response))        # <class 'requests.models.Response'>
print(response.status_code)  # HTTP status code
print(type(response.text))   # <class 'str'>
print(response.text)         # response body as text (not the headers)
print(response.cookies)      # cookies set by the server
```
Basic GET request

Basic usage:

```python
import requests

response = requests.get(url)
print(response.text)
```
GET request with parameters

```python
import requests

data = {"name": "ljy", "age": 18}
response = requests.get(url, params=data)  # appended to the URL as a query string
print(response.text)
```
Parsing JSON

```python
import requests
import json

response = requests.get(url)
print(type(response.text))  # <class 'str'>
print(response.json())      # equivalent to json.loads(response.text)
print(type(response.json()))
```
Fetching binary data (images, video)

```python
import requests

response = requests.get(url)
print(type(response.text), type(response.content))  # <class 'str'> <class 'bytes'>
print(response.text)
print(response.content)  # raw binary body

with open(file_path, "wb") as f:  # file_path: where to save the download
    f.write(response.content)
# no explicit f.close() needed: the with block closes the file
```
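For large downloads, `response.content` holds the whole body in memory at once; requests also offers `response.iter_content(chunk_size=...)` for streaming to disk chunk by chunk. The loop below sketches that pattern, fed from an in-memory bytes object instead of a live response so it runs without a network:

```python
import io
import os
import tempfile

# Stand-in for response.raw: 4 header bytes plus 10,000 bytes of payload.
fake_body = io.BytesIO(b"\x89PNG" + b"\x00" * 10_000)

def iter_chunks(stream, chunk_size=1024):
    # Mirrors what iterating response.iter_content(chunk_size=1024) does.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

path = os.path.join(tempfile.gettempdir(), "download.bin")
with open(path, "wb") as f:
    for chunk in iter_chunks(fake_body):
        f.write(chunk)

print(os.path.getsize(path))  # 10004
```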
Adding headers

```python
import requests

headers = {"User-Agent": "Mozilla/5.0"}  # any real browser User-Agent string works
response = requests.get(url, headers=headers)
print(response.text)
```
POST request (form upload)

```python
import requests

data = {"name": "ljy", "age": 18}
response = requests.post(url, data=data)  # sent as a form body, not a query string
print(response.text)
```
With headers added:

```python
import requests

headers = {"User-Agent": "Mozilla/5.0"}
data = {"name": "ljy", "age": 18}
response = requests.post(url, data=data, headers=headers)
print(response.json())
```
Responses

Useful `Response` attributes:

```python
print(type(response.status_code), response.status_code)  # status code
print(type(response.headers), response.headers)          # response headers (not request headers)
print(type(response.cookies), response.cookies)          # cookies
print(type(response.url), response.url)                  # final URL of the request
print(type(response.history), response.history)          # redirect history
```
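Rather than comparing `status_code` by hand, you can let requests do the check: `raise_for_status()` raises `requests.exceptions.HTTPError` for any 4xx/5xx response and does nothing for a 2xx one. A hand-built `Response` object stands in for a real one below, so the sketch runs without a network (normally you would call this on the result of `requests.get`):

```python
import requests

r = requests.models.Response()  # stand-in for a real response
r.status_code = 404

try:
    r.raise_for_status()  # raises HTTPError for 4xx/5xx, no-op for 2xx
    print("ok")
except requests.exceptions.HTTPError:
    print("request failed with status", r.status_code)
```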
Advanced usage

File upload

```python
import requests

files = {"file": open(file_path, "rb")}  # file_path: path of the file to upload
response = requests.post(url, files=files)
print(response.text)
```
Getting cookies

The cookie jar behaves like a dictionary:

```python
import requests

response = requests.get(url)
print(response.cookies)
for key, value in response.cookies.items():
    print(key + "=" + value)
```
Session persistence

```python
import requests

s = requests.Session()
s.get(url_1)             # e.g. a login or cookie-setting request
response = s.get(url_2)  # cookies from the first request are sent automatically
print(response.text)
```
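What makes a `Session` different from bare `requests.get` calls is that it keeps one cookie jar for all of its requests. The sketch below sets a cookie by hand (standing in for one a server would send back on the first request) and shows it stays on the session; the cookie name and value are made up:

```python
import requests

s = requests.Session()
# A server response to s.get(url_1) would normally populate this jar;
# setting a cookie manually shows that it persists on the session.
s.cookies.set("sessionid", "abc123")
print(s.cookies.get("sessionid"))  # abc123
```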
Certificate verification

```python
import requests

response = requests.get(url, verify=False)
print(response.status_code)
```

`verify=False` skips TLS certificate verification for the site. The request still works, but requests emits a warning telling you the connection is unverified.

```python
import requests
from requests.packages import urllib3

urllib3.disable_warnings()
response = requests.get(url, verify=False)
print(response.status_code)
```

This suppresses the warning.
Supplying a certificate manually

```python
import requests

response = requests.get(url, cert=("/path/server.crt", "/path/key"))
print(response.status_code)
```
Proxy settings

```python
import requests

proxies = {"http": "http://127.0.0.1:9743",
           "https": "https://127.0.0.1:9743"}
response = requests.get(url, proxies=proxies)
print(response.status_code)
```
Timeout settings

```python
import requests

response = requests.get(url, timeout=1)  # seconds
print(response.status_code)
```
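When the timeout expires, requests raises `requests.exceptions.Timeout` (a subclass of the general `RequestException`), so the call is usually wrapped in try/except. A sketch of the pattern; `fetch` is a made-up helper name:

```python
import requests

def fetch(url, timeout=1):
    # Return the status code, or None if the server was too slow.
    try:
        return requests.get(url, timeout=timeout).status_code
    except requests.exceptions.Timeout:
        return None

# Timeout sits inside the requests exception hierarchy:
print(issubclass(requests.exceptions.Timeout,
                 requests.exceptions.RequestException))  # True
```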
Authentication

```python
import requests
from requests.auth import HTTPBasicAuth

r = requests.get(url, auth=HTTPBasicAuth("user", "passwd"))
print(r.status_code)
```

Passing a plain tuple is shorthand for the same thing:

```python
import requests

r = requests.get(url, auth=("user", "passwd"))
print(r.status_code)
```