
Getting Started with Python Web Scraping: The requests Library in Detail (Part 1)

2018-03-30 08:58


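In short: response.text returns the body as a Unicode string (str).

response.content returns the body as bytes, that is, the raw binary data.

To get text, use response.text.

To get an image or another file, use response.content.

(response.json() parses a JSON body and returns the corresponding Python object, typically a dict or a list.)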
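The distinction can be checked offline. The sketch below is not a real HTTP call: it builds a made-up payload by hand (the `raw_body` bytes and their contents are assumptions for illustration) and walks through roughly the three views that `response.content`, `response.text`, and `response.json()` give you.

```python
import json

# Hypothetical response body, as it would arrive over the wire (bytes).
raw_body = '{"name": "ljy", "age": 18}'.encode("utf-8")

# Roughly what response.content holds: the undecoded bytes.
content = raw_body

# Roughly what response.text holds: the bytes decoded to str.
text = content.decode("utf-8")

# Roughly what response.json() returns: the text parsed into Python objects.
parsed = json.loads(text)

print(type(content).__name__, type(text).__name__, type(parsed).__name__)
```

In a real response, requests chooses the encoding for `text` from the response headers (or a guess), so the decoding step is done for you.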

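import requests

url = "https://httpbin.org/get"  # placeholder URL (an assumption); replace with your target
response = requests.get(url)
print(type(response))          # <class 'requests.models.Response'>
print(response.status_code)    # HTTP status code
print(type(response.text))     # <class 'str'>
print(response.text)           # response body as text
print(response.cookies)        # cookies set by the server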

Basic GET request

Basic usage:

import requests

url = "https://httpbin.org/get"  # placeholder URL (an assumption)
response = requests.get(url)
print(response.text)

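GET request with parameters

import requests

url = "https://httpbin.org/get"  # placeholder URL (an assumption)
data = {"name": "ljy", "age": 18}
response = requests.get(url, params=data)  # params are sent as the URL query string
print(response.text)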
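To see what `params` turns into, here is a minimal standard-library sketch of the query string being built. requests has its own encoder, so treat this as an approximation that matches for simple string and integer values; `build_url` is a hypothetical helper and the URL is only a placeholder.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append params to base_url as a query string (hypothetical helper)."""
    return base_url + "?" + urlencode(params)


# "https://httpbin.org/get" is only a placeholder address.
full_url = build_url("https://httpbin.org/get", {"name": "ljy", "age": 18})
print(full_url)  # https://httpbin.org/get?name=ljy&age=18
```

After a real request, `response.url` shows the final URL that was actually requested, including the encoded parameters.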

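Parsing JSON

import requests

url = "https://httpbin.org/get"  # placeholder URL (an assumption) that returns JSON
response = requests.get(url)
print(type(response.text))    # <class 'str'>
print(response.json())        # body parsed from JSON; raises an error if the body is not valid JSON
print(type(response.json()))  # typically <class 'dict'>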

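Fetching binary data (images, video)

import requests

url = "https://httpbin.org/image/png"  # placeholder URL (an assumption) for a binary resource
file_path = "image.png"                # placeholder output path
response = requests.get(url)
print(type(response.text), type(response.content))  # str and bytes
print(response.text)     # binary data decoded as text is usually unreadable
print(response.content)  # the raw bytes of the body
with open(file_path, "wb") as f:
    f.write(response.content)  # the with block closes the file, so no f.close() is needed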
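The file-writing step can be tried without a network connection. In this sketch, `fake_content` stands in for `response.content` (it is just the 8-byte PNG signature, an assumption for illustration) and `save_bytes` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def save_bytes(data: bytes, path: Path) -> int:
    """Write data to path in binary mode and return the number of bytes written."""
    with open(path, "wb") as f:  # the with block closes the file on exit
        return f.write(data)


# Stand-in for response.content: the 8-byte PNG file signature.
fake_content = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "image.png"
    written = save_bytes(fake_content, target)
    round_trip = target.read_bytes()

print(written, round_trip == fake_content)  # 8 True
```

Opening the file in "wb" mode matters: text mode would try to encode the data and could corrupt or reject binary content.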

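Adding headers

import requests

url = "https://httpbin.org/get"  # placeholder URL (an assumption)
headers = {"User-Agent": "*"}    # "*" stands in for a real User-Agent string
response = requests.get(url, headers=headers)
print(response.text)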

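Basic POST request (form submission)

import requests

url = "https://httpbin.org/post"  # placeholder URL (an assumption)
data = {"name": "ljy", "age": 18}
response = requests.post(url, data=data)  # a dict passed as data is sent form-encoded
print(response.text)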
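When a dict is passed as `data`, requests sends it form-encoded in the request body. As a rough illustration using only the standard library, the sketch below builds (but never sends) an equivalent request so the encoded body can be inspected; the URL is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {"name": "ljy", "age": 18}
body = urlencode(form).encode("ascii")  # form-encoded request body

# Build (but do not send) a request; supplying data makes it a POST.
req = Request("https://httpbin.org/post", data=body)  # placeholder URL

print(req.get_method(), req.data)  # POST b'name=ljy&age=18'
```

Note the contrast with `params`: the same key/value pairs travel in the body here rather than in the URL.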

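Adding headers to a POST request

import requests

url = "https://httpbin.org/post"  # placeholder URL (an assumption) that returns JSON
headers = {"User-Agent": "*"}     # "*" stands in for a real User-Agent string
data = {"name": "ljy", "age": 18}
response = requests.post(url, data=data, headers=headers)
print(response.json())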

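Response

Response attributes

import requests

url = "https://httpbin.org/get"  # placeholder URL (an assumption)
response = requests.get(url)
print(type(response.status_code), response.status_code)  # status code (int)
print(type(response.headers), response.headers)          # response headers (not the request headers)
print(type(response.cookies), response.cookies)          # cookies set by the server
print(type(response.url), response.url)                  # final URL of the response
print(type(response.history), response.history)          # list of redirect responses that led here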
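Since `status_code` is a plain integer, it can be interpreted with the standard library. The sketch below is one possible way to do that; `describe_status` and `is_success` are hypothetical helpers, not part of requests.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return e.g. '404 Not Found' for a numeric status code (hypothetical helper)."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def is_success(code: int) -> bool:
    """True for 2xx status codes."""
    return 200 <= code < 300


for code in (200, 301, 404, 500):
    print(describe_status(code), is_success(code))
```

In practice you would pass `response.status_code` to helpers like these before trusting the body.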

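Advanced usage

File upload

import requests

url = "https://httpbin.org/post"  # placeholder URL (an assumption)
file_path = "image.png"           # placeholder path to an existing file
with open(file_path, "rb") as f:
    files = {"file": f}
    response = requests.post(url, files=files)  # sent as multipart/form-data
print(response.text)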

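Getting cookies

response.cookies is a RequestsCookieJar, which behaves like a dictionary.

import requests

url = "https://httpbin.org/cookies/set/number/123456789"  # placeholder URL (an assumption) that sets a cookie
response = requests.get(url)
print(response.cookies)
for key, value in response.cookies.items():
    print(key + "=" + value)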
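The same dict-style iteration can be tried offline with the standard library's cookie parser. This is only an analogy to the cookie jar that requests returns; the cookie names and values below are made up.

```python
from http.cookies import SimpleCookie

# Hypothetical Cookie-header style string; names and values are made up.
raw = "sessionid=abc123; theme=dark"

jar = SimpleCookie()
jar.load(raw)

# Each entry is a Morsel; .value holds the cookie value as a str.
pairs = {key: morsel.value for key, morsel in jar.items()}

for key, value in pairs.items():
    print(key + "=" + value)
```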

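Session persistence

import requests

url_1 = "https://httpbin.org/cookies/set/number/123456789"  # placeholder (an assumption): sets a cookie
url_2 = "https://httpbin.org/cookies"                        # placeholder (an assumption): echoes cookies back
s = requests.Session()    # a Session keeps cookies across requests
s.get(url_1)
response = s.get(url_2)   # sent with the cookies received from url_1
print(response.text)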

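Certificate verification

import requests

url = "https://httpbin.org/get"  # placeholder URL (an assumption)
response = requests.get(url, verify=False)
print(response.status_code)

verify=False skips verification of the server's TLS certificate. The request still goes through, but an InsecureRequestWarning is emitted to tell you the connection is unverified.

import requests
import urllib3  # installed as a dependency of requests

url = "https://httpbin.org/get"  # placeholder URL (an assumption)
urllib3.disable_warnings()
response = requests.get(url, verify=False)
print(response.status_code)

This suppresses the warning.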

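Specifying a client certificate manually

import requests

url = "https://httpbin.org/get"  # placeholder URL (an assumption)
response = requests.get(url, cert=("/path/server.crt", "/path/key"))  # (certificate file, private key file)
print(response.status_code)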

Proxy settings

import requests

url = "https://httpbin.org/get"  # placeholder URL (an assumption)
proxies = {"http": "http://127.0.0.1:9743",
           "https": "https://127.0.0.1:9743"}
response = requests.get(url, proxies=proxies)
print(response.status_code)

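Timeout settings

import requests

url = "https://httpbin.org/get"  # placeholder URL (an assumption)
response = requests.get(url, timeout=1)  # seconds to wait for the server; raises requests.exceptions.Timeout when exceeded
print(response.status_code)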

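Authentication settings

import requests
from requests.auth import HTTPBasicAuth

url = "https://httpbin.org/basic-auth/user/passwd"  # placeholder URL (an assumption)
r = requests.get(url, auth=HTTPBasicAuth("user", "passwd"))
print(r.status_code)

import requests

url = "https://httpbin.org/basic-auth/user/passwd"  # placeholder URL (an assumption)
r = requests.get(url, auth=("user", "passwd"))  # shorthand: a (user, password) tuple is treated as HTTP Basic auth
print(r.status_code)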
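For context, HTTP Basic authentication amounts to sending an Authorization header containing the base64 of "user:password". The sketch below shows that encoding with the standard library; it approximates what an HTTP client does for Basic auth and is not requests' own code. `basic_auth_header` is a hypothetical helper.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header (hypothetical helper)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


print(basic_auth_header("user", "passwd"))  # Basic dXNlcjpwYXNzd2Q=
```

Base64 is an encoding, not encryption, so Basic auth should only be used over HTTPS.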