
Douban: simulating automatic login to crawl data

2017-04-07 10:13

Python automatic login

```python
import urllib.parse, urllib.request, http.cookiejar

######## Set up cookie handling ########
# The CookieJar stores the cookies the server sets. Installing the opener
# makes every later urllib.request.urlopen() call go through it.
cookie = http.cookiejar.CookieJar()
cookieProc = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(cookieProc)
urllib.request.install_opener(opener)
```
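The point of `HTTPCookieProcessor` is that a cookie set by one response is sent back automatically on later requests. The sketch below shows that behaviour offline, assuming a throwaway local server built with `http.server` on 127.0.0.1. The `FakeSite` handler, its `/login` and `/home` paths, and the `session=abc123` cookie are hypothetical stand-ins for Douban, not its real behaviour.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class FakeSite(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in site: /login sets a cookie, any other path echoes the Cookie header."""

    def do_GET(self):
        if self.path == "/login":
            body = b"logged in"
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        else:
            body = (self.headers.get("Cookie") or "").encode("utf-8")
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def demo():
    # Port 0 lets the OS pick a free port
    server = http.server.HTTPServer(("127.0.0.1", 0), FakeSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    try:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # ignore proxy env vars for this local demo
            urllib.request.HTTPCookieProcessor(jar),
        )
        with opener.open(base + "/login") as resp:  # Set-Cookie is stored in jar
            resp.read()
        with opener.open(base + "/home") as resp:   # jar adds the Cookie header
            echoed = resp.read().decode("utf-8")
        return [c.name for c in jar], echoed
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # (['session'], 'session=abc123')
```

The second request never sets a `Cookie` header by hand. The processor adds it from the jar, which is why the script needs no manual cookie handling after login.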

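```python
######## Define a function that sends the request for a page ########
def GetUrlRequest(iUrl, iStrPostData, header):
    # Form-encode the dict and convert it to bytes; passing data= makes this a POST
    postdata = urllib.parse.urlencode(iStrPostData)
    postdata = postdata.encode(encoding='UTF8')
    req = urllib.request.Request(
        url=iUrl,
        data=postdata,
        headers=header)
    # urlopen() uses the installed opener, so cookies are saved and re-sent
    result = urllib.request.urlopen(req).read().decode("UTF8")
    return result
```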
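To see exactly what `GetUrlRequest` sends, the request can be built without opening it. This is a minimal sketch; `build_post_request` is a hypothetical helper that mirrors the encoding steps above, and the email and password are dummy values.

```python
import urllib.parse
import urllib.request


def build_post_request(url, form, headers):
    """Build, without sending, the same kind of Request that GetUrlRequest creates."""
    body = urllib.parse.urlencode(form).encode("utf-8")  # dict -> 'k1=v1&k2=v2' -> bytes
    return urllib.request.Request(url=url, data=body, headers=headers)


if __name__ == "__main__":
    req = build_post_request(
        "https://accounts.douban.com/login",
        {"form_email": "user@example.com", "form_password": "secret"},
        {"User-Agent": "Mozilla/5.0"},
    )
    print(req.get_method())  # POST, because data is not None
    print(req.data)          # b'form_email=user%40example.com&form_password=secret'
```

`urlencode` percent-escapes characters such as `@`, and any non-`None` `data` turns the request into a POST.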

```python
######## Prepare the header and the POST data ########
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0',
          'Referer': 'https://accounts.douban.com/login'}

iStrPostData = {
    'form_email': 'your_account',
    'form_password': 'your_password'
}
```
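Hard-coding the account and password in the script makes it easy to leak them. One alternative, sketched here as an assumption rather than the author's method, is to read them from environment variables. The variable names `DOUBAN_EMAIL` and `DOUBAN_PASSWORD` and the `load_login_form` helper are hypothetical.

```python
import os


def load_login_form(email_var="DOUBAN_EMAIL", password_var="DOUBAN_PASSWORD"):
    """Build the login form dict from environment variables (names are hypothetical)."""
    email = os.environ.get(email_var)
    password = os.environ.get(password_var)
    if not email or not password:
        raise RuntimeError("set %s and %s before running the crawler" % (email_var, password_var))
    # Same field names as iStrPostData above
    return {"form_email": email, "form_password": password}
```

With this helper, `iStrPostData = load_login_form()` would replace the literal dict.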

```python
# iUrl is the URL the login form POSTs to
iUrl = 'https://accounts.douban.com/login'

# print(GetUrlRequest(iUrl, iStrPostData, header))
GetUrlRequest(iUrl, iStrPostData, header)
```
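The login call ignores the response, and any network or HTTP error stops the script with a traceback. The sketch below adds a timeout and reports failures instead. `fetch_text` is a hypothetical helper, not part of the original script, and returning `None` on failure is only one possible design.

```python
import urllib.error
import urllib.request


def fetch_text(req, timeout=10):
    """Open a Request through the installed opener; return the decoded body, or None on failure."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:   # server answered with a 4xx/5xx status
        print("HTTP %d for %s" % (err.code, req.full_url))
    except urllib.error.URLError as err:    # DNS failure, refused connection, missing file, ...
        print("could not open %s: %s" % (req.full_url, err.reason))
    except OSError as err:                  # e.g. a socket timeout while reading the body
        print("error reading %s: %s" % (req.full_url, err))
    return None
```

`HTTPError` is a subclass of `URLError`, and both are subclasses of `OSError`, so the `except` clauses go from most to least specific.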

```python
# After the call above, the opener already holds the session cookies. For any
# other page, pass only the url and header: do not send postdata again, and do
# not call GetUrlRequest() again.
for i in range(0, 3):
    url = 'https://www.douban.com/?p=' + str(i)
    req = urllib.request.Request(url=url, headers=header)
    result = urllib.request.urlopen(req).read().decode("UTF8")
    print(url)
    print(result)
```
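The loop above sends its requests back to back. The sketch below builds the same page URLs and pauses between requests. The names `page_urls` and `crawl`, the 1-second default delay, and the injectable `fetch` parameter (useful for testing without network access) are all assumptions, not part of the original.

```python
import time
import urllib.request


def page_urls(base, count):
    """Same URLs as the loop above: base + '?p=0' up to '?p=<count-1>'."""
    return [base + "?p=" + str(i) for i in range(count)]


def crawl(urls, headers, delay=1.0, fetch=None):
    """Fetch each URL with the given headers, sleeping `delay` seconds between requests.

    `fetch` defaults to a plain urlopen() call, which goes through whatever opener
    is installed (e.g. the cookie opener); it can be replaced for testing.
    """
    if fetch is None:
        def fetch(url, hdrs):
            req = urllib.request.Request(url=url, headers=hdrs)
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.read().decode("utf-8")
    pages = {}
    for n, url in enumerate(urls):
        if n:
            time.sleep(delay)  # pause between requests instead of hammering the site
        pages[url] = fetch(url, headers)
    return pages
```

With the cookie opener installed, `crawl(page_urls('https://www.douban.com/', 3), header)` requests the same three pages as the loop.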
