
How to download images from Baidu?

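2015-10-03 21:17
I recently needed to download images from Baidu for testing and did not want to do it by hand, so I looked into an automated download script.

Here is the result: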

# -*- coding: utf-8 -*-

import os
import urllib2
import json

tags = ['运动服']  # search keywords ("sportswear")

urls = []  # every image URL found, across all tags

savePath = './'

for tag2 in tags:
    print 'start download theme :', tag2

    startNum = 0    # index of the first image to request
    resultNum = 60  # number of images returned per JSON request; 60 is the upper bound

    endnum = 3000   # stop paging once this index is reached

    totalNum = -1   # total number of images for the theme; -1 until the first response arrives
    downloadNum = 0

    path = unicode(savePath + '/' + tag2 + '/', 'utf8')
    if not os.path.exists(path):
        os.makedirs(path)

    # Keep paging while results remain and the endnum cap has not been reached.
    while (totalNum == -1 or startNum < totalNum) and startNum < endnum:

        oneRequeseNum = 0  # items seen in this request

        try:

            url = 'http://image.baidu.com/i?tn=baiduimagejson&width=&height=&ie=utf8&oe=utf-8&word=' + tag2 + '&pn=' + str(startNum) + '&rn=' + str(resultNum)

            user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
            headers = {"User-Agent": user_agent}
            req = urllib2.Request(url, headers=headers)
            html = urllib2.urlopen(req, timeout=100)

            jsonData = json.loads(html.read())

            # print jsonData
            if totalNum == -1:
                totalNum = jsonData['displayNum']
                print 'total number :', totalNum

            data = jsonData['data']

            for index, item in enumerate(data):

                oneRequeseNum += 1

                if 'objURL' in item:
                    url = item['objURL']
                    urls.append(url)

        except Exception as e:
            print "Exception : ", str(e)
            print url
            oneRequeseNum = oneRequeseNum + 100  # skip ahead so a failing page is not retried forever

        finally:
            startNum = startNum + oneRequeseNum
    print 'Finish download theme : ', tag2
    print 'Download images number :', startNum

# Write the collected image URLs to a file, one per line.
ff = open('urls.txt', 'w')
for url in urls:
    ff.write('%s\n' % url)
ff.close()
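
The script above only collects image URLs into urls.txt; it does not fetch the images themselves. A minimal sketch of that second step could look like the following. It assumes Python 3 (so urllib.request instead of urllib2), and the helper names local_name and download_all, the ./images directory and the list of accepted extensions are hypothetical choices for illustration, not part of the original code.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_name(index, url, default_ext='.jpg'):
    """Build a file name such as 00007.png from the position and the URL's extension."""
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if ext not in ('.jpg', '.jpeg', '.png', '.gif', '.bmp'):
        ext = default_ext  # assumption: treat unknown extensions as JPEG
    return '%05d%s' % (index, ext)


def download_all(url_file, out_dir, timeout=30):
    """Fetch every URL listed in url_file into out_dir; return the number saved."""
    os.makedirs(out_dir, exist_ok=True)
    with open(url_file, encoding='utf-8') as f:
        image_urls = [line.strip() for line in f if line.strip()]

    saved = 0
    for index, url in enumerate(image_urls):
        target = os.path.join(out_dir, local_name(index, url))
        try:
            req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                data = resp.read()
            with open(target, 'wb') as out:
                out.write(data)
            saved += 1
        except Exception as e:
            # A dead link should not stop the whole run.
            print('skip', url, e)
    return saved
```

Calling download_all('urls.txt', './images') after the script has finished would then try each collected URL in turn and report how many files were written. Numbering the files by position avoids name clashes between images that share a base name on different hosts.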


One thing to note: keywords such as utf8 in the URL must be added before the str() parts. When they were added after, my program raised an error.
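
One way to avoid assembling the query string by hand is to let the standard library build it. The sketch below assumes Python 3 and uses urllib.parse.urlencode, which percent-encodes the keyword as UTF-8 and converts the numeric values itself, so building the string no longer depends on where the keyword sits. The function name build_query_url is hypothetical. The endpoint and parameter names are copied from the script above; whether the server still answers at that address, or accepts the parameters in any order, is not something this sketch verifies.

```python
from urllib.parse import urlencode

# Endpoint taken from the script above.
BASE = 'http://image.baidu.com/i'


def build_query_url(word, start, count):
    """Return the JSON query URL with every parameter percent-encoded."""
    params = [
        ('tn', 'baiduimagejson'),
        ('width', ''),
        ('height', ''),
        ('ie', 'utf8'),
        ('oe', 'utf-8'),
        ('word', word),   # search keyword, encoded as UTF-8 by urlencode
        ('pn', start),    # index of the first result
        ('rn', count),    # number of results per request
    ]
    return BASE + '?' + urlencode(params)


print(build_query_url('运动服', 0, 60))
```

The keyword ends up in the URL as word=%E8%BF%90%E5%8A%A8%E6%9C%8D, followed by pn=0 and rn=60.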

References:
http://blog.csdn.net/yuanwofei/article/details/16343743
http://www.devba.com/index.php/archives/3321.html
http://blog.csdn.net/viomag/article/details/38340993
The original code is from https://github.com/busz/BaiduImageDownloader