How to download images from Baidu?
2015-10-03 21:17
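I recently needed to download images from Baidu for testing and did not want to do it by hand, so I looked into a script that automates it.
Here is the result: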
# -*- coding: utf-8 -*-
# Python 2 script (uses urllib2 and print statements).
import os
import urllib2
import json

tags = ['运动服']  # search keywords, one theme per entry
urls = []          # image URLs collected across all themes
savePath = './'

for tag2 in tags:
    print 'start download theme :', tag2
    startNum = 0     # index of the first image to request
    resultNum = 60   # images per JSON request; 60 is the upper bound
    endnum = 3000    # stop paging once this index is reached
    totalNum = -1    # total number of images for the theme (-1 = not yet known)
    path = unicode(savePath + '/' + tag2 + '/', 'utf8')
    if not os.path.exists(path):
        os.makedirs(path)
    # Page through the results until the reported total or the endnum cap is hit.
    while startNum < endnum and (totalNum == -1 or startNum < totalNum):
        oneRequeseNum = 0
        try:
            url = 'http://image.baidu.com/i?tn=baiduimagejson&width=&height=&ie=utf8&oe=utf-8&word=' + tag2 + '&pn=' + str(startNum) + '&rn=' + str(resultNum)
            user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
            headers = {"User-Agent": user_agent}
            req = urllib2.Request(url, headers=headers)
            html = urllib2.urlopen(req, timeout=100)
            jsonData = json.loads(html.read())
            # print jsonData
            if totalNum == -1:
                totalNum = jsonData['displayNum']
                print 'total number :', totalNum
            data = jsonData['data']
            for item in data:
                oneRequeseNum += 1
                if 'objURL' in item:
                    urls.append(item['objURL'])
        except Exception, e:
            print "Exception : ", str(e)
            print url
            oneRequeseNum = oneRequeseNum + 100  # skip ahead past the failing page
        finally:
            startNum = startNum + oneRequeseNum
        if oneRequeseNum == 0:
            break  # empty page: nothing more to fetch
    print 'Finish download theme : ', tag2
    print 'Download images number :', startNum

# Write the collected image URLs to a file, one per line.
ff = open('urls.txt', 'w')
for url in urls:
    ff.write('%s\n' % url)
ff.close()
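The script stops once the image URLs are written to urls.txt. Fetching the files themselves could look roughly like the following. This is a minimal Python 3 sketch, not part of the original code: the function name download_all, the fetch parameter, the numbered file names, and the '.jpg' fallback extension are all assumptions made for illustration.

```python
import os
import urllib.request

# Extensions kept as-is; anything else falls back to '.jpg' (an assumption).
KNOWN_EXTS = ('.jpg', '.jpeg', '.png', '.gif', '.bmp')


def download_all(urls, dest_dir, fetch=None, timeout=30):
    """Save each URL into dest_dir and return the number of files written.

    `fetch` is a hypothetical hook: a callable taking a URL and returning
    bytes. When omitted, a plain urllib request is used.
    """
    if fetch is None:
        def fetch(url):
            req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.read()

    os.makedirs(dest_dir, exist_ok=True)
    saved = 0
    for index, url in enumerate(urls):
        try:
            data = fetch(url)
        except Exception as e:
            # Dead links are common in search results: report and move on.
            print('skip', url, e)
            continue
        ext = os.path.splitext(url.split('?')[0])[1].lower()
        if ext not in KNOWN_EXTS:
            ext = '.jpg'
        # Name files by their position in the list to avoid collisions.
        with open(os.path.join(dest_dir, '%05d%s' % (index, ext)), 'wb') as f:
            f.write(data)
        saved += 1
    return saved
```

With urls.txt in place, something like download_all(open('urls.txt').read().split(), './运动服') would fill the folder the script already creates. Failed links are skipped rather than retried, which keeps the sketch short.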
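One thing to note: keywords such as utf8 in the request URL need to be added before the str part. When I added them after it, my program raised an error.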
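One way to avoid depending on how the query string is concatenated by hand is to let the standard library build and percent-encode it. The following is a minimal Python 3 sketch under that assumption, not the original author's approach. The helper name build_query_url is hypothetical. The endpoint and parameter names are copied from the script, and whether the endpoint still responds is not checked here.

```python
from urllib.parse import urlencode

BASE_URL = 'http://image.baidu.com/i'  # endpoint taken from the script


def build_query_url(word, start, count=60):
    """Return the JSON search URL for one page of results.

    A list of pairs is used instead of a dict so that the parameter
    order matches the hand-built URL in the script.
    """
    params = [
        ('tn', 'baiduimagejson'),
        ('width', ''),
        ('height', ''),
        ('ie', 'utf8'),
        ('oe', 'utf-8'),
        ('word', word),   # urlencode percent-encodes the UTF-8 bytes
        ('pn', start),    # index of the first result
        ('rn', count),    # results per request
    ]
    return BASE_URL + '?' + urlencode(params)


print(build_query_url('运动服', 0))
```

The keyword ends up in the URL as word=%E8%BF%90%E5%8A%A8%E6%9C%8D, the percent-encoded UTF-8 bytes of 运动服, so the request line contains only ASCII.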
References:
http://blog.csdn.net/yuanwofei/article/details/16343743
http://www.devba.com/index.php/archives/3321.html
http://blog.csdn.net/viomag/article/details/38340993
The original code comes from https://github.com/busz/BaiduImageDownloader