Batch-downloading xkcd comics with Python
2015-12-20 11:48
#coding=utf-8
import urllib
import re

# number of the first comic to download
start = 1
# number of the last comic to download
end = 1613
prevUrl = 'http://xkcd.com/'

# download the HTML of a page
def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

# parse the comic image URLs from the HTML (try .png first, then .jpg)
def getImgUrl(html):
    reg = r'src="(.+?\.png)" title='
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    if len(imglist) > 0:
        return imglist
    reg = r'src="(.+?\.jpg)" title='
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    return imglist

# download a comic image and save it to a file with the given name
def getImg(url, name):
    conn = urllib.urlopen(url)
    f = open(name, 'wb')
    f.write(conn.read())
    f.close()

# test function: print the full URL of every image
def loopPrintUrl(imglist):
    for imgurl in imglist:
        url = 'http:' + imgurl
        print (url)

# extract the file name from an image URL such as //imgs.xkcd.com/comics/name.png
def getImgFileNameFromUrl(url):
    strlist = url.split('/')
    return strlist[4]

# download the xkcd comic images numbered start through end
def loopDownLoadXKCDImg():
    for i in range(start, end + 1):
        downloadUrl = prevUrl + str(i) + '/'
        #print(downloadUrl)
        html = getHtml(downloadUrl)
        #print(html)
        urlList = getImgUrl(html)
        for tmpurl in urlList:
            filename = str(i) + "_" + getImgFileNameFromUrl(tmpurl)
            imgDownLoadurl = "http:" + tmpurl
            getImg(imgDownLoadurl, filename)
            print (str(i) + " " + imgDownLoadurl + " -> down")

loopDownLoadXKCDImg()
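start is the number of the first comic to download.
end is the number of the last comic to download (when this script was written, the latest xkcd comic was number 1613).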
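Because the hard-coded value of end goes stale as new comics are published, it could be looked up instead. The following is only a minimal sketch, not part of the original script: it assumes Python 3 and xkcd's JSON endpoint at https://xkcd.com/info.0.json, whose num field holds the number of the newest comic. The names parse_latest_num and fetch_latest_num are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: xkcd's JSON endpoint describing the newest comic.
LATEST_INFO_URL = 'https://xkcd.com/info.0.json'


def parse_latest_num(json_text):
    """Return the comic number stored in the 'num' field of the JSON text."""
    return int(json.loads(json_text)['num'])


def fetch_latest_num(url=LATEST_INFO_URL):
    """Download the newest comic's metadata and return its number (needs network)."""
    with urlopen(url) as resp:
        return parse_latest_num(resp.read().decode('utf-8'))
```

With network access, end = fetch_latest_num() would replace the fixed 1613.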
The code above was tested with Python 2.7.
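The script does not run unchanged on Python 3, where urllib.urlopen has moved to urllib.request.urlopen and page.read() returns bytes. Here is a rough sketch of how the same steps might look there. It is an adaptation rather than the author's code: the function names are hypothetical, the two regular expressions are merged into one (so it returns .png and .jpg matches together instead of trying .png first), the file name is taken as the last path component instead of index 4, and pages that fail to load are skipped.

```python
import re
from urllib.error import HTTPError
from urllib.request import urlopen

BASE_URL = 'https://xkcd.com/'

# An image src ending in .png or .jpg that is directly followed by a title attribute.
IMG_RE = re.compile(r'src="([^"]+?\.(?:png|jpg))" title=')


def get_img_urls(html):
    """Return every comic image URL found in the page HTML."""
    return IMG_RE.findall(html)


def file_name_from_url(url):
    """Return the last path component of an image URL."""
    return url.rsplit('/', 1)[-1]


def download_comics(start, end):
    """Download the comics numbered start through end into the current directory."""
    for i in range(start, end + 1):
        try:
            with urlopen(BASE_URL + str(i) + '/') as page:
                html = page.read().decode('utf-8')
        except HTTPError:
            # Unlike Python 2's urllib.urlopen, this raises on HTTP errors; skip the page.
            continue
        for src in get_img_urls(html):
            # Protocol-relative URLs (//host/path) need a scheme prepended.
            img_url = 'https:' + src if src.startswith('//') else src
            file_name = '{}_{}'.format(i, file_name_from_url(src))
            with urlopen(img_url) as conn, open(file_name, 'wb') as f:
                f.write(conn.read())
            print(i, img_url, '-> done')
```

Calling download_comics(1, 10) would fetch the first ten comics; nothing is downloaded until the function is called.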