
Scraping a website's images with Python

2015-01-15 15:47
Yesterday I suddenly remembered 乌青体 (Wu Qing-style poetry) and found the poet's website: http://wuqing.org

I clicked on “手写诗” (Handwritten Poems): http://wuqing.org/sxs/sxs032

and found the path where the image file is stored on the server: http://wuqing.org/wp-content/uploads/2013/02/sxs032.jpg

Then I discovered that http://wuqing.org/wp-content/uploads/ had no access restriction at all: the directory listing was open to anyone...
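The script below relies on two facts from these URLs: each month's uploads sit under `wp-content/uploads/<year>/<month>/`, and the open directory index links to every file in that folder. A minimal Python 3 sketch of those two steps, using the standard library's `html.parser` instead of BeautifulSoup, might look like this. `month_urls` and `jpg_links` are hypothetical helper names, and the sketch assumes the index is an ordinary HTML page whose `<a href>` values are the file names.

```python
from html.parser import HTMLParser


def month_urls(base, year):
    """Build one directory URL per month, e.g. .../2013/01/ through .../2013/12/."""
    return ["%s%d/%02d/" % (base, year, month) for month in range(1, 13)]


class _JpgLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose target ends in .jpg (any case)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".jpg"):
            self.links.append(href)


def jpg_links(html):
    """Return the .jpg hrefs found in a directory-index page, in page order."""
    parser = _JpgLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical fragment shaped like an Apache-style directory index.
    sample = (
        '<a href="?C=N;O=D">Name</a> '
        '<a href="/wp-content/uploads/2013/">Parent Directory</a> '
        '<a href="sxs032.jpg">sxs032.jpg</a> <a href="notes.txt">notes.txt</a>'
    )
    print(month_urls("http://wuqing.org/wp-content/uploads/", 2013)[1])
    print(jpg_links(sample))
```

Because the hrefs in such an index are usually relative, `urllib.parse.urljoin(month_url, href)` turns each one into a full image URL. Sorting links (`?C=N;O=D`) and the parent-directory link are skipped automatically since they do not end in `.jpg`.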

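I had learned a little Python and looked at a bit of front-end development before, so out of boredom I decided to download the images and wrote the rough little script below.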

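I first wrote a synchronous, single-threaded version and later changed it to a multithreaded one, but ran into network connection problems, so I commented the threaded part out again. I'll look at it when I have time _(:з」∠)_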

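import urllib, urllib2
import sys
reload(sys)
sys.setdefaultencoding('utf8')   # Python 2-only hack: lets implicit str()/unicode conversions use UTF-8
import os, shutil                # shutil is only needed by the commented-out cleanup below
import re
import threading
import BeautifulSoup             # BeautifulSoup 3 (with bs4 this would be: from bs4 import BeautifulSoup)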

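class getImgThread(threading.Thread):
	"""Download one image in its own thread (used only by the commented-out multithreaded version)."""
	def __init__(self, imgUrl, fileName):
		threading.Thread.__init__(self)
		self.url = imgUrl
		self.fileName = fileName
	def run(self):
		with mutex:                      # mutex is the module-level Lock created in __main__
			print 'getting...', self.url
		try:
			urllib.urlretrieve(self.url, self.fileName)
		except Exception, e:
			with mutex:
				print 'failed:', self.url, e
			return
		with mutex:                      # keep output from different threads from interleaving
			print 'saving...', self.fileName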
		
if __name__ == '__main__':

	purl = 'http://wuqing.org/wp-content/uploads/2013/'
	psavepath = r'D:/mycode/Python/MyWorks/wuqingshi'
	# browser-like User-Agent sent with every directory-listing request
	headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}

#	if os.path.isdir(psavepath):
#		shutil.rmtree(psavepath)        # delete dir
	if not os.path.isdir(psavepath):
		os.makedirs(psavepath)          # create the folder the images are saved into

	mutex = threading.Lock()
	threads = []
	
	for i in range(1, 13):
		url = purl + '%02d' % i          # one directory per month: .../2013/01 ... .../2013/12
		try:
			req = urllib2.Request(url, headers=headers)
			content = urllib2.urlopen(req).read()
			content = BeautifulSoup.BeautifulSoup(content)
		except Exception, e:
			print 'skipping', url, e     # month missing or unreachable: do not parse a stale page
			continue

		# every <a> in the directory index whose href ends in .jpg
		links = content.findAll('a', href=re.compile(r'\.jpg$', re.I))
		for link in links:
			picname = link['href']       # the href holds the full file name; the link text may be truncated
			picurl = url + '/' + picname
			filename = os.path.join(psavepath, picname)

			print 'getting...', picurl
			try:
				urllib.urlretrieve(picurl, filename)
			except Exception, e:
				print 'failed:', picurl, e
				continue
			print 'saving...', filename
#			try:
#				threads.append(getImgThread(picurl,filename))
#			except Exception,e:
#				pass
			
#	for t in threads:
#		t.start()
#	for t in threads:
#		t.join()
#	print 'End'
	print 'all downloading is done!'
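The commented-out threaded version creates one thread per image and starts them all at once; that may have contributed to the connection problems mentioned above, though the post does not confirm the cause. A common alternative is a small fixed-size worker pool. Below is a minimal Python 3 sketch using `concurrent.futures.ThreadPoolExecutor`; `download_all`, its `fetch` parameter and the default of 4 workers are assumptions for illustration, and `fetch` defaults to `urllib.request.urlretrieve`.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(jobs, fetch=urllib.request.urlretrieve, max_workers=4):
    """Download (url, filename) pairs using at most max_workers threads.

    Returns a dict mapping each url to True on success or False on failure,
    so one bad file does not stop the rest of the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every job; the pool never runs more than max_workers at once.
        futures = {pool.submit(fetch, url, filename): url for url, filename in jobs}
        # Results are handled here in the calling thread, so printing needs no lock.
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()
                results[url] = True
                print("saved:", url)
            except Exception as exc:  # network or HTTP error for this one file
                print("failed:", url, exc)
                results[url] = False
    return results
```

It would be called as `download_all(jobs)`, where `jobs` is a list of the `(picurl, filename)` pairs the script builds. Passing a different `fetch` makes the function easy to try without touching the network, and lowering `max_workers` is the simplest way to be gentler on the server.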


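P.S. I only scraped this for fun, not for any commercial use, and the program is for learning purposes only. Wu Qing's poems are quite fun; please respect intellectual property, and if you want to buy them, go through his Taobao link. This will be taken down upon any infringement claim.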