Batch-downloading the complete collection of Zheng Yun's (郑云) funny videos (using Python and bash)

2011-12-19 18:27, 411 views
Who is Zheng Yun? You know who. Still finding it a hassle to download his funny videos? Two scripts take care of it!

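Three steps in total:

(1) Remember the Python script for downloading videos that I posted earlier? Here it is again, slightly improved: it now accepts a second argument for the video quality, which defaults to high (HD quality):

(Replace #!/usr/local/bin/python with the path of your own Python installation.)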

#!/usr/local/bin/python

#test for command line parameter(s)
#import sys
#print 'scriptname: ', sys.argv[0]
#(i, len) = (1, len(sys.argv))
#while i < len:
#  print 'command parameter', i, sys.argv[i]
#  i = i+1
#exit(0)

import sys

argc = len(sys.argv)
if argc == 2:
    format = 'high'
elif argc == 3:
    format = sys.argv[2]
else:
    print("Usage: %s videourl [videoquality=normal|high|super|...]" % sys.argv[0])
    print(" e.g.")
    print("   %s http://v.youku.com/v_show/id_XMzMzMjE0MjE2.html super" % sys.argv[0])
    exit(1)

videourl = sys.argv[1]

import urllib2
import urllib
url = 'http://www.flvcd.com/parse.php?kw=' + urllib.quote(videourl) + '&format=' + format

req = urllib2.Request(url)
# add headers that mimic a Firefox browser (without them, fetching tudou videos runs into problems)
req.add_header('host', 'www.flvcd.com')
req.add_header('Referer', url[:-4])
req.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0) Gecko/20100101 Firefox/4.0')
req.add_header('Accept-Language', 'en-us,en;q=0.5')
req.add_header('Accept-Encoding', 'gzip, deflate')
req.add_header('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7')
req.add_header('Keep-Alive', '115')
res = urllib2.urlopen(req)
html = res.read()

import re
# the hidden "inf" form field holds the name and URL of every video block
pattern = re.compile(r'<input\s+type="hidden"\s+name="inf"\s+value="([^"]+)')
firstmatch = pattern.search(html)
urls = firstmatch.group(1)
urls = unicode(urls, 'gbk')  # the flvcd.com page is GBK-encoded, so decode it as GBK

urlpattern = re.compile(r'<[NU]>(.+)')
result = urlpattern.findall(urls)

# group the flat list into [name, url] pairs
data = [result[i:i+2] for i in range(0, len(result), 2)]

print '\n--- Start to download from url "%s" (%d block(s) in total):' % (videourl, len(data))
for k, v in enumerate(data):
    print '  >downloading Block %.2d ...' % (k+1,)
    urllib.urlretrieve(v[1], v[0] + '.flv')
    print '  downloaded Block.%.2d completely<' % (k+1,)
print '--- finished ---\n'
Save it as ~/dl.py
Then run chmod u+x ~/dl.py to make it executable
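
The pairing step in dl.py (the `<[NU]>` pattern followed by the slice into groups of two) can be tried in isolation. The following is a minimal sketch, assuming the `inf` value lists each block as an `<N>` name line followed by a `<U>` URL line; the function name `parse_blocks` and the sample text are made up for illustration and are not taken from flvcd.com. It is written so that it runs under both Python 2 and Python 3.

```python
import re

def parse_blocks(inf_text):
    """Pair each <N>name entry with the <U>url entry that follows it."""
    # Same pattern as in dl.py: capture the rest of the line after <N> or <U>.
    fields = re.findall(r'<[NU]>(.+)', inf_text)
    # Group the flat list into [name, url] pairs, two items at a time.
    return [fields[i:i + 2] for i in range(0, len(fields), 2)]

# Hypothetical sample; the real value comes from the hidden "inf" field.
sample = (
    "<N>clip-01\n"
    "<U>http://example.com/a.flv\n"
    "<N>clip-02\n"
    "<U>http://example.com/b.flv\n"
)
blocks = parse_blocks(sample)
```

Because `.+` stops at a line break, this only works when every `<N>` and `<U>` entry sits on its own line, which is the assumption made here.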

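(2) Next, write the shell script that batch-downloads the videos from Zheng Yun's Youku space

(Make sure curl is installed on your Linux or Mac system; otherwise you will need another way to extract the video URLs from Zheng Yun's Youku space, for example with Python. The script also uses grep -oP, which needs a grep with Perl-regex support such as GNU grep.)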

#!/bin/bash

echo
echo 'press Control-Z to suspend the download if you want to; downloading now...'
echo

downloadpage () {
    local page=$1
    # put the video page URLs found on this listing page into the positional parameters;
    # "set --" keeps set from dumping all variables when nothing is found
    set -- $(curl "http://u.youku.com/user_video/id_UODU1ODc1NTI=_order_1_type_1_page_$page.html" 2> /dev/null | grep -oP '<a href="http://v.youku.com/v_show/[^"]+' | grep -oP 'http://.+')

    echo "downloading page $page"

    while [ "$1" != "" ]
    do
        echo " downloading from $1: "
        ~/dl.py "$1" super
        shift
    done
    echo "page $page downloaded"
}

# the listing pages to fetch
set 1 2 3 4 5 6 7
while [ "$1" != "" ]
do
    downloadpage "$1"
    shift
done
Save it as ~/zy.sh

Then run chmod u+x ~/zy.sh to make it executable
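
The note before the script mentions Python as a fallback when curl is not available. A minimal sketch of the extraction half is below: it applies the same `v_show` pattern as the two grep calls to HTML that has already been fetched. The function name `extract_video_urls` and the sample HTML are hypothetical, and fetching the page itself (for example with urllib2, as in dl.py) is left out.

```python
import re

def extract_video_urls(page_html):
    """Return the v.youku.com/v_show links found in href attributes, in page order."""
    # Equivalent to: grep -oP '<a href="http://v.youku.com/v_show/[^"]+'
    # followed by:   grep -oP 'http://.+'
    return re.findall(r'<a href="(http://v\.youku\.com/v_show/[^"]+)', page_html)

# Hypothetical listing-page fragment, for illustration only.
sample_html = (
    '<li><a href="http://v.youku.com/v_show/id_AAA.html">one</a></li>'
    '<li><a href="http://example.com/other.html">skip me</a></li>'
    '<li><a href="http://v.youku.com/v_show/id_BBB.html">two</a></li>'
)
video_urls = extract_video_urls(sample_html)
```

Like the grep pipeline, this keeps duplicates if the same link appears more than once on a page.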

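(3) Change into the directory you want to download into, then run ~/zy.sh to download all the funny videos from Zheng Yun's Youku space

PS: There are a lot of videos, so make sure you have enough disk space and enough time (it may take quite a while: roughly 7*30 videos, you know). Also, I do no error checking in the Python script (I have only just started learning it), and Windows users, please pretend you never saw this post... Multiple blocks of the same video are not merged; I am currently looking into how to merge flv files...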
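
As the PS says, dl.py does no error checking, so it fails with an AttributeError on `firstmatch.group(1)` whenever the hidden `inf` field is missing from the response. One possible way to add a check is sketched below; the helper name `find_inf` and the error message are illustrative choices, not part of the original script.

```python
import re

# Same pattern as in dl.py.
INF_PATTERN = re.compile(r'<input\s+type="hidden"\s+name="inf"\s+value="([^"]+)')

def find_inf(html):
    """Return the value of the hidden "inf" field, or raise ValueError if it is absent."""
    match = INF_PATTERN.search(html)
    if match is None:
        # search() returns None when nothing matches, so report that explicitly.
        raise ValueError('no hidden "inf" field found in the response')
    return match.group(1)
```

A caller could catch the ValueError, print which video URL failed, and move on to the next one instead of stopping.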

Update log:

2011-12-20 16:15 Update: fixed the encoding conversion problem (flvcd.com is GBK-encoded, so both the gb2312 declared on the flvcd.com site and the gb2312 I had written were wrong)
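
Why the choice between gb2312 and GBK matters can be shown with a single character. GBK extends GB2312, so text containing a character that only GBK covers cannot be decoded with the gb2312 codec. The sketch below uses 镕 as such a character; the helper name `decode_flvcd` is made up, and the snippet says nothing about which characters actually appear on flvcd.com.

```python
# -*- coding: utf-8 -*-

def decode_flvcd(raw_bytes):
    """Decode bytes as GBK, the encoding the update log settles on."""
    return raw_bytes.decode('gbk')

text = u'\u9555'  # the character 镕, covered by GBK but not by GB2312
raw = text.encode('gbk')

# Try the narrower codec and record whether it copes.
try:
    raw.decode('gb2312')
    gb2312_ok = True
except UnicodeDecodeError:
    gb2312_ok = False
```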

Give it a try!