
Batch-collecting MP3 URLs from podcasts on 《大耳朵英语》 (ebigear.com)

2012-01-20 10:58
No preamble; here is the code.

#!C:\Perl\bin\perl.exe
use strict;
use warnings;
use LWP::Simple;
++$|;
my $BaseUrl = 'http://oral.ebigear.com';

# To add a new podcast later, just add its entry here.
# http://oral.ebigear.com/mypodcastlist-3065994-1.html is teacher Faith's podcast page; her user ID is 3065994,
# and 13 is the total number of pages in her podcast list.
my %TeacherInfo = (
'Faith'         => '3065994,13',
'xydj'          => '1507470,3',
'Creature X'    => '1555304,22',
);

# Choose which podcast user to collect MP3 URLs for
my $Username = 'Creature X';
#my $Username = 'Faith';
#my $Username = 'xydj';

my ($UserId, $PageSum) = split /,/, $TeacherInfo{$Username};
print << "PRINT";
Getting $Username info:
UserId is $UserId
Podcast PageSum is $PageSum
PRINT

my %Count = ();
my @Mp3List = ();
my @FinalList = ();
my $Destination = "D:\\${UserId}_mp3_list.txt";

for my $Page (1 .. $PageSum)
{
my $WebSite = "$BaseUrl/mypodcastlist-$UserId-$Page.html";
print "\n=========== Here is $WebSite ===========\n";
my $Content = get($WebSite) or next;    # get() returns undef on failure; skip that page
my @Results = ($Content =~ /(?<=window\.location\.href='\/)(podcast-\d{1}-\d{5}\.html)/g);
my @ChildUrlList = grep {++$Count{$_} < 2;} @Results;

for my $ChildUrl (@ChildUrlList)
{
my $ChildWebSite = "$BaseUrl/$ChildUrl";
print "Catching $ChildWebSite.\n";
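my $ChildContent = get($ChildWebSite);
# Skip pages that failed to download or contain no matching MP3 URL,
# so that $1 is never undefined or left over from an earlier match.
next unless defined($ChildContent) && $ChildContent =~ /(http:.*?(?:$UserId)\.mp3)/;
push @Mp3List, $1;
print "    Get $1\n";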
}
}

%Count= ();
@FinalList = grep {++$Count{$_} < 2;} @Mp3List;
print "\n\nCongratulations! Obtained @{[scalar(@FinalList)]} mp3 URLs successfully!\n";

open my $fh, '>', $Destination or die "Cannot open $Destination: $!";
print $fh (join "\n", @FinalList);
close $fh;

print "You can find the mp3 list in $Destination.\n";
exit;
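
The two core steps of the script, extracting podcast page paths with a lookbehind and dropping duplicates with a counting hash, can be tried offline. The following is only a minimal sketch: the HTML fragment in $sample is made up for illustration and the site's real markup may differ.

```perl
use strict;
use warnings;

# Hypothetical listing-page fragment (an assumption, not copied from the site).
my $sample = <<'HTML';
<div onclick="window.location.href='/podcast-1-12345.html'">A</div>
<div onclick="window.location.href='/podcast-1-12345.html'">A again</div>
<div onclick="window.location.href='/podcast-2-67890.html'">B</div>
HTML

# Same pattern as in the script: the lookbehind anchors on the
# window.location.href='/ prefix and only the relative path is captured.
my @results = ($sample =~ /(?<=window\.location\.href='\/)(podcast-\d{1}-\d{5}\.html)/g);

# Keep the first occurrence of each path, preserving the original order.
my %seen;
my @unique = grep { !$seen{$_}++ } @results;

print "$_\n" for @unique;
```

The `!$seen{$_}++` form behaves the same as the `++$Count{$_} < 2` test used in the script; both let an item through only the first time it is seen.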


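On some platforms the page content fetched by the code above is displayed as garbled characters, but that does not stop my regular expressions from extracting the key strings.

If the garbled output really bothers you, use Encode to transcode it.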

#!C:\Perl\bin\perl.exe
use strict;
use warnings;
use LWP::Simple;
use Encode qw{encode}; # used to transcode the page content

my $content =  get("http://oral.ebigear.com/mypodcastlist-3065994-1.html");
my $transfor = encode('gb2312', $content);   # converting to gb2312 is enough
print $transfor,"\n";
my $prefix = qr/href="/;
my $suffix = qr/" target="_blank">大耳朵FAITH口语课堂-天天学/;
my @list = ($transfor =~ /(?<=$prefix)(.*?)(?=$suffix)/mg);
print join "\n", @list;
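
To see what the encode step does without touching the network, here is a small sketch that converts a Perl character string to GB2312 bytes and back. It assumes the script file itself is saved as UTF-8 (hence `use utf8`); the sample string is just the site name used for illustration.

```perl
use strict;
use warnings;
use utf8;                         # assumption: this source file is saved as UTF-8
use Encode qw(encode decode);

my $text  = '大耳朵英语';                 # a string of characters
my $bytes = encode('gb2312', $text);     # the same text as GB2312 bytes
my $back  = decode('gb2312', $bytes);    # bytes converted back to characters

# GB2312 uses two bytes for each Chinese character, so the byte
# length is twice the character length for this string.
printf "%d characters, %d bytes\n", length($text), length($bytes);
print "round trip ok\n" if $back eq $text;
```

A console that expects GB2312, such as a Chinese-locale Windows command prompt, should display the encoded bytes correctly, which is why the script above prints $transfor rather than $content.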