A Quick Try of BeautifulSoup
2016-02-08 13:57
I used BeautifulSoup to scrape the lecture slides for UC Berkeley's Programming Languages and Compilers course.
import re
import requests
from bs4 import BeautifulSoup

BASE_URL = "http://inst.eecs.berkeley.edu/~cs164/fa11/lectures/"

r = requests.get(BASE_URL + "index.html")
soup = BeautifulSoup(r.text, "html.parser")

# Match links such as lecture1.pdf; the dot is escaped so it only matches a literal "."
for elem in soup.find_all(name="a", attrs={"href": re.compile(r"lecture[0-9]*\.pdf")}):
    # Text of the element just before the link's parent (presumably the lecture title)
    title = elem.find_parent().find_previous_sibling().get_text()
    # Drop the ".pdf" suffix, then append the title with each ":" replaced by a space
    file_name = elem["href"][:-4] + "-" + " ".join(title.split(":")) + ".pdf"
    file_url = BASE_URL + elem["href"]
    # Stream the download so the whole PDF is never held in memory at once
    file_get = requests.get(file_url, stream=True)
    with open(file_name, "wb") as f:
        for chunk in file_get.iter_content(chunk_size=1024):
            if chunk:  # skip empty chunks
                f.write(chunk)
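The file-name step in the script above can be tried on its own, without touching the network. The sketch below is only an illustration: `build_file_name` and `LECTURE_LINK` are hypothetical names, the sample titles are made up, and the extra clean-up (collapsing whitespace, replacing characters that are awkward in file names) is an assumption that goes beyond what the script does.

```python
import re

# Same idea as the pattern in the script, with the dot escaped.
# LECTURE_LINK is a hypothetical name used only in this sketch.
LECTURE_LINK = re.compile(r"lecture[0-9]*\.pdf")


def build_file_name(href, title):
    """Hypothetical helper: combine a link's href with its title text."""
    stem = href[:-4]  # drop the trailing ".pdf"
    # Replace each ":" with a space, then collapse runs of whitespace
    label = " ".join(title.replace(":", " ").split())
    # Assumption: swap characters that are awkward in file names for "_"
    label = re.sub(r'[\\/*?"<>|]', "_", label)
    return stem + "-" + label + ".pdf"


if __name__ == "__main__":
    # Made-up sample input, not taken from the course page
    print(build_file_name("lecture1.pdf", "Lecture 1: Introduction"))
```

In the loop, such a helper would take `elem["href"]` and the text returned by `get_text()`. Keeping it as a separate function makes the naming rule easy to check before downloading anything.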