MOOC Crawler
2016-06-02 09:56
https://www.crummy.com/software/BeautifulSoup/
Code:
#!/usr/bin/env python3
# coding=utf-8
from bs4 import BeautifulSoup
import re

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# html_doc is already a text string, so no from_encoding argument is needed.
soup = BeautifulSoup(html_doc, 'html.parser')

print('Get all links')
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

print('Get the link for lacie')
link_node = soup.find('a', href='http://example.com/lacie')
print(link_node.name, link_node['href'], link_node.get_text())

print('Regex match on "ill"')
# In a raw string r"", each backslash only needs to be written once.
link_node = soup.find('a', href=re.compile(r"ill"))
print(link_node.name, link_node['href'], link_node.get_text())

print('Get the text of the p paragraph')
# class_ has a trailing underscore because class is a Python keyword.
p_node = soup.find('p', class_="title")
print(p_node.name, p_node.get_text())

Output:

Get all links
a http://example.com/elsie Elsie
a http://example.com/lacie Lacie
a http://example.com/tillie Tillie
Get the link for lacie
a http://example.com/lacie Lacie
Regex match on "ill"
a http://example.com/tillie Tillie
Get the text of the p paragraph
p The Dormouse's story
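The same four lookups can also be written with CSS selectors. The sketch below is not part of the original post. It assumes a Beautiful Soup 4 release that provides select() and select_one(), and it uses a trimmed copy of the sample HTML; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample document (an assumption made for brevity).
html_doc = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Every <a> tag, comparable to soup.find_all('a').
all_links = [(a['href'], a.get_text()) for a in soup.select('a')]

# Exact attribute match, comparable to find('a', href='http://example.com/lacie').
lacie = soup.select_one('a[href="http://example.com/lacie"]')

# Substring match on href, comparable to href=re.compile(r"ill").
ill = soup.select_one('a[href*="ill"]')

# Class match, comparable to find('p', class_="title").
title = soup.select_one('p.title')

print(all_links)
print(lacie['href'], ill['href'], title.get_text())
```

select() returns a list of every match, while select_one() returns only the first match or None, so it plays the same role as find() in the code above.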