Getting Taobao Product Details with Beautiful Soup
2014-07-02 17:29
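Beautiful Soup builds a parse tree of the product detail page.
Main function: findAll(name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)
First use findAll (or find, which returns only the first match) to narrow the page down to the tag that holds the value, then run a regular expression over that tag to match and print the value.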
Beautiful Soup documentation (Chinese):
http://www.crummy.com/software/BeautifulSoup/bs3/documentation.zh.html#Searching%20the%20Parse%20Tree
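The regular-expression half of that approach can be tried on its own. The sketch below is only an illustration: it uses the standard library alone, is written for Python 3, and the two HTML fragments are hypothetical stand-ins for what `str(tag)` might return after a `find` call. The helper names `extract_price` and `extract_img` are likewise made up for this example.

```python
import re

# A price: a positive number with optional decimals, or a value below 1 (0.xx).
PRICE_PATTERN = re.compile(r'[1-9][0-9]*(?:\.[0-9]+)?|0\.[0-9]+')
# An image URL inside a src attribute; the match stops at the closing quote.
IMG_PATTERN = re.compile(r'src="(http[^"]*)"')


def extract_price(fragment):
    """Return the first price-like number in the fragment, or None."""
    match = PRICE_PATTERN.search(fragment)
    return match.group() if match else None


def extract_img(fragment):
    """Return the first http image URL in the fragment, or None."""
    match = IMG_PATTERN.search(fragment)
    return match.group(1) if match else None


# Hypothetical fragments, standing in for str(tag) from Beautiful Soup.
price_html = '<em class="tb-public-price">&yen;43.00</em>'
img_html = '<img id="J_ImgBooth" src="http://example.com/pic.jpg_400x400.jpg">'

print(extract_price(price_html))
print(extract_img(img_html))
```

Returning None when nothing matches avoids the AttributeError that calling `.group()` on a failed search would raise.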
Program:
```python
#!/usr/bin/python
# Python 2 script; requires BeautifulSoup 3 and chardet.
import re
import urllib2

import chardet
from BeautifulSoup import BeautifulSoup

# A price: a positive number with optional decimals, or a value below 1 (0.xx).
PRICE_PATT = r'[1-9][0-9]*(?:\.[0-9]+)?|0\.[0-9]+'


def html():
    website = raw_input("input link:")
    page = urllib2.urlopen(website).read()
    mychar = chardet.detect(page)  # detected encoding, kept for debugging
    # print mychar
    soup = BeautifulSoup(page)
    # print soup.originalEncoding
    # soup = BeautifulSoup(page, fromEncoding="gbk")

    # Host name without its top-level domain, e.g. "item.taobao".
    host = re.match(r'http://(.*?)\.(com|cn)', website)
    m = host.group(1) if host else None

    if m == 'item.taobao':
        price = soup.find(attrs={"class": "tb-public-price"})
        img = soup.find(attrs={"id": "J_ImgBooth"})
        img_patt = r'src="(http.*jpg)"'
    elif m in ('detail.tmall', 'chaoshi.detail.tmall'):
        price = soup.find(attrs={"class": "detail-price tm-clear"})
        img = soup.find(attrs={"id": "J_ImgBooth"})
        img_patt = r'src="(http.*jpg)"'
    elif m == 'detail.ju.taobao':
        price = soup.find(attrs={"class": "currentPrice floatleft"})
        img = soup.find(attrs={"class": "normal-pic "})
        if img is None:
            img = soup.find(attrs={"class": "item-pic-wrap"})
        img_patt = r'src="(http[^"]*?)"'
    else:
        # Unsupported site: just echo the link.
        print website
        return

    # Run the regular expressions over the tags found above.
    match1 = re.search(PRICE_PATT, str(price))
    match2 = re.search(img_patt, str(img))
    print "title:", soup.title.text
    print "price:", match1.group()
    print "img:", match2.group(1)


if __name__ == '__main__':
    html()
```
Output:
```
----@ubuntu:~/python$ python html.py
input link:http://item.taobao.com/item.htm?spm=1.7274553.1997522421.1.FKA5Ar&id=38443208410&scm=2004.1.515.0
title: 2014夏装新款欧美风ZARA MICN女装衬衫白底定位印花长袖雪纺衫女-淘宝网
price: 43.00
img: http://img03.taobaocdn.com/bao/uploaded/i3/T1MnJaFJXeXXXXXXXX_!!0-item_pic.jpg_400x400.jpg

-----@ubuntu:~/python$ python html.py
input link:http://detail.ju.taobao.com/home.htm?spm=608.2291429.1.d1.tmDQQs&item_id=39165873670&id=10000002887630
title: 【聚_世界杯】【三只松鼠】爆款坚果组合750g-聚划算团购
price: 42.90
img: http://gju3.alicdn.com/bao/uploaded/i1/T1aV7LFGRcXXb1upjX.jpg_400x400Q90.jpg
```
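The program picks its price selector by matching the host part of the link with a regular expression. One alternative, sketched here as an assumption rather than as the author's method, is to parse the URL with the standard library and look the host up in a table. This sketch targets Python 3 (`urllib.parse`), unlike the Python 2 program; the class names are copied from the program, while `PRICE_CLASS_BY_HOST` and `price_class_for` are hypothetical names introduced for illustration.

```python
from urllib.parse import urlparse

# Price element classes taken from the program, keyed by full host name.
PRICE_CLASS_BY_HOST = {
    "item.taobao.com": "tb-public-price",
    "detail.tmall.com": "detail-price tm-clear",
    "chaoshi.detail.tmall.com": "detail-price tm-clear",
    "detail.ju.taobao.com": "currentPrice floatleft",
}


def price_class_for(url):
    """Return the price element's class for a known host, else None."""
    host = urlparse(url).hostname  # lowercased host name, or None
    return PRICE_CLASS_BY_HOST.get(host)


print(price_class_for("http://item.taobao.com/item.htm?id=38443208410"))
print(price_class_for("http://example.com/"))
```

A table lookup also copes with links that are not plain `http://` URLs, where the regular-expression match would return no result.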