Scraping data from a dynamic website (processing the data with BeautifulSoup CSS selectors)

2018-02-05 11:52
import requests
from bs4 import BeautifulSoup

# Base URL of the paginated listing; the page number is appended to it.
url = 'https://knewone.com/discover?page='

def get_info(url):
    # Download one listing page and parse it with the lxml parser.
    wd_data = requests.get(url, timeout=10)
    soup = BeautifulSoup(wd_data.text, 'lxml')
    # Each title anchor carries both the item title and its link,
    # so a single selection serves for both fields.
    titles = soup.select('section.content > h4 > a')
    imgs = soup.select('a.cover-inner > img')

    for title, img in zip(titles, imgs):
        data = {
            'title': title.get('title'),
            'img': img.get('src'),
            'link': title.get('href')
        }
        print(data)

def get_more(start, end):
    # Scrape pages start..end-1 (range() excludes the end value).
    for one in range(start, end):
        get_info(url + str(one))

get_more(1, 5)
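
The selectors can also be tried without touching the network. The following is a minimal sketch, not the original script: the `SAMPLE_HTML` markup and the `parse_items` helper are hypothetical, written only to imitate the structure the selectors expect, and the real page's HTML may differ. It uses Python's built-in `html.parser` instead of `lxml` so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure the selectors expect;
# the real site's HTML may differ.
SAMPLE_HTML = """
<article>
  <a class="cover-inner" href="/things/1"><img src="https://example.com/1.jpg"></a>
  <section class="content">
    <h4><a href="/things/1" title="First thing">First thing</a></h4>
  </section>
</article>
<article>
  <a class="cover-inner" href="/things/2"><img src="https://example.com/2.jpg"></a>
  <section class="content">
    <h4><a href="/things/2" title="Second thing">Second thing</a></h4>
  </section>
</article>
"""

def parse_items(html):
    """Return one dict per item found in the given HTML string."""
    soup = BeautifulSoup(html, 'html.parser')
    # '>' matches direct children only: section.content -> h4 -> a.
    anchors = soup.select('section.content > h4 > a')
    imgs = soup.select('a.cover-inner > img')
    # zip() pairs the n-th anchor with the n-th image and stops at the
    # shorter list, so a missing image silently drops trailing items.
    return [
        {'title': a.get('title'), 'img': img.get('src'), 'link': a.get('href')}
        for a, img in zip(anchors, imgs)
    ]

items = parse_items(SAMPLE_HTML)
for item in items:
    print(item)
```

Keeping the parsing in a function that takes an HTML string makes it possible to check the selectors against a saved copy of a page before running the scraper over several pages.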