Scraping JavaScript webpages with webkit | WebScraping.com
2014-03-04 10:03
Posted 12 Mar 2010 in javascript, python, qt, and webkit
In the previous post I covered how to tackle JavaScript-based websites with Chickenfoot. Chickenfoot is great but not perfect, because it:
- requires me to program in JavaScript rather than my beloved Python (with all its great libraries)
- is slow, because I have to wait for Firefox to render the entire webpage
- is somewhat buggy and has a small user and developer community, mostly at MIT
An alternative that addresses all of these points is WebKit, the open-source browser engine best known for powering Apple's Safari browser. WebKit has now been ported to the Qt framework (as the QtWebKit module) and can be used from Python through Qt's Python bindings, PyQt.
Here is a simple class that renders a webpage (including executing any JavaScript) and then extracts the final HTML:
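```python
import sys
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebPage


class Render(QWebPage):
    """Load a URL in WebKit, run its JavaScript, and keep the rendered frame."""

    def __init__(self, url):
        # A QApplication must exist before any QtWebKit object is created
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        # Call _loadFinished when WebKit finishes loading the page
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Block in the Qt event loop until _loadFinished calls quit()
        self.app.exec_()

    def _loadFinished(self, result):
        # Keep a reference to the rendered frame, then leave the event loop
        self.frame = self.mainFrame()
        self.app.quit()


url = 'http://webscraping.com'
r = Render(url)
html = r.frame.toHtml()  # the HTML after JavaScript has run
```

I can then analyze this resulting HTML with my standard Python tools, such as the webscraping module.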
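As a rough illustration of that analysis step, here is a minimal sketch that pulls the link targets out of rendered HTML. It uses the standard library's html.parser as a stand-in for the webscraping module, whose API is not shown in this post. The LinkExtractor class and extract_links function are hypothetical names, the code is written for Python 3 (in Python 2 the module is called HTMLParser), and it assumes the HTML has already been converted to a plain Python string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag seen while parsing."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None or empty
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    # Resolve relative links against the page URL
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all links in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Usage with a small stand-in for the HTML produced by Render
sample = '<html><body><a href="/blog/">Blog</a> <a href="http://example.com/x">X</a></body></html>'
print(extract_links(sample, 'http://webscraping.com'))
# ['http://webscraping.com/blog/', 'http://example.com/x']
```

With the Render class this would be called as extract_links(html, url). Because the parser only sees the HTML after WebKit has run the page's JavaScript, it also picks up links that the scripts inserted into the page.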