How to serve AJAX pages (Ember.js, Angular, etc) to Google's dummy bots?
2014-01-23 23:02
There may well be (there must be) better ways, but here we go.
Recipes:
Headless browser component, e.g.
  PhantomJS [1] is the preferred option because it is 1) WebKit-based (its JavaScript engine is JavaScriptCore, not V8) and 2) lightweight compared to the next choice.
  Firefox + Xvfb. I had to use this one because my site breaks under PhantomJS (even though it works fine under Chrome).
Selenium to drive the browser and generate the HTML.
Web server that serves the bots.
As defined by Google [2], AJAX apps should use #! in their URLs to tell the bots that a page is an AJAX page. The bots then request the corresponding ?_escaped_fragment_= URL for that AJAX address and expect a JavaScript-free page. So something has to run the JavaScript and generate the proper DOM for the dummy bots. This is where headless browsers come in.
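To make the #! and ?_escaped_fragment_= mapping concrete, here is a minimal sketch using only the Python standard library. The function names are made up for illustration and are not taken from [3]; the escaping uses urllib's quote(), which is more conservative than the exact character list in Google's scheme.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

FRAGMENT_PARAM = "_escaped_fragment_"


def to_escaped_fragment(pretty_url):
    """Map a #! URL to the URL a crawler would request instead."""
    parts = urlsplit(pretty_url)
    if not parts.fragment.startswith("!"):
        return pretty_url  # not a hashbang URL, nothing to map
    # Percent-encode the part after "#!" (assumption: keeping "=" and "/"
    # unescaped is fine; everything else unsafe gets escaped).
    escaped = quote(parts.fragment[1:], safe="=/")
    query = parts.query + "&" if parts.query else ""
    query += FRAGMENT_PARAM + "=" + escaped
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def to_pretty_url(crawler_url):
    """Recover the #! URL that the headless browser should actually load."""
    parts = urlsplit(crawler_url)
    prefix = FRAGMENT_PARAM + "="
    kept, fragment = [], None
    for pair in parts.query.split("&"):
        if pair.startswith(prefix):
            fragment = unquote(pair[len(prefix):])
        elif pair:
            kept.append(pair)  # ordinary query parameters stay in place
    if fragment is None:
        return crawler_url  # not a crawler request
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), "!" + fragment)
    )
```

The web server would call something like to_pretty_url() on each incoming bot request and hand the result to the headless browser.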
Xvfb is a special X server that runs (at least for me) on Linux and requires no graphics hardware. It renders everything in memory, so it runs easily on headless servers such as Amazon EC2 Linux instances. Firefox is the de facto standard browser on Linux, works pretty well with Xvfb, and is the default driver for Selenium, so it is the obvious choice.
Selenium was designed for browser-based test automation. It can drive different browsers, including Firefox (with built-in support), Chrome, and IE (both require extra "driver"s). Python has an API for Selenium, but there are also easier APIs such as Splinter, which is my choice.
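The rendering step could look roughly like the sketch below. It assumes Splinter is installed and that an X display is available (for example by starting the script through xvfb-run). render_page, make_browser, and settle_seconds are hypothetical names for this example, and the fixed sleep is only a crude stand-in for properly waiting until the AJAX content has loaded.

```python
import time


def render_page(url, make_browser=None, settle_seconds=2.0):
    """Load url in a real browser and return the HTML of the rendered DOM."""
    if make_browser is None:
        # Third-party dependency, imported lazily so the function can also
        # be exercised with a stand-in browser object.
        from splinter import Browser

        def make_browser():
            return Browser("firefox")

    browser = make_browser()
    try:
        browser.visit(url)
        # Assumption: a short pause is enough for the app to finish its
        # AJAX calls; a real setup would wait for a specific element.
        time.sleep(settle_seconds)
        return browser.html
    finally:
        browser.quit()  # always close the browser, even on errors
```

Starting a fresh browser for every request is simple but expensive; reusing one long-lived browser instance is a possible refinement.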
If we simply forward every URL to Firefox, a page loads 20 - 100x slower than it does in a normal Firefox session, because for each resource (CSS, JavaScript, images) the server starts a new Firefox tab (if not a window) to retrieve it, even though the first AJAX page load has already fetched them once. This is slow, so as a hack all static resources are loaded via Requests instead. Better optimisations are available, though.
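The static-resource hack can be sketched as a small dispatcher: anything that looks like a static file is fetched directly, and only real pages go through the browser. The extension list and the names is_static_resource and serve are assumptions for this example, not the code in [3]; render is whatever function drives the headless browser.

```python
import posixpath
from urllib.parse import urlsplit

# Assumption: file extensions are a good enough signal for "static".
STATIC_EXTENSIONS = {
    ".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".woff",
}


def is_static_resource(url):
    """Return True if the URL path ends in a known static file extension."""
    path = urlsplit(url).path  # ignores the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension in STATIC_EXTENSIONS


def serve(url, render, fetch=None):
    """Fetch static files directly; send everything else to the browser."""
    if is_static_resource(url):
        if fetch is None:
            import requests  # third-party, only needed for static files

            def fetch(static_url):
                return requests.get(static_url, timeout=10).content

        return fetch(url)
    return render(url)
```

A cache in front of fetch would be one of the better optimisations, since static files rarely change between bot visits.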
Everything put together is at [3].
Good luck.
[1] PhantomJS http://phantomjs.org
[2] Making AJAX applications crawlable https://developers.google.com/webmasters/ajax-crawling/docs/getting-started?hl=de-DE
[3] AJAX for dummies https://github.com/wolf0403/ajax-for-dummies