How to serve AJAX pages (Ember.js, Angular, etc) to Google's dummy bots?

2014-01-23 23:02
There may well be better ways to do this, but here is what worked for me.

Recipes: 

Headless browser component, e.g.
PhantomJS [1], the preferred choice because it is 1) WebKit-based and 2) lightweight compared to the next choice
Firefox + Xvfb. I had to use this one because my site breaks under PhantomJS (even though it works fine under Chrome)

Selenium to drive the browser and generate the HTML.
Web server that serves the bots.
As defined by Google [2], AJAX apps should use #! in their URLs to tell the bots that this is an AJAX page. The bots then request the corresponding ?_escaped_fragment_= URL for that AJAX address and expect a JavaScript-free page. So something has to run the JavaScript and
generate the proper DOM for the dummy bots. This is where the headless browsers come in.
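
To make the URL convention concrete, here is a minimal sketch of how the web server could turn the URL a bot requests back into the #! URL that the browser should load. The function name to_hashbang_url and the example.com addresses are made up for illustration; this is not taken from the code at [3].

```python
from urllib.parse import urlsplit, urlunsplit, unquote

# Query parameter the crawler uses in place of the "#!" fragment.
PARAM = "_escaped_fragment_="


def to_hashbang_url(crawler_url):
    """Map a ?_escaped_fragment_= URL back to its #! form.

    Returns None when the URL carries no _escaped_fragment_ parameter,
    i.e. when the request is not a crawler request for an AJAX page.
    """
    parts = urlsplit(crawler_url)
    kept = []        # ordinary query parameters, left untouched
    fragment = None  # decoded value of _escaped_fragment_
    for piece in parts.query.split("&"):
        if piece.startswith(PARAM):
            fragment = unquote(piece[len(PARAM):])
        elif piece:
            kept.append(piece)
    if fragment is None:
        return None
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "&".join(kept), ""))
    return base + "#!" + fragment
```

For example, http://example.com/app?_escaped_fragment_=/posts/1 maps back to http://example.com/app#!/posts/1, which is the address the headless browser then has to render.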

Xvfb is a special X server that runs (at least for me) on Linux and needs no graphics hardware. It renders everything in memory, so it runs easily on headless servers such as Amazon EC2 Linux instances. Firefox is the de facto standard browser on Linux,
works pretty well with Xvfb, and is the default driver for Selenium, so it is the obvious choice.
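
A rough sketch of the Xvfb side, assuming Xvfb is installed and on the PATH. The display number :99 and the screen size are arbitrary choices of mine, and the helper names are hypothetical.

```python
import os
import subprocess


def xvfb_command(display=":99", screen="1280x1024x24"):
    """Build the command line for a virtual X server on the given display."""
    # "-screen 0 WxHxD" sets the size and colour depth of screen 0.
    return ["Xvfb", display, "-screen", "0", screen]


def browser_env(display=":99"):
    """Copy the current environment and point X clients at the virtual display."""
    env = dict(os.environ)
    env["DISPLAY"] = display
    return env


def start_xvfb(display=":99"):
    """Start Xvfb in the background; the caller should terminate() it when done."""
    return subprocess.Popen(xvfb_command(display))
```

Any Firefox process started with the environment from browser_env() (or after setting os.environ["DISPLAY"] in the server process itself) draws into Xvfb's in-memory screen instead of looking for a real display.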

Selenium was designed for browser-based test automation. It can drive different browsers, starting with Firefox (with built-in support), as well as Chrome and IE (both require an extra "driver"). In Python there is an API for Selenium, but there are also easier APIs like
Splinter, which is my choice.
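
As an illustration, rendering one page with Splinter could look roughly like the sketch below. It assumes Splinter and Firefox are installed. The function name render_html, the wait_for_css parameter and the browser_factory hook (handy for swapping in another browser or a stub) are my own additions, not part of Splinter or of the code at [3].

```python
def render_html(url, wait_for_css=None, timeout=10, browser_factory=None):
    """Load url in a real browser and return the HTML after JavaScript has run."""
    if browser_factory is None:
        # Imported lazily so the module loads even where Splinter is missing.
        from splinter import Browser
        browser_factory = lambda: Browser("firefox")
    browser = browser_factory()
    try:
        browser.visit(url)
        if wait_for_css is not None:
            # Give the AJAX app up to `timeout` seconds to insert this element.
            browser.is_element_present_by_css(wait_for_css, wait_time=timeout)
        return browser.html
    finally:
        # Always close the browser, even if the page failed to load.
        browser.quit()
```

Starting a fresh browser for every request is the simplest thing that works. Keeping one browser alive between requests would be faster, but then concurrent requests have to be serialised.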

If we simply forward every URL to Firefox, a page loads 20 to 100 times slower than it does in a normal Firefox session, because for each resource (CSS, JavaScript, images) the server starts a new Firefox tab (if not a window) to retrieve it,
even though the first AJAX page load would already have fetched them once. That is slow, so a hack is used here: all static resources are loaded via Requests instead. Better optimisations are available, though.
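
The static-resource shortcut can be sketched as follows, assuming the Requests library is installed. The extension list and both function names are guesses for illustration; the actual rules in [3] may differ.

```python
from urllib.parse import urlsplit

# Hypothetical list of extensions treated as static; adjust for your site.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif",
                     ".ico", ".svg", ".woff")


def is_static_resource(url):
    """Return True when the URL path looks like a static file."""
    # Only the path is checked, so query strings such as ?v=3 are ignored.
    path = urlsplit(url).path.lower()
    return path.endswith(STATIC_EXTENSIONS)


def fetch_static(url, timeout=10):
    """Fetch a static resource directly over HTTP, bypassing the browser."""
    import requests  # imported lazily; only needed on this code path
    response = requests.get(url, timeout=timeout)
    return response.status_code, response.headers.get("Content-Type"), response.content
```

The web server would call is_static_resource() first and hand only the remaining URLs, i.e. the actual AJAX pages, to the headless browser.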

Everything put together is available at [3].

Good luck.

[1] PhantomJS http://phantomjs.org 
[2] Making AJAX applications crawlable https://developers.google.com/webmasters/ajax-crawling/docs/getting-started?hl=de-DE
[3] AJAX for dummies https://github.com/wolf0403/ajax-for-dummies