Monday, 30 June 2014

The Technology behind Google’s great results..

As a Google user, you’re familiar with the speed and accuracy of a Google search. How exactly does Google manage to find the right results for every query as quickly as it does? The heart of Google’s search technology is PigeonRank™, a system for ranking web pages developed by Google founders Larry Page and Sergey Brin at Stanford University.
Building upon the breakthrough work of B. F. Skinner, Page and Brin reasoned that low cost pigeon clusters (PCs) could be used to compute the relative value of web pages faster than human editors or machine-based algorithms. And while Google has dozens of engineers working to improve every aspect of our service on a daily basis, PigeonRank continues to provide the basis for all of our web search tools.

Tuesday, 3 June 2014

Searchable Dynamic Content With AJAX Crawling:


Google Search likes simple, easy-to-crawl websites. You like dynamic websites that show off your work and that really pop. But search engines can’t run your JavaScript. That cool AJAX routine that loads your content is hurting your SEO.
Google’s robots parse HTML with ease; they can pull apart Word documents, PDFs and even images from the far corners of your website. But as far as they’re concerned, AJAX content is invisible.

The Problem With AJAX

AJAX has revolutionized the Web, but it has also hidden its content. If you have a Twitter account, try viewing the source of your profile page. There are no tweets there — just code! Almost everything on a Twitter page is built dynamically through JavaScript, and the crawlers can’t see any of it. That’s why Google developed AJAX crawling.
Because Google can’t get dynamic content from HTML, you will need to provide it another way. But there are two big problems: Google won’t run your JavaScript, and it doesn’t trust you.
Google indexes the entire Web, but it doesn’t run JavaScript. Modern websites are little applications that run in the browser, but running those applications as they index is just too slow for Google and everyone else.
The trust problem is trickier. Every website wants to come out first in search results; your website competes with everyone else’s for the top position. Google can’t just give you an API to return your content because some websites use dirty tricks like cloakingto try to rank higher. Search engines can’t trust that you’ll do the right thing.
Google needs a way to let you serve AJAX content to browsers while serving simple HTML to crawlers. In other words, you need the same content in multiple formats.

Two URLs For The Same Content

Let’s start with a simple example. I’m part of an open-source project called Spiffy UI. It’s a Google Web Toolkit (GWT) framework for REST and rapid development. We wanted to show off our framework, so we made SpiffyUI.org using GWT.
GWT is a dynamic framework that puts all of our content in JavaScript. Our index.htmlfile looks like this:
<body>
   <script type="text/javascript" language="javascript"
   src="org.spiffyui.spsample.index.nocache.js"></script>
</body>
Everything is added to the page with JavaScript, and we control our content with hash tags (I’ll explain why a little later). Every time you move to another page in our application, you get a new hash tag. Click on the “CSS” link and you’ll end up here:
http://www.spiffyui.org#css
The URL in the address bar will look like this in most browsers:
http://www.spiffyui.org/?css
This simple hash works well for our application and makes it bookmarkable, but it isn’t crawlable. Google doesn’t know what a hash tag means or how to get the content from it, but it does provide an alternate method for a website to return content. So, we let Google know that our hash is really JavaScript code instead of just an anchor on the page by adding an exclamation point (a “bang”), like this:
http://www.spiffyui.org#!css
This hash bang is the secret sauce in the whole AJAX crawling scheme. When Google sees these two characters together, it knows that more content is hidden by JavaScript. It gives us a chance to return the full content by making a second request to a special URL:
http://www.spiffyui.org?_escaped_fragment_=css
The new URL has replaced the #! with ?_escaped_fragment_=. Using a URL parameter instead of a hash tag is important, because parameters are sent to the server, whereas hash tags are available only to the browser.