Friday, June 11, 2010

Google Caffeine (A complete idiot's guide??) :P

The recent buzzword from Google. So what the hell is it?

OK, first of all, Caffeine has nothing to do with how Google looks. If you've heard the term, gone to Google, typed a query and thought that everything seems the same and nothing has changed, it's because Caffeine is purely a backend change (something that is not very apparent to a lay user). If you did notice anything, it was probably better results for your search query (or at least different ones than you saw last time for the same query).

Before looking into Google Caffeine, let's first see how Google basically works. When you search for something on Google, it doesn't actually search the web; it searches the index of the web that it has stored in its petabytes of storage (1 petabyte = 1024 terabytes, 1 terabyte = 1024 gigabytes; Google's own figure, quoted further down, is nearly 100 million gigabytes). So what exactly is an index? An index is something like what you find at the back of every book. It's a set of keywords along with page numbers, except that in this case it's web URLs instead of page numbers.
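
To make the book-index idea concrete, here is a minimal sketch in Python. It is only an illustration of a keyword-to-URL index, not how Google actually builds or stores its index; the URLs, the page texts and the function names `build_index` and `search` are all made up for this example.

```python
from collections import defaultdict

# Hypothetical mini "web": URL -> page text (made up for illustration).
pages = {
    "http://example.com/coffee": "caffeine keeps coffee drinkers awake",
    "http://example.com/search": "google caffeine is a search index",
    "http://example.com/books": "every book has an index at the back",
}

def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query.

    Only the index is consulted; the page texts are never rescanned.
    """
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(pages)
print(sorted(search(index, "caffeine")))      # the coffee and search pages
print(sorted(search(index, "search index")))  # only the search page
```

The point to take away is that `search` never touches `pages`: answering a query is just a lookup in the prebuilt index, which is why the index has to be kept up to date separately.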

WOW! An index for the entire web!! That's pretty huge and interesting. So how did Google manage to build it? That's not the question I'm gonna answer here, as that infrastructure has evolved over the years since its inception. The problem with the book analogy is that a book is static: once it's printed, nothing ever changes. The world wide web, on the other hand, is highly dynamic (probably the best example there is). URLs keep changing. Millions of pages are added and removed every day. Now the buzzword comes into the picture: the process by which Google keeps up with the dynamic nature of the world wide web is known as indexing (the process of finding newly added pages is known as crawling).

OK, enough background stories, let's come to the point. Google Caffeine is the new system that is gonna govern how Google carries out its indexing process. The thing is, they had an old layered architecture (for example, one layer for each type of content like text, image, flash, etc.) in which every layer was refreshed at a certain rate, and the information was available to us only once the entire layer had been refreshed. That meant a significant delay between the time Google actually found the updated information and the time it became available to us via Google search. With Caffeine, they are eliminating the layered architecture and instead processing the web in several smaller portions, so that the delay between finding information and making it available is significantly reduced.
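
A rough way to see why smaller portions cut the delay is the toy model below. It is a sketch under loud assumptions, not Google's actual pipeline: the function names, the two-week refresh interval and the one-hour portion time are all hypothetical numbers picked for illustration.

```python
import math

def batch_visible_at(found_at, refresh_every):
    """Old-style layer: a page found at time `found_at` only becomes
    searchable when the next full refresh of the layer completes."""
    return math.ceil(found_at / refresh_every) * refresh_every

def incremental_visible_at(found_at, portion_time):
    """Small-portion style: the page sits in a small portion that is
    processed and published on its own, `portion_time` later."""
    return found_at + portion_time

# Hypothetical numbers, in hours: the layer is refreshed every 2 weeks,
# and a small portion takes 1 hour to process.
REFRESH_EVERY = 14 * 24
PORTION_TIME = 1

for found_at in (1, 100, 300):
    old = batch_visible_at(found_at, REFRESH_EVERY) - found_at
    new = incremental_visible_at(found_at, PORTION_TIME) - found_at
    print(f"found at hour {found_at}: old delay {old}h, new delay {new}h")
```

In this model a page found just after a refresh waits almost the whole interval in the layered setup, while the small-portion setup always waits the same short time, no matter when the page was found.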

The difference between the old layered index and Caffeine is clearly depicted in the image from the official Google blog:


Here is an excerpt from Google's official blog:

"Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles."

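As a quick sanity check, the numbers in the excerpt can be divided out. The sketch below only uses the figures quoted above plus the standard miles-to-meters conversion; treating a gigabyte with the 1024 convention for the petabyte figure is my assumption.

```python
# Figures taken from the excerpt above.
TOTAL_GB = 100_000_000   # "nearly 100 million gigabytes"
IPODS = 625_000          # "625,000 of the largest iPods"
STACK_MILES = 40         # "more than 40 miles"

METERS_PER_MILE = 1609.344

gb_per_ipod = TOTAL_GB / IPODS
cm_per_ipod = STACK_MILES * METERS_PER_MILE * 100 / IPODS
total_pb = TOTAL_GB / 1024 / 1024  # assumes 1 PB = 1024 * 1024 GB

print(f"{gb_per_ipod:.0f} GB per iPod")   # 160 GB per iPod
print(f"{cm_per_ipod:.1f} cm per iPod")   # 10.3 cm per iPod
print(f"{total_pb:.1f} PB in total")      # 95.4 PB in total
```

So the excerpt implies iPods of 160 GB and roughly 10 cm each, which lines up with a 160 GB iPod, and a database of a bit under 100 petabytes.
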
Whether or not Caffeine makes a difference to search results and relevance today, it is sure to shape the future of search on a world wide web where data changes more and more often, and processing in smaller portions should mean fresher (and better local) search results.

Points to note about Google Caffeine:

  • Google announced the plan for Caffeine around the same time Bing was launched (fishy, they probably had something in mind then?)

  • Google Caffeine might force search engine optimizers to change their existing models very significantly (some blogs say that Google has been hinting at it all along)

  • A nice site that compares the new and old results for any given query and displays them side by side (there seems to be a good difference after all!!) - Compare Caffeine

  • Unlike with its other search features, Google offered a developer preview to webmasters and power searchers and asked them for their feedback

  • With Yahoo soon out of search, here is a post on Mashable comparing old Google, new Google and Bing - Mashable Caffeine Review


Comments welcome as usual. But I guess I won't get any (as usual) ;-)

-Vignesh