Tech Revel

.....Guiding you to the world of technology......


Google has achieved iconic status on the internet by closely matching what people search for with what they actually get in their search results. But do you really know how Google delivers the most relevant pages from the whole internet, and how it does this in a fraction of a second?

If you really wish to know, read on!



Google does this not with a few high-end servers but with thousands of ordinary computers networked together into clusters.

When you run a query (search for something) on Google, your browser first performs a Domain Name System (DNS) lookup to map the google.com domain name to an IP address. Since Google runs its Google Web Servers (GWSs) in clusters at many geographic locations, the cluster nearest to the user (and with a lighter load) is selected to perform the search. This geographic load balancing prevents any particular cluster from being overused, which keeps search times as short as possible.
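A minimal sketch of this selection step, assuming a hypothetical list of clusters with made-up coordinates and load figures (Google's actual routing logic is not described here): the nearest cluster whose load is below a threshold wins, and if every cluster is busy, the least-loaded one is used.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical cluster description: name, location and current load (0.0 to 1.0).
    name: str
    lat: float
    lon: float
    load: float


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points using the haversine formula."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def pick_cluster(clusters: list[Cluster], user_lat: float, user_lon: float,
                 max_load: float = 0.8) -> Cluster:
    """Pick the nearest cluster that is not overloaded; fall back to the least-loaded one."""
    available = [c for c in clusters if c.load < max_load]
    if not available:
        return min(clusters, key=lambda c: c.load)
    return min(available, key=lambda c: distance_km(user_lat, user_lon, c.lat, c.lon))


# Hypothetical clusters with invented coordinates and loads.
clusters = [
    Cluster("us-east", 39.0, -77.5, 0.95),  # nearest to the user, but overloaded
    Cluster("us-central", 41.3, -95.9, 0.40),
    Cluster("europe-west", 50.4, 3.8, 0.10),
]
print(pick_cluster(clusters, 40.7, -74.0).name)  # user near New York → us-central
```

Treating load as a hard cutoff keeps the rule easy to read; a real balancer might instead blend distance and load into a single score.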

So how does Google make sure that the whole system never fails?



For a start, because the GWS clusters are geographically distributed, a local problem such as a power failure is unlikely to bring down the whole system. Even if one of the clusters fails, the clusters in other regions keep working, so the Google system as a whole stays up!
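The failover idea can be sketched as trying clusters in order of preference and skipping any that are down. The cluster names and the health table below are hypothetical stand-ins for a real health-checking system.

```python
from typing import Callable, Sequence


def route_query(query: str, clusters: Sequence[str],
                is_up: Callable[[str], bool]) -> str:
    """Send the query to the first cluster (in preference order) that is up."""
    for cluster in clusters:
        if is_up(cluster):
            return f"{cluster} handled {query!r}"
    raise RuntimeError("no cluster is available")


# Hypothetical health state: the preferred cluster has lost power.
health = {"asia-south": False, "asia-east": True, "europe-west": True}
print(route_query("tech revel", ["asia-south", "asia-east", "europe-west"],
                  lambda name: health[name]))
# → asia-east handled 'tech revel'
```

Only if every region were down at once would the query fail, which is exactly the situation geographic distribution makes unlikely.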


But if Google uses cheap computers, how does their failure not affect the Google system?


That is the real question to ask! See the diagram below.



Instead of relying on the hardware, Google relies on the software that controls it. The load balancer sends you to a nearby server cluster with a lighter load. Each such cluster holds several copies of the index of the whole internet (well, almost!). That index is roughly some hundreds of terabytes in size.
The index is divided into many small parts, each of which is shown as a small square in the diagram. Each part is served by its own group of computers, and every computer in that group holds a full copy of that part. In other words, all the computers in a group store the same piece of the internet index, so even if one of them goes down, the information can still be retrieved from the others. That is why the index never dies, even though it runs on cheap machines.
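The idea of splitting the index into parts and copying each part onto several machines can be sketched in a few lines. This is a toy model under assumed parameters (4 parts, 3 machines per part, documents assigned to parts by a hash of their ID); the real index layout is far more elaborate.

```python
import zlib

NUM_SHARDS = 4          # assumed number of index parts (the "small squares")
REPLICAS_PER_SHARD = 3  # assumed number of machines holding each part


def shard_for(doc_id: str) -> int:
    """Assign a document to a part with a stable hash (crc32, unlike hash(), is not randomized per run)."""
    return zlib.crc32(doc_id.encode()) % NUM_SHARDS


# machines[s][r] is machine r holding part s: a toy inverted index (word -> doc IDs).
machines = [[{} for _ in range(REPLICAS_PER_SHARD)] for _ in range(NUM_SHARDS)]
down = set()  # (part, machine) pairs that have failed


def add_document(doc_id: str, words: list[str]) -> None:
    """Write the document's words to every machine holding its part."""
    for replica in machines[shard_for(doc_id)]:
        for word in words:
            replica.setdefault(word, set()).add(doc_id)


def lookup(shard: int, word: str) -> set[str]:
    """Read from the first live machine of a part; failed machines are skipped."""
    for r, replica in enumerate(machines[shard]):
        if (shard, r) not in down:
            return replica.get(word, set())
    raise RuntimeError(f"all replicas of shard {shard} are down")


add_document("page-1", ["google", "cluster"])
s = shard_for("page-1")
down.add((s, 0))            # one cheap machine dies...
print(lookup(s, "google"))  # ...but its copies still answer: {'page-1'}
```

A lookup fails only when every machine holding that part is down at the same time, which is why adding more copies makes the index more robust.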


The diagram below shows one such Google cluster.





Besides keeping the information always available, this design adds a great deal of parallelism to each Google search: a query can be looked up in all the parts of the index at the same time, which makes the search even faster!
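The parallel lookup can be sketched with Python's `concurrent.futures`: the query is sent to every index part at once and the partial answers are merged by relevance. The index contents and the `search_shard` function are hypothetical; a real system would make network calls to other machines rather than read local dictionaries.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index parts: each maps a word to (doc ID, relevance score) pairs.
SHARDS = [
    {"cluster": [("doc-a", 0.9), ("doc-b", 0.4)]},
    {"cluster": [("doc-c", 0.7)]},
    {"server": [("doc-d", 0.8)]},
]


def search_shard(shard: dict, word: str) -> list[tuple[str, float]]:
    """Look up one word in one index part (stands in for a remote call)."""
    return shard.get(word, [])


def parallel_search(word: str) -> list[tuple[str, float]]:
    """Query all index parts concurrently, then merge hits by descending score."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partial_results = pool.map(lambda shard: search_shard(shard, word), SHARDS)
        hits = [hit for part in partial_results for hit in part]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(parallel_search("cluster"))
# → [('doc-a', 0.9), ('doc-c', 0.7), ('doc-b', 0.4)]
```

Because the parts are searched at the same time, the total wait is roughly that of the slowest part rather than the sum of all of them.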

From the index, the search moves to Google's document servers, which retrieve the pages (such as their titles and URLs) matching the most relevant index entries. The results are then ordered by their relevance to the query. Almost in parallel, other servers look up advertisements related to the search.
The final results page you see is the output of all these steps, and many more complex ones, performed in less than a second!
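A toy version of this last stage, assuming a hypothetical document store and ad table: page details are fetched for the already-ranked document IDs while an ad lookup runs at the same time in a separate thread, and both are combined into one results page.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical document-server data: doc ID -> (title, URL).
DOCS = {
    "doc-a": ("Intro to clusters", "https://example.com/a"),
    "doc-c": ("Scaling search", "https://example.com/c"),
}
# Hypothetical ad inventory keyed by query word.
ADS = {"cluster": ["Ad: rent servers by the hour"]}


def fetch_documents(ranked_ids: list[str]) -> list[tuple[str, str]]:
    """Turn a relevance-ordered list of doc IDs into (title, URL) pairs, keeping the order."""
    return [DOCS[doc_id] for doc_id in ranked_ids if doc_id in DOCS]


def fetch_ads(word: str) -> list[str]:
    """Look up advertisements related to the query word."""
    return ADS.get(word, [])


def build_results_page(word: str, ranked_ids: list[str]) -> dict:
    """Fetch page details and ads concurrently and combine them into one page."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        docs_future = pool.submit(fetch_documents, ranked_ids)
        ads_future = pool.submit(fetch_ads, word)
        return {"results": docs_future.result(), "ads": ads_future.result()}


page = build_results_page("cluster", ["doc-a", "doc-c"])
print(page["results"][0])  # → ('Intro to clusters', 'https://example.com/a')
```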

Coming soon: a blog post on the algorithms Google actually uses to bring up the most relevant search results!