This is one of those questions you can never get an accurate answer to, even if you search on Google. On a lighter note, it is like asking your mother, as a child, how many stars there are in the universe.
As we all know, new PCs and phones get rolled out every few months. A typical PC now holds terabytes of storage, while a smartphone holds about 32 GB. Nevertheless, even with all the shiny upgrades to both hardware and software, our devices are incapable of effortlessly and efficiently storing all our files: documents, music, videos, and so on.
Google, on the other hand, seemingly holds the answer to almost any question in the world and can generate millions of search results in a matter of milliseconds. Unfortunately, the search giant does not report figures on the amount of data it possesses. Hence, we shall use statistics and assumptions to make the best guess possible at this impossible question.
Before we carry on, here are a few terms to familiarise yourself with so you can follow the discussion ahead:
• 1 Petabyte (PB) is equivalent to 1024 Terabytes (TB)
• 1 Exabyte (EB) is equivalent to 1024 Petabytes, or around 1 million TB
• A data centre is like a warehouse for information and data, holding petabytes to exabytes. Even though Google does not hold the record for the biggest data centre, it still handles an enormous amount of data.
• Google converted its search indexing systems to MapReduce in 2003, and MapReduce has been used in its data centres ever since. An average MapReduce job runs across a $1 million hardware cluster, and this enormous cost does not include bandwidth fees, data centre costs, or staffing.
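To make the term concrete, here is a minimal word-count sketch of the MapReduce idea in Python. This illustrates the programming model only (map emits key-value pairs, the framework groups them by key, and reduce aggregates each group); it is not Google's actual implementation.

```python
from collections import defaultdict

def map_phase(document):
    # Map step: emit a (word, 1) pair for every word in the document.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(key, values):
    # Reduce step: aggregate all counts emitted for one key.
    return (key, sum(values))

def mapreduce(documents):
    # Shuffle/group step: collect all values emitted for the same key.
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    # Reduce each group independently (in a real cluster, in parallel).
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = mapreduce(["the quick fox", "the lazy dog"])
print(counts)   # {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

The appeal of the model is that the map and reduce steps are independent per document and per key, so the framework can scale the same code from one machine to thousands.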
Analysis of Google Search
Back in 2008, it was reported that Google handled around 20 petabytes per day, with an average of 100,000 MapReduce jobs spread across its massive computing clusters. In internet terms, 2008 is indeed a long way back. By applying a Moore's-law-style prediction, we can safely assume this number has grown exponentially to at least 200 petabytes per day.
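The extrapolation above can be sketched with back-of-the-envelope arithmetic. Note the assumptions: a two-year doubling period borrowed from Moore's law, and a seven-year gap (2008 to roughly 2015) chosen for illustration; neither figure comes from Google.

```python
# Moore's-law-style extrapolation (assumption: data volume
# doubles every two years, like transistor counts).
base_pb_per_day = 20      # reported 2008 figure
years_elapsed = 7         # e.g. 2008 -> 2015 (illustrative)
doubling_period = 2       # years per doubling (assumption)

estimate = base_pb_per_day * 2 ** (years_elapsed / doubling_period)
print(f"Estimated daily volume: {estimate:.0f} PB/day")  # ~226 PB/day
```

Even with these rough assumptions, the result lands in the same ballpark as the 200 PB/day figure quoted above.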
Currently, Google processes its data on a standard machine cluster node consisting of two 2 GHz Intel Xeon processors with Hyper-Threading enabled, 4 GB of memory, two 160 GB IDE hard drives and a gigabit Ethernet link. This setup costs approximately $2,400 each through providers such as Penguin Computing or Dell, or roughly $900 a month through a managed hosting provider such as Verio (for startup comparisons).
We also know that Google processes over 40,000 search queries per second on average today, which translates to over 3.5 billion searches per day and roughly 1.2 trillion searches per year worldwide.
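Those figures are easy to sanity-check with simple arithmetic, using only the 40,000 queries-per-second rate from the text:

```python
# Sanity check on the query-rate arithmetic.
queries_per_second = 40_000
per_day = queries_per_second * 60 * 60 * 24   # 86,400 seconds in a day
per_year = per_day * 365

print(f"{per_day / 1e9:.2f} billion searches per day")     # ~3.46 billion
print(f"{per_year / 1e12:.2f} trillion searches per year") # ~1.26 trillion
```

Both results line up with the "over 3.5 billion per day" and "1.2 trillion per year" figures, allowing for rounding of the per-second rate.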
The search giant also collaborates with other data centres to store its data, each covering an area of roughly 20 football fields. Calculating this huge amount of data is hard. But with some educated guessing based on capital expenditure at remote locations, electricity consumption at each data centre, and the number of servers each one runs, we can conclude that Google holds 10-15 exabytes of data. That is roughly the combined storage of 30 million PCs.
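The PC comparison can be checked with another back-of-the-envelope calculation. The ~500 GB average per PC below is an assumption introduced here for illustration, not a figure from the text:

```python
# How many PCs would 15 EB fill?
# Assumption: an average PC holds ~500 GB of storage.
EXABYTE_IN_GB = 1024 ** 3   # 1 EB = 1024 PB = 1024^2 TB = 1024^3 GB
total_gb = 15 * EXABYTE_IN_GB
gb_per_pc = 500

pcs = total_gb / gb_per_pc
print(f"~{pcs / 1e6:.0f} million PCs")  # ~32 million PCs
```

At about 500 GB per machine, 15 exabytes fills roughly 32 million PCs, consistent with the 30 million figure above.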
So the next time someone stops you to ask how much data Google handles, you can confidently answer that it manages at least 10-15 exabytes of data.
Side note: companies like Facebook and Google actively collect user data. A major purpose of this collection is not to improve your user experience, as publicly claimed, but to improve their ad delivery systems. This smart way of delivering advertisements guarantees a higher clickthrough rate, resulting in further optimisation of ad revenue.