
Google Discloses Its Data Center Network Design

  


China IDC Circle reported on June 23: Google's data centers and infrastructure are widely recognized as the most advanced in the industry. Google has historically been very strict about confidentiality, so any disclosure of related information attracts attention. After all, today's popular Hadoop started out as a clone built from just a few papers that Google published.

The NetEase article introduced the network design inside Google's data centers, about which information had previously been very scarce. The piece appears to be adapted from Wired. Wired articles are characterized by plenty of gossip (they want to tell a story rather than cover the technology itself), few technical details, and frequent technical errors (at the very least, this article's figures for the Jupiter switches' processing capacity are outdated), but they are still useful for understanding the background of a story.

The gist of the article is that Google began developing its own network equipment long ago: its systems were growing so rapidly that equipment from Cisco and other vendors could not meet its needs (or would have been far too expensive to deploy at that scale). In any case, it is the same story as in other areas of cloud computing: Internet companies have grown far beyond the scope of traditional IT, so they have to build things themselves.

In any case, companies like Google can recruit the best talent, even hiring professors straight from universities (Amin Vahdat was brought in because of his relevant research, such as PortLand). They then built switches from commodity chips and Linux, designed their own network protocols, and wrote their own network control software, ultimately producing an extremely large network system. Related technologies include SDN.

The striking point is: "The amount of data exchanged between data centers within Google's network has exceeded the amount of data exchanged between Google and the entire Internet."

Fortunately, Google Fellow Amin Vahdat, who leads Google's networking effort, also wrote an official blog post introducing their data center network design, emphasizing that this was the first time Google had disclosed details of the five generations of its internal network technology, from Firehose to Jupiter. The latest generation, Jupiter, can deliver more than 1 Pb/s of total bisection bandwidth, enough for 100,000 servers to exchange information at 10 Gb/s each, or to read all of the scanned data of the Library of Congress in less than one tenth of a second.
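As a quick sanity check on those numbers (a rough back-of-the-envelope calculation, not anything from Google's post), the 1 Pb/s figure is simply 100,000 servers times 10 Gb/s per server:

    # Back-of-the-envelope check of the Jupiter figures quoted above.
    servers = 100_000            # servers in the fabric
    per_server_gbps = 10         # 10 Gb/s per server

    total_gbps = servers * per_server_gbps    # 1,000,000 Gb/s
    total_pbps = total_gbps / 1_000_000       # Gb/s -> Pb/s (decimal units)

    print(f"Aggregate bisection bandwidth: {total_pbps:.0f} Pb/s")   # -> 1 Pb/s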

However, the blog post is very short and vague, stating only a few principles:

The network is arranged in a Clos topology. This configuration uses a group of smaller (cheaper) switches to provide the functionality of a much larger logical switch (a toy sketch of this follows these points).

A centralized software control stack manages the thousands of switches within the data center and makes them work as one large fabric (also illustrated in the sketch below).

Google builds its own software and hardware (using silicon from Broadcom and other vendors) and does not rely heavily on standard Internet protocols, instead using protocols customized for the data center. (Wired says the custom protocol is called Firepath, and that it is simpler, faster, and easier to scale than BGP and OSPF.)
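To make the Clos and centralized-control ideas concrete, here is a minimal, purely illustrative sketch in Python. The switch radix, the leaf/spine naming, and the toy controller below are assumptions made for this example; they are not Google's actual hardware, scale, protocols, or software.

    from itertools import product

    # A toy two-layer folded Clos (leaf-spine) fabric built from identical small switches.
    # Each leaf switch uses half of its ports for servers and half for uplinks to spines,
    # so uplink capacity matches server capacity at the chosen radix.
    RADIX = 8                     # ports per small switch (assumed for illustration)
    LINK_GBPS = 10                # per-port speed in Gb/s (assumed)

    spines = [f"spine{i}" for i in range(RADIX // 2)]
    leaves = [f"leaf{i}" for i in range(RADIX)]

    # Every leaf connects to every spine; this full mesh is what lets a group of
    # small switches behave like one much larger logical switch.
    links = list(product(leaves, spines))

    servers_per_leaf = RADIX // 2
    total_servers = servers_per_leaf * len(leaves)
    uplink_capacity_gbps = len(leaves) * len(spines) * LINK_GBPS

    print(f"{len(leaves)} leaves x {len(spines)} spines, {len(links)} inter-switch links")
    print(f"{total_servers} servers, {uplink_capacity_gbps} Gb/s of aggregate uplink capacity")

    # A toy "centralized control stack": one program that knows the whole topology and
    # hands every leaf a forwarding choice per destination leaf, spreading traffic
    # across all spines (loosely in the spirit of the blog post's description).
    forwarding = {
        src: {
            dst: spines[(leaves.index(src) + leaves.index(dst)) % len(spines)]
            for dst in leaves if dst != src
        }
        for src in leaves
    }
    print(f"leaf0 reaches leaf1 via {forwarding['leaf0']['leaf1']}")

Scaling this kind of fabric up is a matter of adding layers and increasing the switch radix; the point of the example is only that many small identical switches plus one piece of software that sees the whole topology can stand in for one enormous switch.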

Fortunately, Urs Hölzle, who heads Google's technical infrastructure, commented under the blog post: "Wait for our paper, SIGCOMM 2015 in August." Their SIGCOMM submission is titled "Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network".

Let's wait and see.