How Can the Data Center Avoid Heatstroke?

  

Amid the rains, spring slips away almost unnoticed. Since the start of summer, extreme heat has become routine: sustained high temperature and humidity, punctuated by thunderstorms and typhoons, all show the force of nature. Data center practitioners, facing this enemy, are preparing for a summer defense battle without gunpowder smoke. This article reveals the tactics of the data center's summer defense.

1、 Know yourself and your enemy, and you will never be defeated

Knowing yourself and the enemy means understanding the battlefield and the strengths and weaknesses of both sides. For data center managers, the first task is to build a highly reliable intelligence network.

1、 Weather forecasts

At the Battle of Chibi (Red Cliffs), the strategist Zhuge Liang read the night sky and used the east wind to defeat Cao Cao's army of 800,000. Zhuge is long gone and few of us can read the sky, but modern weather forecasts are an important source of intelligence. As forecast accuracy keeps improving, the 15-day outlook gives data center operations enough lead time to prepare.

2、 Power supply information

Summer is also the peak season for large-scale grid maintenance. For example, when a supply line is taken out of service for maintenance, that line may be de-energized for more than two days. If thunderstorms or gales strike during this window, the operating pressure on the data center rises sharply. Timely and accurate power supply information is therefore essential.
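One way to turn a maintenance notice into an action item is to check whether on-site diesel could carry the load if the remaining feed also failed during the outage window. The sketch below is a minimal illustration under stated assumptions, not Tencent's procedure: the class and function names, the fuel volume, the burn rate, and the 12-hour safety margin are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneratorPlant:
    """Hypothetical on-site diesel figures; replace with real site data."""
    fuel_on_site_l: float      # usable diesel in day tanks plus bulk storage, litres
    burn_rate_l_per_h: float   # consumption at the expected load, litres per hour


def runtime_hours(plant: GeneratorPlant) -> float:
    """Hours the generators can carry the load on the fuel already on site."""
    return plant.fuel_on_site_l / plant.burn_rate_l_per_h


def fuel_shortfall_l(plant: GeneratorPlant, outage_h: float, margin_h: float = 12.0) -> float:
    """Extra litres needed to cover an outage window plus an assumed safety margin.

    Returns 0.0 when the fuel on site is already sufficient.
    """
    required_l = plant.burn_rate_l_per_h * (outage_h + margin_h)
    return max(0.0, required_l - plant.fuel_on_site_l)


if __name__ == "__main__":
    # Hypothetical site: 40,000 L on hand, 800 L/h at the expected load.
    site = GeneratorPlant(fuel_on_site_l=40_000, burn_rate_l_per_h=800)
    print(f"runtime on current fuel: {runtime_hours(site):.1f} h")
    print(f"shortfall for a 48 h window: {fuel_shortfall_l(site, outage_h=48):.0f} L")
```

With these assumed figures the plant runs for 50 hours, so covering a 48-hour maintenance window plus the margin would call for about 8,000 L of extra fuel or a confirmed resupply arrangement.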

3、 Municipal water supply

For data centers with water-cooled systems, municipal water is an indispensable resource. In hot summers especially, a shortage of municipal water can be fatal to data center operations. So besides tracking how much water is held in on-site storage, operators should keep up with municipal water supply notices in a timely manner.
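Knowing how much water is in storage helps most when it is translated into hours of autonomy. A minimal sketch, assuming cooling-tower makeup water is the dominant draw and that hourly consumption can be estimated from metering; the function name, the 1,000 m³ tank, and the day/night profile are hypothetical. Walking an hourly profile, rather than dividing by a daily average, reflects that consumption peaks on hot afternoons.

```python
from typing import Sequence


def hours_of_autonomy(storage_m3: float, hourly_use_m3: Sequence[float]) -> float:
    """Hours until stored water runs out, repeating the hourly profile cyclically.

    hourly_use_m3 is an assumed per-hour makeup-water profile (e.g. 24 values
    starting at the hour the municipal supply is cut). Values must be >= 0.
    """
    if not hourly_use_m3 or sum(hourly_use_m3) <= 0:
        return float("inf")  # no net consumption: storage never runs out
    remaining = storage_m3
    hours = 0
    while True:
        use = hourly_use_m3[hours % len(hourly_use_m3)]
        if use > remaining:
            # Storage runs dry partway through this hour.
            return hours + remaining / use
        remaining -= use
        hours += 1


if __name__ == "__main__":
    # Hypothetical profile: 12 cooler hours at 40 m3/h, then 12 hot hours at 60 m3/h.
    profile = [40.0] * 12 + [60.0] * 12
    print(f"autonomy: {hours_of_autonomy(1000.0, profile):.1f} h")
```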

4、 Business planning

Summer is often a carnival season for Internet applications. The carefree graduation season and the thrilling European Cup both herald another peak in Internet traffic, so the data center must prepare for a higher business load.

5、 Our own data center

Once these four kinds of intelligence are gathered, we are close to "knowing the enemy". Assessing the state of our own data center is the only way to measure our strength and achieve "knowing ourselves". Only by reviewing and evaluating infrastructure operation and maintenance, spare parts inventory, and emergency plans and drills can we be confident.

2、 Prepare in advance and you will stand firm; fail to prepare and you will fall

In the digital economy, facing many uncontrollable external factors without a mature assurance mechanism invites serious challenges, even panic. So how can we meet every change with constant readiness?

1、 Routine maintenance

Routine maintenance is the systematic upkeep (servicing and repair) of the data center. Rooted in daily work, it is an essential link in keeping every system in the data center in good operating condition.

 

2、 Patrol inspection of high-risk equipment

Alongside routine maintenance, high-risk equipment needs targeted inspection patrols. Over four years of cooperation with its operators, Tencent Data Center has built and refined a quarterly inspection mechanism for high-risk equipment. Both sides share what each needs in an open and inclusive way, which gives the data center's basic operation a strong guarantee. Over those four years, these inspections have uncovered a number of serious hazards, which all parties coordinated to resolve immediately, keeping the data center operating safely.

 

Beyond infrastructure equipment, in recent years we have also extended inspections to data center security, comprehensively evaluating physical security, personnel security, and information security.

3、 Spare parts reserve

Beyond the common spare parts for each major infrastructure system (see the Tencent Data Center WeChat official account article "Management of Spare Parts of Data Center Infrastructure"), we strongly recommend a few "killer weapons" that can turn the tide at a critical moment.

4、 Emergency plan and drill

To strengthen the emergency response capability of data center operations and maintenance staff, drawing up emergency plans and rehearsing them is crucial. Plans should cover the data center's common emergency scenarios and be actionable.

Drafting emergency plans is usually not the bottleneck; whether drills are carried out seriously is what often determines a team's real response capability. Take a flood-control emergency drill at one Tencent data center: in 30 °C heat, the fully equipped emergency responders followed the plan to the letter. Drenched in sweat, they ran a textbook drill that showed the rigor and dedication of data center operations and maintenance staff.

5、 Emergency response team

To keep the emergency response orderly, set up an emergency command team so that every responder knows their post. Alongside the daily operations and maintenance shift schedule, make full use of on-call standby staff for rapid support, so that enough people are available from the very first moment.

3、 Train an army for a thousand days to use it for a single hour

Redundancy is built into the architecture at the planning and design stage, for example by bringing in primary and standby utility feeds from different substations, and day-to-day operations follow the established processes described above. Even so, the impact of unpredictable factors such as extreme weather should not be underestimated.

Since the start of this summer, rainstorms and thunderstorms have become markedly more frequent. Tencent Data Center responded to these extreme weather events in an orderly manner and kept the business running normally. On June 4, severe thunderstorms hit Shenzhen and disrupted utility power in many districts. At one Tencent data center, a total of four high-voltage utility feeds (from four different substations) serving its two phases were lost. The operations and maintenance team responded quickly, bringing the diesel generator system on load and discharging the chilled-water storage tanks, among other measures, and the business continued to run normally.
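Discharging chilled-water storage works because the tank holds sensible cooling capacity, Q = ρ·V·c_p·ΔT, which can carry the cooling load while chillers restart on generator power. A rough sketch of the ride-through estimate follows; the function name, tank volume, temperature difference, load, and 90% usable fraction (an assumed derating for mixing at the thermocline) are illustrative assumptions, not figures from the incident above.

```python
WATER_DENSITY_KG_PER_M3 = 1000.0   # approximate density of chilled water
WATER_CP_KJ_PER_KG_K = 4.186       # specific heat capacity of water


def ride_through_minutes(volume_m3: float, delta_t_k: float,
                         cooling_load_kw: float, usable_fraction: float = 0.9) -> float:
    """Minutes a chilled-water storage tank can carry a cooling load on its own.

    Stored energy Q = rho * V * c_p * dT, in kJ. Dividing kJ by kW gives seconds.
    usable_fraction is an assumed derating for mixing and unusable volume.
    """
    stored_kj = (WATER_DENSITY_KG_PER_M3 * volume_m3 * WATER_CP_KJ_PER_KG_K
                 * delta_t_k * usable_fraction)
    return stored_kj / cooling_load_kw / 60.0


if __name__ == "__main__":
    # Hypothetical: 500 m3 tank, 6 K usable temperature difference, 2,000 kW cooling load.
    print(f"ride-through: {ride_through_minutes(500, 6, 2000):.0f} min")
```

Comparing this estimate with the time the chillers need to restart and stabilize on generator power shows whether the tank alone can bridge the gap.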

Soldiers often say, "Send me into battle and I will win." This is also the inner conviction of data center operations and maintenance staff, who stand ready to defeat even a one-in-ten-thousand chance.

As this smokeless defensive war quietly begins, the data center's operations and maintenance soldiers answer the bugle call. They may not even have time to admire the blue sky and white clouds, but their sweat will surely condense into the sea of clouds that is the Internet.

Evening drums and morning bells drift by on the wind; winter and summer alternate, and spring comes round again. For the data center operations and maintenance team, summer assurance is just a microcosm of our year-round work to keep the data center running safely.