
The hidden danger of "cold" amid the surge in computing power

  

Continuous cooling is crucial to a data center. When the power supply is interrupted or refrigeration equipment fails, cooling stops, and servers and network equipment shut down from overheating. The services they carry are interrupted and unsaved data is lost. This causes not only serious economic losses but also serious damage to a company's reputation. Many of the online service interruptions you encounter when browsing websites or using apps are caused by cooling failures in data centers. According to research by the Uptime Institute, the failure rate of cooling systems has exceeded that of IT systems in recent years, making cooling failure the largest cause of data center downtime after power supply and network failures.

The continuous cooling problem in small and medium-sized data centers

How to prevent service interruptions caused by cooling failures has long been a major issue in the industry. Over the years, through continuous improvement of construction standards, stronger disaster recovery systems, better emergency plans and other measures, most large and hyperscale data centers have become able to cope with cooling failures and maintain continuous cooling to keep their services stable. Small and medium-sized data centers, because of their small business scale and low power density per cabinet, are often not equipped with continuous cooling, which makes them more prone to high-temperature downtime during a power outage. As the power per cabinet in small and medium-sized data centers increases, this problem is growing.

In recent years, service interruptions caused by cooling system failures in small and medium-sized data centers have occurred one after another. In mid-October, a small and medium-sized data center in Guangzhou experienced a cooling system failure that raised the temperature of the machine room and shut down some servers, affecting the business of multiple customers. Earlier, in May, a small self-use data center of an organization in Shanghai had a cooling system failure that pushed the temperature of the computer room above 40 degrees Celsius, and some servers carrying business shut down automatically. In December last year, a cooling failure at a data center in Hong Kong made the online services of the Macao Monetary Authority, Lianhua Satellite TV and a large number of Hong Kong and Macao enterprises and media outlets inaccessible. Last August, a small and medium-sized data center in Nanjing experienced server downtime due to overheating, and repeated attempts to restart the cooling system failed, resulting in a business interruption of more than 3 hours.

When power is interrupted, downtime is only "minutes" away

Traditional small and medium-sized data centers are typically characterized by low equipment density, low server power and open space. At the same time, because of cost, energy supply, floor space and other factors, they are rarely equipped with the backup systems common in large data centers, such as cooling towers and a dedicated UPS for air conditioning, and they generally have no cooling redundancy or disaster recovery systems. When cooling fails in such a data center, operators often rely on the cold air stored in the room volume, open windows, fans and similar means to bridge the gap until the equipment restarts.

But today, things are changing. With the deepening of digital transformation, all walks of life have moved their businesses online, and their dependence on online services keeps increasing. Once a service is interrupted, the loss can be incalculable. At the same time, as business migrates and digital tools are adopted, the computing power of small and medium-sized data centers has grown considerably, and IT load and energy consumption have risen with it. In this situation, once a cooling failure occurs, server temperatures soar within a few minutes to the point where the servers cannot operate normally.

According to the white paper "Data Center Temperature Rise During Cooling System Interruption", it takes only about 5 minutes after the cooling of the machine room is interrupted before "the temperature of all positions reaches the unacceptable temperature range". As IT load increases, the time that high-power, high-density cabinets can run stably after a cooling interruption also shrinks. Measured data shows that after a traditional 3 kW cabinet loses cooling, server thermal-protection shutdown occurs in about 480 s, while for a 4 kW cabinet this shortens to 300 s. When cabinet density reaches 8 kW, the thermal-protection shutdown time falls to less than 240 s, only half that of the 3 kW cabinet.
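To make the measured figures easier to apply, the following is a minimal sketch that estimates the time to thermal-protection shutdown for cabinet powers between the quoted data points. Linear interpolation between the points is an assumption made here for illustration, not a thermal model from the source, and the function names `estimate_shutdown_seconds` and `survives_restart` are hypothetical. The 8 kW value is treated as 240 s even though the text gives it only as an upper bound.

```python
from bisect import bisect_left

# Data points quoted in the article: (cabinet power in kW, seconds until
# thermal-protection shutdown after cooling is lost).
# The 8 kW figure is an upper bound ("less than 240 s").
MEASURED = [(3.0, 480.0), (4.0, 300.0), (8.0, 240.0)]


def estimate_shutdown_seconds(cabinet_kw: float) -> float:
    """Estimate ride-through time by linear interpolation (an assumption)."""
    powers = [power for power, _ in MEASURED]
    if cabinet_kw < powers[0] or cabinet_kw > powers[-1]:
        # No extrapolation: the article gives no data outside 3 to 8 kW.
        raise ValueError("cabinet power outside the measured 3-8 kW range")
    if cabinet_kw == powers[0]:
        return MEASURED[0][1]
    i = bisect_left(powers, cabinet_kw)
    (p0, t0), (p1, t1) = MEASURED[i - 1], MEASURED[i]
    return t0 + (t1 - t0) * (cabinet_kw - p0) / (p1 - p0)


def survives_restart(cabinet_kw: float, restart_seconds: float) -> bool:
    """True if cooling is expected back before servers shut down."""
    return restart_seconds < estimate_shutdown_seconds(cabinet_kw)


if __name__ == "__main__":
    for kw in (3.0, 3.5, 4.0, 6.0, 8.0):
        print(f"{kw} kW cabinet: about {estimate_shutdown_seconds(kw):.0f} s")
```

For example, under these assumptions a cooling restart that takes 360 s would be tolerated by a 3 kW cabinet but not by a 4 kW one.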

Continuous cooling is imperative for small and medium-sized data centers

Continuous cooling has become an indispensable requirement for small and medium-sized data centers, especially now that they are gradually taking on a new form with high equipment density, high server power and enclosed space. This new situation calls for new cooling schemes, and demand for continuous cooling is strong. However, continuous cooling for small and medium-sized data centers is not as simple as it sounds. The continuous cooling schemes common in the industry, such as cold storage tanks and a dedicated UPS for air conditioning, are not well suited to small data centers.

The cold storage tank is mainly used in large water-cooled data centers. The tank itself is about 10 meters in diameter and can be tens of meters tall, not to mention the chiller, cooling tower and large pipes of the water cooling system. Many small and medium-sized data centers have limited space, and many are located in office buildings and base stations, so installing such oversized tanks is impossible.

Another scheme is to equip the cooling system with a separate UPS, designed to deal with cooling interruptions caused by sudden power failure. However, traditional small and medium-sized data centers often use low-capacity UPS units and fixed-frequency air conditioners with large starting currents. To handle that starting current, the UPS capacity needs to reach 6 to 8 times the air conditioning power, which greatly increases the investment needed to build the computer room, so this scheme is rarely used in practice.
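The 6 to 8 times sizing rule quoted above can be worked through as a quick calculation. This is only a sketch of that arithmetic: the function name `ups_capacity_range_kw` and the 5 kW air conditioner in the example are hypothetical, and an actual design would follow the UPS and air conditioner vendors' data.

```python
from typing import Tuple


def ups_capacity_range_kw(
    ac_power_kw: float,
    low_factor: float = 6.0,   # lower multiple quoted in the article
    high_factor: float = 8.0,  # upper multiple quoted in the article
) -> Tuple[float, float]:
    """Return the (minimum, maximum) UPS capacity implied by the 6-8x rule
    for a fixed-frequency air conditioner with a large starting current."""
    if ac_power_kw <= 0:
        raise ValueError("air conditioner power must be positive")
    return ac_power_kw * low_factor, ac_power_kw * high_factor


if __name__ == "__main__":
    low, high = ups_capacity_range_kw(5.0)  # hypothetical 5 kW unit
    print(f"A 5 kW air conditioner implies a {low:.0f} to {high:.0f} kW UPS")
```

The gap between the unit's running power and the UPS capacity it implies is what makes this scheme costly for a small computer room.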

The currently feasible scheme is to use variable-frequency precision air conditioners and equip them with a UPS and battery backup. Precision air conditioners are power equipment and generate harmonics during operation, so harmonic suppression or compensation must be added. In addition, the failure rate of power equipment such as air conditioners is generally higher than that of electronic information equipment. If a UPS supplies power to the air conditioners, the design must ensure that a sudden short circuit or other abnormality in an air conditioner can be isolated quickly, so that other equipment downstream of the UPS is not affected.

Therefore, in this new form and application scenario, small and medium-sized data centers need to find a suitable approach to continuous cooling. This is a demand of the industry's development and an important task in safeguarding the digitalization process.

Only by adopting efficient and reliable cooling technology and establishing a sound continuous cooling mechanism can small and medium-sized data centers ensure the continuity and stability of their business and provide users with a better service experience.