
Alibaba Cloud suffered an outage in September, and Wu Hanqing published a detailed account of the incident

  

Key point: In the early morning of September 3, Wu Hanqing, a senior director at Alibaba and head of Alibaba Cloud Yundun (online names: Brother Tao, Spike), published a post on WeChat giving a full account of the incident.

Reports recently spread on social media that Alibaba Cloud had suffered a large-scale failure on September 1, with many customers saying they could not run even basic commands. In the early morning of September 3, Wu Hanqing, a senior director at Alibaba and head of Alibaba Cloud Yundun (online names: Brother Tao, Spike), published a post on WeChat giving a full account of the incident.

The failure was caused by a programmer writing one wrong line of code

Wu Hanqing began by acknowledging that Alibaba Cloud did have a failure. "September 1 was a day I will not forget. An upgrade to Alibaba Cloud Yundun's Anqi product triggered a bug, which caused some normal files on users' ECS instances to be mistakenly isolated. During the recovery, friends kept asking me about it, and all of Alibaba Cloud's customer service staff were busy with this problem. As the person in charge of Yundun, so was I."


Wu Hanqing also gave a clear answer on the cause of the incident, the question people cared about most. In short, a programmer wrote one line of code incorrectly. "The reason for this failure is that an engineer carelessly wrote a wrong line of code, so that all newly launched executable files were isolated as malicious files. Because of a gap in our earlier design, there was no quick recovery mechanism for this kind of exception, and we could only write a temporary program for emergency recovery, so the entire failure lasted a long time."
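The "quick recovery mechanism" mentioned in the quote can be illustrated with a minimal sketch. It is not Anqi's actual design, which the article does not describe. The `Quarantine` class, its `isolate` and `restore_all` methods, and the `manifest.json` file are all hypothetical names. The idea is only that an isolation tool can record each file's original location so a mistaken isolation can be undone in one step.

```python
import json
import shutil
import tempfile
import uuid
from pathlib import Path


class Quarantine:
    """Hypothetical quarantine store that remembers where each file came from."""

    def __init__(self, store: Path):
        self.store = store
        self.store.mkdir(parents=True, exist_ok=True)
        self.manifest_path = self.store / "manifest.json"

    def _load(self) -> dict:
        # The manifest maps a quarantine entry id to the file's original path.
        if self.manifest_path.exists():
            return json.loads(self.manifest_path.read_text())
        return {}

    def _save(self, manifest: dict) -> None:
        self.manifest_path.write_text(json.dumps(manifest, indent=2))

    def isolate(self, path: Path) -> str:
        """Move a suspect file into the store and record its original path."""
        manifest = self._load()
        entry_id = uuid.uuid4().hex
        shutil.move(str(path), str(self.store / entry_id))
        manifest[entry_id] = str(path)
        self._save(manifest)
        return entry_id

    def restore_all(self) -> int:
        """Move every isolated file back to its recorded path; return the count."""
        manifest = self._load()
        restored = 0
        for entry_id, original in list(manifest.items()):
            target = Path(original)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(self.store / entry_id), str(target))
            del manifest[entry_id]
            restored += 1
        self._save(manifest)
        return restored


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        binary = root / "app" / "server"
        binary.parent.mkdir()
        binary.write_text("executable contents")

        quarantine = Quarantine(root / "quarantine")
        quarantine.isolate(binary)            # simulated false positive
        print(binary.exists())                # the file has been moved away
        print(quarantine.restore_all())       # number of files put back
        print(binary.exists())                # the file is back in place
```

With a manifest like this, rolling back a bad detection rule does not require a purpose-built recovery program. A real product would also need to handle permissions, files recreated at the same path, and concurrent access, all of which this sketch ignores.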

As for possible aftereffects of the incident, Wu Hanqing assured readers that "this failure will not cause any data loss, nor will it cause data leakage, as some rumors have claimed."

Alibaba Cloud proposes a compensation plan, including in-person apologies

To reassure customers, Alibaba Cloud launched a compensation plan worth 100 times the outage time as soon as the failure was resolved. Wu Hanqing said in the post that he would offer customers further compensation, including:

1. Customers affected by this failure will receive a set of paid Yundun products free of charge, including the elastic security network, situational awareness, and Anqi cloud hosting. We will draw up a plan and activate the services in the near future, and customers can choose not to use them.

2. Anqi will provide a convenient one-click shutdown function as soon as possible.

3. We will write a handwritten apology letter to the affected customers.

4. For customers who were heavily affected, we will immediately visit in person to apologize and face your anger and suggestions.

The insecurity of the Internet is beyond your imagination

Wu Hanqing, a graduate of the youth class of Xi'an Jiaotong University, is one of the most influential figures in Internet security in China and the youngest senior security expert at Alibaba Group. After the incident, he remarked: "The insecurity of the Internet is beyond your imagination."

In fact, outages at cloud service providers have not been uncommon in recent years. Alibaba Cloud took an active part in China's trusted cloud certification and was among the first batch of providers to obtain trusted cloud service certification. Even so, this outage shows that network security still needs more work.

Wu Hanqing wrote in the post, "The culture I have always advocated and emphasized within the team is 'transparency'. I hope everyone's work can be transparent, so we put up Yundun billboards on the wall to show each team's work and progress openly, where every passer-by can see them. I also hope that our products can become more transparent and make everything they do visible to users, especially operations that require user authorization and permission. We have not done well at this in the past, and we are trying to change it. The feature that mistakenly isolated users' normal files this time was designed to be optional, but because of the bug it was also applied to users who had not enabled it, which is also a very serious mistake."

Questions of trust

After the incident, many people asked why Alibaba Cloud, as a cloud computing service provider, is able to delete files on a customer's server.

Wu Hanqing explained that this comes down to Alibaba Cloud's security model. "The IT industry in China is very different from that in the United States. There, IDCs have a variety of mature IT solutions and commercial products, and enterprise security customers have a strong ability to pay, so the U.S. security market is relatively mature."

"AWS adopts a shared responsibility model for customer security, that is, AWS is only responsible for its own security as a cloud computing platform, and chooses to hand over customer security to a third-party security vendor."

"In China, a large number of SME customers are effectively unprotected, and their security needs often go unmet. In this market environment, to better foster the cloud computing market and let customers focus on their own business, we have integrated our self-developed security products into our cloud computing solutions. We hope to share Alibaba's years of experience in security technology with all Alibaba Cloud customers. That is the Yundun product line."