Hard drive failure exposes backup vulnerability

  

Replacing a hard disk drive seems simple, but it can be anything but. IT departments need to dig deep into how the server is built before they can provide a real solution.

A server in the data center is blinking an amber light, which signals a potential problem with a hard disk drive. Usually, when a light starts blinking here and there, someone orders a replacement, hot-swaps the drive, and carries on. One experience turned out quite differently.

On that occasion, two drives started blinking at a busy time, and the job sat on a staff member's to-do list for several days. When Bob, another IT employee, asked whether he should look into it, the staff member handed the task over to him. Bob arranged for replacement drives to be delivered the next day.

A few days later, Bob reported that the drives had been replaced: one had finished rebuilding, and the other would take some time to recover.

An ominous sign

Soon, however, an employee reported that he could not access the company's shared drive. Technicians began to investigate. When they contacted another user, that user reported the same problem. The staff began to realize that all the signs pointed one way: these new problems were related to the recently replaced hard disk drives.

The staff remotely accessed the affected server, which hosted five virtual servers. At that point, the heart and soul of the company, its main database, was hosted on a different physical server.

When the staff logged in remotely, they saw a warning that the virtual disk no longer existed, and realized that the two hard drives Bob had exchanged had been pulled from the same array at the same time. Each RAID 5 group tolerates only one missing drive, so removing two from the same group destroyed the virtual disk. The server's RAID 5+0 configuration dated from its original setup and had been intact until then.
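Why pulling two drives from the same array is fatal can be sketched in a few lines. The following is a minimal illustration, not the controller's actual logic: it assumes a RAID 5+0 volume made of RAID 5 spans striped together, and the span layout and drive names (`disk0` to `disk5`) are hypothetical, since the article does not say how many drives the server had.

```python
from typing import Iterable, List


def raid50_survives(spans: List[List[str]], failed: Iterable[str]) -> bool:
    """Return True if a RAID 5+0 volume is still readable.

    RAID 5+0 stripes data (RAID 0) across several RAID 5 spans. Each
    RAID 5 span tolerates exactly one failed or missing drive. Losing a
    second drive in the same span loses that span, and because the
    spans are striped together, the whole volume goes with it.
    """
    failed_set = set(failed)
    for span in spans:
        missing = sum(1 for drive in span if drive in failed_set)
        if missing > 1:
            return False
    return True


# Hypothetical layout: two RAID 5 spans of three drives each.
SPANS = [["disk0", "disk1", "disk2"], ["disk3", "disk4", "disk5"]]

# One drive missing from each span: degraded, but still online.
print(raid50_survives(SPANS, ["disk0", "disk3"]))  # True

# Two drives missing from the same span: the virtual disk is gone.
print(raid50_survives(SPANS, ["disk0", "disk1"]))  # False
```

The practical consequence is that drives in the same span should be replaced one at a time, waiting for each rebuild to finish before touching the next.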

Deeper questions

After a period of denial, and of hoping that the server would somehow start correctly, the staff turned to the backups, which were supposedly configured to go to a NAS over iSCSI. The staff had checked the backup logs over time, but could not verify this now, because the lost virtual servers included the one running the company's backup software.

Eventually, the staff realized that the backups were gone. It appeared that the servers had been replicated, but the replicas lived on the same host as the original virtual servers. In that situation, they were obviously of no use.
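A placement check of this kind is easy to automate. The sketch below assumes a hand-maintained inventory of backup jobs; the `BackupJob` record, the host names, and the virtual server names are all hypothetical and do not come from any particular backup product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupJob:
    vm_name: str
    source_host: str  # physical host that runs the virtual server
    target_host: str  # physical host or NAS that stores the copy


def colocated_backups(jobs: List[BackupJob]) -> List[str]:
    """Return the names of virtual servers whose backup copy is stored
    on the same physical host as the virtual server itself."""
    return [job.vm_name for job in jobs if job.source_host == job.target_host]


# Hypothetical inventory for illustration only.
JOBS = [
    BackupJob("domain-controller", "host-a", "host-a"),
    BackupJob("print-server", "host-a", "host-a"),
    BackupJob("file-server", "host-a", "nas-1"),
]

print(colocated_backups(JOBS))  # ['domain-controller', 'print-server']
```

Any name this function returns is a virtual server that would lose its backup together with its host.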

The staff panicked. They needed to get something up and running again, at least enough to let users log in (the domain controller had been wiped out) and reach the company data that had been migrated to the NAS a few months earlier.

After the problem was reported, Bob and his colleagues quickly rebuilt the domain controller, the Office 365 controls, the print server, and many other functions from scratch. Over the next few weeks, the staff worked to recover the information lost with the server, and eventually dug up a large amount of the data that had been created up to the time the virtual disk was destroyed.

Now is a good time to re-examine core IT processes and to recall a few key points:

Always check the physical location of the backup to verify that it exists; do not rely on the backup log alone.

Understand the RAID configuration and the specific setup of the company or customer, and be careful when making changes.

Perform tasks such as exchanging hard disk drives after a data disaster.

Check the backup again.

Take extra care with IT work, just in case.

Don't put all your eggs (or virtual servers) in one basket.

And for good measure, check those backups once more.
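The first key point, checking the backup where it is stored instead of trusting the log, can be sketched as a small script. This is a minimal example under stated assumptions: the backups are ordinary files on a mounted path, and the function name `backup_problems`, the 24-hour age limit, and the example paths are all hypothetical.

```python
import os
import time
from typing import List, Optional


def backup_problems(
    path: str,
    max_age_hours: float = 24.0,
    min_bytes: int = 1,
    now: Optional[float] = None,
) -> List[str]:
    """Inspect a backup file on disk and return a list of problems.

    An empty list means the file exists, is at least min_bytes long,
    and was modified within the last max_age_hours.
    """
    if not os.path.isfile(path):
        return [f"{path}: backup file does not exist"]

    problems = []
    info = os.stat(path)
    if info.st_size < min_bytes:
        problems.append(f"{path}: file is only {info.st_size} bytes")

    current = time.time() if now is None else now
    age_hours = (current - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{path}: last written {age_hours:.1f} hours ago")
    return problems


# Hypothetical backup locations on a mounted NAS share.
for backup in ["/mnt/nas/backups/dc.bak", "/mnt/nas/backups/files.bak"]:
    for problem in backup_problems(backup):
        print(problem)
```

A check like this only shows that a recent, non-empty file is present where it is supposed to be. A periodic test restore remains the stronger proof that the backup is usable.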