Data disaster recovery

A term in the field of computer technology
Data disaster recovery refers to establishing a remote data system to protect data security and improve the continuous availability of data. To achieve this, an enterprise must consider RAID protection, redundant structures, data backup, fault warning, and other measures. Of these, backup, the process of copying files to a storage device, is the most important thing to consider in a system, even though it takes up only a small part of the overall system planning.
Chinese name: Data disaster recovery
Nature: Remote data system
Object: IT
Properties: Environment capable of dealing with various disasters

Brief introduction

For IT, a data disaster recovery system provides the computer information system with an environment that can cope with various disasters. When the computer system is hit by irresistible natural disasters such as fire, flood, earthquake, or war, or by computer crime, computer viruses, power failures, network/communication failures, hardware/software errors, or human operation errors, the disaster recovery system ensures the security of user data (data disaster recovery), and a more complete disaster recovery system can even provide uninterrupted application services (application disaster recovery). The disaster recovery system can be regarded as the highest level of data storage backup.
Generally speaking, protecting data security and improving the continuous availability of data requires measures such as data backup and failure warning. A complete disaster recovery system should include both local disaster recovery and remote disaster recovery; this is especially true for users and industries whose key business cannot be interrupted, such as telecommunications, customs, and finance. The rest of this article discusses some key technologies of disaster recovery systems, including data backup, data replication, and network storage, and uses HP storage devices to illustrate how to build a three-level disaster recovery system.

Implementation methods

1、 Data backup
Backup is the process of copying the files a database needs to a dump device by a specific method. The dump device is the tape or disk that holds the database copy.
The choice of backup is based on the ratio of the cost of losing data to the cost of ensuring that data is not lost. Hardware backup alone sometimes cannot meet actual needs at all: if you mistakenly delete a table and want to restore it, for example, a database backup becomes essential.
Oracle provides powerful backup and recovery strategies, including regular database backups (logical backup, cold backup, and hot backup) and high-availability databases (such as standby databases and parallel databases). The backups discussed below mainly refer to regular database backups.
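The selection criterion above can be made concrete with a small calculation. The sketch below is a hypothetical model, not taken from the original: it assumes the cost of losing data is weighted by an estimated yearly probability of loss, and compares that expected loss with the yearly cost of the backup measure. A ratio above 1 suggests the protection pays for itself.

```python
def backup_cost_ratio(loss_cost: float, yearly_loss_probability: float,
                      yearly_protection_cost: float) -> float:
    """Expected yearly cost of losing the data divided by the yearly cost of protecting it.

    The probability weighting is an assumption of this sketch, not part of the text.
    """
    if yearly_protection_cost <= 0:
        raise ValueError("protection cost must be positive")
    if not 0.0 <= yearly_loss_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    expected_loss = loss_cost * yearly_loss_probability
    return expected_loss / yearly_protection_cost


def backup_is_justified(ratio: float) -> bool:
    # Above 1, the expected loss outweighs what the protection costs.
    return ratio > 1.0
```

For instance, data whose loss would cost 500,000 with an estimated 5% yearly chance of loss, protected by a measure costing 10,000 per year, gives a ratio of 2.5, so backup is justified under this model. The currency units and the probability estimate are placeholders.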
1. The importance of backup
Backup is the most important thing to consider in a system. Although it may account for less than 1% of the whole system planning, development, and testing effort, this seemingly unimportant, unnoticed work shows its true importance only when data must be restored. No data loss or prolonged downtime is acceptable. If a backup cannot provide the information needed for recovery, making recovery impossible or very slow (for example, a backup scheme that has never been rigorously tested), it is not a good backup.
If a disaster such as a system crash occurs, the database must be recovered. The success of recovery depends on two factors: accuracy and timeliness. The kind of recovery that can be performed depends on the backups available. A DBA is responsible for maintaining database recoverability in the following three respects:
(1) Minimize the number of database failures, so as to maintain the maximum availability of the database.
(2) When the database fails, minimize the recovery time, so as to maximize the benefits of recovery.
(3) When the database fails, ensure that the data is lost as little as possible or not at all, so as to maximize the recoverability of the data.
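The first two goals can be quantified with the standard steady-state availability formula A = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures (goal 1 raises it) and MTTR is the mean time to repair (goal 2 lowers it). These terms do not appear in the original text, and goal (3), data loss, is not captured by this formula; the sketch is only an illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected unplanned downtime per year implied by the availability."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR
```

With an MTBF of 999 hours and an MTTR of 1 hour, availability is 0.999, or about 8.8 hours of downtime a year; cutting MTTR to 0.5 hours roughly halves that downtime.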
Data backup is the foundation of disaster recovery. It is the process of copying all or part of the data set from the hard disk or array of the application host to other storage media, in order to prevent data loss caused by operation errors or system failures. Traditional data backup mainly uses built-in or external tape drives for cold backup; however, this method only guards against human-caused failures such as operation errors, and its recovery time is very long. With the continuous development of technology and the massive growth of data, many enterprises have begun to adopt network backup, which is generally implemented by professional data storage management software combined with the corresponding hardware and storage devices.
2. Common backup modes
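Copying "all or part of the data set" corresponds to full and partial backups. A minimal sketch, assuming a plain directory tree stands in for the application host's data and another directory for the backup medium; the partial mode shown here is an incremental backup that copies only files modified after a given timestamp. The function name `backup_files` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def backup_files(src: Path, dst: Path, since: Optional[float] = None) -> List[Path]:
    """Copy files under `src` into `dst`, preserving the directory layout.

    since=None  -> full backup: every file is copied.
    since=<ts>  -> incremental backup: only files whose modification time
                   is later than the POSIX timestamp `ts` are copied.
    Returns the copied paths, relative to `src`.
    """
    copied: List[Path] = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue  # directories are recreated on demand below
        if since is not None and path.stat().st_mtime <= since:
            continue  # unchanged since the last backup
        relative = path.relative_to(src)
        target = dst / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied.append(relative)
    return copied
```

A later partial run passes the time of the previous backup as `since`, so only changed files travel to the backup medium.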
(1) Backup data to tape regularly.
(2) Remote tape library or optical jukebox backup. Data is transferred to the remote backup center, where complete backup tapes or optical discs are made.
(3) Remote critical data + tape backup. Data is backed up to tape, and the production machine also sends key data to the backup machine in real time.
Remote database backup creates a copy of the primary database on a backup machine that is separate from the production machine hosting the primary database.
(4) Network data mirroring. This method monitors and tracks updates to the production system's database data and important target files, and transmits the update log to the backup system over the network in real time; the backup system then updates its disks according to the log.
(5) Remote mirror disk. High-speed Fibre Channel links and disk control technology extend the mirror disk to a location far from the production machine. The mirror disk data is fully consistent with the primary disk data, and updates are applied synchronously or asynchronously.
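Modes (4) and (5) both keep a backup copy current by shipping updates, either synchronously or asynchronously. The following toy model shows the difference; all class and method names are hypothetical, and in-memory dictionaries stand in for disks. In synchronous mode a write finishes only after the replica applies it, while in asynchronous mode updates accumulate in a log that is later transmitted and replayed in order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Tuple

Update = Tuple[str, str]  # (key, new value): one entry of the update log


@dataclass
class Replica:
    """Backup system: it changes only by replaying log entries."""
    data: Dict[str, str] = field(default_factory=dict)

    def apply(self, update: Update) -> None:
        key, value = update
        self.data[key] = value


@dataclass
class MirroredStore:
    """Production system that mirrors every update to a replica."""
    replica: Replica
    synchronous: bool = True
    data: Dict[str, str] = field(default_factory=dict)
    pending: Deque[Update] = field(default_factory=deque)  # log not yet shipped

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        update = (key, value)
        if self.synchronous:
            # Synchronous: the write is complete only once the replica has it.
            self.replica.apply(update)
        else:
            # Asynchronous: log the update and return immediately.
            self.pending.append(update)

    def ship_log(self) -> None:
        """Transmit pending updates in order; the replica replays each one."""
        while self.pending:
            self.replica.apply(self.pending.popleft())
```

Synchronous mode never lets the copies diverge but adds the link round trip to every write; asynchronous mode keeps writes fast at the risk of losing whatever is still in the pending log if the primary site is destroyed.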
Data backup must take data recovery into account, including disaster prevention measures such as hot standby, disk mirroring or fault tolerance, off-site storage of backup tapes, and redundancy of key components. These measures allow the system to be recovered after a failure. However, they can only deal with single points of failure in a computer; against regional, devastating disasters they are helpless and provide no disaster recovery capability.
2、 Data replication
SAN focuses on the unique problems of enterprise storage and is mainly used in environments with large amounts of storage. The problems with current enterprise storage solutions have two root causes: the structural restrictions caused by the tight coupling of data and application systems, and the limitations of the Small Computer System Interface (SCSI) standard. Most analysts believe that SAN is the enterprise-class storage solution of the future, because it is easy to integrate, improves data availability and network performance, and reduces storage management work.
SAN is recognized as the most promising storage technology, and its future development will trend toward openness, intelligence, and integration. NAS is the fastest-growing storage technology. In terms of their development trends, however, SAN and NAS will become fully integrated at the application level. NAS and SAN can be said to have become the mainstream technologies for data disaster recovery backup today. The key lies in how to build and improve a comprehensive, multi-level data disaster recovery backup system on top of them: data storage management software, combined with the corresponding hardware and storage devices, can centrally manage data backup and provide automated backup, file archiving, hierarchical data storage, disaster recovery, and other functions.
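Hierarchical data storage moves data between faster and cheaper media according to how often it is used. As a hedged illustration only, the sketch below assigns each file to one of three hypothetical tiers (online disk, nearline disk, offline tape) based on days since last access; the thresholds of 90 and 365 days are arbitrary assumptions, not values from the text.

```python
from typing import Dict

ONLINE, NEARLINE, OFFLINE = "online-disk", "nearline-disk", "offline-tape"


def assign_tier(days_since_access: int, nearline_after: int = 90,
                offline_after: int = 365) -> str:
    """Pick a storage tier for one file (hypothetical tiering policy)."""
    if days_since_access >= offline_after:
        return OFFLINE
    if days_since_access >= nearline_after:
        return NEARLINE
    return ONLINE


def plan_migration(last_access: Dict[str, int], **thresholds: int) -> Dict[str, str]:
    """Map each file name to the tier it should live on."""
    return {name: assign_tier(days, **thresholds) for name, days in last_access.items()}
```

Management software would then move only the files whose planned tier differs from their current one.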

Three-level system

A relatively complete disaster recovery system generally uses a three-tier architecture comprising storage, backup, and disaster recovery. The following uses HP backup servers, modular disk arrays, backup tape libraries, and related disaster recovery software to illustrate how to build a three-level disaster recovery system.
1、 Data storage subsystem
Under normal circumstances, the business system runs on the main center servers, and business data is stored in the main center's EMA12000 storage disk array. The EMA12000 has 12 disk drives and can scale to up to 126, spanning multiple mainframes and mixed multi-vendor open-system platforms such as UNIX, Windows NT, and Windows 2000.
The ASC array control software that HP designed for the EMA12000 provides centralized control of data across multiple server platforms, so that data is available with true zero downtime whenever, wherever, and however it is needed.
2、 Data backup subsystem
To provide real-time disaster backup of business data, two data centers can be set up for key applications: the main center and the backup center. The main center's hosts include two or more HP ALPHA servers and other related servers, which form a SCSI cluster to provide a multi-machine high-reliability environment. The main center is connected to the backup center via ATM/E3/WDM.
In this disaster recovery solution, the business system normally runs on the main center servers and stores its data in the main center's EMA12000 disk array, and an EMA12000 disk array is also configured in the backup center. The two arrays are connected via ATM/E3/WDM, and DRM (Data Replication Manager) keeps the data stored in the main center consistent with the data in the backup center in real time.
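One way to confirm that the two arrays really hold identical data is to compare per-block checksums. The sketch below is not how DRM works internally (the text does not say); it is a generic verification routine over byte strings, with a 4 KiB block size chosen arbitrarily.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # arbitrary block size for this sketch


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """SHA-256 digest of each fixed-size block (the last block may be short)."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def mismatched_blocks(primary: bytes, backup: bytes,
                      block_size: int = BLOCK_SIZE) -> List[int]:
    """Indices of blocks that differ, or that exist on only one side."""
    p = block_digests(primary, block_size)
    b = block_digests(backup, block_size)
    return [i for i in range(max(len(p), len(b)))
            if i >= len(p) or i >= len(b) or p[i] != b[i]]
```

In a real deployment each center would compute its own digests and exchange only those over the link, so verification does not require copying the data again.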
3、 Disaster recovery subsystem
In this scheme, the backup tape library is placed in the backup center and connected directly to the storage array. Backup of the EMA12000 storage array to the TL895 tape library is controlled by EBS (Enterprise Backup Solution) and the Legato NetWorker data storage management system. If an unexpected disaster strikes the primary data center, the system can automatically switch to the backup data center and quickly restore the primary center's business data while operation continues.
The three-level disaster recovery scheme offers high availability. Level 1: to keep a single point of failure from affecting the whole system, redundancy is used throughout, so that everything from hosts and storage devices to fiber optic adapters has redundant fault tolerance. Level 2: if a host or storage device fails, the optical switches between the main and backup centers ensure the integrity of communication and data. Level 3: if an unexpected disaster strikes the main data center, the system can automatically switch to the backup data center. This three-level design ensures the high availability and reliability of the data disaster recovery system.
In addition, HP's OpenView network device management software greatly reduces the workload of system administrators. Although the system contains many devices, including host systems, storage devices, optical switches, and optical cards, they can all be centrally managed and monitored from one workstation, which further ensures the continuous operation of the entire business system. Apart from planned downtime, the system can achieve 365 × 24 availability.
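The three levels can be pictured as a routing decision. In this hypothetical, simplified model, each center reports how many of its redundant hosts are still running and whether its disk array is usable. The function prefers the main center's hosts and array independently, so a partial failure is absorbed locally (level 1), a lost host group or array is replaced by its counterpart in the other center over the inter-site link (level 2), and a total loss of the main center moves everything to the backup center (level 3).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Center:
    name: str
    hosts_up: int   # healthy hosts left in this center's cluster
    array_ok: bool  # whether this center's disk array is usable


def route(main: Center, backup: Center) -> Tuple[str, str]:
    """Return (center running the hosts, center whose array serves the data).

    Hosts and storage are chosen independently, each preferring the main center.
    """
    hosts = next((c for c in (main, backup) if c.hosts_up > 0), None)
    array = next((c for c in (main, backup) if c.array_ok), None)
    if hosts is None or array is None:
        raise RuntimeError("no healthy hosts or disk array left in either center")
    return hosts.name, array.name
```

A real failover controller also has to fence the failed side and confirm replication state before switching; this sketch only shows the selection order.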

Remote disaster recovery

As a new concept, remote disaster recovery has been accepted by most industries in China; in information-intensive enterprises such as finance and telecommunications in particular, data protection has been put on the agenda. However, Chinese enterprises and institutions still face many problems in implementing remote disaster recovery. Besides the common obstacle of high investment, an inaccurate understanding of disaster recovery and technical problems during implementation have become barriers to establishing remote backup centers.
Before discussing disaster recovery technology, we should first understand what a disaster is. In daily computing environments, system administrators sometimes encounter system problems and interruptions, but an "interruption" is not exactly the same as a "disaster". Broadly speaking, disasters fall into three types: unpredictable natural disasters (earthquakes, typhoons, floods, thunder and lightning, fires, etc.); infrastructure damage (CPU or hard disk failure, building collapse, power interruption, etc.); and operation errors (misoperation, sabotage, etc.). In short, for a computer system, any event that causes abnormal system downtime is called a disaster.
According to statistics, the causes of system disasters break down as follows: hardware failures 44%, human errors 32%, software failures 14%, viruses 7%, and natural disasters 3%. It is therefore urgent to formulate and establish a complete disaster recovery plan as soon as possible, to strengthen the system's ability to withstand disasters and to minimize losses.
The concept is evolving: is remote disaster recovery just off-site storage?
The idea of keeping data intact through any disaster originated with computer systems. When disaster recovery comes up, most people immediately discuss how to connect two storage systems that are far enough apart. However, implementing disaster recovery is not that simple. Disaster recovery pursues business continuity and requires that queries and business activities on the network keep running. It includes long-distance server clusters and mirrored backups of servers and application systems at both sites.
Mr. Mascong, manager of the System Engineering Department at Boke Communication Company in China, believes that real disaster recovery must meet three requirements. First, the components and data in the system must be redundant, so that when one system fails, the other keeps data flowing smoothly. Second, it must be long-distance: because disasters always strike within a certain area, a sufficient distance ensures that one disaster cannot destroy the data. Third, the disaster recovery system must pursue fast data recovery. These are also known as the "3R" of disaster recovery (Redundance, Remote, Replication).
From the perspective of real-time capability, disaster tolerance can be divided into three levels: the lowest is tape-level disaster tolerance; the next is disaster tolerance with mirroring and data recovery; and the highest combines mirroring, data recovery, and server clustering.