Data disaster recovery refers to establishing a remote data system to protect data security and improve the continuous availability of data. An enterprise has to consider RAID protection, redundant structures, data backup, fault warning, and similar measures. The backup process, in which files are copied to storage devices, is the most important thing to consider in the system, even though it occupies only a small part of the overall system planning.
Name: Data disaster recovery
Nature: Remote data system
Object: IT
Properties: Environment capable of dealing with various disasters
For IT, a data disaster recovery system provides computer information systems with an environment that can cope with various disasters. A disaster occurs when the computer system suffers irresistible events such as fire, flood, earthquake, or war, or suffers computer crime, computer viruses, power failure, network/communication failure, hardware/software errors, or human operation errors. In such cases the disaster recovery system ensures the security of user data (data disaster recovery), and a more complete disaster recovery system can even provide uninterrupted application services (application disaster recovery). The disaster recovery system can be regarded as the highest level of data storage backup.
Generally speaking, to protect data security and improve the continuous availability of data, a complete disaster recovery system should include both local disaster recovery and remote disaster recovery. This is especially true for users and industries whose key business cannot be interrupted, such as telecommunications, customs, and finance. The rest of this article discusses some key technologies of disaster recovery systems, including data backup, data replication, and network storage, and uses HP storage devices to illustrate how to build a three-level disaster recovery system.
Implementation mode
I. Data backup
Backup is the process of copying the necessary files of a database to a dump device by a specific method. The dump device is the tape or disk used to hold the database copy.
The choice of backup is based on the ratio of the cost of losing data to the cost of ensuring that data is not lost. Sometimes hardware backup cannot meet actual needs at all: for example, when you mistakenly delete a table and want to restore it, hardware-level redundancy such as mirroring reproduces the deletion as well, so a database backup becomes essential.
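The cost comparison above can be made concrete with a small calculation. The sketch below is illustrative only: the function names and every figure (incident frequency, cost per incident, annual protection cost) are hypothetical assumptions, and it treats the expected annual loss simply as frequency times impact.

```python
def expected_annual_loss(incidents_per_year: float, cost_per_incident: float) -> float:
    """Expected yearly cost of losing data: frequency x impact of one incident."""
    return incidents_per_year * cost_per_incident


def loss_to_protection_ratio(expected_loss: float, protection_cost: float) -> float:
    """Ratio of the cost of losing data to the cost of ensuring it is not lost."""
    if protection_cost <= 0:
        raise ValueError("protection_cost must be positive")
    return expected_loss / protection_cost


def backup_is_justified(expected_loss: float, protection_cost: float) -> bool:
    """Under this simple model, a backup scheme pays for itself when the ratio exceeds 1."""
    return loss_to_protection_ratio(expected_loss, protection_cost) > 1.0


if __name__ == "__main__":
    # Hypothetical figures: one lost-table incident every two years costing
    # 200,000 each, versus a backup scheme costing 30,000 per year.
    loss = expected_annual_loss(0.5, 200_000)
    print(loss)                                    # 100000.0
    print(backup_is_justified(loss, 30_000))       # True
```

A real assessment would also weigh recovery time and the residual risk left by each scheme; this model captures only the single ratio described in the text.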
Oracle provides a powerful backup and recovery strategy, including regular database backups (logical backup, cold backup, and hot backup) and high-availability databases (such as standby databases and parallel databases). The backups discussed below mainly refer to regular database backups.
1. The importance of backup
Backup is the most important thing to consider in the system. Although it may account for less than 1% of the whole system planning, development, and testing effort, this seemingly unimportant, behind-the-scenes work shows its true importance only when a restore is needed. No data loss or prolonged downtime is acceptable. If a backup cannot provide the information needed for recovery, making recovery impossible or very slow (for example, a backup scheme that has never been rigorously tested), it cannot be considered a good backup.
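One way to make sure a backup can actually support recovery is to test-restore it and compare checksums with the original. The following Python sketch assumes a single file backed up by a plain copy; `backup_file` and `verify_by_restore` are hypothetical helpers, not part of any backup product.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_file(src: Path, backup_dir: Path) -> tuple[Path, str]:
    """Copy src into backup_dir; return the copy and the checksum of the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    digest = sha256_of(src)
    dst = backup_dir / src.name
    shutil.copy2(src, dst)
    return dst, digest


def verify_by_restore(backup: Path, expected_digest: str) -> bool:
    """Restore the backup into a scratch directory and confirm it matches the original."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / backup.name
        shutil.copy2(backup, restored)
        return sha256_of(restored) == expected_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        data = Path(work) / "orders.dat"
        data.write_bytes(b"order-1\norder-2\n")
        copy, digest = backup_file(data, Path(work) / "backup")
        print(verify_by_restore(copy, digest))  # True
        copy.write_bytes(b"truncated")          # simulate a damaged backup
        print(verify_by_restore(copy, digest))  # False
```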
When a disaster such as a system crash occurs, the database must be recovered. The success of recovery depends on two factors: accuracy and timeliness. The kind of recovery that can be performed depends on the kind of backup available. A DBA is responsible for maintaining database recoverability in the following three respects:
(1) Minimize the number of database failures, to keep the database as available as possible.
(2) When the database fails, minimize the recovery time, to maximize the benefit of recovery.
(3) When the database fails, lose as little data as possible, or none at all, to maximize the recoverability of the data.
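The three aspects map onto common reliability measures: fewer failures means a longer mean time between failures (MTBF), faster recovery means a shorter mean time to recovery (MTTR), and less data loss means a smaller recovery point. The sketch below uses the standard steady-state formula A = MTBF / (MTBF + MTTR); the numbers are hypothetical, and the data-loss function assumes recovery from periodic backups only.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the database is usable.

    Aspect (1) raises mtbf_hours; aspect (2) lowers mttr_hours.
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours per year during which the database is unavailable."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


def max_data_loss_hours(backup_interval_hours: float) -> float:
    """Aspect (3): with periodic backups only, a failure just before the
    next backup loses everything written since the previous one."""
    return backup_interval_hours


if __name__ == "__main__":
    print(round(annual_downtime_hours(1000, 10), 1))  # 86.7  baseline
    print(round(annual_downtime_hours(2000, 10), 1))  # 43.6  half as many failures
    print(round(annual_downtime_hours(1000, 1), 1))   # 8.8   ten times faster recovery
    print(max_data_loss_hours(24))                    # 24    nightly backups
```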
Data backup is the foundation of disaster recovery. It is the process of copying all or part of a data set from the hard disk or array of the application host to other storage media, to prevent data loss caused by operation errors or system failures. Traditional data backup mainly uses built-in or external tape drives for cold backup. However, this approach only guards against problems such as operation errors, and its recovery time is very long. With continuing technical development and the massive growth of data, many enterprises have begun to adopt network backup, which is generally implemented with professional data storage management software combined with the corresponding hardware and storage devices.
(3) Remote critical data plus tape backup. Data is backed up to tape, and the production machine also sends key data to the backup machine in real time.
Remote database backup. A copy of the primary database is created on a backup machine that is separate from the production machine where the primary database resides.
(4) Network data mirroring. Updates to the production system's database data and to important target files are monitored and tracked, and the update log is transmitted to the backup system over the network in real time; the backup system then updates its disks according to the log.
(5) Remote mirror disk. High-speed Fibre Channel links and disk control technology extend the mirror disk to a site far from the production machine. The mirror disk data is kept consistent with the primary disk data, and updates are applied either synchronously or asynchronously.
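Modes (4) and (5) share the same idea: every update on the production side is also applied at the backup side, either before the write completes (synchronous) or later from a queue (asynchronous). The sketch below models this with an in-memory update log; the class names are hypothetical, the network transfer is reduced to a method call, and real products ship log or block changes rather than Python dictionaries.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    seq: int
    key: str
    value: str


class BackupSite:
    """Remote copy that replays update-log records strictly in order."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.applied_seq = 0

    def apply(self, record: LogRecord) -> None:
        if record.seq <= self.applied_seq:
            return  # duplicate delivery (e.g. a retransmission): already applied
        if record.seq != self.applied_seq + 1:
            raise ValueError(f"log gap: expected {self.applied_seq + 1}, got {record.seq}")
        self.data[record.key] = record.value
        self.applied_seq = record.seq


class ProductionSite:
    """Primary copy that logs every update and forwards the log to a BackupSite."""

    def __init__(self, backup: BackupSite, synchronous: bool) -> None:
        self.data: dict[str, str] = {}
        self.backup = backup
        self.synchronous = synchronous
        self.next_seq = 1
        self.outbox: deque = deque()

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        record = LogRecord(self.next_seq, key, value)
        self.next_seq += 1
        if self.synchronous:
            # Synchronous: the write completes only after the remote copy has it.
            self.backup.apply(record)
        else:
            # Asynchronous: complete now, ship the log record later.
            self.outbox.append(record)

    def flush(self) -> None:
        """Ship queued log records to the backup site (asynchronous mode)."""
        while self.outbox:
            self.backup.apply(self.outbox.popleft())

    def lag(self) -> int:
        """Number of updates the backup site has not yet applied."""
        return (self.next_seq - 1) - self.backup.applied_seq


if __name__ == "__main__":
    standby = BackupSite()
    prod = ProductionSite(standby, synchronous=False)
    prod.write("acct:1", "100")
    prod.write("acct:2", "250")
    print(prod.lag())  # 2: these updates would be lost if the site failed now
    prod.flush()
    print(prod.lag(), standby.data == prod.data)  # 0 True
```

The trade-off is the usual one: synchronous mode keeps the lag at zero at the cost of waiting on the remote site for every write, while asynchronous mode keeps writes fast but can lose the updates still queued when the production site fails.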
Data backup must also take data recovery into account, including disaster-prevention measures such as hot standby, disk mirroring or fault tolerance, off-site storage of backup tapes, and redundancy of key components. These measures allow the system to be recovered after a failure. However, they only deal with single points of failure in the computer; against regional and devastating disasters they are helpless and provide no disaster recovery capability.
II. Data replication
SAN addresses the particular problems of enterprise storage and is mainly used in environments with large amounts of storage. The problems encountered by current enterprise storage solutions have two root causes: the structural restrictions caused by the tight coupling of data and application systems, and the limitations of the Small Computer System Interface (SCSI) standard. Most analysts believe that SAN is the future enterprise-class storage solution, because it is easy to integrate, can improve data availability and network performance, and can reduce storage management work.
SAN is recognized as the most promising storage technology solution, and its future development will be toward openness, intelligence, and integration. NAS is the fastest-growing storage technology. However, judging by the development trends of the two, SAN and NAS will become fully integrated at the application level. NAS and SAN have become the mainstream technologies for data disaster recovery backup today; the key is how to build and improve a comprehensive, multi-level data disaster recovery backup system on top of them. Data storage management software, combined with the corresponding hardware and storage devices, can provide centralized management of data backup, automated backup, file archiving, hierarchical data storage, disaster recovery, and other functions.
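Hierarchical data storage is usually driven by a policy that moves rarely used data to cheaper media. The sketch below assumes a simple age-based rule (files not accessed for 90 days move from disk to tape); the `FileEntry` type, the tier names, and the threshold are hypothetical and not taken from any particular management product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileEntry:
    path: str
    last_access: datetime
    tier: str = "disk"  # hypothetical tiers: "disk" (fast) or "tape" (archive)


def apply_tiering(files: list, now: datetime, archive_after_days: int = 90) -> list:
    """Move files not accessed within the threshold from disk to tape.

    Returns the paths that were moved on this run.
    """
    cutoff = now - timedelta(days=archive_after_days)
    moved = []
    for f in files:
        if f.tier == "disk" and f.last_access < cutoff:
            f.tier = "tape"
            moved.append(f.path)
    return moved


if __name__ == "__main__":
    catalog = [
        FileEntry("archive/2023-q4.csv", datetime(2024, 1, 1)),
        FileEntry("live/orders.db", datetime(2024, 5, 30)),
    ]
    print(apply_tiering(catalog, now=datetime(2024, 6, 1)))  # ['archive/2023-q4.csv']
```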
Three level system
A relatively complete disaster recovery system is generally designed with a three-tier architecture covering storage, backup, and disaster recovery. The following uses HP backup servers, modular disk arrays, backup tape libraries, and related disaster recovery software to illustrate how to build a three-level disaster recovery system.
Under normal circumstances, the business system runs on the main center server, and business data is stored in the main center's EMA12000 storage disk array. The EMA12000 scales from 12 disk drives to up to 126 disk drives and supports multiple hosts running mixed open-system platforms such as UNIX, multi-vendor Windows NT, and Windows 2000.
The ASC array control software designed by HP for the EMA12000 provides centralized control of data across multiple server platforms, so that data remains available with true zero downtime whenever, wherever, and however it is needed.
To provide real-time disaster backup of business data, two data centers can be set up for key applications: the main center and the backup center. The main center's host configuration includes two or more HP Alpha servers and other related servers, which are combined into a SCSI cluster to form a multi-machine high-reliability environment. The main center is connected to the backup center via ATM/E3/WDM links.
In this disaster recovery solution, under normal conditions the business system runs on the main center server and data is stored in the main center's EMA12000 storage disk array, and another EMA12000 array is configured in the backup center. The two arrays are connected via ATM/E3/WDM, and DRM (Data Replication Manager) keeps the data stored in the main center fully consistent with the data in the backup center in real time.
3. Disaster recovery subsystem
In this scheme, the backup tape library is placed in the backup center and connected directly to the storage array. Backup from the EMA12000 to the TL895 tape library is performed through EBS (Enterprise Backup Solution) and controlled by the Legato NetWorker data storage management system. If an unexpected disaster strikes the primary data center, the system can automatically switch to the backup data center and quickly restore the primary data center's business data while operations continue.
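An automatic switch to the backup center is commonly triggered by a heartbeat timeout. The sketch below is a hypothetical, single-process model of that decision, not a description of how Legato NetWorker or DRM implement failover; the class name, site names, and timeout are assumptions, and the injectable clock exists only so the behaviour can be tested.

```python
import time
from typing import Callable


class FailoverMonitor:
    """Switch the active site to the backup center when the main center goes silent."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.active_site = "main"
        self.last_heartbeat = clock()

    def heartbeat(self) -> None:
        """Called whenever the main center reports that it is alive."""
        self.last_heartbeat = self.clock()

    def check(self) -> str:
        """Fail over once the main center has been silent longer than the timeout."""
        silent_for = self.clock() - self.last_heartbeat
        if self.active_site == "main" and silent_for > self.timeout_s:
            self.active_site = "backup"
        return self.active_site


if __name__ == "__main__":
    now = [0.0]
    monitor = FailoverMonitor(timeout_s=5.0, clock=lambda: now[0])
    now[0] = 3.0
    monitor.heartbeat()
    now[0] = 7.0
    print(monitor.check())  # main: last heartbeat was 4 s ago
    now[0] = 9.0
    print(monitor.check())  # backup: silent for 6 s, more than the 5 s timeout
```

A production failover mechanism must also guard against split brain, for example by requiring a quorum or fencing the old primary before promoting the backup; this sketch omits that.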
This three-level disaster recovery scheme offers high availability. Level 1: to keep a single point of failure from affecting the whole system, redundancy is used throughout; everything from hosts and storage devices to fibre optic adapters is redundant and fault tolerant. Level 2: if either a host or a storage device fails, the optical switches between the primary and backup centers preserve the integrity of communication and data. Level 3: if an unexpected disaster strikes the primary data center, the system can automatically switch to the backup data center. This three-level design ensures the high availability and reliability of the data disaster recovery system.
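The benefit of Level 1 redundancy can be estimated with the standard formulas for redundant (parallel) and chained (series) components. The sketch assumes independent failures and uses hypothetical availability figures; real components often share failure causes, so the results are optimistic.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """N redundant copies, any one of which keeps the service up.

    The service is down only if every copy is down at once (independence assumed).
    """
    return 1.0 - (1.0 - component_availability) ** copies


def series_availability(*availabilities: float) -> float:
    """A chain where every stage (e.g. host, adapter, storage) must work."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


if __name__ == "__main__":
    pair = parallel_availability(0.99, 2)  # one component at 99% -> redundant pair
    print(round(pair, 6))                  # 0.9999
    # Hypothetical path of three stages, each built as a redundant pair.
    print(round(series_availability(pair, pair, pair), 6))  # 0.9997
```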
In addition, HP OpenView network device management software frees system administrators from much of their routine work. Although the system contains many devices, including host systems, storage devices, optical switches, and optical cards, all of them can be centrally managed and monitored from a single workstation, which further helps keep the entire business system running. Apart from planned downtime, the system can achieve 365 × 24 availability.
Remote disaster recovery
As a new concept, remote disaster recovery has been accepted by most industries in China, and data protection work has been put on the agenda, especially in information-intensive sectors such as finance and telecommunications. However, Chinese enterprises and institutions still face many problems in implementing remote disaster recovery. Besides the common factor of high investment, an imprecise understanding of disaster recovery and technical problems during implementation have become barriers to establishing remote backup centers.
Before discussing disaster recovery technology, we should first understand what a disaster is. In day-to-day computing, system administrators sometimes encounter system problems and interruptions, but an "interruption" is not the same as a "disaster". Broadly speaking, disasters fall into three types: unpredictable natural disasters (earthquakes, typhoons, floods, lightning, fire, etc.); infrastructure damage (CPU or hard disk failure, building collapse, power interruption, etc.); and operational errors (misoperation, sabotage, etc.). In short, for a computer system, any event that causes abnormal system downtime is called a disaster.
According to statistics, the causes of system disasters break down as follows: hardware failures 44%, human errors 32%, software failures 14%, viruses 7%, and natural disasters 3%. It is therefore urgent to formulate and establish a complete disaster recovery plan as soon as possible, to strengthen the system's resistance to disasters and minimize losses.
The concept is evolving: is remote disaster recovery just offsite storage?
The idea of keeping data intact through any disaster originated with computer systems. When disaster recovery comes up, most people immediately start discussing how to connect two storage systems that are far enough apart; however, implementing disaster recovery is not that simple. Disaster recovery pursues business continuity and requires that queries and business activities on the network keep working. It includes long-distance server clusters and mirrored backups of servers and application systems at both sites.
Mr. Mascong, manager of the System Engineering Department of Boke Communication Company in China, believes that real disaster recovery must meet three requirements. First, the components and data in the system must be redundant, so that when one system fails, the other keeps data transfer running smoothly. Second, it must span a long distance: because a disaster always strikes within a limited area, sufficient distance ensures that data is not destroyed by a single disaster. Third, the disaster recovery system must achieve fast data recovery through replication. These are known as the "3R" of disaster recovery (Redundancy, Remote, Replication).
In terms of real-time capability, disaster tolerance can be divided into three levels: the lowest is tape-level disaster tolerance; above it is disaster tolerance with mirroring and data recovery; and the highest is mirroring plus data recovery plus server clustering.