A Redundant Array of Independent Disks (RAID) is an array made up of several independent disks that provides redundancy.[1]
A disk array combines many independent disks into one large-capacity disk group and uses the additive effect of the individual disks delivering data together to improve the performance of the whole disk system. With this technique, data is cut into many segments that are stored across the hard disks.[1]
A disk array can also use parity checking so that data can still be read when any one hard disk in the array fails. During data reconstruction, the lost data is recalculated and written to a new hard disk.[1]
A redundant array of inexpensive disks, or redundant array of independent disks (RAID), is a method of storing data in different places on multiple hard disks. By placing data on multiple hard disks, I/O operations can overlap in a balanced manner, improving performance. Because multiple hard disks increase the mean time between failures (MTBF), storing redundant data also increases fault tolerance.[2]
In 1988, researchers at the University of California, Berkeley published the paper "A Case for Redundant Arrays of Inexpensive Disks", which discussed RAID and defined five RAID levels. The research was motivated by the rapid growth of CPU performance at the time: CPU performance was growing by about 30-50% every year, while hard disk drives were improving by only about 7%. The research team hoped to find a new technology that could quickly raise disk performance to keep it in balance with the computer's computing power. At that time, the main research goals of the Berkeley team were performance and cost.[2]
In addition, the research team also designed fault tolerance and logical data redundancy. Inexpensive disks were the main focus at the start of the study, but it was later found that combinations of large numbers of cheap disks were not suitable for real production environments. "Inexpensive" was therefore changed to "Independent", meaning a group of many independent disks.[2]
Function
A disk array speeds up the reading and writing of data, provides redundant protection of data, and ensures the reliability of data storage.[9]
RAID technology mainly provides the following three basic functions:
(1) By striping the data on the disks, data can be accessed in blocks, which reduces the disks' mechanical seek time and improves data access speed.[3]
(2) By reading several disks in the array at the same time, the mechanical seek time of the disks is reduced and data access speed is improved.[3]
(3) By mirroring or by storing parity information, data is protected redundantly.[3]
Classification
There are three types of disk array: the external disk array cabinet, the built-in disk array card, and software emulation.[2]
External disk array cabinets are most commonly used with large servers and support hot swapping, but these products are expensive.[2]
Built-in disk array cards are cheap but demand more installation skill, so they are suited to technicians. A hardware array can provide online capacity expansion, dynamic changes of array level, automatic data recovery, drive roaming, a large cache, and other functions. It offers a solution covering performance, data protection, reliability, availability, and manageability. The array card uses a dedicated processing unit to perform its operations.[2]
Software emulation means that the disk management function provided by the network operating system itself configures multiple hard disks attached to an ordinary SCSI card as a logical disk to form an array. A software array can provide data redundancy, but the performance of the disk subsystem drops, in some cases by about 30%. It therefore slows the machine down and is not suitable for servers with heavy data traffic.[2]
Principle
A disk array is an independent system that sits outside the host and is connected to it either directly or through a network. The disk array has multiple ports, which can be connected to different hosts or to different ports of the same host. A host connected to the array through several ports can achieve a higher transfer speed.[2]
Just as a single disk in a PC has an integrated internal cache, a disk array contains a certain amount of buffer memory to speed up interaction with the host. The host interacts with the array's cache, and the cache in turn exchanges data with the individual disks.[2]
In applications, some commonly used data is read frequently. The disk array identifies this frequently read data using an internal algorithm and stores it in the cache, which speeds up the host's reads of that data. When the host reads data that is not in the cache, the array reads it directly from disk and transfers it to the host. Data written by the host only needs to be written to the cache, so the host's write operation completes immediately; the cache then writes the data to disk at its own pace.[2]
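The caching behaviour described above can be sketched in Python. This is a minimal illustration, not how any particular array works: the text only says the array uses an "internal algorithm", so the least-recently-used (LRU) eviction policy, the `ArrayCache` class, and its method names are all assumptions made for this example. A plain dict stands in for the disks.

```python
from collections import OrderedDict


class ArrayCache:
    """Toy read cache with write-back, sitting between host and disks.

    Hypothetical sketch: LRU eviction is an assumption, not a documented algorithm.
    """

    def __init__(self, backing, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.backing = backing          # dict standing in for the disks
        self.capacity = capacity        # maximum number of cached blocks
        self.cache = OrderedDict()      # block number -> data, oldest first
        self.dirty = set()              # blocks written but not yet on disk

    def read(self, block):
        if block in self.cache:         # cache hit: no disk access needed
            self.cache.move_to_end(block)
            return self.cache[block]
        data = self.backing[block]      # cache miss: read from disk
        self._insert(block, data)
        return data

    def write(self, block, data):
        # The host's write completes once the data is in the cache.
        self._insert(block, data)
        self.dirty.add(block)

    def flush(self):
        # Later, dirty blocks are written out to the disks.
        for block in sorted(self.dirty):
            self.backing[block] = self.cache[block]
        self.dirty.clear()

    def _insert(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        while len(self.cache) > self.capacity:
            old, old_data = self.cache.popitem(last=False)
            if old in self.dirty:       # never drop data that is not on disk yet
                self.backing[old] = old_data
                self.dirty.discard(old)
```

In this sketch a write becomes visible to later reads immediately, but reaches the backing dict only on `flush()` or when the block is evicted, which mirrors the delayed write to disk described above.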
Advantages and disadvantages
Advantages
Higher transfer speed. RAID greatly improves the data throughput of a storage system by storing and reading data on multiple disks at the same time. In RAID, many disk drives can transfer data simultaneously while logically appearing as a single disk drive, so RAID can reach several times, dozens of times, or even hundreds of times the speed of a single disk drive. This is the problem RAID was originally meant to solve: CPU speed was growing very quickly while the data transfer rate of disk drives could not be raised much, so a solution was needed to resolve the mismatch between the two. RAID ultimately succeeded.[2]
Fault tolerance through data verification. Ordinary disk drives cannot provide fault tolerance, apart from the CRC (cyclic redundancy check) codes written on the disk. RAID fault tolerance is built on top of the hardware fault tolerance of each disk drive, so it provides higher security. Many RAID modes include fairly complete mutual verification and recovery measures, or even direct mirror copies, which greatly improves the fault tolerance of the RAID system and the stability and redundancy of the system.[2]
Disadvantages
The disk utilization of RAID 1 can reach at most 50% (when two disks are used), which is the lowest of all RAID levels.[2]
RAID 0+1 can be understood as a compromise between RAID 0 and RAID 1. It provides data security for the system, but to a lesser degree than plain mirroring; because every disk is mirrored, its disk space utilization is also only 50%.[2]
RAID Levels
RAID JBOD
RAID JBOD Diagram
JBOD stands for "Just a Bunch Of Disks". Strictly speaking, this type of storage device does not count as RAID, and Wikipedia also classifies JBOD as a non-RAID architecture. RAID JBOD concatenates all disks into a single storage device for the operating system to use, whose capacity is the sum of the capacities of the disks used. For example, three 80GB disks give a RAID JBOD device of 240GB, and three disks of 60GB, 80GB, and 100GB also give a RAID JBOD device of 240GB. Note that RAID JBOD can use all of the space on every disk, regardless of whether the disks are the same size. This is the biggest difference between RAID JBOD and other RAID types.
Because the disks are simply concatenated, the access speed of a RAID JBOD is the same as that of a single disk, and there is no verification of any kind. If any disk fails, the entire array is destroyed, so its reliability is 1/N of that of a single disk.
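As a rough illustration of concatenation, the Python sketch below adds up member capacities and maps a logical offset to a member disk. The function names `jbod_capacity` and `jbod_locate` are hypothetical, and sizes are in GB only to match the example above.

```python
def jbod_capacity(disk_sizes):
    """Total capacity of a concatenated set: simply the sum of the members."""
    return sum(disk_sizes)


def jbod_locate(disk_sizes, offset):
    """Map a logical offset to (disk index, offset within that disk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for index, size in enumerate(disk_sizes):
        if offset < size:
            return index, offset
        offset -= size                  # skip past this disk
    raise ValueError("offset beyond end of JBOD device")


print(jbod_capacity([80, 80, 80]))      # 240
print(jbod_capacity([60, 80, 100]))     # 240
print(jbod_locate([60, 80, 100], 150))  # (2, 10)
```

Each logical offset lives on exactly one disk, which is why a single request is served at the speed of a single disk.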
As disk array (RAID) technology has developed, a number of basic technical levels have emerged, roughly divided into RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5, RAID 6, and so on. Each level is based on different technical principles.[10]
RAID 0
RAID 0 diagram
RAID 0 creates a stripe set across N hard disks using a suitably chosen stripe size. The principle is similar to interlaced scanning on a display: data is split up and written in stripes to all the hard disks, so that they read and write at the same time. With multiple hard disks operating in parallel, disk read/write speed over the same period of time increases by a factor of N.[2]
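A minimal Python sketch of how striping might place data, assuming the simplest round-robin layout in which consecutive stripe units go to consecutive disks. The name `raid0_locate` and the use of "blocks" as the unit are assumptions for illustration; real controllers may use other layouts.

```python
def raid0_locate(block, n_disks, stripe_blocks):
    """Return (disk, block on that disk) for a logical block number.

    stripe_blocks is the number of blocks in one stripe unit on one disk.
    """
    unit = block // stripe_blocks           # which stripe unit overall
    disk = unit % n_disks                   # units rotate across the disks
    row = unit // n_disks                   # full stripes before this unit
    return disk, row * stripe_blocks + block % stripe_blocks


# Eight consecutive blocks on 4 disks with 1-block stripe units
print([raid0_locate(b, 4, 1) for b in range(8)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```

Consecutive blocks land on different disks, so a large sequential transfer keeps all N disks busy at once.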
When creating a stripe set, choosing a reasonable stripe size is very important. If the stripe is too large, the stripe space on one disk may satisfy most I/O operations, so reads and writes remain confined to one or two hard disks and the advantage of parallel operation is lost. If the stripe is too small, any I/O instruction may trigger a large number of read and write operations and take up too much controller bus bandwidth. The stripe size should therefore be chosen carefully according to the needs of the actual application.[2]
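The trade-off can be made concrete with a small Python sketch that counts how many disks and how many stripe units a single request touches for a given stripe size. It assumes a simple round-robin stripe layout; `request_fanout` and the sizes in KB are illustrative assumptions only.

```python
def request_fanout(offset, length, stripe_size, n_disks):
    """Return (disks touched, stripe units touched) for one request."""
    if length <= 0:
        return 0, 0
    first_unit = offset // stripe_size
    last_unit = (offset + length - 1) // stripe_size
    units = last_unit - first_unit + 1
    return min(units, n_disks), units   # more units than disks wraps around


# A 64 KB request at offset 0 on a 4-disk stripe set
for stripe_kb in (256, 64, 16, 4):
    print(stripe_kb, request_fanout(0, 64, stripe_kb, 4))
# 256 (1, 1)
# 64 (1, 1)
# 16 (4, 4)
# 4 (4, 16)
```

With 256 KB stripes the request stays on one disk and gains nothing from parallelism, while with 4 KB stripes it is split into 16 separate operations for the controller to handle.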
A stripe set distributes data evenly across all disks for reading and writing. However, if all the hard disks are attached to one controller, this can cause problems, because frequent reads and writes can easily overload the controller or the bus. To avoid this, using multiple disk controllers is recommended. The best solution is to give each hard disk its own dedicated disk controller.[2]
Although RAID 0 provides more space and better performance, the system as a whole is very unreliable: if a failure occurs, nothing can be recovered. RAID 0 is therefore generally used only where data security requirements are low.[2]
RAID 1
RAID 1 diagram
RAID 1 is known as disk mirroring. The principle is to mirror the data of one disk onto another: when data is written to one disk, a mirror copy is created on another, otherwise idle, disk. This maximizes system reliability and repairability without affecting performance. As long as at least one disk in every mirror pair is usable, the system can keep running normally, even when half of the hard disks have failed. When a hard disk fails, the system ignores it and uses the remaining mirror disk to read and write data, which gives good disk redundancy. Although this is very safe for data, the cost rises significantly, because disk utilization is only 50%: with four 80GB hard disks, only 160GB of disk space is available. In addition, a RAID system with a failed hard disk is no longer reliable. The damaged hard disk should be replaced promptly; otherwise, if the remaining mirror disk also fails, the entire system will crash. After a new disk is installed, synchronizing the mirror with the original data takes a long time. External access to data is not affected, but the performance of the whole system drops during this time. RAID 1 is therefore often used to store critical data.[2]
RAID 1 implements disk mirroring by writing every piece of data twice, so the load on the disk controller is considerable, especially in environments where data is written frequently. To avoid performance bottlenecks, multiple disk controllers are needed.[2]
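The utilization figure above can be checked with a short Python sketch. It assumes every member is counted at the size of the smallest disk and uses the commonly cited capacity formulas for RAID 0, RAID 1 with two-way mirrors, RAID 5, and RAID 6; `usable_capacity` is a hypothetical helper, not part of any RAID tool.

```python
def usable_capacity(level, disk_sizes):
    """Usable capacity for a few RAID levels, in the units of disk_sizes."""
    n = len(disk_sizes)
    smallest = min(disk_sizes)
    if level == 0:
        return n * smallest             # striping, no redundancy
    if level == 1:
        if n % 2:
            raise ValueError("two-way mirroring needs an even number of disks")
        return (n // 2) * smallest      # half the disks hold mirror copies
    if level == 5:
        if n < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (n - 1) * smallest       # one disk's worth of parity
    if level == 6:
        if n < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (n - 2) * smallest       # two disks' worth of parity
    raise ValueError("level not covered by this sketch")


print(usable_capacity(1, [80, 80, 80, 80]))   # 160
```

For the same four 80GB disks, this sketch gives 320GB for RAID 0 and 240GB for RAID 5, which shows how much space mirroring gives up in exchange for redundancy.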
RAID0+1
Schematic diagram of RAID0+1
As the name RAID 0+1 suggests, it is a combination of RAID 0 and RAID 1. Using RAID 1 alone runs into a problem similar to using RAID 0 alone: data can be written to only one disk at a time, so the available resources cannot be fully used. To solve this problem, a stripe set can be created within the disk mirrors. Because this configuration combines the advantages of stripe sets and mirroring, it is called RAID 0+1. By combining RAID 0 and RAID 1, data is distributed across multiple disks and each disk has its own physical mirror disk, which provides full redundancy: up to one disk can fail without affecting data availability, and read/write speed remains fast. RAID 0+1 requires at least 4 hard disks to create the stripe set within the disk mirrors.[2]
MegaRAID, Nytro, and Syncro are all RAID solutions launched by LSI, which has continued to update them.[2]
LSI MegaRAID is positioned mainly at data protection: it provides a high level of protection for data through high-performance, highly reliable RAID controller functions. LSI MegaRAID is well known in the industry.[2]
LSI Nytro is positioned mainly at data acceleration. It makes full use of today's popular flash memory technology to greatly improve data I/O speed. LSI Nytro comprises three series: LSI Nytro WarpDrive accelerator cards, LSI Nytro XD application acceleration storage solutions, and LSI Nytro MegaRAID application accelerator cards. Nytro MegaRAID is mainly used in DAS environments, the Nytro WarpDrive accelerator card is mainly used in SAN and NAS environments, and the Nytro XD solution consists of two parts: the Nytro WarpDrive accelerator card and the Nytro XD intelligent caching software.[2]
LSI provides basic reliability through MegaRAID, acceleration through Nytro, and a way past capacity bottlenecks through Syncro, so that low-cost storage solutions can be scaled out widely while further improving reliability.[2]
RAID2
RAID 2: with Hamming code verification. Conceptually, RAID 2 is similar to RAID 3: both stripe data across different hard disks in units of bits or bytes. However, RAID 2 uses a coding technique to provide error checking and recovery. This coding requires multiple disks to store the check and recovery information, which makes RAID 2 more complex to implement, so it is rarely used in commercial environments. The data disks hold the individual bits of the data, and the Hamming check code computed from the different bits of a piece of data is stored on a separate set of disks. Thanks to the properties of Hamming code, errors in the data can be corrected to ensure correct output. Its data transfer rate is quite high; to reach an ideal speed, it is best to speed up the hard disks that store the ECC check code. Its controller design is simpler than that of RAID 3, 4, or 5. There is no free lunch, however: using Hamming code comes at the price of data redundancy. The rate at which data is output equals that of the slowest drive in the drive group.[2]
RAID3
RAID 3 Schematic Diagram
RAID 3 (parallel transfer with parity code). Unlike the check code of RAID 2, this parity code can only detect errors, not correct them. RAID 3 handles one stripe at a time when accessing data, which improves read and write speed. The parity code is generated when data is written and is stored on a separate disk. Implementing it requires at least three drives. Both the write rate and the read rate are high, and because there are few parity bits, the computation time is relatively short. Implementing RAID 3 control in software is very difficult, and the controller is not easy to implement either. It is mainly used for graphics (including animation) and other applications that require relatively high throughput. Unlike RAID 2, RAID 3 uses a single disk to store parity information. If a data disk fails, the parity disk and the other data disks can regenerate its data. If the parity disk fails, data access is not affected. RAID 3 delivers a very good transfer rate for large amounts of sequential data, but for random data the parity disk becomes the bottleneck for write operations.[2]
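Parity of this kind is commonly computed with bytewise XOR, which is what the following Python sketch assumes; the text above does not spell out the calculation. The helper names `xor_blocks` and `rebuild` are hypothetical. The sketch shows a parity block being generated on write and a lost data block being regenerated from the parity block and the surviving data blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    if not blocks:
        raise ValueError("need at least one block")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild(surviving_data, parity):
    """Regenerate the single missing data block of a stripe."""
    return xor_blocks(list(surviving_data) + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # one stripe on 3 data disks
parity = xor_blocks(data)                         # stored on the parity disk
lost = data[1]                                    # pretend disk 1 failed
assert rebuild([data[0], data[2]], parity) == lost
```

Because XOR is its own inverse, the same routine serves both to create parity and to recover a block, but it can recover only one missing block per stripe.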
RAID4
RAID 4 (independent disk structure with parity code). RAID 4 is very similar to RAID 3. The difference is that it accesses data by data block, that is, by disk, one disk at a time. Put another way, RAID 3 handles one horizontal stripe at a time, while RAID 4 handles one vertical strip at a time. Its characteristics are similar to those of RAID 3, but recovering from a failure is much harder than with RAID 3, the controller is much harder to design, and data access efficiency is not very good.[2]
RAID5
RAID 5 diagram
RAID 5 (independent disk structure with distributed parity). As its diagram shows, the parity code is stored across all disks, where p0 denotes the parity value of stripe 0, and so on. RAID 5 has high read efficiency, average write efficiency, and good efficiency for block-based collective access. Because the parity codes are on different disks, reliability is improved. However, it does not handle the parallelism of data transfer well, and the controller is also quite difficult to design. An important difference between RAID 3 and RAID 5 is that RAID 3 involves all disks in the array for every data transfer, whereas in RAID 5 most data transfers operate on only one disk and can proceed in parallel. RAID 5 has a "write penalty": each write operation generates four actual read/write operations, namely two reads (of the old data and the old parity information) and two writes (of the new data and the new parity information).[2]
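The four operations behind the write penalty can be sketched in Python as a read-modify-write of one stripe. This assumes XOR parity and a layout in which parity rotates from disk to disk by stripe number; the helpers `parity_disk` and `small_write`, the particular rotation rule, and the in-memory `disks` list are all illustrative assumptions, since the text does not give a specific layout.

```python
def parity_disk(stripe, n_disks):
    """Disk holding parity for a stripe (one possible rotation scheme)."""
    return (n_disks - 1 - stripe) % n_disks


def small_write(disks, stripe, disk, new_data):
    """Update one data block and its parity; return the number of disk I/Os.

    disks[d][stripe] is the block (bytes) stored on disk d for that stripe.
    """
    p = parity_disk(stripe, len(disks))
    if disk == p:
        raise ValueError("target disk holds parity for this stripe")
    ios = 0
    old_data = disks[disk][stripe]      # read 1: old data
    ios += 1
    old_parity = disks[p][stripe]       # read 2: old parity
    ios += 1
    # New parity = old parity XOR old data XOR new data
    new_parity = bytes(
        a ^ b ^ c for a, b, c in zip(old_parity, old_data, new_data)
    )
    disks[disk][stripe] = new_data      # write 1: new data
    ios += 1
    disks[p][stripe] = new_parity       # write 2: new parity
    ios += 1
    return ios


# Three disks, one stripe: parity for stripe 0 lives on disk 2
disks = [[b"\x0f"], [b"\xf0"], [b"\xff"]]
ios = small_write(disks, stripe=0, disk=0, new_data=b"\x01")
print(ios, disks[2][0])   # 4 b'\xf1'
```

Only the target data disk and the parity disk are involved, so writes to stripes whose data and parity sit on other disks can proceed in parallel.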
RAID6
RAID 6 (independent disk structure with two distributed parity codes). It is an extension of RAID 5 and is mainly used where data must not be lost or corrupted. Because a second parity value is introduced, N+2 disks are required. The controller design also becomes very complex, and write speed is poor: calculating the parity values and verifying the correctness of the data takes more time, which adds extra load.[2]
RAID7
RAID 7 (optimized high-speed data transfer disk structure). All I/O transfers in RAID 7 are synchronous and can be controlled separately, which improves the parallelism of the system and the speed at which the system accesses data. Each disk has its own cache memory, and its real-time operating system can run on any real-time operating chip to meet the needs of different real-time systems. Management and monitoring through the SNMP protocol is supported, and a separate transfer channel can be assigned to the parity area to improve efficiency. Multiple hosts can be connected, and because of the cache memory, access time is close to zero when multiple users access the system. Thanks to its parallel structure, the efficiency of data access is greatly improved. Note that the cache memory cuts both ways: if the system loses power, all data in the cache memory is lost, so it must be used together with a UPS. Naturally, such a fast product is also very expensive.[2]
RAID10
RAID 10 (high-reliability, high-efficiency disk structure). This structure is simply a stripe structure combined with a mirror structure. Because each of the two structures has its own advantages and disadvantages, they complement each other to achieve both efficiency and speed. The new structure can be understood by combining the strengths and weaknesses of the two. It is expensive and does not scale well. It is mainly used for databases that are not large in capacity but demand speed and error control.[2]
RAID53
RAID 53 (efficient data transfer disk structure). The later structures are repetitions and reuses of the earlier ones. This structure unifies RAID 3 with a stripe structure, so it is fast and also fault tolerant. However, it is very expensive and not easy to implement, because all data must pass through both stripe and bit-level storage methods, and keeping the disks synchronized while maintaining efficiency is not easy.[2]
RAID 5E
RAID 5E is an improvement on RAID 5. As with RAID 5, the parity information is distributed evenly across all hard disks, but some unused space is reserved on each hard disk. This space is not striped, and at most two physical hard disks are allowed to fail. RAID 5E may look similar to RAID 5 plus a hot spare disk, but because RAID 5E distributes data across all hard disks, it performs better than RAID 5 plus a hot spare disk. When a hard disk fails, the data from the failed disk is compressed into the unused space on the other hard disks, and the logical disk remains at RAID 5 level.[4]
RAID 5EE
Compared with RAID 5E, RAID 5EE distributes data more efficiently. Part of each hard disk's space is used as a distributed hot spare that is part of the array, so data is rebuilt faster when a physical hard disk in the array fails.[4]
Implementation method
Software RAID
Software RAID uses host-based software to provide RAID functions and is implemented at the operating system level. Compared with hardware RAID, software RAID has the advantages of low cost and simplicity. However, software RAID has the following disadvantages.[5]
(1) Performance: software RAID affects the overall performance of the system, because it requires the CPU to perform the RAID calculations.[5]
(3) Compatibility: software RAID is tied to the host operating system, so software RAID or operating system upgrades must be checked for compatibility. The RAID software can be upgraded only when it is compatible with the operating system, which reduces the flexibility of the data processing environment.[5]
Hardware RAID
Hardware RAID includes host-based hardware RAID and array-based hardware RAID. Host-based hardware RAID usually installs a dedicated RAID controller in the host, with all disk drives connected to the host; some manufacturers also integrate the RAID controller onto the motherboard. However, a host-based hardware RAID controller is not an efficient solution in a data center environment with a large number of hosts. Array-based hardware RAID uses an external hardware RAID controller that acts as the interface between the host and the disks; it presents storage volumes to the host, and the host manages these volumes as physical drives. The hardware RAID controller has the following main features:[5]
1. DAS -- Direct Attached Storage
DAS is server centric. Traditional network storage devices attach a RAID hard disk array directly to the network system; this form of network storage structure is called DAS (Direct Attached Storage).[6]
2. NAS Network Attached Storage
NAS is data centric. NAS is short for Network Attached Storage. In the NAS storage structure, the storage system is no longer attached to a specific server or client through an I/O bus; instead, it is connected directly to the network through a network interface and accessed by users over the network.[6]
3. SAN -- Storage Area Networks
SAN is network centric. A SAN is a high-speed storage network similar to an ordinary LAN. It provides an easy way to connect to a LAN, allowing enterprises to increase their storage capacity independently and keeping network performance from being affected by data access. This independent, dedicated networked storage model gives SAN many advantages: high scalability; storage hardware functions that are not affected by the LAN; easy management; centralized management software that enables remote management and unattended operation; and strong fault tolerance.
SAN is mainly used in working environments with large storage capacity, such as PACS in large hospitals, but limited demand and high cost have held back the SAN market.[6]
Disk array maintenance
Strengthening the routine management and maintenance of the disk array is an important means of ensuring that it works normally and efficiently. A storage manager should pay attention to the following in routine maintenance:[7]
① Set up a hot spare disk
Setting aside a hard disk as a hot spare causes some waste, but from a security perspective it is worth it. High-capacity disk arrays typically use RAID 5, which has the redundancy of only one hard disk. If one hard disk is damaged, the whole array is in a critical state; at that point, any further damage to any hard disk has catastrophic consequences and all data is lost. With a hot spare disk configured, when a hard disk fails the system automatically replaces the failed disk with the hot spare and rebuilds the array, after which the data is fully protected again.[7]
② Back up important data
Particularly important data should be backed up frequently, so as not to "put all the eggs in one basket"; even a highly secure disk array is not absolutely safe.[7]
③ Establish an inspection routine
The fact that a disk array is still working does not mean that nothing has failed. When a disk in the array fails, reads and writes on the disk array storage system usually continue normally. This is a safety feature of disk arrays, but it often gives managers the false impression that the array has no faults. Scheduled inspections not only uncover faults that have already occurred but also reveal the array's working condition, and so play a preventive role.[7]
Solid State Disk RAID Technology
At present, there are three types of RAID array technology for SSDs. One is the hybrid RAID array built from solid-state drives combined with mechanical hard disks, which lets the characteristics of the two complement each other. As the cost-effectiveness of SSDs has continued to improve, it has driven the development of RAID arrays formed from SSDs combined with other SSDs, and of pure solid-state RAID arrays built from NAND Flash chips. Because SSDs cost more than mechanical hard disks, the hybrid array has a large advantage in cost control over the pure solid-state RAID arrays. In terms of performance and reliability, however, a RAID array composed of multiple SSDs is better than a hybrid RAID array of SSDs and mechanical hard disks. At present, most SSD manufacturers use chip-level RAID arrays inside their SSDs to further improve performance and reduce power consumption.[8]
For iRAID, an embedded RAID array technology, preliminary research results suggest that a RAID system will no longer be a group of independent drives and may consist of just a single high-density disk. This would greatly improve the performance, power consumption, and size of the disk arrays in storage systems such as cloud storage systems, further reduce cost, and make maintenance easier. Embedded RAID technology is therefore expected to become one of the main research directions for SSD RAID array technology, with broad application prospects in education, entertainment, national defense, and many other fields, especially in aviation, the military, and other highly complex working environments with high data security requirements. In addition, because there has been little research on evaluating the reliability of SSD RAID, a system and method for evaluating RAID reliability needs to be established as soon as possible, so reliability analysis will also become one of the research focuses of SSD RAID array technology.[8]
In addition, the following two areas are also receiving attention in research on SSD RAID array technology.[8]
1. Research on big data storage structures and search engines. The data storage system determines the performance and cost of data mining. A new big data storage architecture could integrate distributed, embedded search engines into every storage drive, breaking through the limits of storage systems in data throughput and data access and improving the bandwidth of big data storage interfaces.[8]
2. Research on rapid reconstruction mechanisms. If an SSD RAID structure adopts a suitable reconstruction mechanism, the whole rebuild process, from error statistics to data recovery, is accelerated, which helps reduce the risk of data loss during reconstruction. A rebuild mechanism is indispensable to a complete SSD RAID structure and needs to be developed and optimized according to the characteristics of the RAID array.[8]