Data storage

Data storage objects include data streams produced during processing, temporary files, and information to be searched during processing. Data is recorded in a certain format on the computer's internal or external storage media. A data store should be named, and the name should reflect the meaning of the information it holds. Data flow reflects the data moving through the system and shows the characteristics of dynamic data; data storage reflects the data at rest in the system and shows the characteristics of static data.
Chinese name: data storage
Foreign name: Data storage
Role: reflects static data in the system
Field: information science

Storage medium

Disk and tape are common storage media, and the organization of stored data varies with the medium. On tape, records can be accessed only as a sequential file; on disk, either sequential access or direct access can be adopted according to requirements. The way data is stored is closely related to how data files are organized. The key is to establish the correspondence between the logical and physical order of records and to determine storage addresses, so as to improve data access speed.
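The difference between sequential and direct access can be sketched in Python (a toy illustration of my own, not tied to any particular medium): sequential access reads records in order, while direct access computes a record's physical address from its logical position.

```python
import os
import tempfile

RECORD_SIZE = 16  # fixed-length records make address arithmetic trivial

# Build a small file of fixed-length records.
tmp = tempfile.NamedTemporaryFile(delete=False)
path = tmp.name
with tmp:
    for i in range(100):
        tmp.write(f"record-{i:04d}".ljust(RECORD_SIZE).encode())

# Sequential access (the only option on tape): read records in order.
with open(path, "rb") as f:
    first_three = [f.read(RECORD_SIZE).decode().strip() for _ in range(3)]

# Direct access (possible on disk): compute the physical address of
# record 42 from its logical position and seek straight to it.
with open(path, "rb") as f:
    f.seek(42 * RECORD_SIZE)  # logical order -> physical address
    rec42 = f.read(RECORD_SIZE).decode().strip()

print(first_three)  # ['record-0000', 'record-0001', 'record-0002']
print(rec42)        # record-0042
os.remove(path)
```

Fixed-length records are what make the address computation trivial; variable-length records would need an index mapping keys to offsets.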

Three storage modes


DAS

DAS (Direct Attached Storage) uses the same storage architecture as a common PC: external storage devices are attached directly to the server's internal bus, so the storage device is part of the server structure itself.
DAS storage mode is mainly applicable to the following environments:
1) Small network
Because the network is small, the amount of stored data is small, and the environment is not very complex, this storage method does not place a heavy burden on the server. It is also very economical, making it suitable for enterprise users with small networks.
2) Geographically dispersed network
Although the enterprise's overall network may be large, it is geographically dispersed, which makes interconnecting the servers through a SAN or NAS very difficult. In this case, DAS storage can be used at each site to reduce costs.
3) Special application server
Some special application servers, such as Microsoft cluster servers or the raw partitions used by some databases, require the storage device to be connected directly to the application server.
4) Improve DAS storage performance
Among the various ways of connecting servers and storage, DAS was once considered an inefficient structure, and it is inconvenient for data protection. Direct attached storage cannot be shared, so it often happens that one server runs out of storage space while other servers have large amounts of idle space that cannot be used. Without shared storage, capacity allocation cannot be balanced against usage requirements.
The data protection process under the DAS structure is also relatively complex. With network backup, each server must be backed up separately and all data must flow across the network. Without network backup, each server must be equipped with its own backup software and tape devices, which greatly increases the complexity of the backup process.
To obtain highly available DAS storage, the cost of the solution must first come down. For example, LSI's 12 Gb/s SAS direct attached storage can provide good support for large data centers. Large data centers, cloud computing, storage, and big data have all placed higher demands on DAS performance, and the explosive growth of cloud and enterprise data center data has driven market demand for high-performance storage interfaces that support faster data access. LSI's 12 Gb/s SAS meets this performance growth: it provides higher IOPS and higher throughput, improves write performance, and improves overall RAID performance.
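As a rough sanity check on those figures (my own back-of-the-envelope arithmetic, assuming the 8b/10b line encoding that SAS uses), a 12 Gb/s SAS lane carries about 1200 MB/s of payload, twice that of a 6 Gb/s lane:

```python
def sas_payload_mb_per_s(line_rate_gbps: float) -> float:
    """Approximate usable throughput of one SAS lane.

    SAS uses 8b/10b encoding: every 10 bits on the wire carry
    8 bits of data, so payload = line rate * 8/10, then /8 for bytes.
    """
    bits_per_s = line_rate_gbps * 1e9
    data_bits_per_s = bits_per_s * 8 / 10  # strip 8b/10b encoding overhead
    return data_bits_per_s / 8 / 1e6       # bytes/s -> MB/s

print(sas_payload_mb_per_s(6.0))   # 600.0 MB/s per 6 Gb/s lane
print(sas_payload_mb_per_s(12.0))  # 1200.0 MB/s per 12 Gb/s lane
```

Real-world throughput will be somewhat lower once protocol framing and controller overhead are accounted for.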
Compared with the direct attached architecture, shared storage architectures such as SAN (storage area network) and NAS (network attached storage) solve the above problems better, so one might expect DAS to be phased out ever faster. However, as of 2012, DAS was still a common mode of server storage connection. In fact, DAS has not been eliminated; it even seems to have been recovering in recent years.

NAS

The NAS (Network Attached Storage) mode comprehensively improves on the inefficient DAS mode. It is independent of the server: a file server developed specifically for network data storage connects the storage devices and attaches them to the network. Data storage is thus no longer attached to a particular server but exists as an independent network node that all network users can share.
Advantages of NAS:
1) True plug and play
NAS is an independent storage node on the network, independent of the user's operating system platform, and is truly plug and play.
2) Simple storage deployment
NAS does not rely on a general-purpose operating system. Instead, it uses a simplified, user-oriented operating system designed specifically for data storage. The protocols required for network connection are built in, so managing and setting up the entire system is relatively simple.
3) Storage device location is very flexible
4) Easy to manage and low cost
The NAS data storage mode is designed to communicate over the existing enterprise Ethernet using the TCP/IP protocol, transmitting data by file I/O.
Disadvantages of NAS:
1) Low storage performance
2) Low reliability
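The file-I/O point above is what distinguishes NAS from SAN at the access level: a NAS client asks for named files, while a SAN client reads raw blocks by offset. A toy Python sketch of the two views (my own illustration, using an ordinary local file to stand in for the shared device; `os.pread` is POSIX-only):

```python
import os
import tempfile

tmp = tempfile.NamedTemporaryFile(delete=False)
path = tmp.name
with tmp:
    tmp.write(b"hello from a shared volume")

# File-level I/O (the NAS view): the client names a file and reads it;
# the file system lives on the storage side.
with open(path, "rb") as f:
    file_view = f.read()

# Block-level I/O (the SAN view): the client addresses raw byte ranges;
# any file system on top is the client's own business.
fd = os.open(path, os.O_RDONLY)
block_view = os.pread(fd, 10, 6)  # read 10 bytes starting at offset 6
os.close(fd)

print(file_view.decode())   # hello from a shared volume
print(block_view.decode())  # from a sha
os.remove(path)
```

In a real deployment the file view would arrive over a protocol such as NFS or SMB, and the block view over Fibre Channel or iSCSI.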

SAN

In 1991, IBM introduced ESCON (Enterprise System Connection) technology in its servers: a connection mode for server access to storage, based on optical fiber media with a maximum transfer rate of 17 MB/s. On this basis, the more capable ESCON Director (an FC switch) was introduced, and the earliest SAN systems were built.
The SAN (Storage Area Network) mode puts storage itself on a network, which conforms to the trend toward networked computer server architectures. The supporting technology of SAN is Fibre Channel (FC), a standard integration of network and channel I/O interfaces established by ANSI. FC supports a variety of advanced protocols such as HIPPI, IPI, SCSI, IP, and ATM. Its biggest feature is that it separates the communication protocol from the physical transmission medium, so that multiple protocols can be transmitted simultaneously over the same physical connection.
The hardware infrastructure of a SAN is Fibre Channel, and a SAN built with Fibre Channel consists of the following three parts:
1) Storage and backup devices: including tape, disk, and jukebox libraries.
2) Fibre Channel network connection components: including host bus adapters and drivers, optical cables, hubs, switches, and Fibre Channel-to-SCSI bridges.
3) Application and management software: including backup software, storage resource management software, and storage device management software.
Advantages of SAN:
1) Easy network deployment;
2) High-speed storage performance. Because SAN adopts Fibre Channel technology, it has higher storage bandwidth and significantly improved storage performance. SAN's Fibre Channel transmits data on the principle of full-duplex serial communication, with transfer rates up to 1062.5 Mb/s.
3) Good scalability. Because SAN uses a network structure, its expansion capability is stronger. The optical fiber interface provides a connection distance of up to 10 km, which makes it very easy to place storage physically away from the local machine room. [1]
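The 1062.5 Mb/s figure reconciles with the commonly quoted "100 MB/s" Fibre Channel link through the same 8b/10b encoding arithmetic (my own check, assuming 8b/10b as used by 1 Gb FC):

```python
line_rate_mbps = 1062.5  # 1 Gb Fibre Channel line rate from the text

# Fibre Channel uses 8b/10b encoding: 10 line bits carry 8 data bits.
data_rate_mbps = line_rate_mbps * 8 / 10
payload_mb_per_s = data_rate_mbps / 8  # bits -> bytes

print(payload_mb_per_s)  # 106.25 MB/s, the nominal "100 MB/s" FC link
```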

Comparison of three storage modes

The biggest characteristic of storage applications is that there is no single standard architecture. The three storage methods coexist and complement one another, and together they meet the requirements of enterprise informatization.
In terms of connection mode, DAS attaches the storage device directly to the application server, which offers some flexibility but also limitations. NAS connects storage devices and application servers over network technologies (TCP/IP, ATM, FDDI), so the storage devices can be located flexibly, and with the emergence of 10 Gigabit Ethernet the transfer rate has improved greatly. SAN connects storage devices and application servers through Fibre Channel technology, with good transfer rates and scalability. The three modes each have their own advantages and coexist, together accounting for more than 70% of the disk storage market. The price of SAN and NAS products is still far higher than that of DAS, and many users choose inefficient direct attached storage rather than efficient shared storage.
Objectively speaking, SAN and NAS systems can now use technologies such as thin provisioning to make up for the early weakness of inflexible storage allocation. However, they took so long to solve the allocation problem that DAS had enough time to gain a foothold in the data center. In addition, SAN and NAS still have many problems they cannot solve. [2]
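Thin provisioning, mentioned above, decouples the capacity a volume promises from the capacity it actually consumes. A minimal sketch (the `ThinPool` class is hypothetical, my own illustration rather than any vendor's API):

```python
class ThinPool:
    """Toy thin-provisioned pool: volumes promise logical capacity,
    but physical blocks are consumed only when data is written."""

    def __init__(self, physical_gb: int):
        self.physical_gb = physical_gb
        self.promised_gb = 0  # sum of logical volume sizes (may exceed physical)
        self.used_gb = 0      # physical capacity actually written

    def create_volume(self, logical_gb: int) -> None:
        # Overcommit is allowed: that is the whole point of thin provisioning.
        self.promised_gb += logical_gb

    def write(self, gb: int) -> None:
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.used_gb += gb

pool = ThinPool(physical_gb=100)
pool.create_volume(80)  # promise 80 GB to one server
pool.create_volume(80)  # and 80 GB to another: 160 GB promised, 100 GB real
pool.write(30)          # only written data consumes physical space
print(pool.promised_gb, pool.used_gb)  # 160 30
```

The flexibility comes at a price: the administrator must monitor `used_gb` against `physical_gb`, because every volume can run out of space at once when the pool is exhausted.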

Storage price trap

Sometimes a forklift upgrade of the system is unavoidable, but with a little consideration during purchasing, you can avoid the price traps that may be encountered when buying storage.

Storage prices vary widely

Storage equipment is a typical example of wasteful expenditure caused by technological change, but such change is also inevitable and necessary. In 2005, Integrated Drive Electronics (IDE) disks were mainly used for low-end data storage, while Small Computer System Interface (SCSI) disks were mainly used in high-end servers. IDE was very slow until it developed into Enhanced IDE, followed by the debut of SATA technology. By 2015, SATA III was comparable to the high-end storage option, Serial Attached SCSI (SAS), at a lower cost.
All of these devices use rotating hard disks, which by 2015 could no longer compete with flash memory. Flash began as a storage medium for cameras, fragile and small in capacity, but by 2015 it had become the preferred storage medium for most data center equipment manufacturers.

Hidden costs

Even if the hardware is relatively cheap, management and most related tools increase the cost of storage.
Tools that can only identify physical arrays have been made obsolete by virtualization, and software that can handle storage in a virtualized world finds that virtual storage still depends on the support of the underlying physical drive platform.
A tactical approach is needed to maximize the use of existing technology before the next forklift upgrade. However, tactical decision-making is not strategy, and confusing the two leads IT organizations in the wrong technical direction.
If your current storage provider has been telling you that flash storage is fragile and its availability unproven, then you have likely bought a large SAS array as a strategic investment. It was once the best storage available, and you hoped it would deliver its full performance for a long time without depreciating too much. By 2015, however, this strategy ran into problems.
Take a SAS drive failure as an example: it triggers a RAID 6 rebuild that seriously affects performance. If the required drive size has been discontinued, finding a replacement for the failed drive is difficult, and the array cannot accept drives of a different type. Users will complain that all applications are slow. You want to add a flash storage tier to improve performance, only to find that the array has no interface that supports solid-state drives.
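To make the RAID 6 cost concrete (standard RAID 6 arithmetic, not specific to any product): two of every n drives' worth of capacity goes to dual parity, and rebuilding one failed drive means reading every surviving drive in full.

```python
def raid6_usable_tb(num_drives: int, drive_tb: float) -> float:
    """RAID 6 keeps dual parity, so two drives' worth of capacity is lost."""
    if num_drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return (num_drives - 2) * drive_tb

def raid6_rebuild_read_tb(num_drives: int, drive_tb: float) -> float:
    """Rebuilding one failed drive reads every surviving drive in full,
    which is why a rebuild drags down array performance for so long."""
    return (num_drives - 1) * drive_tb

print(raid6_usable_tb(8, 4.0))        # 24.0 TB usable from 8 x 4 TB drives
print(raid6_rebuild_read_tb(8, 4.0))  # 28.0 TB read to rebuild one drive
```

Note that the rebuild read volume grows with the array, so larger drives and wider arrays stretch the degraded-performance window.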
The only choice is to buy new storage, migrate all the data, and then write off the old system; there are no drives worth replacing, and it is not even worth selling to recover funds.
Before updating the IT platform, think ahead: a decision that seems right today will not be so certain in the future. Understand the tension between tactics and strategy, and if a long-term solution is needed, insist on standards and commoditization.
Ask the supplier how the current product works with older versions. If the products a manufacturer sells in 2015 are not very compatible with its older series, then, no matter what the sales representative promises, that very likely signals when the next forklift upgrade will come.
Hold suppliers to some standards: do they keep up with market changes, and do their products fully support industry standards? Will going beyond those standards cause interoperability problems with other manufacturers' products?
Can other manufacturers complement the products you are purchasing and add value? Talk to these partners and third parties, and ask them whether the supplier's products are easy to work with and to change. [3]

Standardizing the data center's data storage

The heterogeneity of data storage architectures and controllers in the data center is a major obstacle to a standardized infrastructure that supports different workloads.
For all intents and purposes, the core of a data storage system has become a commodity. However, differences remain at various levels in the battle to achieve interoperability among data storage arrays from different manufacturers.
Distributed computing means that data center storage must interoperate with servers from different manufacturers, which raises the requirement for standardization of the storage architecture. Cloud computing is pushing standardization even further.

Data storage management

Data center storage capacity management has long depended on a relatively constant basic storage technology: the traditional mechanical hard disk. With only a few manufacturers (Western Digital, Seagate, Hitachi, and Toshiba), the mechanical hard disk is essentially a commodity.
The problem to be solved is cooperation between arrays and controllers of different brands. Some companies have purchased high-end, expensive storage, such as EMC's Symmetrix VMAX, and want to manage all of their data storage through a single toolset. However, because each disk array operates through its own array controller, it is difficult to create a fully functional data storage management tool.
Data storage vendors all tout their own proprietary management tools: IBM has SAN Volume Controller, EMC uses VPLEX, and Hitachi Data Systems, HP, and NetApp make similar claims of integrating virtualized storage architectures. However, these tools basically support only the vendor's own storage systems, and in most cases only some of its products. End users looking for a genuinely high-performance heterogeneous data storage management tool have so far come up empty.
Cloud computing is changing our view of data storage. Workloads are becoming more and more mixed, and storage must manage object, file, and block patterns according to different I/O requirements. To support a cloud architecture, however, the storage infrastructure must be viewed as a single resource pool, and organizations need it to adapt automatically to changes in workloads. Only highly standardized data storage tools can deliver this. The effort has begun, but there is still a long way to go.

Flash storage to the rescue

It is difficult to build standardized data center storage capacity on disk storage alone. The medium depends on mechanical interaction with spinning platters, and read/write spikes must be mediated by an intelligent disk controller to manage the requirements of different workloads.
Flash manages data differently from hard disks. Flash is a direct-access storage architecture: there is no need to move a head over the correct disk area to retrieve data, so there is no seek delay. This speed advantage means flash can serve different workload types on the same array, and it is easier to virtualize uniformly across storage products from different vendors.
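The "no seek delay" point can be quantified with textbook access-time arithmetic (illustrative figures of my choosing: a 7200 rpm disk with a 4 ms average seek, versus a flash device at roughly 0.1 ms per read):

```python
def disk_access_ms(seek_ms: float, rpm: int) -> float:
    """Average disk access time = seek time + rotational latency,
    where rotational latency averages half a platter rotation."""
    half_rotation_ms = (60_000 / rpm) / 2  # ms per rotation / 2
    return seek_ms + half_rotation_ms

disk = disk_access_ms(seek_ms=4.0, rpm=7200)  # ~8.17 ms per random access
flash = 0.1  # illustrative flash read latency in ms

print(round(disk, 2))       # 8.17
print(round(disk / flash))  # 82: flash is roughly 80x faster per access
```

That per-access gap, not raw bandwidth, is what lets a flash array absorb mixed random workloads that would force a disk array's controller into elaborate scheduling.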
Finally, standardized data storage may become a real commitment rather than just a talking point, but it is still far away.
Data storage vendors still promote and deploy flash in many different ways. Many established vendors sell it in hybrid form, adding an independent flash tier in front of the disk array. When the data a workload needs is not in the flash tier, problems can occur because the controller must pull the data from the hard disks, which can make some operations slower than on a pure hard disk array.
A necessary step toward maximizing the existing data center's investment in storage capacity is to run pure flash systems alongside disk array systems. However, these existing traditional arrays become troublemakers when building a single management layer. EMC's ViPR data storage virtualization product has demonstrated a commitment to providing greater control over hybrid storage architectures.
All-flash arrays are locked in fierce competition with hybrid storage. All-flash vendors such as Pure Storage, Violin Memory, and Nimble Storage provide intelligent software to minimize data storage volumes, along with advanced management systems for the entire virtualized environment.

Data storage system integration

Converged infrastructure (CI) systems tend to stir the waters when they meet cloud data storage management.
Nutanix, a supplier that started in the data storage space, provides a hyperconverged CI platform that includes advanced storage management software. IBM's PureFlex and PureData systems, Dell's PowerEdge FX2 system, HP's Converged Infrastructure, and other products also provide various ways to integrate direct attached storage with CI systems. Both existing and new arrays are tightening their integration with scale-out and CI designs.
Another approach is to speed up the storage connection on the server side, for example with PCIe flash storage. IBM has developed a method for internal connection within its own systems, the CAPI connector, that can further accelerate data storage. This connector again raises a unique problem: everything will depend on whether IBM lets it cooperate closely with other manufacturers' storage systems. An integrated system must still pool resources for sharing and utilization, and that will require more advanced data storage tools than we have seen so far. [4]