
Distributed system

Information science terminology
A distributed system is a software system built on top of a network. Because of the nature of the software, a distributed system exhibits a high degree of cohesion and transparency; the difference between a plain network and a distributed system therefore lies more in the high-level software (in particular the operating system) than in the hardware. Cohesion means that each database node is highly autonomous and has its own local database management system. Transparency means that the distribution of nodes is invisible to users' applications, which cannot tell whether data is local or remote. In a distributed database system, users do not perceive that the data is distributed at all: they need not know whether a relation is partitioned, whether copies exist, at which site the data is stored, or at which site a transaction executes.
Chinese name: Distributed system
Foreign name: distributed system
Discipline: information science
Application: automation
Refers to: a software system built on a network
Distinguished by: high-level software

System Introduction

In a distributed system, a group of independent computers presents itself to users as a unified whole, as if it were a single system. The system owns a variety of physical and logical resources, can dynamically allocate tasks, and lets the dispersed physical and logical resources exchange information through a computer network. A distributed operating system manages the computer resources in a global way. Usually a distributed system presents only a single model or paradigm to its users; a layer of software above the operating system, called middleware, is responsible for implementing this model. A famous example of a distributed system is the World Wide Web, in which everything looks like a document (a Web page). [1]
In a computer network, this unity, this model, and this software are all absent. Users see the actual machines, and the network does nothing to make them look or act in a unified way. If the machines have different hardware or different operating systems, these differences are completely visible to users. If a user wants to run a program on a remote machine, he must log in to that machine and run the program there.
What distributed systems and computer network systems have in common is that most distributed systems are built on top of computer networks, so the physical structures of the two are essentially the same.
Their difference lies in the design philosophy: a distributed operating system and a network operating system are designed with different ideas, and this determines that they also differ in structure, mode of operation, and functionality. A network operating system requires users to understand the network's resources before using them: they must know the functions and configuration of each computer in the network, its software resources, the network file structure, and so on. To read a shared file, a user must know on which computer and in which directory the file resides. A distributed operating system, by contrast, manages system resources in a global way: it can schedule network resources for users at will, and the scheduling is transparent. When a user submits a job, the distributed operating system selects the most suitable processor, hands the job to it, and delivers the result back to the user when the processor finishes. Throughout this process the user never realizes that multiple processors are involved; the system behaves like a single processor.

System advantages

Advantages of distributed systems over centralized systems
The real driving force behind the trend toward distributed systems is economics. Twenty-five years ago, the computer authority and critic Herb Grosch observed that the computing power of a CPU is proportional to the square of its price, a claim that later became known as Grosch's law: pay twice the price and you get four times the performance. This fit the mainframe technology of the time very well, so many organizations did their utmost to buy the largest single mainframe they could afford. [1]
With the development of microprocessor technology, Grosch's law no longer applies. At the beginning of the 21st century, a few hundred dollars bought a CPU chip that executes more instructions per second than the processor of the largest mainframe of the 1980s. If you are willing to pay twice the price, you get the same CPU running at a somewhat higher clock speed. The most cost-effective approach is therefore usually to harness a large number of inexpensive CPUs in a single system. The main reason for preferring distributed systems is thus that they can potentially achieve a far better price/performance ratio than a single large centralized system; in practice, distributed systems achieve similar performance at much lower cost.
A slightly different point is that a collection of microprocessors can not only yield a better price/performance ratio than a single mainframe, but also an absolute performance that no single mainframe could ever reach. For example, with early-21st-century technology one could build a system out of 10,000 CPU chips, each running at 50 MIPS (millions of instructions per second), for an aggregate performance of 500,000 MIPS. For a single processor (CPU) to achieve this, it would have to execute an instruction in 2 × 10⁻¹² seconds (2 picoseconds, or 0.002 nanoseconds). No existing computer comes close to this speed, and on both theoretical and engineering grounds such a computer is considered impossible. Theoretically, Einstein's relativity tells us that nothing travels faster than light, and light covers only 0.6 mm in 2 picoseconds; practically, a computer of that speed packed into a cube 0.6 mm on a side would generate so much heat that it would melt itself instantly. So whether the goal is ordinary performance at low cost or extremely high performance at high cost, distributed systems can meet the requirement.
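As a quick sanity check, the figures above can be reproduced in a few lines of Python (a minimal sketch; the CPU count and MIPS rating are the hypothetical values from the text):

    # Back-of-the-envelope check of the figures above (values from the text).
    num_cpus = 10_000              # CPU chips in the aggregate system
    mips_per_cpu = 50              # millions of instructions per second each

    total_mips = num_cpus * mips_per_cpu        # 500,000 MIPS in aggregate
    instr_per_second = total_mips * 1_000_000   # 5e11 instructions per second
    t_per_instr = 1.0 / instr_per_second        # 2e-12 s = 2 picoseconds

    c = 3.0e8                                   # speed of light, m/s
    distance_mm = c * t_per_instr * 1000        # light covers 0.6 mm in 2 ps

    print(total_mips, t_per_instr, distance_mm)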
On the other hand, some authors distinguish distributed systems from parallel systems: distributed systems, they argue, are designed to let many users work together, while the sole goal of a parallel system is to finish a single task as fast as possible, like our 500,000-MIPS machine. We believe this distinction is hard to maintain, because the two design spaces are in fact a continuum. We prefer to use the term "distributed system" in the broadest sense, to mean any system in which multiple interconnected CPUs work together.
Another reason for building distributed systems is that some applications are themselves distributed. A supermarket chain may have many stores, each of which buys goods produced locally (possibly from nearby farms), sells locally, and decides which local vegetables must be thrown out because they have been kept too long or have spoiled. It therefore makes sense for each store's local computer to track that store's inventory, rather than centralizing everything at company headquarters: after all, most queries and updates are local. From time to time, however, top management also wants to know how much cabbage the company currently holds. One way to achieve this is to make the entire system look like a single computer to the application programs while implementing it in a distributed way, with one machine per store as described above. This is a commercial distributed system.
Another inherently distributed system is what is commonly called a computer-supported cooperative work (CSCW) system, in which a group of people who are physically far apart work together, for example to write the same report. As far as the long-term trends of the computer industry are concerned, one can easily imagine a new field, computer-supported cooperative games (CSCG), in which players in different places play a game in real time. Imagine playing electronic hide-and-seek in a multi-dimensional maze, or even an electronic dogfight in which each player controls a local flight simulator and tries to shoot down the other players, each player's screen showing the view from his own plane, including any other planes that fly into his field of vision.
Compared with centralized systems, another potential advantage of distributed systems is higher reliability. By spreading the workload over many machines, the failure of a single chip brings down at most one machine, leaving the rest entirely unaffected. Ideally, if 5% of the computers are down at some moment, the system still works, with only a 5% loss of performance. For critical applications, such as control systems for nuclear reactors or aircraft, this high reliability may be the main reason for choosing a distributed system.
Finally, incremental growth is another potentially important reason why distributed systems are superior to centralized ones. A company typically buys a mainframe to do all its work; when the company prospers and grows, the workload increases until the mainframe can no longer cope. The only solutions are to replace it with a larger machine (if one exists) or to add a second mainframe, and either move wreaks havoc on the company's operations. With a distributed system, in contrast, it may be enough simply to add some processors, which also lets the system expand gradually as demand grows. These advantages are summarized in Table 1-1.
Table 1-1 Advantages of distributed systems over centralized systems

Economics: microprocessors offer a better price/performance ratio than mainframes
Speed: the total computing power can exceed that of any single mainframe
Inherent distribution: some applications involve spatially separated machines
Reliability: if one machine crashes, the system as a whole can still survive
Incremental growth: computing power can be increased in small steps
In the long run, the main driving force will be the existence of large numbers of personal computers together with people's need to work together and share information, and this sharing must happen in a convenient way, unhindered by geography or by the physical distribution of people, data, and machines.
Advantages of distributed systems over independent PCs
Given that microprocessors offer a cheap way to build systems, why not simply give everyone a personal computer and let each work independently? For one thing, many users need to share data. For example, the clerks in an airline reservation office need access to the master database of flights and available seats. Giving every clerk a private copy of the entire database would not work in practice, because nobody would know which seats the other clerks had already sold. Shared data are the basis of this and many other applications, so the computers must be interconnected, and interconnecting computers produces a distributed system.
Sharing does not stop at data. Expensive peripherals, such as color laser printers, phototypesetters, and large storage devices (such as automated CD jukeboxes), are also shared resources.
A third reason for connecting a group of isolated computers into a distributed system is that it enhances person-to-person communication. E-mail is more attractive than letters, telephone calls, and faxes: it is much faster than a letter; unlike the telephone, it does not require both people to be present at the same time; and unlike a fax, the documents it produces can be edited, rearranged, stored in a computer, and processed by text-processing programs.
Finally, a distributed system can be more flexible than giving each user an isolated personal computer. Although one possible model is to give everyone a personal computer and connect them all by a LAN, it is not the only one. Another is to mix personal and shared computers (possibly of different models) and let work be done on the most appropriate machine rather than always on one's own. This spreads the workload over the computer system more effectively, and the failure of some machines can be compensated for by shifting their work to others. Table 1-2 summarizes these points.
Table 1-2 Advantages of distributed systems over independent PCs

Data sharing: allows multiple users to access a common database
Device sharing: allows multiple users to share expensive peripherals (such as color printers)
Communication: makes person-to-person communication easier, for example via e-mail
Flexibility: spreads the workload over the available machines in the most effective way

System shortcomings

Although distributed systems have many advantages, they also have disadvantages, and this section points out some of them. We have already mentioned the most difficult problem: software. At the current state of the art, we have little experience in designing, implementing, and using distributed software. What kinds of operating systems, programming languages, and applications suit such systems? How much should users know about the distribution of processing? How much should the system do, and how much should the user do? Experts disagree (not because experts are different from other people, but because almost no one has much experience with distributed systems). As more research is done, these problems will diminish, but they should not be underestimated. [1]
A second potential problem is the communication network. It can lose messages, which requires special software to recover from, and it can become overloaded. When the network saturates, it must be replaced or a second one must be added, and in either case parts of one or more buildings must be rewired at great expense, or the network interface boards must be replaced (for example, with optical fiber). Once the system comes to depend on the network, message loss or network saturation can cancel out most of the advantages the distributed system was built to obtain.
Finally, the ease of data sharing described above as an advantage cuts both ways. If people can conveniently access data throughout the system, they may equally be able to access data that is none of their business; in other words, system security must often be considered. Data that must be kept absolutely secret is often best stored on a dedicated, isolated personal computer connected to no other machine, kept in a locked, secure room with all of the computer's removable disks in a safe in that room. The disadvantages of distributed systems are summarized in Table 1-3.
Table 1-3 Disadvantages of distributed systems

Software: little software exists so far for distributed systems
Network: the network can saturate or cause other problems
Security: easy data sharing also makes secret data easy to reach
Despite these potential problems, many people believe that the advantages of distributed systems outweigh the disadvantages, and it is widely expected that distributed systems will grow steadily in importance in the coming years. Indeed, within a few years many organizations will have connected most of their computers into large distributed systems to give their users better, cheaper, and more convenient service, and in ten years an isolated computer may no longer exist in medium-sized or large businesses and other institutions.

System application

Distributed systems are used in many different types of applications. Some of them are listed below; for these applications, a distributed system is preferable to other architectures such as single processors or shared-memory multiprocessors: [2]
Parallel and high-performance applications
In principle, parallel applications can also run on shared-memory multiprocessors, but shared-memory systems do not scale well to large numbers of processors. HPCC (High Performance Computing and Communication) applications generally require a scalable design that depends on distributed processing.
Fault-tolerant applications
Because each processing element (PE) is autonomous, a distributed system is more reliable: the failure of one unit or resource (software or hardware) does not affect the normal functioning of the other resources.
Inherently distributed applications
Many applications are inherently distributed. Such applications run in burst mode rather than bulk mode; examples are transaction processing and Internet Java programs. The performance of these applications depends on throughput (transaction response time, or the number of transactions completed per second) rather than on the raw execution time that general multiprocessors are optimized for.
For groups of users, distributed systems support a special class of applications called computer-supported cooperative work (CSCW), or groupware, which helps users cooperate on a task. Another application is distributed conferencing, that is, electronic meetings held over a physically distributed network; multimedia distance learning is a similar application. As a wide variety of applications becomes available on different platforms such as PCs, workstations, local area networks, and wide area networks, users want to go beyond the limits of their own PC to obtain a broader range of features, functions, and performance. Interoperability across different networks and environments (including distributed system environments) is becoming increasingly important. To achieve it, users need a standard distributed computing environment in which all systems and resources are available.
DCE (Distributed Computing Environment) is an industry-standard set of distributed computing technologies developed by the OSF (Open Software Foundation). It provides security services that protect and control access to data, naming services that make distributed resources easy to find, and a highly scalable model for organizing widely scattered users, services, and data. DCE runs on all major computing platforms and is designed to support distributed applications in heterogeneous hardware and software environments.
DCE has been implemented by several developers, including TRANSVARL, a member of the earliest multi-vendor team whose proposals became the basis of the DCE architecture; guidelines for developing distributed applications with DCE can be found in the literature. Systems with standard interfaces and protocols are also called open systems. Some other standards are based on a particular model. For example, CORBA (Common Object Request Broker Architecture) is a standard developed by the OMG (Object Management Group), an alliance of computer manufacturers; CORBA uses an object-oriented model to make service requests in a distributed system transparent. The industry also has its own standards, such as Microsoft's Distributed Component Object Model (DCOM) and Sun Microsystems' JavaBeans.
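CORBA itself requires an ORB, IDL-generated stubs, and so on, but the core idea of a transparent service request, a client invoking a remote object as if it were local, can be sketched with Python's standard-library xmlrpc module (a stand-in for illustration, not CORBA; the port number and the add function are arbitrary choices):

    # Sketch of a transparent remote invocation (illustration only).
    import threading
    import time
    import xmlrpc.client
    from xmlrpc.server import SimpleXMLRPCServer

    def serve():
        server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
        server.register_function(lambda a, b: a + b, "add")  # the remote "object"
        server.serve_forever()

    threading.Thread(target=serve, daemon=True).start()
    time.sleep(0.5)  # give the server a moment to start listening

    # The client calls proxy.add() exactly as if it were a local object;
    # where the implementation lives is transparent to the caller.
    proxy = xmlrpc.client.ServerProxy("http://localhost:8000")
    print(proxy.add(2, 3))  # -> 5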

System test

During test execution, analyzing the test results is a key issue that needs careful thought. The focus of distributed system testing is testing the back-end server cluster, and determining whether the system contains bugs is the central problem we must solve. How, then, do we decide whether there is a bug?
When analyzing test results, we usually look at the following aspects.
Observe the results returned to the front-end application. Two situations matter here: first, exercise the business functions and flows of the front-end application and check whether the returned results meet the needs and expectations of the business side; second, manipulate the back-end servers (typically restarting them, shutting them down, disconnecting the network, and so on) and check whether the results returned to the front-end still meet the system's design requirements.
Analyze the server logs. During functional testing, we start the servers with the log level set to Debug (the lowest level), mainly so that test engineers can analyze the logs and locate problems. To locate problems more precisely, it is often necessary to instrument the server code with extra log statements that expose important internal data. We generally agree on a log format in advance and add keywords to log lines for classification; this makes the logs easier for test engineers to analyze and also eases automated testing of the distributed system. Note that such instrumentation should stay in Debug-only code paths, to avoid disturbing the system code and introducing new problems. A sketch of keyword-based log analysis follows.
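As an illustration, here is a minimal Python sketch of such keyword-based log analysis; the LEVEL|KEYWORD|message line format and the keyword names are assumptions, since each project agrees on its own format:

    # Minimal log-analysis sketch; the line format is an assumed convention.
    import re
    from collections import Counter

    LOG_LINE = re.compile(r"^(DEBUG|INFO|WARN|ERROR)\|(\w+)\|(.*)$")

    def scan_log(path):
        """Count occurrences of each keyword and collect ERROR messages."""
        keywords, errors = Counter(), []
        with open(path, encoding="utf-8") as f:
            for line in f:
                m = LOG_LINE.match(line.strip())
                if not m:
                    continue
                level, keyword, msg = m.groups()
                keywords[keyword] += 1
                if level == "ERROR":
                    errors.append(msg)
        return keywords, errors

    # e.g. keywords["REPLICA_SYNC"] gives the number of logged sync events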
Analyze important operating system information. Most of the distributed systems we test are developed on the Linux operating system, so besides analyzing the program logs in detail, we also examine operating system data to diagnose whether the server program is behaving abnormally. On Linux, we often use the top, netstat, and sar commands to inspect such data; for example, netstat can confirm that the server program is listening on the designated port, as sketched below.
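For example, a small Python helper can shell out to netstat, as the text suggests, to confirm a port is in the LISTEN state (a sketch assuming the Linux net-tools netstat; ss would work similarly):

    # Check that the server is listening on its port via netstat (Linux).
    import subprocess

    def is_listening(port):
        out = subprocess.run(["netstat", "-tln"],   # TCP, listening, numeric
                             capture_output=True, text=True).stdout
        return any(f":{port} " in line
                   for line in out.splitlines() if "LISTEN" in line)

    print(is_listening(8000))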
Use other analysis tools. For example, how do we determine whether the server program leaks memory? Usually this requires a memory-analysis tool. On Linux we often use Valgrind, a very useful and powerful analysis tool that helps test and development engineers quickly find many hidden program bugs, especially memory errors (it has many other excellent capabilities as well; readers can consult the manual on the official website).
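For instance, a test script might launch the server under Valgrind's memcheck tool like this (./server is a placeholder for the program under test):

    # Run the server binary under Valgrind and keep the report for analysis.
    import subprocess

    subprocess.run([
        "valgrind",
        "--tool=memcheck",          # the default tool: memory error detection
        "--leak-check=full",        # report each leaked block with a stack trace
        "--log-file=valgrind.log",  # write the report to a file
        "./server",                 # placeholder: the program under test
    ])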
Stress testing and performance testing of distributed systems
For distributed systems, stress testing and performance testing are very important, and the following difficulties may be encountered.
Data preparation. How do we prepare massive test data while keeping the simulated data realistic? Take a distributed file system as an example: whether 100 GB or 100 TB of data is preloaded, and whether the stored files are of roughly uniform size or vary widely (say, from tens of bytes to tens of megabytes), makes a very large difference to the measured performance of the distributed system. Moreover, if 100 TB must be preloaded at a write rate of 100 MB per second, it takes 100 × 1024 × 1024 / 100 = 1,048,576 seconds ≈ 291.27 hours ≈ 12 days to write it all. Can we tolerate such a long preparation time? To cope, we must analyze the system architecture in depth and design the test scenarios and test cases early, so that data preparation can start as soon as possible; the arithmetic is spelled out below.
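The arithmetic, made explicit (a trivial sketch; 1 TB is taken as 1024 × 1024 MB, as in the text):

    # Preloading 100 TB at 100 MB/s: how long does data preparation take?
    total_mb = 100 * 1024 * 1024      # 100 TB expressed in MB
    rate_mb_per_s = 100               # sustained write rate

    seconds = total_mb / rate_mb_per_s    # 1,048,576 s
    hours = seconds / 3600                # ~291.27 h
    days = hours / 24                     # ~12.1 days
    print(seconds, hours, days)

    # Writing with N clients in parallel divides this time roughly by N,
    # which is why data preparation must be planned early.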
Performance and stress testing tools. Testing a distributed system generally requires developing tools that meet the performance-testing requirements. Where possible, we suggest that test engineers implement such tools themselves, since they understand their own test requirements best. Two key issues then arise. First, the collection and computation of key metrics, such as TPS (transactions per second) and throughput, must be accurate. Second, the performance of the test tool itself must be good: a slow tool cannot put enough pressure on the system under test. Moreover, with high concurrency (say, 100,000 clients connected simultaneously), if the tool can drive only 50 clients or fewer per test machine, then some 2,000 test machines would be needed, a cost many companies cannot afford. The tool's own performance must therefore be good enough to meet the requirements and keep testing costs down. A minimal load-generator sketch follows.
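A minimal load-generator sketch in Python, showing the TPS calculation the text mentions (the URL, request count, and client count are placeholders; note that a GIL-bound Python driver would itself become the bottleneck long before 100,000 concurrent clients, which is exactly the tool-performance problem described above):

    # Minimal load generator: measure TPS and average latency.
    import time
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    URL = "http://localhost:8000/ping"   # placeholder endpoint under test
    REQUESTS = 1000
    CLIENTS = 50

    def one_request(_):
        start = time.perf_counter()
        urlopen(URL).read()
        return time.perf_counter() - start

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=CLIENTS) as pool:
        latencies = list(pool.map(one_request, range(REQUESTS)))
    elapsed = time.perf_counter() - t0

    print("TPS:", REQUESTS / elapsed)
    print("avg latency (ms):", 1000 * sum(latencies) / len(latencies))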
Automated testing of distributed systems
Automated testing is the inevitable trend in the testing industry, and distributed system testing is no exception. Two difficult problems may arise when automating distributed system tests.
Many platforms and varied hardware are involved, so controlling the test flow is difficult. The test scripts must control many operating systems and applications, must work across platforms, and may also need to control some network devices. Choosing a good automated-testing framework is therefore one of the most important tasks. In our experience, STAF is a good choice: its platform support (Windows and Linux versions) and programming language support are comprehensive.
Verifying test results is complex. For automated testing of a distributed system, test scripts must collect many kinds of result data to verify the correctness of the test results. In practice we modularize the collection of result data and have each sub-module check whether its own data is correct. For example, we design a log-analysis module responsible for extracting the relevant data from the server application's logs for comparison and verification (this is where the keyword instrumentation mentioned earlier in this article becomes especially important); a sketch is given below.
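A sketch of such a verification sub-module, reusing the keyword counts from the scan_log sketch earlier (the keyword name and the expected count are hypothetical):

    # Compare keyword counts collected from the server log against
    # the counts the test scenario expects.
    def verify(observed, expected):
        """Return a list of mismatch descriptions (empty means success)."""
        mismatches = []
        for keyword, want in expected.items():
            got = observed.get(keyword, 0)
            if got != want:
                mismatches.append(f"{keyword}: expected {want}, got {got}")
        return mismatches

    # e.g. after writing 100 files we expect 100 FILE_WRITE_OK log entries:
    # assert not verify(keywords, {"FILE_WRITE_OK": 100})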
As the Internet develops, large-scale distributed systems are becoming ever more complex and important. How to effectively ensure that a large distributed system keeps running continuously and stably, 7 × 24 hours, has become an important topic. [1]