
Concurrency

[bìng fā]
A term in the field of computing.
Synonym: concurrent programming (processing multiple tasks "simultaneously" on one processor) generally refers to concurrency.
Concurrency, in an operating system, means that several programs are all, within the same period of time, somewhere between having started and having finished, and that they all run on the same processor; at any single point in time, however, only one program is actually running on the processor.
Chinese name: Concurrency
Foreign name: Concurrent
Pinyin: bìng fā

Lexical concept

Headword: Concurrency
Basic explanation:
Refers to [another disease] brought on by an existing disease; a complication. [1]
Citation explanations:
1. To open or bloom at the same time. Fu Xuan of the Jin dynasty, Fuzi, volume three: "It shines like spring flowers blooming together, and its fragrance is like that of autumn orchids."
2. Means to come one after another. [1]

Basic meaning

In an operating system, concurrency means that several programs are all, within the same period of time, somewhere between having started and having finished, and that they all run on the same processor, but at any point in time only one program is running on the processor.
In a relational database, concurrency refers to allowing multiple users to access and change shared data at the same time. SQL Server uses locks so that multiple users can access and change shared data simultaneously without conflicting with one another.
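By way of analogy only, here is a minimal Java sketch (my own illustration, not SQL Server's mechanism; the class and field names are invented) of the locking idea: many readers may look at shared data at the same time, while a writer takes an exclusive lock so no one sees conflicting changes.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical shared store: many threads may read concurrently,
// but a write takes the exclusive lock so readers never observe a conflict.
public class SharedSettings {
    private final Map<String, String> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public String get(String key) {
        lock.readLock().lock();      // shared lock: many readers at once
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String key, String value) {
        lock.writeLock().lock();     // exclusive lock: one writer at a time
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}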

Characteristics

Characteristics of the execution of concurrent programs in an operating system:
In a concurrent environment, because the closed nature of a program is broken, new characteristics appear:
① A program no longer corresponds one-to-one with a computation; one copy of a program can give rise to several computations.
② Concurrent programs constrain one another. Direct constraints arise when one program needs the computation result of another (a minimal sketch of this case follows the list below); indirect constraints arise when several programs compete for the same resource, such as a processor or a buffer.
③ Concurrent programs execute intermittently, stopping and resuming as they run.
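As a small illustration of the direct constraint in ② (my own Java sketch, not from the source; the names are invented), one task below must wait for the result computed by another before it can continue:

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DirectConstraintDemo {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // "Program A": produces a value that another program needs.
        Future<Integer> partialSum = pool.submit(() -> {
            int sum = 0;
            for (int i = 1; i <= 100; i++) sum += i;
            return sum;
        });

        // "Program B": directly constrained by A -- it blocks until A's result exists.
        Future<Integer> doubled = pool.submit(() -> partialSum.get() * 2);

        System.out.println("A = " + partialSum.get() + ", B = " + doubled.get());
        pool.shutdown();
    }
}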

Difference from parallelism

Concurrency: when multiple threads are active and the system has only one CPU, it cannot truly run more than one thread at the same moment. It can only divide CPU time into slices and hand the slices to the threads in turn; while one thread's code runs during its slice, the other threads are suspended. This approach is called concurrency.
Parallelism: when the system has more than one CPU, the threads' work need not be interleaved. While one CPU executes one thread, another CPU can execute another thread; the two threads do not compete for the same CPU and really do run at the same time. This approach is called parallelism.
Difference: concurrency and parallelism are similar yet distinct concepts. Parallelism means that two or more events occur at the same instant; concurrency means that two or more events occur within the same time interval. In a multiprogramming environment, concurrency means that, viewed macroscopically, several programs run at the same time over a period of time, but on a single-processor system only one program can execute at any moment, so the programs can only take turns executing in a time-shared fashion. If the computer system has multiple processors, the programs that can execute concurrently can be distributed to the processors for parallel execution, each processor handling one of them, so that several programs execute simultaneously. [2]
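The difference can be seen in a minimal Java sketch (my own illustration, assuming a standard JVM; the thread names are invented): the two threads below always run concurrently, and whether they also run in parallel depends on how many processors the machine has.

public class ConcurrencyVsParallelism {
    public static void main(String[] args) throws InterruptedException {
        System.out.println("Logical processors: " + Runtime.getRuntime().availableProcessors());

        Runnable work = () -> {
            for (int i = 0; i < 3; i++) {
                // On a single CPU these lines interleave because the threads share time slices;
                // with several CPUs the two threads may print at truly the same moment.
                System.out.println(Thread.currentThread().getName() + " step " + i);
            }
        };

        Thread t1 = new Thread(work, "worker-1");
        Thread t2 = new Thread(work, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}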

Concurrent processing

[3] A small website, such as a personal site, can be built with the simplest static HTML pages plus a few images for decoration, with all pages stored in a single directory. Such a site places very simple demands on system architecture and performance. With the continuous enrichment of Internet services and the development of website technology over the years, the field has been divided into many specialised areas, especially for large websites, where the technologies involved are very broad: from hardware to software, programming languages, databases, web servers, and firewalls, the requirements are high and no longer comparable to the original simple static HTML site.
Large websites, such as portals, face huge numbers of user visits and highly concurrent requests. The basic responses centre on a few points: using high-performance servers, high-performance databases, efficient programming languages, and high-performance web containers. But these measures alone cannot fundamentally solve the high-load, high-concurrency problems that large websites face.
The solutions above also imply, to some extent, greater investment, and they have bottlenecks and poor scalability. Below I describe my experience from the perspective of low cost, high performance, and high scalability.

Static HTML

As we all know, pure static HTML pages are the most efficient to serve and consume the fewest resources, so we try to use static pages for the site wherever possible. This simplest method is in fact the most effective one. For a site with a large amount of frequently updated content, however, the pages cannot all be produced by hand one at a time, which is where a content management system (CMS) comes in: the news channels of the portal sites we often visit, and even their other channels, are managed and generated through such an information release system. A CMS can handle the simplest information entry and automatically generate static pages, and it also provides channel management, permission management, automatic content capture, and other functions. For a large website, an efficient, manageable CMS is essential.
Besides portals and information publishing sites, community sites with strong interactivity also need to be made as static as possible to improve performance: rendering posts and articles into static pages in real time, and regenerating them when they are updated, is a widely used strategy. Mop's hodgepodge board uses such a strategy, and so does the NetEase community.
Static HTML is also a kind of caching strategy. For parts of the system that are queried from the database frequently but whose content rarely changes, static HTML is worth considering, for example the public settings of a forum. In current mainstream forum software this information is managed in the back end and stored in the database; it is read heavily by the front-end program while being updated very rarely. Regenerating this content as static pages whenever the back end updates it avoids a large number of database requests.
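A minimal Java sketch of the idea described above, regenerating a static page whenever the back end saves content so that visitors never trigger a database query for it; the output path and HTML layout are my own assumptions, not part of any particular CMS.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaticPageWriter {
    // Called by a (hypothetical) back end whenever an article or setting is saved,
    // so the front end serves a plain .html file instead of querying the database.
    public static void publish(String slug, String title, String body) throws IOException {
        String html = "<!DOCTYPE html>\n<html><head><title>" + title + "</title></head>\n"
                + "<body><h1>" + title + "</h1><div>" + body + "</div></body></html>";
        Path out = Path.of("site", slug + ".html");   // e.g. site/forum-settings.html (assumed layout)
        Files.createDirectories(out.getParent());
        Files.writeString(out, html, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        publish("forum-settings", "Forum settings", "Rarely updated content served statically.");
    }
}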

Image server separation

As you know, for web servers, whether Apache, IIS or another container, images are the most resource-consuming content, so it is necessary to separate images from pages. This is essentially the strategy every large website adopts: they all have independent image servers, often many of them. Such an architecture reduces the pressure on the servers that handle page requests and ensures that the system will not crash because of image traffic. The application servers and image servers can then be tuned differently; for example, when configuring content types, Apache can be set up to load as few modules (LoadModule) as possible, keeping system overhead low and execution efficiency high. This scheme is relatively easy to implement, and is more convenient to operate with a server cluster; with only an independent server, a newcomer may end up with all uploaded images on that one machine, in which case IIS configured on a single server can use a network path to act as the image server. This requires no program changes and still improves performance, although it does nothing to improve the I/O performance of the server itself.

Database clusters and database table hashing

Large websites have complex applications, and those applications all rely on databases. Faced with massive access, the database quickly becomes a bottleneck: a single database soon cannot meet the application's needs, so database clusters or database and table hashing are required.
For database clusters, many databases have their own solutions; Oracle, Sybase and others offer good ones, and the master/slave replication provided by MySQL serves a similar purpose. Whichever DB you use, consult the corresponding solution.
The database clusters mentioned above are constrained in architecture, cost, and scalability by the type of DB in use, so we also need to improve the system architecture from the application side, and hashing across databases and tables is the most common and most effective approach there. Within the application we partition the database by business, application, or functional module, so that different modules map to different databases or tables, and then hash a given page or feature onto smaller databases or tables according to some strategy, for example hashing the user table by user ID. This improves system performance at low cost and scales well. Sohu's forum adopts such an architecture: it separates the forum's users, settings, posts, and other data into different databases, then hashes the post and user databases and tables by section and by ID; in the end, a simple entry in a configuration file lets the system add an inexpensive database at any time to boost overall performance.
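A minimal Java sketch of hashing by user ID as described above: the ID selects one of several smaller databases and one table within it. The shard counts and naming scheme are invented for illustration; in practice they would come from the kind of configuration file the text mentions.

public class UserShardRouter {
    private static final int DB_COUNT = 4;        // assumed number of user databases
    private static final int TABLES_PER_DB = 16;  // assumed number of user tables per database

    // Map a user ID to a physical database and table, e.g. user_db_2.user_table_05.
    public static String locate(long userId) {
        int db = (int) (userId % DB_COUNT);
        int table = (int) ((userId / DB_COUNT) % TABLES_PER_DB);
        return String.format("user_db_%d.user_table_%02d", db, table);
    }

    public static void main(String[] args) {
        for (long id : new long[]{1L, 42L, 100003L}) {
            System.out.println("user " + id + " -> " + locate(id));
        }
    }
}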

Cache

The word "cache" comes up everywhere, and caching is also very important in website architecture and website development. Here I first discuss the two most basic kinds of caching; advanced and distributed caching are described later.
For architecture-level caching, anyone familiar with Apache knows that it provides its own caching module, and the add-on Squid can also be used for caching; both effectively improve Apache's ability to respond to requests.
For caching in website program development, the Memory Cache available on Linux is a commonly used cache interface in web development. For example, when developing in Java, a memory cache can be used to cache and share some data, and some large communities use this architecture. In addition, practically every web development language has its own cache modules and methods: PHP has the PEAR Cache module, and Java has even more; I am not very familiar with .NET, but I believe it has them as well.
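A minimal Java sketch of the program-level caching idea (a simple local in-memory cache of my own, not the specific cache interface the text refers to): repeated lookups are answered from memory instead of hitting the database, and entries expire after an assumed time-to-live.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class SimpleCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public SimpleCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Return the cached value if it is still fresh; otherwise load it (for example from
    // the database) and remember it so later requests avoid the expensive lookup.
    public V get(K key, Function<K, V> loader) {
        long now = System.currentTimeMillis();
        Entry<V> e = store.get(key);
        if (e != null && e.expiresAt > now) {
            return e.value;
        }
        V value = loader.apply(key);
        store.put(key, new Entry<>(value, now + ttlMillis));
        return value;
    }

    public static void main(String[] args) {
        SimpleCache<String, String> cache = new SimpleCache<>(60_000);   // 60 s TTL (assumed)
        Function<String, String> slowDbLookup = k -> "value-for-" + k;   // stands in for a DB query
        System.out.println(cache.get("forum.settings", slowDbLookup));   // first call loads
        System.out.println(cache.get("forum.settings", slowDbLookup));   // second call served from memory
    }
}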

Mirroring

Mirroring is a common way for large websites to improve performance and data security. Mirroring technology addresses the differences in access speed between users on different network providers and in different regions; for example, the difference between ChinaNet and the education network has prompted many websites to set up mirror sites inside the education network, with data updated periodically or in real time. I will not go deeply into the details of mirroring here; there are many professional off-the-shelf solution architectures and products to choose from, as well as low-cost software approaches, such as rsync and other tools on Linux.

Load balancing

Load balancing will be the ultimate solution for a large website handling high-load access and large numbers of concurrent requests.
Load balancing technology has been developing for many years, and there are many professional service providers and products to choose from. I have personally worked with a few solutions, two of whose architectures are offered here for reference.
1. Software layer 4 switching
Once the principle of the hardware layer-4 switch (described below) is understood, the software layer-4 switch based on the OSI model follows naturally. This solution works on the same principle but with somewhat lower performance; nevertheless it copes easily with a fair amount of load, and some would say the software approach is actually more flexible, its processing capacity depending entirely on how familiar you are with the configuration.
For software layer-4 switching we can use LVS, which is commonly used on Linux. LVS, the Linux Virtual Server, provides a heartbeat-based real-time disaster recovery solution that improves the robustness of the system, and it offers flexible configuration and management of virtual IPs (VIPs), which can serve several application needs at once; this is essential for distributed systems.
A typical load-balancing strategy is to build a Squid cluster on top of software or hardware layer-4 switching. This approach is used by many large websites, including search engines; it has low cost and high performance, scales well, and makes it easy to add or remove nodes from the architecture at any time. A sketch of the underlying scheduling idea follows below; I intend to write up this structure in detail and discuss it further.
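To make the scheduling idea concrete, here is a minimal Java sketch of round-robin selection of a back-end server for each incoming request; this is my own illustration of the general concept, not how LVS or a hardware switch is implemented, and the server addresses are invented.

import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class RoundRobinBalancer {
    private final List<String> backends;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinBalancer(List<String> backends) {
        this.backends = List.copyOf(backends);
    }

    // Hand each request to the next server in turn, spreading the load evenly.
    public String pick() {
        int index = (int) (counter.getAndIncrement() % backends.size());
        return backends.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("10.0.0.11:80", "10.0.0.12:80", "10.0.0.13:80"));   // assumed addresses
        for (int i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> " + lb.pick());
        }
    }
}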
For a large website, each of the methods mentioned above may well be used at the same time. My introduction here is brief; many of the details of an actual implementation still have to be learned through experience, and sometimes a single small Squid or Apache parameter setting has a great impact on system performance.
2. Hardware layer 4 switching
Layer-4 switching uses the header information of layer-3 and layer-4 packets, identifies traffic flows by application, and directs the traffic of a whole range to the appropriate application server for processing. The layer-4 switching function acts like a virtual IP that points to the physical servers, and the services it forwards follow a variety of protocols, including HTTP, FTP, NFS, Telnet and others, which require complex load-balancing algorithms on top of the physical servers. In the IP world, the type of service is determined by the destination TCP or UDP port, and the scope of an application in layer-4 switching is determined by the source and destination IP addresses together with the TCP and UDP ports.
Among hardware layer-4 switching products there are some well-known choices, such as Alteon and F5. These products are expensive but worth the money: they provide excellent performance and very flexible management capabilities. Yahoo China used three or four Alteons for nearly 2,000 servers.