Understanding CPU load on Linux

Sooner or later, everyone runs into Linux performance problems at work, and it is often unclear where to start. This article explains what CPU load means on Linux so that such problems are easier to understand and solve.

Load average

Anyone familiar with Linux knows that the top and uptime commands show the load average.

 uptime
 16:22:29 up 3 days, 19:14,  1 user,   load average: 0.08, 0.03, 0.05

Use man uptime to view the load average explanation:

System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.

The key point: load average is the average number of processes that are in the runnable state or the uninterruptible state per unit time, that is, the average number of active processes. Note that it is not directly related to CPU utilization.

Use the command ps aux to view each process's state in the STAT column. As noted in this article:

Why can't the D state be interrupted? Take a system call that is waiting for an I/O response from a hardware device. To ensure data consistency, the process cannot be interrupted by other processes or interrupts before the disk device returns its data. If it were interrupted, the data on disk and the data held by the process could easily become inconsistent. The uninterruptible (D) state is therefore a mechanism by which the system protects processes and hardware devices.
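To connect these process states with the load average, here is a minimal Python sketch that counts processes in the R (runnable) and D (uninterruptible) states. The function names are hypothetical, and read_stat_values assumes a procps-style ps that accepts -eo stat= (the trailing = suppresses the header).

```python
import subprocess
from collections import Counter


def count_states(stat_values):
    """Count processes by the first letter of their STAT value (R, S, D, ...)."""
    return Counter(s.strip()[0] for s in stat_values if s.strip())


def active_processes(stat_values):
    """Processes that count toward the load average: runnable (R) plus uninterruptible (D)."""
    counts = count_states(stat_values)
    return counts["R"] + counts["D"]


def read_stat_values():
    """Read the STAT column of every process (assumes procps ps: `ps -eo stat=`)."""
    result = subprocess.run(
        ["ps", "-eo", "stat="], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Calling active_processes(read_stat_values()) gives an instantaneous count. The load average smooths many such samples over time, so the two numbers will rarely match exactly.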

Strictly speaking, the average number of active processes is an exponentially decaying average of the number of active processes (in exponential decay, the rate at which a quantity declines is proportional to its current value). In practice, it is fine to think of it as the number of active processes per unit time.
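The exponentially decaying average can be sketched in a few lines of Python. This is a simplified floating-point illustration, not the kernel's code: it assumes the active-process count is sampled every 5 seconds, and the names update_load and simulate are made up for this example.

```python
import math

SAMPLE_SECONDS = 5  # assumed sampling interval between updates


def update_load(load, active, window_seconds):
    """One decay step: old value fades, current active count is blended in."""
    decay = math.exp(-SAMPLE_SECONDS / window_seconds)
    return load * decay + active * (1.0 - decay)


def simulate(active_counts, window_seconds=60, load=0.0):
    """Feed a sequence of sampled active-process counts through the average."""
    for active in active_counts:
        load = update_load(load, active, window_seconds)
    return load
```

With one constantly active process starting from an idle system, simulate([1] * 12) covers one minute of samples and yields about 0.63 rather than 1.0. This is why the 1-minute value lags behind sudden changes in load.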

CPU utilization

From the CPU's point of view, load average only reflects the number of active processes per unit time, whereas CPU utilization measures how busy the CPU is and is not directly related to the number of processes. We can check CPU utilization with the top or vmstat command. top reports the following indicators:

 top
 ...
 %Cpu(s):  0.2 us,  0.1 sy,  0.0 ni, 99.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
 ...
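As a rough illustration of how to read that line, the Python sketch below parses it into named fields and derives a single busy percentage. The function names are hypothetical, and excluding both id and wa from "busy" is a choice made for this example.

```python
import re

# Field meanings in top's %Cpu(s) line:
#   us user, sy system (kernel), ni niced user processes, id idle,
#   wa waiting for I/O, hi hardware interrupts, si software interrupts,
#   st steal (time taken by the hypervisor)


def parse_cpu_line(line):
    """Turn '%Cpu(s): 0.2 us, 0.1 sy, ...' into {'us': 0.2, 'sy': 0.1, ...}."""
    pairs = re.findall(r"([\d.]+)\s+([a-z]{2})\b", line)
    return {name: float(value) for value, name in pairs}


def cpu_busy(fields):
    """Percentage of time the CPU did work: everything except idle and I/O wait."""
    return 100.0 - fields["id"] - fields["wa"]
```

For the sample line above, cpu_busy reports roughly 0.4%, matching a machine that is almost entirely idle.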

How to use LA and CU

LA is the load average, that is, the average number of active processes. CU is CPU use, that is, CPU utilization.

Generally speaking, if the load average is below the number of CPUs, the machine can keep up with the service, and exceeding that number slightly is not a problem. Load average does not directly represent CPU utilization: a high value may instead mean that many processes are blocked on I/O. As a rule of thumb, once the load average rises above 70% of the number of CPUs, processes may start to respond more slowly, which can affect the normal operation of the service.
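The 70% rule of thumb amounts to normalizing the load average by the CPU count. A minimal Python sketch follows. The function names are illustrative, and check_this_machine relies on os.getloadavg, which is only available on Unix-like systems.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Normalize a load average by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count


def is_load_high(load_avg, cpu_count, threshold=0.7):
    """True when the load average exceeds `threshold` times the CPU count."""
    return load_per_cpu(load_avg, cpu_count) > threshold


def check_this_machine():
    """Apply the check to the local 1-minute load average (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return is_load_high(one_minute, os.cpu_count() or 1)
```

A load average of 1.0 on a 4-CPU machine normalizes to 0.25, which corresponds to the man page's remark that such a system was idle 75% of the time.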

From historical data

top and uptime report the load average over three time windows: 1 minute, 5 minutes, and 15 minutes. Together these show the recent trend in the system's state. In a real production environment, we also need to keep long-term monitoring records. If a value changes abnormally, for example the load average reaches twice the number of CPUs, the problem needs to be analyzed and investigated.
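Reading a trend from the three values can be sketched as below. This is a deliberately simple heuristic with made-up function names. The factor of 2 mirrors the example in the text and is not a universal limit.

```python
def load_trend(one, five, fifteen):
    """Describe the recent trend from the 1-, 5- and 15-minute load averages."""
    if one > five > fifteen:
        return "rising"
    if one < five < fifteen:
        return "falling"
    return "no clear trend"


def is_abnormal(load_avg, cpu_count, factor=2.0):
    """Flag a load average that reaches `factor` times the number of CPUs."""
    return load_avg >= factor * cpu_count
```

For the uptime sample shown earlier (0.08, 0.03, 0.05), load_trend returns "no clear trend", as expected for a nearly idle machine.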

Comprehensive analysis of two indicators

Combining the two indicators of load average and CPU utilization, the following possible situations will occur:

Simulation and verification

How can we analyze the different combinations of the two indicators, load average and CPU utilization, and find the source of a change in them?

Two tools are mainly used: stress, a system stress-testing tool, and sysstat, a performance analysis toolkit, from which we will use mpstat and pidstat.

 yum install stress
 yum install sysstat

Simulation

Use stress to simulate the following three scenarios:

1. CPU-intensive process

 # Simulate one process at 100% CPU utilization, with a 600 s time limit
 stress --cpu 1 --timeout 600

2. I/O-intensive process

The -i option of stress spawns N workers spinning on sync().

 # Simulate one process that calls sync() continuously, with a 600 s time limit
 stress -i 1 --timeout 600

3. A large number of processes

 # Simulate 16 processes at 100% CPU utilization, with a 600 s time limit
 stress --cpu 16 --timeout 600

Verification

 mpstat -P ALL 5

This monitors all CPUs and prints a set of data every 5 seconds. Pay attention to %usr (user-mode CPU utilization) and %iowait (time spent waiting on I/O). From these we can tell whether the workload is CPU-intensive or I/O-intensive.

 pidstat -u 5 1

This reports, over a single 5-second interval, the processes that used the CPU. Pay attention to %usr (CPU utilization) and %wait (time spent waiting to run on a CPU), from which you can judge whether there are too many processes (threads) competing for the CPU.
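The reasoning in this verification step can be summarized as a small Python sketch. The function name and the 50% cutoff are assumptions made for illustration and do not come from mpstat or pidstat. usr and iowait would be read from mpstat output, and wait from pidstat -u output.

```python
def diagnose(usr, iowait, wait, high=50.0):
    """Guess the likely cause of a high load average from observed percentages.

    usr, iowait: %usr and %iowait from mpstat
    wait: %wait from pidstat -u
    high: assumed cutoff above which a value is treated as significant
    """
    causes = []
    if usr >= high:
        causes.append("CPU-intensive")
    if iowait >= high:
        causes.append("I/O-intensive")
    if wait >= high:
        causes.append("too many processes competing for CPU")
    return causes or ["no obvious cause"]
```

In practice the cutoff should be tuned to the workload, and the result treated as a hint about where to look next rather than a conclusion.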


End
