Troubleshooting high CPU utilization and high load on Linux/CentOS systems

If CPU usage on an AliCloud ECS Linux instance stays high, system stability and business operations are affected. This article briefly describes how to troubleshoot high CPU utilization. Note: The configurations and commands in this article were tested on 64-bit CentOS 6.5. Other operating system types and versions may differ.

CPU load viewing method


  • Use vmstat to view CPU load at the system level
  • Use top to view CPU load at the process level

Use vmstat to view CPU load at the system level

The usage of CPU resources can be viewed from the system dimension through vmstat.

Instruction:

  1. Format: vmstat -n 1

  2. The trailing 1 sets the refresh interval to one second. The -n option prints the column header only once instead of repeating it periodically.

    Sample output:

    $ vmstat -n 1
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0      0  70352 169448 448452    0    0     0     4   10   11  0  0 99  0  0
     0  0      0  70376 169448 448484    0    0     0     0  175  406  0  0 100  0  0
     0  0      0  70376 169448 448484    0    0     0     0  173  414  0  1 99  0  0
     0  0      0  70376 169448 448484    0    0     0   128  212  429  3  0 96  1  0
    ^C

Output description:

Description of the main data columns in the returned results:

  • r: The number of runnable processes waiting for CPU time. Since each CPU core can only run one thread at a time, the higher the value, the slower the system usually runs.
  • us: The percentage of CPU time spent in user mode. A high value indicates that user processes consume most of the CPU time. For example, if the value stays above 50% for a long time, the program's algorithm or code should be optimized.
  • sy: The percentage of CPU time spent in kernel mode.
  • wa: The percentage of CPU time spent waiting for I/O. A high value indicates serious I/O wait. This may be caused by a large number of random disk accesses, or by disk performance being a bottleneck.
  • id: The percentage of CPU time spent idle. If the value stays at 0 and sy is twice us, the system is usually short of CPU resources.
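
The rules of thumb above can be checked mechanically. The following is a minimal sketch, not a definitive tool: flag_vmstat is a hypothetical helper name, and the comparison of r against the core count and the 20% wa threshold are assumptions added for illustration. Only the 50% us threshold and the id/sy/us rule come from the text.

```shell
# Flag vmstat samples that match the warning signs described above.
# Assumed (not from the article): the function name, the r > cores check,
# and the wa > 20 threshold.
# Usage: vmstat -n 1 5 | flag_vmstat "$(getconf _NPROCESSORS_ONLN)"
flag_vmstat() {
    cores="${1:-1}"
    awk -v cores="$cores" '
        # Skip the two header lines; data lines start with a number.
        $1 !~ /^[0-9]+$/ { next }
        {
            # Column positions follow the standard vmstat layout.
            r = $1; us = $13; sy = $14; id = $15; wa = $16
            msg = ""
            if (r > cores) msg = msg " run-queue>" cores
            if (us > 50)   msg = msg " us>50"
            if (wa > 20)   msg = msg " wa>20"
            if (id == 0 && sy > 0 && sy >= 2 * us) msg = msg " cpu-starved"
            if (msg != "")
                print "WARN r=" r " us=" us " sy=" sy " id=" id " wa=" wa ":" msg
        }'
}
```

A single flagged sample means little. The article's advice applies to values that stay high over a longer period, so it is worth collecting several samples before drawing conclusions.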

Use top to view CPU load at the process level

With top you can view CPU, memory, and other resource usage for each process.

Instruction:

  1. Format: top

  2. Sample output:

    top - 17:27:13 up 27 days,  3:13,  1 user,  load average: 0.02, 0.03, 0.05
    Tasks:  94 total,   1 running,  93 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  0.3 us,  0.1 sy,  0.0 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.1 st
    KiB Mem:   1016656 total,   946628 used,    70028 free,   169536 buffers
    KiB Swap:        0 total,        0 used,        0 free.   448644 cached Mem

      PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
        1 root      20   0   41412   3824   2308 S  0.0  0.4   0:19.01 systemd
        2 root      20   0       0      0      0 S  0.0  0.0   0:00.04 kthreadd

Output description:

The third line of the default interface shows overall CPU usage, and the resource usage of each process is listed below it.

Press uppercase P (Shift+P) in the interface to sort the results by CPU utilization in descending order, then locate the processes that use the most CPU. Finally, use the system log and the program's own log to investigate the process further and determine why its CPU usage is high.
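
For scripts or quick checks where the interactive top interface is inconvenient, a similar ranking can be produced from ps. This is a rough sketch under stated assumptions: top_cpu is a hypothetical helper name, and note that the pcpu value from ps is averaged over the lifetime of the process, so it will not match the instantaneous %CPU figure in top.

```shell
# Print the N processes with the highest CPU percentage, highest first.
# Expects "pid pcpu pmem command" lines on standard input.
# Usage: ps -eo pid=,pcpu=,pmem=,comm= | top_cpu 5
top_cpu() {
    n="${1:-5}"
    # Sort numerically on the second column (pcpu), descending.
    LC_ALL=C sort -k2,2nr | head -n "$n"
}
```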

Operation case


Use top to directly terminate processes with high CPU consumption

As mentioned earlier, you can view the load problem of the system through the top command, and locate the processes that consume more CPU resources.

You can quickly terminate the corresponding abnormal process directly on the top running interface. The description is as follows:

  1. To terminate a process, press the lowercase k key.
  2. Enter the PID of the process you want to terminate (the first column of the top output). For example, to terminate the process with PID 23, enter 23 and press Enter.
  3. After the PID is accepted, a prompt similar to "Send pid 23 signal [15/sigterm]" appears for confirmation. Press Enter to confirm.
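
The same operation can be done from the shell with kill. The sketch below sends signal 15 (SIGTERM, the default shown in the top prompt) and escalates to SIGKILL only if the process is still present after a grace period. The function name terminate_pid and the 5-second default are assumptions for illustration, not part of the original procedure.

```shell
# Ask a process to exit with SIGTERM, then force it with SIGKILL if needed.
# Usage: terminate_pid 23        (optional second argument: grace period in seconds)
terminate_pid() {
    pid="$1"
    grace="${2:-5}"   # assumed default grace period
    kill -15 "$pid" 2>/dev/null || { echo "cannot signal PID $pid" >&2; return 1; }
    i=0
    while [ "$i" -lt "$grace" ]; do
        # kill -0 sends no signal; it only tests whether the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done
    # Still present after the grace period: send SIGKILL.
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid" 2>/dev/null
    fi
    return 0
}
```

SIGKILL gives the program no chance to clean up, so the grace period is worth keeping for services that flush data on exit.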

Low CPU usage but high load

  • Problem description:
    No business program is running on the Linux system. Observed through top, the CPU is mostly idle, but the load average is very high.
  • Treatment:
    Load average measures the length of the task queue: the higher the value, the more tasks are running or waiting to run. On Linux it also counts processes in uninterruptible sleep, which is why the load can be high while the CPU is idle.
    When this happens, the cause may be hung processes. Run ps -axjf and check the STAT column for processes in the D state.
    The D state is uninterruptible sleep. A process in this state cannot be killed and cannot exit by itself. The only fixes are to restore the resource it is waiting on or to restart the system.
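
On a busy system the ps listing is long, so it can help to filter it down to D-state processes only. A minimal sketch follows, assuming the ps output is requested in the column order shown in the usage comment. d_state_only is a hypothetical helper name.

```shell
# Keep only processes whose state (second column) starts with "D",
# i.e. uninterruptible sleep.
# Usage: ps -eo pid=,stat=,wchan=,comm= | d_state_only
# To compare load with CPU count (Linux only):
#   cut -d' ' -f1-3 /proc/loadavg ; getconf _NPROCESSORS_ONLN
d_state_only() {
    awk '$2 ~ /^D/'
}
```

The wchan column shows the kernel function the process is sleeping in, which often hints at the resource it is waiting on, such as a disk or a network file system.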

 

The kswapd0 process uses a lot of CPU

The operating system manages physical memory with a paging mechanism and sets aside part of the disk as virtual memory. Because memory is much faster than disk, the system moves pages that are not currently needed out to disk and brings needed pages back into memory. kswapd0 is the process responsible for this page swapping in virtual memory management. When server memory is insufficient, kswapd0 swaps pages continuously, which consumes the host's CPU resources. If top shows that the process stays in a non-sleeping state for a long time, you can preliminarily conclude that the system is swapping continuously, and troubleshooting should turn to the cause of the memory shortage.

  • Problem description:
    The kswapd0 process consumes a large share of the system's CPU resources.
  • Treatment:
    Linux manages memory with a paging mechanism and sets aside part of the disk as virtual memory. kswapd0 is the process responsible for page swapping in Linux virtual memory management. When system memory is insufficient, kswapd0 swaps pages frequently. Because swapping is CPU-intensive, the process keeps consuming a lot of CPU.
    If top or another monitor shows that kswapd0 is continuously in a non-sleeping state, runs for a long time, and keeps using a lot of CPU, the system is usually swapping pages constantly. Use free, ps, and similar commands to check the memory usage of the system and of individual processes for further troubleshooting and analysis.
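
As one way to carry out the ps part of that check, the sketch below ranks processes by resident memory. It is illustrative only: top_rss is a hypothetical helper name, the rss column from ps is in KiB, and command names that contain spaces would be truncated to their first word.

```shell
# Show the N processes with the largest resident set size, in MiB.
# Expects "pid rss command" lines on standard input (rss in KiB).
# Usage: ps -eo pid=,rss=,comm= | top_rss 5
# For the system-wide view, run: free -m
top_rss() {
    n="${1:-5}"
    LC_ALL=C sort -k2,2nr | head -n "$n" |
        LC_ALL=C awk '{ printf "%s %.1f MiB %s\n", $1, $2 / 1024, $3 }'
}
```

If one process dominates the list while free reports little free memory and heavy swap use, that process is a reasonable first candidate for investigation.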

This article is reproduced from the Internet.

 

 Watson Blog
  • This article was published on May 17, 2019 17:19:13
  • This article was collected and organized by the website of Mutual Benefit. The email address for feedback is wosnnet@foxmail.com. When reprinting, please keep the link to this article: https://wosn.net/2476.html
