Background

A few weeks ago, I used my spare time to write a backend service, implemented in Java, for the recently popular Midjourney generation tool. Yesterday, while investigating a bug, I logged in to the Linux server to view the program log and noticed from the terminal monitoring provided by MobaXterm that the server CPU had been at 100% for a long time. The machine is a 2h4c configuration with memory almost half used, and QPS was very low, so the CPU should not have been that busy. I investigated the cause and recorded the solution here.

Troubleshooting steps

  1. Log in to the server and run top -c to find the process with the highest CPU consumption. The Java program turned out to be the cause of the high CPU usage;
  2. Check the server log to see whether the request volume is too large for the machine's limited performance. This turned out not to be the reason, so go to the next step;
  3. Because the service is deployed with Docker, the specific problem cannot be located from the host by inspecting the pid, so we use arthas directly inside the container to pinpoint it;
  4. Run docker stats to find the container with the highest CPU consumption, then run docker exec -it {container id} /bin/bash to enter the container;
  5. Because the Docker image is kept minimal, it contains only a JRE and no JDK, so commands such as jps cannot be run. First temporarily install a JDK inside the container, then start arthas for troubleshooting;
  6. The steps to install the JDK and start arthas are as follows:

     # Download the JDK
     wget https://mirrors.huaweicloud.com/java/jdk/8u202-b08/jdk-8u202-linux-x64.tar.gz
     # Decompress it (the archive unpacks into the directory jdk1.8.0_202)
     tar -zxvf jdk-8u202-linux-x64.tar.gz
     # Download arthas
     wget https://arthas.aliyun.com/arthas-boot.jar
     # Start arthas with the JDK that was just unpacked
     ./jdk1.8.0_202/bin/java -jar arthas-boot.jar
  7. Run thread -n 3 to list the three threads using the most CPU, and note their thread IDs;
  8. Then run thread followed by one of those thread IDs to see its stack and locate the specific code. My log was as follows:

     [arthas@1]$ thread 5781
     "http-nio-8080-exec-211" Id=5781 RUNNABLE
         at java.util.regex.Pattern$CharProperty.match(Pattern.java:3778)
         at java.util.regex.Pattern$Branch.match(Pattern.java:4606)
         at java.util.regex.Pattern$GroupHead.match(Pattern.java:4660)
         at java.util.regex.Pattern$Loop.match(Pattern.java:4787)
         at java.util.regex.Pattern$GroupTail.match(Pattern.java:4719)
         at java.util.regex.Pattern$BranchConn.match(Pattern.java:4570)
         at java.util.regex.Pattern$CharProperty.match(Pattern.java:3779)
         at java.util.regex.Pattern$Branch.match(Pattern.java:4606)
         at java.util.regex.Pattern$GroupHead.match(Pattern.java:4660)
         at java.util.regex.Pattern$Loop.match(Pattern.java:4787)
         at java.util.regex.Pattern$GroupTail.match(Pattern.java:4719)
         at java.util.regex.Pattern$BranchConn.match(Pattern.java:4570)
         at java.util.regex.Pattern$CharProperty.match(Pattern.java:3779)
         at java.util.regex.Pattern$Branch.match(Pattern.java:4606)
         at java.util.regex.Pattern$GroupHead.match(Pattern.java:4660)
         at java.util.regex.Pattern$Loop.match(Pattern.java:4787)
         at java.util.regex.Pattern$GroupTail.match(Pattern.java:4719)
         at java.util.regex.Pattern$BranchConn.match(Pattern.java:4570)
         at java.util.regex.Pattern$CharProperty.match(Pattern.java:3779)
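For readers who want to see roughly what steps 7 and 8 report, the following is a minimal sketch that uses the JDK's own ThreadMXBean from inside a JVM. It is not how arthas is implemented: arthas measures CPU usage over a sampling interval, while this sketch simply ranks threads by cumulative CPU time. The class name BusyThreads and its methods are hypothetical names chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: ranks live threads of the current JVM by cumulative CPU time.
final class BusyThreads {

    /** Returns up to n thread IDs, ordered by cumulative CPU time (highest first). */
    static List<Long> topByCpuTime(int n) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Long> ids = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return ids; // this JVM cannot report per-thread CPU time
        }
        List<long[]> samples = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread is no longer alive
            if (cpuNanos >= 0) {
                samples.add(new long[] {id, cpuNanos});
            }
        }
        samples.sort(Comparator.comparingLong((long[] s) -> s[1]).reversed());
        for (int i = 0; i < Math.min(n, samples.size()); i++) {
            ids.add(samples.get(i)[0]);
        }
        return ids;
    }

    /** Formats a thread's name, ID, state and top stack frames, similar to a thread dump entry. */
    static String describe(long threadId, int maxFrames) {
        ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(threadId, maxFrames);
        if (info == null) {
            return "thread " + threadId + " is no longer alive\n";
        }
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(info.getThreadName()).append("\" Id=")
          .append(info.getThreadId()).append(' ').append(info.getThreadState()).append('\n');
        for (StackTraceElement frame : info.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (long id : topByCpuTime(3)) {
            System.out.print(describe(id, 20));
        }
    }
}
```

Because this only inspects the JVM it runs in, it is an illustration of the idea rather than a replacement for attaching arthas to an already running service.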

Analysis results

The stack trace shows that the CPU was being consumed by a loop inside regular expression matching. Regular expressions were used in only one place, a pre-check in a single interface, so the problematic expression was located quickly. After consulting relevant material and asking ChatGPT, we concluded that a poorly written regular expression was causing the matching loop to run out of control, a situation commonly called catastrophic backtracking. The analysis follows the reference article linked below; have a look if you are interested. Finally, we modified the regular expression and redeployed the service, after which CPU usage returned to normal.
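The article does not show the actual regular expression, so the following is only a sketch of the general failure mode, using a hypothetical pattern with overlapping alternatives inside a repeated group. It counts how many characters the engine reads by wrapping the input in a CharSequence that counts charAt calls. The names BacktrackingDemo, CountingSequence, RISKY and SAFER are made up for this example, and the exact numbers depend on the JDK version, since newer releases mitigate some of these cases.

```java
import java.util.regex.Pattern;

// Hypothetical demo: compares the work done by two patterns that accept the same strings.
final class BacktrackingDemo {

    /** CharSequence wrapper that counts how often the regex engine reads a character. */
    static final class CountingSequence implements CharSequence {
        private final CharSequence inner;
        long reads;

        CountingSequence(CharSequence inner) {
            this.inner = inner;
        }

        @Override public char charAt(int index) {
            reads++;
            return inner.charAt(index);
        }

        @Override public int length() {
            return inner.length();
        }

        @Override public CharSequence subSequence(int start, int end) {
            return new CountingSequence(inner.subSequence(start, end));
        }

        @Override public String toString() {
            return inner.toString();
        }
    }

    // Assumed example pattern: "a" and "aa" overlap, so a run of 'a' can be split in many ways.
    static final Pattern RISKY = Pattern.compile("^(a|aa)+$");
    // Accepts the same strings, but gives the engine only one way to consume each character.
    static final Pattern SAFER = Pattern.compile("^a+$");

    /** Returns the number of charAt calls made while matching input against pattern. */
    static long countReads(Pattern pattern, String input) {
        CountingSequence seq = new CountingSequence(input);
        pattern.matcher(seq).matches();
        return seq.reads;
    }

    /** Builds a string of n copies of c (String.repeat is not available on JDK 8). */
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 25; n += 5) {
            // The trailing '!' makes the overall match fail, which triggers backtracking.
            String input = repeat('a', n) + "!";
            System.out.printf("n=%d risky=%d safer=%d%n",
                    n, countReads(RISKY, input), countReads(SAFER, input));
        }
    }
}
```

On a JDK 8 runtime like the one in the stack trace above, the read count for the risky pattern would be expected to grow much faster than the input length, while the rewritten pattern stays close to linear.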

Summary

I ran into many problems during this investigation. Many reference articles rely on plain Linux commands together with jstack and jps, and do not cover Java programs running inside containers; in addition, the pid of a process inside the container is not the same as its pid on the host. Therefore, the only way to investigate is to identify the specific container and enter it. Base images used for containers are generally kept as minimal as possible, so some tooling has to be installed temporarily. Arthas, the tool recommended here, can greatly reduce the complexity of troubleshooting Java programs.

Finally, a reminder to stay careful: regular expressions copied from online references may have problems. Review them carefully before using them to avoid similar production issues.
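One possible way to act on this advice, for example when vetting a borrowed pattern against awkward inputs, is to put a time budget on each match. The sketch below is an assumption-laden illustration and not something the service in this article used: it wraps the input in a CharSequence that checks a deadline on every character read and aborts the match once the deadline has passed. TimedRegex, DeadlineSequence, RegexTimeoutException and matchesWithin are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical guard: aborts a regex match that runs longer than a given time budget.
final class TimedRegex {

    /** Thrown when a match runs past its deadline. */
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) {
            super(message);
        }
    }

    /** CharSequence wrapper that checks the deadline each time the engine reads a character. */
    static final class DeadlineSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override public char charAt(int index) {
            // Subtraction keeps the comparison correct even if nanoTime wraps around.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException("regex match exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override public int length() {
            return inner.length();
        }

        @Override public CharSequence subSequence(int start, int end) {
            return new DeadlineSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override public String toString() {
            return inner.toString();
        }
    }

    /** Matches the whole input, throwing RegexTimeoutException if it takes longer than timeoutMillis. */
    static boolean matchesWithin(Pattern pattern, CharSequence input, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        return pattern.matcher(new DeadlineSequence(input, deadline)).matches();
    }

    public static void main(String[] args) {
        // Assumed example pattern and input; the real pattern from the service is not shown in the article.
        Pattern candidate = Pattern.compile("^(a|aa)+$");
        StringBuilder input = new StringBuilder();
        for (int i = 0; i < 40; i++) {
            input.append('a');
        }
        input.append('!');
        try {
            System.out.println("matched: " + matchesWithin(candidate, input, 200));
        } catch (RegexTimeoutException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Calling System.nanoTime on every character read adds overhead, so a guard like this fits a pre-release check better than a hot request path. Fixing the pattern itself, as was done here, remains the actual cure.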

Reference links

Official document of Java diagnostic tool arthas

Troubleshooting 100% Java Process CPU Problems

Troubleshooting of Java application CPU soaring (more than 100%)

Regex gone wild: java.util.regex.Pattern matcher goes into high CPU loop
