Record the performance optimization process of the new Kafka cluster

We made four optimizations, and the results were very noticeable; just as valuable was what we learned along the way during the optimization process.

numactl

This optimization targets servers whose CPU architecture is NUMA. The Kafka process is started through systemd; when configuring ExecStart, prepend /usr/bin/numactl --interleave=all, for example:

 ExecStart=/usr/bin/numactl --interleave=all /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
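Put together, the relevant part of the systemd unit might look like the following. This is a sketch only; the unit path, description, and the Restart/Install sections are assumptions, while the ExecStart line matches the one above.

```ini
# /etc/systemd/system/kafka.service (hypothetical path)
[Unit]
Description=Apache Kafka broker
After=network.target

[Service]
# Interleave memory allocations across all NUMA nodes
ExecStart=/usr/bin/numactl --interleave=all /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
Restart=on-failure

[Install]
WantedBy=multi-user.target
```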

In interleave mode, program performance is higher than in the default affinity mode, sometimes by up to 30%. The root cause is that the workload distribution on most Linux servers is random: when each thread handles the logic for an external request, the memory it needs to access is randomly distributed in physical memory. Interleave mode exploits exactly this property by spreading memory pages evenly across the NUMA nodes, so that both the load on each CPU and the frequency of remote memory accesses are evenly distributed.

Number of threads configuration

This configuration was determined mainly by testing: try different thread-count combinations and observe cluster performance under the same load-test parameters. Test environment: two CPUs, 24 cores each, each with 32 GB of local memory.

The best configuration is as follows:

 num.network.threads=24
 num.io.threads=48
 num.replica.fetchers=3

Thread priority configuration

When configuring systemd, add Nice=-20

Per the systemd documentation, Nice= means:

 Nice= Sets the default nice level (scheduling priority) for executed processes. Takes an integer between -20 (highest priority) and 19 (lowest priority).  See setpriority(2) for details.

The purpose of setting Nice to the highest priority is to give the Kafka process preference in CPU scheduling, so it is preempted and context-switched out as rarely as possible. (Note that Nice= only affects scheduling priority; actually pinning a process to specific CPUs requires a separate setting such as systemd's CPUAffinity=.)

Micro batch production message

This part concerns the micro-batching the Kafka producer performs when producing messages. "Micro-batching" means the producer waits (here, 30 ms) before each send; once the 30 ms have elapsed, it sends the accumulated messages in one batch.
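In the Kafka producer's configuration this wait corresponds to the linger.ms setting, with batch.size capping the batch in bytes. A fragment matching the 30 ms described above:

```properties
# wait up to 30 ms to accumulate messages before sending a batch
linger.ms=30
# upper bound on the batch size in bytes (16384 is the client default)
batch.size=16384
```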

In general, accumulating messages on the producer only writes them into an in-memory buffer, while actually sending them involves network I/O. The two operate on different time scales: memory operations usually take hundreds of nanoseconds, while I/O ranges from milliseconds to seconds.
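To make the trade-off concrete, here is a toy Python sketch of the micro-batching idea (a hypothetical MicroBatchProducer, not the real Kafka client): appends are cheap in-memory operations, and one network send is amortized over an entire batch.

```python
import time

class MicroBatchProducer:
    """Toy sketch of producer-side micro-batching: messages accumulate
    in an in-memory buffer and are flushed as one batch once the
    linger window has elapsed."""

    def __init__(self, linger_ms=30):
        self.linger_s = linger_ms / 1000.0
        self.buffer = []
        self.batches_sent = []
        self._first_append = None

    def send(self, msg):
        now = time.monotonic()
        if not self.buffer:
            self._first_append = now
        self.buffer.append(msg)  # cheap in-memory append
        # Flush only once the linger window has passed.
        if now - self._first_append >= self.linger_s:
            self.flush()

    def flush(self):
        if self.buffer:
            # In the real client this would be one network I/O per batch.
            self.batches_sent.append(list(self.buffer))
            self.buffer.clear()

p = MicroBatchProducer(linger_ms=30)
for i in range(100):
    p.send(i)
p.flush()  # drain whatever remains in the buffer
# Far fewer network sends than the 100 messages produced.
```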

Later, by monitoring machine resources such as memory, we found each machine used only a little more than 14 GB of memory, including process memory and file cache/buffers. Clearly the 30 ms threshold was set too high; it can later be lowered to 10 ms.

Optimization results

1. Latency dropped: produce requests taking 100+ ms became much less frequent
2. Kafka cluster resource usage dropped significantly, especially memory: where each machine previously used nearly all of its memory, each now uses only about one fifth of the total

