Mobile IM Development Guide 2: Heartbeat Command Details

2018/06/29 07:44

The Mobile IM Development Guide series covers all aspects of building an IM app, including technology selection, login optimization, and more. The author also draws on years of experience developing the NetEase Yunxin iOS IM SDK to analyze in depth the common problems that arise in real-world development.

 

Recommended reading

Mobile IM Development Guide 1: How to select technology

Mobile IM Development Guide 3: How to Optimize the Login Module

 

What is the heartbeat command?

Heartbeats come up often in the design of IM services built on long-lived TCP connections. A heartbeat generally means that one end (in most cases, the client) sends a user-defined instruction to the other end at a fixed interval so that the two sides can determine whether each other is alive. Because it is sent at a regular interval, like a heartbeat, it is called a heartbeat instruction.

 

Why do we need a heartbeat at the application layer?

So why is a heartbeat needed at the application layer? Isn't TCP a reliable connection? Can't we rely on TCP itself to detect disconnection, for example with TCP's KeepAlive mechanism? Is an application-layer heartbeat really the best practice today, and what kind of heartbeat counts as best practice?

Perhaps you have never considered these questions carefully. After all, it's just a simple heartbeat!

For the client, the main motivation for building on a long-lived TCP connection is that, while the connection is usable, each request is just a simple send and receive. This eliminates the time spent on DNS resolution, connection establishment, and so on, which greatly speeds up requests and also makes it easy to receive real-time messages from the server.

But this holds only if the connection is usable. If the connection is not well maintained, every request becomes a gamble. With luck, the request goes out over the long connection and a response comes back. Without luck, the connection has already failed, the request waits without a response until it times out, and a new connection must then be established, which is even less efficient than plain HTTP. Maintaining a connection therefore starts with detecting whether it is usable, and actively abandoning it and establishing a new one when it is not.

Given this premise, there must be a mechanism for detecting connection availability. In addition, the peculiarities of mobile networks require the client to send some signaling while idle so that the connection is not reclaimed by the network. See "The Feud Between WeChat and the Carriers".

For the server, timely knowledge of connection availability is just as important. On the one hand, the server needs to clean up dead connections promptly to reduce load. On the other hand, it is a business requirement: for example, the server hosting a game instance (a dungeon) must promptly handle the problems caused by a player dropping offline.

The importance of maintaining connections has been covered above. Now back to the implementation: why use an application-layer heartbeat for detection rather than TCP's own features?

We know that TCP is a connection-oriented protocol whose connection state is maintained by a state machine. Once a connection is established, both parties are in the ESTABLISHED state, and that state does not change on its own. This means that if the upper layer makes no calls and leaves the TCP connection idle, the connection stays in the connected state even though no data flows, whether for a day, a week, or even a month, and even if intermediate routers crash and restart countless times in the meantime. A typical example: when we are connected to our VPS over ssh and accidentally kick out the network cable, TCP does not detect the change. After we plug the cable back in, ssh still works normally, and no TCP reconnection takes place.

Some will ask: doesn't TCP have a KeepAlive mechanism, and can't that do the job? In fact, TCP KeepAlive is not suited to this. With KeepAlive enabled, the TCP layer sends a KeepAlive probe once the connection has been idle for a set time in order to determine whether the connection is still usable. By default that idle time is typically 7200 s, and a failed probe is retried about 10 times at 75 s intervals (the exact count depends on the operating system; Linux, for example, defaults to 9 probes). Clearly the defaults do not meet our needs, but would adjusted settings? The answer is still no. TCP KeepAlive detects whether the connection is alive, whereas a heartbeat has an additional job: detecting whether the communicating parties themselves are alive. The two sound alike but are quite different. Consider a server under extreme load, with the CPU at 100% and unable to respond to any service request: a TCP probe would still report the connection as alive. This is the typical case in which the connection is alive but the service provider is dead. For the client, the best choice is to disconnect and reconnect to another server, rather than continuing to assume the current server is available and sending it requests that are bound to fail.
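
For reference, the KeepAlive timers discussed above can be shortened per socket. The following is a minimal Python sketch, not the author's code: the helper name enable_keepalive and the values 60 s / 10 s / 3 probes are illustrative assumptions, and the three tuning options are platform-specific, so each is applied only if the running platform exposes it.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Turn on TCP KeepAlive and, where the platform allows, shorten its timers.

    The default values here are illustrative assumptions, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These tuning options are platform-specific (all three exist on Linux);
    # each one is applied only if this Python build exposes it.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Even with shorter timers, the probes are answered by the peer's TCP stack rather than by the application, so this still cannot tell a healthy service from a hung one.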

From the above, KeepAlive is not suitable for detecting whether both parties are alive; that requires an application-layer heartbeat. An application-layer heartbeat is also more flexible: it can control the detection timing, the interval, and the handling flow, and it can even carry additional information in the heartbeat packet. From this perspective, the application-layer heartbeat is indeed a best practice.
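
To make the idea concrete, here is a small Python sketch of an application-layer heartbeat exchange. The wire layout (a 1-byte type followed by a 4-byte sequence number) and the names PING, PONG, encode, decode, and answer are hypothetical and chosen only for illustration; the article does not specify a packet format.

```python
import struct

PING, PONG = 0x01, 0x02
# Hypothetical layout: 1-byte type + 4-byte sequence number, network byte order.
_FMT = "!BI"

def encode(kind, seq):
    """Build a heartbeat packet of the given type and sequence number."""
    return struct.pack(_FMT, kind, seq)

def decode(data):
    """Split a heartbeat packet into (type, sequence number)."""
    kind, seq = struct.unpack(_FMT, data)
    return kind, seq

def answer(data):
    """Server side: reply to a PING with a PONG carrying the same sequence number."""
    kind, seq = decode(data)
    if kind != PING:
        raise ValueError("not a heartbeat request")
    return encode(PONG, seq)
```

Because the reply is produced by application code rather than by the TCP stack, a server that is too overloaded to run its request loop will not answer, which is exactly the condition KeepAlive cannot observe.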

 

How to implement the heartbeat command?

From the above, we can conclude that, at present, an application-layer heartbeat is indeed a best practice for detecting both connection validity and whether the two sides are alive. The remaining question is how to implement it.

The simplest and crudest approach is to send a heartbeat at a fixed interval. For example, send a heartbeat every 30 seconds, and if no heartbeat reply is received within 15 seconds, treat the current connection as dead, then disconnect and reconnect. This approach is the most direct and the easiest to implement. Its only drawback is that it consumes a fair amount of power and data traffic. Assuming a 5-byte protocol packet, 2880 heartbeats are sent and 2880 replies received each day, so a month comes to 5 * 2 * 2880 * 30 = 864,000 bytes, or about 0.8 MB of traffic. If several IM apps are installed on the phone, several megabytes of traffic are spent each month on heartbeats alone, not to mention the battery drain caused by frequent heartbeats.
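
The estimate above can be reproduced with a few lines of Python. The function name monthly_heartbeat_bytes is made up for this sketch, and like the article's figure it counts only the protocol payload, not TCP/IP header overhead.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def monthly_heartbeat_bytes(interval_s=30, packet_bytes=5, days=30):
    """Payload bytes spent on heartbeats over the given number of days."""
    beats_per_day = SECONDS_PER_DAY // interval_s   # 2880 at a 30 s interval
    # Each beat is one packet sent plus one packet received.
    return packet_bytes * 2 * beats_per_day * days

total = monthly_heartbeat_bytes()
print(total, "bytes per month")
```

Passing interval_s=300 shows the effect of a 5-minute interval: the monthly total falls by a factor of ten.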

Since frequent heartbeats waste power and traffic, the natural improvement is to lower the heartbeat rate, though not so far that connection detection stops being timely. To that end, the heartbeat interval can be adjusted according to the state of the app. When the app is in the background (mainly a concern on Android), lengthen the heartbeat interval to 5 or even 10 minutes. When the app is in the foreground, keep the original rule. The criterion for a failed connection can also be relaxed so that a single heartbeat timeout does not immediately mark the connection as dead: with error accumulation, the connection is judged unavailable only after n heartbeat timeouts. There are also a few small tricks, such as timing the next heartbeat from the last packet received rather than on a fixed schedule, which further reduces the number of heartbeats to some extent.
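
The three refinements described above (a longer background interval, error accumulation, and timing from the last received packet) can be combined in one small decision object. This is a rough Python sketch under stated assumptions, not NetEase Yunxin's implementation: the class name HeartbeatPolicy, the limit of 3 misses, and the choice to resend immediately after a miss are all illustrative. Time is passed in explicitly so the logic can be exercised without real timers or sockets.

```python
class HeartbeatPolicy:
    """Decides when to send a heartbeat and when to give up on a connection."""

    def __init__(self, foreground_interval=30.0, background_interval=300.0,
                 reply_timeout=15.0, max_misses=3):
        self.foreground_interval = foreground_interval
        self.background_interval = background_interval
        self.reply_timeout = reply_timeout
        self.max_misses = max_misses      # the "n" in error accumulation (assumed 3)
        self.in_background = False
        self.misses = 0
        self.last_received = 0.0          # time of the last packet of any kind
        self.ping_sent_at = None          # time of the unanswered heartbeat, if any

    def interval(self):
        """Current idle interval, longer while the app is in the background."""
        if self.in_background:
            return self.background_interval
        return self.foreground_interval

    def on_packet_received(self, now):
        # Any inbound packet proves the peer is alive: restart the idle timer
        # and clear the accumulated misses.
        self.last_received = now
        self.ping_sent_at = None
        self.misses = 0

    def tick(self, now):
        """Return 'send', 'reconnect', or 'wait' for the current time."""
        if self.ping_sent_at is not None:
            if now - self.ping_sent_at < self.reply_timeout:
                return "wait"
            # The outstanding heartbeat timed out: count it as one miss.
            self.misses += 1
            self.ping_sent_at = None
            if self.misses >= self.max_misses:
                return "reconnect"
            self.ping_sent_at = now       # assumption: retry right away
            return "send"
        if now - self.last_received >= self.interval():
            self.ping_sent_at = now
            return "send"
        return "wait"
```

A caller would invoke tick from its timer loop, send a heartbeat when it returns 'send', tear down and re-establish the connection on 'reconnect', and call on_packet_received for every inbound packet, including ordinary messages.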

 

That concludes NetEase Yunxin's understanding and practice of heartbeat commands. The third article in the Mobile IM Development Guide will cover how to optimize the login module. Stay tuned.

 
