NetEase Yunxin's Exploration and Practice in Converged Communication: SIPGateway Service Architecture

2020/12/07 12:03

Author: Zhu Zhenhao, technical expert at NetEase Yunxin, Guan Shifu

1 Background

In early 2020, despite the COVID-19 outbreak and the overall economic downturn, some fields saw huge business opportunities. This was especially true of real-time audio and video, where more and more enterprises chose RTC cloud conferencing for communication and collaboration. Because traditional video conferencing built up a market base over the past decade, many enterprises still use traditional hardware conferencing systems (accessed over PSTN/SIP) from vendors such as Polycom and Cisco. RTC cloud conferencing may not fully replace hardware conferencing systems in the short term, so in the current market, converged communication is a major requirement that more and more enterprises are likely to choose.

2 Comparison between hardware video conference and RTC cloud conference

Let's first look at the technologies behind hardware video conferencing and RTC cloud conferencing, and at the advantages and disadvantages of each.

Hardware video conferencing generally adopts the MCU architecture (MCU: Multipoint Control Unit, one of the multi-party communication architectures, in which the server mixes all uplink media streams into a single stream and delivers it to the receiver) with access over the standard PSTN/SIP protocols (SIP: Session Initiation Protocol, a signaling standard in the communications field). It is complex to deploy and its hardware is expensive, but it saves bandwidth and terminal processing capacity.

RTC cloud conferencing, taking NetEase Conference as an example, uses the SFU architecture (SFU: Selective Forwarding Unit, one of the multi-party communication architectures, in which the server forwards each subscribed uplink stream to the receiver instead of mixing streams) with private signaling access. It uses the NERTC (NERTC: NetEase Yunxin's self-developed audio and video communication solution) audio and video codec system to deliver better audio and video quality, and supports Simulcast (Simulcast: multi-stream transmission, which lets the same terminal send video streams at several resolution levels at the same time) and flexible deployment. However, it consumes more bandwidth and demands more processing capacity from the terminal.
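
To make the bandwidth trade-off above concrete, here is a minimal Python sketch that counts how many media streams a single receiver gets from the server under each architecture. The function name is hypothetical, and the SFU case assumes the receiver subscribes to every other participant, which a real deployment may not do.

```python
def downlink_streams_per_receiver(participants: int, architecture: str) -> int:
    """Return how many media streams one receiver gets from the server."""
    if participants < 1:
        raise ValueError("a conference needs at least one participant")
    if architecture == "MCU":
        # The server mixes all uplink streams into one composite stream.
        return 1
    if architecture == "SFU":
        # The server forwards each other participant's stream unmixed
        # (assumption: the receiver subscribes to everyone else).
        return participants - 1
    raise ValueError(f"unknown architecture: {architecture}")
```

In a 10-person conference, the sketch gives 1 downlink stream per receiver for MCU and 9 for SFU, which is why SFU costs more bandwidth and terminal decoding capacity.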

Interconnecting hardware video conferencing with RTC cloud conferencing is therefore an urgent problem. NetEase Yunxin has done a lot of exploration and practice in converged communication, found a way to achieve it through SIPGateway, and applied it in NetEase Conference. Today we share NetEase Conference's practice with the SIPGateway server architecture in the converged communication scenario.

3 SIPGateway

Architecture diagram

The figure shows the architecture of the SIPGateway server, which consists of the following modules:

  • SIP protocol access module: handles access for SIP users and supports the RTP/RTCP protocols;
  • Yunxin RTC protocol access module: handles access for Yunxin RTC users and supports the Yunxin RTC private protocol;
  • Conference management module: maintains the state of rooms and participants, manages conference numbers, and so on;
  • Codec module: handles audio and video encoding, decoding and transcoding;
  • MCU module: provides audio and video mixing on SIPGateway. Because standard SIP terminals support neither Simulcast nor receiving multiple streams, the gateway has to mix audio and video and send the result to the SIP terminal;
  • SFU module: supports a pure forwarding mode, or a selective forwarding mode driven by voice activity;
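
As an illustration of the audio side of the MCU module described above, the following Python sketch mixes several 16-bit PCM streams by summing samples and clamping the result. This is a common textbook approach and only an assumption about how mixing could be done; the source does not describe SIPGateway's actual mixer, and the function name is hypothetical.

```python
from typing import List

PCM16_MIN, PCM16_MAX = -32768, 32767


def mix_pcm16(streams: List[List[int]]) -> List[int]:
    """Mix 16-bit PCM streams sample by sample, clamping to avoid overflow."""
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed = []
    for i in range(length):
        # Streams shorter than the longest one contribute silence at the end.
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(PCM16_MIN, min(PCM16_MAX, total)))
    return mixed
```

A production mixer would normally also exclude each listener's own voice from the mix sent back to that listener and apply gain control rather than hard clamping.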

How is the SIPGateway server used to implement cloud conferencing? Let's look at two concrete application scenarios.

Example application scenarios

Scenario 1: The conference is initiated from the NetEase Conference app, and SIP terminals can join the NetEase conference.

As shown in the figure above, when a conference is initiated from the NetEase Conference app, NetEase Conference synchronously creates a SIP conference short number, and a user can join the conference by entering that short number on a SIP terminal.

The SIP login process itself is not covered in detail here; refer to the SIP standard (RFC 3261).

In this scenario, the media streams flow as follows:

  • After the SIP user logs in, SIPGateway synchronously creates the room and manages the user's state;
  • When the joining user is a SIP user, SIPGateway represents that SIP user as a simulated RTC user and joins the NetEase conference on the user's behalf, following the normal NERTC flow. Once the uplink RTCSession with the media server is established, the gateway can push the stream: the SIP endpoint's uplink stream is copied to the media server while it also feeds the MCU audio and video mixing on SIPGateway. The advantage is that the media server needs no special handling for SIP users, which keeps the change non-intrusive;
  • When an RTC user joins the conference, SIPGateway represents that RTC user as a simulated SIP user and adds it to the room on the gateway. At the same time, it establishes a downlink RTCSession with the media server to receive the stream, and the RTC user's media stream reaches SIP users through the mixed audio and video;
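
The two-way user simulation in the steps above can be sketched as a small bookkeeping structure. This is a minimal sketch under stated assumptions: the class, method and alias names are hypothetical, and real session setup (SIP signaling, RTCSession creation) is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GatewayRoom:
    """Tracks who is mirrored into which system for one gateway room."""
    short_number: str
    # Real SIP user -> simulated RTC identity used in the NetEase conference.
    simulated_rtc_users: Dict[str, str] = field(default_factory=dict)
    # Real RTC user -> simulated SIP identity used in the gateway room.
    simulated_sip_users: Dict[str, str] = field(default_factory=dict)

    def on_sip_join(self, sip_user: str) -> str:
        # The SIP user's uplink is pushed to the media server under this alias.
        alias = f"rtc-sim:{sip_user}"
        self.simulated_rtc_users[sip_user] = alias
        return alias

    def on_rtc_join(self, rtc_user: str) -> str:
        # The RTC user's downlink stream enters the gateway room under this alias.
        alias = f"sip-sim:{rtc_user}"
        self.simulated_sip_users[rtc_user] = alias
        return alias

    def mixer_inputs(self) -> List[str]:
        # Everyone present in the gateway room: real SIP users plus simulated ones.
        return sorted(list(self.simulated_rtc_users) + list(self.simulated_sip_users.values()))
```

Because each side only ever sees users of its own kind, neither the media server nor the SIP terminal needs to know that the other system exists.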

This process requires audio and video encoding and decoding:

  • For audio, PSTN/SIP generally uses the G.722 or G.729 codec, while the NERTC system uses the Opus codec, so the audio must be transcoded;
  • For video, the MCU function is introduced. Mixing has some impact on server performance, but overall performance meets expectations;
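
To illustrate the audio transcoding decision described above, the Python sketch below lists the steps needed to convert between two codecs. The function and table names are hypothetical. G.722 samples audio at 16 kHz and G.729 at 8 kHz; the 48 kHz rate for Opus is an assumption about the NERTC configuration, not something the source states.

```python
from typing import List

# Audio sampling rates in Hz. The Opus value is an assumed configuration.
SAMPLE_RATE_HZ = {"g722": 16000, "g729": 8000, "opus": 48000}


def transcode_plan(src_codec: str, dst_codec: str) -> List[str]:
    """Return the ordered steps needed to convert src_codec audio to dst_codec."""
    src, dst = src_codec.lower(), dst_codec.lower()
    if src == dst:
        return []  # Same codec on both sides: forward packets untouched.
    steps = [f"decode {src}"]
    if SAMPLE_RATE_HZ[src] != SAMPLE_RATE_HZ[dst]:
        steps.append(f"resample {SAMPLE_RATE_HZ[src]}->{SAMPLE_RATE_HZ[dst]}")
    steps.append(f"encode {dst}")
    return steps
```

For example, `transcode_plan("G722", "opus")` yields a decode, a 16000->48000 resample and an encode step, while identical codecs yield an empty plan.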

The final screen layout is shown in the following figure:

Scenario 2: The conference is initiated from the traditional video conferencing system, and the NetEase Conference app can join the hardware video conference.

As shown in the figure, the process is as follows. When a conference is initiated from the traditional video conferencing system, the conference management front-end system creates a hardware video conference room. When RTC users enter the hardware video conference number, the system synchronously creates a NetEase conference number and a SIP conference short number and notifies SIPGateway, which handles signaling protocol conversion and intelligent SIP routing so that RTC users and the hardware video conference can exchange media.

Three conferences are involved in this process: the hardware video conference room, the NetEase conference room and the gateway conference room. The media stream directions and data forwarding strategies are as follows:

  • When the hardware video conference creates a room, the NetEase Conference app dials the hardware video conference number, and a NetEase conference room and a SIPGateway SIP conference short number are created synchronously;
  • After the SIPGateway room is created, video is mixed in the gateway conference room;
  • When the joining user is an RTC user, SIPGateway simulates a SIP user that joins the gateway conference room, and establishes a downlink RTCSession with the media server to receive streams;
  • The overall strategy of SIPGateway is intelligent signaling conversion: all users in the hardware video conference are represented as one simulated RTC user that joins the NetEase conference, and all RTC users are represented as one simulated SIP user that joins the hardware video conference;
  • When pushing video, SIPGateway applies an energy-based routing strategy: it selects the video stream of the RTC user with the highest audio energy, mixes it into the picture, and sends it to the hardware video conference;
  • The SIP-side layout can be set through the hardware video conference controls and changed by switching between the SIP main venue and branch venues or by polling; it is also possible to view the video of a single RTC participant or of all RTC participants;
  • With RTC users joining the hardware video conference, the two systems are truly interconnected: SIP users can join both hardware video conference rooms and NetEase conference rooms;
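
The energy-based routing step above can be sketched as follows. This is a minimal Python sketch that uses the root mean square (RMS) of the latest audio frame as the energy measure; the source does not say how SIPGateway measures energy, so the metric and the function names are assumptions.

```python
import math
from typing import Dict, List, Optional


def rms_energy(samples: List[int]) -> float:
    """Root-mean-square level of one PCM audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pick_loudest(latest_frames: Dict[str, List[int]]) -> Optional[str]:
    """Return the RTC user whose latest audio frame has the highest energy."""
    if not latest_frames:
        return None
    return max(latest_frames, key=lambda user: rms_energy(latest_frames[user]))
```

A real selector would usually smooth the energy over several frames and add hysteresis so the forwarded video does not flip between speakers on every short noise.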

The final screen layout is shown in the following figure:

The above covers the SIPGateway architecture and the technical implementation of the two application scenarios. Next, let's look at the main features of SIPGateway and the deployment modes it supports.

4 Features and deployment of SIPGateway

Features of SIPGateway

SIPGateway has the following features:

Flexible call-in modes

  • PSTN/SIP endpoints can actively join a NetEase conference. When NetEase Conference creates a conference, it also assigns a SIP conference short number for SIP terminals, and SIP users join simply by entering that short number;
  • PSTN/SIP endpoints can be invited into a NetEase conference. During the conference, the host selects a SIP terminal in the address book and invites it to join;
  • The NetEase Conference app can join third-party video conferencing systems;

Full platform interoperability

  • Supports interoperability across the iOS, Android, PC and Mac platforms;
  • Supports PSTN/SIP terminal access;

MCU cascade

Supports cascading the SIPGateway MCU with a traditional hardware video conference MCU, which makes multi-canvas layouts possible.

Load balancing

The load balancing module allocates a suitable SIPGateway based on each gateway's actual load, with the constraint that all users of the same room are allocated to the same SIPGateway.
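
The allocation rule above, least-loaded gateway with room affinity, can be sketched in a few lines of Python. The class name and the use of a simple session count as the load metric are assumptions made for illustration; the source does not say what load signal SIPGateway uses.

```python
from typing import Dict, List


class GatewayAllocator:
    """Pick the least-loaded gateway, but keep each room on one gateway."""

    def __init__(self, gateways: List[str]) -> None:
        self.load: Dict[str, int] = {name: 0 for name in gateways}
        self.room_to_gateway: Dict[str, str] = {}

    def allocate(self, room_id: str) -> str:
        if room_id not in self.room_to_gateway:
            # First user of the room: choose the gateway with the lowest load.
            self.room_to_gateway[room_id] = min(self.load, key=self.load.get)
        gateway = self.room_to_gateway[room_id]
        self.load[gateway] += 1  # One more session on this gateway.
        return gateway
```

Pinning a room to one gateway matters because the MCU mixing for that room has to see all of its streams in one place.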

High security

  • IP whitelist: the proxy in front of SIPGateway proxies the SIP terminals' join requests. To prevent malicious attacks, an IP whitelist is configured per tenant to block access from unauthorized IPs;
  • Regional isolation: different tenants can be routed to the same SIPGateway, or to a dedicated SIPGateway or cluster;
  • For enterprises with high network security requirements, the SIP proxy is deployed in the enterprise's DMZ, where it shields the enterprise intranet from attacks while still communicating with the external network;
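
The per-tenant IP whitelist check described above could look like the following Python sketch, built on the standard `ipaddress` module. The function name and the CIDR-list representation of a tenant's whitelist are assumptions; the source does not describe the actual configuration format.

```python
import ipaddress
from typing import List


def is_allowed(source_ip: str, tenant_whitelist: List[str]) -> bool:
    """Return True if source_ip falls inside any whitelisted CIDR block."""
    address = ipaddress.ip_address(source_ip)
    return any(
        address in ipaddress.ip_network(cidr)
        for cidr in tenant_whitelist
    )
```

For example, with a whitelist of `["203.0.113.0/24"]`, a join request from 203.0.113.7 is accepted and one from 198.51.100.9 is rejected.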

High availability

  • SIPGateway is deployed as a cluster. When one SIPGateway goes down, the audio and video calls of users on other gateways are not affected;
  • SIP proxies monitor each other through keep-alive. When one proxy goes down, traffic switches to the other proxy without affecting SIP call state or users' audio and video calls, so the switchover is imperceptible to users;
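
The keep-alive switchover between SIP proxies can be sketched as a heartbeat timeout check. This Python sketch is only an illustration: the class name, the primary/standby ordering and the 3-second timeout are assumptions, and it does not model how call state is preserved across the switch.

```python
from typing import Dict, Optional


class ProxyPair:
    """Choose the active SIP proxy based on the most recent keep-alive."""

    def __init__(self, primary: str, standby: str, timeout_s: float = 3.0) -> None:
        self.order = [primary, standby]
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, proxy: str, now: float) -> None:
        # Record the time of the latest keep-alive received from this proxy.
        self.last_seen[proxy] = now

    def active(self, now: float) -> Optional[str]:
        # Prefer the primary; fall back to the standby if the primary is silent.
        for proxy in self.order:
            seen = self.last_seen.get(proxy)
            if seen is not None and now - seen <= self.timeout_s:
                return proxy
        return None
```

Timestamps are passed in explicitly so the logic can be exercised without a real clock; a deployment would feed it a monotonic time source.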

Data monitoring platform

  • Data platform: periodically collects the online status, traffic status and network status of SIP users, which helps locate and analyze problems promptly when anomalies occur;
  • Monitoring platform: automatically monitors server status. When a service becomes abnormal, an alert is raised automatically so that the problem can be resolved promptly;

Deployment mode of SIPGateway

In terms of deployment, SIPGateway uses container-based deployment and supports both public cloud and private deployment, which makes it flexible enough to meet the needs of different scenarios.

5 Conclusion

This article shared how NetEase Conference uses SIPGateway to achieve converged communication, and analyzed the media stream directions and data forwarding strategies through two example application scenarios.

As application scenarios multiply, such as enterprise in-app mobile workbenches, system-integrated telephone calling, and smart hardware like smart access control and smart robots, higher requirements will be placed on the interoperability of all kinds of terminals. NetEase Yunxin will continue exploring this track to meet users' needs in different scenarios and genuinely help users grow.

Terms used in this article and further reading:

  • MCU: Multipoint Control Unit, one of the multi-party communication architectures, in which the server mixes all uplink media streams into a single stream and delivers it to the receiver;
  • SFU: Selective Forwarding Unit, one of the multi-party communication architectures, in which the server forwards each subscribed uplink stream to the receiver instead of mixing streams;
  • SIP: Session Initiation Protocol, a signaling standard in the communications field;
  • RTCSession: the RTC streaming component of NetEase Yunxin;
  • Simulcast: multi-stream transmission, which lets the same terminal send video streams at several resolution levels at the same time;
  • NERTC: the audio and video communication solution developed by NetEase Yunxin;
  • SIPGateway: the SIP protocol access gateway;
