
iOS: implementing audio and video communication based on WebRTC (iOS learning, from entry to mastery)


[Image: QQ screenshot 20170306095551.png]


The name WebRTC is short for Web Real-Time Communication. In short, it is a technology that lets web browsers carry out real-time voice or video conversations.

It provides the core technologies of video conferencing, including audio and video capture, encoding and decoding, network transmission, and display, and it is cross-platform: Windows, Linux, macOS, Android, and iOS.

The project was open-sourced in May 2011, has been widely supported and adopted in the industry, and is becoming the standard for the next generation of video calls.

This article will stand on the shoulders of giants and implement voice and video calls between different clients based on WebRTC. "Different clients" is not limited to mobile client to mobile client; it also covers mobile clients and web browsers.



I. The principle behind WebRTC

II. Setting up the WebRTC environment on iOS

III. An introduction to the WebRTC API and the point-to-point connection process

IV. The detailed iOS client implementation and building the server-side signaling channel


I. The principle behind WebRTC

WebRTC's audio and video communication is based on P2P. So what is P2P?

It is short for peer-to-peer, a point-to-point connection.

1. Let's start with the P2P connection model:

In general, our traditional connection model is mediated by a server.

  • Similar to the HTTP protocol: client -> server -> client (the arrow back from the server here represents only the returned response data).

  • When we do instant messaging and transmit text, pictures, audio recordings, and so on: client A -> server -> client B.

A point-to-point connection means that once the data channel is formed, there is no server in the middle; data flows directly from one client to the other.

Client A <-> Client B, Client A <-> Client C ... (any number of clients can be interconnected)

Now think about the application scenario of audio and video calls: there is really no data that needs to pass through the server between the two clients. Moreover, one of the biggest advantages of this is that it greatly reduces the load on the server side.

WebRTC is such an audio and video communication technology based on P2P.

2. The WebRTC server and signaling

At this point you might think WebRTC needs no server at all. That is obviously a misunderstanding. Strictly speaking, it simply does not need a server to relay the data.

WebRTC provides browser-to-browser (point-to-point) communication, but that does not mean WebRTC needs no server. There are at least two things for which WebRTC must rely on one:

  • The exchange of metadata (signaling) needed to set up communication between browsers must pass through a server.

  • Traversing NATs and firewalls.

The first point is easy to understand: when A and B want to establish a P2P connection, a server is at least needed to coordinate and control the connection, and when the connection is torn down, a server is needed to notify the other side that the P2P connection has been closed. The data used to control the connection state is called signaling, and the channel to the server that carries it is WebRTC's signaling channel.

[Image: QQ screenshot 20170306095814.png]

In the picture, signaling is sent through the server, which then drives the underlying WebRTC layer; through the signaling received via the server, WebRTC learns the other party's basic information and can establish the Media communication connection shown by the dashed line.

Of course, signaling carries many other things as well:

  • Connection control messages used to open or close communication.

  • Messages used to inform the other side when errors occur.

  • Media stream metadata, such as codecs, codec configuration, bandwidth, and media types.

  • Key data for establishing secure connections.

  • Network data visible from the outside, such as IP addresses and ports.

Before a connection is established, there is obviously no way to transfer this data between the clients directly. So we relay it through a server, and then establish the peer-to-peer connection between the clients. The WebRTC API does not implement any of this; we have to implement it ourselves.
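Since signaling is left entirely to the application, it can be as simple as JSON messages sent over the signaling channel. Below is a minimal Python sketch of such an envelope; the field names (eventName, data) mirror the ones the demo uses later in this article, but the exact format is our own choice, not anything WebRTC mandates.

```python
import json

def make_signal(event_name, data):
    # Wrap a signaling payload in a JSON envelope. The field names
    # ("eventName", "data") are illustrative: WebRTC does not standardize
    # signaling, so the application chooses its own format.
    return json.dumps({"eventName": event_name, "data": data})

def parse_signal(raw):
    # Decode an envelope back into (event_name, data).
    msg = json.loads(raw)
    return msg["eventName"], msg["data"]
```

Every signaling message discussed below (join, offer, answer, ICE candidates) can ride in an envelope like this.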

The second point involves the concept of NAT, which I mentioned before in "iOS instant messaging, from beginner to 'give up'?", though there we were handling NAT timeouts that break TCP connections. We won't repeat that here; see the NAT encyclopedia entry.

Briefly: NAT technology appeared to ease the shortage of IPv4 addresses. For example, we usually sit behind a router, and if there are N devices the router may assign them addresses like 192.168.0.n, which are obviously only internal network addresses. One public address thus corresponds to N internal addresses. Using a small number of public IP addresses to represent many private IP addresses helps slow the depletion of the available IP address space.

But this also brings a series of problems. For a point-to-point connection, for example, it leads to the following situation:

If client A wants to send data to client B, then when the data arrives at the router in front of client B, it is blocked by B's NAT, so B cannot receive A's data.

However, because A has already sent to B, A's NAT now knows B's address, so when B sends data to A, A's NAT will not block it, and A can receive B's data. This is the core idea of NAT traversal.
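The NAT behavior just described can be modeled in a few lines: an inbound packet is forwarded only if the internal host previously sent one out to that same remote address. This is a deliberately simplified sketch (real NATs vary by type and also rewrite ports):

```python
# Toy model of the NAT rule above: replies are allowed only from
# remote addresses we have already sent to.
class ToyNAT:
    def __init__(self):
        self.allowed = set()  # remote addresses this host has sent to

    def outbound(self, remote):
        # Sending out opens a return path through the NAT.
        self.allowed.add(remote)

    def inbound_ok(self, remote):
        # Inbound traffic passes only if a matching outbound happened first.
        return remote in self.allowed
```

This is exactly why A's first packet to B is dropped by B's NAT, while B's reply passes A's NAT.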

So we arrive at the following train of thought:

With the help of a server that has a public IP, A and B each send packets to that server, so their public IP/PORT reaches it. The server thereby learns A's and B's public IP/PORT, and because A and B actively sent packets to it, the server's packets can pass back through each side's NAT to reach them.

Then the server only needs to send B's IP/PORT to A and A's IP/PORT to B. The next time A and B send data to each other, they will no longer be blocked by NAT.
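This train of thought can be sketched as a toy rendezvous server. The addresses below are illustrative documentation addresses; the point is only the exchange step: each peer registers, and the server hands each one the other's public IP/PORT.

```python
# Toy model of the rendezvous idea: A and B each contact a public
# server, which learns their public IP/PORT and then tells each peer
# the other's address so they can try sending to each other directly.
class RendezvousServer:
    def __init__(self):
        self.peers = {}  # name -> (public_ip, public_port)

    def register(self, name, public_ip, public_port):
        # The server learns the mapping because the peer contacted it
        # first, so the peer's NAT already allows this return path.
        self.peers[name] = (public_ip, public_port)

    def exchange(self, a, b):
        # Hand each side the other's public address.
        return {a: self.peers[b], b: self.peers[a]}

server = RendezvousServer()
server.register("A", "203.0.113.5", 50000)
server.register("B", "198.51.100.7", 40000)
addresses = server.exchange("A", "B")
```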

WebRTC's NAT/ firewall traversal technology is based on the above idea.

A common problem in establishing point-to-point channels is NAT traversal. NAT traversal techniques are required when establishing connections between hosts in private TCP/IP networks that use NAT devices. This problem has long been encountered in the VoIP field. Many NAT traversal techniques exist, but none of them is perfect, because NAT behavior is not standardized. Most of these techniques use a public server with an IP address reachable from anywhere in the world. In RTCPeerConnection, the ICE framework is used to ensure that RTCPeerConnection can achieve NAT traversal.

[Image: QQ screenshot 20170306095915.png]

Above is the ICE protocol framework for NAT/firewall traversal. It is composed of the following techniques and protocols: STUN, TURN, and SDP, which together make a general ICE implementation possible.

You may feel overwhelmed again by all the new terms appearing at once. It doesn't matter; we won't dwell on them here, and we will mention each one when we implement it. If you are interested, read this article: WebRTC protocols.

II. Setting up the WebRTC environment on iOS

First of all, understand that WebRTC is already built into our browsers. In a browser we can call the corresponding WebRTC APIs directly from JS and implement audio and video communication.

However, we are on the iOS platform, so we need to download the specified version of the source code from the official site and compile it ourselves. Roughly speaking, the source is more than 10 GB, the compilation process runs into a series of pitfalls, and the compiled result, the final .a library form of WebRTC, is about 300 MB.

We won't cover the compilation process here; read this article if you are interested:

WebRTC (iOS) Download compilation

In the end, compilation gives us the following WebRTC files:

[Image: QQ screenshot 20170306095949.png]

It includes a .a file and some header files under the include folder. For testing you can use these precompiled files directly, but if you need the latest version of WebRTC later on, you will have to compile it yourself.

Then we add the entire WebRTC folder to the project and add the following system libraries as dependencies:

[Image: QQ screenshot 20170306095958.png]

So far, a WebRTC environment under iOS has been built.

III. An introduction to the WebRTC API and the point-to-point connection process

1. WebRTC essentially exposes three APIs:

  • MediaStream: through the MediaStream API, you can obtain synchronized video and audio streams from the device's camera and microphone.

  • RTCPeerConnection: RTCPeerConnection is the component WebRTC uses to build a stable, efficient stream between two peers.

  • RTCDataChannel: RTCDataChannel enables a high-throughput, low-latency channel between browsers for transmitting arbitrary data.

RTCPeerConnection is the core component of WebRTC.

2. The process by which WebRTC establishes a point-to-point connection

Before we use WebRTC to implement audio and video communication, we must understand its connection process; otherwise there is no way to get started with its API.

We mentioned earlier that WebRTC uses the ICE protocol to handle NAT traversal, so there is this step: we need to obtain an ICE candidate from a STUN server. It is essentially a public network address, which gives the client its own public address. The STUN server's job is to record the public address and keep exchanging packets so that subsequent traffic is not blocked by NAT.

As we mentioned before, we also need a server to establish a signaling channel, controlling when A and B establish connections. While building the connection, each side must tell the other its ICE candidate (public address) and its SDP; the channel also carries a series of other signaling, such as when to disconnect.

By the way, a word about SDP, the Session Description Protocol. It is a protocol that describes the contents of a multimedia connection, such as resolution, formats, codecs, and encryption algorithms, so that both ends can understand each other's data. In essence it is metadata describing the content, not the media stream itself.
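To make the SDP idea concrete, here is a tiny hand-written fragment and a helper that pulls out the announced media kinds. The field values are illustrative, not taken from a real WebRTC session:

```python
# A minimal, hand-written SDP fragment. Each line is "<type>=<value>";
# for example, an "m=" line opens a media section (audio, video, ...).
SAMPLE_SDP = "\n".join([
    "v=0",
    "o=- 46117317 2 IN IP4 127.0.0.1",
    "s=-",
    "m=audio 9 UDP/TLS/RTP/SAVPF 111",
    "m=video 9 UDP/TLS/RTP/SAVPF 96",
])

def media_types(sdp):
    # Return the media kinds announced in an SDP blob, in order.
    return [line.split("=", 1)[1].split()[0]
            for line in sdp.splitlines() if line.startswith("m=")]
```

This is the kind of metadata the two ends exchange so each knows what the other intends to send.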

Now let's walk through the process of establishing a P2P connection:

1. A and B both connect to the server side and establish a long-lived connection (any protocol works: WebSocket / MQTT / raw Socket / XMPP). To keep things simple, we use WebSocket directly; that gives us the signaling channel.

2. A obtains an ICE candidate from the ICE server (STUN server) and sends it to the Socket server, and also generates an offer containing its session description (SDP), which it sends to the server side.

3. The Socket server forwards A's offer and ICE candidate to B, and B saves A's information.

4. B then sends an answer containing its own session description (it received an offer, so it replies with an answer, but the content is likewise SDP) together with its ICE candidate to the Socket server.

5. The Socket server forwards B's answer and ICE candidate to A, and A saves B's information.

At this point, A and B have established a P2P connection.
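The five steps above can be sketched as a toy simulation in which the "server" is just a relay forwarding offer/answer and candidate messages between two peers. All the structures are simplified stand-ins for the real SDP and ICE objects:

```python
# Toy simulation of the offer/answer exchange: the Relay plays the
# Socket server, and each Peer stores what it learns about the other.
class Relay:
    def __init__(self):
        self.inbox = {"A": [], "B": []}

    def forward(self, to, message):
        self.inbox[to].append(message)

class Peer:
    def __init__(self, name, relay):
        self.name, self.relay = name, relay
        self.remote_sdp = None
        self.remote_candidates = []

    def send_offer(self, to):
        # Step 2: send our session description and ICE candidate via the server.
        self.relay.forward(to, {"type": "offer", "sdp": f"sdp-of-{self.name}"})
        self.relay.forward(to, {"type": "candidate", "addr": f"addr-of-{self.name}"})

    def send_answer(self, to):
        # Step 4: the answer carries our own SDP back the same way.
        self.relay.forward(to, {"type": "answer", "sdp": f"sdp-of-{self.name}"})
        self.relay.forward(to, {"type": "candidate", "addr": f"addr-of-{self.name}"})

    def drain(self):
        # Steps 3 and 5: store whatever the server forwarded to us.
        for msg in self.relay.inbox[self.name]:
            if msg["type"] in ("offer", "answer"):
                self.remote_sdp = msg["sdp"]
            else:
                self.remote_candidates.append(msg["addr"])
        self.relay.inbox[self.name] = []

relay = Relay()
a, b = Peer("A", relay), Peer("B", relay)
a.send_offer("B"); b.drain()    # B now knows A's SDP and candidate
b.send_answer("A"); a.drain()   # A now knows B's SDP and candidate
connected = a.remote_sdp is not None and b.remote_sdp is not None
```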

It is very important to understand the whole process of P2P connection, otherwise the code implementation part will be hard to understand.

IV. The detailed iOS client implementation and building the server-side signaling channel

Signaling in chat rooms

The above is the signaling exchange between two users, but we want to build a chat room for multi-user online video chat, so some extensions are needed to meet this requirement.

User operation

First, we need to sort out a user's general flow through the chat room:

1. Open the page and connect to the server.

2. Enter the chat room.

3. Establish point-to-point connections with all other users in the chat room, and render their streams on the page.

4. When another user in the chat room leaves, we should be notified so we can close the connection to that user and remove their output from the page.

5. When another user joins, we should be notified so a connection can be established with the new user and their output rendered on the page.

6. On leaving the page, close all connections.

From the above we can see that, besides supporting the establishment of point-to-point connections, the server needs to do at least the following:

1. When a new user enters the room, send the new user's information to the other users in the room.

2. When a new user joins the room, send the information of the other users in the room to the new user.

3. When a user leaves the room, send the departing user's information to the other users in the room.

Implementation ideas

Taking WebSocket as an example, the user flow above can be refined as follows:

1. The client and server establish a WebSocket connection.

2. The client sends a join-room signaling message (join), carrying the name of the chat room the user is entering.

3. The server, based on the room the user joined, sends back an other-users signaling message (peers) containing the information of the other users in the chat room; the client establishes a point-to-point connection with each of those users based on that information.

4. If a user leaves, the server sends a user-left signaling message (remove_peer) containing the departing user's information; the client closes the connection to that user and performs the corresponding cleanup.

5. If a new user joins, the server sends a user-joined signaling message (new_peer) containing the new user's information; the client establishes a point-to-point connection with the new user based on that information.

6. When the user leaves the page, the WebSocket connection is closed.
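The server-side bookkeeping implied by these steps can be sketched as follows. The event names (_peers, _new_peer, _remove_peer) follow the ones used in this article; the payload shapes are assumptions for illustration:

```python
# Sketch of the room bookkeeping the signaling server has to do.
class RoomServer:
    def __init__(self):
        self.rooms = {}  # room name -> list of user ids

    def join(self, room, user_id):
        others = list(self.rooms.get(room, []))
        self.rooms.setdefault(room, []).append(user_id)
        # Tell the newcomer who is already here, and everyone else about them.
        to_new_user = {"eventName": "_peers",
                       "data": {"connections": others, "you": user_id}}
        to_others = {"eventName": "_new_peer", "data": {"socketId": user_id}}
        return to_new_user, to_others

    def leave(self, room, user_id):
        self.rooms[room].remove(user_id)
        # Everyone remaining learns which user left.
        return {"eventName": "_remove_peer", "data": {"socketId": user_id}}
```

The first return value of join maps to step 3 above, the second to step 5, and leave to step 4.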

With this basic idea in place, let's implement a video chat room based on WebRTC.

We start with the client implementation. First, look at WebRTCHelper.h:

The external interface here is very simple: a method that returns the singleton, a delegate, and a method for connecting to the server. That method takes 3 parameters: server address, port number, and room number. There is also a method for leaving the room.

Now for the delegate. It has 3 optional methods:

1. A callback when the local stream is set, which can be used to display the local video image.

2. A callback when a remote stream arrives, which can be used to display the other party's video image.

3. A callback when a WebRTC connection closes. Note that this closes only the connection with the given userId; connections you have established with other people in the chat room are unaffected.

For now, let's not look at how it is implemented.

You just need to set yourself as the delegate, then connect to the socket server.

Now let's look at our handling of the delegate callbacks:

The code is simple. The core is the few lines that call the WebRTC API:

When we get the local and remote streams, we can use them to set up the video images; audio is output automatically (remote audio is played, local audio is not).

Basically, displaying a video image takes only the following 3 steps:

1. Create an instance of RTCEAGLVideoView.

2. Get the RTCMediaStream from the delegate callback, and get the RTCVideoTrack instance from the stream:

3. Use this _localVideoTrack to set the render target on the RTCEAGLVideoView instance.

The video image is then rendered on the RTCEAGLVideoView instance; we only need to add it to a view to display it.

Remember: we must hold a strong reference to the RTCVideoTrack instance, otherwise the video image may not be displayed.

With that, a simple WebRTC client is built. Next, let's ignore the Socket server (treat it as already implemented) and the implementation of WebRTCHelper, and run the demo to see the effect.

[Image: QQ screenshot 20170306100623.png]


This is a screenshot captured on my phone, because the simulator cannot use the Mac's camera. The first image is the local video; the ones after it are transmitted by remote users. If there are n remote users, they keep being laid out down the page.

Once done, you can run the demo from GitHub and try this video chat room yourself.

Next, let's talk about the implementation of WebRTCHelper:

First, as shown where this class is used, we obtain the singleton and set the delegate:

It is very simple. It initializes the instance and two properties: _connectionDic, which holds the RTCPeerConnection instances, and _connectionIdArray, which holds the IDs of connected users.

Then we call connectServer:.

This method connects to our socket server. Here we use WebSocket; the framework is SocketRocket, and I won't go into details about its usage; see "iOS instant messaging, from beginner to 'give up'?".

Here we set ourselves as the delegate and open the connection; after the connection succeeds, the delegate callback fires.

After the connection succeeds, we call the join-room method with the room number we configured at the start:

Joining the room just means sending this JSON data, __join, to the server over the socket.

Next comes the server-side logic. When the server receives that data, it sends us a message like this:

This message type is _peers. It returns our ID in this room; receiving it means we have joined the room successfully and can do a series of initialization. The connections field being empty indicates there is currently no one else in the room. If someone is there, it returns a string like this:

Here, connections contains the IDs of the users already in the room.

Next is the core delegate method of this whole class: the processing after a socket message is received:

Here we handle 6 events. These 6 events are the signaling events we discussed at length, though they are only part of the picture.

Let's briefly walk through the handling of these 6 signaling events:

Note: the 6 events below are presented in the order of the code. I encourage you to run the demo with breakpoints yourself; since the various events interleave, the actual order of incoming messages would be chaotic to present, so here we simply follow the order of the code.

1. Receiving _peers:

This shows that we have just joined a new room, so we need to initialize some local state, including adding the IDs to _connectionIdArray, and initializing the factory for point-to-point connection objects:

Create local video stream:

Here we use the system's AVCaptureDevice and AVAuthorizationStatus, together with RTC's RTCVideoCapturer, RTCVideoSource, RTCVideoTrack, and a series of related classes to complete the initialization of the local stream _localStream. The specific usage is relatively simple; take a look at the code, I won't walk through it.

We then create the core point-to-point connection objects:

Roughly, these two methods create the RTCPeerConnection instances and set ourselves as their RTCPeerConnectionDelegate. Finally, each instance is saved into our _connectionDic, keyed by the other side's ID.
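The bookkeeping described here, one connection per remote user keyed by that user's ID, can be sketched in Python (mirroring _connectionDic and _connectionIdArray; the class name is hypothetical):

```python
# Sketch of the helper's per-peer bookkeeping: one connection object
# per remote user, keyed by that user's id.
class ConnectionStore:
    def __init__(self):
        self.connection_dic = {}   # user id -> connection object
        self.connection_ids = []   # insertion-ordered user ids

    def add(self, user_id, connection):
        self.connection_dic[user_id] = connection
        self.connection_ids.append(user_id)

    def close(self, user_id):
        # Closing one peer's connection must not disturb the others.
        self.connection_dic.pop(user_id, None)
        if user_id in self.connection_ids:
            self.connection_ids.remove(user_id)
```

Keying by the remote user's ID is what lets a later remove_peer signaling message tear down exactly one connection while leaving the rest untouched.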

Then we add the stream to all the RTCPeerConnection instances:

Finally, because we are the newly joined user, we create the offers:

We iterate over the connection dictionary, creating an offer for each connection, with the role set to the initiator, RoleCaller.

createOfferWithDelegate is an instance method of RTCPeerConnection; it creates an offer and sets ourselves as its RTCSessionDescriptionDelegate.

At this point we notice that, besides the SRWebSocket delegate, two more delegates have appeared: RTCPeerConnectionDelegate, for the point-to-point connection, and RTCSessionDescriptionDelegate, for creating the offer.

This may feel a bit messy: we haven't even finished the delegate that receives socket messages, and already there are this many delegates.

Let's look at all the surrogate methods first.

[Image: QQ screenshot 20170306101436.png]

Altogether, the methods above belong to the socket, the point-to-point connection object, and the SDP (offer or answer).

It is easy to see why the first two need delegates: their callbacks come over the network. But why does SDP need a delegate? With that question in mind, let's look at the two delegate methods of RTCSessionDescriptionDelegate:

The first delegate method above is called when an SDP is created; note that we can only create the local machine's SDP. We called createOfferWithDelegate earlier, and if creation succeeds this delegate fires; in it we set the SDP on this connection.

However, calling setLocalDescriptionWithDelegate to set the local SDP triggers its second delegate method (mirrored by setRemoteDescriptionWithDelegate, which sets the remote SDP):

This second method is called after an SDP is set, whether local or remote. Here we decide, based on _role, whether to wrap the SDP in offer-type or answer-type data. Finally the data is sent via _socket to the server, which forwards it to the user with the designated socketId.
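The role decision just described can be sketched like this: after the local SDP is set, a caller wraps it as an offer and a callee as an answer before sending it to the designated socketId. The event and field names here are assumptions modeled loosely on the demo:

```python
# Sketch of the role-based wrapping: the same SDP payload becomes an
# "offer" or an "answer" depending on which side we are.
ROLE_CALLER, ROLE_CALLEE = "caller", "callee"

def wrap_sdp(role, sdp, socket_id):
    # The initiator (RoleCaller in the article) sends an offer;
    # the responder replies with an answer carrying its own SDP.
    event = "__offer" if role == ROLE_CALLER else "__answer"
    return {"eventName": event,
            "data": {"sdp": sdp, "socketId": socket_id}}
```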

Note: this socketId was obtained from connections when we entered the room, or from an offer we received from someone already in the room.

Thus the process of generating, binding, and sending the SDP is complete.

Then we return to the didReceiveMessage method of SRWebSocketDelegate.

2. The second signaling event: _ice_candidate

We covered this event in the principles section. Its data is essentially the other client's public network address, issued by the STUN server for the sake of NAT/firewall traversal.

When we receive this event, we need to save the peer's address into the point-to-point connection object.

Let's take a look at the code:

Here we create an RTCICECandidate instance, candidate, which identifies the remote address, and add it to the peerConnection corresponding to that ID.

So far we have only seen the remote _ice_candidate being received, but we know our client must send its own as well. Where does that happen?

Let's take a look at RTCPeerConnectionDelegate.

When we create the peerConnection, it takes the ICEServers array we supplied during initialization and requests candidates from those ICE server addresses; when an ICECandidate is obtained, this delegate method is called, and we use the socket to send our own network address to the other end.

Speaking of ICEServers: we need a STUN server here, and we use Google's: