Implementation Practice of AI-Driven Super-Resolution Technology

2021/01/06 11:35

In recent years, with the rapid development of deep learning, AI-based super-resolution has shown broad application prospects in image restoration and image enhancement, and has attracted wide attention from both academia and industry. In real-time communication (RTC) video, however, many AI algorithms cannot meet the requirements of practical scenarios. This article focuses on taking AI technology from research to deployment, and shares the opportunities and challenges of applying super-resolution in the RTC field.

1、 Overview of super-resolution technology

1. The proposal of super-resolution technology


The concept of super-resolution was first proposed by Harris and Goodman in the 1960s. It refers to generating a high-resolution image from a low-resolution image through some algorithm or model while recovering as much detail as possible; it was also known as spectral extrapolation. In the early days, however, spectral extrapolation was only used in simulations under certain assumptions and was not widely recognized. Only after single-image super-resolution methods were proposed did super-resolution begin to be widely studied and applied. Today it is an important research direction in image enhancement and in computer vision more broadly.

2. Classification of super-resolution technology


Single-image super-resolution methods can be divided, by principle, into interpolation-based, reconstruction-based and learning-based methods. The first two rely on relatively simple principles and suit only a limited range of scenarios, so their results in real scenes are not ideal. Learning-based methods give the best results in practice. Their core has two parts: building the algorithm model and selecting the training set. Depending on the model and training set, learning-based methods can be further divided into traditional learning methods and deep learning methods. Generally speaking, traditional learning methods use relatively simple models and relatively small training sets. Deep learning methods generally refer to convolutional neural networks trained on large amounts of data, and are currently a hot research topic in academia. The rest of this section therefore focuses on the development of deep-learning-based super-resolution methods.
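To make the interpolation-based category concrete, here is a minimal sketch of bilinear upscaling in plain Python. The function name `bilinear_upscale` and the tiny 2x2 test image are illustrative assumptions, not something taken from the article. The point it shows is that interpolation only blends existing pixel values, so it cannot recover detail that is missing from the low-resolution input.

```python
def bilinear_upscale(img, scale):
    """Upscale a 2D grayscale image (list of rows) by an integer factor."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h * scale):
        # Map the centre of the output pixel back into input coordinates,
        # clamping at the image border.
        y = min(max((i + 0.5) / scale - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for j in range(w * scale):
            x = min(max((j + 0.5) / scale - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            # Blend the four neighbouring input pixels.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Hypothetical 2x2 low-resolution image, upscaled to 4x4.
low_res = [[0.0, 100.0],
           [100.0, 200.0]]
high_res = bilinear_upscale(low_res, 2)
```

Every output value lies between the smallest and largest input value, which is one way to see why interpolation alone tends to look soft on real content.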

3. DL-based SR


SRCNN was the first attempt to apply deep learning to the super-resolution problem. It is a relatively simple convolutional network made up of three convolutional layers, each with a different role. The first layer mainly extracts high-frequency features, the second performs the nonlinear mapping from low-definition features to high-definition features, and the last reconstructs the high-resolution image. SRCNN's structure is simple and its results leave room for improvement, but it established the basic approach of deep learning to super-resolution. Later deep learning methods largely follow this approach.
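To give a sense of how small this three-layer design is, the sketch below counts the parameters of each layer. The kernel sizes and channel counts (9-1-5 kernels with 64 and 32 filters on a single-channel input) are an assumption based on the commonly cited configuration of the original SRCNN paper; the article itself does not state them.

```python
def conv_params(kernel, in_ch, out_ch, bias=True):
    """Number of learnable parameters in one 2D convolutional layer."""
    weights = kernel * kernel * in_ch * out_ch
    return weights + (out_ch if bias else 0)


# (role, kernel size, input channels, output channels)
# Assumed 9-1-5 configuration; not given in the article.
SRCNN_LAYERS = [
    ("feature extraction", 9, 1, 64),
    ("nonlinear mapping", 1, 64, 32),
    ("reconstruction", 5, 32, 1),
]


def total_params(layers, bias=True):
    return sum(conv_params(k, cin, cout, bias) for _, k, cin, cout in layers)


weights_only = total_params(SRCNN_LAYERS, bias=False)
with_biases = total_params(SRCNN_LAYERS, bias=True)
```

Under these assumptions the whole network has only a few thousand parameters, which is tiny compared with the deeper networks discussed later.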

Later networks such as ESPCN and FSRCNN improved on SRCNN, but they were still relatively shallow, with no more than 10 convolutional layers, and their results were still not ideal. The reason is that training deep convolutional networks was problematic at the time. In general, a convolutional neural network performs better as layers are added. In practice, however, once the network becomes deep enough, gradients vanish during backpropagation, so the network converges poorly and performance drops. This problem was not solved until ResNet introduced the residual network structure.
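The vanishing-gradient effect can be illustrated with a toy scalar chain rather than a real network. The sketch below assumes sigmoid activations and a single shared weight, both chosen for simplicity and not taken from the article. It multiplies the local derivatives along the chain, as backpropagation does, and compares a plain chain with one that adds a skip connection in the spirit of a residual block.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth, weight=1.0, residual=False):
    """Derivative of the last activation with respect to the input.

    Plain chain:     a_next = sigmoid(weight * a)
    Residual chain:  a_next = a + sigmoid(weight * a)
    """
    a = 0.0      # input value
    grad = 1.0   # running product of local derivatives (chain rule)
    for _ in range(depth):
        s = sigmoid(weight * a)
        local = weight * s * (1.0 - s)   # derivative of sigmoid(weight * a)
        if residual:
            a = a + s
            grad *= 1.0 + local          # the skip path contributes the 1.0
        else:
            a = s
            grad *= local
    return grad


plain_shallow = input_gradient(3)
plain_deep = input_gradient(20)
residual_deep = input_gradient(20, residual=True)
```

Because the sigmoid derivative never exceeds 0.25, the plain chain's gradient shrinks geometrically with depth, while the skip connection keeps each factor at or above 1.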

VDSR was the first to apply residual networks and residual learning to super-resolution, and the first to increase the depth of a super-resolution network to 20 layers. Its advantage is that residual learning lets the network learn the residual features directly, so it converges faster and produces better results.

Later convolutional networks proposed more complex structures. SRGAN, for example, uses a generative adversarial network (GAN) to generate high-resolution images. It consists of two parts, a generator network and a discriminator network. The generator produces a high-resolution image from a low-resolution image, while the discriminator tries to identify the generated image as fake. During training the two networks play against each other until they reach a balance, which yields high-resolution images with realistic details and textures and better subjective visual quality.

Other deep convolutional methods such as SRDenseNet, EDSR and RDN use still more complex structures. Networks keep getting deeper, and results on single images keep improving.
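The two ideas above, residual learning and adversarial training, are often written as follows. The notation is an assumption introduced here for illustration: $x$ is the ground-truth high-resolution image, $\tilde{y}$ the interpolated low-resolution input, $f_\theta$ the network, $G$ the generator and $D$ the discriminator. These are the standard textbook forms, not formulas quoted from the article.

```latex
% Residual learning: the network predicts only the missing detail r = x - \tilde{y}
\hat{x} = \tilde{y} + f_\theta(\tilde{y}), \qquad
\mathcal{L}(\theta) = \tfrac{1}{2}\,\bigl\| (x - \tilde{y}) - f_\theta(\tilde{y}) \bigr\|^2

% Adversarial training: generator G and discriminator D play a minimax game
\min_{G}\,\max_{D}\;
\mathbb{E}_{x}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{y}\bigl[\log\bigl(1 - D(G(y))\bigr)\bigr]
```

In the first formula most of the low-frequency content is already present in $\tilde{y}$, so the network only has to fit the comparatively sparse residual, which is consistent with the faster convergence described above.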

 Overall development trend

The overall trend can be summarized as: from traditional methods to deep learning methods, and from simple convolutional networks to deep residual networks. Along the way, model structures have become more complex, networks have become deeper, and single-image results have improved, but this also brings some problems.

2、 Requirements of real-time video tasks and challenges of SR

 Requirements for video processing tasks

In the RTC field, most video processing tasks are instant communication scenarios such as live streaming and conferencing, which place high demands on real-time performance, so the real-time performance of a video processing algorithm comes first. The second requirement is practicality. During a live stream or conference, the video captured by the camera is sometimes of low quality and may contain a lot of noise. In addition, video is compressed when it is encoded for transmission, and compression further degrades image quality. Real RTC scenarios are therefore relatively complex, and many video processing methods, super-resolution included, perform well only under the idealized conditions used in research. Finally, video tasks must also improve the experience of users, especially mobile users, by reducing the algorithm's consumption of computing resources so that it can run on more terminals and devices.
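The real-time requirement can be expressed as a per-frame time budget. The sketch below is a back-of-the-envelope check; the frame rate and the timing figures in the usage lines are hypothetical numbers chosen for illustration, not measurements from the article.

```python
def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def fits_real_time(inference_ms, fps, other_stages_ms=0.0):
    """True if super-resolution plus the other per-frame stages
    (decode, render, and so on) fit inside one frame interval."""
    return inference_ms + other_stages_ms <= frame_budget_ms(fps)


# Hypothetical figures: a 30 fps call with 10 ms spent on other stages.
budget = frame_budget_ms(30)
small_model_ok = fits_real_time(inference_ms=20.0, fps=30, other_stages_ms=10.0)
large_model_ok = fits_real_time(inference_ms=40.0, fps=30, other_stages_ms=10.0)
```

At 30 frames per second the whole pipeline has roughly 33 ms per frame, so a model that takes longer than that on the target device cannot keep up no matter how good its output is.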

Measured against these requirements, current super-resolution methods, especially those based on deep learning, have many problems. Most academic research on super-resolution is still at the theoretical stage. For image super-resolution, and especially video super-resolution, to be deployed at scale, several practical problems must be solved. The first is the network model. To pursue better results, many current deep learning methods use large models with ever more parameters, which consume a lot of computing resources and cannot run in real time in many practical scenarios. The second is generalization. Every deep learning model depends on its training set: models trained on different data perform differently in different scenarios, and a model trained on public datasets may not perform as well in a real application. The last is super-resolution quality in real scenes. Most academic methods address an idealized setting, reconstructing a high-resolution image from a downsampled one. In real scenes, however, image degradation comes not only from downsampling but also from many other factors such as compression, noise and blur.
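The gap between the idealized setting and real scenes can be written as two degradation models. The notation below is an assumption introduced for illustration and follows a formulation commonly used in the super-resolution literature: $x$ is the high-resolution image, $y$ the observed low-resolution image, $k$ a blur kernel, $\downarrow_s$ downsampling by a factor $s$, $n$ additive noise, and $\mathcal{C}$ lossy compression.

```latex
% Idealized setting: the low-resolution image is only a downsampled copy
y_{\text{ideal}} = x \downarrow_s

% Real scenes: blur, downsampling, noise and compression all contribute
y_{\text{real}} = \mathcal{C}\bigl( (x \otimes k) \downarrow_s + n \bigr)
```

A model trained only on pairs generated by the first equation has never seen the effects of $k$, $n$ or $\mathcal{C}$, which is one way to understand the generalization problem described above.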

To sum up, the main challenge for AI-based super-resolution in RTC video tasks is how to achieve video quality enhancement that works well in real scenes with a relatively small network, that is, how to "make the horse run faster while eating less grass".

3、 The development trend of video super-resolution technology

First, deep learning methods will remain the mainstream of super-resolution algorithms.

Traditional methods do not perform well on super-resolution tasks, and the details they recover are relatively poor. Deep learning offers a new approach. In recent years, methods based on convolutional neural networks have gradually become mainstream, and their results keep improving.

[Figure: Deep learning method]

As the figure above shows, in recent years papers on AI-based super-resolution have far outnumbered those on traditional methods, and this gap is expected to widen in the next few years. Although some problems remain, the emergence of lightweight networks means deep learning methods may achieve bigger breakthroughs in practical deployment, and these problems may be solved. Deep learning will remain the mainstream research direction for super-resolution.

Second, lightweight networks with fewer parameters will play a greater role in bringing super-resolution algorithms into production.

Current deep convolutional methods, including deep residual networks such as EDSR and RDN, struggle to meet the needs of real-time video transmission, so smaller, lightweight networks are better suited to real-time tasks.
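A rough cost estimate shows why model size matters so much for real-time video. The sketch below assumes a plain stack of 3x3 convolutions running at the input resolution, ignores biases and the first and last layers, and uses hypothetical depths, widths and frame sizes. None of these figures describe EDSR, RDN or any network from the article.

```python
def conv_stack_params(depth, width, kernel=3):
    """Weights in `depth` conv layers that each map `width` channels
    to `width` channels (biases and first/last layers ignored)."""
    return depth * kernel * kernel * width * width


def macs_per_frame(params, frame_width, frame_height):
    """Multiply-accumulates per frame when every layer runs at full
    frame resolution with stride 1: each weight is used once per pixel."""
    return params * frame_width * frame_height


# Hypothetical configurations for comparison.
large = conv_stack_params(depth=32, width=256)
small = conv_stack_params(depth=8, width=32)

# Hypothetical 640x360 input at 30 frames per second.
large_macs_per_second = macs_per_frame(large, 640, 360) * 30
small_macs_per_second = macs_per_frame(small, 640, 360) * 30
ratio = large // small
```

Because the cost grows linearly with depth and quadratically with width, shrinking both is the most direct way to bring a network within a real-time budget, at the price of reduced model capacity.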

Third, future super-resolution methods will focus more on real-scene tasks.

Most academic SR methods focus on the downsampling problem and do not perform well in real scenes, where images suffer from many kinds of degradation. More targeted methods, such as super-resolution that also accounts for compression loss, coding loss and various kinds of noise, may be more practical.

[Figure: Training pipeline of academic super-resolution models]

4、 NetEase Yunxin AI super-resolution algorithm


In the RTC field, raw video is too large to send directly, so it is encoded, transmitted to the receiver, and then decoded and played back. Encoding is essentially video compression, so when the network is poor the quantization parameter becomes large and compression becomes severe, producing blocking artifacts and other distortions in the output and blurring the picture. If the decoded video is super-resolved directly in this situation, the compression loss is amplified as well, and the result is often unsatisfactory. To address these problems, NetEase Yunxin proposed a video super-resolution method based on coding-loss recovery. It combines data-driven and network-design strategies, simulates real distortion scenarios through data processing, and optimizes layer by layer from model design to engineering implementation. This has produced some breakthroughs on the two major problems that restrict AI super-resolution, with good results in both real-time model performance and super-resolution quality in real scenes.
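The idea of simulating real distortion through data processing can be sketched as a small pipeline that turns a clean high-resolution frame into a degraded low-resolution training input. This is a toy illustration under stated assumptions, not NetEase Yunxin's actual pipeline: all function names are hypothetical, and coarse value quantization stands in for real video encoding, which a production pipeline would perform with an actual codec at varying quantization parameters.

```python
import random


def box_downsample(img, scale):
    """Average each scale x scale block (blur and downsample in one step)."""
    h, w = len(img) // scale, len(img[0]) // scale
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            block = [img[i * scale + di][j * scale + dj]
                     for di in range(scale) for dj in range(scale)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def add_noise(img, sigma, rng):
    """Add Gaussian noise to mimic camera sensor noise."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]


def quantize(img, step):
    """Crude stand-in for codec quantization: a larger step loses more
    detail. Values are clamped to the 0..255 range."""
    return [[min(255.0, max(0.0, round(p / step) * step)) for p in row]
            for row in img]


def make_training_pair(hr, scale=2, sigma=2.0, step=16, seed=0):
    """Return (degraded low-resolution input, clean high-resolution target)."""
    rng = random.Random(seed)
    lr = quantize(add_noise(box_downsample(hr, scale), sigma, rng), step)
    return lr, hr


# Hypothetical 8x8 gradient image used as the clean frame.
hr = [[float((x * 16 + y * 8) % 256) for x in range(8)] for y in range(8)]
lr, target = make_training_pair(hr)
```

Training on pairs like these exposes the model to noise and compression-like loss as well as downsampling, so that it learns to restore the degradation rather than amplify it.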

[Figure: Algorithm strategy]

The above is some of NetEase Yunxin's practical experience in bringing AI-driven super-resolution into production; we hope it offers some inspiration and reference.

For more technical content, follow the WeChat official account [NetEase Smart Enterprise Technology+].
