
H.264

A high-compression digital video codec standard.
H.264, also known as MPEG-4 Part 10, is a video codec standard developed by the Joint Video Team (JVT, Joint Video Team), formed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The standard is commonly referred to as H.264/AVC (or AVC/H.264, H.264/MPEG-4 AVC, or MPEG-4/H.264 AVC).
The main components of the H.264 standard include the access unit delimiter, SEI (Supplemental Enhancement Information), the primary coded picture, and the redundant coded picture, as well as Instantaneous Decoding Refresh (IDR), the Hypothetical Reference Decoder (HRD), and the Hypothetical Stream Scheduler (HSS).
Foreign name: H.264
Properties: Digital video compression format
Main parts: Access unit delimiter
Type: Technical terminology

Background

H.264 is the next-generation digital video compression format after MPEG-4, jointly proposed by the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU). H.264 is one of the video codec standards in the ITU-T H.26x series. It was developed by the Joint Video Team (JVT) formed by ITU-T's VCEG (Video Coding Experts Group) and ISO/IEC's MPEG (Moving Picture Experts Group). The standard originated from an ITU-T project called H.26L; although the name H.26L is not widely known, it was used throughout development. H.264 is the ITU-T name, following the H.26x series; AVC is the ISO/IEC MPEG name.
Two international organizations develop video codec standards. One is the ITU (ITU-T), whose standards include H.261, H.263, and H.263+; the other is the International Organization for Standardization (ISO), whose standards include MPEG-1, MPEG-2, and MPEG-4. H.264 is a new video coding standard developed by the Joint Video Team (JVT) established jointly by the two organizations, so it is both ITU-T H.264 and ISO/IEC MPEG-4 Advanced Video Coding (AVC). Therefore MPEG-4 AVC, MPEG-4 Part 10, and ISO/IEC 14496-10 all refer to H.264.
Collection of draft proposals began in January 1998, and the first draft was completed in September 1999. The test model TML-8 was produced in May 2001, the FCD (Final Committee Draft) of H.264 was adopted at the 5th JVT meeting in June 2002, and the standard was officially released in March 2003. In 2005, the higher-level MVC and SVC extensions of H.264 were developed.
Shortly after the ITU and ISO/IEC MPEG released the H.264 standard, they issued a call for technical proposals for the next-generation video codec standard, H.265. The performance targets set for H.265 were: twice the compression efficiency of H.264, without a significant increase in encoding and decoding computational load. According to the review at the 2009 Xi'an meeting organized by MPEG, no technical proposal had yet reached these targets.
H.264 is built on MPEG-4 technology. Its encoding and decoding process consists of five main parts: inter and intra prediction (estimation), transform and inverse transform, quantization and inverse quantization, loop filtering (Loop Filter), and entropy coding (Entropy Coding).
The main goal of the H.264 standard is to provide better image quality at the same bandwidth than existing video coding standards. With this standard, the compression efficiency at the same image quality is about 2 times that of the previous standard (MPEG-2).
H.264 provides 11 levels and 7 profiles of sub-protocol formats (algorithms). Levels constrain the external environment, such as bandwidth requirements, memory requirements, and network performance; the higher the level, the higher the bandwidth requirement and the better the video quality. Profiles define the feature subset of the encoder used for specific applications, and standardize encoder complexity for different application environments.

Advantages

1. Low bit rate: compared with compression technologies such as MPEG-2 and MPEG-4 ASP, at the same image quality the amount of data after H.264 compression is only 1/8 that of MPEG-2 and 1/3 that of MPEG-4. [1]
2. High-quality images: H.264 can provide continuous, smooth, high-quality images (DVD quality). [1]
3. Strong fault tolerance: H.264 provides the tools needed to handle errors such as packet loss that occur in unstable network environments. [1]
4. Strong network adaptability: H.264 provides a Network Abstraction Layer (NAL), which allows H.264 files to be transmitted easily over different networks (such as the Internet, CDMA, GPRS, WCDMA, CDMA2000, etc.). [1]
The biggest advantage of H.264 is its high data compression ratio. At the same image quality, the compression ratio of H.264 is more than 2 times that of MPEG-2 and 1.5 to 2 times that of MPEG-4. For example, if the original file size is 88 GB, compression with the MPEG-2 standard yields 3.5 GB, a compression ratio of 25:1, while compression with the H.264 standard yields 879 MB, from 88 GB down to 879 MB, a compression ratio of 102:1. The low bit rate plays an important role in H.264's high compression ratio. Compared with MPEG-2, MPEG-4 ASP, and other compression technologies, H.264 greatly saves users' download time and data traffic charges. In particular, H.264 combines a high compression ratio with high-quality, smooth images, which is why video data compressed with H.264 requires less bandwidth during network transmission and is more economical. [1]
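As a quick check on the figures above, the stated ratios can be reproduced with simple arithmetic (sizes taken from the paragraph; the 88 GB source is expressed in MB for the division):

```python
# Compression-ratio arithmetic for the example in the text.
# Source: 88 GB raw video; MPEG-2 output: 3.5 GB; H.264 output: 879 MB.
SOURCE_MB = 88 * 1024        # 88 GB expressed in MB
MPEG2_MB = 3.5 * 1024        # 3.5 GB in MB
H264_MB = 879

mpeg2_ratio = SOURCE_MB / MPEG2_MB   # about 25:1, as stated
h264_ratio = SOURCE_MB / H264_MB     # about 102:1, as stated

print(int(mpeg2_ratio), int(h264_ratio))  # 25 102
```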

Characteristics

The main features of the H.264 standard are as follows:
1. Higher coding efficiency: compared with standards such as H.263, H.264 saves more than 50% of the bit rate on average at the same rate-distortion performance.
2. High-quality video: H.264 can provide high-quality video images at low bit rates; high-quality image transmission over low bandwidth is a highlight of H.264 applications.
3. Improved network adaptability: H.264 can work in the low-latency mode required by real-time communication applications (such as video conferencing), and can also work with video storage or video streaming servers.
4. Hybrid coding structure: like H.263, H.264 uses a hybrid coding structure of DCT transform coding plus DPCM differential coding, and adds coding methods such as multi-mode motion estimation, intra prediction, multi-frame prediction, content-based variable-length coding, and a 4x4 two-dimensional integer transform, which improve coding efficiency.
5. Few coding options: when encoding with H.263, it is often necessary to set quite a number of options, which increases the difficulty of encoding; H.264 instead follows a "back to basics" design that reduces coding complexity.
6. Applicable in different scenarios: H.264 can use different transmission and playback rates according to different environments, and provides rich error-handling tools that can control or eliminate packet loss and bit errors.
7. Error recovery: H.264 provides tools to handle packet loss during network transmission, suitable for transmitting video data over wireless networks with high bit-error rates.
8. High complexity: the performance improvement of H.264 comes at the cost of increased complexity. It is estimated that the computational complexity of H.264 encoding is about 3 times that of H.263, and the decoding complexity is about 2 times that of H.263.

Technology

Like the previous standards, H.264 is a hybrid coding scheme of DPCM plus transform coding. However, it adopts a simple "back to basics" design without many options, yet achieves much better compression performance than H.263++. It has enhanced adaptability to diverse networks: its "network friendly" structure and syntax facilitate handling of bit errors and packet loss. Its target application range is wide, meeting the requirements of different rates, resolutions, and transmission (storage) scenarios.
Technically, it combines the advantages of previous standards and absorbs the experience accumulated in their development. Compared with H.263 v2 (H.263+) or the MPEG-4 simple profile, H.264 with a similar best-effort encoder can save up to 50% of the bit rate in most cases. H.264 can consistently provide high video quality. H.264 can work in a low-latency mode to suit real-time communication applications (such as video conferencing), and also works well in applications without delay restrictions, such as video storage and server-based video streaming. H.264 provides the tools needed to handle packet loss in packet-switched transport networks and bit errors in error-prone wireless networks.
At the system level, H.264 introduces a new concept: a conceptual separation between the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The former is the representation of the core compressed video content; the latter is the representation for delivery over a specific type of network. This structure facilitates information encapsulation and better priority control of information.
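To make the NAL side of this split concrete, here is a minimal sketch of parsing the one-byte NAL unit header defined by the standard (fields: forbidden_zero_bit, nal_ref_idc, nal_unit_type); the example byte values are typical but chosen for illustration:

```python
# Parse the one-byte H.264 NAL unit header:
#   forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)
NAL_TYPES = {
    1: "non-IDR slice", 5: "IDR slice", 6: "SEI",
    7: "sequence parameter set", 8: "picture parameter set",
    9: "access unit delimiter",
}

def parse_nal_header(byte):
    return {
        "forbidden_zero_bit": (byte >> 7) & 1,
        "nal_ref_idc": (byte >> 5) & 0x3,
        "nal_unit_type": byte & 0x1F,
    }

hdr = parse_nal_header(0x67)               # 0x67 is a typical SPS header byte
print(NAL_TYPES[hdr["nal_unit_type"]])     # sequence parameter set
```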

Coding

H.264/AVC mainly supports encoding and decoding of progressive or interlaced video in 4:2:0 format; 4:2:2 and 4:4:4 formats can serve as reference information. H.264/AVC adopts the classic motion-compensated hybrid coding framework, with three types of coded pictures: I frames, P frames, and B frames. Unlike previous coding standards, H.264 newly defines SP frames and SI frames. These two new frame types can be used to switch between streams of different image quality or different transmission rates, and when information is lost, they allow fast recovery. [3]
1. Intra-frame predictive coding
Intra coding is used to reduce the spatial redundancy of the image. To improve the efficiency of H.264 intra coding, the spatial correlation between adjacent macroblocks within a given frame is fully exploited: adjacent macroblocks usually have similar attributes. Therefore, when coding a given macroblock, a prediction is first made from the surrounding macroblocks (typically the upper-left, left, and upper macroblocks, since these have already been encoded), and then the difference between the predicted value and the actual value is encoded, which greatly reduces the bit rate compared with encoding the frame directly.
H.264 provides 9 modes for 4x4 pixel block prediction, including 1 DC prediction mode and 8 directional prediction modes. The pixels A to I of adjacent blocks have already been coded and can be used for prediction. For example, if mode 4 is selected, then pixels a, b, c, d are predicted to equal E, and pixels e, f, g, h are predicted to equal F. For image regions with little spatial detail (flat areas), H.264 also supports 16x16 intra coding.
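The idea can be sketched for two of the nine 4x4 modes, vertical prediction and DC prediction; the neighbor pixel values below are hypothetical, and the DC rounding follows the usual rounded-mean rule:

```python
# 4x4 intra prediction sketch: vertical and DC modes.
# 'above' are the 4 reconstructed pixels above the block,
# 'left' the 4 reconstructed pixels to its left (hypothetical values).
above = [100, 102, 104, 106]
left = [98, 99, 101, 103]

def predict_vertical(above):
    # Each column copies the pixel directly above it.
    return [above[:] for _ in range(4)]

def predict_dc(above, left):
    # All 16 pixels take the rounded mean of the 8 neighbours.
    dc = (sum(above) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]

print(predict_vertical(above)[3])   # [100, 102, 104, 106]
print(predict_dc(above, left)[0])   # [102, 102, 102, 102]
```

The encoder would compute all candidate predictions, pick the one minimizing the residual, and encode only the mode index and the residual.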
After H.264 selects an intra prediction mode, the difference between the predicted value and the actual value is transformed, quantized, and entropy coded. At the same time, the coded bitstream is inverse quantized and inverse transformed to obtain the prediction residual; the residual image is added to the predicted value to obtain the reconstructed frame, which is smoothed by the loop filter and then stored in the frame memory. [3]
2. Inter-frame predictive coding
Inter prediction coding uses temporal redundancy for motion estimation and compensation. H.264 motion compensation supports most of the key features of previous video coding standards and flexibly adds more functions. In addition to P frames and B frames, H.264 also supports a new inter-stream transition frame, the SP frame. When the bitstream contains SP frames, it can quickly switch between bitstreams with similar content but different bit rates, and it supports random access and fast playback modes. H.264 motion estimation has the following four characteristics.
(1) Macroblock partitions of different sizes and shapes
Motion compensation for each 16x16 pixel macroblock can use partitions of different sizes and shapes; H.264 supports seven modes, as shown in Figure 4. Motion compensation with smaller blocks improves the handling of motion detail, reduces blocking artifacts, and improves image quality. (Figure 4: macroblock partitioning modes)
(2) High-precision sub-pixel motion compensation
H.263 uses half-pixel precision motion estimation, while H.264 can use 1/4- or 1/8-pixel precision. At the same required precision, the residual after 1/4- or 1/8-pixel motion estimation in H.264 is smaller than the residual after half-pixel motion estimation in H.263, so at the same precision, H.264 inter coding requires a lower bit rate.
(3) Multi-frame prediction
H.264 provides an optional multi-frame prediction function: during inter coding, up to 5 different reference frames can be selected, which provides better error resilience and can improve video image quality. This feature is mainly useful in the following situations: periodic motion, translational motion, and the camera switching back and forth between two different scenes.
(4) Deblocking filter
H.264 defines an adaptive deblocking filter that processes the horizontal and vertical block edges within the prediction loop, greatly reducing blocking artifacts.
When H.264 uses inter coding, motion estimation is first performed against the reference frames; then the residual image after motion estimation, together with the motion vectors, is integer transformed, quantized, entropy coded, and sent to the channel. [3]
3. Integer transform
For the transform, H.264 uses a DCT-like transform on 4x4 pixel blocks, but one based on integer arithmetic, so the inverse transform introduces no mismatch error; the transformation matrix is shown in Figure 5. Compared with floating-point operation, the integer DCT-like transform introduces some additional error, but since the quantization that follows the transform also introduces quantization error, the extra error caused by the integer transform has little effect by comparison. In addition, the integer transform reduces computation and complexity, which facilitates porting to fixed-point DSPs.
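A sketch of the 4x4 forward core transform Y = Cf · X · Cf^T, using the integer matrix from the standard (the scaling factors that accompany it are folded into quantization and omitted here):

```python
# H.264 4x4 forward core transform (scaling folded into quantization).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(block):
    # Y = Cf * X * Cf^T, all in exact integer arithmetic.
    return matmul(matmul(CF, block), transpose(CF))

flat = [[10] * 4 for _ in range(4)]   # a flat block...
coeffs = forward_transform(flat)
print(coeffs[0][0])                   # ...puts all its energy in the DC term: 160
```

Because every operation is an exact integer one, encoder and decoder can compute identical results on any hardware, which is the point of replacing the real-valued DCT.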
4. Quantization
H.264 offers 52 different quantization step sizes, similar to the 31 quantization steps in H.263, but in H.264 the step size grows compoundly at a rate of about 12.5% per step rather than by a fixed constant.
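That progression can be sketched with the commonly cited base step size of 0.625 at QP 0; the compound growth (often quoted as about 12.5% per step, exactly 2^(1/6) - 1) means the step doubles every 6 QP values:

```python
# H.264 quantization step size: 52 values (QP 0..51),
# doubling every 6 QP steps, i.e. growing ~12.2% per step.
def qstep(qp):
    return 0.625 * 2 ** (qp / 6)

steps = [qstep(qp) for qp in range(52)]
print(len(steps))                          # 52
print(steps[6] / steps[0])                 # 2.0 -- doubles every 6 QP
print(round(steps[1] / steps[0] - 1, 3))   # ~0.122 growth per step
```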
In H.264 there are also two ways of reading out the transform coefficients: zigzag scan and double scan, as shown in Figure 6. In most cases the simple zigzag scan is used; the double scan is used only for blocks with smaller quantization levels, which helps improve coding efficiency. (Figure 6: transform-coefficient scan orders)
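The zigzag order for a 4x4 block can be generated by walking the anti-diagonals and reversing direction on alternate diagonals; a sketch:

```python
# Generate the 4x4 zigzag scan order by walking anti-diagonals,
# reversing every even-indexed diagonal so the path snakes through the block.
def zigzag_order(n=4):
    order = []
    for d in range(2 * n - 1):
        diag = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        if d % 2 == 0:
            diag.reverse()   # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

scan = zigzag_order()
print([4 * r + c for r, c in scan])
# [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]
```

Scanning in this order groups the low-frequency coefficients first, so the many trailing zeros after quantization compress into short run-length codes.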
5. Entropy coding
The last step of video coding is entropy coding. H.264 uses two different entropy coding methods: Universal Variable Length Coding (UVLC) and Context-based Adaptive Binary Arithmetic Coding (CABAC).
In standards such as H.263, different VLC code tables are used for different data types such as transform coefficients and motion vectors. The UVLC code table in H.264 provides a simpler approach: regardless of what type of data a symbol represents, a unified variable-length coding table is used. Its advantage is simplicity; its disadvantage is that a single code table is derived from a fixed probability distribution model, the correlation between coded symbols is not considered, and the results are not very good at medium and high bit rates.
Therefore, H.264 also provides the optional CABAC method. Arithmetic coding uses a probability model on both the encoding and decoding sides for all syntax elements (transform coefficients, motion vectors). To improve the efficiency of arithmetic coding, the underlying probability model adapts to the changing statistical characteristics of the video frames. Context modeling provides conditional probability estimates: with an appropriate context model, the correlation between symbols can be removed by selecting the probability model corresponding to already-coded symbols adjacent to the symbol being coded. Different syntax elements usually maintain different models.
The target applications of H.264 cover most video services, such as cable TV, remote monitoring, interactive media, digital TV, video conferencing, video on demand, and streaming media services.
Standard overall framework
H.264 is designed to adapt to differences among transport networks. Two layers are defined: the Video Coding Layer (VCL) is responsible for efficient representation of the video content, and the Network Abstraction Layer (NAL) is responsible for packaging and delivering the data in the appropriate manner required by the network (see figure: standard overall framework).
Baseline Profile: this profile uses all the features of H.264 except B slices, CABAC, and interlaced coding modes. It is mainly used for low-latency real-time applications.
Main Profile: contains all the features of the Baseline Profile, plus B slices, CABAC, and interlaced coding modes. It mainly targets scenarios where latency requirements are not high but the compression ratio and quality requirements are higher.
Profile X (the Extended Profile): supports all Baseline Profile features, but does not support CABAC or macroblock-based adaptive frame/field coding. This profile is mainly aimed at video streaming applications over various networks.
1. Hierarchical design: the H.264 algorithm can be conceptually divided into two layers. The Video Coding Layer (VCL) is responsible for efficient representation of the video content, and the Network Abstraction Layer (NAL) is responsible for packaging and delivering the data in the appropriate way required by the network. A packet-based interface is defined between the VCL and the NAL; packaging and the corresponding signaling belong to the NAL. In this way, the tasks of high coding efficiency and network friendliness are handled by the VCL and the NAL respectively.
The VCL includes block-based motion-compensated hybrid coding and some new features. As in previous video coding standards, H.264 does not include functions such as pre-processing and post-processing in the draft, which increases the flexibility of the standard.
The NAL is responsible for encapsulating data in the segmentation format of the underlying network, including framing, logical-channel signaling, the use of timing information, and end-of-sequence signals. For example, the NAL supports the transmission format for video over circuit-switched channels, and supports the RTP/UDP/IP format for video over the Internet. The NAL includes its own header information, segment structure information, and the actual payload, i.e., the upper-layer VCL data (if data partitioning is used, the data may consist of several parts).
2. High-precision, multi-mode motion estimation
H.264 supports motion vectors with 1/4- or 1/8-pixel accuracy. At 1/4-pixel precision, a 6-tap filter can be used to reduce high-frequency noise; for motion vectors with 1/8-pixel precision, a more complex 8-tap filter can be used. When performing motion estimation, the encoder can also select an "enhanced" interpolation filter to improve the prediction.
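The standard's half-pel luma interpolation uses the 6-tap kernel (1, -5, 20, 20, -5, 1); a one-dimensional sketch (the sample values are hypothetical):

```python
# H.264 half-pel luma interpolation: 6-tap filter (1, -5, 20, 20, -5, 1),
# rounded with +16, right-shifted by 5, then clipped to [0, 255].
TAPS = (1, -5, 20, 20, -5, 1)

def half_pel(samples, i):
    # Half-pel value between samples[i] and samples[i + 1];
    # needs the 3 full-pel neighbours on each side.
    window = samples[i - 2:i + 4]
    acc = sum(t * s for t, s in zip(TAPS, window))
    return min(255, max(0, (acc + 16) >> 5))

row = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical full-pel samples
print(half_pel(row, 3))                  # 45, between the neighbours 40 and 50
```

Quarter-pel positions are then formed by averaging adjacent full-pel and half-pel values, which is why sub-pel refinement adds little extra arithmetic beyond this filter.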
In H.264 motion prediction, a macroblock (MB) can be divided into different sub-blocks as shown in Figure 2, forming 7 block sizes in different modes. This flexible, fine-grained partitioning better matches the shapes of actual moving objects in the image and greatly improves the accuracy of motion estimation. In this way, each macroblock can contain 1, 2, 4, 8, or 16 motion vectors.
In H.264, the encoder is allowed to use more than one previous frame for motion estimation, the so-called multiple reference frame technique. For example, with 2 or 3 just-coded reference frames, the encoder chooses the frame that gives a better prediction for each target macroblock, and indicates for each macroblock which frame was used for prediction.
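A sketch enumerating the seven partition sizes and the resulting motion-vector count when a 16x16 macroblock is split uniformly into blocks of each size:

```python
# The 7 H.264 macroblock/sub-block partition sizes, and how many motion
# vectors a 16x16 macroblock carries under each uniform split.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]

def mv_count(w, h):
    # Number of motion vectors if the whole 16x16 MB uses (w, h) blocks.
    return (16 // w) * (16 // h)

counts = {f"{w}x{h}": mv_count(w, h) for w, h in PARTITIONS}
print(counts)
# {'16x16': 1, '16x8': 2, '8x16': 2, '8x8': 4, '8x4': 8, '4x8': 8, '4x4': 16}
```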
3. Integer transform of 4x4 blocks
Like previous standards, H.264 uses block-based transform coding, but the transform is an integer operation rather than a real-number operation, and its process is basically similar to the DCT. The advantage of this method is that the decoder can perform a transform and inverse transform of exactly the same precision, easily implemented in fixed-point arithmetic; in other words, there is no "inverse transform mismatch error". The transform unit is a 4x4 block instead of the 8x8 block commonly used previously. With the reduced transform block size, moving objects are partitioned more accurately; not only is the transform computation smaller, but ringing errors at the edges of moving objects are also greatly reduced. To prevent the small block size from producing gray-level differences between blocks in large smooth areas of the image, the DC coefficients of the 16 4x4 luma blocks of an intra macroblock (one per 4x4 block, 16 in total) can undergo a second 4x4 transform, and the DC coefficients of the 4 4x4 chroma blocks (one per block, 4 in total) undergo a 2x2 transform.
To improve rate-control capability, H.264 controls the quantization step size in increments of about 12.5% rather than changing it at a constant rate. The normalization of transform coefficient magnitudes is handled in the inverse quantization process to reduce computational complexity. To emphasize color fidelity, a smaller quantization step is used for the chroma coefficients.
4. Unified VLC
H.264 has two methods of entropy coding. One uses unified VLC (UVLC: Universal VLC) for all symbols to be encoded; the other uses Context-Adaptive Binary Arithmetic Coding (CABAC). CABAC is optional; its coding performance is slightly better than UVLC's, but its computational complexity is also higher. UVLC uses a code set of unbounded length with a very regular design structure, so different objects can be encoded with the same code table. This method generates codewords easily, and the decoder can easily identify a codeword's prefix, so UVLC can quickly regain synchronization when bit errors occur.
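UVLC codewords are unsigned exponential-Golomb codes, which is what gives the code set its regular, prefix-friendly structure; a minimal encoder/decoder sketch:

```python
# Unsigned exp-Golomb codes, the regular codeword family behind UVLC:
# codeNum k -> M leading zeros followed by the (M+1)-bit binary of k+1.
def ue_encode(k):
    bits = bin(k + 1)[2:]                 # binary representation of k+1
    return "0" * (len(bits) - 1) + bits   # prefix of len(bits)-1 zeros

def ue_decode(code):
    zeros = len(code) - len(code.lstrip("0"))   # count the zero prefix
    return int(code[zeros:], 2) - 1

for k in range(5):
    print(k, ue_encode(k))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
```

The zero-run prefix tells the decoder exactly how many bits follow, which is why resynchronization after a bit error is straightforward.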
5. Intra prediction
Previous H.26x-series and MPEG-x-series standards use inter-frame prediction. In H.264, intra prediction can also be used when coding intra pictures. For each 4x4 block (edge blocks receive special handling), each pixel can be predicted by a differently weighted sum of the 17 closest previously coded pixels (some weights may be 0), i.e., the 17 pixels above and to the left of the block containing the pixel. Obviously, this intra prediction is not temporal but a spatial-domain predictive coding algorithm, which can remove the spatial redundancy between adjacent blocks for more efficient compression.
Depending on the prediction reference points selected, there are nine different modes for luma, but only one mode for chroma intra prediction.
6. For IP and wireless environments
The H.264 draft contains error-resilience tools, which give compressed video robustness for transmission in environments with bit errors and packet loss, such as mobile channels or IP channels.
To prevent transmission errors, temporal synchronization of an H.264 video stream can be achieved using intra-picture refresh, and spatial synchronization is supported by slice-structured coding. At the same time, to facilitate resynchronization after bit errors, certain resynchronization points are provided in the video data. In addition, intra-macroblock refresh and multiple reference macroblocks allow the encoder to consider not only coding efficiency but also the characteristics of the transmission channel when determining the macroblock mode.
In addition to adapting to the channel bit rate by changing the quantization step size, H.264 often uses data partitioning to cope with changes in the channel bit rate. In general, data partitioning embodies the concept of graceful degradation of quality of service (QoS). For example, the syntax-based data partitioning method divides each frame's data into several parts according to importance, which allows less important information to be discarded when the buffer overflows. A similar temporal data partitioning method can also be used, achieved by using multiple reference frames.
In wireless communication applications, changes in the wireless channel's bit rate can be accommodated by changing the quantization precision or the spatial/temporal resolution. However, in multicast situations, it is impossible to require the encoder to respond to the various bit rates. Therefore, unlike the Fine Granular Scalability (FGS) coding used in MPEG-4, H.264 uses SP frames for stream switching instead of hierarchical coding.
Since the unification of the Blu-ray format, most HD video has been encoded in H.264 format. Decoding is divided into four main steps: bitstream processing, inverse transform, motion compensation, and deblocking filtering. These four steps are also the four main parts of resource consumption.
Of the four H.264 decoding steps, the first, CAVLC/CABAC decoding, consumes the most computing resources, far more than the other three (briefly, CAVLC and CABAC are two different algorithms in the H.264 coding specification, both intended to improve the compression ratio; CABAC achieves a higher compression ratio than CAVLC, but is also more demanding to decode).
If all four steps are performed by the processor as pure software decoding, then for high-bit-rate H.264 video such as HD DVD releases the processor load will be very heavy; even if the video can be played smoothly, the execution efficiency of other applications started at the same time will suffer because of the excessive pressure on the processor.
If the processor handles the CAVLC/CABAC decoding and the inverse transform while the display (GPU) core takes on motion compensation and deblocking, the pressure on the processor can be reduced to a certain extent. However, for users of single-core processors or low-end dual-core processors, this is still not a good way to handle such video; moreover, video with a higher coding rate will still cause great processing difficulty, resulting in unpredictable playback: consumers may find that some videos play smoothly while others drop frames.

Redundancy handling

Like previous international standards such as MPEG-4 and H.263, H.264 makes full use of various forms of redundancy to achieve efficient compression: statistical redundancy and visual physiological redundancy.
1. Statistical redundancy: spectral redundancy (the correlation between color components), spatial redundancy, and temporal redundancy. Temporal redundancy is the fundamental point that distinguishes video compression from still-image compression: video compression mainly exploits temporal redundancy to achieve large compression ratios.
2. Visual physiological redundancy
Visual physiological redundancy arises from the characteristics of the human visual system (HVS): for example, the human eye is less sensitive to high-frequency components, particularly those of the chroma components compared with the luma component, and is not sensitive to noise in the high-frequency (i.e., detailed) regions of an image.
Video compression algorithms use different methods for these forms of redundancy, but the main effort is focused on spatial and temporal redundancy. H.264 also adopts a hybrid structure, that is, spatial redundancy and temporal redundancy are treated separately. Spatial redundancy is eliminated by transform and quantization; frames coded this way are called I frames. Temporal redundancy is removed by inter-frame prediction, i.e., motion estimation and compensation; frames coded this way are called P frames or B frames. Unlike previous standards, when H.264 encodes I frames it uses intra prediction and then codes the prediction error, which makes full use of spatial correlation and improves coding efficiency. H.264 intra prediction takes the 16x16 macroblock as its basic unit. First, the encoder uses neighboring pixels of the same frame as the current macroblock as a reference to generate the prediction of the current macroblock; then the prediction residual is transformed and quantized, and the transformed and quantized result is entropy coded. The entropy-coded result forms the bitstream. Because the reference data are reconstructed images obtained after inverse transform and inverse quantization, to keep encoding and decoding consistent, the reference data used for prediction at the encoder are the same as at the decoder, namely the reconstructed image after inverse transform and inverse quantization.
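The closed-loop point in the last sentences, that the encoder predicts from *reconstructed* data exactly as the decoder will, can be sketched with a 1-D DPCM toy (uniform quantizer; the step size and sample values are hypothetical):

```python
# Toy 1-D DPCM loop: the encoder predicts each sample from the previously
# *reconstructed* sample, so encoder and decoder share identical reference
# data and never drift apart.
STEP = 4   # hypothetical quantizer step size

def encode(samples):
    recon_prev, out = 0, []
    for s in samples:
        residual = s - recon_prev
        q = round(residual / STEP)           # quantize the residual
        out.append(q)
        recon_prev = recon_prev + q * STEP   # encoder-side reconstruction
    return out

def decode(qs):
    recon_prev, out = 0, []
    for q in qs:
        recon_prev = recon_prev + q * STEP   # identical reconstruction rule
        out.append(recon_prev)
    return out

src = [10, 12, 20, 19, 30]
rec = decode(encode(src))
print(rec)   # close to src; every value within half a quantizer step
```

If the encoder predicted from the *original* samples instead, the decoder's reconstruction error would accumulate sample after sample; predicting from the reconstruction keeps each error bounded by the quantizer alone.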

Market

According to codec function, the H.264 market can be divided into the decoding market and the encoding market.

Decoding market

H.264 decoding products mainly include: decoding integrated circuits supporting the H.264 standard, including dedicated decoder chips and system-on-chip (SoC) devices, and decoding software supporting the H.264 standard, used in various electronic products.
Satellite HD set-top boxes were the first to use H.264 decoder chips on a large scale, because H.264 technology significantly improves compression efficiency: the number of HD TV channels carried by a satellite transponder can be increased from one to three (together with new transmission technologies such as DVB-S2). Therefore, satellite operators in the United States and Europe have used H.264 decoder chips since 2004. To date, H.264 support has become a standard feature of various HD set-top-box SoC chips, and is widely used in HD TV SoCs.
With the rapid rise of Internet video services, all kinds of smart electronic devices have successively supported video network download and playback. The H.264 standard has always been the main compression technique for network video, with a trend of gradually replacing the Flash Video format. Its main supporters are Microsoft (with Internet Explorer) and Apple Inc.; the former secures H.264's advantage in the desktop device market, the latter in the portable device market. [2]
However, Google's decision to launch the new WebM video codec technology in Chrome, without supporting H.264, has greatly challenged the prospects of H.264 in the network video market.

Encoding market

Thanks to H.264's excellent coding efficiency, it has been accepted by the video surveillance equipment market, the main market for encoding equipment.
H.264's efficient coding means that the same video program occupies less network bandwidth and storage space. The main indicators of an H.264 encoder include: supported resolution and frame rate, encoding delay, encoded bitstream compatibility, bit-rate control accuracy, and other indicators. Most encoders support resolutions up to 1920x1080 at a frame rate of 25 frames (PAL) or 30 frames (NTSC), with an encoding delay of more than 200 milliseconds.

Error recovery

Error-recovery tools have improved along with video compression technology. Older standards (H.261, H.263, and MPEG-2 Part 2) used slice and group-of-macroblocks partitioning, intra-coded macroblocks, intra-coded slices, and intra-coded pictures to limit error propagation. Later standards (H.263+, MPEG-4) added multi-frame reference and data partitioning to recover from errors.
The H.264 standard introduces three key techniques for error recovery: (1) parameter sets, (2) flexible macroblock ordering (FMO), and (3) redundant slices (RS).

Intra coding

The intra coding technology in H.264 is broadly the same as in previous standards. Two points are worth noting:
(1) The reference macroblocks for an H.264 intra-predicted macroblock may themselves be inter-coded; this differs from intra coding in H.263. Predicted intra coding is more efficient than unpredicted intra coding, but it weakens the resynchronization capability of intra coding. That capability can be restored by setting the constrained-intra-prediction flag.
(2) There are two slice types that contain only intra macroblocks: ordinary intra slices and IDR slices. An IDR slice may appear only in an IDR picture. Compared with short-term reference pictures, an instantaneous-decoding-refresh picture provides stronger resynchronization.
In a wireless IP network environment, the resynchronization performance of intra pictures can be improved through rate-distortion-optimized encoding and by setting the constrained-intra-prediction flag.

Image segmentation

H.264 supports dividing a picture into slices containing an arbitrary number of macroblocks. In non-FMO mode the macroblock order within a slice follows raster-scan order; in FMO mode it can differ. Slice partitioning can be adapted to different MTU sizes and can also be used for interleaved packetization.
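As a rough illustration of adapting slice sizes to the MTU, the sketch below greedily packs macroblocks into slices so that no slice exceeds a byte budget. The per-macroblock sizes, MTU value, and header overhead are made-up illustrative numbers; a real encoder makes this decision inside its rate-control loop.

```python
# Hypothetical sketch: greedily pack macroblocks into slices so that no
# slice payload exceeds the network MTU. Sizes are illustrative stand-ins.

def pack_slices(mb_sizes, mtu=1400, header=12):
    """mb_sizes: estimated coded size in bytes of each macroblock, in
    raster-scan order. Returns a list of slices, each a list of
    macroblock indices."""
    slices, current, used = [], [], header
    for idx, size in enumerate(mb_sizes):
        # Start a new slice if adding this macroblock would exceed the MTU
        # (a single oversized macroblock still gets its own slice).
        if current and used + size > mtu:
            slices.append(current)
            current, used = [], header
        current.append(idx)
        used += size
    if current:
        slices.append(current)
    return slices

sizes = [120, 300, 90, 800, 700, 60, 500, 400]
print(pack_slices(sizes, mtu=1000))  # → [[0, 1, 2], [3], [4, 5], [6, 7]]
```

Each resulting slice can then be carried in one network packet without IP fragmentation.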

Reference picture selection

Reference picture selection, whether at the macroblock, slice, or frame level, is an effective error-recovery tool. In a system with a feedback channel, once the encoder learns which picture regions were lost in transmission, it can choose correctly received pictures as references for the affected regions. In systems without feedback, redundant coding is used instead to improve error resilience.

Data partitioning

Normally, all the data of a macroblock is stored together to form a slice. Data partitioning reorders the macroblock data within a slice, grouping semantically related elements into partitions that together make up the slice.
H.264 defines three different data partitions:
Header partition: contains the macroblock types, quantization parameters, and motion vectors of the slice; this is the most important information in the slice.
Intra partition: contains intra CBPs and intra coefficients. Intra information can stop error propagation.
Inter partition: contains inter CBPs and inter coefficients, and is usually much larger than the other two partitions.
The intra partition combined with the header partition reconstructs intra macroblocks, and the inter partition combined with the header partition reconstructs inter macroblocks. The inter partition is the least important and contributes nothing to resynchronization. When data partitioning is used, the data in a slice is written to different buffers according to its type, and the slice size is adjusted so that the largest partition in the slice is smaller than the MTU size.
If the decoder receives all the partitions, it can reconstruct the slice completely. If the intra or inter partition is lost, the surviving header information still provides good error recovery, because the macroblock types and motion vectors capture the basic characteristics of each macroblock.
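To illustrate how data partitioning separates a slice's syntax elements by importance, here is a hedged sketch. The element names and values are made up for illustration and are not the normative H.264 syntax element names.

```python
# Hypothetical sketch of H.264 data partitioning: syntax elements of a
# slice are routed into three partitions so the transport layer can
# protect them unequally. Element names are illustrative, not normative.

PARTITION_OF = {
    "mb_type": "A", "qp": "A", "motion_vector": "A",   # header partition
    "intra_cbp": "B", "intra_coeff": "B",              # intra partition
    "inter_cbp": "C", "inter_coeff": "C",              # inter partition
}

def partition_slice(elements):
    """elements: list of (name, value) pairs for one slice."""
    parts = {"A": [], "B": [], "C": []}
    for name, value in elements:
        parts[PARTITION_OF[name]].append((name, value))
    return parts

slice_data = [("mb_type", 1), ("qp", 26), ("motion_vector", (3, -1)),
              ("inter_cbp", 0b101), ("inter_coeff", [7, 0, -2])]
parts = partition_slice(slice_data)
# Partition A alone still lets a decoder conceal the loss of B or C,
# because it carries the macroblock types and motion vectors.
print(len(parts["A"]), len(parts["B"]), len(parts["C"]))  # → 3 0 2
```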

Working with parameter sets

A sequence parameter set (SPS) contains information that applies to an entire picture sequence, while a picture parameter set (PPS) contains information that applies to all slices of a picture. Multiple different sequence and picture parameter sets are stored at the decoder. The encoder sets up picture parameter sets with reference to a sequence parameter set, and each coded slice header selects the appropriate picture parameter set by its identifier. Because so much depends on them, the error-recovery performance of H.264 hinges on protecting the sequence and picture parameter sets.
The key to using parameter sets over an error-prone channel is ensuring that they reach the decoder reliably and in time. One approach, on a real-time channel, is to send them as early as possible out-of-band over a reliable control protocol, so that they arrive before the first slice that references them. Another is application-layer protection, resending multiple copies so that at least one reaches the decoder. A third is to hard-code the parameter set values into the encoder and decoder hardware.
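The parameter sets travel in their own NAL units, which a receiver can recognize by type. The sketch below scans an Annex B byte stream for start codes and labels the NAL units it finds. The nal_unit_type values (7 = SPS, 8 = PPS, 5 = IDR slice) are from the H.264 specification; the sample byte stream itself is fabricated for illustration.

```python
# Hedged sketch: scan an Annex B byte stream for NAL units and report
# which ones are parameter sets.

NAL_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI",
             7: "SPS", 8: "PPS"}

def scan_annexb(data):
    """Return (offset, nal_unit_name) for each start-code-prefixed NAL unit."""
    i, out = 0, []
    while i < len(data) - 3:
        if data[i:i+3] == b"\x00\x00\x01":           # 3-byte start code
            nal_type = data[i+3] & 0x1F              # low 5 bits of NAL header
            out.append((i, NAL_NAMES.get(nal_type, str(nal_type))))
            i += 3
        else:
            i += 1
    return out

# Fabricated stream: an SPS (0x67), a PPS (0x68), then an IDR slice (0x65).
stream = b"\x00\x00\x01\x67aa\x00\x00\x01\x68b\x00\x00\x01\x65cc"
print(scan_annexb(stream))  # → [(0, 'SPS'), (6, 'PPS'), (11, 'IDR slice')]
```

Sending the SPS and PPS ahead of the first slice that references them, as in this stream, is exactly the "early, out-of-band or in-band" delivery discussed above.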

Flexible macroblock ordering (FMO)

Flexible macroblock ordering is a major feature of H.264. Macroblocks can be assigned arbitrarily to different slice groups via a macroblock-to-slice-group map (MBAmap). FMO disrupts the original macroblock order, which reduces coding efficiency and increases delay, but it improves error resilience. FMO can partition a picture in various patterns, including checkerboard and rectangular patterns; it can also simply divide the macroblocks of a frame sequentially so that each resulting slice is smaller than the wireless network's MTU. The slice groups produced by FMO are transmitted separately. Taking the checkerboard pattern as an example, if the data of one slice group is lost, the data of the other slice group (which contains the neighbors of every lost macroblock) can be used for error concealment. Experimental data show that at a 10% loss rate (in a video conferencing application), the concealed pictures still have high quality.
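The checkerboard pattern can be expressed as a simple macroblock map. Below is an illustrative sketch that builds such a map; the QCIF dimensions are just an example (176x144 pixels is 11x9 macroblocks of 16x16).

```python
# Illustrative sketch: build a checkerboard MBAmap (macroblock-to-slice-
# group map) as used by FMO's checkerboard pattern. Each macroblock goes
# to slice group 0 or 1, so every macroblock's four neighbors are in the
# other group and can be used for concealment if one group is lost.

def checkerboard_mbamap(width_mbs, height_mbs):
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]

# A QCIF picture (176x144) is 11x9 macroblocks of 16x16 pixels.
mbamap = checkerboard_mbamap(11, 9)
for row in mbamap[:3]:
    print(row)
```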

Redundant slices

As noted earlier, a system without feedback cannot use reference picture selection for error recovery; instead, redundant slices can be added at encoding time to improve error resilience. Note that these redundant slices are coded with different parameters from the primary (non-redundant) slices: a coarsely coded redundant slice accompanies each finely coded primary slice. The decoder decodes the primary slice when it is available and discards the redundant one; otherwise it reconstructs the picture from the coarse redundant slice.

Advantages

1. Low bit rate: compared with MPEG-2, MPEG-4 ASP, and other compression technologies, H.264 produces, at the same image quality, only about 1/8 the data of MPEG-2 and 1/3 that of MPEG-4.
2. High-quality images: H.264 can deliver continuous, smooth, high-quality video (DVD quality).
3. Strong fault tolerance: H.264 provides the tools needed to handle packet loss and other errors common in unstable network environments.
4. Strong network adaptability: H.264 provides a network abstraction layer that makes H.264 streams easy to transmit over different networks (such as the Internet, CDMA, GPRS, WCDMA, and CDMA2000).
H.264's biggest advantage is its high data compression ratio. At the same image quality, its compression ratio is more than twice that of MPEG-2 and 1.5 to 2 times that of MPEG-4. For example, an 88 GB source compressed with the MPEG-2 standard becomes 3.5 GB, a compression ratio of 25:1; compressed with the H.264 standard it becomes 879 MB, an astonishing ratio of about 102:1. H.264's low bit rate plays a major role in this high compression ratio: compared with MPEG-2, MPEG-4 ASP, and other compression technologies, H.264 greatly reduces users' download time and data traffic charges, while still delivering high-quality, smooth images. This is also why video compressed with H.264 needs less bandwidth for network transmission and is more economical.
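The ratios quoted above follow from simple arithmetic; a quick check (assuming 1 GB = 1024 MB, as the figures imply):

```python
# Verify the quoted compression ratios for the 88 GB example.

source_mb = 88 * 1024     # 88 GB source, in MB
mpeg2_mb = 3.5 * 1024     # 3.5 GB after MPEG-2 compression
h264_mb = 879             # 879 MB after H.264 compression

print(int(source_mb / mpeg2_mb))  # MPEG-2 ratio → 25
print(int(source_mb / h264_mb))   # H.264 ratio  → 102
```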
H.264/AVC/MPEG-4 Part 10 contains a number of new features that allow it to compress video much more effectively than older standards and to provide more flexibility for application in a wide variety of network environments. In particular, some key features include:
  • Multi-picture inter-picture prediction, including the following features:
  • Using previously encoded pictures as references in a much more flexible way than in past standards, allowing up to 16 reference frames (or 32 reference fields, in the case of interlaced encoding) to be used in some cases. This contrasts with prior standards, where the limit was typically one reference frame, or two in the case of conventional "B-pictures" (B-frames). This particular feature usually allows modest improvements in bit rate and quality in most scenes, but in certain types of scene, such as those with repetitive motion, back-and-forth scene cuts, or uncovered background areas, it allows a significant reduction in bit rate while maintaining clarity.
  • Variable block-size motion compensation with block sizes from 16x16 down to 4x4, enabling precise segmentation of moving regions. The supported luma prediction block sizes are 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4, many of which can be used together within a single macroblock. Chroma prediction block sizes are correspondingly smaller when chroma subsampling is in use.
  • The ability to use multiple motion vectors per macroblock (one or two per partition), up to a maximum of 32 in the case of a B macroblock built from 16 4x4 partitions. The motion vectors of each 8x8 or larger partition region can point to different reference pictures.
  • The ability to use any macroblock type in B-frames, including intra macroblocks, making B-frame coding more efficient. This feature was notably absent from MPEG-4 ASP.
  • Six-tap filtering for the derivation of half-pel luma sample predictions, giving sharper sub-pixel motion compensation. Quarter-pel values are derived by linear interpolation of the half-pel values, saving processing power.
  • Quarter-pixel precision for motion compensation, enabling precise description of the displacements of moving areas. Because chroma resolution is typically halved both vertically and horizontally (see 4:2:0), chroma motion compensation uses one-eighth-pixel grid units.
  • Weighted prediction, allowing an encoder to specify the use of scaling and offsets in motion compensation, which brings a significant benefit in special cases such as fade-to-black, fade-in, and cross-fade transitions. This includes implicit weighted prediction for B-frames and explicit weighted prediction for P-frames.
  • Spatial prediction of intra-coded blocks from the edges of neighboring blocks, rather than the "DC"-only prediction found in MPEG-2 Part 2 or the transform-coefficient prediction of H.263v2 and MPEG-4 Part 2. This supports 16x16, 8x8, and 4x4 luma prediction block sizes (only one of which may be used within a given macroblock).
  • Lossless macroblock coding features, including:
  • A lossless "PCM macroblock" representation mode in which video data samples are represented directly, allowing perfect representation of specific regions and a strict limit on the quantity of coded data for each macroblock.
  • An enhanced lossless macroblock representation mode allowing perfect representation of specific regions while ordinarily using substantially fewer bits than PCM mode.
  • Flexible interlaced-scan video coding features, including:
  • Macroblock-adaptive frame-field (MBAFF) coding, which uses a macroblock-pair structure for pictures coded as frames, allowing 16x16 macroblocks in field mode (in contrast with MPEG-2, where field-mode processing in a picture coded as a frame results in the processing of 16x8 half-macroblocks).
  • Picture-adaptive frame-field (PicAFF) coding, allowing a freely selected mixture of pictures coded either as complete frames, where both fields are combined for encoding, or as individual single fields.
  • New transform design features, including:
  • An exact-match integer 4x4 spatial block transform, allowing precise placement of residual signals with little of the "ringing" often found with prior codec designs. This design is conceptually similar to the well-known discrete cosine transform (DCT), introduced in 1974 by N. Ahmed, T. Natarajan, and K. R. Rao, but simplified and made to provide exactly specified decoding.
  • An exact-match integer 8x8 spatial block transform, allowing highly correlated regions to be compressed more efficiently than with the 4x4 transform. This design is likewise conceptually similar to the DCT, but simplified and exactly specified for decoding.
  • Adaptive encoder selection between the 4x4 and 8x8 transform block sizes for the integer transform operation.
  • A secondary Hadamard transform applied to the "DC" coefficients of the primary spatial transform (for chroma DC coefficients, and in one special case also luma), obtaining even more compression in smooth regions.
  • A quantization design including:
  • Logarithmic step-size control for easier bit-rate management by encoders and simplified inverse-quantization scaling.
  • Frequency-customized quantization scaling matrices selected by the encoder for perceptually based quantization optimization.
  • An in-loop deblocking filter that helps prevent the blocking artifacts common to other DCT-based image compression techniques, yielding better visual appearance and compression efficiency.
  • An entropy coding design including:
  • Context-adaptive binary arithmetic coding (CABAC), an algorithm for losslessly compressing syntax elements in the video stream using the probabilities of the syntax elements in a given context. CABAC compresses data more efficiently than CAVLC but requires considerably more processing to decode.
  • Context-adaptive variable-length coding (CAVLC), a lower-complexity alternative to CABAC for coding quantized transform coefficient values. Although less efficient than CABAC, CAVLC is more elaborate and more efficient than the methods typically used to code coefficients in other prior designs.
  • A common simple and highly structured variable-length coding (VLC) technique for many of the syntax elements not coded by CABAC or CAVLC, referred to as Exponential-Golomb (Exp-Golomb) coding.
  • Loss resilience features, including:
  • A Network Abstraction Layer (NAL) definition allowing the same video syntax to be used in many network environments. One very fundamental design concept of H.264 is to generate self-contained packets, removing the duplication of headers seen in MPEG-4's header extension code (HEC). This was achieved by decoupling information relevant to more than one slice from the media stream. The combination of these higher-level parameters is called a parameter set. The H.264 specification includes two types of parameter set: the sequence parameter set (SPS), which remains unchanged throughout a coded video sequence, and the picture parameter set (PPS), which remains unchanged within a coded picture. The sequence and picture parameter set structures carry information such as picture size, optional coding modes employed, and the macroblock-to-slice-group map.
  • Flexible macroblock ordering (FMO), also known as slice groups, and arbitrary slice ordering (ASO), which are techniques for restructuring the ordering of the representation of the fundamental regions (macroblocks) in pictures. Typically considered error/loss robustness features, FMO and ASO can also be used for other purposes.
  • Data partitioning (DP), a feature providing the ability to separate more important and less important syntax elements into different packets of data, enabling the application of unequal error protection (UEP) and other types of improvement of error/loss robustness.
  • Redundant slices (RS), an error/loss robustness feature that lets an encoder send an extra representation of a picture region (typically at lower fidelity) that can be used if the primary representation is corrupted or lost.
  • Frame numbering, a feature allowing the creation of "sub-sequences", enabling temporal scalability by the optional inclusion of extra pictures between other pictures, and the detection and concealment of losses of entire pictures, which can occur due to network packet loss or channel errors.
  • Switching slices, called SP and SI slices, allowing an encoder to direct a decoder to jump into an ongoing video stream, for purposes such as video-streaming bit-rate switching and "trick mode" operation. When a decoder jumps into the middle of a video stream using the SP/SI feature, it can get an exact match to the decoded pictures at that location in the video stream despite using different pictures, or no pictures at all, as references prior to the switch.
  • A simple automatic process for preventing accidental emulation of start codes, which are special sequences in the coded data that allow random access into the bitstream and recovery of byte alignment in systems that can lose byte synchronization.
  • Supplemental enhancement information (SEI) and video usability information (VUI), extra information that can be inserted into the bitstream for various purposes, such as enhancing the usability of the video for applications. SEI FPA (frame packing arrangement) messages signal 3D arrangements:
  • 0: checkerboard - L and R pixels alternate in a checkerboard pattern
  • 1: column alternation - L and R are interleaved by column
  • 2: row alternation - L and R are interleaved by row
  • 3: side by side - L is on the left, R on the right
  • 4: top-bottom - L is on top, R at the bottom
  • 5: frame alternation - one view per frame
  • Auxiliary pictures, which can be used for purposes such as alpha compositing.
  • Support for monochrome (4:0:0), 4:2:0, 4:2:2, and 4:4:4 chroma subsampling (depending on the selected profile).
  • Support for sample bit-depth precision from 8 to 14 bits per sample (depending on the selected profile).
  • The ability to encode individual color planes as distinct pictures with their own slice structures, macroblock modes, motion vectors, etc., allowing encoders to be designed with a simple parallelization structure (supported only in the three 4:4:4-capable profiles).
  • Picture order counting, a feature that keeps the ordering of pictures and the sample values in decoded pictures isolated from timing information, allowing timing information to be carried and controlled/changed separately without affecting decoded picture content.
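The Exp-Golomb coding mentioned among the entropy coding features is simple enough to sketch in a few lines. In the unsigned "ue(v)" form used for many H.264 syntax elements, a value N is written as M zero bits, a one bit, and then the remaining M low-order bits of N + 1, where M = floor(log2(N + 1)):

```python
# Minimal sketch of unsigned Exp-Golomb ("ue(v)") coding as used for
# many H.264 syntax elements.

def ue_encode(n):
    bits = bin(n + 1)[2:]                  # binary of n+1, no '0b' prefix
    return "0" * (len(bits) - 1) + bits    # leading zeros + codeword

def ue_decode(bitstring):
    zeros = len(bitstring) - len(bitstring.lstrip("0"))
    # read the leading zeros, then zeros+1 bits of the info field
    return int(bitstring[zeros:2 * zeros + 1], 2) - 1

for n in range(6):
    code = ue_encode(n)
    assert ue_decode(code) == n
    print(n, code)   # 0→1, 1→010, 2→011, 3→00100, 4→00101, 5→00110
```

Small values get short codes (0 is a single bit), which matches the skewed distributions of the syntax elements this code is applied to.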
These techniques, along with several others, help H.264 perform significantly better than any prior standard in a wide variety of application environments. H.264 can often perform radically better than MPEG-2 video, typically obtaining the same quality at half the bit rate or less, especially at high bit rates and high resolutions.
Like other ISO/IEC MPEG video standards, H.264/AVC has a reference software implementation that can be freely downloaded. Its main purpose is to demonstrate the features of H.264/AVC rather than to be a useful application in itself. Some reference hardware design work has also been carried out in the Moving Picture Experts Group. The features described above cover all profiles of H.264. A profile for a codec is a set of features of that codec identified to meet a certain set of specifications for intended applications; this means that many of the features listed are not supported in some profiles. The next section discusses the various profiles of H.264/AVC.

Profiles

This standard defines 21 sets of capabilities, referred to as profiles, each targeting a specific class of applications.
Profiles for non-scalable 2D video applications include the following:
  • Constrained Baseline Profile (CBP)
  • Primarily for low-cost applications, this profile is most typically used in videoconferencing and mobile applications. It corresponds to the subset of features common to the Baseline, Main, and High profiles.
  • Baseline Profile (BP)
  • Primarily for low-cost applications that require additional data-loss robustness, this profile is used in some videoconferencing and mobile applications. It includes all features of the Constrained Baseline Profile plus three additional features that can be used for loss robustness (or for other purposes such as low-delay multi-point video stream compositing). The importance of this profile has faded somewhat since the definition of the Constrained Baseline Profile in 2009. All Constrained Baseline Profile bitstreams are also considered Baseline Profile bitstreams, and the two profiles share the same profile identifier code value.
  • Extended Profile (XP)
  • Intended as the streaming video profile, this profile has relatively high compression capability and some extra tricks for robustness to data losses and server stream switching.
  • Main Profile (MP)
  • This profile is used for standard-definition digital TV broadcasts in the MPEG-4 format as defined in the DVB standard. It is not, however, used for high-definition television broadcasts, as the importance of this profile faded when the High Profile was developed for that application in 2004.
  • High Profile (HiP)
  • The primary profile for broadcast and disc-storage applications, particularly for high-definition television (this is, for example, the profile adopted by the Blu-ray Disc storage format and the DVB HDTV broadcast service).
  • Progressive High Profile (PHiP)
  • Similar to the High Profile, but without support for field coding features.
  • Constrained High Profile
  • Similar to the Progressive High Profile, but without support for B (bi-predictive) slices.
  • High 10 profile (Hi10P)
  • Going beyond typical mainstream consumer product capabilities, this profile builds on the High Profile, adding support for up to 10 bits per sample of decoded picture precision.
  • High 4:2:2 profile (Hi422P)
  • Targeting primarily professional applications that use interlaced video, this profile builds on the High 10 Profile, adding support for the 4:2:2 chroma subsampling format while using up to 10 bits per sample of decoded picture precision.
  • High 4:4:4 prediction profile (Hi444PP)
  • This profile builds on the High 4:2:2 Profile, supporting up to 4:4:4 chroma sampling, up to 14 bits per sample, and additionally supporting efficient lossless region coding and the coding of each picture as three separate color planes.
For camcorders, editing, and professional applications, the standard contains four additional Intra-only profiles, defined as simple subsets of corresponding other profiles. These are mostly for professional applications (such as camera and editing systems):
  • High 10 Intra Profile
  • The High 10 Profile constrained to all-Intra use.
  • High 4:2:2 Intra Profile
  • The High 4:2:2 Profile constrained to all-Intra use.
  • High 4:4:4 Intra Profile
  • The High 4:4:4 Profile constrained to all-Intra use.
  • CAVLC 4:4:4 Intra Profile
  • The High 4:4:4 Profile constrained to all-Intra use and to CAVLC entropy coding (i.e., CABAC is not supported).
As a result of the Scalable Video Coding (SVC) extension, the standard contains five additional scalable profiles, defined as a combination of an H.264/AVC profile for the base layer (identified by the second word in the scalable profile name) and tools that achieve the scalable extension:
  • Scalable Baseline Profile
  • Primarily targeting video conferencing, mobile, and surveillance applications, this profile builds on the Constrained Baseline Profile, to which the base layer (a subset of the bitstream) must conform. For the scalability tools, a subset of the available tools is enabled.
  • Scalable Constrained Baseline Profile
  • A subset of the Scalable Baseline Profile intended primarily for real-time communication applications.
  • Scalable High Profile
  • Primarily targeting broadcast and streaming applications. The base layer produced by this profile must conform to the H.264/AVC High Profile.
  • Scalable Constrained High Profile
  • A subset of the Scalable High Profile intended primarily for real-time communication applications.
  • Scalable High Intra Profile
  • Primarily targeting production applications, this profile is the Scalable High Profile constrained to all-Intra use.
As a result of the Multiview Video Coding (MVC) extension, the standard contains two multiview profiles:
  • Stereo High Profile
  • This profile targets two-view stereoscopic 3D video, combining the tools of the High Profile with the inter-view prediction capabilities of the MVC extension.
  • Multiview High Profile
  • This profile supports two or more views using both inter-picture (temporal) and MVC inter-view prediction, but does not support field pictures or macroblock-adaptive frame-field coding.
  • Multiview Depth High Profile
| Feature | CBP | BP | XP | MP | ProHiP | HiP | Hi10P | Hi422P | Hi444PP |
|---|---|---|---|---|---|---|---|---|---|
| Bit depth (per sample) | 8 | 8 | 8 | 8 | 8 | 8 | 8 to 10 | 8 to 10 | 8 to 14 |
| Chroma formats | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 / 4:2:2 | 4:2:0 / 4:2:2 / 4:4:4 |
| Flexible macroblock ordering (FMO) | No | Yes | Yes | No | No | No | No | No | No |
| Arbitrary slice ordering (ASO) | No | Yes | Yes | No | No | No | No | No | No |
| Redundant slices (RS) | No | Yes | Yes | No | No | No | No | No | No |
| Data partitioning | No | No | Yes | No | No | No | No | No | No |
| SI and SP slices | No | No | Yes | No | No | No | No | No | No |
| Interlaced coding (PicAFF, MBAFF) | No | No | Yes | Yes | No | Yes | Yes | Yes | Yes |
| B slices | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| CABAC entropy coding | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| 4:0:0 (monochrome) | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
| 8x8 vs. 4x4 transform adaptivity | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
| Quantization scaling matrices | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
| Separate Cb and Cr QP control | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
| Separate color plane coding | No | No | No | No | No | No | No | No | Yes |
| Predictive lossless coding | No | No | No | No | No | No | No | No | Yes |

Versions

The versions of the AVC standard comprise the following completed revisions, corrigenda, and amendments (dates are the final ITU-T approval dates; final ISO/IEC "International Standard" approval dates are somewhat different and later in most cases). Each version represents changes integrated into the text of the previous version. Versions with relatively major technical improvements are shown in bold.
  • Version 1: (May 30, 2003) First approved version of H.264/AVC, containing the Baseline, Main, and Extended profiles.
  • Version 2: (May 7, 2004) contains various minor corrections.
  • Version 3: (March 1, 2005) First amendment of H.264/AVC, adding the Fidelity Range Extensions (FRExt) with the High, High 10, High 4:2:2, and High 4:4:4 profiles.
  • Version 4: (September 13, 2005) Corrigendum containing various minor corrections and adding three aspect ratio indicators.
  • Version 5: (June 13, 2006) Amendment consisting of the removal of the prior High 4:4:4 profile (processed as a corrigendum in ISO/IEC).
  • Version 6: (June 13, 2006) Amendment containing minor extensions such as extended-gamut color space support (bundled with the aforementioned aspect ratio indicators in ISO/IEC).
  • Version 7: (April 6, 2007) Amendment adding the High 4:4:4 Predictive profile and four Intra-only profiles (High 10 Intra, High 4:2:2 Intra, High 4:4:4 Intra, and CAVLC 4:4:4 Intra).
  • Version 8: (November 22, 2007) Major addition to H.264/AVC containing the Scalable Video Coding (SVC) amendment with the Scalable Baseline, Scalable High, and Scalable High Intra profiles.
  • Version 9: (January 13, 2009) contains some minor modifications and corrections.
  • Version 10: (March 16, 2009) Amendment containing the definition of a new profile (the Constrained Baseline Profile) with only the common subset of features supported in various previously specified profiles.
  • Version 11: (March 16, 2009) Major addition to H.264/AVC containing the Multiview Video Coding (MVC) extension, including the Multiview High Profile.
  • Version 12: (March 9, 2010) Amendment containing the definition of a new MVC profile (the Stereo High Profile) for two-view video coding with support for interlaced coding tools, and specifying an additional SEI message (the frame packing arrangement SEI message).
  • Version 13: (March 9, 2010) Contains various minor corrections.
  • Version 14: (June 29, 2011) Amendment specifying a new level (Level 5.2) supporting higher processing rates in terms of maximum macroblocks per second, and a new profile (the Progressive High Profile) supporting only the frame coding tools of the previously specified High Profile.
  • Version 15: (June 29, 2011) contains some minor modifications and corrections.
  • Version 16: (January 13, 2012) Amendment containing the definition of three new profiles intended primarily for real-time communication applications: the Constrained High, Scalable Constrained Baseline, and Scalable Constrained High profiles.
  • Version 17: (April 13, 2013) Amendment with additional SEI message indicators.
  • Version 18: (April 13, 2013) Amendment specifying the coding of depth map data for 3D stereoscopic video, including the Multiview Depth High Profile.
  • Version 19: (April 13, 2013) Corrigendum fixing an error in the sub-bitstream extraction process for multiview video.
  • Version 20: (April 13, 2013) Amendment specifying additional color space identifiers (including support for ITU-R Recommendation BT.2020 for UHDTV) and an additional model type in the tone mapping information SEI message.

Encoding and decoding

Announce
edit
See also: camera and on-board video stream encoders.
Because H.264 encoding and decoding require significant computing power for specific types of arithmetic operation, software implementations running on a general-purpose CPU are typically less power-efficient. However, the latest quad-core general-purpose x86 CPUs have enough computing power to perform real-time standard-definition and high-definition encoding. Compression efficiency depends on the algorithmic implementation, not on whether hardware or software is used, so the difference between hardware and software implementations is mainly one of power efficiency, flexibility, and cost. To improve power efficiency and reduce the hardware form factor, dedicated hardware may be employed, either for the complete encoding or decoding process or for acceleration assistance within a CPU-controlled environment.
CPU-based solutions are known to be much more flexible, particularly when encoding must be done concurrently in multiple formats, at multiple bit rates and resolutions (multi-screen video), possibly with additional container-format features, advanced integrated advertising features, and so on. CPU-based software solutions also generally make it easier to load-balance multiple concurrent encoding sessions on the same CPU.
The second-generation Intel "Sandy Bridge" Core i3/i5/i7 processors introduced at the January 2011 CES (Consumer Electronics Show) offer an on-chip hardware full-HD H.264 encoder, known as Intel Quick Sync Video.
A hardware H.264 encoder can be an ASIC or an FPGA. An FPGA is a general-purpose programmable chip; to use an FPGA as a hardware encoder, a custom H.264 encoder design must be implemented on it. By 2009, a complete HD H.264 encoder (High Profile, Level 4.1, 1080p at 30 fps) could run on a single low-cost FPGA chip.
ASIC encoders with H.264 encoder functionality are available from many different semiconductor companies, though the core designs used in the ASICs typically come from a handful of companies such as Chips&Media, On2 (formerly Hantro, acquired by Google), Imagination Technologies, and NGCodec. Some companies offer both FPGA and ASIC product variants.
Texas Instruments manufactures a line of ARM + DSP cores that perform DSP H.264 BP encoding at 1080p and 30 frames per second. This permits flexibility with respect to the codec (which is implemented as highly optimized DSP code) while being more efficient than software on a general-purpose CPU.