Lossy compression

Lossy compression exploits the fact that human vision and hearing are insensitive to certain frequency components in images and sound, which allows some information to be discarded during compression. The original data cannot be fully recovered, but the lost part has little perceptible effect on the original image or sound, and in exchange the compression ratio is much higher.
Lossy compression is widely used for compressing speech, image, and video data.

Summary

Lossy compression is known as destructive data compression in Taiwan, Hong Kong, and Macao. Most common audio, image, and video compression is lossy.
In multimedia applications, common compression methods include PCM (pulse code modulation), predictive coding, transform coding, interpolation and extrapolation, statistical coding, vector quantization, and subband coding. Hybrid coding, which combines several of these, is widely used.
MP3, DivX, Xvid, JPEG, RM, RMVB, WMA, and WMV are all lossy formats.
Lossy data compression is a method in which the data obtained after compression and decompression differs from the original data but is very close to it. Also called destructive compression, it compresses away secondary information, sacrificing some quality to reduce the amount of data and raise the compression ratio. It is often used on the Internet, especially for streaming media, and in telephony. In this context the two steps are usually called encoding and decoding. Its counterpart is lossless data compression. Depending on the format, lossy compression can suffer generation loss: repeatedly compressing and decompressing a file progressively degrades its quality.
Defects introduced by lossy compression that the human eye or ear can detect are called compression artifacts.

Lossless compression

Lossless compression compresses the file itself. As with other data files, it optimizes how the data is stored: an algorithm represents repeated information more compactly, so the file can be restored exactly with its content unaffected. For digital images, no image detail is lost.
The basic principle is that the same color information only needs to be saved once. The software for compressing an image will first determine which areas in the image are the same and which are different. Images including duplicate data (such as blue sky) can be compressed, and only the start and end points of blue sky need to be recorded. However, there may be different shades of blue, and the sky may sometimes be covered by trees, mountains or other objects, which need to be recorded separately. Essentially, the lossless compression method can delete some duplicate data and greatly reduce the size of the image to be saved on the disk. However, the lossless compression method cannot reduce the memory consumption of the image. This is because when the image is read from the disk, the software will fill in the missing pixels with appropriate color information. If you want to reduce the memory capacity of images, you must use lossy compression methods.
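The idea of recording a repeated color once, together with where the run starts and ends, can be sketched with run-length encoding. This is only an illustration under simple assumptions: the function names and the sample row of "pixels" are hypothetical, and real image formats use more elaborate schemes.

```python
def rle_encode(pixels):
    """Collapse consecutive equal values into [value, run_length] pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


# A row of sky interrupted by a tree: the tree must be recorded separately.
row = ["blue"] * 6 + ["green"] * 2 + ["blue"] * 4
encoded = rle_encode(row)
restored = rle_decode(encoded)
print(encoded)
print(restored == row)
```

The twelve pixels are stored as three runs, and decoding restores the row exactly, which is what makes the method lossless.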
Lossy image compression, by contrast, preserves gradual color changes and discards abrupt ones.

Types


Lossy transform codec

First, the image or sound is sampled and cut into small blocks. Each block is transformed into a new space and quantized, and the quantized values are then entropy coded.
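The transform and quantization steps can be sketched as follows. The sketch assumes an orthonormal DCT-II as the transform and a single uniform step size; both are illustrative choices rather than the method of any particular codec, the block values are made up, and the entropy coding stage is left out.

```python
import math


def dct(block):
    """Orthonormal DCT-II of one block of samples."""
    n = len(block)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(block)))
    return coeffs


def idct(coeffs):
    """Inverse of dct(): rebuild the samples from the coefficients."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples


def quantize(coeffs, step):
    """Map each coefficient to an integer level; this is where data is lost."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    return [q * step for q in levels]


block = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical sample values
step = 10                                   # hypothetical quantizer step
levels = quantize(dct(block), step)
restored = idct(dequantize(levels, step))
print(levels)
print([round(v, 1) for v in restored])
```

Most of the small integer levels would be cheap to entropy code, while the restored block differs slightly from the input: a larger step gives smaller levels and a larger error.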

Predictive codec

Previously decoded data, and sometimes subsequently decoded data, is used to predict the current sound sample or image frame. The error between the predicted and the actual data, together with any other information needed to reproduce the prediction, is quantized and encoded.
Some systems combine the two techniques, using a transform codec to compress the error signal produced by the prediction step.
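A minimal sketch of the predictive approach is differential PCM, where the prediction is simply the previously decoded sample. The predictor, the step size, and the sample values are assumptions chosen for illustration; practical codecs use more elaborate predictors.

```python
def dpcm_encode(samples, step):
    """Quantize the error between each sample and the previous decoded sample."""
    levels = []
    predicted = 0
    for sample in samples:
        level = round((sample - predicted) / step)  # quantized prediction error
        levels.append(level)
        predicted += level * step                   # track what the decoder will see
    return levels


def dpcm_decode(levels, step):
    """Rebuild the samples by accumulating the dequantized errors."""
    decoded = []
    predicted = 0
    for level in levels:
        predicted += level * step
        decoded.append(predicted)
    return decoded


samples = [100, 104, 109, 112, 110, 105]   # hypothetical input
step = 4                                    # hypothetical quantizer step
levels = dpcm_encode(samples, step)
decoded = dpcm_decode(levels, step)
print(levels)
print(decoded)
```

Because the encoder predicts from the decoded value rather than from the original, the quantization error does not accumulate: every decoded sample stays within half a step of the input.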

Advantages and disadvantages

One advantage of lossy methods is that in some cases the file can be much smaller than any known lossless method could make it while still meeting the needs of the system. When a user obtains a lossy-compressed file (for example, to save download time), the decompressed file may differ considerably from the original at the level of data bits, yet for most practical purposes the human ear or eye cannot tell the two apart.
Lossy methods are often used to compress audio, images, and video.
Lossy video codecs almost always achieve much better compression ratios than audio or still-image codecs (the compression ratio compares the size of the uncompressed file with that of the compressed file).
Audio can be compressed at 10:1 with no perceptible loss of quality, and video can be compressed at very high ratios such as 300:1 with only slightly visible quality degradation.
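To make the 10:1 figure concrete, the following sketch works out the bit rates involved. The input parameters (44.1 kHz, 16-bit, two channels, as on an audio CD) are an assumption added for illustration and are not taken from the text above.

```python
def bitrates(sample_rate_hz, bits_per_sample, channels, ratio):
    """Return (uncompressed, compressed) bit rates in bits per second."""
    raw = sample_rate_hz * bits_per_sample * channels
    return raw, raw / ratio


# Assumed CD-style source material, compressed at the 10:1 ratio quoted above.
raw_bps, lossy_bps = bitrates(44_100, 16, 2, 10)
print(f"uncompressed: {raw_bps / 1000:.1f} kbit/s")
print(f"at 10:1:      {lossy_bps / 1000:.1f} kbit/s")
```

Under these assumptions a 10:1 ratio corresponds to roughly 141 kbit/s, which is in the range commonly used for compressed music.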
Lossy image compression preserves gradual color changes and removes abrupt ones. Many biological experiments have shown that the human brain fills in a missing color with the nearest available color. For a white cloud against a blue sky, for example, lossy compression discards some of the color information along the edges of the cloud. When the picture is viewed on screen, the brain uses the colors it does see to fill in the missing parts. Lossy compression deletes some data deliberately, and the deleted data cannot be recovered.
Lossy still-image compression, like audio compression, can often reduce the data to about 1/10 of its original size, but the compression undeniably affects image quality, and the degradation is more obvious on close inspection. If a lossy-compressed image is only displayed on screen, the effect on quality may be small, at least as far as the human eye can tell, because the eye is more sensitive to brightness than to color, so brightness contributes more to how a scene looks than color does. However, if a lossy-compressed image is printed on a high-resolution printer, the loss of quality will be obvious.
Some methods take human physiology into account, for example the fact that the human eye can only see light within a certain range of frequencies. A psychoacoustic model describes how sound can be compressed as much as possible without reducing its perceived quality.

Common Formats

MP3 (MP3PRO, MP3 Surround), AAC (*.3gp, *.mp4, *.m4a), and ATRAC3/ATRAC3+ (*.aa3).
First, the principle of audio compression. It relies on the psychoacoustic characteristics of human hearing (frequency masking, temporal masking, and so on) and on the ear's limited resolution in amplitude, frequency, and time. During encoding, the parts of the sound that the ear cannot perceive, that is, the parts that contribute nothing to the perceived loudness, pitch, or direction of the signal (called the irrelevant part), are not encoded or transmitted. Parts that can barely be perceived are allowed a large quantization distortion, provided that the distortion stays below the hearing threshold (the lowest level the human ear can hear), so the ear still does not notice it. Audio compression works by exploiting these characteristics.

Some Basic Concepts of Psychoacoustics

1. Equal loudness curve
The sensitivity of human hearing varies with frequency: two tones with the same power but different frequencies usually sound unequally loud. The equal-loudness curves show that the ear is most sensitive at around 4 kHz, meaning that a sound pressure level detectable at 4 kHz may be undetectable at other frequencies. This makes it possible to tolerate more distortion at the less sensitive frequencies.
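The frequency dependence can be illustrated with a widely quoted closed-form approximation of the threshold of hearing in quiet, usually attributed to Terhardt. The formula is brought in here only as an illustration and is not part of the original text; real encoders use the tables and models defined by their standards.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in quiet, in dB SPL (Terhardt-style fit)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


for freq in (100, 1000, 4000, 15000):
    print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")
```

In this approximation the threshold is lowest in the region of 3 to 4 kHz and rises steeply toward both ends of the audible range, so more quantization noise can be hidden at very low and very high frequencies.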
2. Masking
Masking is familiar from high school physics: a strong sound signal covers a weak one so that we cannot detect it. The closer two sounds are in time and frequency, the stronger the masking effect. The masked part can therefore be left out during encoding and transmission without much loss of sound quality, and the ear does not easily notice.
3. Critical bands
The way human hearing perceives sound does not vary along a linear frequency scale (human hearing is not that fine-grained); instead it can be described by a limited series of frequency bands called critical bands. In short, the whole frequency range is divided into several bands, and within each band the ear's perception, that is, its psychoacoustic behavior, is the same.
Now to the main topic: the essence of any encoding is its algorithm.

Mainstream coding and its algorithm

1. MP3 (MP3PRO, MP3 Surround)
MP3 is probably the most widely used lossy digital audio format today. Its full name is MPEG (Moving Picture Experts Group) Audio Layer-3. It is a lossy digital audio compression technique developed by the Fraunhofer Institute in Germany in 1987, and it was patented in 1989. At first it was imperfect, more of a coding framework left for others to refine. In 1992 the technology was incorporated into the MPEG specification and officially named MP3.
An MP3 file is made up of frames; the frame is the smallest unit of an MP3 file. What is a frame? Recall how traditional animation is made: a sequence of different pictures is shown in succession to create motion, and each picture is a "frame". The difference is that an MP3 frame holds audio data rather than picture data. At a 44.1 kHz sampling rate, an MP3 stream contains about 38 frames per second.
Each frame consists of a frame header and frame data. The frame header records basic information about the frame, including the bit rate index and the sampling rate index (which is important for understanding the ABR and VBR encoding modes). The frame data, as the name suggests, holds the actual audio data.
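The role of the two indexes can be seen by decoding a frame header. The sketch below handles only MPEG-1 Layer III headers and follows the commonly documented 32-bit header layout; the function name and the returned dictionary keys are hypothetical, and a real parser would also need to handle the other MPEG versions and layers, free-format streams, and the remaining header fields.

```python
# Lookup tables for MPEG-1 Layer III; None marks free-format or reserved values.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def parse_mp3_header(header):
    """Decode bit rate and sampling rate from a 4-byte MPEG-1 Layer III header."""
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:                       # 11 sync bits, all set
        raise ValueError("frame sync not found")
    if (word >> 19) & 0b11 != 0b11 or (word >> 17) & 0b11 != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]   # 4-bit bit rate index
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]  # 2-bit sampling rate index
    padding = (word >> 9) & 1
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved index")
    return {
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
        "frames_per_second": sample_rate / 1152,  # 1152 samples per frame
    }


info = parse_mp3_header(bytes([0xFF, 0xFB, 0x90, 0x00]))
print(info)
```

Because every frame carries its own bit rate index, an encoder can change the bit rate from one frame to the next, which is what makes variable bit rate encoding possible.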
All of the above is the foundation of MP3 encoding, but early encoders were very imperfect: their compression algorithms were crude and the sound quality was poor. MP3 sound quality made two leaps: the human psychoacoustic model (perceptual model) and the introduction of VBR.
PS: VBR stands for variable bit rate. When the audio being encoded has many elements and needs a high bit rate, the encoder automatically raises the bit rate; when the demand is low, it automatically lowers it. The original aim was to speed up online playback and reduce the use of system resources. The algorithm was developed by Xing, whose encoder spends a high bit rate on the complex parts of a song and a low bit rate on the simple parts. The idea is good, but the VBR algorithm of the Xing encoder was very poor, and its sound quality fell far short of CBR. Fortunately, LAME greatly improved the VBR algorithm and made it the best encoding mode for MP3: it treats quality as the prerequisite while keeping file size in check. This encoding mode is recommended.
MP3 has survived to this day, and its development has not stopped. On 14 June 2001, Thomson of France and RCA of the United States jointly launched a new compression format, MP3PRO. MP3PRO is an improvement built on MP3 that uses a codec enhancement technology developed by Coding Technologies called SBR (Spectral Band Replication). When an MP3PRO file is created, the encoder splits the audio into two parts. The low-frequency part is separated out and encoded with conventional MP3 technology to produce a normal MP3 audio stream. This lets the MP3 encoder concentrate on the low-frequency signal for better quality, and it also lets existing MP3 players play MP3PRO files. The separated high-frequency part is encoded and embedded in the MP3 stream. A conventional MP3 player ignores it, while an MP3PRO player decodes it and combines the two parts into high-quality, full-bandwidth sound. With this technique, MP3PRO at 64 Kbps can offer the same sound quality as MP3 at 128 Kbps in a file only half the size.
The PSP supports MP3PRO, and there is plenty of MP3PRO conversion software that can be found online. If you are interested, it is worth trying this enhanced form of MP3.
Thomson officially announced in early December 2004 that MP3, the world's most popular music compression format, was entering the multi-channel era. MP3 Surround was jointly developed by Fraunhofer IIS and Agere Systems. It uses the psychoacoustic technique of Binaural Cue Coding (BCC), which provides multi-channel surround while keeping file size under control. Agere Systems is mainly responsible for promoting the multi-channel MP3 format. MP3 Surround delivers high-quality 5.1-channel surround audio and can be used for online music distribution, broadcasting systems, PC audio and video applications, game sound effects, consumer electronics, and car audio. Although it carries multiple channels, Thomson said that MP3 Surround files are not significantly larger than ordinary MP3 files (at the same sampling rate) and are only about half the size of other multi-channel surround audio formats. More importantly, MP3 Surround offers good compatibility: its files play normally in existing MP3 software and on MP3 players.
2. AAC (*.3gp, *.mp4, *.m4a)
AAC stands for Advanced Audio Coding. It was developed jointly by the Fraunhofer Institute, Dolby, and AT&T. AAC is part of the MPEG-2 specification and is suited to encoding audio ranging from 8 Kbps mono telephone quality to 160 Kbps multi-channel ultra-high quality. Compared with MP3, AAC adds features that MP3 lacks, such as improved stereo reproduction, bitstream effects, multimedia control, noise reduction, and optimization, making it possible to reproduce CD sound quality closely after compression. It also supports up to 48 full-range channels and 15 low-frequency channels, more sampling rates and bit rates, multi-language compatibility, and higher decoding efficiency. In short, AAC can deliver better sound quality in files about 30% smaller than MP3.
Here are brief descriptions of its main modules:
Gain control
The gain control module is used in the scalable sampling rate configuration. It consists of a polyphase quadrature filter (PQF), gain detectors, and gain modifiers. The module splits the input signal into four bands of equal bandwidth. The decoder has a corresponding gain control module, and a lower-sampling-rate output can be obtained by ignoring the upper PQF subbands.
Filter bank
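The filter bank converts the input signal from the time domain to the frequency domain and is a basic module of the MPEG-2 AAC system. It uses the modified discrete cosine transform (MDCT), a linear orthogonal lapped transform based on a technique called time-domain aliasing cancellation (TDAC). The MDCT can use either a Kaiser-Bessel derived (KBD) window or a sine window. The forward MDCT can be expressed as follows:
$$X_{i,k} = 2 \sum_{n=0}^{N-1} x_{i,n} \cos\left(\frac{2\pi}{N}\,(n + n_0)\left(k + \frac{1}{2}\right)\right), \quad 0 \le k < \frac{N}{2}$$
The inverse MDCT can be expressed as follows:
$$\hat{x}_{i,n} = \frac{2}{N} \sum_{k=0}^{N/2-1} X_{i,k} \cos\left(\frac{2\pi}{N}\,(n + n_0)\left(k + \frac{1}{2}\right)\right), \quad 0 \le n < N$$
Here $x_{i,n}$ is the windowed input sample, $\hat{x}_{i,n}$ is the output sample before windowing and overlap-add, and
n = sample index,
N = length of the transform block,
i = block number,
k = spectral coefficient index,
n_0 = (N/2 + 1)/2.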
The two transform formulas above are covered in detail in textbooks on discrete functions and on mathematical physics equations. They are mentioned here only to help interested readers and need not be studied further.
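How time-domain aliasing cancellation works in practice can be checked numerically. The sketch below is a direct, unoptimized implementation that assumes the common MDCT convention with phase offset n0 = (N/2 + 1)/2, a factor of 2 in the forward transform, a factor of 2/N in the inverse, and a sine window applied before the forward and after the inverse transform. The block size and the test signal are arbitrary choices, and production codecs use fast algorithms instead of these double loops.

```python
import math


def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Forward MDCT: N windowed samples in, N/2 coefficients out."""
    n_len = len(block)
    n0 = (n_len / 2 + 1) / 2
    return [2 * sum(block[n] * math.cos(2 * math.pi / n_len * (n + n0) * (k + 0.5))
                    for n in range(n_len))
            for k in range(n_len // 2)]


def imdct(coeffs):
    """Inverse MDCT: N/2 coefficients in, N time-aliased samples out."""
    n_len = 2 * len(coeffs)
    n0 = (n_len / 2 + 1) / 2
    return [(2 / n_len) * sum(coeffs[k] * math.cos(2 * math.pi / n_len * (n + n0) * (k + 0.5))
                              for k in range(len(coeffs)))
            for n in range(n_len)]


def roundtrip(signal, hop):
    """Analyse and resynthesise with 50% overlap; len(signal) must be a multiple of hop."""
    padded = [0.0] * hop + list(signal) + [0.0] * hop
    window = sine_window(2 * hop)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * hop + 1, hop):
        block = [padded[start + n] * window[n] for n in range(2 * hop)]
        rebuilt = imdct(mdct(block))
        for n in range(2 * hop):
            out[start + n] += rebuilt[n] * window[n]   # overlap-add cancels the aliasing
    return out[hop:-hop]


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
rebuilt = roundtrip(signal, hop=8)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

Each block of 16 samples yields only 8 coefficients, so a single block cannot be inverted on its own; the aliasing in one block is cancelled by the overlapping halves of its neighbours, and the printed error is at the level of floating-point rounding.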
Temporal noise shaping (TNS)
In perceptual audio coding, the TNS module controls the temporal shape of the quantization noise, which solves the problem of a mismatch between the masking threshold and the quantization noise. The basic idea is that a tonal signal in the time domain appears as a transient peak in the frequency domain, and vice versa. TNS exploits this duality, extending known predictive coding techniques, to keep the quantization noise underneath the actual signal and so avoid the mismatch.
Joint stereo coding
Joint stereo coding is a spatial coding technique whose purpose is to remove spatially redundant information. The MPEG-2 AAC system includes two spatial coding techniques: M/S coding (mid/side encoding) and intensity/coupling. M/S coding uses a matrix operation, so it is also called matrixed stereo coding. Instead of transmitting the left and right channel signals, M/S coding transmits a normalized "sum" signal and "difference" signal: the former is the mid (M) channel and the latter is the side (S) channel. For this reason M/S coding is also called sum-difference coding. Intensity/coupling coding goes by several names, including intensity stereo coding and channel coupling coding. The basic issue it addresses is the irrelevance between channels.
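The sum and difference signals can be sketched in a few lines. The normalization by 1/2 used here is one common convention and is an assumption; the sample values are made up.

```python
def ms_encode(left, right):
    """Convert left/right samples into mid (sum) and side (difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right from mid/side: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.50, 0.75, -0.25, 0.00]
right = [0.50, 0.25, -0.25, 0.50]
mid, side = ms_encode(left, right)
print(mid)
print(side)
print(ms_decode(mid, side) == (left, right))
```

Where the two channels are similar, the side signal is close to zero and costs few bits after quantization, which is the redundancy this technique removes. The matrix operation itself is lossless; any loss comes from quantizing M and S afterwards.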
Prediction
This technique is commonly used in speech coding systems, mainly to reduce the redundancy of stationary signals.
Quantizer
A nonuniform quantizer is used.
Noiseless coding
Noiseless coding is in fact Huffman coding. It encodes the quantized spectral coefficients, the scale factors, and the direction information.
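The principle of Huffman coding, giving shorter codes to more frequent values, can be sketched as follows. This is a generic illustration that builds a code table from the data itself; AAC encoders instead choose among predefined codebooks, and the list of quantized levels here is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code table {symbol: bit string} from symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:                      # degenerate case: one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie_breaker, partial code table for that subtree).
    heap = [(count, index, {symbol: ""})
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_index = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)   # two least frequent subtrees
        count_b, _, codes_b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes_a.items()}
        merged.update({s: "1" + c for s, c in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, next_index, merged))
        next_index += 1
    return heap[0][2]


levels = [0, 0, 0, 0, 0, 0, 1, 1, -1, 2]     # hypothetical quantized coefficients
table = huffman_code(levels)
bitstream = "".join(table[v] for v in levels)
print(table)
print(len(bitstream), "bits instead of", 2 * len(levels))
```

With a fixed-length code the four distinct values would need 2 bits each, or 20 bits in total; the variable-length table spends a single bit on the dominant value 0. The step is called noiseless because it is fully reversible and adds no further distortion.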
PS: Personally, I prefer AAC, so I have described it in more detail. It is worth trying, and in my view it is clearly better than MP3. You can use iTunes 6 to convert files to AAC (*.m4a); the conversion is very simple. You can then copy the AAC files (*.3gp, *.mp4, *.m4a) directly into the [MUSIC] folder to play them.
AAC can be said to be the best lossy compression method available at present.
At the highest quality settings, there is generally no noticeable difference in a listening comparison.
3. ATRAC3/ATRAC3+ (*.aa3)
Anyone who used MD players in the early years knows that ATRAC is the audio format Sony designed specifically for MD. The algorithm was later widely used in Sony's Network Walkman and other portable audio devices. "ATRAC3plus" stands for "Adaptive Transform Acoustic Coding 3 plus", an audio compression technique developed from the ATRAC3 format that matured in 2002. This technology is the theoretical basis for shrinking the MD Walkman. [1]
Before analyzing ATRAC3/ATRAC3+, consider its older sibling, the ATRAC algorithm. Compressing digital audio data usually introduces a certain amount of quantization noise into the signal. To keep this noise from being heard, an audio coder decomposes the signal into a set of units, each corresponding to a particular range of time and frequency. The encoder analyzes them according to the psychoacoustic principles described above and codes the important units with high precision. In units where the ear is insensitive, some quantization noise can be tolerated without affecting perceived quality. On decoding, the quantized spectrum is rebuilt according to the bit allocation and then synthesized into an audio signal.
ATRAC is no exception, but it adds some refinements. ATRAC applies both subband coding and transform coding, and the input signal is split into bands of unequal width, with finer frequency division in the important low-frequency region. In addition, ATRAC transforms the input signal with a variable block length, which ensures efficient coding of stationary passages without sacrificing time resolution in transient passages. Specifically, the input signal is divided into three frequency bands at 5.5125 kHz and 11.025 kHz. The subband decomposition uses QMFs (quadrature mirror filters). Each of the three bands is then converted into spectral values by the MDCT (modified discrete cosine transform, which is similar to the familiar fast Fourier transform covered in advanced mathematics and mathematical physics courses). The MDCT allows 50% overlap between blocks, which improves frequency resolution while maintaining critical sampling. The block length changes according to the type of signal, which is the adaptive part of ATRAC (this is done mainly to mask the initial quantization noise).
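The two split frequencies follow from halving the spectrum twice. The sketch below assumes a 44.1 kHz sampling rate and a two-stage QMF tree in which the lower half is split a second time; the sampling rate is not stated in the text, and the function name is hypothetical.

```python
def qmf_band_edges(sample_rate_hz):
    """Band edges in Hz for a two-stage split: halve the spectrum, then halve the lower half."""
    nyquist = sample_rate_hz / 2     # highest representable frequency
    half = nyquist / 2               # first QMF split point
    quarter = half / 2               # second QMF split point (lower half only)
    return [(0.0, quarter), (quarter, half), (half, nyquist)]


bands = qmf_band_edges(44_100)
for low, high in bands:
    print(f"{low / 1000:8.4f} kHz to {high / 1000:8.4f} kHz")
```

Under these assumptions the bands are 0 to 5.5125 kHz, 5.5125 to 11.025 kHz, and 11.025 to 22.05 kHz, so the low-frequency region gets the narrowest bands and therefore the finest frequency division.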
After ten years of development, the ATRAC algorithm could no longer meet market demand, and in August 2002 Sony launched a new algorithm: ATRAC3/ATRAC3+. Its core algorithm is not substantially different from ATRAC, but it uses improved band-splitting filters and MDCT, together with gain control, tonal component separation, joint stereo, and other techniques, to further reduce the size of the compressed audio data.
4. AAL (ATRAC Advanced Lossless)
AAL stands for ATRAC Advanced Lossless, a newer audio compression format developed by Sony. Its distinguishing feature is lossless compression: a CD can be compressed to 30% to 80% of its original size without losing any audio information.
5. Ogg
Ogg's full name is Ogg Vorbis, an audio compression format similar to MP3 and other existing music formats, except that it is completely free, open, and unencumbered by patents. One outstanding feature of Ogg Vorbis is multi-channel support. As it becomes more popular, listening on a portable player to multi-channel works of the kind encoded with DTS will no longer be a dream.
Vorbis is the name of the audio compression scheme, while Ogg is the name of a project that aims to design a completely open multimedia system.
The file extension of Ogg Vorbis is .ogg. The format is designed to be forward-looking: an OGG file created today can be played on any future player, so the format can keep improving in file size and sound quality without affecting existing encoders or players.
Compared with AAC, it has a slight advantage at low frequencies and is slightly inferior at high frequencies.
At the highest quality settings, there is generally no noticeable difference in a listening comparison.
At its highest quality setting, Q10, a file is almost twice the size of one encoded with AAC at faac's highest quality setting, Q500.
The codec is open source.