Lossy compression exploits the fact that human perception is insensitive to certain frequency components in images and sound waves, which allows some information to be discarded during compression. The original data cannot be fully recovered, but the lost part has little effect on how the original image is perceived, and in exchange the compression ratio is much higher.
Lossy compression is widely used for voice, image, and video data.
Lossy data compression is a method in which the data obtained after compression and decompression differs from the original data but remains very close to it. It is also known as destructive compression: secondary information is compressed away, and some quality is sacrificed to reduce the amount of data and raise the compression ratio. This method is often used on the Internet, especially in streaming media and telephony. In this article it is often referred to as encoding and decoding. It is the counterpart of lossless data compression. Depending on the format, lossy data compression suffers from generation loss: repeatedly compressing and decompressing a file leads to progressive quality degradation.
The defects caused by lossy compression that can be detected by human eyes or ears are called compression artifacts.
Lossless compression
Lossless compression compresses the file itself. As with other kinds of data files, it optimizes the way the file's data is stored: an algorithm represents repeated data more compactly, so the file can be restored completely and its content is unaffected. For digital images, no image detail is lost.
The basic principle is that identical color information only needs to be stored once. Image compression software first determines which areas of the image are the same and which are different. An image that contains repeated data (such as a blue sky) can be compressed by recording only where the blue sky starts and ends. However, there may be different shades of blue, and the sky may be partly covered by trees, mountains, or other objects, which have to be recorded separately. In essence, lossless compression removes repeated data and can greatly reduce the size of the image saved on disk. It cannot reduce the memory the image occupies, however, because when the image is read from disk the software fills the omitted pixels back in with the appropriate color information. To reduce the amount of image data further, lossy compression must be used.
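The idea of recording where a run of identical color starts and ends can be sketched with run-length encoding. This is only an illustrative stand-in: the function names and the sample scanline are hypothetical, and real lossless image formats use more elaborate schemes.

```python
def rle_encode(pixels):
    """Collapse consecutive identical values into [value, run_length] pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    pixels = []
    for value, length in runs:
        pixels.extend([value] * length)
    return pixels


# One hypothetical scanline: blue sky interrupted by a green tree.
row = ["blue"] * 6 + ["green"] * 2 + ["blue"] * 4
encoded = rle_encode(row)
assert rle_decode(encoded) == row     # lossless: the round trip is exact
print(encoded)
```

Note that the decoder rebuilds every pixel, which is why the in-memory image is as large as before compression.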
Lossy image compression, by contrast, is characterized by preserving gradual color changes and discarding abrupt color changes in the image.
In predictive coding, previously decoded data and/or subsequently decoded data are used to predict the current sound sample or image frame. The error between the predicted data and the actual data, together with any other information needed to reproduce the prediction, is then quantized and encoded.
Some systems combine predictive coding with transform coding, using a transform codec to compress the error signal produced by the prediction step.
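The prediction-plus-quantized-error scheme described above can be sketched as a very simple differential coder. This is a minimal sketch under stated assumptions: the predictor is just "the previous decoded sample", and `step` is a hypothetical quantizer step size; real codecs use far more sophisticated predictors.

```python
def dpcm_encode(samples, step):
    """Predict each sample from the previous decoded one; quantize the error."""
    codes = []
    predicted = 0
    for sample in samples:
        error = sample - predicted
        code = round(error / step)   # quantization: the only lossy step
        codes.append(code)
        predicted += code * step     # track exactly what the decoder will see
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized prediction errors."""
    decoded = []
    predicted = 0
    for code in codes:
        predicted += code * step
        decoded.append(predicted)
    return decoded


samples = [0, 3, 7, 12, 14, 13, 9]
codes = dpcm_encode(samples, step=2)
print(codes)
print(dpcm_decode(codes, step=2))
```

Because the encoder predicts from the decoded value rather than the original one, the reconstruction error never accumulates and stays within half a quantizer step.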
Advantages and disadvantages
One advantage of lossy methods is that in some cases the file can be made much smaller than with any known lossless method while still meeting the needs of the system. When a user obtains a lossy-compressed file, for example to save download time, the decompressed file may differ considerably from the original at the bit level, but for most practical purposes the human ear or eye cannot tell the two apart.
Lossy methods are often used to compress sound, images, and video.
Audio can be compressed at a ratio of 10:1 without perceptible quality degradation, and video can reach very large ratios such as 300:1 with only slightly noticeable quality degradation.
Lossy image compression is characterized by preserving gradual color changes and discarding abrupt color changes in the image. Many biological experiments have shown that the human brain fills in a missing color with the nearest available color. For example, for a white cloud against a blue sky, lossy compression removes some of the color information at the edges of objects in the image. When the picture is viewed on screen, the brain uses the colors it sees in the scene to fill in the missing parts. With lossy compression, some data is deleted deliberately, and the deleted data cannot be recovered.
Lossy still-image compression, like audio compression, can often reach 1/10 of the original size, but it undeniably affects image quality, and the degradation becomes more obvious on close inspection. If a lossy-compressed image is only displayed on a screen, the effect on image quality may be small, at least as far as the human eye can tell, because the eye is more sensitive to brightness, and brightness matters more to a scene than color does. However, if an image processed with lossy compression is printed on a high-resolution printer, the loss of quality will be obvious.
Some methods take the characteristics of human anatomy into account; for example, the human eye can only see light within a certain range of frequencies. A psychoacoustic model describes how sound can be compressed as far as possible without reducing its perceived quality.
First, the principle of audio compression. It exploits the psychoacoustic characteristics of human hearing (frequency masking, temporal masking, and so on) and the ear's limited resolution in signal amplitude, frequency, and time. During encoding, frequencies that the human ear cannot perceive are neither encoded nor transmitted; that is, the parts of the sound that contribute nothing to the perceived loudness, pitch, or direction of the signal (called the irrelevant parts) are left out. For the parts that are encoded, a relatively large quantization distortion is allowed, provided it stays below the hearing threshold (the lowest level the human ear can hear), so that the ear still cannot detect it. Audio compression works by exploiting these characteristics.
The sensitivity of human hearing changes with frequency. That is, two tones with the same power but different frequencies usually do not sound equally loud. Equal-loudness curves show that the human ear is most sensitive to frequencies around 4 kHz; a sound pressure level that is audible at 4 kHz may be inaudible at other frequencies. This leaves room for distortion at the less sensitive frequencies.
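To make the frequency dependence concrete, the sketch below evaluates a widely used closed-form approximation of the threshold of hearing in quiet (Terhardt's formula). This formula is swapped in here as an assumption for illustration; it is not taken from this article, and it approximates the threshold curve rather than the full family of equal-loudness curves.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in dB SPL (Terhardt's approximation)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


# Scan 100 Hz .. 16 kHz and find where the ear is most sensitive.
freqs = range(100, 16001, 100)
most_sensitive = min(freqs, key=threshold_in_quiet_db)
print(most_sensitive, "Hz")
print(round(threshold_in_quiet_db(100), 1), "dB SPL needed at 100 Hz")
```

In this approximation the minimum falls between 3 and 4 kHz, broadly in line with the figure quoted above, and a tone at 100 Hz needs a much higher level to be heard.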
2. Masking
Masking is a concept many of us met in high-school physics: a strong sound signal covers a weak one so that the weak one cannot be detected. The closer two sounds are in time and frequency, the stronger the masking effect. The encoder can therefore skip the masked part, neither encoding nor transmitting it. The sound quality suffers no great loss, and the human ear does not easily notice the difference.
3. Critical frequency band
Human perception of sound does not vary along a linear frequency scale (our hearing is not that fine-grained); instead it can be described by a limited series of frequency bands called critical bands. In short, the whole spectrum is divided into several bands, and within each band the ear's perception is the same, that is, the psychoacoustic characteristics are the same.
Now to the main topic: the essence of a codec is its algorithm.
Mainstream codecs and their algorithms
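Critical bands are often expressed on the Bark scale. The sketch below uses the Zwicker and Terhardt approximation to map a frequency to a Bark value; this formula is an assumption brought in for illustration and is not given in this article, and `critical_band_index` is a hypothetical helper name.

```python
import math


def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark (Zwicker and Terhardt)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def critical_band_index(freq_hz):
    """Index of the critical band a frequency falls into (0-based)."""
    return int(hz_to_bark(freq_hz))


for f in (200, 300, 10000, 10100):
    print(f, "Hz -> band", critical_band_index(f))
```

Two tones 100 Hz apart land in different bands at low frequencies but share a band near 10 kHz, which is why the scale is not linear in frequency.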
1. MP3 (MP3PRO / MP3 SURROUND)
MP3 is probably the most widely used lossy digital audio format at present. Its full name is MPEG (Moving Picture Experts Group) Audio Layer-3. It is a lossy digital audio compression method developed in 1987 by the Fraunhofer Institute in Germany, which obtained a patent for it in 1989. At first it was far from perfect; it was more like a coding standard framework that was left for others to improve. In 1992 the technology was incorporated into the MPEG specification and became officially known as MP3.
An MP3 file is made up of frames, and the frame is the smallest unit of an MP3 file. What is a frame? Remember how hand-drawn animation is made: a sequence of different pictures is shown in succession to create motion, and each picture is a "frame". The difference is that an MP3 frame holds audio data rather than picture data. Each MPEG-1 Layer III frame carries 1152 samples, so the frame rate is a few tens of frames per second, about 38 at a 44.1 kHz sampling rate.
Each frame consists of a frame header and frame data. The frame header records basic information about the frame, including the bit rate index and the sampling rate index (this is very important for understanding the ABR and VBR encoding modes). The frame data, as the name implies, holds the main audio data.
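As an illustration of the header fields mentioned above, the following sketch decodes the bit rate index and sampling rate index from a 4-byte header. It assumes the MPEG-1 Layer III header layout and lookup tables and handles nothing else; `parse_header` and the example header bytes are hypothetical and chosen only for the demonstration.

```python
# Lookup tables for MPEG-1 Layer III only; None marks free-format/reserved.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
SAMPLES_PER_FRAME = 1152


def parse_header(header):
    """Decode bit rate and sampling rate from a 4-byte frame header."""
    if len(header) < 4:
        raise ValueError("header needs 4 bytes")
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:                       # 11 sync bits, all ones
        raise ValueError("no frame sync")
    if (word >> 19) & 0b11 != 0b11 or (word >> 17) & 0b11 != 0b01:
        raise ValueError("not an MPEG-1 Layer III header")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]   # 4-bit bit rate index
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]  # 2-bit rate index
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved index")
    padding = (word >> 9) & 1
    return {
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
        "frames_per_second": sample_rate / SAMPLES_PER_FRAME,
    }


info = parse_header(bytes.fromhex("fffb9000"))
print(info)
```

Because every frame carries its own bit rate index, an encoder is free to change the bit rate from one frame to the next, which is what makes VBR and ABR possible.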
All of the above is the basis of MP3 coding, but the early encoders were very imperfect: their compression algorithms were crude and the sound quality was not ideal. MP3 made two leaps in sound quality: the introduction of the psychoacoustic model of human hearing (the perceptual model) and the application of VBR technology.
Note: VBR is short for variable bit rate. When an MP3 file is encoded, the encoder automatically raises the bit rate for passages with many sound elements, where the demand for bits is high, and automatically lowers it where the demand is low. The aim is to keep the file small, which speeds up online playback and reduces the use of system resources. The approach was developed by Xing, whose encoder uses a high bit rate for the complex parts of a song and a low bit rate for the simple parts. It was a good idea, but the VBR algorithm in the Xing encoder was very poor, and its sound quality was far below that of CBR (constant bit rate). Fortunately, LAME greatly improved the VBR algorithm, making it the best encoding mode for MP3. It puts quality first while still keeping file size in check, and it is the recommended encoding mode.
MP3 has survived to this day, and its development has not stopped. On 14 June 2001, Thomson of France and RCA of the United States jointly launched a new compressed format: MP3PRO. MP3PRO is an improvement based on MP3 technology; it uses a codec enhancement technology developed by Coding Technologies called SBR (Spectral Band Replication). When an MP3PRO file is made, the encoder splits the audio into two parts. The low-frequency part of the audio data is separated out and encoded with conventional MP3 technology to produce a normal MP3 audio stream. This lets the MP3 encoder concentrate on compressing the low-frequency signal for better quality, and it also allows existing MP3 players to play MP3PRO files. The other part is the separated high-frequency signal, which is encoded and embedded in the MP3 stream. A traditional MP3 player ignores it, while a new MP3PRO player restores it and combines the two to obtain high-quality, full-bandwidth sound. With this technology, MP3PRO at 64 kbps can provide the same sound quality as MP3 at 128 kbps, while the file is only half the size.
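The "half the size" claim follows directly from the bit rates. A small sketch of the arithmetic, assuming a constant bit rate and ignoring tags and other overhead (the function name and the four-minute duration are hypothetical):

```python
def stream_size_bytes(bitrate_kbps, duration_s):
    """Size of a constant-bit-rate stream, ignoring tags and overhead."""
    return bitrate_kbps * 1000 * duration_s // 8


four_minutes = 240  # seconds
mp3_size = stream_size_bytes(128, four_minutes)
mp3pro_size = stream_size_bytes(64, four_minutes)
print(mp3_size, mp3pro_size, mp3pro_size / mp3_size)
```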
The PSP supports MP3PRO, and many MP3PRO format conversion programs can be found online. If you are interested, give it a try; it is better than plain MP3.
In early December 2004, Thomson officially announced that MP3, the world's most popular music compression format, was entering the multi-channel era. MP3 SURROUND was jointly developed by Fraunhofer IIS and Agere. It uses a psychoacoustic coding technique called Binaural Cue Coding (BCC), which delivers multi-channel surround sound while keeping the file size under control. Agere Systems, which joined the effort, is mainly responsible for promoting the multi-channel MP3 format, MP3 SURROUND. MP3 SURROUND achieves high-quality 5.1-channel surround audio and can be used in online music distribution, broadcasting systems, PC audio-visual applications, game sound effects, consumer electronics, and car audio. Although it carries multiple channels, Thomson said that MP3 SURROUND files are not significantly larger than ordinary MP3 files (at the same sampling rate) and are only half the size of other surround multi-channel audio formats. More importantly, MP3 SURROUND offers good compatibility and can be used normally with existing software and MP3 players.
2. AAC (*.3gp / *.mp4 / *.m4a)
AAC is short for Advanced Audio Coding and was developed jointly by the Fraunhofer Institute, Dolby, and AT&T. AAC is part of the MPEG-2 specification and covers audio coding from 8 kbps mono telephone quality up to 160 kbps multi-channel ultra-high quality. Compared with MP3, AAC adds features that the MP3 format lacks, such as near-perfect stereo reproduction, bitstream effect-sound scanning, multimedia control, and excellent noise reduction, making it possible to reproduce CD sound quality after compression. It also supports up to 48 audio tracks and 15 low-frequency tracks, more sampling rates and bit rates, multi-language compatibility, and higher decoding efficiency. In short, AAC can provide better sound quality in a file about 30% smaller than the MP3 equivalent.
Here are some descriptions of several modules:
Gain control
The gain control module is used in the variable sampling rate configuration. It consists of a polyphase quadrature filter (PQF), a gain detector, and a gain modifier. The module splits the input signal into four bands of equal bandwidth. The decoder also contains a gain control module, which can produce a lower sampling rate output signal by ignoring the upper PQF subbands.
Filter bank
The filter bank is the module that transforms the input signal from the time domain to the frequency domain, and it is the basic module of the MPEG-2 AAC system. It uses the modified discrete cosine transform (MDCT), a linear orthogonal lapped transform that relies on a technique called time-domain aliasing cancellation (TDAC). The MDCT can use either a KBD (Kaiser-Bessel derived) window or a sine window. The forward MDCT can be expressed as follows:
The inverse MDCT can be expressed as follows:
where
n = sample index,
N = length of the transform block,
i = block index.
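The formulas themselves did not survive in this text. As a hedged reconstruction, not recovered from the source, the MDCT pair is commonly written in the following form in descriptions of MPEG-2 AAC. The symbols k (spectral coefficient index) and n_0 = (N/2 + 1)/2 (a phase offset) are not in the list above and are assumptions here, x_{i,n} in the forward transform stands for the windowed input sample, and scale factors vary between references.

```latex
% Forward MDCT (block i, N windowed input samples -> N/2 coefficients)
X_{i,k} = 2 \sum_{n=0}^{N-1} x_{i,n}\,
          \cos\!\left( \frac{2\pi}{N}\,(n + n_0)\left(k + \tfrac{1}{2}\right) \right),
          \qquad 0 \le k < \tfrac{N}{2}

% Inverse MDCT (N/2 coefficients -> N time-aliased samples)
x_{i,n} = \frac{2}{N} \sum_{k=0}^{N/2-1} X_{i,k}\,
          \cos\!\left( \frac{2\pi}{N}\,(n + n_0)\left(k + \tfrac{1}{2}\right) \right),
          \qquad 0 \le n < N,
\qquad n_0 = \frac{N/2 + 1}{2}
```

The inverse output of a single block still contains time-domain aliasing; it cancels only after windowing and overlap-adding with the neighbouring blocks.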
The two discrete cosine transform formulas above are covered in detail in courses such as Discrete Functions and Equations of Mathematical Physics. They are given here only to help interested readers, and there is no need to study them in depth.
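For readers who prefer code to formulas, here is a minimal sketch of an MDCT with a sine window and time-domain aliasing cancellation. It assumes one common scaling convention (a factor 2 in the forward transform, 2/N in the inverse, phase offset n0 = (N/2 + 1)/2), a tiny block length, and direct summation instead of a fast algorithm; `roundtrip` and the test signal are hypothetical.

```python
import math


def sine_window(n_len):
    """Sine window; satisfies w[n]^2 + w[n + N/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / n_len) for n in range(n_len)]


def mdct(block, window):
    """Forward MDCT: N windowed samples -> N/2 coefficients."""
    n_len = len(block)
    n0 = (n_len / 2 + 1) / 2
    return [2.0 * sum(window[n] * block[n]
                      * math.cos(2 * math.pi / n_len * (n + n0) * (k + 0.5))
                      for n in range(n_len))
            for k in range(n_len // 2)]


def imdct(coeffs, window):
    """Inverse MDCT: N/2 coefficients -> N windowed, time-aliased samples."""
    n_len = 2 * len(coeffs)
    n0 = (n_len / 2 + 1) / 2
    return [window[n] * (2.0 / n_len)
            * sum(coeffs[k]
                  * math.cos(2 * math.pi / n_len * (n + n0) * (k + 0.5))
                  for k in range(len(coeffs)))
            for n in range(n_len)]


def roundtrip(signal, n_len=8):
    """Transform 50%-overlapping blocks, then overlap-add the inverses."""
    hop = n_len // 2
    window = sine_window(n_len)
    tail = hop + (-len(signal)) % hop          # pad to a whole number of hops
    padded = [0.0] * hop + list(signal) + [0.0] * tail
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - n_len + 1, hop):
        block = padded[start:start + n_len]
        for n, value in enumerate(imdct(mdct(block, window), window)):
            output[start + n] += value         # aliasing cancels in the sum
    return output[hop:hop + len(signal)]


signal = [1.0, -2.0, 3.5, 0.25, -1.0, 4.0, 2.0, -3.0, 0.5, 1.5]
restored = roundtrip(signal)
print(max(abs(a - b) for a, b in zip(signal, restored)))
```

Each block of N samples yields only N/2 coefficients, yet the overlap-add recovers the signal up to rounding error; a codec becomes lossy only once the coefficients are quantized.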
Temporal noise shaping (TNS)
In perceptual audio coding, the TNS module is a method for controlling the temporal shape of the quantization noise, which solves the problem of a mismatch between the masking threshold and the quantization noise. The basic idea is that a tonal signal in the time domain has a sharp peak in the frequency domain. TNS uses this duality to extend known predictive coding techniques, so that the quantization noise is placed under the actual signal and the mismatch is avoided.
Joint stereo coding
Joint stereo coding is a spatial coding technique whose purpose is to remove spatial redundancy. The MPEG-2 AAC system includes two spatial coding techniques: M/S coding (Mid/Side encoding) and intensity/coupling coding. M/S coding uses matrix operations, so it is also called matrixed stereo coding. Instead of transmitting the left and right channel signals, M/S coding transmits normalized "sum" and "difference" signals: the former is the mid (M) channel and the latter is the side (S) channel. For this reason M/S coding is also called "sum-difference coding". Intensity/coupling coding goes by several names, including intensity stereo coding and channel coupling coding; the basic issue these techniques address is the irrelevance between channels.
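The sum and difference signals can be sketched in a few lines. The halving used for normalization is one common convention and is an assumption here; the sample values are hypothetical.

```python
def ms_encode(left, right):
    """Turn left/right channels into mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -0.75]
right = [0.5, 0.25, -0.5]
mid, side = ms_encode(left, right)
print(mid, side)
```

When the two channels are nearly identical, the side signal is close to zero and costs very few bits, which is where the saving comes from.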
Prediction
This technique is commonly used in speech coding systems, mainly to reduce the redundancy of stationary signals.
Quantizer
A nonuniform quantizer is used.
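A nonuniform quantizer uses finer steps for small values and coarser steps for large ones. The sketch below uses a 3/4 power law with a rounding offset of 0.4054, which is how the AAC quantizer is commonly described; treat those constants as an assumption rather than a quotation from the standard. The `step` parameter is a hypothetical stand-in for the gain that a real encoder derives from scale factors.

```python
import math


def quantize(value, step=1.0):
    """Power-law quantizer: compress the magnitude, then round."""
    magnitude = int((abs(value) / step) ** 0.75 + 0.4054)
    return int(math.copysign(magnitude, value))


def dequantize(index, step=1.0):
    """Inverse power law: expand the integer index back to a magnitude."""
    return math.copysign(abs(index) ** (4.0 / 3.0) * step, index)


for x in (1.0, 10.0, 100.0, -100.0):
    q = quantize(x)
    print(x, "->", q, "->", round(dequantize(q), 2))
```

The spacing between reconstruction levels widens as the magnitude grows, so loud spectral components tolerate more absolute error than quiet ones.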
Noiseless coding
Noiseless coding is in fact Huffman coding. It encodes the quantized spectral coefficients, the scale factors, and the directional information.
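Huffman coding gives short codes to frequent values and long codes to rare ones, without losing any information. The sketch below builds a code table from the data itself, which is an assumption made for illustration; AAC relies on predefined codebooks instead. The list of quantized values is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: bit string} for a Huffman code built from counts."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(weight, i, {sym: ""})
            for i, (sym, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)   # two lightest subtrees
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


quantized = [0, 0, 0, 0, 0, 0, 1, 1, -1, 2]
table = huffman_code(quantized)
total_bits = sum(len(table[s]) for s in quantized)
print(table, total_bits)
```

Quantized spectra are dominated by zeros and small values, which is why an entropy coder of this kind pays off after quantization.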
Note: Personally I prefer AAC, which is why it is described in more detail here. It is worth trying and is clearly better than MP3. You can use iTunes 6 to convert files to AAC (*.m4a). Using AAC is simple: copy the AAC files (*.3gp, *.mp4, *.m4a) directly into [MUSIC] to play them.
It can be said that AAC is the best lossy compression method at present.
At the highest quality setting, an ordinary comparison reveals no noticeable difference.
3. ATRAC3 / ATRAC3+ (*.aa3)
Those who used MD players in the early years know that ATRAC is the audio format algorithm SONY designed specifically for MD; it was later widely used in SONY's Network Walkman and other portable audio devices. "ATRAC3plus" stands for "Adaptive Transform Acoustic Coding 3 plus". It is an audio compression technology developed from the ATRAC3 format, and it became increasingly refined in 2002. This technology is the theoretical basis that allowed the MD Walkman to be made smaller. [1]
To analyze ATRAC3/ATRAC3+, let us first look at its predecessor, the ATRAC algorithm. When digital audio data is compressed, a certain amount of quantization noise is usually introduced into the signal. To prevent this noise from being perceived by the human ear, audio coding decomposes the signal into a set of units, each corresponding to a specific range of time and frequency. The encoder analyzes them according to the psychoacoustic principles described above and encodes the important units with high precision; in units the ear is insensitive to, some quantization noise can be left in without affecting perceived quality. During decoding, the quantized spectrum is rebuilt according to the bit allocation and then synthesized into a sound signal.
ATRAC is no exception, but it has some improvements. ATRAC also applies subband coding and transform coding, and it splits the input signal into bands non-uniformly, giving finer frequency division to the important low-frequency region. In addition, ATRAC transforms the input signal with a variable block length, which keeps coding efficient in stationary passages without sacrificing time resolution in transient passages. Specifically, the input signal is divided into three frequency bands at 5.5125 kHz and 11.025 kHz. The subband decomposition uses QMF (Quadrature Mirror Filters). The three bands are converted into spectral values by the MDCT (Modified Discrete Cosine Transform, similar to the familiar fast Fourier transform and covered in courses such as Advanced Mathematics II and Equations of Mathematical Physics). The MDCT allows 50% overlap between blocks, which improves frequency resolution while maintaining critical sampling. The block length can change according to the type of signal, which is the adaptive part of ATRAC (this is mainly done to mask the quantization noise at signal onsets).
After about ten years of development, the ATRAC algorithm could no longer meet market demand, and in August 2002 SONY launched a new algorithm: ATRAC3/ATRAC3+. Its core algorithm is not substantially different from ATRAC, but it uses improved band-splitting filters and MDCT, together with gain adjustment, tonal component separation, joint stereo, and other techniques, to reduce the size of the compressed audio data further.
4. AAL (ATRAC Advanced Lossless)
AAL is short for ATRAC Advanced Lossless, a newer audio compression format developed by SONY. Its distinguishing feature is lossless compression: a CD can be compressed to 30% to 80% of its original size without losing any audio information.
5. Ogg
Ogg's full name should be Ogg Vorbis, an audio compression format similar to MP3 and other existing music formats. Unlike them, it is completely free, open, and unencumbered by patents. One outstanding feature of Ogg Vorbis is its multi-channel support; as it becomes more popular, listening to DTS-encoded multi-channel works on a portable player will no longer be a dream.
Vorbis is the name of the audio compression scheme, while Ogg is the name of a project that aims to design a completely open multimedia system.
The file extension of Ogg Vorbis is .OGG. The file format is designed to be forward-looking: an OGG file that has been created can be played on any player, so the format can keep improving file size and sound quality without affecting older encoders or players.
Compared with AAC, it has a slight advantage at low frequencies and is inferior to AAC at high frequencies.
At the highest quality setting, an ordinary comparison reveals no noticeable difference.
A file at the highest quality setting, Q10, is almost twice the size of a file at AAC's highest quality setting, Q500, encoded with faac.