Research on Acoustic Masking Protection Technology

March 25, 2024 Source: National Confidential Technology Evaluation Center

Abstract Sound is a primary carrier for storing, transmitting and expressing information; it is present in many settings and carries rich information. As an active acoustic interference protection technology, acoustic masking is widely used to protect speech privacy and prevent sound leakage. This paper introduces the basic principle of acoustic masking, the classification of masking effects, the classification of masking signals, and the optimization design of acoustic masking systems.

Keywords Acoustic Masking Active Acoustic Interference Acoustic Leakage Protection

1 Introduction

Information is a key factor in the survival and development of a country or enterprise. It usually exists, is transmitted and is expressed in the form of sound, light, electromagnetic and other signals. As a primary carrier of information, sound is a tool for social communication and the exchange of ideas. It is present in many settings and carries rich information. How to effectively protect sensitive information carried by voice and prevent its disclosure has therefore become an urgent problem.

As an active acoustic interference protection technology, acoustic masking is one of the necessary measures for reducing speech intelligibility (the degree to which speech information is correctly understood) and protecting speech privacy. It relies on the acoustic masking effect: by emitting interference signals in the audible frequency band of the human ear, it masks the target speech signal so that the target speech cannot be heard or its intelligibility is reduced. This paper introduces the basic principle of acoustic masking, the classification of masking effects, the classification of masking signals, and the optimization design of acoustic masking systems.

2 Basic principle of acoustic masking

Acoustic masking technology is an important measure for protecting speech privacy and preventing sound leakage. As shown in Figure 1, an acoustic masking system raises the ambient background sound level of the target area by filling that area with a masking acoustic signal, and uses the acoustic masking effect to mask the target speech.

Figure 1 Schematic Diagram of Acoustic Masking System

The acoustic masking effect is an acoustic effect related to human hearing. The human ear can distinguish faint sounds in a quiet environment, but in a noisy environment faint sounds are submerged by the noise. This phenomenon, in which the presence of a louder sound raises the hearing threshold for a fainter sound, is called the acoustic masking effect. In Figure 2, the black solid line is the hearing threshold of the human ear under normal conditions; sounds below this threshold are inaudible. The blue masked frequency sample (the target speech signal that needs to be masked) is audible because its sound pressure level is higher than the hearing threshold. However, when the masking system emits the masking sound shown by the red line, the masking sound raises the hearing threshold from the black solid line to the black dotted line. The sound pressure level of the masked signal (blue line) is then lower than the new hearing threshold, and the signal can no longer be heard. The difference between the black dotted line and the black solid line is called the acoustic masking amount. As with the human ear, the quality of the target voice signal captured by an eavesdropping device is also degraded by the presence of the masking sound.
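
The audibility rule described above can be sketched in a few lines. This is only an illustration: the threshold in quiet uses Terhardt's widely cited approximation rather than the curve in Figure 2, and the function names and the 40 dB masking amount in the usage note are assumptions, not values from this paper.

```python
import math

def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate hearing threshold in quiet (dB SPL), Terhardt's formula."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_audible(level_db: float, freq_hz: float,
               masking_amount_db: float = 0.0) -> bool:
    """A sound is audible when its level exceeds the hearing threshold.

    masking_amount_db is how far the masking sound has raised the threshold
    at this frequency (0 means no masking sound is present).
    """
    masked_threshold = threshold_in_quiet_db(freq_hz) + masking_amount_db
    return level_db > masked_threshold
```

For example, `is_audible(30.0, 1000.0)` is true for a 30 dB SPL component at 1 kHz in quiet, while `is_audible(30.0, 1000.0, masking_amount_db=40.0)` is false once a hypothetical masking sound has raised the threshold by 40 dB.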

Figure 2 Schematic Diagram of Acoustic Masking

It can also be seen from Figure 2 that the masking signal raises the hearing threshold not only in the frequency band occupied by the masking signal but also in adjacent frequency bands. Its masking ability (acoustic masking amount), however, decreases as the difference between the target signal frequency and the masking signal frequency increases. The energy of human speech is mainly concentrated in the range of 100 Hz to 8 kHz. Therefore, to obtain the best masking capability, acoustic masking systems generally emit masking interference in the same range of 100 Hz to 8 kHz.

In addition to the above, the acoustic masking effect has the following features: low-frequency sounds easily mask high-frequency sounds, while high-frequency sounds have difficulty masking low-frequency sounds; and when the sound pressure level of the masking sound is increased, the masking amount increases and the frequency range that is masked also expands.
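
These three properties (masking falls off with frequency distance, low frequencies mask high frequencies more easily than the reverse, and a louder masking sound covers a wider range) can be illustrated with a simple triangular spreading model on the Bark (critical-band) scale. The sketch below is an assumption-laden illustration, not the model behind Figure 2: the Bark conversion is the Zwicker-Terhardt approximation, the slopes follow a Terhardt-style rule, and the 10 dB offset and the slope floor are arbitrary illustrative constants.

```python
import math

def hz_to_bark(freq_hz: float) -> float:
    """Zwicker-Terhardt approximation of the critical-band (Bark) scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def masking_amount_db(masker_hz: float, masker_level_db: float,
                      target_hz: float,
                      lower_slope: float = 27.0,
                      offset_db: float = 10.0) -> float:
    """Illustrative masking amount (dB) produced by a narrowband masker.

    All constants are assumptions chosen for illustration only.
    """
    dz = hz_to_bark(target_hz) - hz_to_bark(masker_hz)
    peak = masker_level_db - offset_db
    if dz < 0:
        # Target is below the masker in frequency: steep fall-off.
        amount = peak + lower_slope * dz
    else:
        # Target is above the masker: shallower fall-off that flattens
        # further as the masker gets louder (upward spread of masking).
        upper_slope = max(24.0 + 230.0 / masker_hz - 0.2 * masker_level_db, 1.0)
        amount = peak - upper_slope * dz
    return max(amount, 0.0)
```

With a 1 kHz masker at 70 dB, this model gives more masking at 1.2 kHz than at 2 kHz, more at 2 kHz than at 500 Hz, and raising the masker to 90 dB extends the masked range upward to 3 kHz.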

3 Classification of acoustic masking effects

According to the temporal relationship between the masking signal and the masked signal, the masking effect can be divided into simultaneous masking and sequential masking. When the masking signal and the masked signal appear at the same time, it is called simultaneous masking; when the masking signal appears before or after the masked signal, that is, the two sound signals do not appear at the same time, it is called sequential masking. Most sound signals are non-stationary transient signals whose sound pressure level changes rapidly with time: a strong sound may be followed by a weak sound, and a weak sound may be followed by a strong sound. Because hearing has a memory function, a stronger sound tends to mask the weaker sound that follows it. In addition, auditory perception of a sound takes time to build up, which introduces a certain delay. Perception of a strong sound builds up faster than that of a weak sound, so a weak sound that arrives first can be masked by a strong sound that arrives later. Sequential masking can therefore be divided into forward masking (the masking signal precedes the masked signal) and backward masking (the masking signal follows the masked signal). Generally speaking, simultaneous masking is the strongest and produces the largest masking amount; forward masking is stronger than backward masking, and it also lasts much longer than backward masking.
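
The temporal classification can be made concrete with a small helper. This is a minimal sketch under stated assumptions: the function name is hypothetical, times are in seconds, and the default window lengths (about 200 ms for forward masking, about 20 ms for backward masking) are rough textbook-order values chosen for illustration, not figures from this paper.

```python
def classify_masking(masker_start: float, masker_end: float,
                     target_start: float, target_end: float,
                     forward_window: float = 0.2,
                     backward_window: float = 0.02) -> str:
    """Classify masking by the temporal relation of masker and target.

    Returns 'simultaneous', 'forward', 'backward', or 'none' (the gap is
    too long for any sequential masking under the assumed windows).
    """
    # The two intervals overlap in time.
    if target_start < masker_end and target_end > masker_start:
        return "simultaneous"
    # Masker ends first, target follows: forward masking.
    if target_start >= masker_end:
        gap = target_start - masker_end
        return "forward" if gap <= forward_window else "none"
    # Target ends first, masker follows: backward masking.
    gap = masker_start - target_end
    return "backward" if gap <= backward_window else "none"
```

The asymmetric default windows reflect the statement above that forward masking lasts much longer than backward masking.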

According to the role of the masking sound in the auditory system, acoustic masking can also be divided into energy masking and information masking. Energy masking mainly refers to interference with the target speech at the auditory periphery, while information masking mainly refers to interference with the target speech in the central auditory system. Traditional noise jamming of target speech is energy masking, and its masking performance is directly related to the degree of spectral overlap between the masking signal and the masked signal. Information masking arises either because the masking sound resembles the masked sound and competes with it for the brain's speech-perception resources, or because the masking sound is itself intelligible and occupies cognitive processing resources. The classification of acoustic masking effects is shown in Table 1.

Table 1 Classification of acoustic masking effects

4 Masking signal classification

The core idea of acoustic masking technology is to suppress sound with sound. It adds a balanced and comfortable masking signal to the space, reducing speech intelligibility at potential leakage locations and improving the privacy and confidentiality of the speaker's voice while maintaining the speaker's comfort. According to the type of masking source signal, acoustic masking can be divided into three categories: background music masking, background noise masking and coherent target speech masking.

Background music masking deploys music as a masking sound source along the paths where sound may leak, be eavesdropped on or be transmitted. However, this kind of masking introduces new interference, because music is a meaningful sound. If it is played near the speakers as a masking sound, it easily attracts their attention and disturbs normal working communication between them.

Background noise masking uses white noise, pink noise, simulated air conditioning or fresh air system noise and other types of noise, as well as mixtures of these, as masking sounds. Different background noises have different masking capabilities. The problem with background noise masking is its low efficiency in masking voice: to achieve the masking effect at a position where the voice may leak or be eavesdropped on, the sound pressure level of the masking signal must be higher than that of the target voice. However, excessive noise energy disturbs the speaker.
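
As a rough illustration of the noise types mentioned above, the following sketch generates white noise, an approximation of pink noise, and an equal mix of the two using only the Python standard library. The function names, the fixed seed and the mixing weight are assumptions made for illustration; the pink noise uses the basic Voss algorithm, which only approximates a 1/f spectrum, and a practical system would additionally shape the signal to the speech band.

```python
import random

def white_noise(n, rng):
    """n samples of Gaussian white noise (flat spectrum on average)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def pink_noise(n, rng, rows=16):
    """Approximate pink (1/f) noise with the basic Voss algorithm:
    row k holds a random value that is redrawn every 2**k samples,
    and each output sample is the sum of all rows."""
    values = [rng.gauss(0.0, 1.0) for _ in range(rows)]
    out = []
    for i in range(n):
        for k in range(rows):
            if i % (1 << k) == 0:
                values[k] = rng.gauss(0.0, 1.0)
        out.append(sum(values))
    return out

def normalize_rms(x, target_rms=1.0):
    """Scale a signal to the requested RMS value."""
    rms = (sum(v * v for v in x) / len(x)) ** 0.5
    return [v * target_rms / rms for v in x]

def mix(a, b, weight_a=0.5):
    """Weighted mix of two equal-length signals."""
    return [weight_a * u + (1.0 - weight_a) * v for u, v in zip(a, b)]

def lag1_autocorrelation(x):
    """Near 0 for white noise; near 1 when low frequencies dominate."""
    num = sum(x[i] * x[i + 1] for i in range(len(x) - 1))
    den = sum(v * v for v in x)
    return num / den

rng = random.Random(0)  # fixed seed so the sketch is repeatable
white = normalize_rms(white_noise(4096, rng))
pink = normalize_rms(pink_noise(4096, rng))
masker = mix(white, pink)  # a simple mixed-noise masking signal
```

The lag-1 autocorrelation is included only as a quick sanity check that the pink signal carries relatively more low-frequency energy than the white one.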

The coherent target speech method uses speech signals, including speech signals that are coherent with the target speech, as masking sounds. It is mainly a form of information masking, and its masking effect and efficiency depend on the similarity between the masking signal and the target speech. Research shows that high similarity between the masking signal and the target voice improves the masking effect and produces more information masking. Test subjects find it difficult to tell apart voice signals from speakers of the same gender, but easier to tell apart voice signals from speakers of different genders. Moreover, other speech signals derived from the target speaker (such as the time-reversed signal formed by time domain inversion of the target speaker's speech, speech-spectrum speech formed by frame-by-frame processing, modulated speech-spectrum speech, and segment-reversed speech) can be used as masking signals to mask the target speech effectively, and their masking effect is better than that of other speakers' speech. In short, masking signals based on coherent target speech have broad application prospects for reducing speech intelligibility and protecting speech privacy, and are worth further study.
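
Two of the derived maskers mentioned above, whole-signal time reversal and segment-wise reversal, are simple to sketch on a list of audio samples. The function names and the segment length are hypothetical; real systems would operate on sampled speech and typically choose segment lengths on the order of syllables or frames, which this paper does not specify.

```python
def reverse_whole(samples):
    """Time-domain inversion: play the whole speech signal backwards."""
    return samples[::-1]

def reverse_segments(samples, segment_len):
    """Segment reversal: reverse each consecutive block of segment_len
    samples in place, keeping the blocks in their original order.
    A shorter final block is reversed as well."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    out = []
    for start in range(0, len(samples), segment_len):
        out.extend(samples[start:start + segment_len][::-1])
    return out
```

Both operations keep the samples, and therefore the long-term energy and spectrum of the speaker's voice, while destroying the temporal order that carries the linguistic content. This is one way to obtain a masker that is highly similar to the target speech.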

Figure 3 Waveform (top) and spectrogram (bottom) of the target voice signal recorded by the eavesdropping equipment when the acoustic masking system is not working

Figure 4 Waveform (top) and spectrogram (bottom) of the white noise signal transmitted by the acoustic masking system

Figure 5 Waveform (top) and spectrogram (bottom) of the target voice signal recorded by the eavesdropping equipment under interference from the acoustic masking system

Figures 3 to 5 show the waveforms and spectrograms of the signals recorded by the eavesdropping equipment when the acoustic masking system is not working and when it is working in background noise masking mode. Figure 3 shows the target voice signal recorded by the eavesdropping device when the masking system is not working. The main energy of the voice signal lies below 8 kHz, and the signal shows clear pitch periods and formants. Figure 4 shows the waveform and spectrogram of the interfering white noise signal emitted by the acoustic masking system. Figure 5 shows the waveform and spectrogram of the target voice signal recorded by the eavesdropping device when the masking system is working. Comparing Figure 5 with Figure 3 shows that the white noise emitted by the masking system raises the background noise level of the environment and reduces the signal-to-noise ratio of the signal recorded by the eavesdropping equipment. In Figure 5, the pitch frequencies and formants of some speech segments have been masked by the masking noise, and the masking sound has reduced the intelligibility of the target voice signal to a certain extent. However, the masking effect of a pure white noise masking signal is poor. Some pitch frequencies and formants, marked by the blue dashed box in Figure 5, are still present, and part of the target speech information can still be recovered from them. Completely masking the target speech signal would require a further increase in the transmission power of the masking sound, but strong masking noise seriously reduces the speaker's comfort. The masking signal therefore has to balance comfort against masking performance. At a given masking sound pressure level, the masking signal should be selected and designed so as to reduce the intelligibility of the target voice signal, improve the confidentiality of the target voice, and achieve higher masking efficiency and performance.
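
The signal-to-noise ratio referred to in this comparison can be computed directly from the recorded samples. The sketch below assumes the target and masking components are available separately (which is the case in a controlled test, not for a real eavesdropper), and the function names are hypothetical.

```python
import math

def mean_power(x):
    """Mean squared amplitude of a sampled signal."""
    return sum(v * v for v in x) / len(x)

def snr_db(target, masker):
    """Target-to-masker power ratio in decibels."""
    return 10.0 * math.log10(mean_power(target) / mean_power(masker))

def masker_gain_for_snr(target, masker, desired_snr_db):
    """Linear amplitude gain to apply to the masker so that the
    target-to-masker ratio becomes desired_snr_db."""
    current = snr_db(target, masker)
    # Scaling the masker amplitude by g lowers the SNR by 20*log10(g) dB.
    return 10.0 ** ((current - desired_snr_db) / 20.0)
```

For instance, if the target is 20 dB above the masker, `masker_gain_for_snr(target, masker, 0.0)` returns a gain of about 10. That is a tenfold increase in masker amplitude, which illustrates why pushing the signal-to-noise ratio down by raw power alone quickly becomes uncomfortable for the speaker.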

5 Optimization Design of Acoustic Masking System

Based on the principle of acoustic masking and the classification of masking signals, current research on acoustic masking focuses on two aspects: the selection and generation of masking sources or masking signals, and the combined optimization of masking systems. The best masking effect can be achieved only by selecting appropriate masking sound sources for the locations and paths where sound may leak or be eavesdropped on, and by optimizing the layout of those sources.

Most current eavesdropping devices can store data, so eavesdroppers can analyze and process the recorded data in depth and exploit the differences between the masking signal and the target voice signal to separate or restore the target voice from recordings that contain the masking sound. An eavesdropper may even place a dedicated device beside a loudspeaker of the masking system to capture the masking signal itself, use the captured signal as a reference, and apply signal processing methods such as adaptive cancellation to remove the masking interference from the recording, thus restoring the target voice. To ensure masking quality, the masking signal emitted by the masking sound source therefore needs to resist cancellation. It should not consist of a single simple signal, but of music, various noises, human speech, or combinations of these.
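
The cancellation threat described above can be demonstrated with a normalized LMS (NLMS) adaptive filter, one common form of adaptive cancellation. The sketch is a toy scenario under stated assumptions: a single white noise masker, a hypothetical two-tap acoustic path from the loudspeaker to the eavesdropping microphone, a sinusoid standing in for the target speech, and an eavesdropper who holds a clean copy of the masking signal. None of these values come from this paper.

```python
import math
import random

def nlms_cancel(recorded, reference, taps=4, mu=0.1, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `recorded`
    using normalized LMS; returns the residual (error) signal."""
    w = [0.0] * taps    # adaptive filter weights
    buf = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for d, x in zip(recorded, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # estimated masker at mic
        e = d - y                                   # what remains after cancelling
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual

def mean_power(x):
    return sum(v * v for v in x) / len(x)

def demo(n=4000, seed=0):
    """Return (masker power at the mic, masker power left after NLMS)."""
    rng = random.Random(seed)
    target = [0.2 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
    masker = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical 2-tap path from loudspeaker to eavesdropping microphone.
    at_mic = [0.8 * masker[i] + (0.3 * masker[i - 1] if i > 0 else 0.0)
              for i in range(n)]
    recorded = [t + m for t, m in zip(target, at_mic)]
    residual = nlms_cancel(recorded, masker)
    half = n // 2  # evaluate after the filter has had time to converge
    before = mean_power(at_mic[half:])
    after = mean_power([r - t for r, t in zip(residual[half:], target[half:])])
    return before, after
```

In this idealized setting the filter removes most of the masking energy and leaves the weak target largely exposed. This is the reason a masking signal built from a single simple source is considered insufficient, and why mixtures of several independent sources emitted from different positions are harder to cancel with one reference.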

In addition, an acoustic masking system often includes one or more sound sources, so the placement of the sources must be designed in advance to make the masking sound field uniform. That is, the sound pressure level of the masking signal should be as consistent as possible throughout the masking area, without large differences in masking intensity within the same area. Moreover, the masking control system can calculate and dynamically adjust the emission amplitude and phase of each source's masking signal in real time according to the speaker's position and the area to be masked. Through phase control and amplitude control, interference can be directed at a specific direction or area and the effective range can be extended, achieving the optimal masking effect.
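
Both ideas in this paragraph, checking the uniformity of the masking sound field and steering the masking energy by controlling emission timing (phase), can be sketched with elementary acoustics. The model below is a simplification under explicit assumptions: free-field point sources with spherical spreading (6 dB loss per doubling of distance), incoherent power summation for the level estimate, no reflections, and hypothetical function names and coordinates in metres.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def focusing_delays(sources, target):
    """Per-source emission delays (s) so that all wavefronts reach
    `target` at the same time; the farthest source gets zero delay."""
    distances = [math.dist(s, target) for s in sources]
    farthest = max(distances)
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]

def combined_spl(point, sources, levels_at_1m):
    """Estimated masking level (dB) at `point`, assuming free-field
    spherical spreading and incoherent summation of the sources."""
    total = 0.0
    for s, level in zip(sources, levels_at_1m):
        r = max(math.dist(s, point), 0.1)  # avoid the singularity at r = 0
        total += 10.0 ** ((level - 20.0 * math.log10(r)) / 10.0)
    return 10.0 * math.log10(total)

def spl_spread(points, sources, levels_at_1m):
    """Uniformity metric: max minus min masking level over the points."""
    spls = [combined_spl(p, sources, levels_at_1m) for p in points]
    return max(spls) - min(spls)
```

A layout search could, for example, evaluate `spl_spread` over a grid of listening positions for each candidate source arrangement and prefer the arrangement with the smallest spread, while `focusing_delays` gives the timing offsets needed to make the sources add in phase at a chosen point.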

6 Conclusion

Acoustic masking technology uses the acoustic masking effect to mask the target speech by emitting a masking sound in the audible frequency band of the human ear. This paper has introduced the basic principle of acoustic masking, the classification of masking effects, the classification of masking signals, and the optimization design of acoustic masking systems. Increasing the masking source level produces a strong masking effect, but because the masking signal is audible, strong masking also reduces the speaker's comfort and thereby affects normal voice communication. In practical applications, the main ways to improve the masking effect are to design masking signals that offer higher masking performance and are more comfortable for the human ear, and to optimize the combined control of the sound pressure of the masking system within the masking source level that the speaker can tolerate. At the same time, voice can leak in many ways and the installation location of eavesdropping equipment is unknown, so voice information cannot be fully protected by a single technical means. It is therefore necessary to combine a variety of sound leakage protection technologies, making full use of the advantages of each and avoiding their disadvantages, to achieve the best comprehensive protection effect.

(Originally published in the January 2023 issue of Confidential Science and Technology)