You probably know the situation: you are on a phone call or talking over an intercom, and you hear your own voice from the loudspeaker with a slight delay. This echo makes communication strenuous and causes conversations to falter.
This is mainly caused by acoustic coupling from the loudspeaker to the microphone at the other end of the line, and it is particularly strong when hands-free kits are used. To suppress this echo, acoustic echo cancellation (AEC) should be integrated, particularly into hands-free systems.
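To make the idea of echo cancellation more concrete, here is a minimal sketch of a normalised LMS (NLMS) adaptive filter, a common textbook approach to AEC. It is not the algorithm used in either of the products compared here. The echo path `room`, the filter length `taps` and the step size `mu` are made-up values for illustration only.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=32, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo from the microphone signal.

    far_end: samples sent to the loudspeaker (reference signal)
    mic:     samples picked up by the microphone (here: echo only)
    Returns the residual signal after the echo estimate has been removed.
    """
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                      # what is sent back to the far end
        norm = sum(h * h for h in history) + eps   # input power, avoids division by zero
        step = mu * e / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual


# Toy single-talk scenario: only the far-end signal is present.
random.seed(0)
room = [0.6, 0.3, -0.2, 0.1]  # hypothetical loudspeaker-to-microphone echo path
far = [random.uniform(-1.0, 1.0) for _ in range(4000)]
mic = [
    sum(room[k] * far[n - k] for k in range(len(room)) if n - k >= 0)
    for n in range(len(far))
]
residual = nlms_echo_canceller(far, mic)

echo_peak = max(abs(s) for s in mic[-1000:])
residual_peak = max(abs(s) for s in residual[-1000:])
```

In this idealised, noise-free case the residual becomes negligible once the filter has adapted. Real rooms, background noise and simultaneous near-end speech make the task considerably harder, which is what the comparison below examines.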
To evaluate the quality of our AEC in direct comparison to a competitor, we have carried out some comparative measurements in our acoustics laboratory. We used the vicCOM-complete 2 system kit and a competitor's product with a similar performance range from the same price segment.
The basic test setup was as follows:
For a first basic comparison of the hardware, the frequency responses of both were measured. Here the first differences already became apparent.
To measure the microphone path, the electrical measurement signal was fed into the Mic-In and measured at the Line-Out.
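The principle of such a measurement can be sketched in a few lines: feed a sine tone of known frequency into the device, compare the output level with the input level, and repeat this for several frequencies. The sketch below is an illustration only and is not the laboratory setup used here. `toy_device` is a hypothetical stand-in with a low-frequency roll-off, and `gain_db`, the settling time and the test frequencies are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gain_db(device, freq_hz, fs=16000, seconds=1.0, settle=160):
    """Gain of `device` at one frequency, measured with a sine tone."""
    n_total = int(fs * seconds)
    stimulus = [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(n_total)]
    response = device(stimulus)
    # Skip the first samples so that only the settled response is evaluated.
    return 20.0 * math.log10(rms(response[settle:]) / rms(stimulus[settle:]))


def toy_device(samples):
    """Hypothetical device under test: a simple first-difference high-pass."""
    out, prev = [], 0.0
    for x in samples:
        out.append(0.5 * (x - prev))
        prev = x
    return out


# Stepped-sine "sweep" over a few frequencies below the 8 kHz limit.
response_curve = {f: gain_db(toy_device, f) for f in (100, 1000, 4000)}
```

Plotting `response_curve` over frequency gives a curve of the same kind as the measured ones, with the gain falling towards low frequencies for this toy device.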
As can be seen very clearly, the microphone level of the reference product (red) drops sharply towards low frequencies. This results in a very thin, nasal sound, which is very tiring for the listener, especially at high levels. In addition, speech intelligibility suffers because the fundamental frequencies of the speech signal are very strongly attenuated.
The vicCOM-complete 2 (blue) also shows a drop in frequency response in the low-frequency range, although this is relatively small. This attenuation of the lower frequency range is caused by a high-pass filter that is used in audio systems to eliminate DC components of the signal. These would otherwise interfere with the digitization of the signal.
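How a high-pass filter removes a DC component can be shown with a first-order DC blocker. This is a minimal sketch of a common textbook filter, not the filter implemented in the vicCOM-complete 2. The coefficient `r`, the offset and the test tone are assumed values.

```python
import math


def dc_blocker(samples, r=0.995):
    """First-order high-pass: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


fs = 16000      # sampling rate in Hz, as used by both systems
offset = 0.5    # assumed DC component riding on the signal
tone = [offset + 0.25 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
filtered = dc_blocker(tone)

# Compare the average value before and after filtering (second half only,
# after the filter has settled).
mean_in = sum(tone[-8000:]) / 8000
mean_out = sum(filtered[-8000:]) / 8000
peak_out = max(filtered[-8000:])
```

The offset disappears from the output while the 1 kHz tone passes almost unchanged. The closer `r` is to 1, the lower the cut-off frequency and the smaller the drop in the low-frequency response.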
In the upper frequency range, no significant differences can be found. Since both systems work with a sampling rate of 16 kHz, the upper frequency limit is just under 8 kHz. Both platforms are therefore suitable for HD telephony.
To measure the loudspeaker path, the signal was fed into the Line-In and measured at the Spk-Out.
The difference in frequency response at the loudspeaker output is less dramatic than in the microphone path, but is still clearly visible and audible. Here, too, the drop in the measurement curve is significantly greater with our competitor's system (red) than with vicCOM-complete 2 (blue). This leads to a weaker bass reproduction with the effects on sound perception and speech intelligibility already mentioned above.
Now let's get to the comparison of the AEC algorithms.
Since this topic is difficult to describe verbally, we will concentrate on the comparison of audio examples. These were recorded for both platforms under identical acoustic conditions. To make the respective effects easy to follow, we first present the signal without signal processing and then with the algorithm activated. The differences between the two systems are best assessed via headphones.
First of all, we turn to echo suppression in "single talk". This means that only the distant speaker speaks and the AEC has to suppress the echo at the near end.
For this example, we deliberately used an artificial test signal instead of a real one, since it makes the sound colouration caused by the frequency response of the loudspeaker easy to hear. It is easy to hear that the signal at the far end of the comparison platform sounds midrange-heavy and unbalanced.
If you hear nothing, that is correct. Or at least almost nothing, because that is exactly what acoustic echo cancellation is about. In both systems, the signal of the distant speaker is reduced to the level of the noise floor. Our competitor's platform performs slightly better here, but the very low residual echo is not critical: the far-end speaker would hear this echo of their own words while still speaking, so it is additionally masked by their own voice and stays safely below the perception threshold.
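The amount of echo suppression is commonly expressed as echo return loss enhancement (ERLE), the level difference between the microphone signal before and after processing. The sketch below shows the calculation only. The signals and the 1 % residual are invented numbers and are not measurements from this comparison.

```python
import math


def level_db(samples):
    """Mean-square level of a signal in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def erle_db(mic_signal, residual_signal):
    """Echo return loss enhancement: how much the echo level was reduced."""
    return level_db(mic_signal) - level_db(residual_signal)


fs = 16000
# Invented example: an echo tone at the microphone ...
echo = [0.1 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
# ... of which the AEC is assumed to leave 1 % of the amplitude.
residual = [0.01 * s for s in echo]

erle = erle_db(echo, residual)  # 1 % of the amplitude corresponds to 40 dB
```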
Double talk situations are those in which both speakers speak at the same time. This is where the quality of an AEC becomes apparent, as the algorithm should ideally suppress the signal of the far-end speaker completely while passing on the signal of the near-end speaker unchanged.
In the following example, the female voice represents the distant speaker to be suppressed. The male speaker at the near end should be transmitted to the far end with as little change as possible.
The large difference in level results from the acoustic feedback within the microphone unit. Both speakers have been calibrated to a level of 60 dB(A) at a distance of 25 cm. However, since the microphone and loudspeaker of an intercom are typically much closer together, the signal level to be suppressed is usually much higher than that of the speaker's signal that should be transmitted. In addition to the superimposition of the signals, this poses a further significant challenge for signal processing.
The following recordings represent what is heard at the far end of the transmission.
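The effect of distance on level can be estimated with the inverse distance law, which gives roughly 6 dB more level each time the distance to the source is halved. The sketch below is a rough illustration under strong assumptions: an ideal point source in free field, and a hypothetical loudspeaker-to-microphone spacing of 5 cm that does not come from the test setup. Real intercom housings will deviate from this.

```python
import math


def level_change_db(reference_distance_m, actual_distance_m):
    """Level change of a point source in free field when the distance changes."""
    return 20.0 * math.log10(reference_distance_m / actual_distance_m)


calibrated_level_dba = 60.0   # calibration level from the measurement setup
reference_distance_m = 0.25   # calibration distance from the measurement setup
assumed_spacing_m = 0.05      # hypothetical spacing, NOT from the test setup

gain = level_change_db(reference_distance_m, assumed_spacing_m)
level_at_mic_dba = calibrated_level_dba + gain
```

Under these assumptions, a source of the same strength arrives at the microphone about 14 dB louder from 5 cm than from 25 cm, which illustrates why the echo can dominate the near-end speech.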
Again, the different sound colourings of the platforms are audible. The problem with the frequency response is already apparent in the unprocessed signal. With the comparison product, the male near speaker is difficult to understand because of the missing low end. With the vicCOM-complete 2, it is a little easier to follow the conversation.
When signal processing is activated, the differences between the two platforms can be heard clearly. In the first part (from the start to 4 seconds), which represents a single talk situation, the remote speaker is almost completely suppressed.
In the following part (from 4 to 12 seconds), where both speakers overlap, the difference in quality of the AEC becomes very clear. In the comparison product, clear dropouts, sporadic echo artifacts of the distant speaker and a strong modulation of the near speaker can be heard.
With vicCOM-complete 2, however, the near speaker is heard without dropouts. The sound is balanced and only very slightly modulated. During pauses in speech, the echo of the distant speaker can be heard very quietly, which corresponds to the level of the first part. However, there are no disturbing echo artifacts.
In the last part of the recordings (from 12 seconds to the end), only the speaker from the near end can be heard. This part of the conversation should be transmitted in optimal quality, as this is the actual signal to be transferred. On the comparison platform, a significant improvement in the signal is noticeable, but the mid-range, nasal sound is retained, which is mainly due to the frequency response of the microphone path.
With the vicCOM-complete 2, a slight, but not fundamental change in the sound is noticeable. Due to the complexity of the problem of separating two superimposed, unevenly loud signals from each other, the sound of the nearby speaker is slightly affected by the AEC. In the last part, the full signal quality is then revealed with a very balanced sound image.
In the comparison of the vicCOM-complete 2 with a competing product in the same price segment, our hardware platform clearly prevailed in almost all of the scenarios tested. Because the microphone and loudspeaker frequency responses drop only slightly, the sound image is balanced, with good speech intelligibility at the near and far ends. The acoustic echo cancellation of both platforms delivers very good results in single talk situations. In double talk, the vicCOM-complete 2 with vicHSES (Hands-free and Speech Enhancement Suite) is clearly ahead and features high sound quality and very good signal separation.