How do you measure speech intelligibility?

Starting Point

If speech is transmitted via communication devices or public address systems, the primary goal is optimum intelligibility of the reproduced signal. Everyone is familiar with the situation of not being able to understand the other party well, e.g. at door intercoms or during announcements on public transport. Many influencing variables can play a role here, turning a clear speech signal into an incomprehensible mess.

The development and installation of such a system therefore raises a number of questions:

  • Which parameters influence speech intelligibility?
  • How can the intelligibility of a speech signal be measured?
  • How can the speech intelligibility of a transmission system be optimized?

Influencing Variables

From their own experience, most people already know the most important parameters that lead to a deterioration in speech intelligibility.

  • Too low, fluctuating or too high signal level
  • Loud background noises
  • Reverberation
  • Distorted signal
  • Limited frequency spectrum
  • Masking effects

Since the subjective perception of the listener often plays a decisive role in acoustics, it is important to be able to apply objective and reproducible evaluation criteria. When considering speech intelligibility, this is possible by measuring the Speech Transmission Index (STI), which is described in DIN EN 60268-16. By means of the STI, the speech transmission quality of a transmission channel can be determined and a prediction for speech intelligibility can be derived.

Description of the Measurement Method

The measurement of the Speech Transmission Index is based on the empirical finding that speech intelligibility is mainly determined by the depth of the intensity fluctuations (modulation) of the speech signal. In real speech, these fluctuations result from the acoustic separation of sentences, words and phonemes. The better this modulation is preserved at the listener's position, the easier the speech signal is to understand.
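To illustrate why modulation is a useful measure, the following minimal Python sketch models how two common disturbances reduce it. It assumes the textbook relations for stationary background noise and for an ideal exponential reverberant decay. The function names are hypothetical, and the sketch is not the complete procedure of the standard.

```python
import math


def modulation_after_noise(snr_db: float) -> float:
    """Factor by which stationary noise reduces the modulation depth.

    snr_db is the signal-to-noise ratio in dB. At 0 dB, half of the
    original modulation depth remains.
    """
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))


def modulation_after_reverb(mod_freq_hz: float, rt60_s: float) -> float:
    """Factor by which an ideal exponential decay reduces the modulation depth.

    rt60_s is the reverberation time in seconds. Fast modulations are
    smeared more strongly than slow ones.
    """
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)


def modulation_transfer(mod_freq_hz: float, snr_db: float, rt60_s: float) -> float:
    """Combined factor, assuming noise and reverberation act independently."""
    return modulation_after_noise(snr_db) * modulation_after_reverb(mod_freq_hz, rt60_s)


if __name__ == "__main__":
    # Example: a reverberant hall (2 s) with moderate noise (10 dB SNR).
    for f in (0.63, 2.0, 12.5):
        print(f"{f:5.2f} Hz -> m = {modulation_transfer(f, 10.0, 2.0):.3f}")
```

In this model, a value close to 1 means the intensity fluctuations of the speech signal arrive almost unchanged, while a value close to 0 means they are largely lost in noise or reverberation.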

The following graphic shows a simplified STI measurement setup. On the left you can see the measurement of the microphone signal path and on the right the measurement of the loudspeaker.

The STI can be measured directly or indirectly. The two methods differ in the type of test signal and in their applicability. The direct measurement uses a speech-like noise that is modulated in 7 octave bands from 125 Hz to 8 kHz, each with 14 modulation frequencies in the range from 0.63 Hz to 12.5 Hz. This results in 98 measured values from which the STI can be calculated directly. With the indirect method, the impulse response of the transmission system is measured using a suitable test signal and the STI is derived from it mathematically. In both variants, the result is a numeric value between 0 and 1, which provides direct information about the quality of speech intelligibility.
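As a rough illustration of both routes, the following Python sketch derives a modulation transfer value from an impulse response (using Schroeder's relation between the squared impulse response and the modulation transfer function) and condenses a 7 x 14 matrix of such values into a single index. It is a simplified sketch under stated assumptions: the octave bands are weighted equally, and the band weighting, redundancy, masking and hearing threshold corrections of the standard are omitted. All function names are hypothetical.

```python
import cmath
import math


def mtf_from_impulse_response(h, fs, mod_freq_hz):
    """Indirect route: modulation transfer value from an impulse response.

    h is a sequence of samples of the (octave-band filtered) impulse
    response, fs is the sample rate in Hz.
    """
    energy = [x * x for x in h]
    total = sum(energy)
    if total == 0.0:
        raise ValueError("impulse response has no energy")
    acc = sum(e * cmath.exp(-2j * math.pi * mod_freq_hz * n / fs)
              for n, e in enumerate(energy))
    return abs(acc) / total


def transmission_index(m):
    """Map one modulation transfer value (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)          # keep the logarithm finite
    snr_eff = 10.0 * math.log10(m / (1.0 - m))  # effective SNR in dB
    snr_eff = min(max(snr_eff, -15.0), 15.0)    # limit to +/- 15 dB
    return (snr_eff + 15.0) / 30.0


def simplified_sti(mtf_matrix):
    """Condense a bands x modulation-frequencies matrix into one value.

    Assumption: equal band weights, unlike the weighted sum in the standard.
    """
    band_indices = [sum(transmission_index(m) for m in row) / len(row)
                    for row in mtf_matrix]
    return sum(band_indices) / len(band_indices)


if __name__ == "__main__":
    # 7 octave bands x 14 modulation frequencies = 98 values.
    matrix = [[0.5] * 14 for _ in range(7)]
    print(simplified_sti(matrix))
```

A matrix in which every modulation is transferred perfectly yields 1, and one in which no modulation survives yields 0, which matches the value range described above.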

In addition, the Speech Transmission Index for Public Address systems (STIPA) was introduced as a shortened procedure, which is calculated from only 14 instead of 98 values of a direct measurement. Here, the focus is on shortening the measurement duration, but limitations must be taken into account, such as its sensitivity to non-linear distortions and its restricted validity for female speakers. The Room Acoustical Speech Transmission Index (RASTI) still exists, but it is obsolete and should no longer be used.

Comparison of Indices

Index                   STI                                 STIPA                                 RASTI
Octave bands            7                                   7                                     2
Modulation frequencies  14                                  2                                     4/5
Combinations            98                                  14                                    9
Advantages              high accuracy                       high measuring speed                  high measuring speed
Disadvantages           slightly increased measuring time   error-prone with impulsive noise      obsolete measurement method
                                                            sensitive to non-linear distortion    error-prone with impulsive noise
                                                                                                  error-prone with noise containing tones
                                                                                                  error-prone with compressed signals

As with any qualified measurement, it is important to work with calibrated and sufficiently accurate measuring equipment. The equipment has to be selected so that anything that could affect the measurement result is minimized. Low-distortion measuring microphones and loudspeakers with an ideally flat frequency response are a prerequisite for reliable results. The measurement itself is then carried out either on a computer or with a suitable analyzer.

Experience and Optimization

When optimizing communication systems, speech intelligibility depends strongly on the room and on the position of the listener. The positioning and orientation of loudspeakers and the targeted use of absorbers and diffusers to improve room acoustics are usually the most important considerations.

However, even under controlled acoustic conditions in a low-reflection measuring room, measuring the STI can provide useful, device-specific insights. Especially in voice communication systems that are not designed for sound reinforcement in large rooms, many aspects play a decisive role. For example, design specifications combined with a wide range of functions and a fixed budget often lead to products whose frequency response and distortion behavior are anything but ideal. By measuring speech intelligibility, we were able to show that it can be surprisingly good even when these standard parameters have unfavorable values. The behavior at different signal-to-noise ratios can also be determined by playing back background noise that is typical of the intended application.

In combination with other measurements, it is also possible to identify weak points or unit-to-unit variation in series production. For example, in the manual assembly process of an intercom, sporadic damage to a loudspeaker was detected that only became noticeable at high playback levels. Furthermore, the effects of unfavorably parameterized digital signal processing steps or of changes to the device design, materials or components can be reproduced directly, and speech intelligibility can be optimized through targeted parameterization, variation of individual components or design measures.
