1. Regulation of audio and video quality in digital broadcasting

1.1 Introduction

Despite the move to "light-touch regulation", I believe that some control over audio and video quality must be enforced by the new agency. This need has arisen because of the move to digital broadcasting.

In analogue broadcasting, the transmission channel sets a limit on the maximum possible audio and video quality. There is no commercial benefit to delivering less than the best possible quality. While many consumers will be unconcerned by technical quality, there are others who will turn off if the picture is noisy or the sound is distorted.

In digital broadcasting, the transmission channel sets a limit on the maximum amount of information that can be delivered. However, a single transmission channel (usually referred to as a "multiplex") carries many stations. The number of stations that can be carried is inversely proportional to the audio and video quality of each station.

It is a common misconception that "digital" equals "good quality". In truth, digital audio can be of any quality, from a GSM mobile phone to a Compact Disc. Likewise, digital video can range from the small, blocky stop-motion of a video streamed over the internet, to better TV pictures than have ever been seen in the UK. Having a digital service provides no guarantee of quality.

There is a strong commercial incentive for pushing the quality down as far as possible: the poorer the quality of each individual station, the greater the number of stations that can be broadcast on a single digital multiplex. In the digital era, licences are awarded for each multiplex, not each station, so the commercial pressures on the successful licensee are very real.

Quality-conscious consumers may be in the minority. In other areas of business, it often makes commercial sense for smaller (or even larger) corporations to target these consumers – they represent a niche market. However, in the field of digital broadcasting, the figures will always cause quantity to be more profitable than technical quality. Hence, without regulation, quality-conscious consumers will be totally and permanently neglected.

In this submission, I discuss these issues in more detail. Audio is my own field of expertise, so I shall limit my discussion to digital radio broadcasts. However, please be aware that there is a very similar argument to be made for regulation to protect the image quality in digital television.

1.1.1 About the author

David Robinson is a researcher at the University of Essex, and has a doctorate in audio engineering. For the past four years he has studied the human perception of audio quality, and audio codecs such as MPEG layer 2. David strongly believes that existing radio stations must sound at least as good on the digital service as they do on FM.

1.2 Digital Audio Broadcasting

1.2.1 History

The introduction of Digital Audio Broadcasting (DAB) in the UK marked a unique opportunity to increase the audio quality of our radio stations. The existing FM system has been in use for over 40 years, and works very well in the absence of interference. However, with the widespread adoption of Compact Disc as the de facto consumer audio carrier, many listeners have grown to expect a higher standard of audio quality.

The DAB standard was pioneered by the BBC and other members of the EBU, and has been adopted by much of the world (America being the notable exception). From the outset, the BBC transmitted its existing radio stations over DAB, and is launching several new stations this year.

To make room for the new stations, the BBC has reduced the audio quality of the existing stations. This has angered some "early adopters" of DAB, who paid high prices for equipment. The initial BBC DAB output was of high quality, but the current broadcasts exhibit mediocre sound quality.

The BBC did consult with listeners about the launch of the new stations. However, nowhere were listeners polled about the reductions in quality that would be required to accommodate the new services. Unlike commercial broadcasters, the BBC is not bound by the Radio Authority, and can broadcast at any quality it desires.

In order to understand the audio quality considerations, an understanding of DAB is required.

1.2.2 DAB multiplex: slicing the pie

Each DAB multiplex can be thought of as a pie, with each radio station being allocated a slice of that pie, as shown in Figure 1.1. The size of the slice limits the audio quality of the station.

In more technical terms, the multiplex transmits a stream of digital data, or "bits". Only so many bits can be transmitted per second. With the present error correction, a single digital multiplex can transmit about 1.2 million bits per second (1.2 Mbps). This is the size of the pie.

Each station is allocated so many of these bits per second. For example, Radio 2 is currently allocated 128 thousand bits per second. This is the size of Radio 2's slice. Technically, Radio 2 broadcasts at a bitrate of 128 kbps.
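The pie arithmetic can be sketched in a few lines of Python. This is an illustration only: it assumes that the whole 1.2 Mbps is available for audio (a real multiplex also carries data services) and that every station is given the same bitrate. The function name is hypothetical.

```python
# Approximate multiplex capacity quoted in the text (1.2 Mbps), in kbps.
MULTIPLEX_KBPS = 1200

def stations_per_multiplex(station_kbps, capacity_kbps=MULTIPLEX_KBPS):
    """Whole number of equal-sized slices that fit into the pie."""
    return capacity_kbps // station_kbps

for kbps in (256, 192, 160, 128):
    print(f"{kbps} kbps per station: {stations_per_multiplex(kbps)} stations")
```

Under these assumptions, halving the bitrate from 256 kbps to 128 kbps roughly doubles the number of stations that fit into the same multiplex.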

Figure 1.1: Allocation of capacity within the BBC national DAB multiplex

The audio information on a Compact Disc consists of 1.4 million bits per second – more than the entire digital multiplex! At first sight it appears impossible to transmit several near-CD quality radio stations using fewer bits per second than a CD. However, some of the information on a CD is redundant, and even more of it is inaudible to human ears. If the inaudible information is removed, and the remainder is expressed in a very compact manner, then the resulting data is only a fraction of the original size. A device which shrinks an audio signal in this way is called an audio codec.
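The 1.4 million figure follows from the Compact Disc format itself: 44,100 samples per second, 16 bits per sample, two channels. The short sketch below reproduces that figure and shows what fraction of the CD data rate survives at typical DAB bitrates. The multiplex capacity is the approximate value quoted earlier; the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 44_100    # CD sampling rate
BITS_PER_SAMPLE = 16       # CD word length
CHANNELS = 2               # stereo

CD_BPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS   # bits per second on a CD
MULTIPLEX_BPS = 1_200_000  # approximate capacity of one DAB multiplex

def fraction_of_cd(bitrate_kbps):
    """Fraction of the CD data rate that remains at the given codec bitrate."""
    return bitrate_kbps * 1000 / CD_BPS

print(f"CD data rate: {CD_BPS} bits per second")
print(f"Larger than the whole multiplex: {CD_BPS > MULTIPLEX_BPS}")
for kbps in (256, 192, 128):
    print(f"{kbps} kbps keeps {fraction_of_cd(kbps):.1%} of the CD data rate")
```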

1.2.3 DAB sound quality

The MPEG layer 2 audio codec is used in the DAB system. MPEG layer 2 is designed to work well at bitrates of 192 or 256 kbps in stereo, which involves shrinking the audio information to less than 20% of its original size. MPEG layer 2 does this very well, but at lower bitrates, the audio quality suffers significantly. However, it can be used in mono at 128 kbps without problems.

Audio codecs are strange things: they can sound perfectly fine with one audio signal (e.g. a solo flute), and terrible with another (e.g. a solo harpsichord), even if the same codec and bitrate are used for both signals. Hence, it is only possible to say how "good" MPEG layer 2 sounds at a given bitrate by testing many different audio extracts, and listening to the results. This kind of audio quality test has been done several times, e.g. [Soulodre et al, 1998].

The results fall into five categories, thus:

The difference between the original signal and the output of the audio codec is:

1. Imperceptible
2. Not annoying
3. Slightly annoying
4. Annoying
5. Very Annoying

Category 5, Very Annoying, is given to music heard via a mobile phone. In other words, it sounds dreadful – painful even! Category 1, Imperceptible, is only awarded if it is impossible to hear any difference between the original signal and the output of the audio codec.

Bitrate / kbps		Audible difference
256		Not annoying
192		Slightly annoying
128		Annoying

Table 1.1: average audio quality of MPEG layer 2 audio codec

Table 1.1 shows the average results for MPEG layer 2 on this scale. As noted, with certain signals a codec may sound much better or much worse than average. For those audio signals that cause significant problems for MPEG layer 2, it is graded Annoying at 192 kbps. Similar variations occur at other bitrates, which means that MPEG layer 2 at 128 kbps can sound nearly as bad as a mobile phone for certain audio signals!

Thankfully these problem signals are rare, but they're nothing special – they're just certain instruments or combinations of instruments which the codec "can't cope with". It would be wrong to assume that these problem signals are obscure curiosities that will never be broadcast: one track from a popular Fatboy Slim album causes such obvious problems that it is regularly used to test audio codecs.

Using the data in [Soulodre et al, 1998], the audio quality of MPEG layer 2 is compared with conventional FM broadcasting in Table 1.2. The same data is shown graphically in Figure 1.2. The placing of conventional FM broadcasts on this scale is approximate: if the reception is plagued by interference then the results from FM may be much poorer than indicated; conversely, if extremely high quality equipment is used then the results from FM may be better than indicated. The 256 kbps data is estimated.

MPEG layer 2 stereo bitrate (kbps)	Audio quality compared to FM on average	Audio quality compared to FM at worst/best
256	Much better	Very rarely worse than FM; usually much better
192	Better	Rarely worse than FM; sometimes much better
160	Similar	Sometimes worse; sometimes better
128	Worse	Sometimes much worse; sometimes better

Table 1.2: comparison of MPEG layer 2 with FM audio quality

Figure 1.2: comparison of MPEG layer 2 with FM audio quality
diamonds: average quality at specified bitrate; error bars: best and worst quality at specified bitrate

To summarise:

256 kbps = much better than FM
192 kbps = almost always better than FM
160 kbps = about the same as FM, but sometimes better, sometimes worse
128 kbps = usually worse than FM, sometimes much worse

1.2.3.1 Critique of audio quality measurements

The above audio quality measurements are taken from a paper published in the Journal of the Audio Engineering Society. The tests included average audio material, and very critical audio material - i.e. audio extracts that cause problems for audio codecs. The tests have been peer reviewed and stand without criticism. However, there are factors which may cause the quality of actual broadcasts to be better or worse than indicated by these tests.

1.2.3.1.1 Better quality:

The design of the MPEG layer 2 codec allows for the encoder (the device at the broadcaster) to be improved without changing the decoder (the device at the listener). Thus, if a better encoder is developed, it is possible for the broadcasters to use it in transmissions, and listeners will benefit. Be warned: manufacturers' claims that their encoder will achieve results at 128 kbps comparable to those achieved by some other encoder at 192 kbps are usually based on a comparison with the worst encoder they can find. The encoders used in the audio quality test above were already very good. It would be unwise to expect (or plan for) miracles from new encoders.

1.2.3.1.2 Poorer quality:

In the audio quality tests quoted above, signals were taken directly from CD. However, in a broadcast studio, the signal goes in and out of one or more mixing desks, and through one or more processors before transmission. The resulting signal can still sound excellent, but noise and distortion will have been added. The more noise or distortion that is added to a signal, the more difficult it will be to encode. It is not simply that a slightly poorer signal at the input to the codec will result in a slightly poorer signal at the output; rather, constant background noise or dynamic range compression (both of which are present on most stations) cause the codec serious problems. Typically, the performance drops by one whole grading point!

Secondly, many of the broadcast items have already been passed through an audio codec. Many radio items are broadcast from MiniDisc, or transferred to the station over ISDN lines; both involve the use of an audio codec. Passing an audio signal through one audio codec and then another is rather like copying video from VHS to VHS: the quality drops with every generation. The lower the bitrate, the more pronounced (and unpredictable) the effect.

1.2.3.1.3 Summary

In summary, the results obtained with MPEG layer 2 in the broadcast world are likely to be poorer than those obtained in the audio quality test discussed above. It is sensible to increase bitrates to compensate for this.
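As a rough illustration of what a one-grade penalty means in practice, the sketch below shifts the Table 1.1 averages down by one grading point. This is a simplification: it assumes the penalty is uniform across bitrates, which the text does not claim, and the names are hypothetical.

```python
# Five-grade impairment scale used in the listening tests, best to worst.
GRADES = [
    "Imperceptible",
    "Not annoying",
    "Slightly annoying",
    "Annoying",
    "Very Annoying",
]

# Average grades from Table 1.1 (signals taken directly from CD).
TABLE_1_1 = {256: "Not annoying", 192: "Slightly annoying", 128: "Annoying"}

def degrade(grade, points=1):
    """Move a grade down the scale by the given number of points (assumed uniform)."""
    index = min(GRADES.index(grade) + points, len(GRADES) - 1)
    return GRADES[index]

for kbps, grade in TABLE_1_1.items():
    print(f"{kbps} kbps: {grade} in the test, {degrade(grade)} after studio processing")
```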

1.2.3.2 Audio Quality: the last word

I hope this evidence is convincing. Finally, let me emphasise that, apart from my experience in this field, and apart from all the test results I have quoted, there is a more basic reason for this submission: listening to the current BBC DAB transmissions is enough to convince me (and others) that they sound significantly worse than FM. Such statements always receive the following reply from someone…

1.2.3.2.1 "It sounds good enough to me!"

The problem with deciding what sounds "good enough" is that everyone hears differently, and the MPEG layer 2 audio codec is designed to "take advantage" of the imperfections of human hearing. In designing a practical system for widespread use, some compromise must be struck, since you can't please all of the people all of the time!

There is a real danger that this compromise is pushed too far because those with the most power also have the poorest hearing. This is a point that is rarely made because of the risk of causing offence. However, the fact is that a 55 year old will have about half the hearing of a teenager - even a teenager who visits noisy clubs every night!

The following graphs each show a spectrogram of an audio signal, with time along the horizontal axis, and frequency along the vertical axis. The bright vertical bars are beats in the music, and the horizontal lines are sustained notes. In each graph, I have blacked out the frequency region that will be inaudible for a listener of a particular age, as indicated to the right of the graph.

	Original signal, from CD
	15 years old: The top couple of kHz may be audible to an 8 year old, but not a 15 year old.
	25 years old: A few more kHz have been lost from the top-end.
	55 years old: This can vary greatly with noise exposure.
	65 years old: Again, this can vary greatly, but the deterioration is usually significant.

Figure 1.3: Loss of hearing in older listeners

For each listener, any coding problems that fall into the black region will be inaudible. Hence, if the audio codec was failing at 11 kHz (about half way down the graph), then the 65 year old would not be troubled by this, the 55 year old may just hear it, but the 15 year old would be annoyed by it.

In studying the human perception of audio quality, I've had to learn to trust other people's ears above my own sometimes. It's just a fact of life that, in some respects, other people will be able to hear better than you without even trying. That's not to say that a 65 year old could make no useful criticism of audio quality; they may be able to hear problems in the lower frequency range that other less attentive listeners would miss. However, the opinion of an attentive young listener must always be considered.

Whatever your age, the longer you listen to the output of an audio codec, the more apparent the problems will become. It is unwise to listen to 30 seconds of music, declare "that sounds OK", and commit UK radio stations to this audio quality for the next 20 years! This is why extensive tests have been carried out, and why I believe it is wrong for the BBC and the Radio Authority to ignore the results of these tests. The current bitrates are too low, resulting in poor audio quality. The regulator must act to reverse the downward trend in bitrates.

1.3 Regulation: Rules and Guidelines

1.3.1 Current Radio Authority Guidelines

The Radio Authority has set minimum bitrate guidelines, as follows [RA, 2001]:

4.9 The bit rate minima which apply to sound programme services are as follows:

	'full-rate coding'	'half-rate coding'
music services, stereo	128 kbits/sec	96 kbits/sec
music services, mono	64 kbits/sec	64 kbits/sec

speech services, stereo	128 kbits/sec	96 kbits/sec
speech services, mono	64 kbits/sec	48 kbits/sec

These figures may be subject to review from time to time, although any revisions would be to lower, not higher, figures.

Only full-rate coding is applicable to high-quality services. The minimum recommendations for stereo music services deliver poorer audio quality than conventional FM.
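The minima quoted above amount to a simple lookup table. The sketch below transcribes them and checks a service against them; the dictionary layout and function name are hypothetical, chosen only for illustration.

```python
# Bit rate minima (kbits/sec) transcribed from paragraph 4.9 quoted above.
RA_MINIMA_KBPS = {
    ("music", "stereo"): {"full": 128, "half": 96},
    ("music", "mono"): {"full": 64, "half": 64},
    ("speech", "stereo"): {"full": 128, "half": 96},
    ("speech", "mono"): {"full": 64, "half": 48},
}

def meets_minimum(content, channels, coding, kbps):
    """True if a service is at or above the quoted minimum for its type."""
    return kbps >= RA_MINIMA_KBPS[(content, channels)][coding]

print(meets_minimum("music", "stereo", "full", 128))  # exactly at the minimum
print(meets_minimum("speech", "mono", "half", 40))    # below the minimum
```

Note that a check of this kind says nothing about audio quality: a stereo music service at 128 kbps passes, even though it sounds worse than FM.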

However, the recommendations do not suggest that services should be run at this bitrate; rather, that this is the absolute minimum permissible bitrate:

4.11 It should be stressed that the Authority's objective of securing "a wide range of programmes" (section 85(3)(a) of the 1990 Act) does not extend to an expectation or preference that services would normally be confined to the specified minimum bit-rates. A general range of expected values would be, for example:

speech programmes, mono:
- full-rate coding 64-96 kbits/sec
- half-rate coding 48-64 kbits/sec
music programmes, stereo: 128-256 kbits/sec

Unfortunately, the advice given in section 4.11 of the Radio Authority's guide to applicants for a local digital multiplex appears to have been ignored. The only message which has been heard by the broadcasters (including the BBC) is that 128 kbps is the minimum permitted bitrate. To demonstrate this fact, Figure 1.4 shows the number of radio stations in the UK which broadcast at each bitrate.

Figure 1.4: Number of stations broadcasting at each DAB bitrate, UK

It is obvious from this graph that the vast majority of stations have chosen to broadcast at the minimum bitrate of 128 kbps. In a regulatory environment where the only restraint on bitrate is the imposition of an absolute minimum, very few broadcasters will choose to provide better than minimum audio quality.

In contrast, broadcasters in other nations are using higher bitrates, as shown in Figure 1.5.

Figure 1.5: Number of stations broadcasting at each DAB bitrate, World-wide (ex UK)

Source: [Taylor, 2002]

Recall the audio quality associated with each of these bitrates:

256 kbps = much better than FM
192 kbps = almost always better than FM
160 kbps = about the same as FM, but sometimes better, sometimes worse
128 kbps = usually worse than FM, sometimes much worse

The vast majority of stations broadcasting on DAB in the UK sound worse than their FM counterparts. Only one station, BBC Radio 3, consistently sounds better on DAB than on FM. So much for progress!

1.3.2 Suggested revision of guidelines

1.3.2.1 Minimum Quality

It is a hard fact to swallow, but the Radio Authority's decision to set the minimum bitrate at 128 kbps was a mistake. It has meant that 128 kbps has become the "standard" for DAB. Rather than increasing audio quality, the move to digital radio has decreased it. The decision to set the minimum bitrate to 128 kbps may have more to do with generating maximum revenues for the commercial broadcasters than with maintaining standards.

With so many stations on air, it would be difficult to raise the minimum bitrate at this stage. However, it is essential that any improvements in audio quality facilitated by improved encoders or processing are passed on to the listener; these factors must not be used as an excuse for cutting bitrates still further! The minimum bitrate for stereo music stations must not be allowed to fall below 128 kbps at any point in the future.

When future multiplex licences are granted, I believe the following stipulations should be applied. Either:

(a) The minimum bitrate for stereo music stations should be 160 kbps; or
(b) A range of bitrates must be employed, such that at least two stations within the multiplex are broadcasting at 192 kbps or above; and at least half of the music stations within the multiplex are broadcasting at 160 kbps or above.

The latter represents a more pragmatic approach, but the choice can be left to the licensee: either meet (a) or (b).
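The two alternatives can be expressed as a simple check on the list of bitrates within a multiplex. The sketch below is one possible reading of the proposal, not a definitive rule: it assumes the list contains only the stereo music stations, and that the two stations at 192 kbps or above are drawn from that same list. All function names are hypothetical.

```python
def meets_option_a(music_kbps):
    """(a) Every stereo music station broadcasts at 160 kbps or above."""
    return all(kbps >= 160 for kbps in music_kbps)

def meets_option_b(music_kbps):
    """(b) At least two stations at 192 kbps or above, and at least half at 160 kbps or above."""
    at_192_or_above = sum(1 for kbps in music_kbps if kbps >= 192)
    at_160_or_above = sum(1 for kbps in music_kbps if kbps >= 160)
    return at_192_or_above >= 2 and 2 * at_160_or_above >= len(music_kbps)

def meets_proposal(music_kbps):
    """The licensee may choose to satisfy either (a) or (b)."""
    return meets_option_a(music_kbps) or meets_option_b(music_kbps)

print(meets_proposal([128] * 9))                        # everything at the current minimum
print(meets_proposal([192, 192, 160, 128, 128, 128]))   # a mixed allocation
```

A multiplex run entirely at 128 kbps fails both tests, while a mixed allocation can pass (b) without raising every station to 160 kbps.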

1.3.2.2 Maximum quality

It is desirable that at least one national station should broadcast at 256 kbps. This is the maximum bitrate (and hence quality) possible with the DAB system. In all areas of life and art it is useful to have a high quality marker. Whether this high quality marker should be provided by the BBC or the commercial sector is not for the regulator to say. However, it would benefit the largest group of people if the most popular station were to be broadcast at the highest quality. At the moment, this would be BBC Radio 2. I urge you to consider this rule: the most popular station should be required to broadcast at 256 kbps. Of course any other station is free to broadcast at 256 kbps should it wish to.

1.3.2.3 Exceptions

Finally, it should be noted that in certain specific circumstances, even the current guidelines are too strict. For example, a station may exclusively target listeners over the age of 65; very few listeners over the age of 65 can hear the highest frequencies, so broadcasting these frequencies on this particular station would be wasteful. By pre-filtering the signal to exclude these frequencies, good results can be achieved at lower bitrates.

In a similar vein, a station which only broadcasts vintage musical material will never transmit in stereo. The current minimum bitrate for this station would be 64 kbps. However, if the material contains little or no high frequency information (which is typical of vintage recordings), then this bitrate is excessive.

1.3.2.4 Summary

It seems most sensible to take a pragmatic approach which requires the licensee to tailor bitrates intelligently to each station. It should be a condition of awarding the licence to use higher bitrates to yield improved quality where it is needed, and in return to allow lower bitrates where the programme material will not be damaged by this practice. If the licensee does not wish to do this, then a higher minimum bitrate must be enforced.

1.4 Conclusion

On the positive side, "light-touch" regulation may not reduce the programming quality or diversity of commercial radio broadcasters. There is room for wall-to-wall pop music as well as more inventive formats, all supported by the market place. The BBC look set to continue their tradition of excellent, diverse output. Heavy-handed regulation is not necessary.

In contrast, audio quality is under threat. In the digital world, quality and quantity are directly exchangeable. One 256 kbps station takes the space of two 128 kbps stations. The 256 kbps station may sound better, but the two 128 kbps stations will generate twice the advertising revenue, so the commercial choice is clear. If the nation's ears can be trained to accept the dull mush of 128 kbps, then high quality audio broadcasting will be killed for a generation.

DAB promised to increase programme choice and sound quality. The regulator must have the power to ensure that both are delivered.

2. Regulation of Loudness and Dynamic Range in digital broadcasting

2.1 Introduction

A loudness war is being waged throughout the audio industry, in which everyone is trying to be louder than everyone else. Nowhere is this war more fiercely fought than in the area of commercial radio.

This war is being fought at the consumers' expense. Firstly, it reduces the audio quality of radio broadcasts, and will do so more dramatically on the new digital services. Secondly, consumers are annoyed and inconvenienced when some radio stations are much louder than others.

Fixing this problem is not difficult, and the solution does not have to upset the broadcasters. However, it does require central guidance on signal levels. This can be combined with essential limits that are required to prevent dangerous abuse of the digital services.

This problem has been growing for a number of years. The introduction of digital broadcasting is the right time to solve this problem. In a digital system, there is no disadvantage to restricting the loudness level. More importantly, the loudness wars can be fought more fiercely in the digital world, causing huge damage to audio quality, and inflicting physical damage on consumer audio equipment. Strict, enforceable technical guidelines are required to prevent this.

2.1.1 Why is there a loudness war?

Quite simply, when a listener is flicking through radio stations, a louder station is more likely to catch their attention. Also, it is well known that if two different presentations of the same audio signal are compared sequentially, then the louder one is subjectively preferred.

The advantage is transitory: once a listener has chosen a station, they adjust the volume control on their audio equipment to deliver a comfortable listening level. However, grabbing a listener's attention as they tune through the dial (or flick through pre-set stations) is thought to be vital to a commercial radio station's success.

2.1.2 Why is this a problem?

There is no simple measurable quantity that matches the human perception of "loudness". The perceived loudness of an audio signal depends on many factors. However, the main factor is the short-term average energy of the audio signal.

In a typical audio signal, the peak level is much greater than the average. The difference between the two depends on the signal. For example, if someone is playing a drum, then the peak signal at the moment the drum is hit is 10 to 20 times greater than the average. If someone is gently playing a note on a violin, then the peak is about two or three times greater than the average.

In a digital audio system there is a limit to the largest peak that can be accommodated, as shown in Figure 2.1. If both the drum and violin signals are stored digitally, such that the peaks of each signal are at the system maximum, then the average energy of the violin signal will be up to ten times greater than the average energy of the drum signal, and it will sound much louder!
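The ratio between the two averages follows directly from the peak-to-average figures quoted above. The sketch below works through the arithmetic, treating those figures as simple ratios in the same units for both signals; it makes no attempt to model perceived loudness, and the names are hypothetical.

```python
FULL_SCALE = 1.0                   # digital maximum, normalised

DRUM_PEAK_TO_AVG = (10.0, 20.0)    # range quoted in the text for a drum
VIOLIN_PEAK_TO_AVG = (2.0, 3.0)    # range quoted in the text for a violin

def average_at_full_scale(peak_to_avg):
    """Average level when the signal's peak sits exactly at the digital maximum."""
    return FULL_SCALE / peak_to_avg

def violin_to_drum_range():
    """Smallest and largest ratio of violin average to drum average."""
    smallest = (average_at_full_scale(max(VIOLIN_PEAK_TO_AVG))
                / average_at_full_scale(min(DRUM_PEAK_TO_AVG)))
    largest = (average_at_full_scale(min(VIOLIN_PEAK_TO_AVG))
               / average_at_full_scale(max(DRUM_PEAK_TO_AVG)))
    return smallest, largest

low, high = violin_to_drum_range()
print(f"Violin average is between {low:.1f} and {high:.1f} times the drum average")
```

With the quoted figures, the violin's average comes out between roughly three and ten times that of the drum when both peak at the digital maximum.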

The sensible thing to do would be to reduce the level of the violin signal. However, in a loudness war, making something quieter is not an option. The level of the drum signal cannot be increased as it is. However, if the signal is distorted such that the average level is increased, while the peak level is held constant, then the signal will sound louder.

This process is called dynamic range compression, or just compression. There are many clever ways of compressing an audio signal, but the basic idea is to increase the average energy of the signal relative to the peak level, so that the signal sounds louder.

The (sometimes desirable) effect is that quieter parts of the signal are made louder. This means that compressing a real piece of music will remove some or all of the subtleties of loud and quiet that are usually found in musical composition and performance. At the extreme, a decaying note (e.g. a piano) no longer decays, but stays at a constant level until the compressor decides that enough is enough.

This is the crux of the problem: the music is often mutilated to make the audio signal as loud as possible. Sometimes the effect is subtle, but sometimes it destroys all musical enjoyment.
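The effect on a decaying note can be demonstrated with a toy compressor. The sketch below is a deliberately crude illustration, not a model of any broadcast processor: it works on fixed blocks with no attack or release smoothing, and the block size, ratio and test signal are all arbitrary assumptions. It shows the essential point, which is that the peak stays at the digital maximum while the average level rises.

```python
import math

def decaying_note(samples=8000, rate=8000, freq=200.0, decay=4.0):
    """Synthesised note: a sine wave with an exponentially decaying envelope."""
    return [math.exp(-decay * i / rate) * math.sin(2 * math.pi * freq * i / rate)
            for i in range(samples)]

def peak(signal):
    return max(abs(v) for v in signal)

def rms(signal):
    """Root-mean-square level, used here as a rough stand-in for average energy."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

def compress(signal, block=200, ratio=4.0):
    """Toy compressor: boost each block so that quiet blocks gain more than loud ones."""
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        level = peak(chunk)
        # A block at level L (below full scale) is raised to L ** (1 / ratio).
        gain = level ** (1.0 / ratio) / level if level > 0 else 1.0
        out.extend(v * gain for v in chunk)
    return out

def normalise(signal):
    """Scale the signal so that its peak sits exactly at the digital maximum."""
    p = peak(signal)
    return [v / p for v in signal]

original = normalise(decaying_note())
squashed = normalise(compress(decaying_note()))
print(f"original:   peak {peak(original):.2f}, rms {rms(original):.3f}")
print(f"compressed: peak {peak(squashed):.2f}, rms {rms(squashed):.3f}")
```

Both versions peak at the same value, but the compressed note has a markedly higher average level because its decay has been flattened.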

(a) original signal

(b) amplified signal

(c) compressed signal

(d) clipped signal

Figure 2.1: Waveform of a short synthesised note, processed to illustrate various points discussed in the text.

In each graph, the upper and lower horizontal lines represent the digital maximum and minimum values. In waveform (b) the signal has been amplified such that the peak of the signal equals the digital maximum. In waveform (c), severe dynamic range compression has been applied. This increases the average signal, but distorts the decay envelope of the note. In waveform (d), the signal has been amplified such that the peak of the signal is greater than the digital maximum. The sections of the waveform that should have exceeded this value have been flattened or "clipped".
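Clipping of the kind described for waveform (d) is easy to reproduce. The sketch below amplifies a synthesised decaying note beyond the digital maximum and flattens whatever exceeds it; the test signal and the gain of 2 are arbitrary assumptions made for illustration.

```python
import math

def amplify_and_clip(signal, gain, full_scale=1.0):
    """Apply gain, then flatten any sample that would exceed the digital limits."""
    return [max(-full_scale, min(full_scale, v * gain)) for v in signal]

# Hypothetical test signal: a decaying 200 Hz sine sampled at 8 kHz for one second.
note = [math.exp(-4.0 * i / 8000) * math.sin(2 * math.pi * 200.0 * i / 8000)
        for i in range(8000)]

clipped = amplify_and_clip(note, gain=2.0)
flattened = sum(1 for v in clipped if abs(v) == 1.0)
print(f"{flattened} of {len(clipped)} samples were flattened at the digital limit")
```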

2.2 Dynamic Range Compression

2.2.1 Use and Abuse

There are other reasons for the use of dynamic range compression in audio broadcasting. These include:

(a) To boost the loudness of the station in order to overcome a possibly noisy transmission channel
(b) To raise the level of quieter sounds to make them audible in noisy environments (e.g. in car listening)
(c) To prevent the signal from unexpectedly exceeding the maximum (peak) of the system
(d) To compensate for incompetent presenters who cannot set the levels correctly and consistently themselves
(e) To create a unique station "sound"

The amount of compression employed varies greatly between each of these tasks. For example, (c) requires the compressor to do nothing unless the signal would exceed the maximum allowed level, at which point the level is momentarily reduced. A device which performs this role is called a limiter. In contrast, if it is known that many listeners will experience noisy reception, then (a) requires that the signal should be as loud as possible to hide this noise, which requires a high level of compression.

In the author's view, some or all of these justifications for the use of compression are little more than excuses for entering into the loudness war. However, it is inappropriate to ban compression outright, because in some circumstances it may be useful, or essential.

The solution is to end the loudness war, whilst allowing each broadcaster to use compression however they wish.

This can be achieved by prescribing a perceived loudness level which each station must match. This is not to say that the station's output must be compressed to match this level at all times; rather, the full loud/soft dynamic range of music is allowed, but the average perceived level must match the prescribed loudness level.

If a station exceeds this loudness, then they are warned. If the situation continues, the station is fined. This is such a simple, almost trivial idea that it is surprising that it has not been implemented previously. Unfortunately, the loudness war has prevented the industry-wide agreement that would have allowed this to be achieved via a voluntary code of practice. The author hopes to demonstrate that central regulation of broadcast loudness would be beneficial; the only way in which this can be implemented is via a communications watchdog that has the power to enforce this regulation.
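The warn-then-fine procedure can be sketched as follows. This is only an outline under stated assumptions: the long-term mean-square level in decibels is used as a crude stand-in for perceived loudness (a real scheme would need a proper loudness measure), and the target level, tolerance and function names are all hypothetical.

```python
import math

TARGET_DB = -20.0     # hypothetical agreed level, in dB relative to full scale
TOLERANCE_DB = 1.0    # hypothetical measurement tolerance

def level_db(samples):
    """Mean-square level in dB relative to full scale (a crude loudness proxy)."""
    mean_square = sum(v * v for v in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def enforcement_action(measured_db, previous_warnings):
    """Warn on the first breach; fine if the station has already been warned."""
    if measured_db <= TARGET_DB + TOLERANCE_DB:
        return "compliant"
    return "warn" if previous_warnings == 0 else "fine"

# A full-scale sine wave is far louder than the hypothetical target.
sine = [math.sin(2 * math.pi * 10 * i / 1000) for i in range(1000)]
print(f"measured level: {level_db(sine):.1f} dB")
print(enforcement_action(level_db(sine), previous_warnings=0))
print(enforcement_action(level_db(sine), previous_warnings=1))
```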

The effect of this regulation on the use of compression is shown in Table 2.1.

Reason for compression	Impact of proposed regulation
To make the station sound as loud as possible	Not allowed: station's perceived loudness must match agreed level
To boost the loudness of the station to overcome a possibly noisy transmission channel	Not allowed: digital transmission channels have 120 dB dynamic range. Any background noise is well below audible limits.
To prevent the signal exceeding the maximum (peak) of the system	No impact: Allowed and positively encouraged (as now)
To raise the level of quieter sounds to make them more audible in noisy environments (e.g. in car listening)	No impact: allowed
To compensate for incompetent presenters who cannot set the levels correctly and consistently themselves	No impact: allowed
To create a unique station "sound"	No impact: allowed

Table 2.1: Impact of perceived loudness regulation on use of compression

Note that this table does not represent a set of rules; rather, it outlines the impact of the single proposed rule that each station's output must match a single agreed perceived loudness.

If a station wishes to highly compress its output, this is still possible under this new rule. However, rather than compressing the output and broadcasting it as loudly as possible, the station will have to compress its output and broadcast it at the agreed perceived loudness.
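Bringing a heavily compressed output down to the agreed level is a single gain change. The sketch below illustrates this, again using the mean-square level in decibels as a rough proxy for perceived loudness; the target of -20 dB and the function names are hypothetical.

```python
import math

def level_db(samples):
    """Mean-square level in dB relative to full scale (a crude loudness proxy)."""
    mean_square = sum(v * v for v in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def match_loudness(samples, target_db):
    """Scale the whole signal by one fixed gain so that its level equals the target."""
    current_db = level_db(samples)
    if current_db == float("-inf"):
        raise ValueError("cannot scale a silent signal to a target level")
    gain = 10 ** ((target_db - current_db) / 20)
    return [v * gain for v in samples]

# Hypothetical loud station output: a sine wave close to full scale.
loud = [0.9 * math.sin(2 * math.pi * 10 * i / 1000) for i in range(1000)]
matched = match_loudness(loud, target_db=-20.0)
print(f"before: {level_db(loud):.1f} dB, after: {level_db(matched):.1f} dB")
```

Because the gain is constant, the station's chosen compression and "sound" are left untouched; only the overall level changes.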

2.2.2 Reducing the use of dynamic range compression

The author believes that this proposal will reduce the use of dynamic range compression in broadcasting. In the next section the advantages of less compression will be explained. First, the reasons why this proposal will lead to reduced use of compression will be outlined.

Recall from the introduction that if two different presentations of the same audio signal are compared sequentially, then the louder one is subjectively preferred. This means that if you audition a signal without, and then with compression, then the compressed version will sound better. This is not because it really sounds better, but just because it sounds louder!

If the compressed version is reduced in loudness so that it is no louder than the original signal, then the original will be preferred. In other words, at the same loudness, the compressed version sounds inferior. This is demonstrated and discussed at length in [Katz, 2000].
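The level-matching step described above can be sketched in code. This is only an illustration under stated assumptions, not the procedure used in [Katz, 2000]: the `compress` function is a toy static compressor with hypothetical threshold, ratio and make-up gain settings, and RMS level is used as a crude stand-in for perceived loudness.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples (linear scale)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def crest_factor(samples):
    """Peak-to-RMS ratio; lower values indicate a more compressed signal."""
    return max(abs(s) for s in samples) / rms(samples)


def compress(samples, threshold=0.25, ratio=4.0, makeup=2.0):
    """Toy static compressor (hypothetical settings).

    Any part of a sample above the threshold is divided by the ratio,
    then make-up gain is applied to the whole signal.
    """
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(mag * makeup, s))
    return out


def match_rms(processed, reference):
    """Scale `processed` so its RMS level equals that of `reference`."""
    gain = rms(reference) / rms(processed)
    return [s * gain for s in processed]


def demo_signal():
    """A quiet passage followed by a loud passage (sine tones)."""
    quiet = [0.1 * math.sin(2 * math.pi * i / 50) for i in range(1000)]
    loud = [0.8 * math.sin(2 * math.pi * i / 50) for i in range(1000)]
    return quiet + loud


original = demo_signal()
compressed = compress(original)          # higher RMS level than the original
matched = match_rms(compressed, original)  # same RMS level, reduced dynamics
```

After `match_rms`, both versions have the same RMS level, so any remaining audible difference comes from the reduced dynamics (the lower crest factor) rather than from extra level.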

This is an important fact, because this is exactly what consumers do on a daily basis: if they change to a radio station that uses more compression, and hence is louder, then they simply turn down the radio! Thus, for the sake of grabbing their attention as they switch channels, the broadcaster forces the consumer to tolerate a poorer quality signal.

The relevance to the present proposal is this: The compressed station is no longer allowed to be louder – it must match the perceived loudness of all other stations. Hence, the more the station is compressed, the poorer it will sound in comparison to other stations. The one advantage of compression, namely increased loudness, has been removed; the main disadvantage of compression, namely reduced audio quality, has been laid bare.

This will not stop broadcasters using compression. Compression is still needed to raise quiet sounds over the background noise during in-car listening. Compression may still be used to create a unique station "sound", though this will be less desirable if it makes the station sound worse than other stations. Finally, compression may still be used if it is essential to bring the efforts of incompetent DJs into line.

However, the over-use of compression to compete in the loudness race, which so damages musical and audio quality, will be reduced, since it no longer brings any advantage to broadcaster or listener.

Finally, the author predicts that, with the loudness war over, all the other uses of (or excuses for) compression will suddenly seem much less important, and little or no compression will be used in many circumstances.

2.2.3 Advantages to using less dynamic range compression

The proposed regulation will reduce the use of dynamic range compression, which gives two main advantages:

1. The music will sound more like music, and less like a continuous background noise. The dynamics will be returned. Even without increasing the audio quality of the transmission channel, radio will sound more like CD!
2. The audio signal will encode with fewer problems using MPEG layer 2. This audio codec, used in DAB, performs very poorly when processing dynamically compressed audio – typically 1 grading point below the figures quoted in section 1.2.3. When dynamic range compression is removed, the performance of DAB improves significantly.

Without regulation, the present situation will deteriorate even further. The loudness war has been growing for years, but the arrival of commercial digital broadcasting marks the introduction of nuclear weaponry into this war! For example, it is possible to transmit an MPEG layer 2 bitstream which instructs the receiver to generate a sound louder than the digital maximum, without breaking any technical limits of the broadcast system. The result is a loud and hideously distorted signal that can damage loudspeakers. Obviously, this must be prevented.

2.2.4 Advantage of regulating perceived loudness

If all stations broadcast at the same perceived loudness, this does not mean that each listener must listen at the same loudness – each radio receiver still has a volume control!

The aim is that every station should sound about as loud as every other. When flicking through stations, the listener would no longer have to adjust their volume control to compensate for the loudness differences between stations. Rather, the listener can set the volume to a comfortable level, and then forget about it until their personal circumstances change.

This is more than a matter of convenience. The majority of radio listening is carried out in the car, and one advantage of this proposal is that it will reduce driver distraction.

Firstly, if the driver chooses to change channels, they often have to adjust the volume as well, to compensate for the difference in loudness between stations. This proposal removes this need, and reduces the amount of attention that must be given to the radio when changing channels.

Secondly, if the driver changes from a quiet station to a louder one, the sudden increase in volume can be very distracting. This proposal removes this distraction.

There are enough distractions on the roads without adding to them. This proposal removes one small annoyance, and may go a small way to improving road safety.

2.3 Regulation of perceived loudness

The complex nature of human hearing means that only a human listener can judge the perceived loudness of an audio signal. Any other indication of loudness is only an estimate, which may or may not match with human perception. Until we begin broadcasting radio for Martians or bats, it is human perception that must be taken into account.

To make the regulation work, three components are required:

1. A perceived loudness which, as a guideline, all stations must aim to match.
2. An average signal level which no station must exceed over a 24 hour period.
3. A consumer group to advise stations whether they are meeting (1).

The first is a guideline – it can be no more, because it is difficult to perfectly define perceived loudness in engineering terms, even though it is obvious to a human listener. The second is a limit which no station must exceed. This will prevent any station from completely ignoring (1), even if they want to. The third consists of interested volunteers who meet when required to listen to a cross-section of each station's output, and determine whether they are meeting (1). The group must have the power to say "Station X – you must reduce your audio signal level by Y to fall into line".

To help the broadcasters match the perceived loudness level, two more items can be made available:

4. A CD with a reference signal (or signals) to help the stations meet (1). Each broadcaster should compare their output with the signal on the CD, determine whether it is louder or quieter, and adjust as appropriate. This CD may also be used by (3) as a reference.
5. A measurement which, as accurately as possible, indicates whether a station's output is too loud, too quiet, or about right.

This sounds ambitious, but 1, 2, 4 and 5 are certainly deliverable. 3 depends on public interest, but an advert in the pages of certain hi-fi publications and psychology journals would probably find more than enough willing volunteers.

The exact details of 1-5 are unimportant at this stage, so long as general agreement is given that regulation of this area is in the public interest. However, to demonstrate that these requirements are deliverable, the author offers the following suggestions as a starting point.

2.3.1 Reference loudness

The perceived loudness of each radio station should be equal to the perceived loudness of a -20 dB RMS pink noise signal (where -20 dB RMS is relative to the RMS value of a 0 dB peak (full scale) sine wave). To judge the perceived loudness, an audio system calibrated to [SMPTE, 1999] should be used.

In simple English, this means that, as a pre-requisite, the volume control should be set using the guidelines in [SMPTE, 1999]. Then, each station should be compared with the reference signal (a dull sounding noise is suggested) to judge whether it is louder or quieter than the reference.
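As a rough illustration of how such a reference signal might be prepared, the sketch below generates approximate pink noise and scales it to -20 dB RMS relative to the RMS value of a full-scale sine wave. It rests on several assumptions: full scale is normalised to 1.0, the noise comes from the Voss-McCartney approximation rather than a calibrated generator, and the function names, row count and sample count are illustrative only.

```python
import math
import random

# RMS value of a full-scale sine wave, with full scale normalised to 1.0.
FULL_SCALE_SINE_RMS = 1.0 / math.sqrt(2.0)


def rms(samples):
    """Root-mean-square level of a sequence of samples (linear scale)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db(samples):
    """RMS level in dB relative to the RMS of a full-scale sine wave."""
    return 20.0 * math.log10(rms(samples) / FULL_SCALE_SINE_RMS)


def pink_noise(n, rows=16, seed=0):
    """Approximate pink noise using the Voss-McCartney method.

    Several white-noise sources are summed; source k is refreshed
    every 2**(k+1) samples, which approximates a 1/f spectrum.
    """
    rng = random.Random(seed)
    sources = [rng.uniform(-1.0, 1.0) for _ in range(rows)]
    out = []
    for i in range(1, n + 1):
        k = (i & -i).bit_length() - 1  # index of the lowest set bit of i
        if k < rows:
            sources[k] = rng.uniform(-1.0, 1.0)
        out.append(sum(sources))
    mean = sum(out) / n
    return [s - mean for s in out]  # remove any DC offset


def scale_to_level(samples, target_db):
    """Scale samples so that level_db(samples) equals target_db."""
    target_rms = FULL_SCALE_SINE_RMS * 10.0 ** (target_db / 20.0)
    gain = target_rms / rms(samples)
    return [s * gain for s in samples]


# One second of reference noise, assuming a 48 kHz sample rate.
reference = scale_to_level(pink_noise(48000), -20.0)
```

With these definitions, -20 dB corresponds to an RMS value of about 0.0707 of full scale (0.1 divided by the square root of 2).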

Given two signals (even two very different signals), most people can say which is the louder, and that is all that is required. For stations which broadcast a very wide dynamic range of content (e.g. Radio 3), the moderately loud sections should equal the loudness of the reference. Most other stations maintain an almost constant perceived loudness, so there can be little argument.

2.3.2 Maximum average level

As a backstop against abuse, a maximum daily average level must be enforced. This, on its own, would not ensure that all stations sounded equally loud, but it will prevent the worst abuses.

In section 2.3.1, the reference is defined in engineering terms, but a human listener carries out the comparison. This is as it should be, because humans are the ones who listen to radio broadcasts, and simple measurements do not match human perception. However, an absolute limit should also be enforced, which does not require human intervention. This should be that, within any single day, the RMS signal level, measured using a rectangular 50 ms non-overlapping window, averaged over the entire day, must not exceed –18 dB relative to the RMS value of a 0 dB peak (full scale) sine wave.
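The measurement described above could be sketched as follows. This is one possible reading rather than a definitive specification: it assumes full scale is normalised to 1.0, that the per-window RMS values are averaged on a linear scale before conversion to dB (the text does not say whether the average is taken in linear or dB terms), and that any incomplete final window is ignored. The helper `sine` exists only to produce test signals.

```python
import math

FULL_SCALE_SINE_RMS = 1.0 / math.sqrt(2.0)  # full scale normalised to 1.0
LIMIT_DB = -18.0
WINDOW_SECONDS = 0.050


def window_rms_values(samples, sample_rate):
    """RMS of each complete, non-overlapping 50 ms rectangular window."""
    size = int(round(sample_rate * WINDOW_SECONDS))
    values = []
    for start in range(0, len(samples) - size + 1, size):
        block = samples[start:start + size]
        values.append(math.sqrt(sum(s * s for s in block) / size))
    return values


def average_level_db(samples, sample_rate):
    """Mean of the window RMS values, in dB re a full-scale sine wave.

    Assumption: the window RMS values are averaged on a linear scale
    before conversion to dB.
    """
    values = window_rms_values(samples, sample_rate)
    if not values:
        raise ValueError("need at least one complete 50 ms window")
    mean_rms = sum(values) / len(values)
    if mean_rms == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(mean_rms / FULL_SCALE_SINE_RMS)


def within_limit(samples, sample_rate, tolerance_db=1e-9):
    """True if the averaged level does not exceed the -18 dB limit."""
    return average_level_db(samples, sample_rate) <= LIMIT_DB + tolerance_db


def sine(amplitude, sample_rate=48000, freq=1000.0, seconds=1.0):
    """Test tone; amplitude 1.0 is a full-scale sine wave."""
    n = int(sample_rate * seconds)
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
            for i in range(n)]
```

For example, `within_limit(sine(0.1), 48000)` is true (a level of -20 dB), while `within_limit(sine(0.2), 48000)` is false (about -14 dB). A real implementation would process a day's audio as a stream, accumulating the window values rather than holding all samples in memory.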

If only this limit is enforced (without a listening panel or a reference perceived loudness), this will solve most of the problems and end the loudness war. If the regulator is scared of enforcing rules that involve human hearing, then this rule can be considered in isolation.

2.3.3 Consumer Group

It is hoped that each broadcaster can be trusted to set their output to match the reference perceived loudness. However, consumers must have the right to complain and challenge any station that is flouting the rules. The group only needs to meet when such complaints are received.

2.3.4 Reference CD

Listening to noise signals for comparison is not enjoyable. To aid broadcasters and the consumer group, a CD should be produced which includes various audio signals, adjusted to match the perceived reference loudness. This CD should include speech, pop music, classical music etc. This CD can be produced by an experienced mastering engineer, or using the tools described in the next section.

2.3.5 Measurement of perceived loudness

The human ear is the final definitive arbiter of perceived loudness. However, a reliable measurement that matches human perception would be a useful substitute. Much research has been carried out into the way in which humans perceive loudness, and several computer models have been produced. The author has produced a simple model which could be applied to this task [Robinson, 2001]; a more complex and accurate model has been proposed in [Moore et al, 1997], and a very simple calculation is currently implemented in a popular audio editor [Sonic Foundry, 2002]. Any or all of these, or any future models, could be used by broadcasters, regulators, or the consumer panel to simplify or automate the process of regulating perceived loudness.

2.4 Essential limit

Aside from the discussion of perceived loudness, there is another limit which must be imposed, otherwise potentially damaging audio signals can be broadcast using digital transmissions.

The MPEG layer 2 bitstream is capable of representing an audio signal which exceeds the maximum signal level allowed within a digital system. This apparent contradiction stems from the fact that audio codecs such as MPEG layer 2 do not reproduce the exact signal waveform, but instead aim to reproduce something which sounds similar. If the waveform is as loud as possible before encoding, such that the signal reaches the maximum permitted level, then, as the codec changes the shape of the waveform, the signal is almost certain to be pushed above the permitted maximum at some point.
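The effect can be illustrated without a codec. The sketch below is not MPEG layer 2; as a crude stand-in for "changing the shape of the waveform", it keeps only the fundamental component of a square wave that just reaches full scale. The function names and the choice of 100 samples per period are illustrative assumptions. The fundamental of a unit square wave has an amplitude of 4/π (about 1.27, or roughly 2.1 dB above full scale), so the altered waveform overshoots even though it contains less energy than the original.

```python
import math


def square_wave(samples_per_period):
    """One period of a square wave that just reaches full scale (+/-1.0)."""
    half = samples_per_period // 2
    return [1.0] * half + [-1.0] * (samples_per_period - half)


def fundamental_only(period):
    """Keep only the fundamental of one period (a crude band-limiting step)."""
    n = len(period)
    a = 2.0 / n * sum(x * math.cos(2 * math.pi * i / n)
                      for i, x in enumerate(period))
    b = 2.0 / n * sum(x * math.sin(2 * math.pi * i / n)
                      for i, x in enumerate(period))
    return [a * math.cos(2 * math.pi * i / n) + b * math.sin(2 * math.pi * i / n)
            for i in range(n)]


def rms(samples):
    """Root-mean-square level of a sequence of samples (linear scale)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


original = square_wave(100)
altered = fundamental_only(original)

peak_before = max(abs(x) for x in original)  # exactly full scale
peak_after = max(abs(x) for x in altered)    # exceeds full scale
```

A decoder that can only output values up to full scale would have to clip the peaks of `altered`.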

In the receiver, the waveform is reconstructed from the instructions within the MPEG layer 2 bitstream. Where these instructions tell the receiver to generate a signal greater than the digital maximum, the output saturates (or "clips"), and distortion occurs. This is called clipping, because visually it looks as if the top has been clipped from the waveform, as shown in Figure 2.1 (d). Replaying clipped waveforms via loudspeakers can be damaging to the loudspeaker itself.

To prevent this, the following limit must be enforced: The broadcast bitstream must not cause a fully compliant 16-bit MPEG layer 2 decoder to produce a sample value equal to (or greater than) digital full scale.
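A check of this kind could be sketched as below. It assumes the decoded output of a compliant decoder is already available as a sequence of signed 16-bit integers (the decoder itself is not shown), and it treats both 32767 and -32768 as digital full scale, which is an interpretation rather than part of the proposal. The function names are illustrative.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def full_scale_samples(decoded):
    """Indices of decoded 16-bit samples that sit at digital full scale."""
    return [i for i, s in enumerate(decoded)
            if s >= INT16_MAX or s <= INT16_MIN]


def complies(decoded):
    """True if no decoded sample reaches digital full scale."""
    return not full_scale_samples(decoded)
```

Because a 16-bit decoder cannot output a value beyond full scale, a sample equal to full scale is the only available evidence that the bitstream asked for more, which is why the rule forbids equality as well.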

2.5 Conclusion

As every radio station tries to be louder than every other station, consumers must adjust the volume each time they switch from one station to another. Though this is trivial, it is distracting for drivers, and unnecessary for every listener. More importantly, the dynamic range compression used to increase the loudness of many stations is detrimental to audio quality and musical enjoyment. Finally, in digital radio, dynamic range compression causes the DAB audio codec severe problems, resulting in further degradation to the audio signal.

In digital broadcasting, there are no disadvantages to centrally regulating the loudness of all radio stations, and such regulation will solve these problems, bringing several positive benefits. The regulation cannot be self-imposed or voluntary; only a powerful body can bring an end to the loudness war. The new communications regulator must have this power.

3. Summary of proposed regulation

3.1 Quality and bitrate

All broadcasters (including the BBC) must be subject to regulation regarding the audio and video quality of their broadcasts.
Digital broadcasts must offer audio and video quality that is at least as good as the existing analogue broadcasts.
Either:
1. The minimum bitrate for DAB stereo music stations should be 160 kbps; or
2. Within each multiplex, a range of bitrates must be employed, such that at least two stations within the multiplex broadcast at 192 kbps or above. In addition, at least half of the music stations within the multiplex should broadcast at 160 kbps or above.
The most popular national station should broadcast at a bitrate of 256 kbps.
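The two alternative bitrate rules above can be expressed as a simple check. This is a sketch under assumptions: the `Station` record and its field names are hypothetical, a single `stereo_music` flag is used for "music stations" in both options, and the separate 256 kbps requirement for the most popular national station is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str            # hypothetical station name
    bitrate_kbps: int
    stereo_music: bool   # True for a stereo music station


def meets_option_1(stations):
    """Every stereo music station broadcasts at 160 kbps or above."""
    return all(s.bitrate_kbps >= 160 for s in stations if s.stereo_music)


def meets_option_2(stations):
    """At least two stations at 192 kbps or above, and at least half of
    the music stations at 160 kbps or above."""
    high = sum(1 for s in stations if s.bitrate_kbps >= 192)
    music = [s for s in stations if s.stereo_music]
    music_ok = sum(1 for s in music if s.bitrate_kbps >= 160)
    return high >= 2 and 2 * music_ok >= len(music)


def multiplex_complies(stations):
    """A multiplex complies if it satisfies either option."""
    return meets_option_1(stations) or meets_option_2(stations)


# Illustrative multiplexes with made-up stations.
mux_a = [Station("A", 160, True), Station("B", 160, True),
         Station("C", 64, False)]
mux_b = [Station("A", 192, True), Station("B", 192, True),
         Station("C", 128, True), Station("D", 160, True)]
mux_c = [Station("A", 128, True), Station("B", 128, True),
         Station("C", 192, True)]
```

Here `mux_a` satisfies the first option, `mux_b` fails the first but satisfies the second, and `mux_c` satisfies neither.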

3.2 Maximum signal and Perceived loudness

The broadcast bitstream must not cause a fully compliant 16-bit MPEG layer 2 decoder to produce a sample value equal to (or greater than) digital full scale.
The perceived loudness of each radio station should be equal to the perceived loudness of a -20 dB RMS pink noise signal (where -20 dB RMS is relative to the RMS value of a digital full scale sine wave).
On each day, the RMS signal level, measured using rectangular 50 ms non-overlapping window functions, averaged over the entire 24 hours, must not exceed -18 dB (relative to the RMS value of a digital full scale sine wave).

4. References

Katz, B. (2000).

Integrated Approach to Metering, Monitoring, and Leveling Practices, Part 1: Two-Channel Metering

Journal of the Audio Engineering Society, vol. 48, no. 9, Sept., pp. 800-809. See also
http://www.digido.com/integrated.html

Moore, B. C. J.; Glasberg, B. R.; and Baer, T. (1997).

A Model for the Prediction of Thresholds, Loudness, and Partial Loudness.

Journal of the Audio Engineering Society, vol. 45, no. 4, April, pp. 224-240.

Radio Authority (2001).

Local Radio Multiplex Licences: Notes Of Guidance For Applicants.

http://www.radioauthority.org.uk/publications-archive/word-doc/regulation/codes_guidelines/dabnog0401.doc

Robinson, D. J. M. (2001).

Replay Gain – A Proposed Standard

Appendix K in Perceptual Model for Assessment of Coded Audio, Ph.D. thesis, Department of Electronic Systems Engineering, University of Essex. See also
http://www.replaygain.org/

SMPTE RP 200 (1999).

Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems.

Society of Motion Picture and Television Engineers, Recommended Practices document.

Sonic Foundry (2002).

Sound Forge 6.0 Professional Digital Audio Editor

http://www.sonicfoundry.com/

Soulodre, G. A.; Grusec, T.; Lavoie, M.; and Thibault, L. (1998).

Subjective Evaluation of State-of-the-Art Two-Channel Audio Codecs.

Journal of the Audio Engineering Society, vol. 46, no. 3, Mar., pp. 164-177.

Taylor, C. MIBS (2002).

DAB Ensembles Worldwide.

http://www.wohnort.demon.co.uk/DAB/index.html

Regulation in digital broadcasting

1. Audio and Video Quality

2. Loudness and Dynamic Range

Dr David J M Robinson

Department of Electronic Systems Engineering

University of Essex

Contents