May 1995: Volume 3, Number 9

Subjective and Objective Evaluation of Sound Quality in Low Bit Rate Audio Coding Systems

Presented by Louis Thibault and Ted Grusec

Communications Research Centre

Ottawa, Ontario

Date: Tuesday, May 23, 1995
Time: 7:30 pm
Place: Harris Institute for the Arts
Address: 118 Sherbourne Street, Toronto, Canada

This Month's Meeting Preview

The Topic:

Louis Thibault and Ted Grusec will give an overview of the research activities at the Communications Research Centre in the field of digital audio. The presentation will focus on low bit rate audio coding technologies as well as on novel subjective and objective methods developed at the CRC to reliably assess the subjective quality of audio codecs or any high quality audio devices.

The use of low bit rate audio coding is proliferating in several areas. In the audio industry, consumer products such as the Philips Digital Compact Cassette (DCC) and the Sony MiniDisc are based on low bit rate audio coding. On the recording and production side, several digital audio workstations (DAWs) also rely on audio compression to make efficient use of storage media. In broadcasting, low bit rate audio coding will form the backbone of the upcoming Digital Radio Broadcasting (DRB) service and will be used in the multichannel sound system of Advanced Television (ATV). Low bit rate audio codecs are already in widespread use in point-to-point telecommunication networks (telephone, ISDN, STL, etc.). The recent adoption of the ISO/MPEG audio coding standard will fuel the development of many more applications in the near future.

State-of-the-art audio compression technologies are often claimed to provide transparent (i.e. "CD") quality with compression ratios as high as 7:1 as compared to the compact disc data rate. To achieve such high compression, these perceptual audio codecs, as they are often labelled, exploit a number of characteristics of the human auditory system. The shape of the audio waveform is different from the original after lossy compression. However, the goal is for the encoded-decoded signal to be subjectively similar to the original version. For perceptual audio codecs, traditional objective measures of sound quality, such as the Signal-to-Noise Ratio (SNR), are totally unreliable. So far, formal subjective tests have been the only reliable way to assess the sound quality of perceptual audio codecs.
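To put the 7:1 figure in concrete terms, the short sketch below works out the compact disc data rate and the rate left after compression. It assumes the standard CD audio format (44.1 kHz sampling, 16 bits per sample, two channels); the function names are illustrative only.

```python
# Compact disc PCM parameters (standard CD audio format).
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def cd_bit_rate() -> int:
    """Raw stereo CD audio data rate in bits per second."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def coded_bit_rate(compression_ratio: float) -> float:
    """Bit rate remaining after compressing CD audio by the given ratio."""
    return cd_bit_rate() / compression_ratio


print(cd_bit_rate())       # 1411200 bit/s, i.e. about 1.41 Mbit/s
print(coded_bit_rate(7))   # 201600.0 bit/s, i.e. about 200 kbit/s for stereo
```

In other words, a 7:1 codec has roughly 100 kbit/s per channel to work with.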

The assessment of sound quality of digital audio systems has been one of the research areas at the CRC since 1990. On the subjective measurement side, sensitive methodologies were developed for the assessment of high quality audio coding systems. Unique facilities, including a reference listening room meeting ITU-R specifications and a custom computer-based playback system, with several unique features, were developed. Both these methodologies and facilities were used in listening tests conducted at the CRC in 1992 and 1993. These tests were done for the ITU-R as part of the international standardization of low bit rate audio coding techniques for broadcast applications. Since May 1994, the CRC has been hosting the listening tests for the subjective evaluation of DRB systems submitted to the EIA/NRSC DAR standardization process in the U.S.A.

On the objective measurement side, research has been carried out in psychoacoustics towards the development of models of the human auditory system. These models are being developed for two purposes: first, as advanced psychoacoustical models to be incorporated in low bit rate audio coding algorithms to improve sound quality and, second, as stand alone models to measure objectively the subjective quality of sound signals after these have been processed through audio codecs or other audio devices.

The presentation will start with an overview of the basic principles of low bit rate audio coding and show why conventional objective measurements fail to provide accurate sound quality measurement for such devices. The basic characteristics of the human ear will be described along with the model of the auditory system being developed at the CRC. How the output of such an auditory model can be used to measure audio signal degradations, and hence sound quality, will be explained. Some results, showing the correlation between objective perceptual measures and Mean Opinion Scores obtained from human listeners in formal subjective tests, will be presented.
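One way to see why a conventional measure such as SNR says little about a perceptual codec is sketched below. Two coding errors of equal power are added to the same 1 kHz tone: one close to it in frequency, one far away. The SNR is identical in both cases, although masking theory suggests the first error is more likely to be hidden by the tone than the second. The sample rate, frequencies and levels are arbitrary assumptions chosen for the illustration; this is not the CRC's method.

```python
import math

FS = 48_000   # sample rate in Hz (assumed for this sketch)
N = FS        # one second of samples, so every tone below has whole cycles


def tone(freq_hz, amplitude):
    """One second of a sine wave at the given frequency and amplitude."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / FS) for n in range(N)]


def power(x):
    """Mean-square power of a sample sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, error):
    """Signal-to-noise ratio in dB, treating `error` as the noise."""
    return 10 * math.log10(power(signal) / power(error))


signal = tone(1000, 1.0)
error_near = tone(1100, 0.01)   # error close in frequency to the signal
error_far = tone(4000, 0.01)    # same power, but far from the signal

snr_near = snr_db(signal, error_near)
snr_far = snr_db(signal, error_far)
print(round(snr_near, 3), round(snr_far, 3))   # both come out at 40 dB
```

Because the SNR depends only on the error power, it cannot tell the two cases apart; a perceptual measure has to look at where the error falls relative to the signal.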

The second part of the presentation will deal with the subjective assessment of sound quality. The particular problems encountered when assessing audio systems with small impairments will be identified. The presentation will outline the facilities and methodologies used in the CRC's subjective tests. The characteristics of the CRC listening room will be presented. The computer-based playback system, which permits seamless switching among auditory stimuli to be compared and rated by listeners, will be described. On the behavioral side, key events and elements necessary for sensitive tests will be detailed. These include: (1) the choice of critical audio materials for the comparative evaluation of different systems; (2) listener selection and evaluation; (3) listener training; (4) characteristics of the quality grading scale; (5) blind rating methods; and, (6) data analysis and the arrival at appropriate conclusions. Just as links in a chain, each of these events and elements is crucially important to valid, final evaluations of the devices under test. Key results from the ITU-R tests will be presented in order to characterize the quality provided by low bit-rate audio codecs when these are operated in different configurations (i.e., at different bit-rates, with various numbers of coding stages in tandem).

While the context is two-channel audio coding systems, the presentation will highlight the fact that both the objective and subjective methods described are applicable to the subjective evaluation of any audio device that processes a reference audio signal present at its input and produces an output signal which may contain some unwanted degradations. Examples of such devices or processes are transmission channels, buried-data channels in compact discs, A/D and D/A converters in tandem, etc.

The Presenters:

Louis Thibault received a bachelor's degree (1976) and a master's degree (1979) in electrical engineering from the University of Sherbrooke, Sherbrooke, Canada. After graduation, he worked for 2 years at this institution as a research engineer in speech coding and for 2 years as a consultant in applying microprocessors and microelectronics technologies in the manufacturing industry. He then joined the Department of Communications of the Canadian government in 1983 where he held positions in both the Cable TV Planning and the International Broadcast Planning groups. In 1988, he moved to the Communications Research Centre where he has been working in the field of digital radio broadcasting (DRB).

His research areas include audio source coding, psychoacoustics and subjective evaluation of audio systems as well as channel coding and modulation techniques for digital radio broadcasting. He is actively involved in a number of standards committees related to DRB in Canada, the U.S.A. and internationally.

He is currently manager of the Signal Processing and Psychoacoustics group at the CRC and was the international vice-chairman of ITU-R Task Group 10/2 on Low Bit-Rate Digital Audio Coding. He has authored and presented a number of papers at conferences dealing with the subjective evaluation of audio codecs and off-air digital transmission techniques for DRB. He is a member of the Audio Engineering Society.

Ted Grusec completed his B.A. in the honors psychology program at the University of Toronto. He won a Woodrow Wilson Fellowship to do graduate work. He carried this out at Stanford University in California where he earned his M.A. and Ph.D. in experimental psychology. He held academic positions for 8 years, mostly (6 years) in the Psychology Department at the University of Toronto. He then came to Ottawa to work at Bell-Northern Research for 6 years as a senior scientist. He was an independent consultant for 4 years before joining the Department of Communications (DoC - now Industry Canada) in 1983.

Since coming to Ottawa, most of his work has been in human communications issues and problems in relation to new technologies. At the DoC, his first project was to examine the human impacts of the rapidly proliferating information technologies.

In 1986, he moved from DoC Headquarters to the Communications Research Centre (CRC). Here, he began pursuing audio and music research, setting up a new lab for this purpose. In 1993, he became part of the Signal Processing and Psychoacoustics group.

His most recent work (since 1991) has been the subjective evaluation of audio coding systems that use bit-rate compression techniques based on psychoacoustic principles of hearing. He is a member of ITU-R Task Group 10/3 which is mandated to define and recommend new methods for the subjective assessment of small impairments in audio systems.

For more information about the Communications Research Centre, contact Louis Thibault, Manager, Signal Processing and Psychoacoustics, Communications Research Centre, 3701 Carling, Ottawa, Ont., Canada K2H 8S2 Email: louis.thibault@crc.doc.ca, Tel: (+1)613-990-4349, Fax: (+1)613-993-9950

What's Inside

Last Month's Meeting Review

MICVIEW REVIEW

If you missed the April Special Event, here are the highlights of the all-day microphone workshop/seminar, held on April 22, at Ryerson, in the Eaton Lecture Theater.

AES Section Chairman Denis Tremblay introduced the Moderator, the justly famous audio author, past AES President, and Delos Recording Engineer, John Eargle.

John introduced and timed the speakers, posed questions and added remarks throughout the day, in an effort to optimise the presentations. Audience questions and comments were encouraged.

The first part of the day was split up into topical segments for the contributing manufacturers, who shared catalogs, data sheets and displayed many mics [to caress], starting with:

NEUMANN / SENNHEISER: Introduction To Pro Studio Mics.

Juergen Wahl, National Sales Manager, Neumann USA, presented in a personable and involving style, treating us to PCM sound samples and clear VGA graphics from his computer (on the big screen). Excellent mics have been around for about 68 years (Haydn string quartet sample) and are crucial for many things, such as the initial recording of the various instrument samples for DSP-based synthesizers.

The choice of a mic depends on its environment, e.g. in a studio one can use fragile or fussy mics, because they will be properly used. Not so in home, PA or rock use. In concert halls there is less expertise and the personnel can be indifferent, so ruggedness and neutrality are needed. Some jobs need the 140+ dB capability of modern mics. Some users want that edgy/tube sound, others despise it. TV mics must not be seen. Parabolic reflector mics, shotgun mics and boundary-effect mics can pick up that baseball "pop", basketball shoe squeak, or hockey ice-scrape sound. Beware of the foul language that accompanies these sounds.

Mic choice also depends on the dynamic range expected, the frequency range (don't use a cardioid for low bass), working distance, and problem instruments such as flute, violin, cello, piano, bassoon or organ.

Basic dynamic mic properties are: rugged, quiet, passive (no power needed), hot/cold/damp-proof, cheap, but also: nonlinear at high levels, poor transient behavior, limited frequency range, low-ish output level, hum induction. The condenser/electret mic avoids these negatives, but needs power and to stay dry. Top of the line mics of both types have very few of these shortcomings, but at a price.

In the diffuse soundfield, where most sound has been reflected off a wall, floor or ceiling, and is thus out of phase with the direct sound, an omni mic will have a strong treble rolloff. A cardioid cancels most of the rear input, and thus has less treble rolloff. Therefore the nearfield working distance limit (in the direct sound field) of a cardioid is about 1.7 times that of an omni, that of a hypercardioid about twice, and that of shotgun mics more than 2.2 times.
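The 1.7 and 2 figures agree with the textbook "distance factor" of first-order microphones, which can be checked with a few lines of arithmetic. The sketch below assumes the ideal polar pattern a + (1 - a) cos(theta) and the standard random energy efficiency formula for such patterns; the function names are illustrative, and shotgun mics are left out because they are not first-order devices.

```python
import math


def random_energy_efficiency(a):
    """Diffuse-field pickup relative to an omni, for the ideal
    first-order pattern a + (1 - a) * cos(theta)."""
    return a ** 2 + (1 - a) ** 2 / 3


def distance_factor(a):
    """How much farther than an omni the mic can be placed for the
    same ratio of direct to reverberant sound."""
    return 1 / math.sqrt(random_energy_efficiency(a))


# Pressure (omni) share `a` of each ideal pattern.
patterns = {"omni": 1.0, "cardioid": 0.5, "hypercardioid": 0.25}

for name, a in patterns.items():
    print(name, round(distance_factor(a), 2))
# omni 1.0, cardioid 1.73, hypercardioid 2.0
```

Real microphones depart from the ideal patterns, especially at the frequency extremes, so these values are best read as upper-end guides rather than guarantees.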

But cardioids have poorer off-axis response and the "proximity effect" bass rise at short range. This bass rise effect is proportional to the mic element diameter, as is the treble directionality increase (above 8 kHz) of all mics, and the treble roll-off of omnis. Maximum SPL ratings are at 0.5% or 1% distortion or unstated. Demand these specs before selecting a mic for high SPL work.

A number of tie pins and mic accessories were handed out for correct audience answers and comments.

After a refreshments break, the floor [um, actually, the wireless mic] was given to:

AKG: Overview Of Its Product Line, with Paul Gonsalves, of Acoustical Services Canada, representing AKG.

Using the datasheets, Paul walked us through the application strengths and intentions of all current AKG mics, e.g. what the treble emphasis and bass rolloff options can do, and the best uses of the various specialty mics. There are now cardioids that can handle low bass.

That brought us to the lunch break.

Next up was:

BRUEL & KJAER: Overview Of Its Product Line, with Gary Baldassari, B&K Representative, guest of TGI.

Gary's 30 years of mixing, directing and producing many albums, orchestras and shows involved some extreme mic requirements: 166 dB SPL at Cape Kennedy was handled with a catalog mic and interface (130 V power). Damp-proof B&K electret technology, rated for a 900 year life, was shown with a dunking into a glass of water. The 4040 mic offers a 7 dB self noise, useful in actual fields and quiet venues. The uses of conical and spherical push-on mic accessories for tweaking the off-axis and the on-axis responses were explained. And please stop damaging your left-ear hearing while driving: close that window.
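For a sense of scale, the levels quoted above can be converted from dB SPL to sound pressure. This is a minimal sketch using the standard 20 micropascal reference pressure; the function name is illustrative.

```python
P_REF_PA = 20e-6   # reference sound pressure for dB SPL, in pascals


def spl_to_pascals(level_db_spl):
    """RMS sound pressure in pascals for a given level in dB SPL."""
    return P_REF_PA * 10 ** (level_db_spl / 20)


for level in (7, 94, 166):
    print(level, "dB SPL =", spl_to_pascals(level), "Pa")
```

On this scale 94 dB SPL is about 1 Pa, and 166 dB SPL works out to roughly 4000 Pa, which is some 159 dB above the 7 dB self noise mentioned for the 4040.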

AUDIO-TECHNICA: Automated Measurements with Audio Precision was presented by Kelly Statham, Development Engineer, Audio-Technica US, with a series of transparencies and a laconic commentary.

The main tool for production testing (all Pro mics), QC and development testing, and all related documentation is an Audio Precision instrument with custom software additions and high-speed test jigs. This setup in an anechoic chamber regularly checks and recalibrates itself, and plots the polar response at many frequencies in 0.9 degree increments via MLS stimulus, at a 50 cm distance. Lesser mics than the 40 series are lot and batch sampled. A video clip of the setup in action was shown. All parameters are tested and recorded: self-noise, power drain, sensitivity, impedance, etc. The automation makes the system very economical.
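For readers unfamiliar with MLS (maximum length sequence) stimulus, the sketch below generates a short one with a linear feedback shift register and checks the property that makes it useful for measurement: its circular autocorrelation is a single sharp peak. The register length and tap positions are assumptions chosen to keep the example small; nothing here describes the actual Audio Precision or Audio-Technica implementation, which would use a far longer sequence.

```python
def mls_bits(n_bits, taps):
    """Return one period (2**n_bits - 1 bits) of a Fibonacci LFSR sequence.
    `taps` are 1-indexed register stages that are XORed for the feedback."""
    state = [1] * n_bits                   # any non-zero start state works
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[-1])              # output the oldest bit
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]    # shift the feedback bit in
    return out


def circular_autocorrelation(x, lag):
    """Sum of x[i] * x[i + lag], wrapping around at the end of the sequence."""
    n = len(x)
    return sum(x[i] * x[(i + lag) % n] for i in range(n))


bits = mls_bits(4, taps=(4, 3))               # 15-bit sequence
stimulus = [1 if b else -1 for b in bits]     # map bits to +1 / -1 levels

print(len(bits))                                                      # 15
print([circular_autocorrelation(stimulus, k) for k in range(4)])      # [15, -1, -1, -1]
```

Because the autocorrelation is so close to an impulse, cross-correlating a device's output with the stimulus yields its impulse response, from which the frequency response at each angle can be derived.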

The PRACTICAL APPLICATIONS part of the day was next, after a refreshments break. Given by expert mic users, it began with:

STEREO MIC TECHNIQUES, by the Moderator, John Eargle, JME Consulting.

Several setups were discussed, such as wide-spaced omnis with a center mic at 6 dB down, vs. close-spaced cardioid pairs. Large-element mics have 6-10 dB more directionality above 10 kHz, so expect 8 dB rolloff with crossed mics set at the usual 120 degrees. Even half-inch mics have this problem. Remember that the 180 degree response of most mics is quite variable at the acoustic wavelength equal to the mic diameter. This will force the use of many mics in an orchestra rather than only two, far away. This also avoids diffuse soundfield treble rolloff.
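The frequency at which the acoustic wavelength equals the mic diameter follows directly from f = c / d. The sketch below assumes a speed of sound of 343 m/s (air at about 20 degrees C) and nominal one-inch and half-inch capsule diameters; actual diaphragm and housing dimensions differ from model to model.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly 20 degrees C
INCH_M = 0.0254


def frequency_for_wavelength(diameter_m):
    """Frequency in Hz whose wavelength equals the given diameter."""
    return SPEED_OF_SOUND_M_S / diameter_m


for label, diameter in (("one-inch", INCH_M), ("half-inch", INCH_M / 2)):
    print(label, round(frequency_for_wavelength(diameter)), "Hz")
```

This gives about 13.5 kHz for a one-inch capsule and about 27 kHz for a half-inch one, so halving the diameter moves the region an octave higher.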

Organs are difficult. Try two mics 8-10ft apart, at 30ft away. Pianos, 2 ft apart, at 3 ft away. To check for correct working distance, invert one channel.

This brought us to the next subject:

LOCATION SOUND, by Doug McClement, owner, chief engineer, Live Wire Remote Recorders.

In twenty years of recording, and lately, 24-track air transportable digital recording, some lessons stand out. Different artists will require different mics, often selected by some other insistent person. A band may use many wireless mics, spread out over 4 or 5 chairs on a chairlift, or may require everything battery portable. Don't change mics or shift speakers in an established show to make recording easier, nor use PA mics on instruments. To obtain isolation, try suspending two lavaliers inside a closed piano. With choirs, close-mic with four to eight mics to get isolation from a band. A record mix is very different from the house or artist-monitor mix. Live recording is tricky. Hall sound, as in a performance, does not sound good in a living room. Replacing stage monitors with in-ear monitors makes a much cleaner recorded sound by eliminating back wall bounce.

Many audience questions later, we got to the next subject:

VARIETY TELEVISION, by Simon Bowers, CBC Television.

This veteran of eight Juno Awards shows and many, many other specials has won Gemini Awards for his work, which started when the "635" mics did everything, in the era when all "mics should be heard, not seen." Recently most mics are wireless ones, because of extreme application simplicity on a crowded set, and mature technology. This avoids grounding and low mic level problems altogether. Some tiny mics are either in the hairline or on the chest. The other mics are usually cardioids for isolation, but care must go into avoidance of off-axis or reflected sound. In noisy locations (games, races) the headset mic is a noise-cancelling type. When two body mics get close together, as in intimate scenes, shut off one to avoid phasing; the other will pick up both actors.

Second-lastly, there was:

STUDIO, by Jeff Wolpert, Chief Engineer, McClear Pathe Recording Studios.

The 46 year old company has lots of older analog equipment and mics, and finds them very much in demand again. A number of the old condenser mics have been overhauled, and upgraded by installing thinner than original diaphragms, to obtain higher output and better transient response. In studio work, one works in short takes, to be assembled later, using panned mono to obtain stereo as well as to control the stereo image accurately, while being careful of some mics' off-axis quirks. It is important to keep the final mix complexity down: if you mix three layers of bass, you end up with mush. Mix and match older tube mics with modern equipment; too much of the same is detrimental. Don't EQ if possible: if a big drum is too loud, mic it on its side, where it partially cancels itself. Super-flat mics often require EQ. It is better to exploit the emphasis and rolloffs in common mics.

And lastly, we had:

P.A. PRACTICE, by Gary Baldassari, B&K.

Much cleaner PA sound is now possible thanks to equipment improvements in the last few years, and a conscious effort to control max. loudness and hence distortion, as well as to protect participants' hearing. The equipment is also more flexible: to control runaway sibilants or plosives, raise the lower end of the dynamic range, then slice the remaining p, s, and t transients off with an aggressive top limiter. This also saves speakers. Long mic lines give problems such as noise pickup, treble rolloff, etc. Use line drivers at the mics to avoid these, or even use fibre optic links for problem runs. And last but not least, when a runaway guitarist drowns out the rest of the band....unplug. He/she/it will soon learn to coexist!

That ended the presentations, but Juergen Wahl quickly explained why some mics use gold, others aluminum or nickel on the diaphragm. The coating is so thin that the cost or weight doesn't matter. All metals are excellent conductors. Convenience or familiarity with different vacuum depositing techniques is the main consideration in choosing coating metals.

Much of this and other engineering information isn't in books; events like this one are therefore necessary for the pro.

John Eargle thanked all the speakers and organizers for their efforts, and the audience for its active and constructive participation. Based on this and past events, he finds the Toronto AES Section efforts always interesting and of a high caliber, comparable to the best anywhere else.

John Fourdraine, in Toronto

March Meeting Review

Claude Fortier and the Athena Project

Our March meeting had more than the usual acoustical and loudspeaker design representation among the 50 or so in the superb presentation theater at Adcom Electronics. One of their own, Dr. Claude Fortier, of State of the Art Electronik Inc. studio monitor fame, explained some results of the Athena project, started in 1989. It is a collaboration of his Canadian Audio Research Consortium and the National Research Council of Canada. The aim is to eliminate the influences of loudspeaker placement, room resonant modes and listener location on reproduced sound in most studios or living rooms, via low cost digital signal processing, so that the consortium members, Audio Products Int'l, PSB Loudspeakers Int'l, Paradigm Electronics Inc., State of the Art Electronik Inc., the N.R.C., and the Industrial Research Assistance Program may apply the results commercially.

Basic experiments were carried out initially in just one corner of a room (only two walls and the floor), using clumsy DSP, to learn the basic acoustic/geometric rules and dependencies. The recent setup uses a 460 by 670 cm (plus a 180 cm stub) L-shaped room with a modicum of acoustic treatment (drapes) behind the loudspeaker and listener, a model CF-150 two-way bookshelf studio monitor loudspeaker, a custom-made DSP with two Motorola 56000 chips per audio channel, and the best (so far) algorithms for flat response at the listener location.

After a 3-year development the system was tried in this room, at 6 central listener locations on a 100 by 50 cm grid, and from 5 loudspeaker locations on a 100 by 100 cm grid (50 cm away from the corner), at two different heights, using 50 cm steps, to mimic typical home use. Unequalised response measurements were made, 72 in all, and then repeated using the Athena project DSP and the findings on psycho-acoustic perception, room resonances, and off-axis sound combining in and out of phase with the on-axis sound (due to reflections).

Analysis showed that the room's influence caused most of the +/- 11 dB max., +/- 7 dB typ. variations below 400 Hz, and that above 400 Hz the response is largely independent of source position, but not of room absorption. Prior research showed that a one-third octave equalizer does not have enough resolution, and an analog parametric eq. does not have enough bands, to compensate accurately. The DSP has good frequency resolution and concentrates on correcting the magnitude response, since some time-domain corrections (to cancel reverb, etc.) introduce audible aberrations. The final curve set of the 72 equalised setups showed +/- 2 dB max. variations, with those above 400 Hz partly due to loudspeaker response, which led to its minor redesign.
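A rough calculation suggests why one-third octave bands are too coarse below 400 Hz. The sketch below lists the axial resonances along the two quoted room dimensions and compares their spacing with the width of a one-third octave band. It assumes a speed of sound of 343 m/s and treats each dimension as a simple pair of parallel walls; the real room is L-shaped and its height is not given, so the figures are indicative only and are not taken from the Athena measurements.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly 20 degrees C


def axial_modes(length_m, f_max_hz):
    """Axial resonance frequencies n * c / (2 * L) up to f_max_hz."""
    fundamental = SPEED_OF_SOUND_M_S / (2 * length_m)
    modes = []
    n = 1
    while n * fundamental <= f_max_hz:
        modes.append(n * fundamental)
        n += 1
    return modes


def third_octave_band(center_hz):
    """Lower and upper edge of the one-third octave band at center_hz."""
    factor = 2 ** (1 / 6)
    return center_hz / factor, center_hz * factor


long_modes = axial_modes(6.7, 400)    # 670 cm dimension
short_modes = axial_modes(4.6, 400)   # 460 cm dimension
low, high = third_octave_band(100)

print(len(long_modes), len(short_modes))   # 15 and 10 axial modes below 400 Hz
print(round(low, 1), round(high, 1))       # about 89.1 to 112.2 Hz
```

The band at 100 Hz is about 23 Hz wide, comparable to the spacing between axial modes along a single dimension, so one equalizer band will often straddle several resonances (from different dimensions) that each need a different correction.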

The conclusions of the Athena project are very promising: with a modest DSP engine a major correction in reproduction accuracy can be had in a typical living-room or studio. This correction can be changed as often as desired with other positions of source and/or listener, needs no user interaction (just push the cal button, with the source and mic in the desired locations), and holds well inside a 100 by 50 cm listener window. The next generation of studio monitors and high-end consumer audio will soon use this technology.

Dr. Fortier answered many audience questions, during and after his presentation, and an enjoyable as well as productive time was had by all.

John Fourdraine, Toronto.


Articles may be used with the Author's Permission. Contact the Bulletin Editor: earlm@hookup.net

Earl McCluskie; Assistant Editor: Anne Reynolds; Layout Editor: Lee White

The Bulletin is prepared in print by Lee White, and on Horizon and the Internet by Earl McCluskie.