Parametric Time-Frequency Domain Spatial Audio (E-Book)
Description

A comprehensive guide that addresses the theory and practice of spatial audio.

This book provides readers with the principles and best practices in spatial audio signal processing. It describes how sound fields and their perceptual attributes are captured and analyzed within the time-frequency domain, how essential representation parameters are coded, and how such signals are efficiently reproduced for practical applications. The book is split into four parts, starting with an overview of the fundamentals. It then explains the reproduction of spatial sound before examining signal-dependent spatial filtering. It finishes with coverage of current and future applications and the directions in which spatial audio research is heading.

Parametric Time-Frequency Domain Spatial Audio focuses on applications in entertainment audio, including music, home cinema, and gaming, covering the capture and reproduction of spatial sound as well as its generation, transduction, representation, transmission, and perception. The book teaches readers the tools needed for such processing, provides an overview of existing research, and presents recent projects and commercial applications built on these systems.

* Provides an in-depth presentation of the principles, past developments, state-of-the-art methods, and future research directions of spatial audio technologies
* Includes contributions from leading researchers in the field
* Offers MATLAB code with selected chapters

An advanced book aimed at readers who are capable of digesting mathematical expressions about digital signal processing and sound field analysis, Parametric Time-Frequency Domain Spatial Audio is best suited for researchers in academia and in the audio industry.


Page count: 754

Publication year: 2017




Parametric Time–Frequency Domain Spatial Audio

Edited by

Ville Pulkki, Symeon Delikaris-Manias, and Archontis Politis

Aalto University, Finland

This edition first published 2018 © 2018 John Wiley & Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Ville Pulkki, Symeon Delikaris-Manias and Archontis Politis to be identified as the authors of the editorial material in this work has been asserted in accordance with law.

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data

Names: Pulkki, Ville, editor. | Delikaris-Manias, Symeon, editor. | Politis, Archontis, editor.
Title: Parametric time-frequency domain spatial audio / edited by Ville Pulkki, Symeon Delikaris-Manias, Archontis Politis, Aalto University, Aalto, Finland.
Description: First edition. | Hoboken, NJ, USA : Wiley, 2018. | Includes bibliographical references and index.
Identifiers: LCCN 2017020532 (print) | LCCN 2017032223 (ebook) | ISBN 9781119252580 (pdf) | ISBN 9781119252610 (epub) | ISBN 9781119252597 (hardback)
Subjects: LCSH: Surround-sound systems--Mathematical models. | Time-domain analysis. | Signal processing. | BISAC: TECHNOLOGY & ENGINEERING / Electronics / General.
Classification: LCC TK7881.83 (ebook) | LCC TK7881.83 .P37 2018 (print) | DDC 621.382/2--dc23
LC record available at https://lccn.loc.gov/2017020532

Cover Design: Wiley
Cover Image: © Vectorig/Gettyimages

CONTENTS

List of Contributors

Preface

Notes

About the Companion Website

Part I Analysis and Synthesis of Spatial Sound

1 Time–Frequency Processing: Methods and Tools

1.1 Introduction

1.2 Time–Frequency Processing

1.3 Processing of Spatial Audio

Note

References

2 Spatial Decomposition by Spherical Array Processing

2.1 Introduction

2.2 Sound Field Measurement by a Spherical Array

2.3 Array Processing and Plane-Wave Decomposition

2.4 Sensitivity to Noise and Standard Regularization Methods

2.5 Optimal Noise-Robust Design

2.6 Spatial Aliasing and High Frequency Performance Limit

2.7 High Frequency Bandwidth Extension by Aliasing Cancellation

2.8 High Performance Broadband PWD Example

2.9 Summary

2.10 Acknowledgment

References

3 Sound Field Analysis Using Sparse Recovery

3.1 Introduction

3.2 The Plane-Wave Decomposition Problem

3.3 Bayesian Approach to Plane-Wave Decomposition

3.4 Calculating the IRLS Noise-Power Regularization Parameter

3.5 Numerical Simulations

3.6 Experiment: Echoic Sound Scene Analysis

3.7 Conclusions

Appendix

References

Part II Reproduction of Spatial Sound

4 Overview of Time–Frequency Domain Parametric Spatial Audio Techniques

4.1 Introduction

4.2 Parametric Processing Overview

References

5 First-Order Directional Audio Coding (DirAC)

5.1 Representing Spatial Sound with First-Order B-Format Signals

5.2 Some Notes on the Evolution of the Technique

5.3 DirAC with Ideal B-Format Signals

5.4 Analysis of Directional Parameters with Real Microphone Setups

5.5 First-Order DirAC with Monophonic Audio Transmission

5.6 First-Order DirAC with Multichannel Audio Transmission

5.7 DirAC Synthesis for Headphones and for Hearing Aids

5.8 Optimizing the Time–Frequency Resolution of DirAC for Critical Signals

5.9 Example Implementation

5.10 Summary

References

6 Higher-Order Directional Audio Coding

6.1 Introduction

6.2 Sound Field Model

6.3 Energetic Analysis and Estimation of Parameters

6.4 Synthesis of Target Setup Signals

6.5 Subjective Evaluation

6.6 Conclusions

Note

References

7 Multi-Channel Sound Acquisition Using a Multi-Wave Sound Field Model

7.1 Introduction

7.2 Parametric Sound Acquisition and Processing

7.3 Multi-Wave Sound Field and Signal Model

7.4 Direct and Diffuse Signal Estimation

7.5 Parameter Estimation

7.6 Application to Spatial Sound Reproduction

7.7 Summary

Notes

References

8 Adaptive Mixing of Excessively Directive and Robust Beamformers for Reproduction of Spatial Sound

8.1 Introduction

8.2 Notation and Signal Model

8.3 Overview of the Method

8.4 Loudspeaker-Based Spatial Sound Reproduction

8.5 Binaural-Based Spatial Sound Reproduction

8.6 Conclusions

References

9 Source Separation and Reconstruction of Spatial Audio Using Spectrogram Factorization

9.1 Introduction

9.2 Spectrogram Factorization

9.3 Array Signal Processing and Spectrogram Factorization

9.4 Applications of Spectrogram Factorization in Spatial Audio

9.5 Discussion

9.6 Matlab Example

Note

References

Part III Signal-Dependent Spatial Filtering

10 Time–Frequency Domain Spatial Audio Enhancement

10.1 Introduction

10.2 Signal-Independent Enhancement

10.3 Signal-Dependent Enhancement

References

11 Cross-Spectrum-Based Post-Filter Utilizing Noisy and Robust Beamformers

11.1 Introduction

11.2 Notation and Signal Model

11.3 Estimation of the Cross-Spectrum-Based Post-Filter

11.4 Implementation Examples

11.5 Conclusions and Further Remarks

11.6 Source Code

Note

References

12 Microphone-Array-Based Speech Enhancement Using Neural Networks

12.1 Introduction

12.2 Time–Frequency Masks for Speech Enhancement Using Supervised Learning

12.3 Artificial Neural Networks

12.4 Mask Learning: A Simulated Example

12.5 Mask Learning: A Real-World Example

12.6 Conclusions

12.7 Source Code

Notes

References

Part IV Applications

13 Upmixing and Beamforming in Professional Audio

13.1 Introduction

13.2 Stereo-to-Multichannel Upmix Processor

13.3 Digitally Enhanced Shotgun Microphone

13.4 Surround Microphone System Based on Two Microphone Elements

13.5 Summary

References

14 Spatial Sound Scene Synthesis and Manipulation for Virtual Reality and Audio Effects

14.1 Introduction

14.2 Parametric Sound Scene Synthesis for Virtual Reality

14.3 Spatial Manipulation of Sound Scenes

14.4 Summary

References

15 Parametric Spatial Audio Techniques in Teleconferencing and Remote Presence

15.1 Introduction and Motivation

15.2 Background

15.3 Immersive Audio Communication System (ImmACS)

15.4 Capture and Reproduction of Crowded Acoustic Environments

15.5 Conclusions

Notes

References

Index

EULA

List of Tables

Chapter 9

Table 9.1

Table 9.2

Chapter 12

Table 12.1

Table 12.2


List of Contributors

Ahonen, Jukka

Akukon Ltd, Finland

Alexandridis, Anastasios

Foundation for Research and Technology-Hellas, Institute of Computer Science (FORTH-ICS), Heraklion, Crete, Greece

Alon, David Lou

Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Israel

Bäckström, Tom

Department of Signal Processing and Acoustics, Aalto University, Finland

Delikaris-Manias, Symeon

Department of Signal Processing and Acoustics, Aalto University, Finland

Epain, Nicolas

CARLab, School of Electrical and Information Engineering, University of Sydney, Australia

Faller, Christof

Illusonic GmbH, Switzerland and École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

Habets, Emanuël

International Audio Laboratories Erlangen, Germany

Jin, Craig T.

CARLab, School of Electrical and Information Engineering, University of Sydney, Australia

Laitinen, Mikko-Ville

Nokia Technologies, Finland

Mouchtaris, Athanasios

Foundation for Research and Technology-Hellas, Institute of Computer Science (FORTH-ICS), Heraklion, Crete, Greece

Nikunen, Joonas

Department of Signal Processing, Tampere University of Technology, Finland

Noohi, Tahereh

CARLab, School of Electrical and Information Engineering, University of Sydney, Australia

Pavlidi, Despoina

Foundation for Research and Technology-Hellas, Institute of Computer Science (FORTH-ICS), Heraklion, Crete, Greece

Pertilä, Pasi

Department of Signal Processing, Tampere University of Technology, Finland

Pihlajamäki, Tapani

Nokia Technologies, Finland

Politis, Archontis

Department of Signal Processing and Acoustics, Aalto University, Finland

Pulkki, Ville

Department of Signal Processing and Acoustics, Aalto University, Finland

Rafaely, Boaz

Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Israel

Stefanakis, Nikolaos

Foundation for Research and Technology-Hellas, Institute of Computer Science (FORTH-ICS), Heraklion, Crete, Greece

Thiergart, Oliver

International Audio Laboratories Erlangen, Germany

Vilkamo, Juha

Nokia Technologies, Finland

Virtanen, Tuomas

Department of Signal Processing, Tampere University of Technology, Finland

Preface

A plethora of methods for capturing, storing, and reproducing monophonic sound signals has been developed in the history of audio, starting from early mechanical devices, and progressing via analog electronic devices to faithful digital representation. In recent decades there has also been considerable effort to capture and recreate the spatial characteristics of sound scenes to a listener. When reproducing a sound scene, the locations of sound sources and responses of listening spaces should be perceived as in the original conditions, in either faithful replication or with deliberate modification. A vast number of research articles have been published suggesting methods to capture, store, and recreate spatial sound over headphone or loudspeaker listening setups. However, one cannot say that the field has matured yet, as new techniques and paradigms are still actively being published.

Another important task in spatial sound reproduction is the directional filtering of sound, where unwanted sound coming from other directions is attenuated when compared to the sound arriving from the direction of the desired sound source. Such techniques have applications in surround sound, teleconferencing, and head-mounted virtual reality displays.

This book covers a number of techniques that utilize signal-dependent time–frequency domain processing of spatial audio for both tasks: spatial sound reproduction and directional filtering. The application of time–frequency domain techniques in spatial audio is relatively new, as the first attempts were published about 15 years ago. A common property of the techniques is that the sound field is captured with multiple microphones, and its properties are analyzed for each time instance and individually for different frequency bands. These properties can be described by a set of parameters, which are subsequently used in processing to achieve different tasks, such as perceptually motivated reproduction of spatial sound, spatial filtering, or spatial sound synthesis. The techniques are loosely gathered under the title “time–frequency domain parametric spatial audio.”

The term “parameter” generally denotes any characteristic that can help in defining or classifying a particular system. In spatial audio techniques, the parameter quantifies, in some way, the properties of the sound field depending on frequency and time. In some techniques described in this book, measures having physical meaning are used, such as the direction of arrival or the diffuseness of the sound field. Many techniques measure the similarity or dissimilarity of signals from closely located microphones, which also quantifies the spatial attributes of the sound field, although the mapping from parameter value to physical quantities is not necessarily straightforward. In all cases, the time- and frequency-dependent parameter directly affects the reproduction of sound, which makes the outputs of the methods depend on the spatial characteristics of the captured sound field. In most cases, such signal-dependent processing yields a significant improvement over more traditional signal-independent processing when an input with relatively few audio channels is processed.

Signal-dependent processing often relies on implicit assumptions about the spatial and spectral resolution of the listener and/or about the properties of the sound field. In spatial sound reproduction, the systems should relay sound signals to the ear canals of the listener such that the desired perception of the acoustical surroundings is obtained. The resolution of all perceivable attributes, such as sound spectrum, direction of arrival, or characteristics of reverberation, should be as high as required so that no difference from the original is perceived. On the other hand, the attributes should not be reproduced with an accuracy higher than needed, so that computational resources are used optimally. Ideally, an authentic reproduction is obtained with a moderate amount of resources: only a few microphones are needed, the computational requirements are not excessive, and the listening setup consists of only a few electroacoustic transducers.

When the captured acoustical scene deviates from the assumed model, the benefit obtained by the parametric processing may be lost, and undesired audible degradations may be introduced. An important theme in all the methods presented is how to make them robust to such conditions: by assuming extended and more complex models, and/or by handling estimation errors and deviations without detrimental perceptual effects, allowing the result to deviate from reality. Such optimization requires a deep knowledge of sound field analysis, microphone array processing, statistical signal processing, and spatial hearing, which makes the research topic rich in technological approaches.

The composition of this book was motivated by work on parametric spatial audio at Aalto University. A large number of publications and theses are condensed in this book, aiming to make the core of the work easily accessible. In addition, several chapters are contributed by established international researchers in this topic, offering a wide view of the approaches and solutions in this field.

The first part of the book concerns the analysis and synthesis of spatial audio. The first chapter reviews the methods that are commonly used in industrial audio applications when transforming signals to the time–frequency domain. It also provides background knowledge for methods of reproducing sound with controllable spatial attributes, methods that are utilized in several chapters of the book. The two other chapters in this part consider methods for the analysis of spatial sound captured with a spherical microphone array: how to decompose the sound field recording into plane waves.

The second part considers systems that consist of a whole sound reproduction chain including capture with a microphone array; time–frequency domain analysis, processing, and synthesis; and often also subjective evaluation of the result. The basic question is how to reproduce a spatial sound scene in such a way that a listener would not notice a difference between the original and the reproduced sound scene. All the methods are parametric in some sense; however, with different assumptions about the sound field and the listener, and with different microphone arrays utilized, the solutions end up being very different.

The third part starts with a review of current signal-dependent spatial filtering approaches. After this, two chapters with new contributions to the field follow. The second chapter discusses a method based on stochastic estimates between higher-order directional patterns, and the third chapter suggests using machine learning and neural networks to perform spatial filtering tasks.

The fourth part extends the theoretical framework to more practical approaches. The first chapter presents a number of commercial devices that utilize parametric time–frequency domain audio techniques. The second chapter discusses the application of the techniques in the synthesis of spatial sound for virtual acoustic environments, and the third chapter covers applications in teleconferencing and remote presence.

The reader should possess a good knowledge of the fields of acoustics, audio, psychoacoustics, and digital signal processing; introductions to these fields can be found in other sources.1 Finally, the working principles of many of the proposed techniques are demonstrated with code examples, written in Matlab®, which focus mostly on the parametric part of the processing. Tools for time–frequency domain transforms and for linear processing of spatial audio are typically also needed in the implementation of complete audio systems, and the reader might find useful the broad range of tools developed by our research group at Aalto University.2 In addition, a range of research-related demos are available.3

Ville Pulkki, Archontis Politis, and Symeon Delikaris-Manias

Otaniemi, Espoo 2017

Notes

1 See, for example, Communication Acoustics: An Introduction to Speech, Audio and Psychoacoustics, V. Pulkki and M. Karjalainen, Wiley, 2015.

2 See http://spa.aalto.fi/en/research/research_groups/communication_acoustics/acoustics_software/.

3 http://spa.aalto.fi/en/research/research_groups/communication_acoustics/demos/.

About the Companion Website

Don't forget to visit the companion website for this book:

www.wiley.com/go/pulkki/parametrictime-frequency

There you will find valuable material designed to enhance your learning, including:

Part I Analysis and Synthesis of Spatial Sound

1 Time–Frequency Processing: Methods and Tools

Juha Vilkamo1 and Tom Bäckström2

1 Nokia Technologies, Finland

2 Department of Signal Processing and Acoustics, Aalto University, Finland

1.1 Introduction

In most audio applications, the purpose is to reproduce sounds for human listening, whereby it is essential to design and optimize systems for perceptual quality. To achieve such optimal quality with given resources, we often use principles in the processing of signals that are motivated by the processes involved in hearing. In the big picture, human hearing processes the sound entering the ears in frequency bands (Moore, 1995). Hearing is thus sensitive to the spectral content of the ear canal signals, which changes quickly with time in a complex way. As a result of frequency-band processing, the ear is not particularly sensitive to small differences in a weaker sound in the presence of a stronger masking sound nearby in frequency and time (Fastl and Zwicker, 2007). Therefore, a representation of audio signals where we have access to both time and frequency information is a well-motivated choice.

A prerequisite for efficient audio processing methods is a representation of the signal that presents features desirable to hearing in an accessible form and also allows high-quality playback of signals. Useful properties of such a representation are, for example, that its coefficients have physically or perceptually relevant interpretations, and that the coefficients can be processed independently from each other. The time–frequency domain is such a domain, and it is commonly used in audio processing (Smith, 2011). Spectral coefficients in this domain explain the signal content in terms of frequency components as a function of time, which is an intuitive and unambiguous physical interpretation. Moreover, time–frequency components are approximately uncorrelated, whereby they can be independently processed and the effect on the output is deterministic. These properties make the spectrum a popular domain for audio processing, and all the techniques discussed in this book utilize it. The first part of this chapter will give an overview of the theory and practice of the tools typically needed in time–frequency processing of audio channels.

The time–frequency domain is also useful when processing the spatial characteristics of sound, for example in microphone array processing. Differences in directions of arrival of wavefronts are visible as differences in time of arrival and amplitude between microphone signals. When the microphone signals are transformed to the time–frequency domain, the differences directly correspond to differences in phase and magnitude in a similar fashion to the way spatial cues used by a human listener are encoded in the ear canal signals (Blauert, 1997). The time–frequency domain differences between microphone channels have proven to be very useful in the capture, analysis, and reproduction of spatial audio, as is shown in the other chapters of this book. The second part of this chapter introduces a few signal processing techniques commonly used, and serves as background information for the reader.
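As a concrete illustration of this phase–delay relationship, the following pure-Python sketch simulates two "microphone" signals carrying the same tone with a small inter-channel delay, and recovers that delay from the phase difference of a single DFT bin. The sampling rate, tone frequency, delay, and window length are assumed values chosen for the example, not taken from the text.

```python
import math, cmath

fs = 8000.0   # sampling rate in Hz (assumed for the example)
f0 = 500.0    # frequency of the test tone in Hz (assumed)
delay = 3     # simulated inter-channel delay in samples (assumed)
N = 256       # analysis window length; f0*N/fs = 16 cycles, an exact DFT bin

# Two microphone signals: the same tone, the second delayed by `delay` samples.
x1 = [math.sin(2 * math.pi * f0 * n / fs) for n in range(N)]
x2 = [math.sin(2 * math.pi * f0 * (n - delay) / fs) for n in range(N)]

def dft_bin(x, k):
    """One coefficient of the discrete Fourier transform of x at bin k."""
    L = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / L) for n in range(L))

k = round(f0 * N / fs)   # bin containing the tone: 500 * 256 / 8000 = 16
X1, X2 = dft_bin(x1, k), dft_bin(x2, k)

# The inter-channel phase difference at this frequency maps back to a time
# difference of arrival, as long as the phase does not wrap beyond +/- pi.
dphi = cmath.phase(X1 * X2.conjugate())
tdoa = dphi / (2 * math.pi * f0) * fs   # delay estimate in samples
print(round(tdoa))   # → 3
```

At frequencies where the true delay exceeds half a period, the phase wraps and the single-bin estimate becomes ambiguous; this is one reason practical systems combine estimates across many time–frequency tiles.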

This chapter assumes understanding of basic digital signal processing techniques from the reader, which can be obtained from such basic resources as Oppenheim and Schafer (1975) or Mitra and Kaiser (1993).

1.2 Time–Frequency Processing

1.2.1 Basic Structure

A block diagram of a typical parametric time–frequency processing algorithm is shown in Figure 1.1. The processing involves transforms between the time domain input signal xi(t), the time–frequency domain signal xi(k, n), and the time domain output signal yj(t), where t is the time index in the time domain, and k and n are indexes for the frequency and time frame in the time–frequency domain, respectively; i and j are then the channel indexes in the case of multi-channel input and/or output. Additionally, the processing involves short-time stochastic analysis and parameter-driven processing, where the time–frequency domain signal y(k, n) is formed based on the parameters and x(k, n). The parametric data consists of any information describing the frequency band signals, for example the stochastic properties, information based on the audio objects, or user input parameters. In some use cases, such as in parametric spatial audio coding decoders, the stochastic estimation block is not applied, and the processing acts entirely on the parametric data provided in the bit stream.

Figure 1.1 Block diagram of a typical parametric time–frequency processing algorithm. The processing operates on three sampling rates: that of the wide-band signal, that of the frequency band signal, and that of the parametric information.

The parametric processing techniques typically operate on several different sampling rates: The sampling rate Fs of the wide-band signal, the sampling rate Fs/K of the frequency band signals, where K is the downsampling factor, and the sampling rate of the parametric information. Since the samples in the parametric information typically describe the signal properties over time frames, it potentially operates at a sampling rate below Fs/K. The parametric processing can also take place using a varying sampling rate, for example when the frame size adapts with the observed onsets of audio signals. In the following sections, the required background for the processing blocks in Figure 1.1 is discussed in detail.
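As a numeric illustration of these three rates, the following sketch uses example values assumed for Fs, K, and the parameter update interval; none of these specific numbers come from the text.

```python
Fs = 48000              # wide-band sampling rate in Hz (assumed example value)
K = 64                  # downsampling factor of the time-frequency transform (assumed)
frames_per_update = 16  # band samples per parameter update (assumed)

band_rate = Fs / K                           # rate of each frequency-band signal
param_rate = band_rate / frames_per_update   # rate of the parametric information

print(band_rate, param_rate)   # → 750.0 46.875
```

An adaptive frame size, as mentioned above, would simply make the update interval (and hence the parameter rate) time varying.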

Audio signals are generally time-varying signals whereby the spectrum is not constant in time. Should we analyze a long segment, then its spectrum would contain a mixture of all the different sounds within that segment. We could then not easily access the independent sounds, but only see their mixture, and the application of efficient processing methods would become difficult. It is therefore important to choose segments of the signal of such a length that we obtain good temporal separation of the audio content. Also, other properties such as constraints on algorithmic delay and requirements on spectral resolution impose demands on the length of analysis windows. It is then clear that while the spectrum or the frequency domain is an efficient domain for audio processing, the time axis also has to be factored into the representation.
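The windowed segmentation described above, together with a matching synthesis stage, can be sketched in a few lines of pure Python. The frame length, sine window, and test signal are arbitrary choices for this illustration; with a sine window applied at both analysis and synthesis and 50% overlap, the squared windows sum to one, so the interior of the signal is reconstructed exactly.

```python
import math, cmath

def dft(x):
    """Naive discrete Fourier transform (O(L^2), for illustration only)."""
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / L) for n in range(L))
            for k in range(L)]

def idft(X):
    """Inverse DFT; the input spectrum of a real frame yields a real frame."""
    L = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / L) for k in range(L)).real / L
            for n in range(L)]

N = 32         # frame length (assumed for the example)
hop = N // 2   # 50% overlap
win = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]   # sine window

# Test signal: a mixture of two sinusoids.
x = [math.sin(2 * math.pi * 3 * n / 128) + 0.3 * math.sin(2 * math.pi * 11 * n / 128)
     for n in range(256)]

y = [0.0] * len(x)
for start in range(0, len(x) - N + 1, hop):
    frame = [x[start + n] * win[n] for n in range(N)]   # windowed segment
    X = dft(frame)        # spectral coefficients of this time frame
    # (parametric, signal-dependent processing would modify X here)
    f = idft(X)
    for n in range(N):    # windowed overlap-add synthesis
        y[start + n] += f[n] * win[n]

# Away from the edges, the squared sine windows at 50% overlap sum to one,
# so the signal is reconstructed up to numerical precision.
err = max(abs(x[n] - y[n]) for n in range(N, len(x) - N))
print(err < 1e-9)   # → True
```

Shorter frames improve temporal separation of the content at the cost of spectral resolution, which is exactly the trade-off discussed in the text.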

Computationally efficient algorithms for time–frequency analysis have enabled their current popular usage. Namely, the basis of most time–frequency algorithms is the fast Fourier transform (FFT), which is a practical implementation of the discrete Fourier transform (DFT). It belongs to the class of super-fast algorithms that have an algorithmic complexity of O(N log N), where N