Design and use machine learning models for music generation using Magenta and make them interact with existing music creation tools
Key Features
Book Description
The importance of machine learning (ML) in art is growing at a rapid pace due to recent advancements in the field, and Magenta is at the forefront of this innovation. With this book, you'll follow a hands-on approach to using ML models for music generation, learning how to integrate them into an existing music production workflow. Complete with practical examples and explanations of the theoretical background required to understand the underlying technologies, this book is the perfect starting point to begin exploring music generation.
The book will help you learn how to use the models in Magenta for generating percussion sequences, monophonic and polyphonic melodies in MIDI, and instrument sounds in raw audio. Through practical examples and in-depth explanations, you'll understand ML models such as RNNs, VAEs, and GANs. Using this knowledge, you'll create and train your own models for advanced music generation use cases, along with preparing new datasets. Finally, you'll get to grips with integrating Magenta with other technologies, such as digital audio workstations (DAWs), and using Magenta.js to distribute music generation apps in the browser.
By the end of this book, you'll be well-versed with Magenta and have developed the skills you need to use ML models for music generation in your own style.
What you will learn
Who this book is for
This book is for technically inclined artists and musically inclined computer scientists. Readers who want to get hands-on with building generative music applications that use deep learning will also find this book useful. Although prior musical or technical competence is not required, basic knowledge of the Python programming language is assumed.
Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Mrinmayee Kawalkar
Acquisition Editor: Ali Abidi
Content Development Editor: Nazia Shaikh
Senior Editor: Ayaan Hoda
Technical Editor: Joseph Sunil
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Production Designer: Deepika Naik
First published: January 2020
Production reference: 1300120
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-83882-441-9
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Alexandre DuBreuil is a software engineer and generative music artist. Through collaborations with bands and artists, he has worked on many generative art projects, such as generative video systems for music bands in concerts that create visuals based on the underlying musical structure, a generative drawing software that creates new content based on a previous artist's work, and generative music exhibits in which the generation is based on real-time events and data. Machine learning has a central role in his music generation projects, and Alexandre has been using Magenta since its release for inspiration, music production, and as the cornerstone for making autonomous music generation systems that create endless soundscapes.
Gogul Ilango has a bachelor's degree in electronics and communication engineering from Thiagarajar College of Engineering, Madurai, and a master's degree in VLSI design and embedded systems from Anna University, Chennai, where he was awarded the University Gold Medal for academic performance. He has published four research papers in top conferences and journals related to artificial intelligence. His passion for music production and deep learning led him to learn about and contribute to Google's Magenta community, where he created an interactive web application called DeepDrum as well as DeepArp using Magenta.js, available in Magenta's community contribution demonstrations. He is a lifelong learner, hardware engineer, programmer, and music producer.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Music Generation with Magenta
About Packt
Why subscribe?
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Code in Action
Get in touch
Reviews
Section 1: Introduction to Artwork Generation
Introduction to Magenta and Generative Art
Technical requirements
Overview of generative art
Pen and paper generative music
Computerized generative music
New techniques with machine learning
Advances in deep learning
Representation in music processes
Representing music with MIDI
Representing music as a waveform
Representing music with a spectrogram
Google's Magenta and TensorFlow in music generation
Creating a music generation system
Looking at Magenta's content
Differentiating models, configurations, and pre-trained models
Generating and stylizing images
Generating audio
Generating, interpolating, and transforming score
Installing Magenta and Magenta for GPU
Choosing the right versions
Creating a Python environment with Conda
Installing prerequisite software
Installing Magenta
Installing Magenta for GPU (optional)
Installing the music software and synthesizers
Installing the FluidSynth software synthesizer
Installing SoundFont
Installing FluidSynth
Testing your installation
Using a hardware synthesizer (optional)
Installing Audacity as a digital audio editor (optional)
Installing MuseScore for sheet music (optional)
Installing a Digital Audio Workstation (optional)
Installing the code editing software
Installing Jupyter Notebook (optional)
Installing and configuring an IDE (optional)
Generating a basic MIDI file
Summary
Questions
Further reading
Section 2: Music Generation with Machine Learning
Generating Drum Sequences with the Drums RNN
Technical requirements
The significance of RNNs in music generation
Operating on a sequence of vectors
Remember the past to better predict the future
Using the right terminology for RNNs
Using the Drums RNN on the command line
Magenta's command-line utilities
Generating a simple drum sequence
Understanding the model's parameters
Changing the output size
Changing the tempo
Changing the model type
Priming the model with Led Zeppelin
Configuring the generation algorithm
Other Magenta and TensorFlow flags
Understanding the generation algorithm
Generating the sequence branches and steps
Making sense of the randomness
Using the Drums RNN in Python
Generating a drum sequence using Python
Packaging checkpoints as bundle files
Encoding MIDI using Protobuf in NoteSequence
Mapping MIDI notes to the real world
Encoding percussion events as classes
Sending MIDI files to other applications
Summary
Questions
Further reading
Generating Polyphonic Melodies
Technical requirements
LSTM for long-term dependencies
Looking at LSTM memory cells
Exploring alternative networks
Generating melodies with the Melody RNN
Generating a song for Für Elise
Understanding the lookback configuration
Understanding the attention mask
Losing track of time
Generating polyphony with the Polyphony RNN and Performance RNN
Differentiating conditioning and injection
Explaining the polyphonic encoding
Performance music with the Performance RNN
Generating expressive timing like a human
Summary
Questions
Further reading
Latent Space Interpolation with MusicVAE
Technical requirements
Continuous latent space in VAEs
The latent space in standard AEs
Using VAEs in generating music
Score transformation with MusicVAE and GrooVAE
Initializing the model
Sampling the latent space
Writing the sampling code
Refining the loss function with KL divergence
Sampling from the same area of the latent space
Sampling from the command line
Interpolating between two samples
Getting the sequence length right
Writing the interpolation code
Interpolating from the command line
Humanizing the sequence
Writing the humanizing code
Humanizing from the command line
More interpolation on melodies
Sampling the whole band
An overview of other pre-trained models
Understanding TensorFlow code
Building the VAE graph
Building an encoder with BidirectionalLstmEncoder
Building a decoder with CategoricalLstmDecoder
Building the hidden layer
Looking at the sample method
Looking at the interpolate method
Looking at the groove method
Summary
Questions
Further reading
Audio Generation with NSynth and GANSynth
Technical requirements
Learning about WaveNet and temporal structures for music
Looking at NSynth and WaveNet autoencoders
Visualizing audio using a constant-Q transform spectrogram
The NSynth dataset
Neural audio synthesis with NSynth
Choosing the WaveNet model
Encoding the WAV files
Visualizing the encodings
Saving the encodings for later use
Mixing encodings together by moving in the latent space
Synthesizing the mixed encodings to WAV
Putting it all together
Preparing the audio clips
Generating new instruments
Visualizing and listening to our results
Using NSynth generated samples as instrument notes
Using the command line
More of NSynth
Using GANSynth as a generative instrument
Choosing the acoustic model
Getting the notes information
Gradually sampling from the latent space
Generating random instruments
Getting the latent vectors
Generating the samples from the encoding
Putting it all together
Using the command line
Summary
Questions
Further reading
Section 3: Training, Learning, and Generating a Specific Style
Data Preparation for Training
Technical requirements
Looking at existing datasets
Looking at symbolic representations
Building a dataset from the ground up
Using the LMD for MIDI and audio files
Using the MSD for metadata information
Using the MAESTRO dataset for performance music
Using the Groove MIDI Dataset for groovy drums
Using the Bach Doodle Dataset
Using the NSynth dataset for audio content
Using APIs to enrich existing data
Looking at other data sources
Building a dance music dataset
Threading the execution to handle large datasets faster
Extracting drum instruments from a MIDI file
Detecting specific musical structures
Analyzing the beats of our MIDI files
Writing the process method
Calling the process method using threads
Plotting the results using Matplotlib
Processing a sample of the dataset
Building a jazz dataset
The LMD extraction tools
Fetching a song's genre using the Last.fm API
Reading information from the MSD
Using top tags to find genres
Finding instrument classes using MIDI
Extracting jazz, drums, and piano tracks
Extracting and merging jazz drums
Extracting and splitting jazz pianos
Preparing the data using pipelines
Refining the dataset manually
Looking at the Melody RNN pipeline
Launching the data preparation stage on our dataset
Understanding a pipeline execution
Writing your own pipeline
Looking at MusicVAE data conversion
Summary
Questions
Further reading
Training Magenta Models
Technical requirements
Choosing the model and configuration
Comparing music generation use cases
Creating a new configuration
Training and tuning a model
Organizing datasets and training data
Training on a CPU or a GPU
Training RNN models
Creating the dataset and launching the training
Launching the evaluation
Looking at TensorBoard
Explaining underfitting and overfitting
Fixing underfitting
Fixing overfitting
Defining network size and hyperparameters
Determining the batch size
Fixing out of memory errors
Fixing a wrong network size
Fixing a model not converging
Fixing not enough training data
Configuring attention and other hyperparameters
Generating sequences from a trained model
Using a specific checkpoint to implement early stops
Packaging and distributing the result using bundles
Training MusicVAE
Splitting the dataset into evaluation and training sets
Launching the training and evaluation
Distributing a trained model
Training other models
Using Google Cloud Platform
Creating and configuring an account
Preparing an SSH key (optional)
Creating a VM instance from a TensorFlow image
Initializing the VM
Installing the NVIDIA CUDA drivers
Installing Magenta GPU
Launching the training
Summary
Questions
Further reading
Section 4: Making Your Models Interact with Other Applications
Magenta in the Browser with Magenta.js
Technical requirements
Introducing Magenta.js and TensorFlow.js
Introducing TensorFlow.js for machine learning in the browser
Introducing Magenta.js for music generation in the browser
Converting trained models for Magenta.js
Downloading pre-trained models locally
Introducing Tone.js for sound synthesis in the browser
Creating a Magenta.js web application
Generating instruments in the browser using GANSynth
Writing the page structure
Sampling audio using GANSynth
Launching the web application
Generating a trio using MusicVAE
Using a SoundFont for more realistic-sounding instruments
Playing generated instruments in a trio
Using the Web Workers API to offload computations from the UI thread
Using other Magenta.js models
Making Magenta.js interact with other apps
Using the Web MIDI API
Running Magenta.js server side with Node.js
Summary
Questions
Further reading
Making Magenta Interact with Music Applications
Technical requirements
Sending MIDI to a DAW or synthesizer
Introducing some DAWs
Looking at MIDI ports using Mido
Creating a virtual MIDI port on macOS and Linux
Creating a virtual MIDI port on Windows using loopMIDI
Adding a virtual MIDI port on macOS
Sending generated MIDI to FluidSynth
Sending generated MIDI to a DAW
Using NSynth generated samples as instruments
Looping the generated MIDI
Using the MIDI player to loop a sequence
Synchronizing Magenta with a DAW
Sending MIDI clock and transport
Using MIDI control message
Using Ableton Link to sync devices
Sending MIDI to a hardware synthesizer
Using Magenta as a standalone application with Magenta Studio
Looking at Magenta Studio's content
Integrating Magenta Studio in Ableton Live
Summary
Questions
Further reading
Assessments
Chapter 1: Introduction to Magenta and Generative Art
Chapter 2: Generating Drum Sequences with the Drums RNN
Chapter 3: Generating Polyphonic Melodies
Chapter 4: Latent Space Interpolation with MusicVAE
Chapter 5: Audio Generation with NSynth and GANSynth
Chapter 6: Data Preparation for Training
Chapter 7: Training Magenta Models
Chapter 8: Magenta in the Browser with Magenta.js
Chapter 9: Making Magenta Interact with Music Applications
Other Books You May Enjoy
Leave a review - let other readers know what you think
The place of machine learning in art is becoming more and more firmly established because of recent advancements in the field. Magenta is at the forefront of that innovation. This book provides a hands-on approach to machine learning models for music generation and demonstrates how to integrate them into an existing music production workflow. Complete with practical examples and explanations of the theoretical background required to understand the underlying technologies, this book is the perfect starting point to begin exploring music generation.

In Hands-On Music Generation with Magenta, you'll learn how to use models in Magenta to generate percussion sequences, monophonic and polyphonic melodies in MIDI, and instrument sounds in raw audio. We'll be seeing plenty of practical examples and in-depth explanations of machine learning models, such as Recurrent Neural Networks (RNNs), Variational Autoencoders (VAEs), and Generative Adversarial Networks (GANs). Leveraging that knowledge, we'll be creating and training our own models for advanced music generation use cases, and we'll be tackling the preparation of new datasets. Finally, we'll be looking at integrating Magenta with other technologies, such as Digital Audio Workstations (DAWs), and using Magenta.js to distribute music generation applications in the browser.

By the end of this book, you'll be proficient in everything Magenta has to offer and equipped with sufficient knowledge to tackle music generation in your own style.
This book will appeal to both technically inclined artists and musically inclined computer scientists. It is directed to any reader who wants to gain hands-on knowledge about building generative music applications that use deep learning. It doesn't assume any musical or technical competence from you, apart from basic knowledge of the Python programming language.
Chapter 1, Introduction to Magenta and Generative Art, will show you the basics of generative music and what already exists. You'll learn about the new techniques of artwork generation, such as machine learning, and how those techniques can be applied to produce music and art. Google's Magenta open source research platform will be introduced, together with Google's open source machine learning platform TensorFlow, followed by an overview of their different parts and the installation of the required software for this book. We'll finish the installation by generating a simple MIDI file on the command line.
Chapter 2, Generating Drum Sequences with the Drums RNN, will show you what many consider the foundation of music—percussion. We'll show the importance of RNNs for music generation. You'll then learn how to use the Drums RNN model using a pre-trained drum kit model, by calling it in the command-line window and directly in Python, to generate drum sequences. We'll introduce the different model parameters, including the model's MIDI encoding, and show how to interpret the output of the model.
Chapter 3, Generating Polyphonic Melodies, will show the importance of Long Short-Term Memory (LSTM) networks in generating longer sequences. We'll see how to use a monophonic Magenta model, the Melody RNN—an LSTM network with lookback and attention configurations. You'll also learn to use two polyphonic models, the Polyphony RNN and Performance RNN, both LSTM networks using a specific encoding, with the latter having support for note velocity and expressive timing.
Chapter 4, Latent Space Interpolation with MusicVAE, will show the importance of the continuous latent space of VAEs in music generation compared to standard autoencoders (AEs). We'll use the MusicVAE model, a hierarchical recurrent VAE, from Magenta to sample sequences and then interpolate between them, effectively morphing smoothly from one to another. We'll then see how to add groove, or humanization, to an existing sequence using the GrooVAE model. We'll finish by looking at the TensorFlow code used to build the VAE model.
Chapter 5, Audio Generation with NSynth and GANSynth, will cover audio generation. We'll first provide an overview of WaveNet, an existing model for audio generation that is especially efficient in text-to-speech applications. In Magenta, we'll use NSynth, a WaveNet autoencoder model, to generate small audio clips that can serve as instruments for a backing MIDI score. NSynth also enables audio transformations such as scaling, time stretching, and interpolation. We'll also use GANSynth, a faster approach based on GANs.
Chapter 6, Data Preparation for Training, will show why training our own models is crucial, since it allows us to generate music in a specific style or with specific structures and instruments. Building and preparing a dataset is the first step before training our own model. To do that, we'll first look at existing datasets and APIs that can help us find meaningful data. Then, we'll build two MIDI datasets for specific styles—dance and jazz. Finally, we'll prepare the MIDI files for training using data transformations and pipelines.
Chapter 7, Training Magenta Models, will show how to tune hyperparameters, such as batch size, learning rate, and network size, to optimize network performance and training time. We'll also show common training problems such as overfitting and models not converging. Once a model's training is complete, we'll show how to use the trained model to generate new sequences. Finally, we'll show how to use Google Cloud Platform to train models faster in the cloud.
Chapter 8, Magenta in the Browser with Magenta.js, will show a JavaScript implementation of Magenta that has gained popularity for its ease of use, since it runs in the browser and can be shared as a web page. We'll introduce TensorFlow.js, the technology Magenta.js is built upon, and show which models are available in Magenta.js, including how to convert our previously trained models. Then, we'll create small web applications using GANSynth and MusicVAE for sampling audio and sequences, respectively. Finally, we'll see how Magenta.js can interact with other applications, using the Web MIDI API and Node.js.
Chapter 9, Making Magenta Interact with Music Applications, will show how Magenta fits in a broader picture by showing how to make it interact with other music applications such as DAWs and synthesizers. We'll explain how to send MIDI sequences from Magenta to FluidSynth and DAWs using the MIDI interface. By doing so, we'll learn how to handle MIDI ports on all platforms and how to loop MIDI sequences in Magenta. We'll show how to synchronize multiple applications using MIDI clocks and transport information. Finally, we'll cover Magenta Studio, a standalone packaging of Magenta based on Magenta.js that can also integrate into Ableton Live as a plugin.
This book doesn't require any specific knowledge about music or machine learning to enjoy, as we'll be covering all the technical aspects regarding those two subjects throughout the book. However, we do assume that you have some programming knowledge using Python. The code we provide is thoroughly commented and explained, though, which makes it easy for newcomers to use and understand.
The provided code and content work on all platforms, including Linux, macOS, and Windows. We'll be setting up the development environment as we go along, so you don't need any specific setup before we start. If you are already using an Integrated Development Environment (IDE) and a DAW, you'll be able to use them during the course of this book.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the Support tab.
3. Click on Code Downloads.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Music-Generation-with-Magenta. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781838824419_ColorImages.pdf.
Visit the following link to check out videos of the code being run: http://bit.ly/2uHplI4
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
This section consists of an introduction to artwork generation and the use of machine learning in the field, with a comprehensive overview of Magenta and TensorFlow. We'll go through the different models used in music generation and explain why those models are important.
This section contains the following chapter:
Chapter 1, Introduction to Magenta and Generative Art
In this chapter, you'll learn the basics of generative music and what already exists. You'll learn about the new techniques of artwork generation, such as machine learning, and how those techniques can be applied to produce music and art. Google's Magenta open source research platform will be introduced, together with Google's open source machine learning platform TensorFlow, followed by an overview of their different parts and the installation of the required software for this book. We'll finish the installation by generating a simple MIDI file on the command line.
The following topics will be covered in this chapter:
Overview of generative artwork
New techniques with machine learning
Magenta and TensorFlow in music generation
Installing Magenta
Installing the music software and synthesizers
Installing the code editing software
Generating a basic MIDI file
In this chapter, we'll use the following tools:
- Python, Conda, and pip, to install and execute the Magenta environment
- Magenta, to test our setup by performing music generation
- Magenta GPU (optional), CUDA drivers, and cuDNN drivers, to make Magenta run on the GPU
- FluidSynth, to listen to the generated music sample using a software synthesizer
- Other optional software we might use throughout this book, such as Audacity for audio editing, MuseScore for sheet music editing, and Jupyter Notebook for code editing.
It is recommended that you follow this book's source code when you read the chapters in this book. The source code also provides useful scripts and tips. Follow these steps to check out the code in your user directory (you can use another location if you want):
1. First, you need to install Git, which can be installed on any platform by downloading and executing the installer at git-scm.com/downloads. Then, follow the prompts and make sure you add the program to your PATH so that it is available on the command line.
2. Then, clone the source code repository by opening a new Terminal and executing the following command:
> git clone https://github.com/PacktPublishing/hands-on-music-generation-with-magenta
> cd hands-on-music-generation-with-magenta
Each chapter has its own folder; Chapter01, Chapter02, and so on. For example, the code for this chapter is located at https://github.com/PacktPublishing/hands-on-music-generation-with-magenta/tree/master/Chapter01. The examples and code snippets will be located in this chapter's folder. For this chapter, you should change into its folder with cd Chapter01 before you start.
Check out the following video to see the Code in Action: http://bit.ly/2O847tW
The term generative art was coined with the advent of the computer, and since the very beginning of computer science, artists and scientists have used technology as a tool to produce art. Interestingly, generative art predates computers, because generative systems can be derived by hand.
In this section, we'll provide an overview of generative music by showing you interesting examples from art history going back to the 18th century. This will help you understand the different types of generative music by looking at specific examples and prepare the groundwork for later chapters.
There are many examples of generative art throughout history. A popular one dates back to the 18th century, when a game called Musikalisches Würfelspiel (German for musical dice game) grew popular in Europe. The concept of the game was attributed to Mozart by Nikolaus Simrock in 1792, though it was never confirmed to be his creation.

The players of the game throw dice and, from the result, select one of 272 predefined musical measures. Throwing the dice over and over again allows the players to compose a full minuet (the musical genre generated by the game) that respects the rules of the genre, because the measures were composed in such a way that any resulting arrangement sounds pleasant.

In the following table and the image that follows, a small part of a musical dice game can be seen. In the table, the y-axis represents the dice throw outcome, while the x-axis represents the measure of the score you are currently generating. The players will throw two dice 16 times:

- On the first throw of the two dice, we read the first column. A total of two will output measure 96 (first row), a total of three will output measure 32 (second row), and so on.
- On the second throw of the two dice, we read the second column. A total of two will output measure 22 (first row), a total of three will output measure 6 (second row), and so on.
After 16 throws, the game will have output 16 measures for the index:
| Dice total | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | 96 | 22 | 141 | 41 | 105 | 122 | 11 | 30 | 70 | 121 | 26 | 9 | 112 | 49 | 109 | 14 |
| 3 | 32 | 6 | 128 | 63 | 146 | 46 | 134 | 81 | 117 | 39 | 126 | 56 | 174 | 18 | 116 | 83 |
| 4 | 69 | 95 | 158 | 13 | 153 | 55 | 110 | 24 | 66 | 139 | 15 | 132 | 73 | 58 | 145 | 79 |
| 5 | 40 | 17 | 113 | 85 | 161 | 2 | 159 | 100 | 90 | 176 | 7 | 34 | 67 | 160 | 52 | 170 |
| 6 | 148 | 74 | 163 | 45 | 80 | 97 | 36 | 107 | 25 | 143 | 64 | 125 | 76 | 136 | 1 | 93 |
| 7 | 104 | 157 | 27 | 167 | 154 | 68 | 118 | 91 | 138 | 71 | 150 | 29 | 101 | 162 | 23 | 151 |
| 8 | 152 | 60 | 171 | 53 | 99 | 133 | 21 | 127 | 16 | 155 | 57 | 175 | 43 | 168 | 89 | 172 |
| 9 | 119 | 84 | 114 | 50 | 140 | 86 | 169 | 94 | 120 | 88 | 48 | 166 | 51 | 115 | 72 | 111 |
| 10 | 98 | 142 | 42 | 156 | 75 | 129 | 62 | 123 | 65 | 77 | 19 | 82 | 137 | 38 | 149 | 8 |
| 11 | 3 | 87 | 165 | 61 | 135 | 47 | 147 | 33 | 102 | 4 | 31 | 164 | 144 | 59 | 173 | 78 |
| 12 | 54 | 130 | 10 | 103 | 28 | 37 | 106 | 5 | 35 | 20 | 108 | 92 | 12 | 124 | 44 | 131 |
The preceding table shows a small part of the whole score, with each measure annotated with an index. For each of the generated 16 indexes, we take the corresponding measure in order, which constitutes our minuet (the minuet is the style that's generated by this game – basically, it's a music score with specific rules).
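To make the mechanics concrete, here is a minimal sketch in Python that simulates the dice game; it is not part of the book's code bundle, and the table is truncated to the first three columns of the full 16-column table above purely to keep the example short.

```python
import random

# Measure numbers for dice totals 2-12 (rows), truncated here to the first
# three throws (columns) of the full 16-column table shown above.
measure_table = {
    2:  [96, 22, 141],
    3:  [32, 6, 128],
    4:  [69, 95, 158],
    5:  [40, 17, 113],
    6:  [148, 74, 163],
    7:  [104, 157, 27],
    8:  [152, 60, 171],
    9:  [119, 84, 114],
    10: [98, 142, 42],
    11: [3, 87, 165],
    12: [54, 130, 10],
}

def throw_two_dice() -> int:
    """Sum of two six-sided dice, giving a total between 2 and 12."""
    return random.randint(1, 6) + random.randint(1, 6)

# One throw per column: pick the measure in the row given by the dice total.
minuet = [measure_table[throw_two_dice()][column]
          for column in range(len(measure_table[2]))]
print("Measures to play, in order:", minuet)
```

Re-running the script produces a different, but equally valid, sequence of measures each time, which is exactly the chance-based generation described next.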
There are different types of generative properties:
- Chance or randomness, which the dice game is a good example of, where the outcome of the generated art is partially or totally defined by chance. Interestingly, adding randomness to a process in art is often seen as humanizing the process, since an underlying rigid algorithm might generate something that sounds artificial.
- Algorithmic generation (or rule-based generation), where the rules of the generation define its outcome. Good examples of such generation include cellular automata, such as the popular Conway's Game of Life, a game where a grid of cells changes at each iteration according to predefined rules: each cell might be on or off, and the neighboring cells are updated as a function of the grid's state and rules. The result of such generation is purely deterministic; it has no randomness or probability involved.
- Stochastic-based generation, where sequences are derived from the probabilities of their elements. Examples of this include Markov chains, a stochastic model in which the probability of each element of a sequence depends only on the present state of the system. Another good example of stochastic-based generation is machine learning generation, which we'll be looking at throughout this book.
We will use a simple definition of generative art for this book:
By now, you should understand that we don't actually need a computer to generate art since the rules of a system can be derived by hand. But using a computer makes it possible to define complex rules and handle tons of data, as we'll see in the following chapters.
The first instance of generative art by computer dates back to 1957, when Markov chains were used to generate a score on an electronic computer, the ILLIAC I, by composers Lejaren Hiller and Leonard Isaacson. Their paper, Musical Composition with a High-Speed Digital Computer, describes the techniques that were used in composing the music. The composition, titled Illiac Suite, consists of four movements, each exploring a particular technique of music generation, from rule-based generation of cantus firmi to stochastic generation with Markov chains.
Many famous examples of generative composition have followed since, such as Xenakis's Atrées in 1962, which explored the idea of stochastic composition; Ebcioğlu's composition software, named CHORAL, which contained handcrafted rules; and David Cope's software, called EMI, which extended the concept to be able to learn from a corpus of scores.
As of today, generative music is everywhere. A lot of tools allow musicians to compose original music based on the generative techniques we described previously. A whole genre and musical community, called algorave, originated from those techniques. Stemming from the underground electronic music scene, musicians use generative algorithms and software to produce live dance music on stage, hence the name of the genre. Software such as TidalCycles and Orca allow the musician to define rules on the fly and let the system generate the music autonomously.
Looking back on those techniques, stochastic models such as Markov chains have been widely used in generative music. This stems from the fact that they are conceptually simple and easy to represent, since the model is a transition probability table, and they can be learned from only a few examples. The problem with Markov models is that representing a long-term temporal structure is hard, since most models only consider the n previous states, where n is small, to define the resulting probability. Let's take a look at what other types of models can be used to generate music.
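Before we do, here is a minimal sketch of such a Markov model as a point of comparison: a first-order Markov chain melody generator in plain Python. The toy transition table and note names are made-up examples for illustration, not learned from any real corpus.

```python
import random

# Toy first-order transition table: for each current note, the possible next
# notes and their probabilities. In practice, this table would be learned by
# counting transitions in a corpus of scores.
transitions = {
    "C": [("D", 0.5), ("E", 0.3), ("G", 0.2)],
    "D": [("E", 0.6), ("C", 0.4)],
    "E": [("G", 0.5), ("D", 0.3), ("C", 0.2)],
    "G": [("C", 0.7), ("E", 0.3)],
}

def generate(start: str, length: int) -> list:
    """Generate a note sequence where each note depends only on the previous one."""
    sequence = [start]
    for _ in range(length - 1):
        notes, probs = zip(*transitions[sequence[-1]])
        sequence.append(random.choices(notes, weights=probs, k=1)[0])
    return sequence

print(generate("C", 16))  # for example: ['C', 'D', 'E', 'G', 'C', ...]
```

Even this toy model shows the limitation mentioned above: the next note never depends on anything older than the previous one, so no long-term structure can emerge.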
Machine learning is important for computer science because it allows complex functions to be modeled without them being explicitly written. Those models are automatically learned from examples, instead of being manually defined. This has a huge implication for arts in general since explicitly writing the rules of a painting or a musical score is inherently difficult.
In recent years, the advent of deep learning has propelled machine learning to new heights in terms of efficiency. Deep learning is especially important for our use case of music generation, since deep learning techniques don't require a preprocessing step of feature extraction, which is necessary for classical machine learning and hard to do on raw data such as images, text, and – you guessed it – audio. This makes traditional machine learning algorithms a poor fit for learning directly from musical data, which is why all the networks in this book will be deep neural networks.
In this section, we'll learn what advances in deep learning allow for music generation and introduce the concepts we'll be using throughout this book. We'll also look at the different types of musical representations for those algorithms, which is important since it will serve as the groundwork for how we handle data throughout this book.
We all know that deep learning has recently become a fast-growing domain in computer science. Not so long ago, no deep learning algorithms could outperform standard techniques. That changed in 2012 when, for the first time, a deep learning algorithm, AlexNet, won an image classification competition by using a deep neural network trained on GPUs (see the Further reading section for the AlexNet paper, one of the most influential papers published in computer vision). Neural network techniques are more than 30 years old, but their recent reemergence can be explained by the availability of massive data, efficient computing power, and technical advances.
Most importantly, a deep learning technique is general, in the sense that, as opposed to the music generation techniques we've specified previously, a machine learning system is agnostic and can learn from an arbitrary corpus of music. The same system can be used in multiple musical genres, as we'll see during this book when we train an existing model on jazz music in Chapter 6, Data Preparation for Training.
Many techniques in deep learning were discovered a long time ago but have only found meaningful usage today. The following technical advances in the field concern music generation; they are all present in Magenta and will be explained later in this book:
- Recurrent Neural Networks (RNNs) are interesting for music generation because they allow us to operate over sequences of vectors for the input and output. When using classic neural networks or convolutional networks (which are used in image classification), you are limited to a fixed-size input vector producing a fixed-size output vector, which would be very limiting for music processing but works well for certain types of image processing. The other advantage of RNNs is that they produce a new state vector at each pass by applying a function to the previous state vector, which is a powerful means of describing complex behavior and long-term state (see the sketch after this list). We'll be talking about RNNs in Chapter 2, Generating Drum Sequences with the Drums RNN.
- Long Short-Term Memory (LSTM) is an RNN with slightly different properties. It addresses the vanishing gradient problem present in RNNs, which makes it practically impossible for a plain RNN to learn long-term dependencies, even though it theoretically could. The approach of using LSTM in music generation was presented by Douglas Eck and Jürgen Schmidhuber in 2002 in a paper called Finding temporal structure in music: Blues improvisation with LSTM recurrent networks. We'll be talking about LSTM in Chapter 3, Generating Polyphonic Melodies.
- Variational Autoencoders (VAEs) are analogous to classical autoencoders, in the sense that their architecture is similar, consisting of an encoder (from the input to a hidden layer), a decoder (from the hidden layer to the output), and a loss function, with the model learning to reconstruct the original input under specific constraints. The usage of VAEs in generative models is recent but has shown interesting results. We'll be talking about VAEs in Chapter 4, Latent Space Interpolation with MusicVAE.
- Generative Adversarial Networks (GANs) are a class of machine learning systems where two neural networks compete with each other in a game: a generative network generates candidates while a discriminating network evaluates them. We'll be talking about GANs in Chapter 5, Audio Generation with NSynth and GANSynth.
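To make the idea of a recurrent state more concrete, here is a minimal sketch of a single RNN step in plain NumPy. This is not Magenta code; the vector sizes, random weights, and tanh activation are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes: a 16-dimensional input (for example, a one-hot drum event)
# and a 32-dimensional hidden state.
input_size, state_size = 16, 32
rng = np.random.default_rng(42)

# Randomly initialized weights; in a real model, these are learned during training.
W_xh = rng.normal(scale=0.1, size=(state_size, input_size))   # input-to-state weights
W_hh = rng.normal(scale=0.1, size=(state_size, state_size))   # state-to-state weights
b_h = np.zeros(state_size)                                    # state bias

def rnn_step(x, h_prev):
    """Combine the current input x with the previous state h_prev into a new state."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

# Process a toy sequence of 8 one-hot vectors, carrying the state forward each step.
h = np.zeros(state_size)
for step in range(8):
    x = np.zeros(input_size)
    x[step % input_size] = 1.0
    h = rnn_step(x, h)

print("Final state vector shape:", h.shape)  # (32,)
```

The point here is only that the same function is applied at every step while the state vector carries information forward; a real model learns the weights from data.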
Recent deep learning advances have profoundly changed not only music generation but also genre classification, audio transcription, note detection, composition, and more. We won't be talking about these subjects here, but they all share common ground: musical representation.
These systems can work with different representations:
- Symbolic representation, such as MIDI (Musical Instrument Digital Interface), describes the music using a notation containing the musical notes and their timing, but not the actual sound or timbre. Sheet music is a good example of this. A symbolic representation of music has no sound by itself; it has to be played by instruments.
- Sub-symbolic representation, such as a raw audio waveform or a spectrogram, describes the actual sound of the music.
Different processes require different representations. For example, most speech recognition and synthesis models work with spectrograms, while most of the examples we will see in this book use MIDI to generate music scores. Processes that integrate both representations are rare, but an example of this could be a score transcription process that takes an audio file and translates it into MIDI or another symbolic representation.
There are other symbolic representations than MIDI, such as MusicXML and ABC notation, but MIDI is by far the most common. The MIDI specification also doubles as a protocol, since it is used to carry note messages that can be used in real-time performance, as well as control messages.
Let's consider some parts of a MIDI message that will be useful for this book:
- Channel [0-15]: This indicates the track that the message is sent on
- Note number [0-127]: This indicates the pitch of the note
- Velocity [0-127]: This indicates the volume of the note
To represent a musical note in MIDI, you have to send two different message types with proper timing: a Note On event, followed by a Note Off event. This implicitly defines the length of the note, which is not present in the MIDI message. This is important because MIDI was defined with live performance in mind, so using two messages – one for a keypress and another for a key release – makes sense.
From a data perspective, we'll either need to convert MIDI notes into a format that has the note length encoded in it or keep the note on and note off approach, depending on what we're trying to do. For each model in Magenta, we'll see how the MIDI notes are encoded.
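As an illustration of the note on/note off pairing, the following sketch uses the third-party mido library (not part of Magenta, and assumed to be installed with pip) to write a single MIDI note to a file; the note number, velocity, and tick values are arbitrary examples.

```python
import mido

mid = mido.MidiFile(ticks_per_beat=480)   # resolution: 480 ticks per quarter note
track = mido.MidiTrack()
mid.tracks.append(track)

# A note is two messages: the key press (note_on) and, some ticks later,
# the key release (note_off). The note's length is the time between them.
track.append(mido.Message('note_on', channel=9, note=36, velocity=100, time=0))
track.append(mido.Message('note_off', channel=9, note=36, velocity=0, time=480))

mid.save('single_kick.mid')  # one quarter-note bass drum hit on the drum channel
```

Notice that the note's duration exists only implicitly, as the time elapsed between the two messages.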
The following image shows a MIDI representation of a generated drum file, shown as a plot of time and pitch. Each MIDI note is represented by a rectangle. Because of the nature of percussion data, all the notes have the same length ("note on" followed by "note off" messages), but in general, that could vary. A drum file is, by nature, polyphonic, meaning that multiple notes can be played at the same time. We'll be talking about monophony and polyphony in the upcoming chapters.
Note that the abscissa is expressed in seconds, but it is also common to express it in bars or measures. The MIDI channel is absent from this diagram:
In the case of music generation, the majority of current deep learning systems use symbolic notation. This is also the case with Magenta. There are a couple of reasons for this:
It is easier to represent the essence of music in terms of composition and harmony with symbolic data.
Processing those two types of representations by using a deep learning network is similar, so choosing between both boils down to whichever is faster and more convenient. A good example of this is that the WaveNet audio generation network also has a MIDI implementation, known as the MidiNet symbolic generation network.
We'll see that the MIDI format is not directly used by Magenta, but converted into and from NoteSequence, a Protocol Buffers (Protobuf) implementation of the musical structure that is then used by TensorFlow. This is hidden from the end user since the input and output data is always MIDI. The NoteSequence implementation is useful because it implements a data format that can be used by the models for training. For example, instead of using two messages to define a note's length, a Note in a NoteSequence has a length attribute. We'll be explaining the NoteSequence implementation as we go along.
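To give a feel for what a NoteSequence looks like, here is a minimal sketch that builds one by hand and writes it out as MIDI. Depending on your Magenta version, the Protobuf and MIDI helpers may live in magenta.music or in the standalone note_seq package, so treat the exact import paths as assumptions.

```python
from magenta.music import midi_io
from magenta.music.protobuf import music_pb2

# Build a NoteSequence directly: each note carries its pitch, velocity, and
# explicit start and end times, instead of separate note on/note off messages.
sequence = music_pb2.NoteSequence()
sequence.tempos.add(qpm=120)
sequence.notes.add(pitch=36, velocity=100, start_time=0.0, end_time=0.5, is_drum=True)
sequence.notes.add(pitch=42, velocity=80, start_time=0.5, end_time=1.0, is_drum=True)
sequence.total_time = 1.0

# Convert the Protobuf back to a standard MIDI file.
midi_io.sequence_proto_to_midi_file(sequence, 'notesequence_example.mid')
```

The explicit end_time on each note is what replaces the separate note off message mentioned above.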
An audio waveform is a graph displaying amplitude changes over time. Zoomed out, a waveform looks rather simple and smooth, but zoomed in, we can see tiny variations – it is those variations that represent the sound.
To illustrate how a waveform works, imagine a speaker cone that is at rest when the amplitude is at 0. If the amplitude moves to a negative value, the speaker moves backward a little bit, or forward in the case of a positive value. For each amplitude variation, the speaker will move, making the air move, and thus making your eardrums move.
The bigger the amplitude in the waveform, the further the speaker cone moves, and the louder the sound. This is expressed in decibels (dB), a measure of sound pressure.
The faster the movement, the higher the pitch. This is expressed in hertz (Hz).
In the following image, we can see the MIDI file from the previous section played by instruments to make a WAV recording. The instrument that's being used is a 1982 Roland TR-808 drum sample pack. You can visually match some instruments, such as the double Conga Mid hit (MIDI note 48) at around 4.5 seconds. In the upper-right corner, you can see a zoom of the waveform at a 100th of a second, showing the actual amplitude change:
In machine learning, using a raw audio waveform used to be uncommon as a data source, since the computational load is bigger than with other, transformed representations, both in terms of memory and processing. But recent advances in the field, such as WaveNet models, make it on par with other methods of representing audio, such as spectrograms, which were historically more popular for machine learning algorithms, especially for speech recognition and synthesis.
Bear in mind that training on audio is really cost-intensive because raw audio is a dense medium. Basically, a waveform is a digital recreation of a dynamic voltage over time. Simply put, a process called Pulse Code Modulation (PCM) assigns a bit value to each sample at the sampling rate you are running. The sampling rate for recording purposes is pretty standard: 44,100 Hz, chosen so that its Nyquist frequency (half the sampling rate, or 22,050 Hz) sits above the upper limit of human hearing. But you don't always need a 44,100 Hz sample rate; for example, 16,000 Hz is more than enough to cover human speech frequencies. At that frequency, each second of audio is represented by 16,000 samples.
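As a quick illustration of how dense raw audio is, the following sketch (using NumPy and SciPy, which are assumed to be installed) synthesizes one second of a 440 Hz sine wave as 16-bit PCM at a 16,000 Hz sampling rate; that single second of mono audio already amounts to 16,000 samples, or 32,000 bytes.

```python
import numpy as np
from scipy.io import wavfile

sample_rate = 16000                     # 16,000 samples per second of audio
duration = 1.0                          # seconds
frequency = 440.0                       # pitch of the tone, in Hz (A4)

# One amplitude value per sample: 16,000 values for a single second.
t = np.linspace(0.0, duration, int(sample_rate * duration), endpoint=False)
waveform = 0.5 * np.sin(2 * np.pi * frequency * t)

# Quantize to 16-bit integers (PCM) and write a WAV file.
pcm = (waveform * np.iinfo(np.int16).max).astype(np.int16)
wavfile.write('sine_440hz.wav', sample_rate, pcm)

print(f"{pcm.size} samples, {pcm.nbytes} bytes for {duration} second of mono audio")
```

Doubling the sample rate or adding a second channel doubles those numbers again, which is part of why training on raw audio is so much more expensive than training on symbolic data.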
If you want to know more about PCM, the sampling theory for audio, and the Nyquist Frequency, check out the Further reading section at the end of this chapter.