Hands-On Music Generation with Magenta

Alexandre DuBreuil

Description

Design and use machine learning models for music generation using Magenta and make them interact with existing music creation tools




Key Features



  • Learn how machine learning, deep learning, and reinforcement learning are used in music generation


  • Generate new content by manipulating the source data using Magenta utilities, and train machine learning models with it


  • Explore various Magenta projects such as Magenta Studio, MusicVAE, and NSynth



Book Description



The importance of machine learning (ML) in art is growing at a rapid pace due to recent advancements in the field, and Magenta is at the forefront of this innovation. With this book, you'll follow a hands-on approach to using ML models for music generation, learning how to integrate them into an existing music production workflow. Complete with practical examples and explanations of the theoretical background required to understand the underlying technologies, this book is the perfect starting point to begin exploring music generation.






The book will help you learn how to use the models in Magenta for generating percussion sequences, monophonic and polyphonic melodies in MIDI, and instrument sounds in raw audio. Through practical examples and in-depth explanations, you'll understand ML models such as RNNs, VAEs, and GANs. Using this knowledge, you'll create and train your own models for advanced music generation use cases, along with preparing new datasets. Finally, you'll get to grips with integrating Magenta with other technologies, such as digital audio workstations (DAWs), and using Magenta.js to distribute music generation apps in the browser.






By the end of this book, you'll be well-versed with Magenta and have developed the skills you need to use ML models for music generation in your own style.




What you will learn



  • Use RNN models in Magenta to generate MIDI percussion, and monophonic and polyphonic sequences


  • Use WaveNet and GAN models to generate instrument notes in the form of raw audio


  • Employ Variational Autoencoder models like MusicVAE and GrooVAE to sample, interpolate, and humanize existing sequences


  • Prepare and create your dataset on specific styles and instruments


  • Train your network on your personal datasets and fix problems when training networks


  • Apply MIDI to synchronize Magenta with existing music production tools like DAWs



Who this book is for



This book is for technically inclined artists and musically inclined computer scientists. Readers who want to get hands-on with building generative music applications that use deep learning will also find this book useful. Although prior musical or technical competence is not required, basic knowledge of the Python programming language is assumed.




Hands-On Music Generation with Magenta

 

 

Explore the role of deep learning in music generation and assisted music composition

 

 

 

 

 

 

 

Alexandre DuBreuil

 

 

 

 

 

 

 

 

 

 

 

 

 

BIRMINGHAM - MUMBAI

Hands-On Music Generation with Magenta

Copyright © 2020 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

 

Commissioning Editor: Mrinmayee Kawalkar
Acquisition Editor: Ali Abidi
Content Development Editor: Nazia Shaikh
Senior Editor: Ayaan Hoda
Technical Editor: Joseph Sunil
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Production Designer: Deepika Naik

First published: January 2020

Production reference: 1300120

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-83882-441-9

www.packt.com

 

Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Fully searchable for easy access to vital information

Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

Contributors

About the author

Alexandre DuBreuil is a software engineer and generative music artist. Through collaborations with bands and artists, he has worked on many generative art projects, such as generative video systems for music bands in concerts that create visuals based on the underlying musical structure, a generative drawing software that creates new content based on a previous artist's work, and generative music exhibits in which the generation is based on real-time events and data. Machine learning has a central role in his music generation projects, and Alexandre has been using Magenta since its release for inspiration, music production, and as the cornerstone for making autonomous music generation systems that create endless soundscapes.

 

 

About the reviewer

Gogul Ilango has a bachelor's degree in electronics and communication engineering from Thiagarajar College of Engineering, Madurai, and a master's degree in VLSI design and embedded systems from Anna University, Chennai, where he was awarded the University Gold Medal for academic performance. He has published four research papers in top conferences and journals related to artificial intelligence. His passion for music production and deep learning led him to learn about and contribute to Google's Magenta community, where he created an interactive web application called DeepDrum as well as DeepArp using Magenta.js, available in Magenta's community contribution demonstrations. He is a lifelong learner, hardware engineer, programmer, and music producer.

 

 

 

 

 

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Hands-On Music Generation with Magenta

About Packt

Why subscribe?

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Code in Action

Get in touch

Reviews

Section 1: Introduction to Artwork Generation

Introduction to Magenta and Generative Art

Technical requirements

Overview of generative art

Pen and paper generative music

Computerized generative music

New techniques with machine learning

Advances in deep learning

Representation in music processes

Representing music with MIDI

Representing music as a waveform

Representing music with a spectrogram

Google's Magenta and TensorFlow in music generation

Creating a music generation system

Looking at Magenta's content

Differentiating models, configurations, and pre-trained models

Generating and stylizing images

Generating audio

Generating, interpolating, and transforming score

Installing Magenta and Magenta for GPU

Choosing the right versions

Creating a Python environment with Conda

Installing prerequisite software

Installing Magenta

Installing Magenta for GPU (optional)

Installing the music software and synthesizers

Installing the FluidSynth software synthesizer

Installing SoundFont

Installing FluidSynth

Testing your installation

Using a hardware synthesizer (optional)

Installing Audacity as a digital audio editor (optional)

Installing MuseScore for sheet music (optional)

Installing a Digital Audio Workstation (optional)

Installing the code editing software

Installing Jupyter Notebook (optional)

Installing and configuring an IDE (optional)

Generating a basic MIDI file

Summary

Questions

Further reading

Section 2: Music Generation with Machine Learning

Generating Drum Sequences with the Drums RNN

Technical requirements

The significance of RNNs in music generation

Operating on a sequence of vectors

Remember the past to better predict the future

Using the right terminology for RNNs

Using the Drums RNN on the command line

Magenta's command-line utilities

Generating a simple drum sequence

Understanding the model's parameters

Changing the output size

Changing the tempo

Changing the model type

Priming the model with Led Zeppelin

Configuring the generation algorithm

Other Magenta and TensorFlow flags

Understanding the generation algorithm

Generating the sequence branches and steps

Making sense of the randomness

Using the Drums RNN in Python

Generating a drum sequence using Python

Packaging checkpoints as bundle files

Encoding MIDI using Protobuf in NoteSequence

Mapping MIDI notes to the real world

Encoding percussion events as classes

Sending MIDI files to other applications

Summary

Questions

Further reading

Generating Polyphonic Melodies

Technical requirements

LSTM for long-term dependencies

Looking at LSTM memory cells

Exploring alternative networks

Generating melodies with the Melody RNN

Generating a song for Für Elise

Understanding the lookback configuration

Understanding the attention mask

Losing track of time

Generating polyphony with the Polyphony RNN and Performance RNN

Differentiating conditioning and injection

Explaining the polyphonic encoding

Performance music with the Performance RNN

Generating expressive timing like a human

Summary

Questions

Further reading

Latent Space Interpolation with MusicVAE

Technical requirements

Continuous latent space in VAEs

The latent space in standard AEs

Using VAEs in generating music

Score transformation with MusicVAE and GrooVAE

Initializing the model

Sampling the latent space

Writing the sampling code

Refining the loss function with KL divergence

Sampling from the same area of the latent space

Sampling from the command line

Interpolating between two samples

Getting the sequence length right

Writing the interpolation code

Interpolating from the command line

Humanizing the sequence

Writing the humanizing code

Humanizing from the command line

More interpolation on melodies

Sampling the whole band

An overview of other pre-trained models

Understanding TensorFlow code

Building the VAE graph

Building an encoder with BidirectionalLstmEncoder

Building a decoder with CategoricalLstmDecoder

Building the hidden layer

Looking at the sample method

Looking at the interpolate method

Looking at the groove method

Summary

Questions

Further reading

Audio Generation with NSynth and GANSynth

Technical requirements

Learning about WaveNet and temporal structures for music

Looking at NSynth and WaveNet autoencoders

Visualizing audio using a constant-Q transform spectrogram

The NSynth dataset

Neural audio synthesis with NSynth

Choosing the WaveNet model

Encoding the WAV files

Visualizing the encodings

Saving the encodings for later use

Mixing encodings together by moving in the latent space

Synthesizing the mixed encodings to WAV

Putting it all together

Preparing the audio clips

Generating new instruments

Visualizing and listening to our results

Using NSynth generated samples as instrument notes

Using the command line

More of NSynth

Using GANSynth as a generative instrument

Choosing the acoustic model

Getting the notes information

Gradually sampling from the latent space

Generating random instruments

Getting the latent vectors

Generating the samples from the encoding

Putting it all together

Using the command line

Summary

Questions

Further reading

Section 3: Training, Learning, and Generating a Specific Style

Data Preparation for Training

Technical requirements

Looking at existing datasets

Looking at symbolic representations

Building a dataset from the ground up

Using the LMD for MIDI and audio files

Using the MSD for metadata information

Using the MAESTRO dataset for performance music

Using the Groove MIDI Dataset for groovy drums

Using the Bach Doodle Dataset

Using the NSynth dataset for audio content

Using APIs to enrich existing data

Looking at other data sources

Building a dance music dataset

Threading the execution to handle large datasets faster

Extracting drum instruments from a MIDI file

Detecting specific musical structures

Analyzing the beats of our MIDI files

Writing the process method

Calling the process method using threads

Plotting the results using Matplotlib

Processing a sample of the dataset

Building a jazz dataset

The LMD extraction tools

Fetching a song's genre using the Last.fm API

Reading information from the MSD

Using top tags to find genres

Finding instrument classes using MIDI

Extracting jazz, drums, and piano tracks

Extracting and merging jazz drums

Extracting and splitting jazz pianos

Preparing the data using pipelines

Refining the dataset manually

Looking at the Melody RNN pipeline

Launching the data preparation stage on our dataset

Understanding a pipeline execution

Writing your own pipeline

Looking at MusicVAE data conversion

Summary

Questions

Further reading

Training Magenta Models

Technical requirements

Choosing the model and configuration

Comparing music generation use cases

Creating a new configuration

Training and tuning a model

Organizing datasets and training data

Training on a CPU or a GPU

Training RNN models

Creating the dataset and launching the training

Launching the evaluation

Looking at TensorBoard

Explaining underfitting and overfitting

Fixing underfitting

Fixing overfitting

Defining network size and hyperparameters

Determining the batch size

Fixing out of memory errors

Fixing a wrong network size

Fixing a model not converging

Fixing not enough training data

Configuring attention and other hyperparameters

Generating sequences from a trained model

Using a specific checkpoint to implement early stops

Packaging and distributing the result using bundles

Training MusicVAE

Splitting the dataset into evaluation and training sets

Launching the training and evaluation

Distributing a trained model

Training other models

Using Google Cloud Platform

Creating and configuring an account

Preparing an SSH key (optional)

Creating a VM instance from a TensorFlow image

Initializing the VM

Installing the NVIDIA CUDA drivers

Installing Magenta GPU

Launching the training

Summary

Questions

Further reading

Section 4: Making Your Models Interact with Other Applications

Magenta in the Browser with Magenta.js

Technical requirements

Introducing Magenta.js and TensorFlow.js

Introducing TensorFlow.js for machine learning in the browser

Introducing Magenta.js for music generation in the browser

Converting trained models for Magenta.js

Downloading pre-trained models locally

Introducing Tone.js for sound synthesis in the browser

Creating a Magenta.js web application

Generating instruments in the browser using GANSynth

Writing the page structure

Sampling audio using GANSynth

Launching the web application

Generating a trio using MusicVAE

Using a SoundFont for more realistic-sounding instruments

Playing generated instruments in a trio

Using the Web Workers API to offload computations from the UI thread

Using other Magenta.js models

Making Magenta.js interact with other apps

Using the Web MIDI API

Running Magenta.js server side with Node.js

Summary

Questions

Further reading

Making Magenta Interact with Music Applications

Technical requirements

Sending MIDI to a DAW or synthesizer

Introducing some DAWs

Looking at MIDI ports using Mido

Creating a virtual MIDI port on macOS and Linux

Creating a virtual MIDI port on Windows using loopMIDI

Adding a virtual MIDI port on macOS

Sending generated MIDI to FluidSynth

Sending generated MIDI to a DAW

Using NSynth generated samples as instruments

Looping the generated MIDI

Using the MIDI player to loop a sequence

Synchronizing Magenta with a DAW

Sending MIDI clock and transport

Using MIDI control message

Using Ableton Link to sync devices

Sending MIDI to a hardware synthesizer

Using Magenta as a standalone application with Magenta Studio

Looking at Magenta Studio's content

Integrating Magenta Studio in Ableton Live

Summary

Questions

Further Reading

Assessments

Chapter 1: Introduction to Magenta and Generative Art

Chapter 2: Generating Drum Sequences with the Drums RNN

Chapter 3: Generating Polyphonic Melodies

Chapter 4: Latent Space Interpolation with MusicVAE

Chapter 5: Audio Generation with NSynth and GANSynth

Chapter 6: Data Preparation for Training

Chapter 7: Training Magenta Models

Chapter 8: Magenta in the Browser with Magenta.js

Chapter 9: Making Magenta Interact with Music Applications

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

The place of machine learning in art is becoming more and more strongly established because of recent advancements in the field. Magenta is at the forefront of that innovation. This book provides a hands-on approach to machine learning models for music generation and demonstrates how to integrate them into an existing music production workflow. Complete with practical examples and explanations of the theoretical background required to understand the underlying technologies, this book is the perfect starting point to begin exploring music generation.

In Hands-On Music Generation with Magenta, you'll learn how to use models in Magenta to generate percussion sequences, monophonic and polyphonic melodies in MIDI, and instrument sounds in raw audio. We'll be seeing plenty of practical examples and in-depth explanations of machine learning models, such as Recurrent Neural Networks (RNNs), Variational Autoencoders (VAEs), and Generative Adversarial Networks (GANs). Leveraging that knowledge, we'll be creating and training our own models for advanced music generation use cases, and we'll be tackling the preparation of new datasets. Finally, we'll be looking at integrating Magenta with other technologies, such as Digital Audio Workstations (DAWs), and using Magenta.js to distribute music generation applications in the browser.

By the end of this book, you'll be proficient in everything Magenta has to offer and equipped with sufficient knowledge to tackle music generation in your own style.

Who this book is for

This book will appeal to both technically inclined artists and musically inclined computer scientists. It is directed to any reader who wants to gain hands-on knowledge about building generative music applications that use deep learning. It doesn't assume any musical or technical competence from you, apart from basic knowledge of the Python programming language.

What this book covers

Chapter 1, Introduction to Magenta and Generative Art, will show you the basics of generative music and what already exists. You'll learn about the new techniques of artwork generation, such as machine learning, and how those techniques can be applied to produce music and art. Google's Magenta open source research platform will be introduced, along with Google's open source machine learning platform, TensorFlow, followed by an overview of Magenta's different parts and the installation of the software required for this book. We'll finish the installation by generating a simple MIDI file on the command line.

Chapter 2, Generating Drum Sequences with the Drums RNN, will show you what many consider the foundation of music—percussion. We'll show the importance of RNNs for music generation. You'll then learn how to use the Drums RNN model using a pre-trained drum kit model, by calling it in the command-line window and directly in Python, to generate drum sequences. We'll introduce the different model parameters, including the model's MIDI encoding, and show how to interpret the output of the model.

Chapter 3, Generating Polyphonic Melodies, will show the importance of Long Short-Term Memory (LSTM) networks in generating longer sequences. We'll see how to use a monophonic Magenta model, the Melody RNN—an LSTM network with a lookback and attention configuration. You'll also learn to use two polyphonic models, the Polyphony RNN and Performance RNN, both LSTM networks using a specific encoding, with the latter having support for note velocity and expressive timing.

Chapter 4, Latent Space Interpolation with MusicVAE, will show the importance of the continuous latent space of VAEs in music generation compared to standard autoencoders (AEs). We'll use the MusicVAE model, a hierarchical recurrent VAE, from Magenta to sample sequences and then interpolate between them, effectively morphing smoothly from one to another. We'll then see how to add groove, or humanization, to an existing sequence, using the GrooVAE model. We'll finish by looking at the TensorFlow code used to build the VAE model.

Chapter 5, Audio Generation with NSynth and GANSynth, will cover audio generation. We'll first provide an overview of WaveNet, an existing model for audio generation that is especially efficient in text-to-speech applications. In Magenta, we'll use NSynth, a WaveNet autoencoder model, to generate small audio clips that can serve as instruments for a backing MIDI score. NSynth also enables audio transformations such as scaling, time stretching, and interpolation. We'll also use GANSynth, a faster approach based on GANs.

Chapter 6, Data Preparation for Training, will show how training our own models is crucial, since it allows us to generate music in a specific style and to generate specific structures or instruments. Building and preparing a dataset is the first step before training our own model. To do that, we'll first look at existing datasets and APIs that help us find meaningful data. Then, we'll build two MIDI datasets for specific styles: dance and jazz. Finally, we'll prepare the MIDI files for training using data transformations and pipelines.

Chapter 7, Training Magenta Models, will show how to tune hyperparameters, such as batch size, learning rate, and network size, to optimize network performance and training time. We'll also show common training problems, such as overfitting and models not converging. Once a model's training is complete, we'll show how to use the trained model to generate new sequences. Finally, we'll show how to use Google Cloud Platform to train models faster in the cloud.

Chapter 8, Magenta in the Browser with Magenta.js, will show a JavaScript implementation of Magenta that has gained popularity for its ease of use, since it runs in the browser and can be shared as a web page. We'll introduce TensorFlow.js, the technology Magenta.js is built upon, and show which models are available in Magenta.js, including how to convert our previously trained models. Then, we'll create small web applications using GANSynth and MusicVAE to sample audio and sequences, respectively. Finally, we'll see how Magenta.js can interact with other applications, using the Web MIDI API and Node.js.

Chapter 9, Making Magenta Interact with Music Applications, will show how Magenta fits in a broader picture by showing how to make it interact with other music applications such as DAWs and synthesizers. We'll explain how to send MIDI sequences from Magenta to FluidSynth and DAWs using the MIDI interface. By doing so, we'll learn how to handle MIDI ports on all platforms and how to loop MIDI sequences in Magenta. We'll show how to synchronize multiple applications using MIDI clocks and transport information. Finally, we'll cover Magenta Studio, a standalone packaging of Magenta based on Magenta.js that can also integrate into Ableton Live as a plugin.

To get the most out of this book

This book doesn't require any specific knowledge about music or machine learning, as we'll be covering all the technical aspects of those two subjects throughout the book. However, we do assume that you have some programming knowledge of Python. The code we provide is thoroughly commented and explained, which makes it easy for newcomers to use and understand.

The provided code and content work on all platforms, including Linux, macOS, and Windows. We'll be setting up the development environment as we go along, so you don't need any specific setup before we start. If you are already using an Integrated Development Environment (IDE) and a DAW, you'll be able to use them during the course of this book.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packt.com.

2. Select the Support tab.

3. Click on Code Downloads.

4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Music-Generation-with-Magenta. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781838824419_ColorImages.pdf.

Code in Action

Visit the following link to check out videos of the code being run: http://bit.ly/2uHplI4

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Section 1: Introduction to Artwork Generation

This section consists of an introduction to artwork generation and the use of machine learning in the field, with a comprehensive overview of Magenta and TensorFlow. We'll go through the different models used in music generation and explain why those models are important.

This section contains the following chapter:

Chapter 1, Introduction to Magenta and Generative Art

Introduction to Magenta and Generative Art

In this chapter, you'll learn the basics of generative music and what already exists. You'll learn about the new techniques of artwork generation, such as machine learning, and how those techniques can be applied to produce music and art. Google's Magenta open source research platform will be introduced, along with Google's open source machine learning platform, TensorFlow, followed by an overview of Magenta's different parts and the installation of the software required for this book. We'll finish the installation by generating a simple MIDI file on the command line.

The following topics will be covered in this chapter:

Overview of generative artwork

New techniques with machine learning

Magenta and TensorFlow in music generation

Installing Magenta

Installing the music software and synthesizers

Installing the code editing software

Generating a basic MIDI file

Technical requirements

In this chapter, we'll use the following tools:

Python, Conda, and pip, to install and execute the Magenta environment

Magenta, to test our setup by performing music generation

Magenta GPU (optional), CUDA drivers, and cuDNN drivers, to make Magenta run on the GPU

FluidSynth, to listen to the generated music sample using a software synthesizer

Other optional software we might use throughout this book, such as Audacity for audio editing, MuseScore for sheet music editing, and Jupyter Notebook for code editing.

It is recommended that you follow along with this book's source code as you read the chapters. The source code also provides useful scripts and tips. Follow these steps to check out the code in your user directory (you can use another location if you want):

First, you need to install Git, which can be installed on any platform by downloading and executing the installer at git-scm.com/downloads. Then, follow the prompts and make sure you add the program to your PATH so that it is available on the command line.

Then, clone the source code repository by opening a new Terminal and executing the following command:

> git clone https://github.com/PacktPublishing/hands-on-music-generation-with-magenta

> cd hands-on-music-generation-with-magenta

Each chapter has its own folder: Chapter01, Chapter02, and so on. For example, the code for this chapter is located at https://github.com/PacktPublishing/hands-on-music-generation-with-magenta/tree/master/Chapter01. The examples and code snippets will be located in this chapter's folder. For this chapter, you should change into the chapter's folder with cd Chapter01 before you start.

We won't be using many Git commands apart from git clone, which duplicates a code repository onto your machine, but if you are unfamiliar with Git and want to learn more, a good place to start is the excellent Git Book (git-scm.com/book/en/v2), which is available in multiple languages.

Check out the following video to see the Code in Action: http://bit.ly/2O847tW

Overview of generative art

The term generative art was coined with the advent of the computer, and since the very beginning of computer science, artists and scientists have used technology as a tool to produce art. Interestingly, generative art predates computers, because generative systems can also be executed by hand.

In this section, we'll provide an overview of generative music by showing you interesting examples from art history going back to the 18th century. This will help you understand the different types of generative music by looking at specific examples and prepare the groundwork for later chapters.

Pen and paper generative music

There are many examples of generative art in the history of mankind. A popular one dates back to the 18th century, when a game called Musikalisches Würfelspiel (German for musical dice game) grew popular in Europe. The concept of the game was attributed to Mozart by Nikolaus Simrock in 1792, though it was never confirmed to be his creation.

The players of the game throw dice and, from the result, select one of 272 predefined musical measures. Throwing the dice over and over again allows the players to compose a full minuet (the musical genre that is generated by the game) that respects the rules of the genre, because the measures were composed in such a way that the possible arrangements sound pleasant.

In the following table and the image that follows, a small part of a musical dice game can be seen. In the table, the y-axis represents the dice throw outcome, while the x-axis represents the measure of the score you are currently generating. The players throw two dice 16 times:

On the first throw of the two dice, we read the first column. A total of two will output measure 96 (first row), a total of three will output measure 32 (second row), and so on.

On the second throw of the two dice, we read the second column. A total of two will output measure 22 (first row), a total of three will output measure 6 (second row), and so on.

After 16 throws, the game will have output 16 measure indexes:

        1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
 2     96   22  141   41  105  122   11   30   70  121   26    9  112   49  109   14
 3     32    6  128   63  146   46  134   81  117   39  126   56  174   18  116   83
 4     69   95  158   13  153   55  110   24   66  139   15  132   73   58  145   79
 5     40   17  113   85  161    2  159  100   90  176    7   34   67  160   52  170
 6    148   74  163   45   80   97   36  107   25  143   64  125   76  136    1   93
 7    104  157   27  167  154   68  118   91  138   71  150   29  101  162   23  151
 8    152   60  171   53   99  133   21  127   16  155   57  175   43  168   89  172
 9    119   84  114   50  140   86  169   94  120   88   48  166   51  115   72  111
10     98  142   42  156   75  129   62  123   65   77   19   82  137   38  149    8
11      3   87  165   61  135   47  147   33  102    4   31  164  144   59  173   78
12     54  130   10  103   28   37  106    5   35   20  108   92   12  124   44  131

The preceding table shows a small part of the whole score, with each measure annotated with an index. For each of the 16 generated indexes, we take the corresponding measure in order, which constitutes our minuet (the minuet is the style that's generated by this game; basically, it's a music score with specific rules).
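To make the procedure concrete, here is a minimal Python sketch of the lookup, assuming only the first two columns of the table above; the full game would use all 16 columns in exactly the same way:

import random

# Excerpt of the measure table above: keys are dice totals (2 to 12), values
# are the measure indices for the first two throws (columns 1 and 2).
MEASURE_TABLE = {
    2: [96, 22], 3: [32, 6], 4: [69, 95], 5: [40, 17], 6: [148, 74],
    7: [104, 157], 8: [152, 60], 9: [119, 84], 10: [98, 142],
    11: [3, 87], 12: [54, 130],
}

def throw_two_dice():
    # Two six-sided dice give a total between 2 and 12.
    return random.randint(1, 6) + random.randint(1, 6)

# One throw per column; with the full 16-column table, this would be range(16).
minuet = [MEASURE_TABLE[throw_two_dice()][column] for column in range(2)]
print(minuet)  # for example, [69, 157]: play measure 69, then measure 157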

There are different types of generative properties:

Chance or randomness, which the dice game is a good example of, where the outcome of the generated art is partially or totally defined by chance. Interestingly, adding randomness to a process in art is often seen as humanizing the process, since an underlying rigid algorithm might generate something that sounds artificial.

Algorithmic generation (or rule-based generation), where the rules of the generation define its outcome. Good examples of such generation include cellular automata, such as the popular Conway's Game of Life, a game where a grid of cells changes at each iteration according to predefined rules: each cell might be on or off, and the neighboring cells are updated as a function of the grid's state and rules. The result of such generation is purely deterministic; it involves no randomness or probability.

Stochastic-based generation, where sequences are derived from the probabilities of their elements. Examples of this include Markov chains, stochastic models in which the probability of each element of a sequence depends only on the present state of the system. Another good example of stochastic-based generation is machine learning generation, which we'll be looking at throughout this book.

We will use a simple definition of generative art for this book:

"Generative art is an artwork partially or completely created by an autonomous system".

By now, you should understand that we don't actually need a computer to generate art since the rules of a system can be derived by hand. But using a computer makes it possible to define complex rules and handle tons of data, as we'll see in the following chapters.

Computerized generative music

The first instance of generative art created by a computer dates back to 1957, when Markov chains were used to generate a score on an electronic computer, the ILLIAC I, by the composers Lejaren Hiller and Leonard Isaacson. Their paper, Musical Composition with a High-Speed Digital Computer, describes the techniques that were used in composing the music. The composition, titled Illiac Suite, consists of four movements, each exploring a particular technique of music generation, from the rule-based generation of cantus firmi to stochastic generation with Markov chains.

Many famous examples of generative composition have followed since, such as Xenakis's Atrées in 1962, which explored the idea of stochastic composition; Ebcioğlu's composition software named CHORAL, which contained handcrafted rules; and David Cope's software called EMI, which extended the concept to be able to learn from a corpus of scores.

As of today, generative music is everywhere. A lot of tools allow musicians to compose original music based on the generative techniques we described previously. A whole genre and musical community, called algorave, originated from those techniques. Stemming from the underground electronic music scene, musicians use generative algorithms and software to produce live dance music on stage, hence the name of the genre. Software such as TidalCycles and Orca allow the musician to define rules on the fly and let the system generate the music autonomously.

Looking back on those techniques, stochastic models such as Markov chains have been widely used in generative music. This stems from the fact that they are conceptually simple and easy to represent, since the model is a transition probability table, and that they can learn from a few examples. The problem with Markov models is that representing a long-term temporal structure is hard, since most models only consider the n previous states, where n is small, to define the resulting probability. Let's take a look at what other types of models can be used to generate music.
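Before moving on, here is a minimal sketch of such a first-order Markov chain in Python; the note names and the toy training melody are made up for the example, and a real system would learn the transition table from a corpus of scores:

import random
from collections import defaultdict

# Toy training melody (made-up note names).
melody = ["C", "E", "G", "E", "C", "G", "A", "G", "E", "C"]

# Count transitions: for each note, remember which notes follow it and how often.
transitions = defaultdict(list)
for current_note, next_note in zip(melody, melody[1:]):
    transitions[current_note].append(next_note)

# Generate a new sequence: the next note depends only on the current one (n = 1).
note = random.choice(melody)
generated = [note]
for _ in range(15):
    followers = transitions[note]
    note = random.choice(followers) if followers else random.choice(melody)
    generated.append(note)
print(" ".join(generated))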

In a 2012 paper titled Ten Questions Concerning Generative Computer Art, the author talks about the possibility of machine creation, the formalization of human aesthetics, and randomness. More importantly, it defines the limitations of such systems. What can a generative system produce? Can machines only do what they are instructed to?

New techniques with machine learning

Machine learning is important for computer science because it allows complex functions to be modeled without them being explicitly written. Those models are automatically learned from examples, instead of being manually defined. This has a huge implication for arts in general since explicitly writing the rules of a painting or a musical score is inherently difficult.

In recent years, the advent of deep learning has propelled machine learning to new heights in terms of efficiency. Deep learning is especially important for our use case of music generation since using deep learning techniques doesn't require a preprocessing step of feature extraction, which is necessary for classical machine learning and hard to do on raw data such as image, text, and – you guessed it – audio. In other words, traditional machine learning algorithms do not work well for music generation. Therefore, all the networks in this book will be deep neural networks.

In this section, we'll learn what advances in deep learning allow for music generation and introduce the concepts we'll be using throughout this book. We'll also look at the different types of musical representations for those algorithms, which is important as it will serve as the groundwork for this book for data in general.

Advances in deep learning

We all know that deep learning has recently become a fast-growing domain in computer science. Not so long ago, no deep learning algorithm could outperform standard techniques. That changed in 2012 when, for the first time, a deep learning algorithm, AlexNet, won an image classification competition by using a deep neural network trained on GPUs (see the Further reading section for the AlexNet paper, one of the most influential papers published in computer vision). Neural network techniques are more than 30 years old, but their recent reemergence can be explained by the availability of massive data, efficient computing power, and technical advances.

Most importantly, a deep learning technique is general, in the sense that, as opposed to the music generation techniques we've specified previously, a machine learning system is agnostic and can learn from an arbitrary corpus of music. The same system can be used in multiple musical genres, as we'll see during this book when we train an existing model on jazz music in Chapter 6, Data Preparation for Training.

Many techniques in deep learning were discovered a long time ago but have only found meaningful usage today. The technical advances in the field that concern music generation are all present in Magenta and will be explained later in this book:

Recurrent Neural Networks (RNNs) are interesting for music generation because they allow us to operate over sequences of vectors for the input and output. When using classic neural networks or convolutional networks (which are used in image classification), you are limited to a fixed-size input vector that produces a fixed-size output vector, which would be very limiting for music processing but works well for certain types of image processing. The other advantage of RNNs is the possibility of producing a new state vector at each pass by combining a function with the previous state vector, which is a powerful means of describing complex behavior and long-term state. We'll be talking about RNNs in Chapter 2, Generating Drum Sequences with the Drums RNN.

Long Short-Term Memory (LSTM) is an RNN with slightly different properties. It solves the vanishing gradient problem present in RNNs, which makes it impossible for a plain RNN to learn long-term dependencies, even though it theoretically could. The approach of using LSTM in music generation was presented by Douglas Eck and Jürgen Schmidhuber in 2002 in a paper called Finding temporal structure in music: Blues improvisation with LSTM recurrent networks. We'll be talking about LSTM in Chapter 3, Generating Polyphonic Melodies.

Variational Autoencoders (VAEs) are analogous to classical autoencoders in the sense that their architecture is similar, consisting of an encoder (from the input to a hidden layer), a decoder (from the hidden layer to the output), and a loss function, with the model learning to reconstruct the original input under specific constraints. The usage of VAEs in generative models is recent but has shown interesting results. We'll be talking about VAEs in Chapter 4, Latent Space Interpolation with MusicVAE.

Generative Adversarial Networks (GANs) are a class of machine learning systems where two neural networks compete with each other in a game: a generative network generates candidates while a discriminating network evaluates them. We'll be talking about GANs in Chapter 5, Audio Generation with NSynth and GANSynth.
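None of this is Magenta code, but as a purely conceptual sketch, the following Keras snippet shows the shape of the problem an RNN such as an LSTM addresses: it consumes a sequence of note vectors of arbitrary length and predicts a distribution over the next note. The vocabulary size of 38 is an arbitrary assumption for the example:

import tensorflow as tf

VOCABULARY_SIZE = 38  # arbitrary number of note classes for this sketch

# The LSTM reads a (time steps x vocabulary) sequence of one-hot note vectors;
# None in the input shape means the sequence length can vary.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(128, input_shape=(None, VOCABULARY_SIZE)),
    tf.keras.layers.Dense(VOCABULARY_SIZE, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()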

Recent deep learning advances have profoundly changed not only music generation but also genre classification, audio transcription, note detection, composition, and more. We won't be talking about these subjects here, but they all share common ground: musical representation.

Representation in music processes

These systems can work with different representations:

Symbolic representation, such as MIDI (Musical Instrument Digital Interface), describes the music using a notation containing the musical notes and timing, but not the actual sound or timbre. In general, sheet music is a good example of this. A symbolic representation of music has no sound by itself; it has to be played by instruments.

Sub-symbolic representation, such as a raw audio waveform or a spectrogram, describes the actual sound of the music.

Different processes require different representations. For example, most speech recognition and synthesis models work with spectrograms, while most of the examples we will see in this book use MIDI to generate music scores. Processes that integrate both representations are rare, but an example is score transcription, which takes an audio file and translates it into MIDI or another symbolic representation.

Representing music with MIDI

There are symbolic representations other than MIDI, such as MusicXML and ABC notation, but MIDI is by far the most common. The MIDI specification also doubles as a protocol, since it is used to carry note messages that can be used in real-time performance, as well as control messages.

Let's consider some parts of a MIDI message that will be useful for this book:

Channel [0-15]: This indicates the track that the message is sent on.

Note number [0-127]: This indicates the pitch of the note.

Velocity [0-127]: This indicates the volume of the note.

To represent a musical note in MIDI, you have to send two different message types with proper timing: a Note On event, followed by a Note Off event. This implicitly defines the length of the note, which is not present in the MIDI message. This is important because MIDI was defined with live performance in mind, so using two messages – one for a keypress and another for a key release – makes sense.
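As a small illustration of this two-message scheme, here is a sketch using the Mido library (which we'll come back to in Chapter 9); the output port name is an assumption that depends on your system and can be listed with mido.get_output_names():

import time
import mido

# The port name below is only an example; list the ports available on your
# system with mido.get_output_names().
port = mido.open_output("FluidSynth virtual port")

# A note is a "note on" followed later by a "note off"; the one-second pause
# in between is what actually defines the note's length.
port.send(mido.Message("note_on", channel=0, note=60, velocity=100))
time.sleep(1)
port.send(mido.Message("note_off", channel=0, note=60))
port.close()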

From a data perspective, we'll either need to convert MIDI notes into a format that has the note length encoded in it or keep the note on and note off approach, depending on what we're trying to do. For each model in Magenta, we'll see how the MIDI notes are encoded.

The following image shows a MIDI representation of a generated drum file, shown as a plot of time and pitch. Each MIDI note is represented by a rectangle. Because of the nature of percussion data, all the notes have the same length (a "note on" immediately followed by a "note off" message), but in general, that could vary. A drum file is, in essence, polyphonic, meaning that multiple notes can be played at the same time. We'll be talking about monophony and polyphony in the upcoming chapters.

Note that the abscissa is expressed in seconds, but it is also common to express it in bars or measures. The MIDI channel is absent from this diagram:

The script for plotting a generated MIDI file can be found in the GitHub code for this chapter in the Chapter01/provided folder. The script is called midi2plot.py.
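The provided script is more complete, but a minimal version of such a plot can be sketched with the pretty_midi and matplotlib libraries; the generated.mid filename is a placeholder for any MIDI file you have generated:

import pretty_midi
import matplotlib.pyplot as plt

# Load a MIDI file and draw one horizontal bar per note: x is the time in
# seconds, y is the MIDI pitch, and the bar width is the note's duration.
midi = pretty_midi.PrettyMIDI("generated.mid")  # placeholder filename
for instrument in midi.instruments:
    for note in instrument.notes:
        plt.barh(note.pitch, width=note.end - note.start, left=note.start, height=0.8)
plt.xlabel("time (seconds)")
plt.ylabel("MIDI pitch")
plt.show()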

In the case of music generation, the majority of current deep learning systems use symbolic notation. This is also the case with Magenta. There are a couple of reasons for this:

It is easier to represent the essence of music in terms of composition and harmony with symbolic data.

Processing those two types of representations by using a deep learning network is similar, so choosing between both boils down to whichever is faster and more convenient. A good example of this is that the WaveNet audio generation network also has a MIDI implementation, known as the MidiNet symbolic generation network.

We'll see that the MIDI format is not directly used by Magenta, but converted into and from NoteSequence, a Protocol Buffers (Protobuf) implementation of the musical structure that is then used by TensorFlow. This is hidden from the end user since the input and output data is always MIDI. The NoteSequence implementation is useful because it implements a data format that can be used by the models for training. For example, instead of using two messages to define a note's length, a Note in a NoteSequence has a length attribute. We'll be explaining the NoteSequence implementation as we go along.
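To give an idea of what a NoteSequence looks like from Python, here is a minimal sketch that builds one by hand; the exact import path of the music_pb2 Protobuf module has moved between Magenta versions, so treat it as an assumption for your installation:

from magenta.music.protobuf import music_pb2  # the import path may differ between versions

# A NoteSequence stores notes with explicit start and end times (in seconds),
# instead of the paired note on and note off messages used by MIDI.
sequence = music_pb2.NoteSequence()
sequence.tempos.add(qpm=120)
sequence.notes.add(pitch=60, velocity=80, start_time=0.0, end_time=0.5)
sequence.notes.add(pitch=64, velocity=80, start_time=0.5, end_time=1.0)
sequence.total_time = 1.0
print(sequence)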

Representing music as a waveform

An audio waveform is a graph displaying amplitude changes over time. Zoomed out, a waveform looks rather simple and smooth, but zoomed in, we can see tiny variations – it is those variations that represent the sound.

To illustrate how a waveform works, imagine a speaker cone that is at rest when the amplitude is at 0. If the amplitude moves to a negative value (-1, for example), then the speaker moves backward a little bit, or forward in the case of a positive value. For each amplitude variation, the speaker will move, making the air move, thus making your eardrums move.

The bigger the amplitude in the waveform, the further the speaker cone moves, and the louder the sound. This is expressed in decibels (dB), a measure of sound pressure.

The faster the movement, the higher the pitch. This is expressed in hertz (Hz).

In the following image, we can see the MIDI file from the previous section played by instruments to make a WAV recording. The instruments being used come from a 1982 Roland TR-808 drum sample pack. You can visually match some instruments, such as the doubled Conga Mid (MIDI note 48) at around 4.5 seconds. In the upper-right corner, you can see a zoom of the waveform at a 100th of a second, showing the actual amplitude change:

The script for plotting a WAV file can be found in the GitHub code for this chapter in the Chapter01/provided folder. The script is called wav2plot.py.

In machine learning, a raw audio waveform used to be uncommon as a data source, since the computational load is bigger than with other, transformed representations, both in terms of memory and processing. But recent advances in the field, such as WaveNet models, have made it on par with other ways of representing audio, such as spectrograms, which were historically more popular for machine learning algorithms, especially for speech recognition and synthesis.

Bear in mind that training on audio is really cost-intensive because raw audio is a dense medium. Basically, a waveform is a digital recreation of a dynamic voltage over time. Simply put, a process called Pulse Code Modulation (PCM) assigns a bit value to each sample at the sampling rate you are running. The sampling rate for recording purposes is pretty standard: 44,100 Hz, which corresponds to a Nyquist frequency of 22,050 Hz, just above the upper limit of human hearing. But you don't always need a 44,100 Hz sample rate; for example, 16,000 Hz is more than enough to cover human speech frequencies. At that frequency, each second of audio is represented by 16,000 samples.
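As a small sanity check of those numbers, the Python standard library's wave module can report the sampling rate and sample count of a PCM WAV file; the drums.wav filename is a placeholder:

import wave

# Open a PCM WAV file and relate its sample count to its duration:
# duration in seconds = number of samples / sampling rate.
with wave.open("drums.wav", "rb") as wav_file:  # placeholder filename
    sampling_rate = wav_file.getframerate()  # for example, 44100 or 16000
    sample_count = wav_file.getnframes()
    print(sampling_rate, "Hz,", sample_count, "samples,",
          sample_count / sampling_rate, "seconds")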

If you want to know more about PCM, the sampling theory for audio, and the Nyquist Frequency, check out the Further reading section at the end of this chapter.

The 44,100 Hz sampling rate was chosen for a very specific purpose: thanks to the Nyquist theorem, it allows us to recreate the original audio without losing the sounds that humans can hear.