Description

The LLVM infrastructure is a popular compiler ecosystem widely used in the tech industry and academia. This technology is crucial for both experienced and aspiring compiler developers looking to make an impact in the field. Written by Quentin Colombet, a veteran LLVM contributor and architect of the GlobalISel framework, this book provides a primer on the main aspects of LLVM, with an emphasis on its backend infrastructure; that is, everything needed to transform the intermediate representation (IR) produced by frontends like Clang into assembly code and object files.
You’ll learn how to write an optimizing code generator for a toy backend in LLVM. The chapters will guide you step by step through building this backend while exploring key concepts, such as the ABI, cost model, and register allocation. You’ll also find out how to express these concepts using LLVM's existing infrastructure and how established backends address these challenges. Furthermore, the book features code snippets that demonstrate the actual APIs.
By the end of this book, you’ll have gained a deeper understanding of LLVM. The concepts presented are expected to remain stable across different LLVM versions, making this book a reliable quick reference guide for understanding LLVM.

LLVM Code Generation

A deep dive into compiler backend development

Quentin Colombet

LLVM Code Generation

Copyright © 2025 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Portfolio Director: Kunal Chaudhari

Relationship Lead: Samriddhi Murarka

Project Manager: Ashwin Dinesh Kharwa

Content Engineer: Sujata Tripathi

Technical Editor: Rohit Singh

Copy Editor: Safis Editing

Indexer: Hemangini Bari

Production Designer: Vijay Kamble

Growth Lead: Vinishka Kalra

First published: May 2025

Production reference: 3100725

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-83763-778-2

www.packtpub.com

To my wife, Luce, and my three sons, Clovis, Mathias, and Gabriel, for supporting and encouraging me throughout this project. I may not have been the most present dad during this time, and your patience with me has been noticed and appreciated. With love.

– Quentin Colombet

Foreword

At first glance, it might seem like only a few people ever work on LLVM backends. After all, the number of backends in upstream LLVM is limited, and most of them are already stable and functioning well. So, why would they need significant changes?

In reality, LLVM has become the de facto standard for code generation—not just for CPUs but also for increasingly diverse compute engines such as GPUs and other accelerators. When a new CPU or accelerator needs code generation support, the default choice is often to adopt LLVM and implement a backend.

Furthermore, the existing upstream backends are under constant improvement. They’re regularly updated to support new CPU instructions, refine and enhance optimizations, introduce additional security-hardening features, and much more.

Beyond development in industry and by enthusiasts, LLVM is also a top choice in academia for systems and compiler research. Many innovations in performance tuning, security, and other areas require modifying LLVM backends to enable experimentation.

These are just a few scenarios where someone might need to create or modify LLVM backends—and I’m sure there are many more. But even considering just these three, it’s clear that thousands of developers need to do at least some LLVM backend development, and they need high-quality documentation to do it well.

While the LLVM project offers extensive documentation and tutorials, there continues to be a gap in documenting very clearly everything you need to know to become proficient in backend development.

Effectively working on LLVM backends often requires reverse engineering and internalizing their architecture. Historically, the most efficient way to learn has been to find an expert and engage in long, detailed conversations to piece everything together. Of course, not everyone has access to such experts. Even though the LLVM community makes a huge effort to share expert knowledge—through comprehensive documentation (https://llvm.org/docs/), hundreds of recorded talks and presentations (https://llvm.org/devmtg/), and programs such as office hours and online sync-ups (https://llvm.org/docs/GettingInvolved.html#office-hours)—the learning curve remains steep.

A few years ago, I had a conversation with Quentin at one of the LLVM Developers’ Meetings about this very topic. I thought then (and still do!) that Quentin is one of the most knowledgeable LLVM backend engineers out there. I was thinking out loud, “Wouldn’t it be amazing if all of your knowledge—especially about LLVM backend development—could be made easily available to the entire LLVM community? Just imagine how much easier and faster backend development would become if your insights were accessible in a book...”

I’m thrilled that our conversation helped inspire Quentin to write this fantastic book. It distills deep insights and practical knowledge into a single, well-organized resource—ideal for anyone starting or continuing their journey in LLVM backend development. I hope that everyone working on backends reads it, and that it fuels even more innovation and progress in the LLVM ecosystem.

Thank you, Quentin, for writing all this down!

Kristof Beyls

Senior Technical Director and Fellow, Arm

Contributors

About the author

Quentin Colombet is a veteran LLVM contributor who focuses on the development of backends. He is the architect of the new instruction selection framework (GlobalISel) and code owner of the LLVM register allocators.

He has more than two decades of experience working on different compiler backends for various architectures (GPU, CPU, microcontroller, DSP, and ASIC, among others) and compiler frameworks (Open64, LLVM, IREE, and Glow, to name the main ones). He joined the LLVM project when he started at Apple in 2012 and has worked on the x86, AArch64, and Apple GPU backends and all the products that include these processing units. Since starting on the LLVM infrastructure, he has helped interns and new hires get up to speed with it at Apple, Meta, and Google, and more recently at his own company, Brium, while contributing to the LLVM-based projects at these companies.

I want to thank Bruno Cardoso Lopes, who inspired me to write this book and introduced me to the Packt team. Thank you to the Packt team for their support and continuous feedback, more specifically, Aditi Chatterjee, Ashwin Dinesh Kharwa, Samriddhi Murarka, and Sujata Tripathi, who all worked closely with me to make this project a reality. Thanks to my technical reviewer, Shuo Niu, who brought a different perspective to the book and helped me clarify the content, resulting in a better experience. And, of course, thank you to my wife, Luce, who encouraged me to get started on this project and supported me along the way.

About the reviewer

Shuo Niu holds a Master of Engineering in computer engineering from the University of Toronto. With six years of experience in LLVM compiler development, specializing in middle-end and backend optimizations for FPGA HLS compilers, Shuo is now extending his expertise to building AI compilers for low-power AI chips. Committed to fostering a stronger LLVM community, Shuo also served as a technical reviewer for Learn LLVM 17, Second Edition.

Contents

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Part 1: Getting Started with LLVM

Building LLVM and Understanding the Directory Structure

Getting the most out of this book – get to know your free benefits

Technical requirements

Getting ready for LLVM’s world

Prerequisites

Identifying the right version of the tools

Installing the right tools

Building a compiler

What is a compiler?

Opening Clang’s hood

Building Clang

Experimenting with Clang

Building LLVM

Configuring the build system

Crash course on Ninja

Building the core LLVM project

Testing a compiler

Crash course on the Google test infrastructure

Crash course on the LLVM Integrated Tester

Testing in lit

Directives

Describing the RUN command

The lit driver – llvm-lit

Crash course on FileCheck

FileCheck by example

LLVM unit tests

Finding the source of a test

Running unit tests manually

The unit tests pass, what now?

LLVM functional tests

The LLVM test suite

The functional tests fail – what do you do?

Understanding the directory structure

High-level directory structure

Focusing on the core LLVM project

A word on the include files

Private headers

What is the deal with <project>/include/<project>?

What is include/<project>-c?

Overview of some of the LLVM components

Generic LLVM goodness

Working with the LLVM IR

Generic backend infrastructure

Target-specific constructs

Summary

Quiz time

Contributing to LLVM

Reporting an issue

Engaging with the community

Reviewing patches

Contributing patches

Understanding patch contribution in a nutshell

Following up with your contribution

A word on adding tests

Summary

Quiz time

Compiler Basics and How They Map to LLVM APIs

Technical requirements

A word on APIs

Understanding compiler jargon

Target

Host

Lowering

Canonical form

Build time, compile time, and runtime

Backend and middle-end

Application binary interface

Encoding

Working with basic structures

Module

A module at the LLVM IR level

A module at the Machine IR level

Function

A function in the LLVM IR

A function in the Machine IR

Basic block

A basic block in the LLVM IR

A basic block in the Machine IR

Instruction

An instruction in the LLVM IR

An instruction in the Machine IR

Control flow graph

Reverse post-order traversal

Backedge

Critical edge

Irreducible graph

Building your first IRs

Building your first LLVM IR

A walk over the required APIs

Your turn

Building your first Machine IR

A walk over the required APIs

Your turn

Summary

Quiz time

Writing Your First Optimization

Technical requirements

The concept of value

SSA

Constructing the SSA form

Dominance

Def-use and use-def chains

Def-use and use-def chains in the LLVM IR

Def-use and use-def chains in the Machine IR

Tackling optimizations

Legality

Integer overflow/underflow

Fast-math flags

Side effects

Profitability

Instruction lowering – TargetTransformInfo and TargetLowering

Library support – TargetLibraryInfo

Datatype properties – DataLayout

Register pressure

Basic block frequency

More precise instruction properties – scheduling model and instruction description

Transformation jargon

Instcombine

Fixed point

Liveness

Hoisting

Sinking

Folding

Loops

Terminology

Preheader

Header

Exiting block

Latch

Exit block

Where to get loop information

Writing a simple constant propagation optimization

The optimization

Simplifying assumptions

Missing APIs

The Constant class

The APInt class

Creating a constant

Replacing a value

Your turn

Going further

Legality

Profitability

Propagating constants across types

Summary

Quiz time

Dealing with Pass Managers

Technical requirements

What is a pass?

What is a pass manager?

The legacy and new pass manager

Pass managers’ capabilities

Populating a pass manager

Inner workings of pass managers

Creating a pass

Writing a pass for the legacy pass manager

Using the proper base class

Expressing the dependencies of a pass

Preserving analyses

Specificities of the Pass class

Writing a pass for the new pass manager

Implementing the right method

Registering an analysis

Describing the effects of your pass

Inspecting the pass pipeline

Available developer tools

Plumbing up the information you need

Interpreting the logs of pass managers

The pass pipeline structure

Time profile

Your turn

Writing your own pass

Writing your own pass pipeline

Summary

Further reading

Quiz time

TableGen – LLVM Swiss Army Knife for Modeling

Technical requirements

Getting started with TableGen

The TableGen programming language

Types

Programming with TableGen

Defining multiple records at once

Assigning fields

Discovering a TableGen backend

General information on TableGen backends for LLVM

Discovering a TableGen backend

The implementation of intrinsics

The content of a generated file

The source of a TableGen backend

Debugging the TableGen framework

Identifying the failing component

Cracking open a TableGen backend

Summary

Further reading

Quiz time

Part 2: Middle-End: LLVM IR to LLVM IR

Understanding LLVM IR

Technical requirements

Understanding the need for an IR

What an IR is

Why use an IR?

Introducing LLVM IR

Identifiers

Functions

Basic blocks

Instructions

Types

Single-value types

The label type

Aggregate types

Types in the LLVM IR API

Walking through an example

Target-specific elements in LLVM IR

Intrinsic functions

Triple

Function attributes

Data layout

Application binary interface

Textual versus binary format

LLVM IR API – cheat sheet

Summary

Further reading

Quiz time

Survey of the Existing Passes

Technical requirements

How to find the unknown

Leveraging opt

Using the LLVM code base

Starting from the implementation

Survey of the helper passes

The verifier

The printer

Analysis passes

Target transformation information

Loop information

Alias analysis

Block frequency info

Dominator tree information

Value tracking

Canonicalization passes

The instruction combiner

An example of a canonical rewrite

An example of an optimization

How to use instcombine

The memory to register rewriter

The converter to loop-closed-SSA form

Optimization passes

Interprocedural optimizations

Scalar optimizations

Vectorization

Summary

Further reading

Quiz time

Introducing Target-Specific Constructs

Technical requirements

Adding a new backend in LLVM

Connecting your target to the build system

Registering your target with Clang

Adding a new architecture to the Triple class

Populating the Target instance

Plumbing your Target through Clang

Creating your own intrinsics

The pros and cons of intrinsics

Creating an intrinsic in the backend

Defining our intrinsics

Hooking up the TableGen backend

Teaching LLVM IR about our intrinsics

Connecting an intrinsic to Clang

Writing the .def file by hand

Using the TableGen capabilities

Hooking up the built-in information

Establishing the code generation link

Adding a target-specific TargetTransformInfo implementation

Establishing a connection to your target-specific information

Introducing target-specific costs

Customizing the default middle-end pipeline

Using the new pass manager

Using the legacy pass manager

A one-time setup – assembling a codegen pipeline

Faking the instruction selector

Faking the lowering of the object file

Creating a skeleton for the assembly information

Using the right abstraction

Summary

Further reading

Quiz time

Hands-On Debugging LLVM IR Passes

Technical requirements

The logging capabilities in LLVM

Printing the IR between passes

Printing the debug log

Printing high-level information about what happened

Reducing the input IR size

Extracting a subset of the input IR

Shrinking the IR automatically

Using sanitizers

A crash course on LLDB

Starting a debugging session

Controlling the execution

Stopping the program

Command resolution

Resuming the execution

Inspecting the state of a program

The LLVM code base through a debugger

Summary

Further reading

Quiz time

Part 3: Introduction to the Backend

Getting Started with the Backend

Technical requirements

Introducing the Machine IR

Here comes the Machine IR

The Machine IR textual representation

The .mir file format

A primer on the YAML syntax

The semantics of the different fields

Mapping the content of a .mir file to the C++ API

A deep dive into the body of a MachineFunction instance

Working with a .mir file

Generating a .mir file

Running passes

Shrinking a .mir file

The anatomy of a MachineInstr instance

Introducing the MC layer

Working with MachineOperand instances

Unboxing a MachineOperand instance

Dealing with explicit and implicit operands

Understanding the constraints of an operand

Working with registers

The concept of the register class

The concept of sub-registers

The concept of register tuples

The concept of register units

The registers and SSA and non-SSA forms

Interacting with registers in the debugger

Creating MachineInstr objects

Describing registers

Writing the target description

Describing instructions

Summary

Further reading

Quiz time

Getting Started with the Machine Code Layer

Technical requirements

The use of the MC layer

Connecting the MC layer

What instructions to describe

Augmenting the target description with MC information

Defining the MC layer for the registers

Defining the MC layer for the instructions

Enabling MC-based tools

Leveraging TableGen

Implementing the missing pieces

Implementing your own MCInstPrinter class

Implementing your own MCCodeEmitter class

Implementing your own XXXAsmParser class

Summary

Quiz time

The Machine Pass Pipeline

Technical requirements

The Machine pass pipeline at a glance

Injecting passes

Using the generic Machine optimizations

Generic passes worth mentioning

The CodeGenPrepare pass

The PeepholeOptimizer pass

The MachineCombiner pass

Summary

Further reading

Quiz time

Part 4: LLVM IR to Machine IR

Getting Started with Instruction Selection

Technical requirements

Overview of the instruction selection frameworks

How does instruction selection work?

Framework complementarity

Overall differences between the selectors

Compile time

Modularity and testability

Scope

Which selector to use?

FastISel

SDISel

GlobalISel

Selectors’ inner workings

Understanding the DAG representation

Textual representation of the SelectionDAG class

Manipulating a DAG

Understanding the generic Machine IR

Textual representation of generic attributes

Lowering constraints of the generic Machine IR

APIs to work with the generic Machine IR

Groundwork to connect the codegen pipeline

Instantiating the codegen pass pipeline

Providing the key target APIs to the codegen pipeline

Connecting SDISel to the codegen pipeline

Connecting FastISel to the codegen pipeline

Connecting GlobalISel to the codegen pipeline

Choosing between different selectors

Summary

Further reading

Quiz time

Instruction Selection: The IR Building Phase

Technical requirements

Overview of the IR building

Describing the calling convention

Writing your target description of the calling convention

Connecting the gen-callingconv TableGen backend

Anatomy of the CCValAssign class

Lowering the ABI with SDISel

Implementing the lowering of formal arguments

Providing custom description for the SDNode class

Handling of stack locations

Lowering the ABI with FastISel

Lowering the ABI with GlobalISel

Summary

Further reading

Quiz time

Instruction Selection: The Legalization Phase

Technical requirements

Legalization overview

Legalization actions

Legalization in SDISel

Describing your legal types

Describing your legalization actions

Implementing a custom legalization action

Legalization in GlobalISel

Describing your legalization actions with the LegalizeRuleSet class

Custom legalization in GlobalISel

Summary

Quiz time

Instruction Selection: The Selection Phase and Beyond

Technical requirements

Register bank selection

The goal of the register bank selection

Describing the register banks

Implementing your RegisterBankInfo class

Instruction selection

Expressing your selection patterns

Introduction to the selection patterns

Advanced selection patterns

Selection in SDISel

Selection in FastISel

Selection in GlobalISel

Setting up the InstructionSelector class

Importing the selection patterns

Going beyond patterns

Finalizing the selection pipeline

Using custom inserters

Customizing the TargetLowering::finalizeLowering method

Optimizations

Using the DAGCombiner framework

Leveraging the combiner framework

Debugging the selectors

Debugging SDISel

Debugging the GlobalISel match table

Summary

Quiz time

Part 5: Final Lowering and Optimizations

Instruction Scheduling

Technical requirements

Overview of the instruction scheduling framework

The ScheduleDAGInstrs class

Changing the scheduling algorithm

The scheduling model

The scheduling events

The processing units

The scheduling bindings

Gluing everything together

Implementing your scheduling model

Connecting your scheduling model

Describing a processor model

Instantiating your subtarget

Guidelines to get started with your scheduling model

Summary

Quiz time

Register Allocation

Technical requirements

Overview of register allocation in LLVM

Enabling the register allocation infrastructure

Introducing the slot indexes

Introducing the live intervals

Maintaining the live intervals

Summary

Further reading

Quiz time

Lowering of the Stack Layout

Technical requirements

Overview of stack lowering

Handling of stack slots

From frame index to stack slot

The lowering of the stack frame

Introducing the reserved call frame

Implementing the frame-lowering target hooks

The expansion of the frame indices

Introducing register scavenging

Provisioning an emergency spill slot

Expanding the frame indices

Summary

Quiz time

Getting Started with the Assembler

Technical requirements

Overview of the lowering of a textual assembly file

Assembling with the LLVM infrastructure

Implementing an assembler

Providing the MCCodeEmitter class

Handling the fixups with the MCAsmBackend class

Recording the relocations with the MCObjectTargetWriter class

Summary

Further reading

Quiz time

Unlock Your Book’s Exclusive Benefits

How to unlock these benefits in three easy steps

Need help?

Other Books You May Enjoy

Index


Part 1

Getting Started with LLVM

In this part, we start with an introduction to the LLVM ecosystem, its community, and the various parts that make up the LLVM infrastructure.

This part assumes that you have no prior experience with LLVM and little to no experience with compilers.

More specifically, in this part, you will learn the following:

How to set up your environment to build and test the different projects that the LLVM infrastructure offers
How to interact with the LLVM community and, in particular, how to seek help and contribute
About the basic concepts used in compilers and how to manipulate them through the LLVM application programming interfaces (APIs)
How to write your first optimization pass and the things to consider while optimizing your program
How to build and customize your optimization pipeline
How TableGen, LLVM’s domain-specific language (DSL), fits into the LLVM infrastructure

By the end of this part, you will have a complete picture of the overall structure of the LLVM infrastructure and will be ready to dive into its inner workings.

This part of the book includes the following chapters:

Chapter 1, Building LLVM and Understanding the Directory Structure
Chapter 2, Contributing to LLVM
Chapter 3, Compiler Basics and How They Map to the LLVM APIs
Chapter 4, Writing Your First Optimization
Chapter 5, Dealing with the Pass Managers
Chapter 6, TableGen - The LLVM Swiss Army Knife for Modeling

1

Building LLVM and Understanding the Directory Structure

The LLVM infrastructure provides a set of libraries that can be assembled to create different tools and compilers.

LLVM originally stood for Low-Level Virtual Machine. Nowadays, it is much more than that, as you will shortly learn, and people just use LLVM as a name.

Given the sheer volume of code that makes up the LLVM repository, it can be daunting to even know where to start.

In this chapter, we will give you the keys to approach and use this code base confidently. Using this knowledge, you will be able to do the following:

Understand the different components that make up a compiler
Build and test the LLVM project
Navigate LLVM’s directory structure and locate the implementation of different components
Contribute to the LLVM project

This chapter covers the basics needed to get started with LLVM. If you are already familiar with the LLVM infrastructure or followed the tutorial from the official LLVM website (https://llvm.org/docs/GettingStarted.html), you can skip it. You can, however, check the Quiz time section at the end of the chapter to see whether there is anything you may have missed.

Getting the most out of this book – get to know your free benefits

Unlock exclusive free benefits that come with your purchase, thoughtfully crafted to supercharge your learning journey and help you learn without limits.

Here’s a quick overview of what you get with this book:

Next-gen reader

Figure 1.1: Illustration of the next-gen Packt Reader’s features

Our web-based reader, designed to help you learn effectively, comes with the following features:

Multi-device progress sync: Learn from any device with seamless progress sync.

Highlighting and notetaking: Turn your reading into lasting knowledge.

Bookmarking: Revisit your most important learnings anytime.

Dark mode: Focus with minimal eye strain by switching to dark or sepia mode.

Interactive AI assistant (beta)

Figure 1.2: Illustration of Packt’s AI assistant

Our interactive AI assistant has been trained on the content of this book, so it can help you out if you encounter any issues. It comes with the following features:

Summarize it: Summarize key sections or an entire chapter.

AI code explainers: In the next-gen Packt Reader, click the Explain button above each code block for AI-powered code explanations.

Note: The AI assistant is part of next-gen Packt Reader and is still in beta.

DRM-free PDF or ePub version

Figure 1.3: Free PDF and ePub

Learn without limits with the following perks included with your purchase:

Learn from anywhere with a DRM-free PDF copy of this book.

Use your favorite e-reader to learn using a DRM-free ePub version of this book.

Unlock this book’s exclusive benefits now

Take a moment to get the most out of your purchase and enjoy the complete learning experience.

https://www.packtpub.com/unlock/9781837637782

Note: Have your purchase invoice ready before you begin.

Technical requirements

To work with the LLVM code base, you need specific tools on your system. In this section, we list the required versions of these tools for the latest major LLVM release: 20.1.0.

Later, in Identifying the right version of the tools, you will learn how to find the version of the tools required to build a specific version of LLVM, including older and newer releases and the LLVM top-of-tree (that is, the actively developed repository). Additionally, you will learn how to install them.

Without further ado, here are the versions of the tools required for LLVM 20.1.0:

Git: None specified

C/C++ toolchain: >=Clang 5.0, >=Apple Clang 10.0, >=GCC 7.4, or >=Visual Studio 2019 16.8

CMake: >=3.20.0

Ninja: None specified

Python: >=3.8

Table 1.1: Tools required for LLVM 20.1.0

Furthermore, this book comes with scripts, examples, and more that will ease your journey with learning the LLVM infrastructure. We will specifically list the relevant content in the related sections, but remember that the repository lives at https://github.com/PacktPublishing/LLVM-Code-Generation.

Getting ready for LLVM’s world

In the Technical requirements section, we already listed the versions of the tools you need to work with LLVM 20.1.0. However, LLVM is a lively project, and what is required today may differ from what is required tomorrow. Also, to step back a bit, you may not know why you need these tools in the first place and/or how to get them.

This section addresses these questions, and you will learn the following in the process:

The purpose of each required tool
How to check that your environment has the proper tools
How to install the proper tools

Depending on how familiar you are with development on Linux/macOS, this setup can be tedious or a walk in the park.

Ultimately, this section aims to teach you how to go beyond a fixed release of LLVM by giving you the knowledge required to find the information you need.

If you are familiar with package managers (e.g., the apt-get command-line tool on Linux or Homebrew (https://brew.sh) on macOS), you can skip this part and directly install Git, Clang, CMake, Ninja, and Python through them. On Windows, if you do not have a package manager, the steps provided here are all manual; picking the Windows binary distribution of each tool should just work. Alternatively, on Windows, you may be better off installing these tools through Visual Studio Code (VS Code) (https://code.visualstudio.com) via its extensions.
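For instance, here is what installing the tools through a package manager could look like (the package names below are given only as an illustration and vary between package managers and distributions; double-check them against yours):

# macOS, using Homebrew
$ brew install git cmake ninja python

# Debian/Ubuntu, using apt-get
$ sudo apt-get install git clang cmake ninja-build python3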

In any case, you might want to double-check which version of these tools you need by going through the Identifying the right version of the tools section.

Prerequisites

As mentioned previously, you need a set of specific tools to build the LLVM code base. This section summarizes what each of these tools does and how they work together to build the LLVM project.

This list of tools is as follows:

Git: The software used for version control of LLVM
A C/C++ toolchain: The LLVM code base is written in C/C++, and as such, we will need a toolchain to build that type of code
CMake: The software used to configure the build system
Ninja: The software used to drive the build system
Python: The scripting language and execution environment used for testing

Figure 1.1 illustrates how the different tools work together to build an LLVM compiler:

Figure 1.1: The essential command-line tools to build an LLVM compiler

Breaking this figure down, here are the steps it takes:

1. Git retrieves the source code.
2. CMake generates the build system for a particular driver, such as Ninja, and a particular C/C++ toolchain.
3. Ninja drives the build process.
4. The C/C++ toolchain builds the compiler.
5. Python drives the execution of the tests.
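To make this flow concrete, here is a condensed sketch of these five steps (for illustration only; the exact, recommended invocations are detailed later in this chapter):

$ git clone https://github.com/llvm/llvm-project.git   # Git retrieves the source code
$ cmake -GNinja llvm-project/llvm                       # CMake generates the Ninja build files
$ ninja                                                 # Ninja drives the C/C++ toolchain build
$ ninja check-llvm                                      # the tests, driven by Python/lit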

Identifying the right version of the tools

The required version of these tools depends on the version of LLVM you are building. For instance, see the Technical requirements section for the latest major release of LLVM, 20.1.0.

To check the required version for a specific release, check out the Getting Started page of the documentation for this release. To get there, perform the following steps:

1. Go to https://releases.llvm.org/.
2. Scroll down to the Download section.
3. In the documentation column, click on the link named llvm or docs for the release you are interested in. For instance, release 20.1.0 should bring you to a URL such as https://releases.llvm.org/20.1.0/docs/index.html.
4. Scroll down to the Documentation section.
5. Click on Getting Started/Tutorials.
6. Find the Software and the Host C++ Toolchain[...] sections. For instance, for release 20.1.0, the Software section lives at https://releases.llvm.org/20.1.0/docs/GettingStarted.html#software.

To find the requirements for LLVM top-of-tree, simply follow the same steps but with the release named Git. This release should have a release date of Current.

You learned how to identify which version of the tools you need to have to be able to work with LLVM. Now, let’s see how to install these versions.

Note

Ninja is the preferred driver of the build system of LLVM. However, LLVM also supports other drivers such as Makefile (the default), Xcode, and, to some extent, Bazel. Feel free to choose what works best for you.
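For illustration, the driver is selected through CMake's -G option, which we will come back to later in this chapter (assuming ${LLVM_SRC} points at your LLVM checkout):

$ cmake -G Ninja ${LLVM_SRC}/llvm              # Ninja build files
$ cmake -G "Unix Makefiles" ${LLVM_SRC}/llvm   # the default Makefile-based build
$ cmake -G Xcode ${LLVM_SRC}/llvm              # an Xcode project, on macOS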

Installing the right tools

Depending on your operating system (OS), you may already have all the necessary tools installed. You can use the following commands to check which versions of the tools are installed and whether they meet the minimum requirements described in the previous section:

Git: git --version

C/C++ toolchain (LLVM): clang --version

CMake: cmake --version

Ninja: ninja --version

Python: python3 --version

Table 1.2: Commands to check the availability of the required tools

If any of the commands from this table fails or if any of the versions do not meet the minimum requirements, you will have to install/update the related tools.

Assuming you are missing some of the tools, here are the steps to install them from the official websites. Feel free to use your own package manager if you do not want to do this manually.

In a nutshell, you need to do the following:

1. Go to the official website for the tool.
2. Go to the Downloads page.
3. Download the proper package for your OS.
4. Unpack/install the package to a location of your choice.

The official websites are as follows:

Git: https://git-scm.com/downloads or https://git-scm.com, and then click on Downloads

C/C++ toolchain (LLVM): https://releases.llvm.org or https://www.llvm.org, and then click on All Releases

CMake: https://cmake.org/download/ or https://cmake.org/, and then click on Downloads

Ninja: https://github.com/ninja-build/ninja/releases or https://ninja-build.org, and then click on download the Ninja binary

Python: https://www.python.org/downloads/ or https://www.python.org, and then click on Downloads

Table 1.3: Websites where you can find the required tools

Note that, on macOS, Git and Clang come with the Xcode CLI package. To install them on this OS, please run the following command:

$ xcode-select --install

Quick tip: Enhance your coding experience with the AI Code Explainer and Quick Copy features. Open this book in the next-gen Packt Reader. Click the Copy button (1) to quickly copy code into your coding environment, or click the Explain button (2) to get the AI assistant to explain a block of code to you.

The next-gen Packt Reader is included for free with the purchase of this book. Unlock it by scanning the QR code below or visiting https://www.packtpub.com/unlock/9781837637782.

To make things easier, you will find a script that can help you set up the environment for macOS in the ch1 directory of the Git repository of this book.

If you do not have Git, you can get this script with the following command:

$ curl --location https://raw.githubusercontent.com/PacktPublishing/LLVM-Code-Generation/main/ch1/setup_env.sh --output setup_env.sh

If you have Git, simply run the following command:

$ git clone https://github.com/PacktPublishing/LLVM-Code-Generation.git
$ cd LLVM-Code-Generation/ch1

After you get the script one way or another, run the following command:

$ bash setup_env.sh ${INSTALL_PREFIX}

INSTALL_PREFIX is the path where you want the tools to be installed.

At this point, you know how to identify the required version of the tools to build LLVM. You also acquired a basic understanding of how these tools interact with each other during the build process.

From this point forward, we will assume that you have all the necessary tools available at one of the directories recorded in the PATH environment variable. In other words, you can use these tools without having to explicitly set their path on the command line.
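For example, assuming a hypothetical install prefix of $HOME/llvm-tools and that the script places the binaries under a bin subdirectory (adapt both to your actual setup):

$ bash setup_env.sh $HOME/llvm-tools
$ export PATH=$HOME/llvm-tools/bin:$PATH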

Now that we have taken care of the setup of the environment, we can start playing with LLVM.

Building a compiler

In this section, we will introduce the different parts of what makes a compiler and how they relate to the LLVM code base. In the process, you will do the following:

Understand the overall architecture of a compiler
Learn how to build Clang from source
Be able to decide which components of LLVM you need to build

If you are already familiar with the components of a compiler toolchain and want to jump straight into the action, skip directly to the Building LLVM section.

What is a compiler?

The definition of a compiler means different things for different people. For instance, for a student in their first year of computer science, a compiler may be seen as a tool that translates a source language into executable code. This is a possible definition, but it is also a very coarse-grained one.

When you look closer at a compiler, you will find that it is a collection of different tools, or libraries, working together to achieve this translation. That’s why we talk about a compiler toolchain.

To go back to the previous coarse-grained definition, a compiler, such as Clang, is a compiler driver: it invokes the different tools in the right order and pulls the related dependencies from the standard library to produce the final executable code.

The LLVM code base reflects the composability of these tools. It is organized as a set of libraries that you can use to build a variety of tools and, in particular, a compiler toolchain.

To get a better understanding of which tools are right to build for your particular project, let us see which components are involved with a concrete example: Clang.

Opening Clang’s hood

To build an executable from a C file, Clang, a C/C++ compiler built on top of LLVM, orchestrates three different components: the frontend, the backend, and the linker. Additionally, Clang has to pull in dependencies that are expected by the system/language, such as the standard library, so that the following happens:

The frontend has access to the standard headers, for instance, what the prototype of the printf function is.
The linker has access to the standard implementations, for instance, the actual implementation of printf.

The following picture gives a high-level view of the different parts of a compiler and the different LLVM projects involved in building such a compiler.

Figure 1.2: The different components of a compiler

When building a C file, Clang acts as a driver for a series of tools. It invokes the frontend (Clang project in LLVM), then passes down the result to the backend (LLVM project) that produces an object file that gets linked with the standard library (the libc project in LLVM) by the linker (the lld project in LLVM).

The takeaway is that building Clang alone will not be enough to have a properly functioning compiler. To get there, you will need to build at least the linker and the standard library, which come under the lld and the libc/libcxx projects in LLVM, respectively. Otherwise, your compiler toolchain will have to rely on what the host provides.

Note

You may have noticed that we did not mention the frontend and backend in this list. This is because, when building the Clang project, these are always included.
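To give a rough, hedged sketch of what configuring such a fuller toolchain could look like (the cmake workflow itself is introduced in the next sections; the project and runtime names below are the upstream directory names and are shown only as an illustration, not as the book's recommended configuration):

$ cmake -GNinja -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_PROJECTS="clang;lld" \
    -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
    ${LLVM_SRC}/llvm
$ ninja clang lld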

In any case, the focus of this book is LLVM backends, so, why are we spending so much time on Clang?

The reason is simple: Clang offers a familiar way to interact with LLVM constructs. By using the Clang frontend, you will be able to generate the LLVM intermediate representation (IR) by simply writing C/C++. We believe this is a gentler way to start your journey with LLVM backends.

As we progress through the book, we will have fewer and fewer C/C++ inputs and more and more LLVM IR ones.

Building Clang

As already mentioned, here, we are only interested in Clang’s frontend capabilities. As such, the following instructions focus only on building this part of LLVM. You will learn more about the possible customizations of the build system in the Building LLVM section.

Assuming LLVM_SRC is the path where you want to have the LLVM source code and CLANG_BUILD is the path where you want the build of Clang to happen, please run the following:

$ git clone https://github.com/llvm/llvm-project.git ${LLVM_SRC}
$ mkdir -p ${CLANG_BUILD}
$ cd ${CLANG_BUILD}
$ cmake -DLLVM_ENABLE_PROJECTS=clang -GNinja -DCMAKE_BUILD_TYPE=Release ${LLVM_SRC}/llvm
$ ninja clang

This will check out the LLVM sources from GitHub, create a build directory, move there, configure the build system for building clang with Ninja, and finally, build Clang.

If you run into any issues, make sure you have all the required tools in PATH (see the Installing the Right Tools section).

When the build finishes, you should have a shiny new clang executable at ${CLANG_BUILD}/bin.
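As a quick sanity check, you can ask the freshly built compiler for its version (the exact banner depends on the commit you checked out):

$ ${CLANG_BUILD}/bin/clang --version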

Experimenting with Clang

If you ever look deeper into Clang, you will find out that it is composed of many more phases than the frontend, backend, and linker. By playing with Clang’s command-line options, you can expose the intermediate results of some of these phases.

Here is the list of these phases:

Frontend: This validates that the input file is syntactically and semantically correct and produces the LLVM IR.
  Preprocessor: This expands macros (e.g., #include).
  Sema: This validates the syntax and semantics of the program.
  Codegen: This produces the LLVM IR.
Backend: This translates the LLVM IR to target-specific instructions.
  Middle-end optimizations: LLVM IR to LLVM IR optimizations.
  Assembly generation: Target-specific IR to assembly code.
  Assembler: This translates assembly code to an object file.

Here are the options to inspect their results:

After the preprocessor: clang -E

After syntax checking: clang -fsyntax-only

After LLVM IR code generation: clang -O0 -emit-llvm -S

After the middle-end optimizations (pick the level you want): clang -O<1|2|3|s|z> -emit-llvm -S

After assembly generation (i.e., see the textual representation of the assembly): clang -S

After the assembler (i.e., see the object file representation): clang -c

Table 1.4: Checking the results after each phase

Note

For the commands using -emit-llvm, you can use -c instead of -S if you want to see the binary representation of the LLVM IR, called bitcode, instead of its textual form.
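As a concrete illustration, here is a hypothetical session on a small test.c file (the file name and its content are placeholders):

$ echo 'int square(int x) { return x * x; }' > test.c
$ clang -E test.c                   # preprocessed source, printed to stdout
$ clang -O0 -emit-llvm -S test.c    # unoptimized textual LLVM IR in test.ll
$ clang -O2 -emit-llvm -S test.c    # optimized LLVM IR in test.ll
$ clang -S test.c                   # target assembly in test.s
$ clang -c test.c                   # object file in test.o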

LLVM also offers different tools to reproduce these steps. These tools have different purposes and levels of control, and we will explore them in due time.

Now, you know which components are involved in a compiler toolchain and which part of the LLVM infrastructure covers which component. You scratched the surface of the LLVM build system by building Clang and, in the process, gained a valuable tool to play with the different compilation stages.

Next, let us dive deeper into the LLVM build system by learning how to build the core LLVM project.

Building LLVM

This is where your journey as a backend developer starts: you will learn how to build the core LLVM project.

Instead of just dropping a bunch of commands for you to run (we will do some of that too, we promise), you will discover the most relevant knobs that you can use to tailor the build process to your needs.

We believe this is important knowledge to gain as it will help you optimize your development process and increase your productivity by focusing on what you need to build/run for your use cases.

To set the context, the core LLVM project contains all the necessary pieces to build an optimizing backend from LLVM IR down to assembly code/an object file for 20+ different architectures. This is a lot of code and chances are that you do not care about all these architectures. Therefore, at the very least, learning how to build only the ones you care about will save you compile time and down the road will improve your development speed.

Configuring the build system

LLVM’s official build system is CMake, and everything you know about CMake applies here. If you do not know about CMake, do not worry, we will cover enough to get you going.

CMake comes with some built-in variables that can be used to customize some key aspects of the build process. You will recognize these because their name starts with CMAKE_. We will not go over all of them but instead mention the most useful ones in this context. You can learn more about their meaning or discover new ones by looking directly at the CMake documentation (https://cmake.org/documentation/).

CMake also supports command-line options, but for all intents and purposes, we will mention only three here:

-D<var>=<value>: This defines the value of a CMake variable.
-G<generatorName>: This generates a build system for the specified generator.
-C<pathToCacheFile>: This preloads a cache file; cache files are useful for sharing specific configurations and avoiding setting all the variables manually. In a nutshell, this is useful to pre-set some CMake variables.
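As an illustration of -C, here is a hypothetical cache file and how you would preload it (the file name and the chosen values are assumptions for this example; the variables themselves are described in the table at the end of this section):

$ cat my-config.cmake
set(CMAKE_BUILD_TYPE Release CACHE STRING "")
set(LLVM_TARGETS_TO_BUILD "X86;AArch64" CACHE STRING "")
set(LLVM_OPTIMIZED_TABLEGEN ON CACHE BOOL "")
$ cmake -GNinja -C my-config.cmake ${LLVM_SRC}/llvm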

With this knowledge, here is one of the simplest commands you can run from your build directory to configure LLVM’s build system:

$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ${LLVM_SRC}/llvm

Your system is now ready for development, albeit things are going to be slow:

1. All the ~20 non-experimental backends will be built.
2. Everything that is built will use the Debug configuration, meaning that the experience is centered around smooth debugging sessions.

Regarding Step 2, building for Debug, it may be exactly what you want while you develop the compiler, but this is not something you want the end users to experience!

Here is a list of knobs, all CMake variables, that you should use to speed things up:

Standard options

CMAKE_BUILD_TYPE (Debug): Build for a smooth debug experience: assertions enabled, optimizations disabled, debug info enabled. Produces a large and slow compiler.

CMAKE_BUILD_TYPE (Release): Build an optimized compiler: assertions disabled, optimizations enabled, debug info disabled. Produces a smaller and faster compiler.

CMAKE_C_COMPILER (<path>): Specify the path to the C compiler. This is particularly useful when bootstrapping or cross-compiling the compiler. We will not cover these topics, but at least you know where to look if you are interested in them.

CMAKE_CXX_COMPILER (<path>): Specify the path to the C++ compiler.

CMAKE_INSTALL_PREFIX (<path>): Specify where to install the final artifacts.

Faster build time

LLVM_TARGETS_TO_BUILD (Target1;...): Specify the list of backends to build (semicolon separated). Target1, and so on, must match the directory name of one of the backends in ${LLVM_SRC}/llvm/lib/Target. Defaults to the all special value, which builds all the ~20 non-experimental LLVM backends.

LLVM_OPTIMIZED_TABLEGEN (BOOL): Specify whether or not to build TableGen in optimized mode. We will cover TableGen in more detail in the dedicated chapter, but the gist of it is that unless you are developing a TableGen backend, you will likely want to set this variable to speed up your build.

Notably useful

BUILD_SHARED_LIBS (BOOL): Build libraries as shared libraries. This avoids the link steps for the different executables, but it means they are not self-contained anymore and you have to “ship” the shared libraries alongside them. For local development, this may be worth it, although the debug experience may not be that great.

LLVM_ENABLE_ASSERTIONS (BOOL): Enable or disable assertions. Using this option, you can, for instance, enable the assertions in a release build, which can be useful to diagnose some issues while not paying the price of a full debug build.

LLVM_ENABLE_PROJECTS (Project1;...): Build Project1, and so on, on top of the LLVM core.