The LLVM infrastructure is a popular compiler ecosystem widely used in the tech industry and academia. This technology is crucial for both experienced and aspiring compiler developers looking to make an impact in the field. Written by Quentin Colombet, a veteran LLVM contributor and architect of the GlobalISel framework, this book provides a primer on the main aspects of LLVM, with an emphasis on its backend infrastructure; that is, everything needed to transform the intermediate representation (IR) produced by frontends like Clang into assembly code and object files.
You’ll learn how to write an optimizing code generator for a toy backend in LLVM. The chapters will guide you step by step through building this backend while exploring key concepts, such as the ABI, cost model, and register allocation. You’ll also find out how to express these concepts using LLVM's existing infrastructure and how established backends address these challenges. Furthermore, the book features code snippets that demonstrate the actual APIs.
By the end of this book, you’ll have gained a deeper understanding of LLVM. The concepts presented are expected to remain stable across different LLVM versions, making this book a reliable quick reference guide for understanding LLVM.
You can read the e-book in Legimi apps or in any app that supports the following format:
Page count: 1056
Year of publication: 2025
LLVM Code Generation
A deep dive into compiler backend development
Quentin Colombet
LLVM Code Generation
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Portfolio Director: Kunal Chaudhari
Relationship Lead: Samriddhi Murarka
Project Manager: Ashwin Dinesh Kharwa
Content Engineer: Sujata Tripathi
Technical Editor: Rohit Singh
Copy Editor: Safis Editing
Indexer: Hemangini Bari
Production Designer: Vijay Kamble
Growth Lead: Vinishka Kalra
First published: May 2025
Production reference: 3100725
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83763-778-2
www.packtpub.com
To my wife, Luce, and my three sons, Clovis, Mathias, and Gabriel, for supporting and encouraging me throughout this project. I may not have been the most present dad during this time, and your patience with me has been noticed and appreciated. With love.
– Quentin Colombet
At first glance, it might seem like only a few people ever work on LLVM backends. After all, the number of backends in upstream LLVM is limited, and most of them are already stable and functioning well. So, why would they need significant changes?
In reality, LLVM has become the de facto standard for code generation—not just for CPUs but also for increasingly diverse compute engines such as GPUs and other accelerators. When a new CPU or accelerator needs code generation support, the default choice is often to adopt LLVM and implement a backend.
Furthermore, the existing upstream backends are under constant improvement. They’re regularly updated to support new CPU instructions, refine and enhance optimizations, introduce additional security-hardening features, and much more.
Beyond development in industry and by enthusiasts, LLVM is also a top choice in academia for systems and compiler research. Many innovations in performance tuning, security, and other areas require modifying LLVM backends to enable experimentation.
These are just a few scenarios where someone might need to create or modify LLVM backends—and I’m sure there are many more. But even considering just these three, it’s clear that thousands of developers need to do at least some LLVM backend development, and they need high-quality documentation to do it well.
While the LLVM project offers extensive documentation and tutorials, a gap remains in clearly documenting everything you need to know to become proficient in backend development.
Effectively working on LLVM backends often requires reverse engineering and internalizing their architecture. Historically, the most efficient way to learn has been to find an expert and engage in long, detailed conversations to piece everything together. Of course, not everyone has access to such experts. Even though the LLVM community makes a huge effort to share expert knowledge—through comprehensive documentation (https://llvm.org/docs/), hundreds of recorded talks and presentations (https://llvm.org/devmtg/), and programs such as office hours and online sync-ups (https://llvm.org/docs/GettingInvolved.html#office-hours)—the learning curve remains steep.
A few years ago, I had a conversation with Quentin at one of the LLVM Developers’ Meetings about this very topic. I thought then (and still do!) that Quentin is one of the most knowledgeable LLVM backend engineers out there. I was thinking out loud, “Wouldn’t it be amazing if all of your knowledge—especially about LLVM backend development—could be made easily available to the entire LLVM community? Just imagine how much easier and faster backend development would become if your insights were accessible in a book...”
I’m thrilled that our conversation helped inspire Quentin to write this fantastic book. It distills deep insights and practical knowledge into a single, well-organized resource—ideal for anyone starting or continuing their journey in LLVM backend development. I hope that everyone working on backends reads it, and that it fuels even more innovation and progress in the LLVM ecosystem.
Thank you, Quentin, for writing all this down!
Kristof Beyls
Senior Technical Director and Fellow, Arm
Quentin Colombet is a veteran LLVM contributor who focuses on the development of backends. He is the architect of the new instruction selection framework (GlobalISel) and code owner of the LLVM register allocators.
He has more than two decades of experience working on different compiler backends for various architectures (GPU, CPU, microcontroller, DSP, and ASIC, among others) and compiler frameworks (Open64, LLVM, IREE, and Glow, to name the main ones). He joined the LLVM project when he started at Apple in 2012 and has worked on the x86, AArch64, and Apple GPU backends and all the products that include these processing units. Since starting on the LLVM infrastructure, he has helped interns and new hires get up to speed with LLVM at Apple, Meta, and Google, and, more recently, at his own company, Brium, while contributing to the projects that use this technology at these companies.
I want to thank Bruno Cardoso Lopes, who inspired me to write this book and introduced me to the Packt team. Thank you to the Packt team for their support and continuous feedback, more specifically, Aditi Chatterjee, Ashwin Dinesh Kharwa, Samriddhi Murarka, and Sujata Tripathi, who all worked closely with me to make this project a reality. Thanks to my technical reviewer, Shuo Niu, who brought a different perspective to the book and helped me clarify the content, resulting in a better experience. And, of course, thank you to my wife, Luce, who encouraged me to get started on this project and supported me along the way.
Shuo Niu holds a Master of Engineering in computer engineering from the University of Toronto. With six years of experience in LLVM compiler development, specializing in middle-end and backend optimizations for FPGA HLS compilers, Shuo is now extending his expertise to building AI compilers for low-power AI chips. Committed to fostering a stronger LLVM community, Shuo also served as a technical reviewer for Learn LLVM 17, Second Edition.
Preface
Who this book is for
What this book covers
To get the most out of this book
Get in touch
Part 1: Getting Started with LLVM
Building LLVM and Understanding the Directory Structure
Getting the most out of this book – get to know your free benefits
Technical requirements
Getting ready for LLVM’s world
Prerequisites
Identifying the right version of the tools
Installing the right tools
Building a compiler
What is a compiler?
Opening Clang’s hood
Building Clang
Experimenting with Clang
Building LLVM
Configuring the build system
Crash course on Ninja
Building the core LLVM project
Testing a compiler
Crash course on the Google test infrastructure
Crash course on the LLVM Integrated Tester
Testing in lit
Directives
Describing the RUN command
The lit driver – llvm-lit
Crash course on FileCheck
FileCheck by example
LLVM unit tests
Finding the source of a test
Running unit tests manually
The unit tests pass, what now?
LLVM functional tests
The LLVM test suite
The functional tests fail – what do you do?
Understanding the directory structure
High-level directory structure
Focusing on the core LLVM project
A word on the include files
Private headers
What is the deal with <project>/include/<project>?
What is include/<project>-c?
Overview of some of the LLVM components
Generic LLVM goodness
Working with the LLVM IR
Generic backend infrastructure
Target-specific constructs
Summary
Quiz time
Contributing to LLVM
Reporting an issue
Engaging with the community
Reviewing patches
Contributing patches
Understanding patch contribution in a nutshell
Following up with your contribution
A word on adding tests
Summary
Quiz time
Compiler Basics and How They Map to LLVM APIs
Technical requirements
A word on APIs
Understanding compiler jargon
Target
Host
Lowering
Canonical form
Build time, compile time, and runtime
Backend and middle-end
Application binary interface
Encoding
Working with basic structures
Module
A module at the LLVM IR level
A module at the Machine IR level
Function
A function in the LLVM IR
A function in the Machine IR
Basic block
A basic block in the LLVM IR
A basic block in the Machine IR
Instruction
An instruction in the LLVM IR
An instruction in the Machine IR
Control flow graph
Reverse post-order traversal
Backedge
Critical edge
Irreducible graph
Building your first IRs
Building your first LLVM IR
A walk over the required APIs
Your turn
Building your first Machine IR
A walk over the required APIs
Your turn
Summary
Quiz time
Writing Your First Optimization
Technical requirements
The concept of value
SSA
Constructing the SSA form
Dominance
Def-use and use-def chains
Def-use and use-def chains in the LLVM IR
Def-use and use-def chains in the Machine IR
Tackling optimizations
Legality
Integer overflow/underflow
Fast-math flags
Side effects
Profitability
Instruction lowering – TargetTransformInfo and TargetLowering
Library support – TargetLibraryInfo
Datatype properties – DataLayout
Register pressure
Basic block frequency
More precise instruction properties – scheduling model and instruction description
Transformation jargon
Instcombine
Fixed point
Liveness
Hoisting
Sinking
Folding
Loops
Terminology
Preheader
Header
Exiting block
Latch
Exit block
Where to get loop information
Writing a simple constant propagation optimization
The optimization
Simplifying assumptions
Missing APIs
The Constant class
The APInt class
Creating a constant
Replacing a value
Your turn
Going further
Legality
Profitability
Propagating constants across types
Summary
Quiz time
Dealing with Pass Managers
Technical requirements
What is a pass?
What is a pass manager?
The legacy and new pass manager
Pass managers’ capabilities
Populating a pass manager
Inner workings of pass managers
Creating a pass
Writing a pass for the legacy pass manager
Using the proper base class
Expressing the dependencies of a pass
Preserving analyses
Specificities of the Pass class
Writing a pass for the new pass manager
Implementing the right method
Registering an analysis
Describing the effects of your pass
Inspecting the pass pipeline
Available developer tools
Plumbing up the information you need
Interpreting the logs of pass managers
The pass pipeline structure
Time profile
Your turn
Writing your own pass
Writing your own pass pipeline
Summary
Further reading
Quiz time
TableGen – LLVM Swiss Army Knife for Modeling
Technical requirements
Getting started with TableGen
The TableGen programming language
Types
Programming with TableGen
Defining multiple records at once
Assigning fields
Discovering a TableGen backend
General information on TableGen backends for LLVM
Discovering a TableGen backend
The implementation of intrinsics
The content of a generated file
The source of a TableGen backend
Debugging the TableGen framework
Identifying the failing component
Cracking open a TableGen backend
Summary
Further reading
Quiz time
Part 2: Middle-End: LLVM IR to LLVM IR
Understanding LLVM IR
Technical requirements
Understanding the need for an IR
What an IR is
Why use an IR?
Introducing LLVM IR
Identifiers
Functions
Basic blocks
Instructions
Types
Single-value types
The label type
Aggregate types
Types in the LLVM IR API
Walking through an example
Target-specific elements in LLVM IR
Intrinsic functions
Triple
Function attributes
Data layout
Application binary interface
Textual versus binary format
LLVM IR API – cheat sheet
Summary
Further reading
Quiz time
Survey of the Existing Passes
Technical requirements
How to find the unknown
Leveraging opt
Using the LLVM code base
Starting from the implementation
Survey of the helper passes
The verifier
The printer
Analysis passes
Target transformation information
Loop information
Alias analysis
Block frequency info
Dominator tree information
Value tracking
Canonicalization passes
The instruction combiner
An example of a canonical rewrite
An example of an optimization
How to use instcombine
The memory to register rewriter
The converter to loop-closed-SSA form
Optimization passes
Interprocedural optimizations
Scalar optimizations
Vectorization
Summary
Further reading
Quiz time
Introducing Target-Specific Constructs
Technical requirements
Adding a new backend in LLVM
Connecting your target to the build system
Registering your target with Clang
Adding a new architecture to the Triple class
Populating the Target instance
Plumbing your Target through Clang
Creating your own intrinsics
The pros and cons of intrinsics
Creating an intrinsic in the backend
Defining our intrinsics
Hooking up the TableGen backend
Teaching LLVM IR about our intrinsics
Connecting an intrinsic to Clang
Writing the .def file by hand
Using the TableGen capabilities
Hooking up the built-in information
Establishing the code generation link
Adding a target-specific TargetTransformInfo implementation
Establishing a connection to your target-specific information
Introducing target-specific costs
Customizing the default middle-end pipeline
Using the new pass manager
Using the legacy pass manager
A one-time setup – assembling a codegen pipeline
Faking the instruction selector
Faking the lowering of the object file
Creating a skeleton for the assembly information
Using the right abstraction
Summary
Further reading
Quiz time
Hands-On Debugging LLVM IR Passes
Technical requirements
The logging capabilities in LLVM
Printing the IR between passes
Printing the debug log
Printing high-level information about what happened
Reducing the input IR size
Extracting a subset of the input IR
Shrinking the IR automatically
Using sanitizers
A crash course on LLDB
Starting a debugging session
Controlling the execution
Stopping the program
Command resolution
Resuming the execution
Inspecting the state of a program
The LLVM code base through a debugger
Summary
Further reading
Quiz time
Part 3: Introduction to the Backend
Getting Started with the Backend
Technical requirements
Introducing the Machine IR
Here comes the Machine IR
The Machine IR textual representation
The .mir file format
A primer on the YAML syntax
The semantics of the different fields
Mapping the content of a .mir file to the C++ API
A deep dive into the body of a MachineFunction instance
Working with a .mir file
Generating a .mir file
Running passes
Shrinking a .mir file
The anatomy of a MachineInstr instance
Introducing the MC layer
Working with MachineOperand instances
Unboxing a MachineOperand instance
Dealing with explicit and implicit operands
Understanding the constraints of an operand
Working with registers
The concept of the register class
The concept of sub-registers
The concept of register tuples
The concept of register units
The registers and SSA and non-SSA forms
Interacting with registers in the debugger
Creating MachineInstr objects
Describing registers
Writing the target description
Describing instructions
Summary
Further reading
Quiz time
Getting Started with the Machine Code Layer
Technical requirements
The use of the MC layer
Connecting the MC layer
What instructions to describe
Augmenting the target description with MC information
Defining the MC layer for the registers
Defining the MC layer for the instructions
Enabling MC-based tools
Leveraging TableGen
Implementing the missing pieces
Implementing your own MCInstPrinter class
Implementing your own MCCodeEmitter class
Implementing your own XXXAsmParser class
Summary
Quiz time
The Machine Pass Pipeline
Technical requirements
The Machine pass pipeline at a glance
Injecting passes
Using the generic Machine optimizations
Generic passes worth mentioning
The CodeGenPrepare pass
The PeepholeOptimizer pass
The MachineCombiner pass
Summary
Further reading
Quiz time
Part 4: LLVM IR to Machine IR
Getting Started with Instruction Selection
Technical requirements
Overview of the instruction selection frameworks
How does instruction selection work?
Framework complementarity
Overall differences between the selectors
Compile time
Modularity and testability
Scope
Which selector to use?
FastISel
SDISel
GlobalISel
Selectors’ inner workings
Understanding the DAG representation
Textual representation of the SelectionDAG class
Manipulating a DAG
Understanding the generic Machine IR
Textual representation of generic attributes
Lowering constraints of the generic Machine IR
APIs to work with the generic Machine IR
Groundwork to connect the codegen pipeline
Instantiating the codegen pass pipeline
Providing the key target APIs to the codegen pipeline
Connecting SDISel to the codegen pipeline
Connecting FastISel to the codegen pipeline
Connecting GlobalISel to the codegen pipeline
Choosing between different selectors
Summary
Further reading
Quiz time
Instruction Selection: The IR Building Phase
Technical requirements
Overview of the IR building
Describing the calling convention
Writing your target description of the calling convention
Connecting the gen-callingconv TableGen backend
Anatomy of the CCValAssign class
Lowering the ABI with SDISel
Implementing the lowering of formal arguments
Providing custom description for the SDNode class
Handling of stack locations
Lowering the ABI with FastISel
Lowering the ABI with GlobalISel
Summary
Further reading
Quiz time
Instruction Selection: The Legalization Phase
Technical requirements
Legalization overview
Legalization actions
Legalization in SDISel
Describing your legal types
Describing your legalization actions
Implementing a custom legalization action
Legalization in GlobalISel
Describing your legalization actions with the LegalizeRuleSet class
Custom legalization in GlobalISel
Summary
Quiz time
Instruction Selection: The Selection Phase and Beyond
Technical requirements
Register bank selection
The goal of the register bank selection
Describing the register banks
Implementing your RegisterBankInfo class
Instruction selection
Expressing your selection patterns
Introduction to the selection patterns
Advanced selection patterns
Selection in SDISel
Selection in FastISel
Selection in GlobalISel
Setting up the InstructionSelector class
Importing the selection patterns
Going beyond patterns
Finalizing the selection pipeline
Using custom inserters
Customizing the TargetLowering::finalizeLowering method
Optimizations
Using the DAGCombiner framework
Leveraging the combiner framework
Debugging the selectors
Debugging SDISel
Debugging the GlobalISel match table
Summary
Quiz time
Part 5: Final Lowering and Optimizations
Instruction Scheduling
Technical requirements
Overview of the instruction scheduling framework
The ScheduleDAGInstrs class
Changing the scheduling algorithm
The scheduling model
The scheduling events
The processing units
The scheduling bindings
Gluing everything together
Implementing your scheduling model
Connecting your scheduling model
Describing a processor model
Instantiating your subtarget
Guidelines to get started with your scheduling model
Summary
Quiz time
Register Allocation
Technical requirements
Overview of register allocation in LLVM
Enabling the register allocation infrastructure
Introducing the slot indexes
Introducing the live intervals
Maintaining the live intervals
Summary
Further reading
Quiz time
Lowering of the Stack Layout
Technical requirements
Overview of stack lowering
Handling of stack slots
From frame index to stack slot
The lowering of the stack frame
Introducing the reserved call frame
Implementing the frame-lowering target hooks
The expansion of the frame indices
Introducing register scavenging
Provisioning an emergency spill slot
Expanding the frame indices
Summary
Quiz time
Getting Started with the Assembler
Technical requirements
Overview of the lowering of a textual assembly file
Assembling with the LLVM infrastructure
Implementing an assembler
Providing the MCCodeEmitter class
Handling the fixups with the MCAsmBackend class
Recording the relocations with the MCObjectTargetWriter class
Summary
Further reading
Quiz time
Unlock Your Book’s Exclusive Benefits
How to unlock these benefits in three easy steps
Need help?
Other Books You May Enjoy
Index
In this part, we start with an introduction to the LLVM ecosystem, its community, and the various parts that make up the LLVM infrastructure.
This part assumes that you have no prior experience with LLVM and little to no experience with compilers.
More specifically, in this part, you will learn the following:
How to set up your environment to build and test the different projects that the LLVM infrastructure offers
How to interact with the LLVM community and, in particular, how to seek help and contribute
About the basic concepts used in compilers and how to manipulate them through the LLVM application programming interfaces (APIs)
How to write your first optimization pass and the things to consider while optimizing your program
How to build and customize your optimization pipeline
How TableGen, LLVM’s domain-specific language (DSL), fits into the LLVM infrastructure
By the end of this part, you will have a complete picture of the overall structure of the LLVM infrastructure and will be ready to dive into its inner workings.
This part of the book includes the following chapters:
Chapter 1, Building LLVM and Understanding the Directory Structure
Chapter 2, Contributing to LLVM
Chapter 3, Compiler Basics and How They Map to the LLVM APIs
Chapter 4, Writing Your First Optimization
Chapter 5, Dealing with the Pass Managers
Chapter 6, TableGen - The LLVM Swiss Army Knife for Modeling
The LLVM infrastructure provides a set of libraries that can be assembled to create different tools and compilers.
LLVM originally stood for Low-Level Virtual Machine. Nowadays, it is much more than that, as you will shortly learn, and people just use LLVM as a name.
Given the sheer volume of code that makes up the LLVM repository, it can be daunting to even know where to start.
In this chapter, we will give you the keys to approach and use this code base confidently. Using this knowledge, you will be able to do the following:
Understand the different components that make up a compiler
Build and test the LLVM project
Navigate LLVM’s directory structure and locate the implementation of different components
Contribute to the LLVM project
This chapter covers the basics needed to get started with LLVM. If you are already familiar with the LLVM infrastructure or followed the tutorial from the official LLVM website (https://llvm.org/docs/GettingStarted.html), you can skip it. You can, however, check the Quiz time section at the end of the chapter to see whether there is anything you may have missed.
Unlock exclusive free benefits that come with your purchase, thoughtfully crafted to supercharge your learning journey and help you learn without limits.
Here’s a quick overview of what you get with this book:
Figure 1.1: Illustration of the next-gen Packt Reader’s features
Our web-based reader, designed to help you learn effectively, comes with the following features:
Multi-device progress sync: Learn from any device with seamless progress sync.
Highlighting and notetaking: Turn your reading into lasting knowledge.
Bookmarking: Revisit your most important learnings anytime.
Dark mode: Focus with minimal eye strain by switching to dark or sepia mode.
Figure 1.2: Illustration of Packt’s AI assistant
Our interactive AI assistant has been trained on the content of this book, so it can help you out if you encounter any issues. It comes with the following features:
Summarize it: Summarize key sections or an entire chapter.
AI code explainers: In the next-gen Packt Reader, click the Explain button above each code block for AI-powered code explanations.
Note: The AI assistant is part of next-gen Packt Reader and is still in beta.
Figure 1.3: Free PDF and ePub
Learn without limits with the following perks included with your purchase:
Learn from anywhere with a DRM-free PDF copy of this book.
Use your favorite e-reader to learn using a DRM-free ePub version of this book.
Take a moment to get the most out of your purchase and enjoy the complete learning experience.
https://www.packtpub.com/unlock/9781837637782
Note: Have your purchase invoice ready before you begin.
To work with the LLVM code base, you need specific tools on your system. In this section, we list the required versions of these tools for the latest major LLVM release: 20.1.0.
Later, in Identifying the right version of the tools, you will learn how to find the version of the tools required to build a specific version of LLVM, including older and newer releases and the LLVM top-of-tree (that is, the actively developed repository). Additionally, you will learn how to install them.
With no further ado, here are the versions of the tools required for LLVM 20.1.0:
Tool
Required version
Git
None specified
C/C++ toolchain
>=Clang 5.0
>=Apple Clang 10.0
>=GCC 7.4
>=Visual Studio 2019 16.8
CMake
>=3.20.0
Ninja
None specified
Python
>=3.8
Table 1.1: Tools required for LLVM 20.1.0
Furthermore, this book comes with scripts, examples, and more that will ease your journey of learning the LLVM infrastructure. We will list the relevant content in the related sections, but remember that the repository lives at https://github.com/PacktPublishing/LLVM-Code-Generation.
In the Technical requirements section, we already listed which versions of the tools you need to work with LLVM 20.1.0. However, LLVM is a lively project, and what is required today may differ from what is required tomorrow. Also, to step back a bit, you may not know why you need these tools in the first place and/or how to get them.
This section addresses these questions, and you will learn the following in the process:
The purpose of each required tool
How to check that your environment has the proper tools
How to install the proper tools
Depending on how familiar you are with development on Linux/macOS, this setup can be tedious or a walk in the park.
Ultimately, this section aims to teach you how to go beyond a fixed release of LLVM by giving you the knowledge required to find the information you need.
If you are familiar with package managers (e.g., the apt-get command-line tool on Linux and Homebrew (https://brew.sh) on macOS), you can skip this part and directly install Git, Clang, CMake, Ninja, and Python through them. On Windows, if you do not have a package manager, the steps provided here are all manual; as long as you pick the Windows binary distribution of each tool, it should just work. Alternatively, on Windows, you may be better off installing these tools through Visual Studio Code (VS Code) (https://code.visualstudio.com) via VS Code extensions.
In any case, you might want to double-check which version of these tools you need by going through the Identifying the right version of the tools section.
As mentioned previously, you need a set of specific tools to build the LLVM code base. This section summarizes what each of these tools does and how they work together to build the LLVM project.
This list of tools is as follows:
Git: The software used for the version control of LLVM
A C/C++ toolchain: The LLVM code base is in C/C++, and as such, we will need a toolchain to build that type of code
CMake: The software used to configure the build system
Ninja: The software used to drive the build system
Python: The scripting language and execution environment used for testing
Figure 1.1 illustrates how the different tools work together to build an LLVM compiler:
Figure 1.1: The essential command-line tools to build an LLVM compiler
Breaking this figure down, here are the steps it takes:
Git retrieves the source code.
CMake generates the build system for a particular driver, such as Ninja, and a particular C/C++ toolchain.
Ninja drives the build process.
The C/C++ toolchain builds the compiler.
Python drives the execution of the tests.
The required version of these tools depends on the version of LLVM you are building. For instance, see the Technical requirements section for the latest major release of LLVM, 20.1.0.
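Putting these steps together, a minimal command-line session might look as follows. The shallow clone, the build directory name, and the CMake options shown here are illustrative choices, not the only valid setup; later sections of this chapter cover the build and test steps in detail:

```shell
$ git clone --depth 1 https://github.com/llvm/llvm-project.git              # 1. Git retrieves the source code
$ cmake -S llvm-project/llvm -B build -G Ninja -DCMAKE_BUILD_TYPE=Release   # 2. CMake generates the Ninja build files
$ ninja -C build                                                            # 3-4. Ninja drives the C/C++ toolchain
$ ninja -C build check-llvm                                                 # 5. Python (via lit) runs the tests
```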
To check the required version for a specific release, check out the Getting Started page of the documentation for this release. To get there, perform the following steps:
Go to https://releases.llvm.org/.
Scroll down to the Download section.
In the documentation column, click on the link named llvm or docs for the release you are interested in. For instance, release 20.1.0 should bring you to a URL such as https://releases.llvm.org/20.1.0/docs/index.html.
Scroll down to the Documentation section.
Click on Getting Started/Tutorials.
Find the Software and the Host C++ Toolchain[...] sections. For instance, for release 20.1.0, the Software section lives at https://releases.llvm.org/20.1.0/docs/GettingStarted.html#software.
To find the requirements for LLVM top-of-tree, simply follow the same steps but with the release named Git. This release should have a release date of Current.
You learned how to identify which version of the tools you need to have to be able to work with LLVM. Now, let’s see how to install these versions.
Note
Ninja is the preferred driver of the build system of LLVM. However, LLVM also supports other drivers such as Makefile (the default), Xcode, and, to some extent, Bazel. Feel free to choose what works best for you.
Depending on your operating system (OS), you may already have all the necessary tools installed. You can use the following commands to check which versions of the tools are installed and whether they meet the minimum requirements that we described in the previous section:
Tool
Checking the availability
Git
git --version
C/C++ toolchain (LLVM)
clang --version
CMake
cmake --version
Ninja
ninja --version
Python
python3 --version
Table 1.2: Commands to check the installed versions of the required tools
If any of the commands from this table fails or if any of the versions do not meet the minimum requirements, you will have to install/update the related tools.
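If you want to automate this check, the table above can be sketched as a small shell loop. This is an illustrative snippet, not from the book; it assumes the tools go by the command names shown in the table (in particular, python3 for Python):

```shell
# Illustrative sketch: report which of the required tools are on PATH
# and print the first line of each tool's --version output.
for tool in git clang cmake ninja python3; do
  if command -v "$tool" >/dev/null 2>&1; then
    printf '%-10s %s\n' "$tool" "$("$tool" --version 2>/dev/null | head -n 1)"
  else
    printf '%-10s NOT FOUND\n' "$tool"
  fi
done
```

You still have to compare the reported versions against the minimum requirements yourself, but this gives you the full picture in one command.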
Assuming you are missing some of the tools, here are the steps to install them from the official websites. Feel free to use your own package manager if you do not want to do this manually.
In a nutshell, you need to do the following:
1. Go to the official website for the tool.
2. Go to the Downloads page.
3. Download the proper package for your OS.
4. Unpack/install the package to a location of your choice.

The official websites are as follows:
Tool                      Where to get it
Git                       https://git-scm.com/downloads or https://git-scm.com, then click on Downloads
C/C++ toolchain (LLVM)    https://releases.llvm.org or https://www.llvm.org, then click on All Releases
CMake                     https://cmake.org/download/ or https://cmake.org/, then click on Downloads
Ninja                     https://github.com/ninja-build/ninja/releases or https://ninja-build.org, then click on download the Ninja binary
Python                    https://www.python.org/downloads/ or https://www.python.org, then click on Downloads

Table 1.3: Websites where you can find the required tools
Note that, on macOS, Git and Clang come with the Xcode CLI package. To install them on this OS, please run the following command:
$ xcode-select --install
To make things easier, you will find a script that can help you set up the environment for macOS in the ch1 directory of the Git repository of this book.
If you do not have Git, you can get this script with the following command:
$ curl --location https://raw.githubusercontent.com/PacktPublishing/LLVM-Code-Generation/main/ch1/setup_env.sh --output setup_env.sh

If you have Git, simply run the following command:
$ git clone https://github.com/PacktPublishing/LLVM-Code-Generation.git
$ cd LLVM-Code-Generation/ch1

After you get the script one way or another, run the following command:
$ bash setup_env.sh ${INSTALL_PREFIX}

INSTALL_PREFIX is the path where you want the tools to be installed.
At this point, you know how to identify the required version of the tools to build LLVM. You also acquired a basic understanding of how these tools interact with each other during the build process.
From this point forward, we will assume that you have all the necessary tools available at one of the directories recorded in the PATH environment variable. In other words, you can use these tools without having to explicitly set their path on the command line.
Now that we have taken care of the setup of the environment, we can start playing with LLVM.
In this section, we will introduce the different parts of what makes a compiler and how they relate to the LLVM code base. In the process, you will do the following:
- Understand the overall architecture of a compiler
- Learn how to build Clang from the source
- Be able to decide which components of LLVM you need to build

If you are already familiar with the components of a compiler toolchain and want to jump straight into the action, skip directly to the Building LLVM section.
The definition of a compiler means different things to different people. For instance, a first-year computer science student may see a compiler as a tool that translates a source language into executable code. This is a possible definition, but it is also a very coarse-grained one.
When you look closer at a compiler, you will find that it is a collection of different tools, or libraries, working together to achieve this translation. That’s why we talk about a compiler toolchain.
To go back to the previous coarse-grained definition, a compiler such as Clang is really a compiler driver: it invokes the different tools in the right order and pulls the related dependencies from the standard library to produce the final executable code.
The LLVM code base reflects the composability of these tools. It is organized as a set of libraries that you can use to build a variety of tools and, in particular, a compiler toolchain.
To get a better understanding of which tools are right to build for your particular project, let us see which components are involved with a concrete example: Clang.
To build an executable from a C file, Clang, a C/C++ compiler built on top of LLVM, orchestrates three different components: the frontend, the backend, and the linker. Additionally, Clang has to pull in dependencies that are expected by the system/language, such as the standard library, so that the following happens:
- The frontend has access to the standard headers, for instance, what the prototype of the printf function is.
- The linker has access to the standard implementations, for instance, the actual implementation of printf.

The following picture gives a high-level view of the different parts of a compiler and the different LLVM projects involved in building such a compiler.
Figure 1.2: The different components of a compiler
When building a C file, Clang acts as a driver for a series of tools. It invokes the frontend (Clang project in LLVM), then passes down the result to the backend (LLVM project) that produces an object file that gets linked with the standard library (the libc project in LLVM) by the linker (the lld project in LLVM).
The takeaway is that building Clang alone will not be enough to have a properly functioning compiler. To get there, you will need to build at least the linker and the standard library, which come respectively under the lld and the libc/libcxx projects in LLVM. Otherwise, your compiler toolchain will have to rely on what the host provides.
Note
You may have noticed that we did not mention the frontend and backend in this list. This is because, when building the Clang project, these are always included.
In any case, the focus of this book is LLVM backends, so, why are we spending so much time on Clang?
The reason is simple: Clang offers a familiar way to interact with LLVM constructs. By using the Clang frontend, you will be able to generate the LLVM intermediate representation (IR) by simply writing C/C++. We believe this is a gentler way to start your journey with LLVM backends.
As we progress through the book, we will have fewer and fewer C/C++ inputs and more and more LLVM IR ones.
As already mentioned, here, we are only interested in Clang’s frontend capabilities. As such, the following instructions focus only on building this part of LLVM. You will learn more about the possible customizations of the build system in the Building LLVM section.
Assuming LLVM_SRC is the path where you want to have the LLVM source code and CLANG_BUILD is the path where you want the build of Clang to happen, please run the following:
$ git clone https://github.com/llvm/llvm-project.git ${LLVM_SRC}
$ mkdir -p ${CLANG_BUILD}
$ cd ${CLANG_BUILD}
$ cmake -DLLVM_ENABLE_PROJECTS=clang -GNinja -DCMAKE_BUILD_TYPE=Release ${LLVM_SRC}/llvm
$ ninja clang

This will check out the LLVM sources from GitHub, create a build directory, move there, configure the build system for building Clang with Ninja, and finally, build Clang.
If you run into any issues, make sure you have all the required tools in PATH (see the Installing the Right Tools section).
When the build finishes, you should have a shiny new clang executable at ${CLANG_BUILD}/bin.
If you ever look deeper into Clang, you will find out that it is composed of many more phases than the frontend, backend, and linker. By playing with Clang’s command-line options, you can expose the intermediate results of some of these phases.
Here is the list of these phases:
- Frontend: This validates that the input file is syntactically and semantically correct and produces the LLVM IR. It is itself composed of the following:
  - Preprocessor: This expands macros (e.g., #include).
  - Sema: This validates the syntax and semantics of the program.
  - Codegen: This produces the LLVM IR.
- Backend: This translates the LLVM IR to target-specific instructions through the following:
  - Middle-end optimizations: LLVM IR to LLVM IR optimizations.
  - Assembly generation: Target-specific IR to assembly code.
- Assembler: This translates assembly code to an object file.

Here are the options to inspect their results:
To stop                                         Command
After the preprocessor                          clang -E
After syntax checking                           clang -fsyntax-only
After LLVM IR code generation                   clang -O0 -emit-llvm -S
After the middle-end optimizations              clang -O<1|2|3|s|z> -emit-llvm -S
(pick the level you want)
After assembly generation                       clang -S
(i.e., see the textual representation
of the assembly)
After the assembler                             clang -c
(i.e., see the object file representation)

Table 1.4: Checking the results after each phase
Note
For the commands using -emit-llvm, you can use -c instead of -S if you want to see the binary representation of the LLVM IR, called bitcode, instead of its textual form.
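To make this concrete, here is an illustrative shell session that exercises these options on a minimal C file. It assumes a clang executable is on your PATH; the file and output names are just examples:

```shell
# Illustrative sketch: stop the compilation of a tiny C file after each phase.
# Assumes clang is on PATH; skip gracefully otherwise.
command -v clang >/dev/null 2>&1 || { echo "clang not found; skipping"; exit 0; }

cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }
EOF

clang -E hello.c -o hello.i                      # stop after the preprocessor
clang -fsyntax-only hello.c                      # stop after syntax checking
clang -O0 -emit-llvm -S hello.c -o hello.ll      # textual LLVM IR, unoptimized
clang -O2 -emit-llvm -S hello.c -o hello.opt.ll  # IR after middle-end optimizations
clang -S hello.c -o hello.s                      # textual assembly
clang -c hello.c -o hello.o                      # object file
```

Comparing hello.ll against hello.opt.ll is a quick way to see what the middle-end optimizations did to your code.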
LLVM also offers different tools to reproduce these steps. These tools have different purposes and levels of control, and we will explore them in due time.
Now, you know which components are involved in a compiler toolchain and which part of the LLVM infrastructure covers which component. You scratched the surface of the LLVM build system by building Clang and, in the process, gained a valuable tool to play with the different compilation stages.
Next, let us dive deeper into the LLVM build system by learning how to build the core LLVM project.
This is where your journey as a backend developer starts: you will learn how to build the core LLVM project.
Instead of just dropping a bunch of commands for you to run (we will do some of that too, we promise), you will discover the most relevant knobs that you can use to tailor the build process to your needs.
We believe this is important knowledge to gain as it will help you optimize your development process and increase your productivity by focusing on what you need to build/run for your use cases.
To set the context, the core LLVM project contains all the necessary pieces to build an optimizing backend from LLVM IR down to assembly code/an object file for 20+ different architectures. This is a lot of code and chances are that you do not care about all these architectures. Therefore, at the very least, learning how to build only the ones you care about will save you compile time and down the road will improve your development speed.
LLVM’s official build system is CMake, and everything you know about CMake applies here. If you do not know about CMake, do not worry, we will cover enough to get you going.
CMake comes with some built-in variables that can be used to customize some key aspects of the build process. You will recognize these because their name starts with CMAKE_. We will not go over all of them but instead mention the most useful ones in this context. You can learn more about their meaning or discover new ones by looking directly at the CMake documentation (https://cmake.org/documentation/).
CMake also supports command-line options, but for all intents and purposes, we will mention only three here:

- -D<var>=<value>: This defines the value of a CMake variable.
- -G<generatorName>: This generates a build system for the specified generator.
- -C<pathToCacheFile>: This preloads a cache file; cache files are useful for sharing specific configurations and avoiding setting all the variables manually. In a nutshell, this is useful to pre-set some CMake variables.

With this knowledge, here is one of the simplest commands you can run from your build directory to configure LLVM’s build system:
-D<var>=<value>: This defines the value of a CMake variable.-G<generatorName>: This generates a build system for the specified generator.-C<pathToCacheFile>: This preloads a cache file; cache files are useful for sharing specific configurations and avoiding setting all the variables manually. In a nutshell, this is useful to pre-set some CMake variables.With this knowledge, here is one of the simplest commands you can run from your build directory to configure the LLVM’s build system:
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ${LLVM_SRC}/llvm

Your system is now ready for development, albeit things are going to be slow:
- All the ~20 non-experimental backends will be built.
- Everything that is built will use the Debug configuration, meaning that the experience is centered around smooth debugging sessions.

Regarding the second point, building for Debug may be exactly what you want while you develop the compiler, but this is not something you want the end users to experience!
Here is a list of knobs, all CMake variables, that you should use to speed things up:
Standard options

- CMAKE_BUILD_TYPE (Debug): Build for a smooth debug experience: assertions enabled, optimizations disabled, and debug info enabled. Produces a large and slow compiler.
- CMAKE_BUILD_TYPE (Release): Build an optimized compiler: assertions disabled, optimizations enabled, and debug info disabled. Produces a smaller and faster compiler.
- CMAKE_C_COMPILER (<path>): Specify the path to the C compiler. This is particularly useful when bootstrapping or cross-compiling the compiler. We will not cover these topics, but at least you know where to look if you are interested in them.
- CMAKE_CXX_COMPILER (<path>): Specify the path to the C++ compiler.
- CMAKE_INSTALL_PREFIX (<path>): Specify where to install the final artifacts.

Faster build time

- LLVM_TARGETS_TO_BUILD (Target1;...): Specify the list of backends to build (semicolon separated). Target1, and so on, must match the directory name of one of the backends in ${LLVM_SRC}/llvm/lib/Target. Defaults to the all special value, which builds all the ~20 non-experimental LLVM backends.
- LLVM_OPTIMIZED_TABLEGEN (BOOL): Specify whether or not to build TableGen in optimized mode. We will cover TableGen in more detail in the dedicated chapter, but the gist of it is that unless you are developing a TableGen backend, you will likely want to set this variable to speed up your build.

Notably useful

- BUILD_SHARED_LIBS (BOOL): Build libraries as shared libraries. This avoids the link steps for the different executables, but it means they are not self-contained anymore and you have to “ship” the shared libraries alongside them. For local development, this may be worth it, although the debug experience may not be as good.
- LLVM_ENABLE_ASSERTIONS (BOOL): Enable or disable assertions. Using this option, you can, for instance, enable the assertions in a release build, which can be useful to diagnose some issues while not paying the price of a full debug build.
- LLVM_ENABLE_PROJECTS (Project1;...): Build Project1, and so on, on top of the LLVM core.
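Putting a few of these knobs together, a typical development configuration might look like the following. This is an illustrative invocation, not a prescription: the target list is an example, and ${LLVM_SRC} is assumed to point at your llvm-project checkout.

```shell
# Illustrative configuration: an optimized build with assertions enabled,
# restricted to the X86 and AArch64 backends, with TableGen built optimized.
# Run from your (empty) build directory.
cmake -GNinja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_TARGETS_TO_BUILD="X86;AArch64" \
  -DLLVM_OPTIMIZED_TABLEGEN=ON \
  ${LLVM_SRC}/llvm
```

Compared to the plain Debug configuration shown earlier, this keeps the assertion-based sanity checks while avoiding the cost of building all ~20 backends unoptimized.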