Hands-On Concurrency with Rust - Brian L. Troutwine - E-Book


Brian L. Troutwine

Description

Most programming languages can really complicate things, especially with regard to unsafe memory access. The burden on you, the programmer, lies across two domains: understanding the modern machine and your language's pain-points. This book will teach you how to manage program performance on modern machines and build fast, memory-safe, and concurrent software in Rust. It starts with the fundamentals of Rust and discusses machine architecture concepts. You will be taken through ways to measure and improve the performance of Rust code systematically and how to write collections with confidence. You will learn about the Sync and Send traits applied to threads, and coordinate thread execution with locks, atomic primitives, data-parallelism, and more.

The book will show you how to efficiently embed Rust in C++ code and explore the functionalities of various crates for multithreaded applications. It explores implementations in depth. You will know how a mutex works and build several yourself. You will master radically different approaches that exist in the ecosystem for structuring and managing high-scale systems.

By the end of the book, you will feel comfortable with designing safe, consistent, parallel, and high-performance applications in Rust.

Page count: 563

Year of publication: 2018




Hands-On Concurrency with Rust
Confidently build memory-safe, parallel, and efficient software in Rust
Brian L. Troutwine
BIRMINGHAM - MUMBAI

Hands-On Concurrency with Rust

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Aaron Lazar
Acquisition Editor: Chaitanya Nair
Content Development Editor: Akshada Iyer
Technical Editor: Mehul Singh
Copy Editor: Safis Editing
Project Coordinator: Prajakta Naik
Proofreader: Safis Editing
Indexer: Mariammal Chettiyar
Graphics: Jisha Chirayil
Production Coordinator: Aparna Bhagat

First published: May 2018

Production reference: 1290518

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78839-997-5

www.packtpub.com

For my parents, who taught me how to be. For Hope, with whom life is more joyful than I ever imagined. For Oliver, who was there at the start. He saw what there was to see.
mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Brian L. Troutwine is a software engineer with an interest in low-latency and high-scale software. He has worked at Rackspace Hosting, AdRoll, and Postmates. As this, his first book, is published, he will be starting at Unity Technologies.

He has delivered the following talks:

The Charming Genius of the Apollo Guidance Computer

Getting Uphill on a Candle: Crushed Spines, Detached Retinas, and One Small Step

10 Billion a Day, 100 Milliseconds Per: Monitoring Real-Time Bidding at AdRoll

Fetching Moths from the Works: Correctness Methods in Software

Build Good Software: Of Politics and Method

I'd like to thank Tom Santero for reading early drafts of this book. I'd like also to thank Miriam Pena, Sonny Scroggin, Irina Guberman, and Brian Mitchell for talking through what the book was and helping me find, thereby, what it could be. Special thanks to Chris Markiewicz for being an excellent roommate in high school. 

About the reviewer

Daniel Durante is an avid coffee drinker/roaster, motorcyclist, archer, welder, and carpenter whenever he isn't programming. He has been involved in web and embedded programming since the age of 12, working with PHP, Node.js, Golang, Rust, and C. He has worked on text-based browser games that reached over a million active players, created bin-packing software for CNC machines, done embedded programming with Cortex-M and PIC circuits, built high-frequency trading applications, and contributed to one of the oldest ORMs for Node.js (SequelizeJS).

He also has reviewed the following books for Packt:

PostgreSQL Developer's Guide

PostgreSQL 9.0 High

Node Cookbook

I would like to thank my parents, my brother, and friends who've all put up with my insanity sitting in front of a computer day in and day out. I would not be here today if it wasn't for their patience, guidance, and love.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Hands-On Concurrency with Rust

Dedication

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Reviews

Preliminaries – Machine Architecture and Getting Started with Rust

Technical requirements

The machine

The CPU

Memory and caches

Memory model

Getting set up

The interesting part

Debugging Rust programs

Summary

Further reading

Sequential Rust Performance and Testing

Technical requirements

Diminishing returns

Performance

Standard library HashMap

Naive HashMap

Testing with QuickCheck

Testing with American Fuzzy Lop

Performance testing with Criterion

Inspecting with the Valgrind Suite

Inspecting with Linux perf

A better naive HashMap

Summary

Further reading

The Rust Memory Model – Ownership, References and Manipulation

Technical requirements

Memory layout

Pointers to memory

Allocating and deallocating memory

The size of a type

Static and dynamic dispatch

Zero sized types

Boxed types

Custom allocators

Implementations

Option

Cell and RefCell

Rc

Vec

Summary

Further reading

Sync and Send – the Foundation of Rust Concurrency

Technical requirements

Sync and Send

Racing threads

The flaw of the Ring

Getting back to safety

Safety by exclusion

Using MPSC

A telemetry server

Summary

Further reading

Locks – Mutex, Condvar, Barriers and RWLock

Technical requirements

Read many, write exclusive locks – RwLock

Blocking until conditions change – condvar

Blocking until the gang's all here – barrier

More mutexes, condvars, and friends in action

The rocket preparation problem

The rope bridge problem

Hopper—an MPSC specialization

The problem

Hopper in use

A conceptual view of hopper

The deque

The Receiver

The Sender

Testing concurrent data structures

QuickCheck and loops

Searching for crashes with AFL

Benchmarking

Summary

Further reading

Atomics – the Primitives of Synchronization

Technical requirements

Linearizability

Memory ordering – happens-before and synchronizes-with

Ordering::Relaxed

Ordering::Acquire

Ordering::Release

Ordering::AcqRel

Ordering::SeqCst

Building synchronization

Mutexes

Compare and set mutex

An incorrect atomic queue

Options to correct the incorrect queue

Semaphore

Binary semaphore, or, a less wasteful mutex

Summary

Further reading

Atomics – Safely Reclaiming Memory

Technical requirements

Approaches to memory reclamation

Reference counting

Tradeoffs

Hazard pointers

A hazard-pointer Treiber stack

The hazard of Nightly

Exercising the hazard-pointer Treiber stack

Tradeoffs

Epoch-based reclamation

An epoch-based Treiber stack

crossbeam_epoch::Atomic

crossbeam_epoch::Guard::defer

crossbeam_epoch::Local::pin

Exercising the epoch-based Treiber stack

Tradeoffs

Summary

Further reading

High-Level Parallelism – Threadpools, Parallel Iterators and Processes

Technical requirements

Thread pooling

Slowloris – attacking thread-per-connection servers

The server

The client

A thread-pooling server

Looking into thread pool

The Ethernet sniffer

Iterators

Smallcheck iteration

rayon – parallel iterators

Data parallelism and OS processes – evolving corewars players

Corewars

Feruscore – a Corewars evolver

Representing the domain

Exploring the source

Instructions

Individuals

Mutation and reproduction

Competition – calling out to pMARS

Main

Running feruscore

Summary

Further reading

FFI and Embedding – Combining Rust and Other Languages

Embedding C into Rust – feruscore without processes

The MARS C interface

Creating C-structs from Rust

Calling C functions

Managing cross-language ownership

Running the simulation

Fuzzing the simulation

The feruscore executable

Embedding Lua into Rust

Embedding Rust

Into C

The Rust side

The C side

Into Python

Into Erlang/Elixir

Summary

Further reading

Futurism – Near-Term Rust

Technical requirements

Near-term improvements

SIMD

Hex encoding

Futures and async/await

Specialization

Interesting projects

Fuzzing

Seer, a symbolic execution engine for Rust

The community

Should I use unsafe?

Summary

Further reading

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

Welcome. The aim of this book is to teach beginner and intermediate Rust programmers how to exploit modern parallel machines in the Rust programming language. This book will contain a variety of information relevant specifically to the Rust programming language, especially with regard to its standard library, but it will also contain information that is more generally applicable but happens to be expressed in Rust. Rust itself is not a terribly inventive language. Its key contribution, from where I sit, is the mainstreaming of affine types with application to memory allocation tracking. In most other respects, it is a familiar systems programming language, one that ought to feel familiar—with a bit of adjustment—to those with a background in GC-less programming languages. This is a good thing, considering our aim here is to investigate concurrency—there is a wealth of information available in the papers and books written about this subject, and we can understand and apply their concepts. This book will reference a number of such works, whose contexts are C++, ATS, ADA, and similar languages.

Who this book is for

If this is your first Rust programming book, I warmly thank you for your enthusiasm, but encourage you to seek out a suitable introduction to the programming language. This book will hit the ground running, and it will likely not be appropriate if you've got questions about the basics. The Rust community has produced excellent documentation, including introductory texts. The Book (https://doc.rust-lang.org/book/first-edition/), first edition, is how many who are already in the community learned the language. The second edition of the book, still in progress at the time of writing, looks to be an improvement over the original text, and it is also recommended. There are many other excellent introductions widely available for purchase, as well.

What this book covers

The material that this book covers is very broad, and it attempts to go into that material at some depth. The material is written so that you can work straight through, but it's also expected that some readers will only be interested in a subset of the content.

Chapter 1, Preliminaries – Machine Architecture and Getting Started with Rust, discusses modern CPU architectures, focusing specifically on x86 and ARM. These two architectures will be the focus of the book. The reader is assumed to be familiar with Rust, but we will also discuss verifying that your installation works as expected.

Chapter 2, Sequential Rust Performance and Testing, introduces inspecting the performance of a Rust program. The details of the underlying computer hardware are especially important in this: cache interactions, memory layout, and exploiting the nature of the problem domain are key to writing fast programs. However, fast doesn't matter if the results are inaccurate. This chapter also focuses on testing in Rust.

Chapter 3, The Rust Memory Model – Ownership, References and Manipulation, discusses the memory model of Rust, focusing specifically on what makes Rust memory safe, how the language is constrained to achieve such safety and how these constraints influence the fundamental types' implementations. The reader will understand the borrow checker and its ways at the close of this chapter.

Chapter 4, Sync and Send – the Foundation of Rust Concurrency, is the first in which notions of concurrency make their appearance. The chapter discusses the Sync and Send traits, both why they exist and their implications. The chapter closes with a concrete demonstration of a multithreaded Rust program. Not the last, either.

Chapter 5, Locks – Mutex, Condvar, Barriers and RWLock, introduces the coarse synchronization methods available to the Rust programmer. Each is examined in turn and demonstrated in the context of an industrial Rust project, hopper. The coarse synchronization methods are elaborated on in more detail in a series of smaller projects and data structures.

Chapter 6, Atomics – the Primitives of Synchronization, introduces fine synchronization in terms of atomic primitives available on all modern CPUs. This is an exceedingly difficult topic, and a deep investigation into atomic programming and its methods is carried out, covering lock-free, atomic data structures and production-grade codebases. The reader will construct many of the coarse synchronization mechanisms seen in Chapter 5, Locks – Mutex, Condvar, Barriers and RWLock.

Chapter 7, Atomics – Safely Reclaiming Memory, discusses at length one of the key difficulties of atomic programming in any language—safely deallocating memory. The main three methods—reference counting, hazard pointers, epoch-based reclamation—are each discussed in great detail, and production-worthy codebases are investigated. Crossbeam, especially, is discussed in great detail.

Chapter 8, High-Level Parallelism – Threadpools, Parallel Iterators and Processes, motivates and explains the implementation of thread pooling. Having this knowledge in hand, the Rayon project is investigated and subsequently used in a complex project that benefits greatly from simple data parallelism.

Chapter 9, FFI and Embedding – Combining Rust and Other Languages, extends the final project of Chapter 8, High-Level Parallelism – Threadpools, Parallel Iterators and Processes, by embedding C code into it. The rlua project, a convenient library to extend Rust programs with Lua programs, is discussed. The chapter closes by compiling Rust for embedding into C, Python, and Erlang projects.

Chapter 10, Futurism – Near-Term Rust, closes the book with a discussion of the near-term changes to the language that are apropos to parallel programmers, as well as a few miscellaneous remarks.

To get the most out of this book

This book covers a deep topic in a relatively short space. Throughout the text, the reader is expected to be comfortable with the Rust programming language and to have access to a Rust compiler and a computer on which to compile and execute Rust programs. What additional software appears in this book is covered in Chapter 1, Preliminaries – Machine Architecture and Getting Started with Rust, and it is recommended for use but not mandatory.

The basic premise of this book is as follows—parallel programming is difficult but not impossible. Parallel programming can be done, and done well, when attention is paid to the computing environment and care is put into the validation of the produced program. To that end, each chapter has been written with an eye toward imparting a solid foundation to the reader of the chapter's subject. Once a chapter is digested, the reader will, hopefully, have a solid path into the existing body of literature on that subject. In that, this is a book of beginnings, not ends. 

I strongly encourage the reader to take an active role in reading this book, download the source code, investigate the projects as you read through independent of the book's take, and probe the running programs with the tools available on your operating system. Like anything else, parallel programming ability is a skill that is acquired and honed by practice. 

One final note—allow the process of learning to proceed at its own pace. If one chapter's subject doesn't immediately settle in your mind, that's okay. For me, the process of writing the book was a confluence of flashes of insight and knowledge that unfolded slowly like a flower. I imagine that the process of reading the book will proceed analogously. 

Download the example code files

The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Concurrency-with-Rust. In case there's an update to the code, it will be updated on the existing GitHub repository.

You can also download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The seeds are for XorShiftRng; max_in_memory_bytes and max_disk_bytes are for hopper."

A block of code is set as follows:

fn main() {
    println!("Apollo is the name of a space program but also my dog.");
}

Any command-line input or output is written as follows:

> cat hello.rs
fn main() {
    println!("Apollo is the name of a space program but also my dog.");
}
> rustc -C opt-level=2 hello.rs
> ./hello
Apollo is the name of a space program but also my dog.

Bold: Indicates a new term, an important word, or words that you see onscreen. 

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected]

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please do leave a review for the book on the site that you purchased it from. Reviews help us at Packt understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Preliminaries – Machine Architecture and Getting Started with Rust

In this chapter, we'll work through the preliminaries of the book, making sure to cover the basics necessary to frame the remainder of the book. This book is about parallel programming in the Rust programming language. It's essential, then, to understand modern computing hardware at a high level. We'll need a sense of how a modern CPU operates, how memory buses influence the CPU's ability to perform work, and what it means for a computer to be able to do multiple things at once. In addition, we'll discuss validating your Rust installation, and cover generating executables for the two CPU architectures that this book will be concerned with.

By the close of this chapter, we will have:

Discussed a high-level model of CPU operations

Discussed a high-level model of computer memory

Had a preliminary discussion of the Rust memory model

Investigated generating runnable Rust programs for x86 and ARM 

Investigated debugging these programs

Technical requirements

This chapter requires any kind of modern computer on which Rust can be installed. The exact details are covered in depth below. The interested reader might choose to invest in an ARM development board; I used a Raspberry Pi 3 Model B while writing. The Valgrind suite of tools is used below. Many operating systems bundle Valgrind packages, but you can find further installation instructions for your system at valgrind.org. The gdb and lldb tools are used, often installed along with the gcc and llvm compiler toolchains.

You can find the source code for this book's projects on GitHub: https://github.com/PacktPublishing/Hands-On-Concurrency-with-Rust. This chapter has no relevant source code. 

The machine

In this book, independent of the specific Rust techniques, we'll attempt to teach a kind of mechanical sympathy with the modern parallel computer. There are two kinds of parallelism we'll touch on—concurrent memory operations and data parallelism. We'll spend most of our time in this book on concurrent memory operations, the kind of parallelism in which multiple CPUs contend to manipulate a shared, addressable memory. Data parallelism, where the CPU is able to operate with a single or multiple instructions on multiple words at a time concurrently, will be touched on, but the details are CPU specific, and the necessary intrinsics are only now becoming available in the base language as this book goes to press. Fortunately, Rust, as a systems language with modern library management, will easily allow us to pull in an appropriate library and emit the correct instructions, or we could inline the assembly ourselves.
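The first kind can be sketched in a few lines. This toy counter is my own illustration, not one of the book's projects: multiple threads contend to manipulate a single shared, addressable location.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Four threads contend to increment one counter living in shared,
// addressable memory. fetch_add is an atomic read-modify-write,
// so no increments are lost even under contention.
fn shared_count(threads: usize, incrs_per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..incrs_per_thread {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    // 4 threads * 10_000 increments = 40_000, every run.
    println!("total: {}", shared_count(4, 10_000));
}
```

Had the counter been a plain integer mutated without coordination, updates could be lost; the atomic read-modify-write is what makes the contention safe.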

Literature on the abstract construction of algorithms for parallel machines must choose a machine model to operate under. The parallel random access machine (PRAM) is common in the literature. In this book, we will focus on two concrete machine architectures:

x86

ARM

These machines were chosen because they are common and because they each have specific properties that will be important when we get to Chapter 6, Atomics – the Primitives of Synchronization. Actual machines deviate from the PRAM model in important ways. Most obviously, actual machines have a limited number of CPUs and a bounded amount of RAM. Memory locations are not uniformly accessible from each CPU; in fact, cache hierarchies have a significant impact on the performance of computer programs. None of this is to say that PRAM is an absurd simplification, nor is this true of any other model you'll find in the literature. What should be understood is that, as we work, we'll need to draw lines of abstraction out of necessity, where further detail does not improve our ability to solve problems well. We also have to understand how our abstractions, suited to our own work, relate to the abstractions of others so that we can learn and share. In this book, we will concern ourselves with empirical methods for understanding our machines, involving careful measurement, examination of assembly code, and experimentation with alternative implementations. This will be combined with an abstract model of our machines, more specific to today's machines than PRAM but still eliding details such as total cache layers, cache sizes, bus speeds, microcode versions, and so forth. The reader is encouraged to add more specificity should the need arise and should they feel so emboldened.

The CPU

The CPU is a device that interprets a stream of instructions, manipulating storage and other devices connected to it in the process. The simplest model of a CPU, and the one that is often introduced first in undergrad computer science, is that a CPU receives an instruction from some nebulous place, performs the interpretation, receives the next instruction, interprets that, and so forth. The CPU maintains an internal pace via an oscillator circuit, and all instructions take a defined number of oscillator pulses—or clock cycles—to execute. In some CPU models, every instruction will take the same number of clock cycles to execute, and in others the cycle count of instructions will vary. Some CPU instructions modify registers, which are very low-latency, specialized, but exceedingly finite memory locations built into the CPU. Other CPU instructions modify the main memory, RAM. Other instructions move or copy information between registers and RAM or vice versa. RAM—whose read/write latency is much higher than that of registers but is much more plentiful—and other storage devices, such as SSDs, are connected to the CPU by specialized buses.

The exact nature of these buses, their bandwidth and transmission latency, varies between machine architectures. On some systems, every location in the RAM is addressable—meaning that it can be read or written to—in constant time from the available CPUs. In other systems, this is not the case—some RAM is CPU-local and some is CPU-remote. Some instructions control special hardware interrupts that cause memory ranges to be written to other bus-connected storage devices. Mostly, these devices are exceedingly slow, compared to the RAM, which is itself slow compared to the registers.

All of this is to explain that in the simplest model of a CPU, where instructions are executed serially, instructions may well end up stalling for many clock cycles while waiting for reads or writes to be executed. To that end, it's important to understand that almost all CPUs—and especially the CPUs we'll concern ourselves with in this book—perform out-of-order execution of their instructions. Just so long as a CPU can prove that two sequences of instructions access memory distinctly—that is, they do not interfere with each other—the CPU is free to reorder instructions, and probably will. This is one of the things that makes C's undefined behavior concerning uninitialized memory so interesting. Perhaps your program's future has already filled in the memory, or perhaps not. Out-of-order execution makes reasoning about a processor's behavior difficult, but its benefit is that CPUs can execute programs much faster by deferring a sequence of instructions that is stalled on some kind of memory access.

In the same spirit, most modern CPUs—and especially the CPUs we'll concern ourselves with in this book—perform branch prediction. Say that branches in our programs tend to branch the same way at execution time—for example, say we have a feature-flag test that is configured to be enabled for the lifetime of a program. CPUs that perform branch prediction will speculatively execute one side of a branch that tends to branch in a certain way while they wait on other stalled instruction pipelines' instructions. When the branch instruction sequence catches up to its branch test, and if the test goes the predicted way, there's already been a great deal of work done and the instruction sequence can skip well ahead. Unfortunately, if the branch was mispredicted, then all this prior work must be torn down and thrown away, and the correct branch will have to be computed, which is quite expensive. It's for this reason that you'll find that programmers who worry about the nitty-gritty performance characteristics of their programs will tend to fret about branches and try to remove them.
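As a hypothetical example of that fretting—my own, not taken from the book's projects—here is the same reduction written first with a data-dependent branch and then branchlessly, with the branch traded for an arithmetic mask:

```rust
// Sum the even elements of a slice, once with a branch the predictor
// must guess, once with a mask computed unconditionally.
fn sum_even_branchy(xs: &[u64]) -> u64 {
    let mut total = 0u64;
    for &x in xs {
        if x % 2 == 0 {
            total += x;
        }
    }
    total
}

fn sum_even_branchless(xs: &[u64]) -> u64 {
    let mut total = 0u64;
    for &x in xs {
        // mask is all ones when x is even, all zeros when x is odd:
        // (0).wrapping_sub(1) == u64::MAX, (1).wrapping_sub(1) == 0
        let mask = (x & 1).wrapping_sub(1);
        total += x & mask;
    }
    total
}

fn main() {
    let xs = [1, 2, 3, 4, 5, 6];
    assert_eq!(sum_even_branchy(&xs), sum_even_branchless(&xs));
    println!("sum of evens: {}", sum_even_branchy(&xs)); // 12
}
```

Whether the branchless form is actually faster depends on the data's predictability and the machine, which is exactly why measurement, not intuition, settles these questions.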

All of this is quite power-hungry, reordering instructions to avoid stalls or racing ahead to perform computations that may be thrown away. Power-hungry implies hot, which implies cooling, which implies more power expenditure. All of which is not necessarily great for the sustainability of technological civilization, depending on how the electricity for all this is generated and where the waste heat is dumped. To that end, many modern CPUs integrate some kind of power-scaling features. Some CPUs will lengthen the time between their clock pulses, meaning they execute fewer instructions in a certain span of time than they might otherwise have. Other CPUs race ahead as fast as they normally would and then shut themselves off for a spell, drawing minimal electricity and cooling in the meantime. The exact method by which to build and run a power-efficient CPU is well beyond the scope of this book. What's important to understand is that, as a result of all this, your program's execution speed will vary from run to run, all other things being equal, as the CPU decides whether or not it's time to save power. We'll see this in the next chapter when we manually set power-saving settings.

Memory model

The details of a processor's handling of memory is both complicated and very specific to that processor. Programming languages—and Rust is no exception here—invent a memory model to paper over the details of all supported processors while, ideally, leaving the programmer enough freedom to exploit the specifics of each processor. Systems languages also tend to allow absolute freedom in the form of escape hatches from the language memory model, which is exactly what Rust has in terms of unsafe.

With regard to its memory model, Rust is very much inspired by C++. The atomic orderings exposed in Rust are those of LLVM, which are those of C++11. This is a fine thing—any literature to do with either C++ or LLVM will be immediately applicable to Rust. Memory order is a complex topic, and it's often quite helpful to lean on material written for C++ when learning. This is especially important when studying up on lock-free/wait-free structures—which we'll see later in this book—as literature on those topics often deals with them in terms of C++. Literature written with C11 in mind is also suitable, if maybe a little less straightforward to translate.
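Those C++11-style orderings surface in Rust through std::sync::atomic. The sketch below shows the classic acquire/release publication pattern: a producer thread stores a value and then raises a flag with Release ordering; once the consumer observes the flag with Acquire ordering, the data store is guaranteed to be visible. The particular values here are illustrative.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let data = Arc::clone(&data);
        let ready = Arc::clone(&ready);
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);
            // Release: everything before this store is visible to
            // any thread that performs an Acquire load seeing `true`.
            ready.store(true, Ordering::Release);
        })
    };

    // Acquire: pairs with the Release store above.
    while !ready.load(Ordering::Acquire) {}

    // Guaranteed by the acquire/release pairing, not by luck.
    assert_eq!(data.load(Ordering::Relaxed), 42);
    producer.join().unwrap();
}
```

Had both stores and the load used Relaxed, the assertion could legitimately fail on weakly ordered hardware such as ARM, which is exactly the kind of consequence the C++ literature walks through in detail.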

Now, this is not to say that the C++ programmer will be immediately comfortable in concurrent Rust. The digs will be familiar, but not quite right. This is because Rust's memory model also includes a notion of reference consumption. In Rust, a thing in memory must be used zero or one times, but no more. This, incidentally, is an application of a version of linear-type theory called affine typing, if you'd like to read up more on the subject. Now, the consequence of restricting memory access in this way is that Rust is able to guarantee safe memory access at compile time—threads cannot reference the same location in memory at the same time without coordination, out-of-order accesses in the same thread are not allowed, and so forth. Rust code is memory safe without relying on a garbage collector. In this book's estimation, memory safety is a good win, even though the restrictions that are introduced do complicate implementing certain kinds of structures that are more straightforward to build in C++ or similar languages.
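A short sketch makes the "used zero or one times" rule concrete. Moving a value consumes it; using it again afterwards is rejected at compile time, not at runtime. Sharing across threads then requires saying so explicitly, here via Arc. The names and values are illustrative only.

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    let greeting = String::from("hello");

    // The move consumes `greeting`: its one permitted use.
    let owned = greeting;

    // Uncommenting the next line is a compile-time error --
    // `greeting` was already consumed by the move above:
    // println!("{}", greeting); // error[E0382]: borrow of moved value

    // Coordinated sharing across threads is still possible, but the
    // coordination must be explicit, here with an atomic refcount.
    let shared = Arc::new(owned);
    let handle = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || shared.len())
    };

    assert_eq!(handle.join().unwrap(), 5);
    assert_eq!(shared.len(), 5); // the original Arc remains usable
}
```

The compiler error in the commented line is the affine-typing restriction doing its work: the uncoordinated second use simply never makes it into the binary.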

This is a topic we'll cover in much greater detail in Chapter 3, The Rust Memory Model – Ownership, References and Manipulation.

The interesting part

Let's create a default cargo project and confirm that we can emit the appropriate assembly for our machines. Doing this will be one of the important pillars of this book. Now, choose a directory on disk to place the default project and navigate there. This example will use ~/projects, but the exact path doesn't matter. Then, generate the default project:

~/projects > cargo new --bin hello_world
     Created binary (application) `hello_world` project

Go ahead and reward yourself with some x86 assembler:

hello_world > RUSTFLAGS="--emit asm" cargo build --target=x86_64-unknown-linux-gnu
   Compiling hello_world v0.1.0 (file:///home/blt/projects/hello_world)
    Finished dev [unoptimized + debuginfo] target(s) in 0.33 secs
hello_world > file target/x86_64-unknown-linux-gnu/debug/deps/hello_world-6000abe15b385411.s
target/x86_64-unknown-linux-gnu/debug/deps/hello_world-6000abe15b385411.s: assembler source, ASCII text

Please be aware that a Rust binary compiled on OS X x86 will not run on Linux x86, and vice versa. This is due to differences in the interfaces of the operating systems themselves. You're better off compiling on your x86 Linux machine or your x86 OS X machine and running the binaries there. That's the approach I take with the material presented in this book.

That said, reward yourself with some ARMv7 assembler:

hello_world > RUSTFLAGS="--emit asm" cargo build --target=armv7-unknown-linux-gnueabihf
   Compiling hello_world v0.1.0 (file:///home/blt/projects/hello_world)
    Finished dev [unoptimized + debuginfo] target(s) in 0.45 secs
hello_world > file target/armv7-unknown-linux-gnueabihf/debug/deps/hello_world-6000abe15b385411.s
target/armv7-unknown-linux-gnueabihf/debug/deps/hello_world-6000abe15b385411.s: assembler source, ASCII text

Of course, if you want to build release versions, you need only to give cargo the --release flag:

hello_world > RUSTFLAGS="--emit asm" cargo build --target=armv7-unknown-linux-gnueabihf --release
   Compiling hello_world v0.1.0 (file:///home/blt/projects/hello_world)
    Finished release [optimized] target(s) in 0.45 secs
hello_world > wc -l target/armv7-unknown-linux-gnueabihf/
debug/    release/
hello_world > wc -l target/armv7-unknown-linux-gnueabihf/*/deps/hello_world*.s
  1448 target/armv7-unknown-linux-gnueabihf/debug/deps/hello_world-6000abe15b385411.s
   101 target/armv7-unknown-linux-gnueabihf/release/deps/hello_world-dd65a12bd347f015.s
  1549 total

It's interesting—and instructive!—to compare the differences in the generated code. Notably, the release compilation will strip debugging entirely. Speaking of which, let's talk debuggers.

Summary

In this chapter, we've discussed the preliminaries of this book: the machine architecture of modern CPUs, and the ways loads and stores of memory may be interleaved for surprising results, illustrating the need for some kind of synchronization for our computations. That need for synchronization will drive much of the discussion of this book. We covered installing Rust, the channels of the compiler itself, and our intention to focus on the stable channel. We then compiled a simple program for both x86 and ARM systems, and finished with debugging and performance analysis of that program, mostly as a proof-of-concept of the tools. The next chapter, Chapter 2, Sequential Rust Performance and Testing, will explore this area in much more depth.

Further reading

At the end of each chapter, we'll include a list of bibliographic materials, things that are warmly recommended for readers wishing to dive further into the topic discussed in the chapter, links to relevant Rust community discussions, or links to the documentation of important tools. Bibliographic material may appear in multiple chapters because if something's important, it bears repeating.

An Introduction to Parallel Algorithms, 1992, Joseph JaJa. A fine textbook that introduces important abstract models. The book is significantly focused on abstract implementations of algorithms from an era when cache coherency and instruction pipelines were less important, so do be aware of that if you pull a copy.

What Every Programmer Should Know About Memory, 2006, Ulrich Drepper. A classic, detailed description of how memory works in modern computers, despite being twelve years old at the time of writing this book.

Computer Architecture: A Quantitative Approach, 2011, John Hennessy and David Patterson. A classic, somewhat more geared toward computer architects than software engineers. Still, this is well worth studying in depth, even if you do skip over the circuit diagrams here and there.

The C11 and C++11 Concurrency Model, 2014, Mark Batty. Batty formalizes the C++11/C11 memory model, which, if you can get up to speed with his logic language, is an excellent way to learn the memory model and its consequences.

LLVM Atomic Instructions and Concurrency Guide, available at https://llvm.org/docs/Atomics.html. Rust has specifically documented its concurrency memory model as being that of LLVM. This guide—and the documentation it links to—will be well-trod territory for any Rust programmer reading this book.

Cache Speculation Side-Channels, 2018, ARM. Speculative execution of branches leads to surprising information leaks, it turns out. This paper by ARM gives a very clear discussion of speculative execution on ARM, as well as tidy examples.

std::memory_order, available at http://en.cppreference.com/w/cpp/atomic/memory_order. While this document covers the memory order defined in C++, its examples of the consequences of the C++ memory-ordering guarantees are both straightforward and illustrative.

Valgrind User Manual, available at http://valgrind.org/docs/manual/manual.html. We'll be making extensive use of Valgrind, and it's well worth it for any systems programmer to be familiar with these tools. The documentation necessarily touches on some of the same material as this book, and may help illuminate things under a different light.

Compiler Explorer, available at https://rust.godbolt.org/. Compiler Explorer is not a paper so much as a well-designed web tool. It allows easy cross-compilation, and refers to simplified explanations of instructions.

Sequential Rust Performance and Testing

"Make it work, then make it beautiful, then if you really, really have to, make it fast."
                                                                                                         - Joe Armstrong

In the previous chapter, we discussed the basics of modern computer architectures—the CPU and its function, memory hierarchies, and their interplay. We left off with a brief introduction to debugging and performance analysis of Rust programs. In this chapter, we'll continue that discussion, digging into the performance characteristics of sequential Rust programs, deferring, for now, considerations of concurrent performance. We'll also discuss testing techniques for demonstrating the fitness for purpose of a Rust program. Why, in a book about parallel programming, would we devote an entire chapter to sequential programs? The techniques we'll discuss in this sequential setting are applicable and vital to a parallel setting. What we gain here is the meat of the concern—being fast and correct—without the complications that parallel programming brings; we'll come to those in good time. It is also important to understand that the production of fast parallel code comes part and parcel with the production of fast sequential code. That's on account of there being a cold, hard mathematical reality that we'll deal with throughout the book.

By the close of the chapter, we will: 

- Have learned about Amdahl's and Gustafson's laws
- Have investigated the internals of the Rust standard library HashMap
- Be able to use QuickCheck to perform randomized validation of an alternative HashMap implementation
- Be able to use American Fuzzy Lop to demonstrate the lack of crashes in the same
- Have used Valgrind and Linux Perf to examine the performance of Rust software

Technical requirements

This chapter requires a working Rust installation. The details of verifying your installation are covered in Chapter 1, Preliminaries – Machine Architecture and Getting Started with Rust. The Valgrind suite of tools will be used here; many operating systems bundle valgrind packages, but you can find further installation instructions for your system at valgrind.org. Linux Perf is also used and is bundled by many Linux distributions. Any other software required for this chapter is installed as a part of the text.

You can find the source code for this book's projects on GitHub: https://github.com/PacktPublishing/Hands-On-Concurrency-with-Rust. The source code for this chapter is under Chapter02.

Diminishing returns

The hard truth is that there's a diminishing return when applying more and more concurrent computational resources to a problem. Performing parallel computations implies some coordination overhead—spawning new threads, chunking data, and memory bus issues in the presence of barriers or fences, depending on your CPU. Parallel computing is not free. Consider this Hello, world! program:

fn main() {
    println!("GREETINGS, HUMANS");
}

Straightforward enough, yeah? Compile and run it 100 times:

hello_worlds > rustc -C opt-level=3 sequential_hello_world.rs
hello_worlds > time for i in {1..100}; do ./sequential_hello_world > /dev/null; done

real    0m0.091s
user    0m0.004s
sys     0m0.012s

Now, consider basically the same program but involving the overhead of spawning a thread:

use std::thread;

fn main() {
    thread::spawn(|| println!("GREETINGS, HUMANS"))
        .join()
        .unwrap();
}

Compile and run it 100 times:

hello_worlds > rustc -C opt-level=3 parallel_hello_world.rs
hello_worlds > time for i in {1..100}; do ./parallel_hello_world > /dev/null; done

real    0m0.115s
user    0m0.004s
sys     0m0.012s
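The thread-spawning overhead we just measured is one face of the diminishing returns this section opened with, and Amdahl's law—previewed in this chapter's objectives—quantifies the other: the serial fraction of a program bounds the total speedup, no matter how many cores you throw at it. The following is a minimal sketch of the formula; the specific fractions and core counts are illustrative only.

```rust
/// Amdahl's law: if a fraction `p` of the work is parallelizable and
/// we have `n` workers, the best achievable speedup is
/// 1 / ((1 - p) + p / n).
fn amdahl_speedup(p: f64, n: f64) -> f64 {
    1.0 / ((1.0 - p) + p / n)
}

fn main() {
    // Even with 95% of a program parallelizable, no number of cores
    // can push the speedup past 1 / 0.05 = 20x; 1024 cores fall short.
    let s = amdahl_speedup(0.95, 1024.0);
    assert!(s < 20.0);
    println!("speedup with 1024 cores: {:.2}x", s);
}
```

This is the cold, hard mathematical reality alluded to earlier: shaving the serial portion of a program—the subject of this chapter—raises the ceiling on every parallel effort that follows.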