Description

Are you a software developer, systems designer, or computer architecture student looking for a methodical introduction to digital device architectures but overwhelmed by their complexity? This book will help you to learn how modern computer systems work, from the lowest level of transistor switching to the macro view of collaborating multiprocessor servers. You'll gain unique insights into the internal behavior of processors that execute code developed in high-level languages, enabling you to design more efficient and scalable software systems.

The book will teach you the fundamentals of computer systems, including transistors, logic gates, sequential logic, and instruction operations. You will learn the details of modern processor architectures and instruction sets, including x86, x64, ARM, and RISC-V. You will see how to implement a RISC-V processor on a low-cost FPGA board and how to write a quantum computing program and run it on an actual quantum computer. By the end of this book, you will have a thorough understanding of modern processor and computer architectures and the future directions these architectures are likely to take.




Modern Computer Architecture and Organization

Learn x86, ARM, and RISC-V architectures and the design of smartphones, PCs, and cloud servers


Jim Ledin

BIRMINGHAM—MUMBAI

Modern Computer Architecture and Organization

Copyright © 2020 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Kunal Chaudhari

Acquisition Editor: Denim Pinto

Senior Editor: Afshaan Khan

Content Development Editor: Tiksha Lad

Technical Editor: Gaurav Gala

Copy Editor: Safis Editing

Project Coordinator: Francy Puthiry

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Aparna Bhagat

First published: April 2020

Production reference: 1290420

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-83898-439-7

www.packt.com

Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Fully searchable for easy access to vital information

Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at packt.com, and as a print book customer you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Jim Ledin is the CEO of Ledin Engineering, Inc. Jim is an expert in embedded software and hardware design, development, and testing. He is also accomplished in embedded system cybersecurity assessment and penetration testing. He has a B.S. degree in aerospace engineering from Iowa State University and an M.S. degree in electrical and computer engineering from Georgia Institute of Technology. Jim is a registered professional electrical engineer in California, a Certified Information Systems Security Professional (CISSP), a Certified Ethical Hacker (CEH), and a Certified Penetration Tester (CPT).

About the reviewer

Roger Spears has over 15 years' experience in the academic field and over 20 years' experience in IT. He has a B.S. in technology from Bowling Green State University and an M.S. in information assurance, specializing in network defense, from Capella University. He has developed and facilitated individual courses for networking, programming, and databases. He was a member of the Protect and Detect working group on the National Cyberwatch Center Curriculum Standards Panel for the Cybersecurity Foundation Series. Roger has been awarded over $750,000 in cybersecurity grants from the government for various academic and industrial customers. He holds certificates from Microsoft and Oracle. He also holds a CCNA and CySA+ certification.

I would like to acknowledge my wife (Leann) and children (Maverick and Sierra) for providing me the opportunity to grow with them and for the freedom to pursue various academic and vocational endeavors. Your understanding and tolerance meant more than I can say with words.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Preface

Section 1: Fundamentals of Computer Architecture

Chapter 1: Introducing Computer Architecture

The evolution of automated computing devices

Charles Babbage's Analytical Engine

ENIAC

IBM PC

The iPhone

Moore's law

Computer architecture

Binary and hexadecimal numbers

The 6502 microprocessor

The 6502 instruction set

Summary

Exercises

Chapter 2: Digital Logic

Electrical circuits

The transistor

Logic gates

Latches

Flip-flops

Registers

Adders

Propagation delay

Clocking

Sequential logic

Hardware description languages

VHDL

Summary

Exercises

Chapter 3: Processor Elements

A simple processor

Control unit

Arithmetic logic unit

Registers

The instruction set

Addressing modes

Immediate addressing mode

Absolute addressing mode

Absolute indexed addressing mode

Indirect indexed addressing mode

Instruction categories

Memory load and store instructions

Register-to-register data transfer instructions

Stack instructions

Arithmetic instructions

Logical instructions

Branching instructions

Subroutine call and return instructions

Processor flag instructions

Interrupt-related instructions

No operation instruction

Interrupt processing

IRQ processing

NMI processing

BRK instruction processing

Input/output operations

Programmed I/O

Interrupt-driven I/O

Direct memory access

Summary

Exercises

Chapter 4: Computer System Components

Technical requirements

Memory subsystem

Introducing the MOSFET

Constructing DRAM circuits with MOSFETs

The capacitor

The DRAM bit cell

DDR4 SDRAM

Graphics DDR

Prefetching

I/O subsystem

Parallel and serial data buses

PCI Express

SATA

M.2

USB

Thunderbolt

Graphics displays

VGA

DVI

HDMI

DisplayPort

Network interface

Ethernet

Wi-Fi

Keyboard and mouse

Keyboard

Mouse

Modern computer system specifications

Summary

Exercises

Chapter 5: Hardware-Software Interface

Device drivers

The parallel port

PCIe device drivers

Device driver structure

BIOS

UEFI

The boot process

BIOS boot

UEFI boot

Embedded devices

Operating systems

Processes and threads

Scheduling algorithms and process priority

Multiprocessing

Summary

Exercises

Chapter 6: Specialized Computing Domains

Real-time computing

Real-time operating systems

Digital signal processing

ADCs and DACs

DSP hardware features

Signal processing algorithms

GPU processing

GPUs as data processors

Examples of specialized architectures

Summary

Exercises

Section 2: Processor Architectures and Instruction Sets

Chapter 7: Processor and Memory Architectures

Technical requirements

The von Neumann, Harvard, and modified Harvard architectures

The von Neumann architecture

The Harvard architecture

The modified Harvard architecture

Physical and virtual memory

Paged virtual memory

Page status bits

Memory pools

Memory management unit

Summary

Exercises

Chapter 8: Performance-Enhancing Techniques

Cache memory

Multilevel processor caches

Static RAM

Level 1 cache

Direct-mapped cache

Set associative cache

Fully associative cache

Processor cache write policies

Level 2 and level 3 processor caches

Instruction pipelining

Superpipelining

Pipeline hazards

Micro-operations and register renaming

Conditional branches

Simultaneous multithreading

SIMD processing

Summary

Exercises

Chapter 9: Specialized Processor Extensions

Technical requirements

Privileged processor modes

Handling interrupts and exceptions

Protection rings

Supervisor mode and user mode

System calls

Floating-point mathematics

The 8087 floating-point coprocessor

The IEEE 754 floating-point standard

Power management

Dynamic voltage frequency scaling

System security management

Summary

Exercises

Chapter 10: Modern Processor Architectures and Instruction Sets

Technical requirements

x86 architecture and instruction set

The x86 register set

x86 addressing modes

x86 instruction categories

x86 instruction formats

x86 assembly language

x64 architecture and instruction set

The x64 register set

x64 instruction categories and formats

x64 assembly language

32-bit ARM architecture and instruction set

The ARM register set

ARM addressing modes

ARM instruction categories

ARM assembly language

64-bit ARM architecture and instruction set

64-bit ARM assembly language

Summary

Exercises

Chapter 11: The RISC-V Architecture and Instruction Set

Technical requirements

The RISC-V architecture and features

The RISC-V base instruction set

Computational instructions

Control flow instructions

Memory access instructions

System instructions

Pseudo-instructions

Privilege levels

RISC-V extensions

The M extension

The A extension

The C extension

The F and D extensions

Other extensions

64-bit RISC-V

Standard RISC-V configurations

RISC-V assembly language

Implementing RISC-V in an FPGA

Summary

Exercises

Section 3: Applications of Computer Architecture

Chapter 12: Processor Virtualization

Technical requirements

Introducing virtualization

Types of virtualization

Categories of processor virtualization

Virtualization challenges

Unsafe instructions

Shadow page tables

Security

Virtualizing modern processors

x86 processor virtualization

ARM processor virtualization

RISC-V processor virtualization

Virtualization tools

VirtualBox

VMware Workstation

VMware ESXi

KVM

Xen

QEMU

Virtualization and cloud computing

Summary

Exercises

Chapter 13: Domain-Specific Computer Architectures

Technical requirements

Architecting computer systems to meet unique requirements

Smartphone architecture

iPhone X

Personal computer architecture

Alienware Aurora Ryzen Edition gaming desktop

Ryzen 9 3950X branch prediction

Nvidia GeForce RTX 2080 Ti GPU

Aurora subsystems

Warehouse-scale computing architecture

WSC hardware

Rack-based servers

Hardware fault management

Electrical power consumption

The WSC as a multilevel information cache

Neural networks and machine learning architectures

Intel Nervana neural network processor

Summary

Exercises

Chapter 14: Future Directions in Computer Architectures

The ongoing evolution of computer architectures

Extrapolating from current trends

Moore's law revisited

The third dimension

Increased device specialization

Potentially disruptive technologies

Quantum physics

Spintronics

Quantum computing

Carbon nanotubes

Building a future-tolerant skill set

Continuous learning

College education

Conferences and literature

Summary

Exercises

Answers to Exercises

Chapter 1: Introducing Computer Architecture

Chapter 2: Digital Logic

Chapter 3: Processor Elements

Chapter 4: Computer System Components

Chapter 5: Hardware-Software Interface

Chapter 6: Specialized Computing Domains

Chapter 7: Processor and Memory Architectures

Chapter 8: Performance-Enhancing Techniques

Chapter 9: Specialized Processor Extensions

Chapter 10: Modern Processor Architectures and Instruction Sets

Chapter 11: The RISC-V Architecture and Instruction Set

Chapter 12: Processor Virtualization

Chapter 13: Domain-Specific Computer Architectures

Chapter 14: Future Directions in Computer Architectures

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

This book presents the key technologies and components employed in modern processor and computer architectures and discusses how various architectural decisions result in computer configurations optimized for specific needs.

To understate the situation quite drastically, modern computers are complicated devices. Yet, when viewed in a hierarchical manner, the functions of each level of complexity become clear. We will cover a great many topics in these chapters and will only have the space to explore each of them to a limited degree. My goal is to provide a coherent introduction to each important technology and subsystem you might find in a modern computing device and explain its relationship to other system components.

I will not be providing a lengthy list of references for further reading. The Internet is your friend in this regard. If you can manage to bypass the clamor of political and social media argumentation on the Internet, you will find yourself in an enormous, cool, quiet library containing a vast quantity of accumulated human knowledge. Learn to use the advanced features of your favorite search engine. Also, learn to differentiate high-quality information from uninformed opinion. Check multiple sources if you have any doubts about the information you're finding. Consider the source: if you are looking for information about an Intel processor, search for documentation published by Intel.

By the end of this book, you will have gained a strong grasp of the computer architectures currently used in a wide variety of digital systems. You will also have developed an understanding of the relevant trends in architectural technology currently underway, as well as some possibly disruptive advances in the coming years that may drastically influence the architectural development of computing systems.

Who this book is for

This book is intended for software developers, computer engineering students, system designers, computer science professionals, reverse engineers, and anyone else seeking to understand the architecture and design principles underlying all types of modern computer systems, from tiny embedded devices to smartphones to warehouse-sized cloud server farms. Readers will also explore the directions these technologies are likely to take in the coming years. A general understanding of computer processors is helpful but is not required.

What this book covers

The information in this book is presented in the following sequence:

Chapter 1, Introducing Computer Architecture, begins with a brief history of automated computing devices and describes the significant technological advances that drove leaps in capability. This is followed by a discussion of Moore's law, with an assessment of its applicability over previous decades and the implications for the future. The basic concepts of computer architecture are introduced in the context of the 6502 microprocessor.

Chapter 2, Digital Logic, introduces transistors as switching elements and explains their use in constructing logic gates. We will then see how flip-flops and registers are developed by combining simple gates. The concept of sequential logic, meaning logic that contains state information, is introduced, and the chapter ends with a discussion of clocked digital circuits.

Chapter 3, Processor Elements, begins with a conceptual description of a generic processor. We will examine the concepts of the instruction set, register set, and instruction loading, decoding, execution, and sequencing. Memory load and store operations are also discussed. The chapter includes a description of branching instructions and their use in looping and conditional processing. Some practical considerations are introduced that lead to the necessity for interrupt processing and I/O operations.

Chapter 4, Computer System Components, discusses computer memory and its interface to the processor, including multilevel caching. I/O requirements including interrupt handling, buffering, and dedicated I/O processors are described. We will discuss some specific requirements for I/O devices including the keyboard and mouse, the video display, and the network interface. The chapter ends with descriptive examples of these components in modern computer applications, including smart mobile devices, personal computers, gaming systems, cloud servers, and dedicated machine learning systems.

Chapter 5, Hardware-Software Interface, discusses the implementation of the high-level services a computer operating system must provide, including disk I/O, network communications, and interactions with users. This chapter describes the software layers that implement these features starting at the level of the processor instruction set and registers. Operating system functions, including booting, multiprocessing, and multithreading, are also described.

Chapter 6, Specialized Computing Domains, explores domains of computing that tend to be less directly visible to most users, including real-time systems, digital signal processing, and GPU processing. We will discuss the unique requirements associated with each of these domains and look at examples of modern devices implementing these features.

Chapter 7, Processor and Memory Architectures, takes an in-depth look at modern processor architectures, including the von Neumann, Harvard, and modified Harvard variants. The chapter discusses the implementation of paged virtual memory. The practical implementation of memory management functionality within the computer architecture is introduced and the functions of the memory management unit are described.

Chapter 8, Performance-Enhancing Techniques, discusses a number of performance-enhancing techniques used routinely to reach peak execution speed in real-world computer systems. The most important techniques for improving system performance, including the use of cache memory, instruction pipelining, instruction parallelism, and SIMD processing, are the subjects of this chapter.

Chapter 9, Specialized Processor Extensions, focuses on extensions commonly implemented at the processor instruction set level to provide additional system capabilities beyond generic data processing requirements. The extensions presented include privileged processor modes, floating-point mathematics, power management, and system security management.

Chapter 10, Modern Processor Architectures and Instruction Sets, examines the architectures and instruction set features of modern processor designs including the x86, x64, and ARM processors. One challenge that arises when producing a family of processors over several decades is the need to maintain backward compatibility with code written for earlier-generation processors. The need for legacy support tends to increase the complexity of the later-generation processors. This chapter will examine some of the attributes of these processor architectures that result from supporting legacy requirements.

Chapter 11, The RISC-V Architecture and Instruction Set, introduces the exciting new RISC-V (pronounced risk five) processor architecture and its instruction set. RISC-V is a completely open source, free-to-use specification for a reduced instruction set computer architecture. A complete user-mode (non-privileged) instruction set specification has been released and a number of hardware implementations of this architecture are currently available. Work is ongoing to develop specifications for a number of instruction set extensions. This chapter covers the features and variants available in the RISC-V architecture and introduces the RISC-V instruction set. We will also discuss the applications of the RISC-V architecture in mobile devices, personal computers, and servers.

Chapter 12, Processor Virtualization, introduces the concepts involved in processor virtualization and explains the many benefits resulting from the use of virtualization. The chapter includes examples of virtualization based on open source tools and operating systems. These tools enable the execution of instruction-set-accurate representations of various computer architectures and operating systems on a general-purpose computer. We will also discuss the benefits of virtualization in the development and deployment of real-world software applications.

Chapter 13, Domain-Specific Computer Architectures, brings together the topics discussed in previous chapters to develop an approach for architecting a computer system design to meet unique user requirements. We will discuss some specific application categories, including mobile devices, personal computers, gaming systems, Internet search engines, and neural networks.

Chapter 14, Future Directions in Computer Architectures, looks at the road ahead for computer architectures. This chapter reviews the significant advances and ongoing trends that have resulted in the current state of computer architectures and extrapolates these trends in possible future directions. Potentially disruptive technologies are discussed that could alter the path of future computer architectures. In closing, I will propose some approaches for professional development for the computer architect that should result in a future-tolerant skill set.

To get the most out of this book

Each chapter in this book includes a set of exercises at the end. To get the most from the book, and to cement some of the more challenging concepts in your mind, I recommend you try to work through each exercise. Complete solutions to all exercises are provided in the book and are available online at https://github.com/PacktPublishing/Modern-Computer-Architecture-and-Organization.

If the code examples or the answers to the exercises are updated, the updates will appear in the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Code in Action

Code in Action videos for this book can be viewed at https://bit.ly/2UWc6Ov. These videos provide dynamic demonstrations of many of the examples and exercises from this book.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Subtraction using the SBC instruction tends to be a bit more confusing to novice 6502 assembly language programmers."

A block of code is set as follows:

; Add four bytes together using immediate addressing mode
LDA #$04    ; load the accumulator with the value 4
CLC         ; clear the carry flag before the first addition
ADC #$03    ; add 3 to the accumulator
ADC #$02    ; add 2 to the accumulator
ADC #$01    ; add 1; the accumulator now holds 10 ($0A)

Any command-line input or output is written as follows:

C:\>bcdedit

Windows Boot Manager
--------------------
identifier {bootmgr}

Bold: Indicates a new term, an important word, or words that you see onscreen. Here is an example: "Because there are now four sets, the Set field in the physical address reduces to two bits and the Tag field increases to 24 bits."

Tips or important notes

Appear like this.

Get in touch

Any errors in this book are the fault of the author, me. I appreciate receiving feedback on the book including bug reports on the contents. Please submit bug reports on GitHub at https://github.com/PacktPublishing/Modern-Computer-Architecture-and-Organization/issues. Feedback from readers is always welcome.

As necessary, errata will be made available online at https://github.com/PacktPublishing/Modern-Computer-Architecture-and-Organization.

Piracy: If you come across any illegal copies of this work in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, I ask you to leave a review on the site that you purchased it from. Potential readers can then see and use your unbiased opinion to make purchase decisions. Thank you!

For more information about Packt, please visit packt.com.

Section 1: Fundamentals of Computer Architecture

In this section, we will begin at the transistor level and work our way up to the computer system level. You will develop an understanding of the key components of modern computer architectures.

This section comprises the following chapters:

Chapter 1, Introducing Computer Architecture

Chapter 2, Digital Logic

Chapter 3, Processor Elements

Chapter 4, Computer System Components

Chapter 5, Hardware-Software Interface

Chapter 6, Specialized Computing Domains

Chapter 1: Introducing Computer Architecture

The architecture of automated computing devices has evolved from mechanical systems constructed nearly two centuries ago to the broad array of modern electronic computing technologies we use directly and indirectly every day. Along the way, there have been stretches of incremental technological improvement interspersed with disruptive advances that have drastically altered the trajectory of the industry. These trends can be expected to continue into the future.

In past decades, the 1980s, for example, students and technical professionals eager to learn about computing devices had a limited range of subject matter available for this purpose. If they had a computer of their own, it might have been an IBM PC or an Apple II. If they worked for an organization with a computing facility, they might have used an IBM mainframe or a Digital Equipment Corporation VAX minicomputer. These examples, and a limited number of similar systems, encompassed most people's exposure to computer systems of the time.

Today, numerous specialized computing architectures exist to address widely varying user needs. We carry miniature computers in our pockets and purses that can place phone calls, record video, and function as full participants on the Internet. Personal computers remain popular in a format outwardly similar to the PCs of past decades. Today's PCs, however, are orders of magnitude more capable than the first generations of PCs in terms of computing power, memory size, disk space, graphics performance, and communication capability.

Companies offering web services to hundreds of millions of users construct vast warehouses filled with thousands of closely coordinated computer systems capable of responding to a constant stream of requests with extraordinary speed and precision. Machine learning systems are trained through the analysis of enormous quantities of data to perform complex activities, such as driving automobiles.

This chapter begins by presenting a few key historical computing devices and the leaps in technology associated with them. This chapter will examine modern-day trends related to technological advances and introduce the basic concepts of computer architecture, including a close look at the 6502 microprocessor. These topics will be covered:

The evolution of automated computing devices

Moore's law

Computer architecture

The evolution of automated computing devices

This section reviews some classic machines from the history of automated computing devices and focuses on the major advances each embodied. Babbage's Analytical Engine is included here because of the many leaps of genius contained in its design. The other systems are discussed because they embodied significant technological advances and performed substantial real-world work over their lifetimes.

Charles Babbage's Analytical Engine

Although a working model of the Analytical Engine was never constructed, the detailed notes Charles Babbage developed from 1834 until his death in 1871 described a computing architecture that appeared to be both workable and complete. The Analytical Engine was intended to serve as a general-purpose programmable computing device. The design was entirely mechanical and was to be constructed largely of brass. It was designed to be driven by a shaft powered by a steam engine.

Borrowing from the punched cards of the Jacquard loom, the rotating studded barrels used in music boxes, and the technology of his earlier Difference Engine (also never completed in his lifetime, and more of a specialized calculating device than a computer), the Analytical Engine design was, otherwise, Babbage's original creation.

Unlike most modern computers, the Analytical Engine represented numbers in signed decimal form. The decision to use base-10 numbers rather than the base-2 logic of most modern computers was the result of a fundamental difference between mechanical technology and digital electronics. It is straightforward to construct mechanical wheels with ten positions, so Babbage chose the human-compatible base-10 format because it was not significantly more technically challenging than using some other number base. Simple digital circuits, on the other hand, are not capable of maintaining ten different states with the ease of a mechanical wheel.

All numbers in the Analytical Engine consisted of 40 decimal digits. The large number of digits was likely selected to reduce problems with numerical overflow. The Analytical Engine did not support floating-point mathematics.

Each number was stored on a vertical axis containing 40 wheels, with each wheel capable of resting in ten positions corresponding to the digits 0-9. A 41st number wheel contained the sign: any even number on this wheel represented a positive sign and any odd number represented a negative sign. The Analytical Engine axis was somewhat analogous to the register used in modern processors except the readout of an axis was destructive. If it was necessary to retain an axis's value after it had been read, another axis had to store a copy of the value. Numbers were transferred from one axis to another, or used in computations, by engaging a gear with each digit wheel and rotating the wheel to read out the numerical value. The axes serving as system memory were referred to collectively as the store.

The addition of two numbers used a process somewhat similar to the method of addition taught to schoolchildren. Assume a number stored on one axis, let's call it the addend, was to be added to a number on another axis, let's call it the accumulator. The machine would connect each addend digit wheel to the corresponding accumulator digit wheel through a train of gears. It would then simultaneously rotate each addend digit downward to zero while driving the accumulator digit an equivalent rotation in the increasing direction. If an accumulator digit wrapped around from nine to zero, the next most significant accumulator digit would increment by one. This carry operation would propagate across as many digits as needed (think of adding 1 to 999,999). By the end of the process, the addend axis would hold the value zero and the accumulator axis would hold the sum of the two numbers. The propagation of carries from one digit to the next was the most mechanically complex part of the addition process.
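
To make the carry mechanism concrete, here is a minimal Python sketch of digit-wheel addition for non-negative values (an illustration of the process described above, not a model of Babbage's actual mechanism):

# Illustrative model of digit-wheel addition with carry propagation.
# Each number is a list of 40 decimal digits, least significant first.

DIGITS = 40

def make_axis(value):
    """Store a non-negative integer on a 40-wheel axis."""
    return [(value // 10**i) % 10 for i in range(DIGITS)]

def add_axes(addend, accumulator):
    """Add the addend axis into the accumulator axis, wheel by wheel.
    The addend is rotated down to zero, as in the Analytical Engine."""
    carry = 0
    for i in range(DIGITS):
        total = accumulator[i] + addend[i] + carry
        accumulator[i] = total % 10   # the wheel wraps past nine
        carry = total // 10           # carry into the next wheel
        addend[i] = 0                 # readout is destructive
    return accumulator

a = make_axis(999999)
b = make_axis(1)
result = add_axes(b, a)
print(int(''.join(str(d) for d in reversed(result))))  # prints 1000000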

Operations in the Analytical Engine were sequenced by music box-like rotating barrels in a construct called the mill, which is analogous to the processing component of a modern CPU. Each Analytical Engine instruction was encoded in a vertical row of locations on the barrel where the presence or absence of a stud at a particular location either engaged a section of the Engine's machinery or left the state of that section unchanged. Based on Babbage's hypothesized execution speed, the addition of two 40-digit numbers, including the propagation of carries, would take about three seconds.

Babbage conceived several important concepts for the Engine that remain relevant today. His design supported a degree of parallel processing that accelerated the computation of series of values for output as numerical tables. Mathematical operations such as addition supported a form of pipelining, in which sequential operations on different data values overlapped in time.

Babbage was well aware of the complexities associated with mechanical devices such as friction, gear backlash, and wear over time. To prevent errors caused by these effects, the Engine incorporated mechanisms called lockings that were applied during data transfers across axes. The lockings forced the number wheels into valid positions and prevented accumulated errors from allowing a wheel to drift to an incorrect value. The use of lockings is analogous to the amplification of potentially weak input signals to produce stronger outputs by the digital logic gates in modern processors.

The Analytical Engine was programmed using punched cards and supported branching operations and nested loops. The most complex program for the Analytical Engine was developed by Ada Lovelace to compute the Bernoulli numbers.

Babbage constructed a trial model of a portion of the Analytical Engine mill, which is currently on display at the Science Museum in London.

ENIAC

ENIAC, the Electronic Numerical Integrator and Computer, was completed in 1945 and was the first programmable general-purpose electronic computer. The system consumed 150 kilowatts of electricity, occupied 1,800 square feet of floor space, and weighed 27 tons.

The design was based on vacuum tubes, diodes, and relays. ENIAC contained over 17,000 vacuum tubes that functioned as switching elements. Similar to the Analytical Engine, it used base-10 representation of ten-digit decimal numbers implemented using ten-position ring counters (the ring counter will be discussed in Chapter 2, Digital Logic). Input data was received from an IBM punch-card reader and the output of computations was sent to a card punch machine.

The ENIAC architecture was capable of complex sequences of processing steps including loops, branches, and subroutines. The system had 20 ten-digit accumulators that were similar to registers in modern computers. However, it did not initially have any memory storage beyond the accumulators. If intermediate values were required for use in later computations, they had to be written to punch cards and read back in when needed. ENIAC could perform about 385 multiplications per second.

ENIAC programs consisted of plugboard wiring and switch-based function tables. Programming the system was an arduous process that often took the team of talented female programmers weeks to complete. Reliability was a problem, as vacuum tubes failed regularly, requiring troubleshooting on a day-to-day basis to isolate and replace failed tubes.

In 1948, ENIAC was improved by adding the ability to program the system via punch cards rather than plugboards. This improvement greatly enhanced the speed with which programs could be developed. As a consultant for this upgrade, John von Neumann proposed a processing architecture based on a single memory region containing program instructions and data, a processing component with an arithmetic logic unit and registers, and a control unit with an instruction register and a program counter. Many modern processors continue to implement this general structure, now known as the von Neumann architecture.
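
The essence of this structure, a single memory holding both instructions and data with a program counter sequencing execution, can be sketched in a few lines of Python. The instruction names below are invented for illustration and do not correspond to any real machine:

# Toy von Neumann machine: one memory holds instructions and data.
memory = [
    ('LOAD', 6),    # address 0: load memory[6] into the accumulator
    ('ADD', 7),     # address 1: add memory[7] to the accumulator
    ('STORE', 8),   # address 2: store the accumulator to memory[8]
    ('HALT', 0),    # address 3: stop
    0, 0,           # addresses 4-5: unused
    2, 3,           # addresses 6-7: data operands
    0,              # address 8: result
]

pc = 0              # program counter
acc = 0             # accumulator register
while True:
    op, operand = memory[pc]    # fetch and decode
    pc += 1
    if op == 'LOAD':
        acc = memory[operand]
    elif op == 'ADD':
        acc += memory[operand]
    elif op == 'STORE':
        memory[operand] = acc
    elif op == 'HALT':
        break

print(memory[8])    # prints 5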

Early applications of ENIAC included analyses related to the development of the hydrogen bomb and the computation of firing tables for long-range artillery.

IBM PC

In the years following the construction of ENIAC, several technological breakthroughs resulted in remarkable advances in computer architectures:

The invention of the transistor in 1947 by John Bardeen, Walter Brattain, and William Shockley delivered a vast improvement over the vacuum tube technology prevalent at the time. Transistors were faster, smaller, consumed less power, and, once production processes had been sufficiently optimized, were much more reliable than the failure-prone tubes.

The commercialization of integrated circuits in 1958, led by Jack Kilby of Texas Instruments, began the process of combining large numbers of formerly discrete components onto a single chip of silicon.

In 1971, Intel began production of the first commercially available microprocessor, the Intel 4004. The 4004 was intended for use in electronic calculators and was specialized to operate on 4-bit binary coded decimal digits.

From the humble beginning of the Intel 4004, microprocessor technology advanced rapidly over the ensuing decade by packing increasing numbers of circuit elements onto each chip and expanding the capabilities of the microprocessors implemented on the chips.

The 8088 microprocessor

IBM released the IBM PC in 1981. The original PC contained an Intel 8088 microprocessor running at a clock frequency of 4.77 MHz and featured 16 KB of RAM, expandable to 256 KB. It included one or, optionally, two floppy disk drives. A color monitor was also available. Later versions of the PC supported more memory, but because portions of the address space had been reserved for video memory and read-only memory, the architecture could support a maximum of 640 KB of RAM.

The 8088 contained fourteen 16-bit registers. Four were general-purpose registers (AX, BX, CX, and DX). Four were memory segment registers (CS, DS, SS, and ES) that extended the address space to 20 bits. Segment addressing functioned by adding a 16-bit segment register value, shifted left by four bit positions, to a 16-bit offset contained in an instruction to produce a physical memory address within a one-megabyte range.
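
The address arithmetic is easy to verify with a short sketch (the physical_address helper is illustrative Python, not actual 8088 code):

def physical_address(segment, offset):
    """8088 real-mode address: the 16-bit segment shifted left 4 bits,
    plus a 16-bit offset, yields a 20-bit physical address."""
    return ((segment << 4) + offset) & 0xFFFFF

# The same physical address can be reached from many segment:offset pairs.
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350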

The remaining 8088 registers were the Stack Pointer (SP), the Base Pointer (BP), the Source Index (SI), the Destination Index (DI), the Instruction Pointer (IP), and the Status Flags (FLAGS). Modern x86 processors employ an architecture remarkably similar to this register set (Chapter 10, Modern Processor Architectures and Instruction Sets, will cover the details of the x86 architecture). The most obvious differences between the 8088 and x86 are the extension of the register widths to 32 bits in x86 and the addition of a pair of segment registers (FS and GS) that are used today primarily as data pointers in multithreaded operating systems.

The 8088 had an external data bus width of 8 bits, which meant it took two bus cycles to read or write a 16-bit value. This was a performance downgrade compared to the earlier 8086 processor, which employed a 16-bit external bus. However, the use of the 8-bit bus made the PC more economical to produce and provided compatibility with lower-cost 8-bit peripheral devices. This cost-sensitive design approach helped to reduce the purchase price of the PC to a level accessible to more potential customers.

Program memory and data memory shared the same address space, and the 8088 accessed memory over a single bus. In other words, the 8088 implemented the von Neumann architecture. The 8088 instruction set included instructions for data movement, arithmetic, logical operations, string manipulation, control transfer (conditional and unconditional jumps and subroutine call and return), input/output, and additional miscellaneous functions. The processor required about 15 clock cycles per instruction on average, resulting in an execution speed of 0.3 million instructions per second (MIPS).
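
The quoted execution rate follows directly from the clock frequency and the average cycles per instruction, as this quick Python calculation confirms:

clock_hz = 4.77e6            # 8088 clock frequency in the IBM PC
cycles_per_instruction = 15  # approximate average for the 8088
mips = clock_hz / cycles_per_instruction / 1e6
print(f"{mips:.2f} MIPS")    # prints 0.32 MIPS, roughly the 0.3 MIPS cited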

The 8088 supported nine distinct modes for addressing memory. This variety of modes was needed to efficiently implement methods for accessing a single item at a time as well as for iterating through sequences of data.

The segment registers in the 8088 architecture provided a clever way to expand the range of addressable memory without increasing the length of most instructions referencing memory locations. Each segment register allowed access to a 64-kilobyte block of memory beginning at a physical memory address defined at a multiple of 16 bytes. In other words, the 16-bit segment register represented a 20-bit base address with the lower four bits set to zero. Instructions could then reference any location within the 64-kilobyte segment using a 16-bit offset from the address defined by the segment register.

The CS register selected the code segment location in memory and was used in fetching instructions and performing jumps and subroutine calls and returns. The DS register defined the data segment location for use by instructions involving the transfer of data to and from memory. The SS register set the stack segment location, which was used for local memory allocation within subroutines and for storing subroutine return addresses.

Programs that required less than 64 KB in each of the code, data, and stack segments could ignore the segment registers entirely because those registers could be set once at program startup (compilers would do this automatically) and remain unchanged through execution. Easy!

Things got quite a bit more complicated when a program's data size increased beyond 64-kilobyte. Compilers for the 8088 architecture distinguished between near and far references to memory. A near pointer represented a 16-bit offset from the current segment register base address. A far pointer contained 32 bits of addressing information: a 16-bit segment register value and a 16-bit offset. Far pointers obviously required 16 bits of extra data memory and they required additional processing time. Making a single memory access using a far pointer involved the following steps:

1. Save the current segment register contents to a temporary location.
2. Load the new segment value into the register.
3. Access the data (read or write as needed) using an offset from the segment base.
4. Restore the original segment register value.
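
The following Python sketch models these four steps; the read_far helper and the register dictionary are hypothetical constructs for illustration only:

# Hypothetical model of a far-pointer read on the 8088 (illustration only).
MEMORY = bytearray(1 << 20)          # one megabyte of addressable memory

def read_far(regs, far_segment, far_offset):
    saved = regs['DS']               # 1. save the current segment register
    regs['DS'] = far_segment         # 2. load the new segment value
    addr = ((regs['DS'] << 4) + far_offset) & 0xFFFFF
    value = MEMORY[addr]             # 3. access the data
    regs['DS'] = saved               # 4. restore the original segment value
    return value

regs = {'DS': 0x1000}
MEMORY[(0x2000 << 4) + 0x0005] = 42
print(read_far(regs, 0x2000, 0x0005))   # prints 42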

When using far pointers, it was possible to declare data objects (for example, an array of characters) up to 64 KB in size. If you needed a larger structure, you had to work out how to break it into chunks no larger than 64 KB and manage them yourself. As a result of these segment register manipulations, programs that required extensive access to data larger than 64 KB were susceptible to code size bloat and slower execution.

The IBM PC motherboard also contained a socket for an optional Intel 8087 floating-point coprocessor. The designers of the 8087 invented data formats and processing rules for 32-bit and 64-bit floating-point numbers that became enshrined in 1985 as the IEEE 754 floating-point standard, which remains in near-universal use today. The 8087 could perform about 50,000 floating-point operations per second. We will look at floating-point processors in detail in Chapter 9, Specialized Processor Extensions.
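
Python's struct module can display these standard bit patterns; here is a quick look at the 32-bit and 64-bit IEEE 754 encodings of one value:

import struct

# 32-bit (single) and 64-bit (double) IEEE 754 encodings of the same value.
value = -118.625
single = struct.pack('>f', value)   # big-endian 32-bit float
double = struct.pack('>d', value)   # big-endian 64-bit float
print(single.hex())   # c2ed4000
print(double.hex())   # c05da80000000000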

The 80286 and 80386 microprocessors

The second generation of the IBM PC, the PC AT, was released in 1984. AT stood for Advanced Technology and referred to several significant enhancements over the original PC that mostly resulted from the use of the Intel 80286 processor.

Like the 8088, the 80286 was a 16-bit processor, and it maintained backward compatibility with the 8088: 8088 code could run unmodified on the 80286. The 80286 had a 16-bit data bus and 24 address lines supporting a 16-megabyte address space. The external data bus width was 16 bits, improving data access performance over the 8-bit bus of the 8088. The instruction execution rate (instructions per clock cycle) was about double the 8088 in many applications. This meant that at the same clock speed the 80286 would be twice as fast as the 8088. The original PC AT clocked the processor at 6 MHz and a later version operated at 8 MHz. The 6 MHz variant of the 80286 achieved an instruction execution rate of about 0.9 MIPS.

The 80286 implemented a protected virtual address mode intended to support multiuser operating systems and multitasking. In protected mode, the processor enforced memory protection to ensure one user's programs could not interfere with the operating system or with other users' programs. This groundbreaking technological advance remained little used for many years, mainly because of the prohibitive cost of adding sufficient memory to a computer system to make it useful in a multiuser, multitasking context.

The next generation of the x86 processor line was the 80386, introduced in 1985. The 80386 was a 32-bit processor with support for a flat 32-bit memory model in protected mode. The flat memory model allowed programmers to address up to 4 GB directly, without the need to manipulate segment registers. Compaq introduced an IBM PC-compatible personal computer based on the 80386 called the DeskPro in 1986. The DeskPro shipped with a version of Microsoft Windows targeted to the 80386 architecture.

The 80386 maintained a large degree of backward compatibility with the 80286 and 8088 processors. The design implemented in the 80386 remains the current standard x86 architecture. Much more about this architecture will be covered in Chapter 10, Modern Processor Architectures and Instruction Sets.

The 80386 was eventually offered at clock speeds of up to 33 MHz, at which it achieved about 11.4 MIPS. Modern implementations of the x86 architecture run several hundred times faster than the original as the result of higher clock speeds, performance enhancements such as extensive use of cache memory, and more efficient instruction execution at the hardware level.

The iPhone

In 2007, Steve Jobs introduced the iPhone to a world that had no idea it had any use for such a device. The iPhone built upon previous revolutionary advances from Apple Computer including the Macintosh computer in 1984 and the iPod music player in 2001. The iPhone combined the functions of the iPod, a mobile telephone, and an Internet-connected computer.

The iPhone did away with the hardware keyboard that was common on smartphones of the time and replaced it with a touchscreen capable of displaying an on-screen keyboard or any other type of user interface. The screen was driven by the user's fingers and supported multi-finger gestures for actions such as zooming a photo.

The iPhone ran the OS X operating system, the same OS used on the flagship Macintosh computers of the time. This decision immediately enabled the iPhone to support a vast range of applications already developed for Macs and empowered software developers to rapidly introduce new applications tailored to the iPhone, once Apple began allowing third-party application development.

The iPhone 1 had a 3.5" screen with a resolution of 320x480 pixels. It was 0.46 inches thick (thinner than other smartphones), had a built-in 2-megapixel camera, and weighed 4.8 oz. A proximity sensor detected when the phone was held to the user's ear and turned off screen illumination and touchscreen sensing during calls. An ambient light sensor automatically set the screen brightness, and an accelerometer detected whether the phone was being held in portrait or landscape orientation.

The iPhone 1 included 128 MB of RAM, 4 GB, 8 GB, or 16 GB of flash memory, and supported Global System for Mobile communications (GSM) cellular communication, Wi-Fi (802.11b/g), and Bluetooth.

In contrast to the abundance of openly available information about the IBM PC, Apple was notoriously reticent about releasing the architectural details of the iPhone's construction. Apple released no information about the processor or other internal components of the first iPhone, simply calling it a closed system.

Despite the lack of official information from Apple, other parties have enthusiastically torn down the various iPhone models and attempted to identify the phone's components and how they interconnect. Software sleuths have devised various tests to attempt to determine the specific processor model and other digital devices implemented in the iPhone. These reverse engineering efforts are subject to error, so descriptions of the iPhone architecture in this section should be taken with a grain of salt.

The iPhone 1 processor was a 32-bit ARM11 manufactured by Samsung running at 412 MHz. The ARM11 was an improved variant of previous generation ARM processors and included an 8-stage instruction pipeline and support for Single Instruction-Multiple Data (SIMD) processing to improve audio and video performance. The ARM processor architecture will be discussed further in Chapter 10, Modern Processor Architectures and Instruction Sets.

The iPhone 1 was powered by a 3.7V lithium-ion polymer battery. The battery was not intended to be replaceable, and Apple estimated it would lose about 20 percent of its original capacity after 400 charge and discharge cycles. Apple quoted up to 250 hours of standby time and 8 hours of talk time on a single charge.

Six months after the iPhone was introduced, Time magazine named the iPhone the "Invention of the Year" for 2007. In 2017, Time ranked the 50 Most Influential Gadgets of All Time. The iPhone topped the list.

Moore's law

For those working in the rapidly advancing field of computer technology, it is a significant challenge to make plans for the future. This is true whether the goal is plotting your own career path or identifying the optimal R&D investments for a giant semiconductor corporation. No one can ever be completely sure what the next leap in technology will be, what effects from it will ripple across the industry and its users, or when it will happen. One technique that has proven useful in this difficult environment is to develop a rule of thumb, or empirical law, based on experience.

Gordon Moore co-founded Fairchild Semiconductor in 1957 and was later the chairman and CEO of Intel. In 1965, Moore published an article in Electronics magazine in which he offered his prediction of the changes that would occur in the semiconductor industry over the following ten years. In the article, he observed that the number of formerly discrete components such as transistors, diodes, and capacitors that could be integrated onto a single chip had been doubling approximately yearly and the trend was likely to continue over the subsequent ten years. This doubling formula came to be known as Moore's law. This was not a scientific law in the sense of the law of gravity. Rather, it was based on observation of historical trends, and he believed this formulation had some ability to predict the future.

Moore's law turned out to be impressively accurate over those ten years. In 1975, he revised the predicted growth rate for the following ten years to doubling the number of components per integrated circuit every two years rather than yearly. This pace continued for decades, up until about 2010. In more recent years, the growth rate has appeared to decline slightly. In 2015, Brian Krzanich, Intel CEO, stated that the company's growth rate had slowed to doubling about every two and a half years.

Despite the fact that the time to double integrated circuit density is increasing, the current pace represents a phenomenal rate of growth that can be expected to continue into the future, just not quite as rapidly as it once progressed.
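
A quick Python calculation shows how strongly the doubling period affects long-term growth, using the doubling intervals quoted above:

def growth_factor(years, doubling_period_years):
    """Component-count multiplier after a span of years under a
    fixed doubling period."""
    return 2 ** (years / doubling_period_years)

# Growth over a 10-year span for each of the doubling periods mentioned.
for period in (1.0, 2.0, 2.5):
    print(f"doubling every {period} years: x{growth_factor(10, period):,.0f}")
# doubling every 1.0 years: x1,024
# doubling every 2.0 years: x32
# doubling every 2.5 years: x16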

Moore's law has proven to be a reliable tool for evaluating the performance of semiconductor companies over the decades. Companies have used it to set goals for the performance of their products and to plan their investments. By comparing the integrated circuit density increases for a company's products against prior performance, and against other companies, it is possible for semiconductor executives and industry analysts to evaluate and score company performance. The results of these analyses have fed directly into decisions to build enormous new fabrication plants and to push the boundaries of ever-smaller integrated circuit feature sizes.

The decades since the introduction of the IBM PC have seen tremendous growth in the capability of single-chip microprocessors. Current processor generations are hundreds of times faster, operate on 32-bit and 64-bit data natively, have far more integrated memory resources, and unleash vastly more functionality, all packed into a single integrated circuit.

The increasing density of semiconductor features, as predicted by Moore's law, has enabled all of these improvements. Smaller transistors run at higher clock speeds due to the shorter connection paths between circuit elements. Smaller transistors also, obviously, allow more functionality to be packed into a given amount of die area. Being smaller and closer to neighboring components allows the transistors to consume less power and generate less heat.

There was nothing magical about Moore's law. It was an observation of the trends in progress at the time. One trend was the steadily increasing size of semiconductor dies. This was the result of improving production processes that reduced the density of defects, hence allowing acceptable production yield with larger integrated circuit dies. Another trend was the ongoing reduction in the size of the smallest components that could be reliably produced in a circuit. The final trend was what Moore referred to as the "cleverness" of circuit designers in making increasingly efficient and effective use of the growing number of circuit elements placed on a chip.

Traditional semiconductor manufacturing processes have begun to approach physical limits that will eventually put the brakes on growth under Moore's law. The smallest features on current commercially available integrated circuits are around 10 nanometers (nm). For comparison, a typical human hair is about 50,000 nm thick and a water molecule (one of the smallest molecules) is 0.28 nm across. There is a point beyond which it is simply not possible for circuit elements to become smaller as the sizes approach atomic scale.

In addition to the challenge of building reliable circuit components from a small number of molecules, other physical effects, such as the Abbe diffraction limit, become significant impediments to single-digit nanometer-scale circuit production. We won't get into the details of these phenomena; it's sufficient to know that the steady increase in integrated circuit component density that has proceeded for decades under Moore's law is going to become much harder to sustain over the next few years.

This does not mean we will be stuck with processors essentially the same as those that are now commercially available. Even as the rate of growth in transistor density slows, semiconductor manufacturers are pursuing several alternative methods to continue growing the power of computing devices. One approach is specialization, in which circuits are designed to perform a specific category of tasks extremely well rather than performing a wide variety of tasks merely adequately.

Graphics Processing Units (GPUs) are an excellent example of specialization. The original GPUs focused exclusively on improving the speed at which three-dimensional graphics scenes could be rendered, mostly for use in video gaming. The calculations involved in generating a three-dimensional scene are well defined and must be applied to thousands of pixels to create a single frame. The process must be repeated for each subsequent frame, and frames may need to be redrawn at a rate of 60 Hz or higher to provide a satisfactory user experience. The computationally demanding and repetitive nature of this task is ideally suited for acceleration via hardware parallelism. Multiple computing units within a GPU simultaneously perform essentially the same calculations on different input data to produce separate outputs. Those outputs are combined to generate the final scene. Modern GPU designs have been enhanced to support other domains, such as training neural networks on massive amounts of data. GPUs will be covered in detail in Chapter 6, Specialized Computing Domains.

As Moore's law shows signs of beginning to fade over the coming years, what advances might take its place to kick off the next round of innovations in computer architectures? We don't know for sure today, but some tantalizing options are currently under intense study. Quantum computing is one example of these technologies. We will cover that technology in Chapter 14, Future Directions in Computer Architectures.

Quantum computing takes advantage of the properties of subatomic particles to perform computations in a manner that traditional computers cannot. A basic element of quantum computing is the qubit, or quantum bit. A qubit is similar to a regular binary bit, but in addition to representing the states 0 and 1, qubits can attain a state that is a superposition of the 0 and 1 states. When measured, the qubit output will always be 0 or 1, but the probability of producing either output is a function of the qubit's quantum state prior to being read. Specialized algorithms are required to take advantage of the unique features of quantum computing.

Another possibility is that the next great technological breakthrough in computing devices will be something that we either haven't thought of, or if we did think about it, we may have dismissed the idea out of hand as unrealistic. The iPhone, discussed in the preceding section, is an example of a category-creating product that revolutionized personal communication and enabled use of the Internet in new ways. The next major advance may be a new type of product, a surprising new technology, or some combination of product and technology. Right now, we don't know what it will be or when it will happen, but we can say with confidence that such changes are coming.

Computer architecture

The descriptions of a small number of key architectures from the history of computing in the previous section included some terms that may be unfamiliar to you. This section will provide an introduction to the building blocks used to construct modern-day processors and related computer subsystems.

One ubiquitous feature of modern computers is the use of voltage levels to indicate data values. In general, only two voltage levels are recognized: a low level and a high level. The low level is often assigned the value zero and the high level the value one. The voltage at any point in a circuit (digital or otherwise) is analog in nature and can take on any value within its operating range. When changing from the low level to the high level, or vice versa, the voltage must pass through all of the voltages in between. In the context of digital circuitry, these transitions happen quickly, and the circuitry is designed not to react to voltages between the high and low levels.

Binary and hexadecimal numbers

The circuitry within a processor does not work directly with numbers, in any sense. Processor circuit elements obey the laws of electricity and electronics and simply react to the inputs provided to them. The inputs that drive these actions result from the code developed by programmers and from the data provided as input to the program. The interpretation of the output of a program as, say, numbers in a spreadsheet, or characters in a word processing program, is a purely human interpretation that assigns meaning to the result of the electronic interactions within the processor. The decision to assign zero to the low voltage and one to the high voltage is the first step in the interpretation process.

The smallest unit of information in a digital computer is a binary digit, called a bit, which represents a discrete data element containing the value zero or one. A number of bits can be placed together to enable representation of a greater range of values. A byte is composed of eight bits placed together to form a single value. The byte is the smallest unit of information that can be read from or written to memory by most modern processors.

A single bit can take on two values: 0 and 1. Two bits placed together can take on four values: 00, 01, 10, and 11. Three bits can take on eight values: 000, 001, 010, 011, 100, 101, 110, and 111. In fact, any number of bits, n, can take on 2ⁿ values, where 2ⁿ indicates multiplying n copies of two together. An 8-bit byte, therefore, can take on 2⁸, or 256, different values.
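
To make the 2ⁿ relationship concrete, the following short C program, a minimal sketch, prints the number of distinct values representable by 1 through 8 bits, computing 2ⁿ as a left shift:

    #include <stdio.h>

    int main(void)
    {
        /* The number of values representable by n bits is 2 to the
           power n, computed here by shifting 1 left by n places */
        for (int n = 1; n <= 8; n++)
            printf("%d bits can represent %u values\n", n, 1u << n);
        return 0;
    }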

The binary number format is not most people's first choice when it comes to performing arithmetic, and working with numbers such as 11101010 can be confusing and error prone, especially when dealing with 32- and 64-bit values. To make working with these numbers somewhat easier, hexadecimal numbers are often used instead. The term hexadecimal is often shortened to hex. In the hexadecimal number system, binary numbers are separated into groups of four bits. Since there are four bits in the group, the number of possible values is 2⁴, or 16. The first ten of these 16 numbers are assigned the digits 0-9. The last six are assigned the letters A-F. Table 1.1 shows the first 16 binary values starting at zero along with the corresponding hexadecimal digit and the decimal equivalent to the binary and hex values.

    Binary    Hexadecimal    Decimal
    0000      0              0
    0001      1              1
    0010      2              2
    0011      3              3
    0100      4              4
    0101      5              5
    0110      6              6
    0111      7              7
    1000      8              8
    1001      9              9
    1010      A              10
    1011      B              11
    1100      C              12
    1101      D              13
    1110      E              14
    1111      F              15

Table 1.1: Binary, hexadecimal, and decimal numbers

The binary number 11101010 can be represented more compactly by breaking it into two 4-bit groups (1110 and 1010) and writing them as the hex digits EA. Because binary digits can take on only two values, binary is a base-2 number system. Hex digits can take on 16 values, so hexadecimal is base-16. Decimal digits can have ten values, therefore decimal is base-10.
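
The conversion is easy to check in C; this minimal sketch (the variable name value is chosen only for illustration) prints 11101010b in both hexadecimal and decimal:

    #include <stdio.h>

    int main(void)
    {
        unsigned char value = 0xEA;  /* 11101010 in binary */
        printf("hex: %02X  decimal: %u\n", (unsigned)value, (unsigned)value);
        return 0;
    }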

When working with these different number bases, things can become confusing. Is the number written as 100 a binary, hexadecimal, or decimal value? Without additional information, you can't tell. Various programming languages and textbooks have taken different approaches to remove this ambiguity. In most cases, decimal numbers are unadorned, so the number 100 is usually decimal. In programming languages such as C and C++, hexadecimal numbers are prefixed by 0x, so the number 0x100 is 100 hex. In assembly languages, either the prefix character $ or the suffix character h might be used to indicate hexadecimal numbers. The use of binary values in programming is less common, mostly because hexadecimal is preferred due to its compactness. Some compilers support the use of 0b as a prefix for binary numbers.
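
As a brief illustration, this C snippet assigns the same quantity using all three notations (note that the 0b prefix is a compiler extension supported by GCC and Clang, and only standardized in C23):

    #include <stdio.h>

    int main(void)
    {
        int decimal = 256;          /* base 10: no prefix               */
        int hex     = 0x100;        /* base 16: 0x prefix               */
        int binary  = 0b100000000;  /* base 2: 0b prefix (extension)    */
        printf("%d %d %d\n", decimal, hex, binary);  /* 256 256 256 */
        return 0;
    }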

Hexadecimal number representation

This book uses either the prefix $ or the suffix h to represent hexadecimal numbers, depending on the context. The suffix b will represent binary numbers, and the absence of a prefix or suffix indicates decimal numbers.

Bits are numbered individually within a binary number, with bit zero as the rightmost, least significant bit. Bit numbers increase moving leftward toward the most significant bit. Some examples should make this clear: in Table 1.1, the binary value 0001b (1 decimal) has bit number zero set and the remaining three bits cleared. In 0010b (2 decimal), bit 1 is set and the other bits are cleared. In 0100b (4 decimal), bit 2 is set and the other bits are cleared.

Set versus cleared

A bit that is set has the value 1. A bit that is cleared has the value 0.
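
In code, an individual bit can be examined by shifting and masking. This minimal C sketch (the function name bit_is_set is hypothetical) tests whether bit number n of a byte is set:

    #include <stdio.h>

    /* Return 1 if bit number n (0 = least significant) is set in value */
    int bit_is_set(unsigned char value, int n)
    {
        return (value >> n) & 1;
    }

    int main(void)
    {
        printf("%d\n", bit_is_set(0x04, 2));  /* 0100b: bit 2 is set, prints 1 */
        printf("%d\n", bit_is_set(0x04, 0));  /* bit 0 is cleared, prints 0    */
        return 0;
    }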

An 8-bit byte can take on values from $00 to $FF, equivalent to the decimal range 0-255. When performing addition at the byte level, it is possible for the result to exceed 8 bits. For example, adding $01 to $FF results in the value $100. When using 8-bit registers, this represents a carry, which must be handled appropriately.

In unsigned arithmetic, subtracting $01 from $00 results in a wraparound to the value $FF. Depending on the computation being performed, this may or may not be the desired result.
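
Both behaviors are easy to demonstrate with C's uint8_t type, which wraps around modulo 256 just as an 8-bit register does; this is a minimal sketch:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t a = 0xFF;
        uint8_t b = a + 1;  /* $FF + $01 = $100: the carry out of bit 7
                               is lost, leaving $00 */
        uint8_t c = 0x00;
        uint8_t d = c - 1;  /* $00 - $01 wraps around to $FF */
        printf("b = %02X, d = %02X\n", (unsigned)b, (unsigned)d);
        return 0;
    }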

When desired, negative values can be represented using binary numbers. The most common signed number format in modern processors is two's complement. In two's complement, 8-bit signed numbers span the range from -128 to 127. The most significant bit of a two's complement data value is the sign bit: a 0 in this bit represents a positive value and a 1 represents a negative value. A two's complement number can be negated (multiplied by -1) by inverting all of the bits, adding 1, and ignoring any carry. Inverting a bit means changing a 0 bit to 1 and a 1 bit to 0.

    Binary value    Decimal    Invert the bits    Add one (ignore carry)    Decimal result
    00000101        5          11111010           11111011                  -5
    00000000        0          11111111           00000000                  0

Table 1.2: Negation operation examples

Note that negating zero returns a result of zero, as you would expect mathematically.
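
The invert-and-add-one procedure translates directly into C. This sketch (the helper name negate is chosen for illustration) negates a byte manually and confirms that negating zero yields zero:

    #include <stdio.h>
    #include <stdint.h>

    /* Two's complement negation: invert all bits, add 1, ignore any carry */
    uint8_t negate(uint8_t value)
    {
        return (uint8_t)(~value + 1);
    }

    int main(void)
    {
        printf("%d\n", (int8_t)negate(5));  /* prints -5 */
        printf("%d\n", (int8_t)negate(0));  /* prints 0  */
        return 0;
    }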

Two's complement arithmetic

Two's complement arithmetic is identical to unsigned arithmetic at the bit level. The manipulations involved in addition and subtraction are the same whether the input values are intended to be signed or unsigned. The interpretation of the result as signed or unsigned depends entirely on the intent of the user.

    Binary value    Unsigned decimal    Signed (two's complement) decimal
    00000000        0                   0
    00000001        1                   1
    01111111        127                 127
    10000000        128                 -128
    11111111        255                 -1

Table 1.3: Signed and unsigned 8-bit numbers

Signed and unsigned representations of binary numbers extend to larger integer data types. 16-bit values can represent unsigned integers from 0 to 65,535 and signed integers in the range -32,768 to 32,767. 32-bit, 64-bit, and even larger integer data types are commonly available in modern programming languages.
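
The fact that signed and unsigned arithmetic are identical at the bit level can be demonstrated by reinterpreting the same 8-bit pattern both ways, as in this minimal C sketch:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t bits = 0xFF;
        /* The same bit pattern reads as 255 unsigned or -1 signed */
        printf("unsigned: %u  signed: %d\n", (unsigned)bits, (int8_t)bits);

        /* Adding 1 produces $00 under either interpretation:
           255 + 1 wraps around to 0, and -1 + 1 equals 0 */
        uint8_t sum = bits + 1;
        printf("sum: %02X\n", (unsigned)sum);
        return 0;
    }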

The 6502 microprocessor

This section will introduce the architecture of a processor with a relatively simple design compared to more powerful modern processors. The intent here is to provide a whirlwind introduction to some basic concepts shared by processors across the spectrum, from very low-end devices to sophisticated modern designs.

The 6502 processor was introduced by MOS Technology in 1975. The 6502 found widespread use in its early years in video game consoles from Atari and Nintendo and in computers marketed by Commodore and Apple. The 6502 continues in widespread use today in embedded systems, with estimates of between five and ten billion (yes, billion) units produced as of 2018. In popular culture, both Bender the robot in Futurama and the T-800 robot in The Terminator appear to have employed the 6502, based on onscreen evidence.