The long-awaited x64 edition of the bestselling introduction to Intel assembly language
In the newly revised fourth edition of x64 Assembly Language Step-by-Step: Programming with Linux, author Jeff Duntemann delivers an extensively rewritten introduction to assembly language with a strong focus on 64-bit long-mode Linux assembler. The book offers a lighthearted, robust, and accessible approach to a challenging technical discipline, giving you a step-by-step path to learning assembly code that's engaging and easy to read. x64 Assembly Language Step-by-Step makes quick work of programmable computing basics, the concepts of binary and hexadecimal number systems, the Intel x86/x64 computer architecture, and the process of Linux software development to dive deep into the x64 instruction set, memory addressing, procedures, macros, and interface to the C-language code libraries on which Linux is built.
You'll also find:
* A set of free and open-source development and debugging tools you can download and put to use immediately
* Numerous examples woven throughout the book to illustrate the practical implementation of the ideas discussed within
* Practical tips on software design, coding, testing, and debugging
A one-stop resource for aspiring and practicing Intel assembly programmers, the latest edition of this celebrated text provides readers with an authoritative tutorial approach to x64 technology that's ideal for self-paced instruction.
Page count: 1098
Year of publication: 2023
Cover
Table of Contents
Title Page
Introduction
CHAPTER 1: It's All in the Plan
Another Pleasant Valley Saturday
Had This Been the Real Thing …
Assembly Language Programming As a Square Dance
Assembly Language Programming As a Board Game
CHAPTER 2: Alien Bases
The Return of the New Math Monster
Octal: How the Grinch Stole Eight and Nine
Hexadecimal: Solving the Digit Shortage
From Hex to Decimal and from Decimal to Hex
Practice. Practice! PRACTICE!
Arithmetic in Hex
Binary
Hexadecimal as Shorthand for Binary
Prepare to Compute
CHAPTER 3: Lifting the Hood
RAXie, We Hardly Knew Ye
Switches, Transistors, and Memory
The Shop Supervisor and the Assembly Line
The Box That Follows a Plan
What vs. How: Architecture and Microarchitecture
Enter the Plant Manager
CHAPTER 4: Location, Location, Location
The Joy of Memory Models
The Nature of Segments
Segment Registers
The Four Major Assembly Programming Models
64-Bit Long Mode
CHAPTER 5: The Right to Assemble
The Nine and Sixty Ways to Code
Files and What's Inside Them
Text In, Code Out
The Assembly Language Development Process
Linking the Object Code File
Taking a Trip Down Assembly Lane
CHAPTER 6: A Place to Stand, with Access to Tools
Integrated Development Environments
Introducing SASM
Linux and Terminals
Using Linux Make
Debugging with SASM
CHAPTER 7: Following Your Instructions
Build Yourself a Sandbox
Instructions and Their Operands
Source and Destination Operands
Rally Round the Flags, Boys!
Signed and Unsigned Values
Implicit Operands and MUL
Reading and Using an Assembly Language Reference
NEG Negate (Two's Complement; i.e., Multiply by −1)
CHAPTER 8: Our Object All Sublime
The Bones of an Assembly Language Program
Last In, First Out via the Stack
Using Linux Kernel Services Through Syscall
Designing a Nontrivial Program
Going Further
CHAPTER 9: Bits, Flags, Branches, and Tables
Bits Is Bits (and Bytes Is Bits)
Shifting Bits
Bit-Bashing in Action
Flags, Tests, and Branches
X64 Long Mode Memory Addressing in Detail
Character Table Translation
Tables Instead of Calculations
CHAPTER 10: Dividing and Conquering
Boxes within Boxes
Calling and Returning
Local Labels and the Lengths of Jumps
Building External Procedure Libraries
The Art of Crafting Procedures
Simple Cursor Control in the Linux Console
Creating and Using Macros
CHAPTER 11: Strings and Things
The Notion of an Assembly Language String
REP STOSB, the Software Machine Gun
The Semiautomatic Weapon: STOSB Without REP
MOVSB: Fast Block Copies
Storing Data to Discontinuous Strings
Command-Line Arguments, String Searches, and the Linux Stack
The Stack, Its Structure, and How to Use It
CHAPTER 12: Heading Out to C
What's GNU?
Linking to the Standard C Library
Formatted Text Output with printf()
Data In with fgets() and scanf()
Be a Linux Time Lord
Understanding AT&T Instruction Mnemonics
Generating Random Numbers
How C Sees Command-Line Arguments
Simple File I/O
Conclusion: Not the End, But Only the Beginning
Where to Now?
The Art of 64-bit Assembly by Randall Hyde (No Starch Press, 2022)
Modern x86 Assembly Language Programming by Daniel Kusswurm (Apress, 2018)
Stepping off Square One
APPENDIX A: The Return of the Insight Debugger
Insight's Shortcomings
Opening a Program Under Insight
Setting Command-Line Arguments with Insight
Running and Stepping a Program
The Memory Window
Showing the Stack in Insight's Memory View
Examining the Stack with Insight's Memory View
Learn gdb!
APPENDIX B: Partial x64 Instruction Reference
What's Been Removed from x64
Flag Results
Size Specifiers
Instruction Index
ADC: Arithmetic Addition with Carry
ADD: Arithmetic Addition
AND: Logical AND
BT: Bit Test
CALL: Call Procedure
CLC: Clear Carry Flag (CF)
CLD: Clear Direction Flag (DF)
CMP: Arithmetic Comparison
DEC: Decrement Operand
DIV: Unsigned Integer Division
INC: Increment Operand
J??: Jump If Condition Is Met
JECXZ: Jump if ECX=0
JRCXZ: Jump If RCX=0
JMP: Unconditional Jump
LEA: Load Effective Address
LOOP: Loop Until CX/ECX/RCX=0
LOOPNZ/LOOPNE: Loop Until CX/ECX/RCX=0 and ZF=0
LOOPZ/LOOPE: Loop Until CX/ECX/RCX=0 and ZF=1
MOV: Copy Right Operand into Left Operand
MOVS: Move String
MOVSX: Copy with Sign Extension
MUL: Unsigned Integer Multiplication
NEG: Negate (Two's Complement; i.e., Multiply by −1)
NOP: No Operation
NOT: Logical NOT (One's Complement)
OR: Logical OR
POP: Copy Top of Stack into Operand
POPF/D/Q: Copy Top of Stack into Flags Register
PUSH: Push Operand onto Top of Stack
PUSHF/D/Q: Push Flags Onto the Stack
RET: Return from Procedure
ROL/ROR: Rotate Left/Rotate Right
SBB: Arithmetic Subtraction with Borrow
SHL/SHR: Shift Left/Shift Right
STC: Set Carry Flag (CF)
STD: Set Direction Flag (DF)
STOS/B/W/D/Q: Store String
SUB: Arithmetic Subtraction
SYSCALL: Fast System Call into Linux
XCHG: Exchange Operands
XLAT: Translate Byte Via Table
XOR: Exclusive OR
APPENDIX C: Character Set Charts
Index
Copyright
Dedication
About the Author
About the Technical Editor
Acknowledgments
End User License Agreement
Chapter 2
Table 2.1: Counting in Martian, Base Fooby
Table 2.2: Powers of Fooby
Table 2.3: Counting in Octal, Base 8
Table 2.4: Octal Columns as Powers of Eight
Table 2.5: Counting in Hexadecimal, Base 16
Table 2.6: Hexadecimal Columns as Powers of 16
Table 2.7: Binary Columns as Powers of 2
Chapter 4
Table 4.1: Collective Terms for Memory
Chapter 6
Table 6.1: The Three Standard Unix Files
Chapter 7
Table 7.1: MOV and Its Operands
Table 7.2: The Ranges of Signed Values
Table 7.3: The MOVSX Instruction
Table 7.4: The MUL Instruction
Table 7.5: The DIV Instruction
Chapter 8
Table 8.1: System Call Conventions for the System V ABI
Chapter 9
Table 9.1: The AND Truth Table for Formal Logic
Table 9.2: The AND Truth Table for Assembly Language
Table 9.3: The OR Truth Table for Assembly Language
Table 9.4: The XOR Truth Table for Assembly Language
Table 9.5: The NOT Truth Table for Assembly Language
Table 9.6: Jump Instruction Mnemonics and Their Synonyms
Table 9.7: Arithmetic Tests Useful After a CMP Instruction
Table 9.8: 64-Bit Long Mode Memory-Addressing Schemes
Chapter 12
Table 12.1: Printf() Formatting Codes
Table 12.2: The Values Contained in the tm Structure
Table 12.3: File Access Codes for Use with fopen()
Chapter 1
Figure 1.1: The Game of Assembly Language
Chapter 2
Figure 2.1: The anatomy of ∩≡ ⌠ Θ ≡
Figure 2.2: The anatomy of 76225 octal
Figure 2.3: The anatomy of 3C0A9H
Chapter 3
Figure 3.1: Transistor switches and memory cells
Figure 3.2: A RAM chip
Figure 3.3: A simple 1-megabyte memory system
Figure 3.4: The CPU and memory
Figure 3.5: The idea of multitasking
Figure 3.6: A mature protected-mode operating system
Chapter 4
Figure 4.1: The 8080 memory model
Figure 4.2: The 8080 memory model inside an 8086 memory system
Figure 4.3: Seeing a megabyte through 64 KB blinders
Figure 4.4: Memory addresses versus segment addresses
Figure 4.5: Segments and offsets
Figure 4.6: Registers inside registers
Figure 4.7: 8-bit, 16-bit, 32-bit, and 64-bit registers
Figure 4.8: Real-mode flat model
Figure 4.9: The real-mode segmented model
Figure 4.10: 32-bit protected mode flat model
Chapter 5
Figure 5.1: Displaying a Linux text file with the GHex editor
Figure 5.2: Displaying a Windows text file with the GHex editor
Figure 5.3: A Linux text file displayed under Windows
Figure 5.4: Differences in display order versus differences in evaluation or...
Figure 5.5: Big endian versus little endian for a 16-bit value
Figure 5.6: Big endian versus little endian for a 32-bit value
Figure 5.7: What the assembler does
Figure 5.8: The assembler and linker
Figure 5.9: The assembly language development process
Figure 5.10: The Linux Mint Software Manager
Figure 5.11: The anatomy of a NASM command line
Figure 5.12: The anatomy of an ld command line
Chapter 6
Figure 6.1: The SASM Build dialog
Figure 6.2: The full SASM window in debug mode
Figure 6.3: Changing Konsole's character encoding to IBM-850
Figure 6.4: I/O redirection
Figure 6.5: Adding a key binding to Konsole
Chapter 7
Figure 7.1: Character strings as immediate data
Figure 7.2: The x64 RFlags register
Chapter 8
Figure 8.1: The stack
Figure 8.2: The stack in program memory
Figure 8.3: How the stack works
Figure 8.4: The “off by one” error
Chapter 9
Figure 9.1: Bit numbering
Figure 9.2: The anatomy of an AND instruction
Figure 9.3: Using XOR to zero a register
Figure 9.4: How the rotate instructions work
Figure 9.5: How the rotate through carry instructions work
Figure 9.6: Using a lookup table
Figure 9.7: A table of 16 three-byte entries
Figure 9.8: Multiplying by shifting
Figure 9.9: x64 long mode memory addressing
Figure 9.10: How address scaling works
Chapter 10
Figure 10.1: Calling a procedure and returning
Figure 10.2: Local labels and the globals that own them
Figure 10.3: Connecting globals and externals
Figure 10.4: How macros work
Chapter 11
Figure 11.1: Using MOVSB on overlapping memory blocks
Figure 11.2: How to access parameters from within SASM
Figure 11.3: The Linux stack at program execution
Chapter 12
Figure 12.1: How gcc builds Linux executables
Figure 12.2: The structure of a hybrid C-assembly program
Figure 12.3: A stack frame
Figure 12.4: Accessing command-line arguments from the x64 main() function
Appendix A
Figure A.1: Insight's memory display of a .data section
Figure A.2: Command-line arguments in Insight's memory view
4TH Edition
Jeff Duntemann
It was 1985, and I was in a chartered bus in New York City, heading for a press reception with a bunch of other restless media egomaniacs. I was only beginning my tech journalist career (as technical editor for PC Tech Journal), and my first book was still months in the future. I happened to be sitting next to an established programming writer/guru, with whom I was impressed and to whom I was babbling about one thing or another.
During our chat, I happened to let slip that I was a Turbo Pascal fanatic, and what I really wanted to do was learn how to write Turbo Pascal programs that made use of the brand new Microsoft Windows user interface. He wrinkled his nose and grimaced wryly, before speaking the Infamous Question:
“Why would you want to do that?”
I had never heard the question before (though I would hear it many times thereafter), and it took me aback. Why? Because, well, because…I wanted to know how it worked.
“Heh. That's what C is for.”
Further discussion got me nowhere in a Pascal direction. But some probing led me to understand that you couldn't write Windows apps in Turbo Pascal. It was impossible. Or…the programming writer/guru didn't know how. Maybe both. I never learned the truth as it stood in 1985. (Delphi answered the question once and for all in 1995.) But I did learn the meaning of the Infamous Question.
Note well: When somebody asks you, “Why would you want to do that?” what it really means is this: “You've asked me how to do something that is either impossible using tools that I favor or completely outside my experience, but I don't want to lose face by admitting it. So…how 'bout those Blackhawks?”
I heard it again and again over the years:
Q: How can I set up a C string so that I can read its length without scanning it?
A: Why would you want to do that?
Q: How can I write an assembly language subroutine callable from Turbo Pascal?
A: Why would you want to do that?
Q: How can I write Windows apps in assembly language?
A: Why would you want to do that?
You get the idea. The answer to the Infamous Question is always the same, and if the weasels ever ask it of you, snap back as quickly as possible: because I want to know how it works.
That is a completely sufficient answer. It's the answer I've used every single time, except for one occasion a considerable number of years ago, when I put forth that I wanted to write a book that taught people how to program in assembly language as their first experience in programming.
Q: Good grief, why would you want to do that?
A: Because it's the best way there is to build the skills required to understand how all the rest of the programming universe works.
Being a programmer is one thing above all else: It is understanding how things work. Learning to be a programmer, furthermore, is almost entirely a process of learning how things work. This can be done at various levels, depending on the tools you're using. If you're programming in Visual Basic, you have to understand how certain things work, but those things are by and large confined to Visual Basic itself. A great deal of machinery is hidden by the layer that Visual Basic places between the programmer and the computer. (The same is true of Delphi, Lazarus, Java, Python, and many other very high-level programming environments.) If you're using a C compiler, you're a lot closer to the machine, so you see a lot more of that machinery—and must, therefore, understand how it works to be able to use it. However, quite a bit remains hidden, even from the hardened C programmer.
If, on the other hand, you're working in assembly language, you're as close to the machine as you can get. Assembly language hides nothing, and withholds no power. The flipside, of course, is that no magical layer between you and the machine will absolve any ignorance and “take care of” things for you. If you don't understand how something works, you're dead in the water—unless you know enough to be able to figure it out on your own.
That's a key point: My goal in creating this book is not entirely to teach you assembly language per se. If this book has a prime directive at all, it is to impart a certain disciplined curiosity about the underlying machine, along with some basic context from which you can begin to explore the machine at its very lowest levels—that, and the confidence to give it your best shot. This is difficult stuff, but it's nothing you can't master given some concentration, patience, and the time it requires—which, I caution, may be considerable.
In truth, what I'm really teaching you here is how to learn.
To program as I intend to teach, you're going to need a 64-bit Intel computer running a 64-bit distribution of Linux. The one I used in preparing this book is Linux Mint Cinnamon V20.3 Una. “Una” here is a code name for this version of Linux Mint. It's nothing more than a short way of saying “Linux Mint 20.3.” I recommend Mint; it's thrown me fewer curves than any other distro I've ever used, and I've used Linux here and there ever since it first appeared. I don't think which graphical shell you use matters a great deal. I like Cinnamon, but you can use whatever you like or are familiar with.
You need to be reasonably proficient with Linux at the user level. I can't teach you how to install, configure, and run Linux in this book. If you're not already familiar with Linux, get a tutorial text and work through it. There are many such online.
You'll need a piece of free software called SASM, which is a simple integrated development environment (IDE) for programming in assembly. Basically, it consists of an editor, a build system, and a front end to the standard Linux debugger gdb. You'll also need a free assembler called NASM.
You don't have to know how to download, install, and configure these tools in advance because, at the appropriate times, I’ll cover all necessary tool installation and configuration.
Do note that other Unix implementations not based on the Linux kernel may not function precisely the same way under the hood. BSD Unix uses different conventions for making system calls, for example, and other Unix versions like Solaris are outside my experience.
Remember that this book is about the x64 architecture. To the extent that x64 contains x86, I will also be teaching elements of the x86 architecture. The gulf between 32-bit x86 and 64-bit x64 is a lot narrower than the gulf between 16-bit x86 and 32-bit x86. If you already have a firm grounding in 32-bit x86, you'll breeze through most of this book at a gallop. If you can do that, cool—just please remember that the book is for those who are just starting out in programming on Intel CPUs.
Also remember that this book is limited in size by its publisher: Paper, ink, and cover stock aren't free. That means I have to narrow the scope of what I teach and explain within those limits. I wish I had the space to cover the AVX math subsystem. I don't. But I'll bet that once you go through this book, you can figure much of it out by yourself.
This book starts at the beginning, and I mean the beginning. Maybe you're already there, or well past it. I respect that. I still think that it wouldn't hurt to start at the first chapter and read through all the chapters in order. Review is useful, and hey—you may realize that you didn't know quite as much as you thought you did. (Happens to me all the time!)
But if time is at a premium, here's the cheat sheet:
If you already understand the fundamental ideas of computer programming, skip Chapter 1.
If you already understand the ideas behind number bases other than decimal (especially hexadecimal and binary), skip Chapter 2.
If you already have a grip on the nature of computer internals (memory, CPU architectures, and so on), skip Chapter 3.
If you already understand x64 memory addressing, skip Chapter 4.
No. Stop. Scratch that. Even if you already understand x64 memory addressing, read Chapter 4.
The last bullet is there, and emphatic, for a reason: Assembly language programming is about memory addressing. If you don't understand memory addressing, nothing else you learn in assembly will help you one…bit. So, don't skip Chapter 4 no matter what else you know or think you know. Start from there, and see it through to the end. Memory addressing comes up regularly throughout the rest of the book. It's really the heart of the topic.
Load every example program, assemble each one, and run them all. Strive to understand every single line in every program. Take nothing on faith. Furthermore, don't stop there. Change the example programs as things begin to make sense to you. Try different approaches. Try things that I don't mention. Be audacious. Nay, go nuts—bits don't have feelings, and the worst thing that can happen is that Linux throws a segmentation fault, which may hurt your program but does not hurt Linux. The only catch is that when you do try something, strive to understand why it doesn't work as clearly as you understand all the other things that do. Single-step your way through a program in the SASM debugger, even when the program works. Take notes.
That is, ultimately, what I'm after: to show you the way to understand what every however distant corner of your machine is doing and how all its many pieces work together. This doesn't mean I'll explain every corner of it myself—no one will live long enough to do that because computing isn't simple anymore—but if you develop the discipline of patient research and experimentation, you can probably work it out for yourself. Ultimately, that's the only way to learn it: by yourself. The guidance you find—in friends, on the Net, in books like this—is only guidance and grease on the axles. You have to decide who's to be the master, you or the machine, and make it so. Assembly programmers are the only programmers who can truly claim to be masters, which is a truth worth meditating on.
Assembly language is peculiar among programming languages in that there is no universal standard for case-sensitivity. In the C language, all identifiers are case-sensitive, and I have seen assemblers that do not recognize differences in case at all. NASM, the assembler I'm presenting in this book, is case-sensitive only for programmer-defined identifiers. The instruction mnemonics and the names of registers, however, are not case sensitive.
There are customs in the literature on assembly language, and one of those customs is to set CPU instruction mnemonics in uppercase in the chapter text and in lowercase in source code files and code snippets interspersed within the text. I'll be following that custom here. Within discussion text, I'll speak of MOV and CALL and CMP. In example code, it will be mov and call and cmp. Code snippets and listings will be in a monospace Courier-style font. Registers follow the same pattern: uppercase (but not in the Courier font) when mentioned in the text, and lowercase in snippets and listings.
There are two reasons for this:
In text discussions, the mnemonics need to stand out. It's too easy to lose track of them amid a torrent of ordinary mixed-case words.
To read and learn from existing documents and source code outside of this one book, you need to be able to easily read assembly language whether it's in uppercase, lowercase, or mixed case. Getting comfortable with different ways of expressing the same things is important.
Anyway. Wherever you choose to start the book, it's time to get underway. Just remember that whatever gets in your face, be it the weasels, the machine, or your own inexperience, the thing to keep in the forefront of your mind is this: You're in it to figure out how it works.
Let's go.
Jeff Duntemann
Scottsdale, Arizona
May 24, 2023
The author’s listings that accompany this book are available from the author website at www.contrapositivediary.com under his heading “My Assembly Language Books.”
“Quick, Mike, get your sister and brother up; it's past 7. Nicky's got Little League at 9, and Dione's got ballet at 10. Give Max his heartworm pill! (We're out of them, Mom, remember?) Your father picked a great weekend to go fishing …. Here, let me give you 10 bucks and go get more pills at the vet's …. My God, that's right, Hank needed gas money and left me broke. There's a teller machine over by Kmart, and if I go there, I can take that stupid toilet seat back and get the right one.
“I guess I'd better make a list ….”
It's another Pleasant Valley Saturday, and 30-odd million suburban homemakers sit down with a pencil and pad at the kitchen table to try to make sense of a morning that would kill and pickle any lesser being. In her mind she thinks of the dependencies and traces the route:
“Drop Nicky at Rand Park, go back to Dempster, and it's about 10 minutes to Golf Mill Mall. Do I have gas? I'd better check first—if not, stop at Del's Shell or I won't make it to Milwaukee Avenue. Milk the teller machine at Golf Mill; then cross the parking lot to Kmart to return the toilet seat that Hank bought last weekend without checking what shape it was. Gotta remember to throw the toilet seat in back of the van—write that at the top of the list.
“By then it'll be half past, maybe later. Ballet is all the way down Greenwood in Park Ridge. No left turn from Milwaukee—but there's the sneak path around behind the mall. I have to remember not to turn right onto Milwaukee like I always do—jot that down. While I'm in Park Ridge, I can check and see if Hank's new glasses are in—should call, but they won't even be open until 9:30. Oh, and groceries—can do that while Dione dances. On the way back I can cut over to Oakton and get the dog's pills.”
In about 90 seconds flat the list is complete:
Throw toilet seat in van.
Check gas: if empty, stop at Del's Shell.
Drop Nicky at Rand Park.
Stop at Golf Mill teller machine.
Return toilet seat at Kmart.
Drop Dione at ballet (remember the sneak path to Greenwood).
See if Hank's glasses are at Pearle Vision—if they are, make double sure they remembered the extra scratch coating.
Get groceries at Jewel.
Pick up Dione.
Stop at vet for heartworm pills.
Drop off groceries at home.
If it's time, pick up Nicky. If not, collapse for a few minutes; then pick up Nicky.
Collapse!
What we often call a “laundry list” (whether it involves laundry or not) is the perfect metaphor for a computer program. Without realizing it, our intrepid homemaker has written herself a computer program and then set out (with herself acting as the computer) to execute it and be done before noon.
Computer programming is nothing more than this: you the programmer write a list of steps and tests. The computer then performs each step and test in sequence. When the list of steps has been executed, the computer stops.
A computer program is a list of steps and tests, nothing more.
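As a rough sketch of that idea, here are the first few items of the laundry list written out in Python. Python is used only because it reads close to plain English; it is not the assembly language this book teaches, and the function name run_errands and its gas_is_low parameter are made up for illustration.

```python
def run_errands(gas_is_low):
    """Run the first part of the Saturday list: steps in order, with one test."""
    done = []                                    # log of the steps actually performed
    done.append("Throw toilet seat in van")      # a step: always happens
    if gas_is_low:                               # a test: look at the gas gauge
        done.append("Stop at Del's Shell")       # performed only when the test says so
    done.append("Drop Nicky at Rand Park")       # the list resumes either way
    done.append("Stop at Golf Mill teller machine")
    return done                                  # end of the list: the "computer" stops


if __name__ == "__main__":
    for step in run_errands(gas_is_low=True):
        print(step)
```

Called with gas_is_low=False, the same list runs with the gas stop left out. Nothing else about the sequence changes, which is the whole point of a test.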
Think for a moment about what I call a test in the preceding laundry list. A test is the sort of either/or decision we make dozens or hundreds of times on even the most placid of days, sometimes nearly without thinking about it.
Our homemaker performed a test when she jumped into the van to get started on her adventure. She looked at the gas gauge. The gas gauge would tell her one of two things: (1) she has enough gas, or (2) she doesn't. If she has enough gas, she takes a right and heads for Rand Park. If she doesn't have enough gas, she takes a left down to the corner and fills the tank at Del's Shell. (Del takes credit cards.) Then, with a full tank, she continues the program by making a U-turn and heading for Rand Park.
In the abstract, a test consists of these two parts:
First, you take a look at something that can go one of two ways.
Then you do one of two things, depending on what you saw when you took a look.
Toward the end of the program, our homemaker got home, took the groceries out of the van, and looked at the clock. If it isn't time to get Nicky back from Little League, she has a moment to collapse on the couch in a nearly empty house. If it is time to get Nicky, there's no rest for the ragged: she sprints for the van and heads back to Rand Park.
(Any guesses as to whether she really gets to rest when the program finishes running?)
You might object, saying that many or most tests involve more than two alternatives. Sorry, you're wrong, in every case. Read this twice: except for totally impulsive or psychotic behavior, every human decision comes down to the choice between two alternatives.
What you have to do is look a little more closely at what goes through your mind when you make decisions. The next time you buzz down to Chow Now for fast Chinese, observe yourself while you're poring over the menu. The choice might seem, at first, to be of one item out of 26 Cantonese main courses. Not so—the choice, in fact, is between choosing one item and not choosing that one item. Your eyes rest on chicken with cashews. Naw, too bland. That was a test. You slide down to the next item. Chicken with black mushrooms. Hmmm, no, had that last week. That was another test. Next item: kung pao chicken. Yeah, that's it! That was a third test.
The choice was not among chicken with cashews, chicken with black mushrooms, and chicken with kung pao. Each dish had its moment, poised before the critical eye of your mind, and you turned thumbs up or thumbs down on it, individually. Eventually, one dish won, but it won in that same game of “to eat or not to eat.”
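The menu scan can be sketched the same way. This is an illustrative Python fragment, not anything from the book's listings; choose_dish and the acceptable set are hypothetical names standing in for whatever goes on in the diner's head. Notice that there is no "pick one of three" operation anywhere, only one yes/no test applied to one dish at a time.

```python
def choose_dish(menu, acceptable):
    """Walk the menu top to bottom; each dish gets its own yes/no test."""
    for dish in menu:
        if dish in acceptable:   # thumbs up or thumbs down on this one dish
            return dish          # the first "yes" ends the search
    return None                  # every dish got a "no"


if __name__ == "__main__":
    menu = [
        "chicken with cashews",
        "chicken with black mushrooms",
        "kung pao chicken",
    ]
    print(choose_dish(menu, {"kung pao chicken"}))  # prints: kung pao chicken
```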
Let me give you another example. Many of life's most complicated decisions come about because 99.99867 percent of us are not nudists. You've been there: you're standing in the clothes closet in your underwear, flipping through your rack of pants. The tests come thick and fast. This one? No. This one? No. This one? No. This one? Yeah. You pick a pair of blue pants, say. (It's a Monday, after all, and blue would seem an appropriate color.) Then you stumble over to your sock drawer and take a look. Whoops, no blue socks. That was a test. So you stumble back to the clothes closet, hang your blue pants back on the pants rack, and start over. This one? No. This one? No. This one? Yeah. This time it's brown pants, and you toss them over your arm and head back to the sock drawer to take another look. Nertz, out of brown socks, too. So it's back to the clothes closet ….
What you might consider a single decision, or perhaps two decisions inextricably tangled (like picking pants and socks of the same color, given stock on hand), is actually a series of small decisions, always binary in nature: pick 'em or don't pick 'em. Find 'em or don't find 'em. The Monday morning episode in the clothes closet is a good analogy for a programming structure called a loop: you keep doing a series of things until you get it right, and then you stop (assuming you're not the kind of geek who wears blue socks with brown pants). But whether you get everything right always comes down to a sequence of simple either/or decisions.
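Here is the clothes closet written down as a loop, as a minimal sketch in Python rather than assembly language. The function name, the colors, and the contents of the sock drawer are all made up for illustration. Notice that every step is one either/or test.

```python
def pick_outfit(pants_rack, sock_drawer):
    """Return the first pants color that has matching socks, or None."""
    for pants in pants_rack:        # This one? This one? This one?
        if pants in sock_drawer:    # a binary test: matching socks, or not
            return pants            # got it right, so stop looking
    return None                     # ran out of pants without a match


# Hypothetical stock on hand: no blue or brown socks in the drawer.
print(pick_outfit(["blue", "brown", "gray"], {"gray", "black"}))  # gray
```

The loop never chooses among three pairs of pants at once. It asks the same yes-or-no question about each pair in turn.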
I can almost hear what you're thinking: “Sure, it's a computer book, and he's trying to get me to think like a computer.” Not at all. Computers think like us. We designed them; how else could they think? No, what I'm trying to do is get you to take a long, hard look at how you think. We run on automatic for so much of our lives that we literally do most of our thinking without really thinking about it.
The best model for the logic of a computer program is the same logic we use to plan and manage our daily affairs. No matter what we do, it comes down to a matter of confronting two alternatives and picking one. What we might think of as a single large and complicated decision is nothing more than a messy tangle of many smaller decisions. The skill of looking at a complex decision and seeing all the little decisions in its tummy will serve you well in learning how to program. Observe yourself the next time you have to decide something. Count up the little decisions that make up the big one. You'll be surprised.
And, surprise! You'll be a programmer.
Do not be alarmed. What you have just experienced was a metaphor. It was not the real thing. (The real thing comes later.)
I use metaphors a lot in this book. A metaphor is a loose comparison drawn between something familiar (such as a Saturday morning laundry list) and something unfamiliar (such as a computer program). The idea is to anchor the unfamiliar in terms of the familiar so that when I begin tossing facts at you, you'll have someplace comfortable to lay them down.
The most important thing for you to do right now is keep an open mind. If you know a little bit about computers or programming, don't pick nits. Yes, there are important differences between a homemaker following a scribbled laundry list and a computer executing a program. I'll mention those differences all in good time.
For now, it's still Chapter 1. Take these initial metaphors on their own terms. Later, they'll help a lot.
Carol and I have a certain fondness for “called” dances, the most prevalent type being square dances. There are others, like New England contra dances, which are a lot like square dances but with better music. In a called dance, the caller at the front of the hall calls out movements, and the dancers perform those movements. The music provides a beat, like the ticking of a clock. The sequence of movements taken together is the dance, and the dance usually has a name.
The first time Carol and I attended a contra dance, I was poleaxed: this was like assembly language programming! The caller called out “allemande left,” and we performed the movement known as “allemande left.” The caller called out “forward and back,” and we executed the “forward and back” movement. The caller called out “box the gnat,” and, well, we boxed the gnat. (I am not making this up!) There are a reasonable number of movements, and to be good at that sort of dancing, you have to memorize them all by name. Otherwise, if the caller calls a movement that you don't know, the dance might stumble or grind to a halt. (Bluescreen!)
At its deepest level, a computer understands a collection of individual operations called instructions. These perform arithmetic, execute logic like AND and OR, move data around, and do many other things. Each instruction is performed inside the CPU chip. Just as a set of dance movements are the individual atoms of motion making up a square dance, instructions are the atoms of a computer program. The program is like the dance as a whole: a sequence of instructions executed in order. The couples taking part in the dance execute the dance/program as the caller moves down the list of movements, calling out each one in turn. The couples, then, are the computer on which the dance runs.
That's about as far as the square dance metaphor goes. Once you get the knack of assembly language, hey, go take square dance or contra dance lessons somewhere and see if you don't come to the same conclusion that I did.
Board games were a really big deal when I was a kid, when board games were actually printed on a species of board. (OK, cardboard.) Monopoly was one that almost everybody had. There was a sort of pathway around the edge of the board divided into squares. You had a game piece that advanced from square to square according to dice throws, and when your piece landed on a square, you could do one of several things: buy property that hadn't been bought yet, pay rent on property owned by other players, pull a card from the Chance stack, or—eek!—go to jail. You had a pile of Monopoly money to spend, and when another player had to pay rent, you got more.
The specifics of the Monopoly game aren't important here. What matters is that you progress through a series of steps, and at each step, something happens. Your pile of money grows or shrinks. Assembly language is a little like that: a program is like the game board. Each step in the program does something. There are places where you can store numbers. The numbers change as you move through the program.
Now that you're thinking in terms of board games, take a look at Figure 1.1. What I've drawn is actually a fair approximation of assembly language as it was used on some of our simpler computers 50 or 60 years ago. The column marked “Program Instructions” is the main path around the edge of the board, of which only a portion can be shown here. This is the assembly language computer program, the actual series of steps and tests that, when executed, causes the computer to do something useful. Setting up this series of program instructions is what programming in assembly language actually is.
Figure 1.1: The Game of Assembly Language
Everything else is odds and ends in the middle of the board that serve the game in progress. Most of these are storage locations that contain your data. You're probably noticing (perhaps with sagging spirits) that there are a lot of numbers involved. (They're weird numbers, too. What, for example, does “004B” mean? I deal with that issue in Chapter 2, “Alien Bases.”) I'm sorry, but that's simply the way the game is played. Assembly language, at its deepest level, is nothing but numbers, and if you hate numbers the way most people hate anchovies, you're going to have a rough time of it. (I like anchovies, which is part of my legend. Learn to like numbers. They're not as salty.) Higher-level programming languages such as Pascal or Python disguise the numbers by treating them symbolically. But assembly language, well, it's just you and the numbers.
I should caution you that the Game of Assembly Language in Figure 1.1 does not represent any real computer processor, such as the Intel Core i5. Also, I've made the names of instructions more clearly understandable than the names of the instructions in Intel assembly language actually are. In the real world, instruction names are typically short things like LAHF, STC, INC, SHRX, and other crypticisms that cannot be understood without considerable explanation. We're easing into this stuff sidewise, and in this chapter I have to sugarcoat certain things a little to draw the metaphors clearly.
Like most board games, the assembly language board game consists of two broad categories of elements: game steps and places to store things. The “game steps” are the steps and tests I've been speaking of all along. The places to store things are just that: cubbyholes into which you can place numbers, with the confidence that those numbers will remain where you put them until you take them out or change them somehow.
In programming terms, the game steps are called code, and the numbers in their cubbyholes (as distinct from the cubbyholes themselves) are called data. The cubbyholes themselves are usually called storage. (The difference between the places you store information and the information you store in them is crucial. Don't confuse them.) Consider an instruction in the Game of Assembly Language that says ADD 32 to A. An ADD instruction in the code alters a data value stored in a cubbyhole named Register A.
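The difference between storage and data can be sketched in a few lines of Python. This is only an illustration: the dictionary, the helper name add_to, and the starting values are assumptions, not anything taken from Figure 1.1.

```python
# Storage: the named cubbyholes. Data: the numbers currently inside them.
registers = {"A": 10, "B": 0}   # starting values are assumptions


def add_to(storage, name, amount):
    """A code step: alter the data held in one cubbyhole."""
    storage[name] = storage[name] + amount


add_to(registers, "A", 32)      # ADD 32 to A
print(registers["A"])           # 42
```

The cubbyhole named A is still the same cubbyhole after the ADD. Only the number sitting in it has changed.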
Code and data are two very different kinds of critters, but they interact in ways that make the game interesting. The code includes steps that place data into storage (MOVE instructions) and steps that alter data that is already in storage (INCREMENT and DECREMENT instructions, and ADD instructions, among others). Most of the time you'll think of code as being the master of data, in that the code writes data values into storage. Data does influence code as well, however. Among the tests that the code makes are tests that examine data in storage, the COMPARE instructions. If a given data value exists in storage, the code may do one thing; if that value does not exist in storage, the code will do something else, as in the JUMP BACK and JUMP AHEAD instructions.
The short block of instructions marked PROCEDURE is a detour off the main stream of instructions. At any point in the program you can duck out into the procedure, perform its steps and tests, and then return to the very place from which you left. This allows a sequence of steps and tests that is generally useful and used frequently to exist in only one place rather than exist as separate copies everywhere it's needed.
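One way to picture the detour is a toy machine that remembers where it left off. The sketch below is Python, not assembly language, and the instruction names (SAY, CALL, RETURN, HALT) and the program are hypothetical. The point is only that the procedure exists once and is used twice.

```python
# A toy program: one procedure, living at address 5, called from two places.
PROGRAM = [
    ("SAY", "start"),         # address 0
    ("CALL", 5),              # address 1: duck out into the procedure
    ("SAY", "middle"),        # address 2: execution resumes here
    ("CALL", 5),              # address 3: duck out again
    ("HALT", None),           # address 4
    ("SAY", "in procedure"),  # address 5: the procedure begins here
    ("RETURN", None),         # address 6: go back to wherever we left
]


def run(program):
    """Execute the toy program and return the list of messages it says."""
    position = 0
    return_addresses = []     # remembers the place we left from
    said = []
    while True:
        op, arg = program[position]
        if op == "SAY":
            said.append(arg)
            position += 1
        elif op == "CALL":
            return_addresses.append(position + 1)
            position = arg
        elif op == "RETURN":
            position = return_addresses.pop()
        elif op == "HALT":
            return said


print(run(PROGRAM))
# ['start', 'in procedure', 'middle', 'in procedure']
```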
Another critical concept lies in the funny numbers at the left side of the program step locations and data locations. Each number is unique, in that a location tagged with that number appears only once inside the computer. This location is called an address. Data is stored and retrieved by specifying the data's address in the machine. Procedures are called by specifying the address at which they begin.
The little box (which is also a storage location) marked “Program Counter” keeps the address of the next instruction to be performed. The number inside the program counter is increased by one (incremented) each time an instruction is performed unless the instruction tells the program counter to do something else. For example, notice the JUMP BACK 9 instruction at address 004B. When this instruction is performed, the program counter will “back up” by nine locations. This is analogous to the “go back three spaces” concept in most board games.
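The behavior of the program counter can be sketched as a tiny simulator. This is Python, and it models no real processor: the instruction names and the program are made up, and I am assuming that a jump backs up from the address of the jump instruction itself.

```python
# Toy program; instruction names are hypothetical, loosely in the spirit
# of the Game of Assembly Language.
PROGRAM = [
    ("MOVE", "A", 3),                   # address 0: put 3 into register A
    ("MOVE", "B", 0),                   # address 1: put 0 into register B
    ("ADD", "B", 5),                    # address 2: add 5 to register B
    ("DECREMENT", "A"),                 # address 3: subtract 1 from A
    ("JUMP_BACK_IF_NOT_ZERO", "A", 2),  # address 4: back up 2 if A is not 0
    ("HALT",),                          # address 5: stop
]


def run(program):
    """Run the program; return the registers and every address visited."""
    registers = {"A": 0, "B": 0}
    program_counter = 0
    visited = []
    while True:
        visited.append(program_counter)
        instruction = program[program_counter]
        op = instruction[0]
        next_address = program_counter + 1   # the default: step ahead by one
        if op == "MOVE":
            registers[instruction[1]] = instruction[2]
        elif op == "ADD":
            registers[instruction[1]] += instruction[2]
        elif op == "DECREMENT":
            registers[instruction[1]] -= 1
        elif op == "JUMP_BACK_IF_NOT_ZERO":
            if registers[instruction[1]] != 0:
                # Assumption: back up counting from the jump's own address.
                next_address = program_counter - instruction[2]
        elif op == "HALT":
            return registers, visited
        program_counter = next_address


registers, visited = run(PROGRAM)
print(registers)   # {'A': 0, 'B': 15}
print(visited)     # [0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4, 5]
```

Most of the time the program counter simply goes up by one. The only instruction here that tells it to do something else is the jump, and the jump makes that choice by testing a data value.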
That's about as much explanation of the Game of Assembly Language as I'm going to offer for now. This is still Chapter 1, and we're still in metaphor territory. People who have had some exposure to computers will recognize and understand some of what Figure 1.1 is doing. People with no exposure to computer innards at all shouldn't feel left behind for being utterly lost. I created the Game of Assembly Language solely to put across the following points:
The individual steps are very simple. One single instruction rarely does more than move a single value from one storage cubbyhole to another, perform very elementary arithmetic like addition or subtraction, or compare the value contained in one storage cubbyhole to a value contained in another. This is good news, because it allows you to concentrate on the simple task accomplished by a single instruction without being overwhelmed by complexity. The bad news, however, is the next point.
It takes a lot of steps to do anything useful. You can often write a useful program in such languages as Pascal or BASIC in five or six lines. You can actually create useful programs in visual programming systems like Visual Basic, Delphi, or Lazarus without writing any code at all. (The code is still there … but the code is “canned” and all you're really doing is choosing which chunks of canned code in a collection of many such chunks will run.) A useful assembly language program cannot be implemented in fewer than about 50 lines, and anything challenging takes hundreds or thousands (or tens of thousands) of lines. The skill of assembly language programming lies in structuring these hundreds or thousands of instructions so that the program both operates correctly and can still be read and understood by other programmers, and by yourself, six months later.
The key to assembly language is understanding memory addresses. In such languages as Pascal and BASIC, the compiler takes care of where something is located; you simply have to give that something a symbolic name and call it by that name whenever you want to look at it or change it. In assembly language, you must always be cognizant of where things are in your computer's memory or register set. So, in working through this book, pay special attention to the concept of memory addressing, which is nothing more than the art of specifying where something is. The Game of Assembly Language is peppered with addresses and instructions that work with addresses (such as MOVE data at B to C, which means move the data stored at the address specified by register B to register C). Addressing is by far the trickiest part of assembly language, but master it and you've got most of the whole thing in your hip pocket.
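The instruction MOVE data at B to C is worth a closer look, because register B holds an address rather than the data itself. Here is a minimal Python sketch of the idea. The addresses, the values stored at them, and the helper name move_data_at are all assumptions made up for illustration.

```python
# Memory: address -> data. These addresses and values are invented.
memory = {0x0040: 7, 0x0041: 99}

# Register B holds an address, not the data we are after.
registers = {"B": 0x0041, "C": 0}


def move_data_at(memory, registers, address_register, destination):
    """MOVE data at <address_register> to <destination>."""
    address = registers[address_register]      # where is the data?
    registers[destination] = memory[address]   # go there and fetch it


move_data_at(memory, registers, "B", "C")
print(registers["C"])        # 99
print(hex(registers["B"]))   # 0x41 (B still holds the address)
```

Register C ends up with the data, and register B is untouched. Keeping "where it is" separate from "what it is" is the whole trick.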
Everything I've said so far has been orientation. I've tried to give you a taste of the big picture of assembly language and how its fundamental principles relate to the life you've been living all along. Life is a sequence of steps and tests, as are square dances and board games—and so is assembly language. Keep those metaphors in mind as we proceed to get real by confronting the nature of computer numbers.
The year was 1966. Perhaps you were there. (I was 13 and in eighth grade.) New Math burst upon the grade-school curricula of the nation, and homework became a turmoil of number lines, sets, and alternate bases. Middle-class parents scratched their heads with their children over questions like, “What is 17 in Base 5?” and “Which sets does the Null Set belong to?” In very short order (I recall a period of about two months), the whole thing was tossed in the trash as quickly as it had been concocted by bored educrats with too little to do.
This was a pity actually. What nobody seemed to realize at the time was that, granted, we were learning New Math—except that Old Math had never been taught at the grade-school level either. We kept wondering of what possible use it was to know what the intersection of the set of squirrels and the set of mammals was. The truth, of course, was that it was no use at all. Mathematics in America has always been taught as applied mathematics—arithmetic—heavy on the word problems. If it won't help you balance your checkbook or proportion a recipe, it ain't real math, man. Little or nothing of the logic of mathematics has ever made it into the elementary classroom, in part because elementary school in America has historically been a sort of trade school for everyday life. Getting the little beasts fundamentally literate is difficult enough. Trying to get them to appreciate the beauty of alternate number systems simply went over the line for practical middle-class America.
Nerdball that I was, I actually enjoyed fussing with math in the New-Age style back in 1966, but I gladly laid it aside when the whole thing blew over. I didn't have to pick it up again until 1976, when, after working like a maniac with a wire-wrap gun for several weeks, I fed power to my COSMAC ELF microcomputer and was greeted by an LED display of a pair of numbers in base 16!
Mon dieu, New Math redux.
This chapter exists because at the assembly language level, your computer does not understand numbers in our familiar base 10. Computers, in a slightly schizoid fashion, work in base 2 and base 16, all at the same time. If you're willing to confine yourself to higher-level languages such as BASIC or Pascal, you can ignore these alien bases altogether, or perhaps treat them as an advanced topic once you get the rest of the language down pat. Not here. Everything in assembly language depends on your thorough understanding of these two number bases. So before we do anything else, we're going to learn how to count all over again, this time in Martian.
There is intelligent life on Mars.
That is, the Martians are intelligent enough to know from watching our TV programs these past 90 years or so that a thriving tourist industry would not be to their advantage. So they've remained in hiding, emerging only briefly to carve big rocks into the shape of Elvis's face to help the National Enquirer ensure that no one will ever take Mars seriously again. The Martians do occasionally communicate with science fiction writers like me, knowing full well that nobody has ever taken us seriously. That's the reason for the information in this section, which involves the way Martians count.
Martians have three fingers on one hand, and only one finger on the other. Male Martians have their three fingers on the left hand, while females have their three fingers on the right hand. This makes waltzing and certain other things easier.
Like human beings and any other intelligent race, Martians started counting by using their fingers. Just as we used our 10 fingers to set things off in groups and powers of 10, the Martians used their four fingers to set things off in groups and powers of four. Over time, our civilization standardized on a set of 10 digits to serve our number system. The Martians, similarly, standardized on a set of four digits for their number system. The four digits follow, along with the names of the digits as the Martians pronounce them: Θ (xip), ⌠ (foo), ∩ (bar), ≡ (bas).
Like our zero, xip is a placeholder representing no items, and while Martians sometimes count from xip, they usually start with foo, representing a single item. So they start counting: foo, bar, bas ….
Now what? What comes after bas? Table 2.1 demonstrates how the Martians count to what we here on Earth would call 25.
Table 2.1: Counting in Martian, Base Fooby
| MARTIAN NUMERALS | MARTIAN PRONUNCIATION | EARTH EQUIVALENT |
|---|---|---|
| Θ | Xip | 0 |
| ⌠ | Foo | 1 |
| ∩ | Bar | 2 |
| ≡ | Bas | 3 |
| ⌠ Θ | Fooby | 4 |
| ⌠ ⌠ | Fooby-foo | 5 |
| ⌠ ∩ | Fooby-bar | 6 |
| ⌠ ≡ | Fooby-bas | 7 |
| ∩ Θ | Barby | 8 |
| ∩ ⌠ | Barby-foo | 9 |
| ∩ ∩ | Barby-bar | 10 |
| ∩ ≡ | Barby-bas | 11 |
| ≡ Θ | Basby | 12 |
| ≡ ⌠ | Basby-foo | 13 |
| ≡ ∩ | Basby-bar | 14 |
| ≡ ≡ | Basby-bas | 15 |
| ⌠ Θ Θ | Foobity | 16 |
| ⌠ Θ ⌠ | Foobity-foo | 17 |
| ⌠ Θ ∩ | Foobity-bar | 18 |
| ⌠ Θ ≡ | Foobity-bas | 19 |
| ⌠ ⌠ Θ | Foobity-fooby | 20 |
| ⌠ ⌠ ⌠ | Foobity-fooby-foo | 21 |
| ⌠ ⌠ ∩ | Foobity-fooby-bar | 22 |
| ⌠ ⌠ ≡ | Foobity-fooby-bas | 23 |
| ⌠ ∩ Θ | Foobity-barby | 24 |
| ⌠ ∩ ⌠ | Foobity-barby-foo | 25 |
With only four digits (including the one representing zero) the Martians can count only to bas without running out of digits. The number after bas has a new name, fooby. Fooby is the base of the Martian number system and probably the most important number on Mars. Fooby is the number of fingers a Martian has. We would call it four.
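If you'd like to check the Martian numerals mechanically, here is a short Python sketch that writes an Earth number the Martian way. It uses the standard trick of dividing by the base repeatedly and collecting the remainders. The function name is my own invention; only the four digit symbols come from the Martians.

```python
MARTIAN_DIGITS = "Θ⌠∩≡"   # xip, foo, bar, bas: worth 0, 1, 2, 3


def to_martian(number):
    """Write a non-negative Earth number in Martian numerals (base fooby)."""
    if number == 0:
        return "Θ"
    digits = []
    while number > 0:
        number, remainder = divmod(number, 4)   # peel off the rightmost column
        digits.append(MARTIAN_DIGITS[remainder])
    return " ".join(reversed(digits))           # leftmost column first


print(to_martian(3))    # ≡
print(to_martian(4))    # ⌠ Θ
print(to_martian(25))   # ⌠ ∩ ⌠
```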
The most significant thing about fooby is the way the Martians write it out in numerals: ⌠ Θ. Instead of a single column, fooby is expressed in two columns. Just as with our decimal system, each column has a value that is a power of fooby. This only means that as you move from the rightmost column toward the left, each column represents a value fooby times the column to its right.
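The column idea can be sketched in Python: start at the rightmost column, which is worth one, and make each column to the left worth fooby (four) times as much. The function name and the lookup table are hypothetical helpers of my own; the arithmetic is just the column rule described above.

```python
DIGIT_VALUES = {"Θ": 0, "⌠": 1, "∩": 2, "≡": 3}   # xip, foo, bar, bas


def from_martian(numeral):
    """Convert Martian numerals such as '⌠ ∩ ⌠' to an Earth number."""
    symbols = [ch for ch in numeral if not ch.isspace()]
    total = 0
    column_value = 1                    # the rightmost column counts ones
    for symbol in reversed(symbols):    # work from right to left
        total += DIGIT_VALUES[symbol] * column_value
        column_value *= 4               # next column is worth fooby times more
    return total


print(from_martian("⌠ Θ"))     # 4
print(from_martian("≡ ≡"))     # 15
print(from_martian("⌠ ∩ ⌠"))   # 25
```

For ⌠ ∩ ⌠, the columns work out to 1 × 16 plus 2 × 4 plus 1 × 1, which is what we on Earth would call 25.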