x86 Software Reverse-Engineering, Cracking, and Counter-Measures - Stephanie Domas - E-Book

x86 Software Reverse-Engineering, Cracking, and Counter-Measures E-Book

Stephanie Domas

Description

A crystal-clear and practical blueprint to software disassembly

x86 Software Reverse-Engineering, Cracking, and Counter-Measures is centered around the world of disassembling software. It starts with the basics of the x86 assembly language and progresses to how that knowledge empowers you to reverse-engineer and circumvent software protections. No knowledge of assembly, reverse engineering, or software cracking is required.

The book begins with a bootcamp on x86, in which you learn how to read, write, and build in the assembly that powers a massive amount of the world's computers. Then the book shifts to reverse engineering applications using a handful of industry favorites such as IDA, Ghidra, Olly, and more. Next, we move to cracking with techniques such as patching and key generation, all harnessing the power of assembly and reverse engineering. Lastly, we examine cracking from a defensive perspective, providing learners with techniques to be better defenders of their own software, or the knowledge to crack these techniques more effectively.

* Assembly: computer architecture, x86, system calls, building and linking, ASCII, condition codes, GDB, control flow, stack, calling conventions
* Reverse Engineering: reconnaissance, strings, RE strategy, stripping, linking, optimizations, compilers, industry tools
* Cracking: patching, key checkers, key generators, resource hacking, dependency walking
* Defense: anti-debugging, anti-tamper, packing, cryptors/decryptors, whitelist, blacklist, RASP, code signing, obfuscation

A practical and hands-on resource for everyone from security professionals to hobbyists, this book is for anyone who wants to learn to take apart, understand, and modify black-box software. x86 Software Reverse-Engineering, Cracking, and Counter-Measures is a vital resource for security researchers, reverse engineers, and defenders who analyze, research, crack, or defend software applications.


Page count: 372

Year of publication: 2024




Table of Contents

Cover

Table of Contents

Title Page

Introduction

Who Should Read This Book

What to Expect from This Book

History

Legal

Chapter 1: Decompilation and Architecture

Decompilation

Lab 1: Decompiling

Architecture

Summary

Chapter 2: x86 Assembly: Data, Modes, Registers, and Memory Access

Introduction to x86

Assembly Syntax

Data Representation

Registers

Memory Access

Addressing Modes

Summary

Chapter 3: x86 Assembly: Instructions

x86 Instruction Format

x86 Instructions

Putting It All Together

Common x86 Instruction Mistakes

Summary

Chapter 4: Building and Running Assembly Programs

Output

System Calls

Building and Linking

objdump

Lab: Hello World

ASCII

Summary

Chapter 5: Understanding Condition Codes

Condition Codes

Summary

Chapter 6: Analyzing and Debugging Assembly Code

Binary Analysis

Breakpoints

gdb

Segmentation Faults

Lab: Shark Sim 3000

Tuning Out the Noise

Summary

Chapter 7: Functions and Control Flow

Control Flow

Logic Constructs in x86

Stack

Function Calls and Stack Frames

Summary

Chapter 8: Compilers and Optimizers

Finding Starting Code

Compilers

Summary

Chapter 9: Reverse Engineering: Tools and Strategies

Lab: RE Bingo

Basic REconnaissance

Reverse Engineering Strategy

Summary

Chapter 10: Cracking: Tools and Strategies

Key Checkers

Key Generators

Lab: Introductory Keygen

Procmon

Resource Hacker

Patching

Other Debuggers

Debugging with Immunity

Lab: Cracking with Immunity

Summary

Chapter 11: Patching and Advanced Tooling

Patching in 010 Editor

CodeFusion Patching

Cheat Engine

Lab: Cracking LaFarge

IDA Introduction

IDA Patching

Lab: IDA Logic Flows

Ghidra

Lab: Cracking with IDA

Summary

Chapter 12: Defense

Obfuscation

Lab: Obfuscation

Anti-Debugging

Lab: Anti-Debugging

Summary

Chapter 13: Advanced Defensive Techniques

Tamper-Proofing

Packing

Lab: Detecting and Unpacking

Virtualization

Cryptors/Decryptors

Summary

Chapter 14: Detection and Prevention

CRC

Code Signing

RASP

Allowlisting

Blocklisting

Remote Authentication

Lab: ProcMon

Summary

Chapter 15: Legal

U.S. Laws Affecting Reverse Engineering

Summary

Chapter 16: Advanced Techniques

Timeless Debugging

Summary

Chapter 17: Bonus Topics

Stack Smashing

Connecting C and x86

Summary

Conclusion

Index

Copyright

About the Authors

About the Technical Writer

About the Technical Editor

End User License Agreement

List of Tables

Chapter 4

Table 4.1: x86 Ports

Table 4.2: sys_write

Table 4.3: sys_exit

Chapter 7

Table 7.1: x86 conditional jump instructions

Table 7.2: Pushing a variable onto the stack

Table 7.3: Popping a variable from the stack

Table 7.4: Stack trace examples

Table 7.5: Function calls and the stack

Table 7.6: Program stack after calling a

Table 7.7: Program stack after calling b

Table 7.8: Program stack after calling c

Table 7.9: Stack in add function

Table 7.10: Effects of function prologue on stack

Table 7.11: Effects of function epilogue on stack

Table 7.12: Stack locations for common values

Table 7.13: Stack content at points 1, 2, and 3 in the program

Table 7.14: Stack frame of one_up program

Table 7.15: Stack locations for local variables

Table 7.16: Complete function stack frame

Table 7.17: Two types of prologues

Table 7.18: Two types of epilogues

List of Illustrations

Chapter 1

Figure 1.1: JetBrains dotPeek .NET decompiler

Figure 1.2: Obfuscation in JetBrains dotPeek

Figure 1.3: Computer architecture

Figure 1.4: Intel Core 2 architecture

Chapter 2

Figure 2.1: Hexadecimal

Figure 2.2: Base conversions in the Windows calculator

Figure 2.3: Bit and byte significance labels

Figure 2.4: Endianness

Figure 2.5: x86 registers

Figure 2.6: Pieces of the eax register

Figure 2.7: Common x64 registers

Figure 2.8: Pieces of the r8 register

Figure 2.9: Comparing differently sized mov instructions

Chapter 3

Figure 3.1: mov instructions

Chapter 4

Figure 4.1: Binary wristwatch

Figure 4.2: ASCII table

Figure 4.3: Program output

Figure 4.4: ASCII uppercase and lowercase values

Chapter 5

Figure 5.1: Effects of add al, bl with various inputs

Figure 5.2: Effects of sub al, bl with various inputs

Figure 5.3: cmp truth table

Chapter 6

Figure 6.1: The gdb command

Figure 6.2: Disassembly in gdb

Figure 6.3: Setting a breakpoint in gdb

Figure 6.4: gdb info files command

Figure 6.5: gdb info register command

Figure 6.6: gdb info variable command

Figure 6.7: gdb stepi command

Figure 6.8: gdb x command

Figure 6.9: Printing 10 bytes with the gdb x command

Chapter 7

Figure 7.1: Example jump table

Figure 7.2: Stack address growth

Figure 7.3: Stack frames for hack and drink functions

Chapter 8

Figure 8.1: Application without debugging symbols in gdb

Figure 8.2: .text disassembly in gdb

Figure 8.3: Main function disassembly in gdb

Figure 8.4: Unoptimized code in a disassembler

Figure 8.5: Speed and space-optimized code in a disassembler

Figure 8.6: Space-optimized code in a disassembler

Figure 8.7: Application debugging symbols

Figure 8.8: Linked libraries in “hello world” program

Chapter 9

Figure 9.1: objdump options

Figure 9.2: Sample objdump output

Figure 9.3: strace output for echo hello!

Figure 9.4: Kitten cursor applications

Figure 9.5: Examining registry modifications in Dependency Walker

Chapter 10

Figure 10.1: Halting Process Monitor

Figure 10.2: Filtering events in Procmon

Figure 10.3: Defining a filter in Procmon

Figure 10.4: Filtering on Registry events in Procmon

Figure 10.5: Including and excluding event categories in Procmon

Figure 10.6: Notepad font change registry event

Figure 10.7: Event properties in Procmon

Figure 10.8: Stack view in Procmon's Properties window

Figure 10.9: Stack trace for notepad.exe

Figure 10.10: File operations in Procmon

Figure 10.11: Security Registry queries in Procmon

Figure 10.12: Sample application in Resource Hacker

Figure 10.13: Password window

Figure 10.14: String search in Resource Hacker

Figure 10.15: Identifying a dialog box in Resource Hacker

Figure 10.16: Microsoft Calculator

Figure 10.17: Searching for Calculator in ResHack

Figure 10.18: Calculator window in Resource Hacker

Figure 10.19: Compiling the modified application

Figure 10.20: Modified window in Resource Hacker

Figure 10.21: Saving the modified application in ResHack

Figure 10.22: Immunity debugger window

Figure 10.23: Assembly code in Immunity debugger

Figure 10.24: Executable modules in the Immunity debugger

Figure 10.25: Strings in Immunity debugger

Figure 10.26: String references in Immunity debugger

Figure 10.27: Launching an executable in Immunity debugger

Figure 10.28: Single-stepping in Immunity debugger

Figure 10.29: Stepping over instructions in Immunity debugger

Figure 10.30: Exceptions in Immunity debugger

Figure 10.31: nop-ing out code in Immunity debugger

Figure 10.32: nop-ed code in Immunity debugger

Figure 10.33: Reverting modified code in Immunity debugger

Figure 10.34: Saving a modified file in Immunity debugger

Chapter 11

Figure 11.1: Viewing a file in 010 Editor

Figure 11.2: Inspector pane in 010 Editor

Figure 11.3: Searching in 010 Editor

Figure 11.4: Jumping to an address in 010 Editor

Figure 11.5: CodeFusion start screen

Figure 11.6: Loading a file in CodeFusion

Figure 11.7: Adding patch information in CodeFusion

Figure 11.8: Launching the patched executable in CodeFusion

Figure 11.9: Opening a process in Cheat Engine

Figure 11.10: Viewing memory in Cheat Engine

Figure 11.11: Memory Viewer pane in Cheat Engine

Figure 11.12: String references in Cheat Engine

Figure 11.13: nop-ing out instruction...

Figure 11.14: Reverting changes in Cheat Engine

Figure 11.15: Copying bytes in Cheat Engine

Figure 11.16: Loading a file in IDA

Figure 11.17: IDA graph view

Figure 11.18: Opening strings view in IDA

Figure 11.19: Strings view in IDA

Figure 11.20: String cross-references in IDA

Figure 11.21: Strings in IDA code view

Figure 11.22: Basic blocks in IDA

Figure 11.23: Function arguments in IDA

Figure 11.24: Local variables in IDA

Figure 11.25: Local variables and function arguments in IDA

Figure 11.26: IDA comment window

Figure 11.27: Searching for comments in IDA

Figure 11.28: Search results in IDA

Figure 11.29: Code paths in IDA

Figure 11.30: Showing opcode bytes in IDA

Figure 11.31: Password-checking code in IDA

Figure 11.32: IDA Patch Bytes window

Figure 11.33: Password-checking logic in IDA after patching

Chapter 12

Figure 12.1: Control flow flattening in IDA

Figure 12.2: Opaque predicates in IDA

Chapter 13

Figure 13.1: Packed code in IDA

Figure 13.2: Identifying packers with PEiD

Chapter 14

Figure 14.1: Windows warning of unverified program

Chapter 17

Figure 17.1: Function stack frame before strcpy

Figure 17.2: Function stack after strcpy



x86 Software Reverse-Engineering, Cracking, and Counter-Measures

Stephanie Domas

Christopher Domas

Introduction

Reverse engineering and software cracking are disciplines with a long, rich history. For decades, software developers have attempted to build defenses into their applications to protect intellectual property or to prevent modifications to the program code. The art of cracking has been around nearly as long, with reverse engineers examining and modifying code for fun or profit.

Before diving into the details of how reverse engineering works, it is useful to understand the context in which these disciplines reside. This chapter describes what to expect from this book and dives into the history and legal considerations of software reverse engineering and cracking.

Who Should Read This Book

From security professionals to hobbyists, this book is for anyone who wants to learn to take apart, understand, and modify black-box software. It takes a curious, security-minded reader behind the curtain to see how computers and software cracking work. Learning how an x86 computer works is not only powerful from a reverse-engineering and cracking perspective; it also makes each reader a stronger developer, with advanced knowledge they can apply to code optimization, efficiency, debugging, compiler settings, and chip selection. The curtain then pulls back further as readers learn how software cracking happens. Readers will learn about the tools and techniques that real-world software crackers use, and they will put their newfound knowledge to the test by cracking real-world applications in numerous hands-on labs. We then circle back to the defensive techniques for combating software cracking. By learning both the offensive and defensive techniques, readers will walk away as strong software crackers or software defenders.

What to Expect from This Book

This book is based on these three core tenets of reverse engineering:

There is no such thing as uncrackable software.

The goal in offense is to try to go faster.

The goal in defense is to try to slow down.

Based on this philosophy, any software can be reverse engineered and have its secrets stolen and protections circumvented. It's just a matter of time.

Like other areas of cybersecurity, both offensive and defensive reverse engineers benefit from having a similar set of skills. This book is designed to provide an introduction to these three interrelated skill sets:

Reverse engineering: Reverse engineering is the process of taking software apart and figuring out how it works.

Cracking: Cracking builds on reverse engineering by manipulating a program's internals to get it to do something that it was not intended to.

Defense: While all software is crackable, defenses can make a program more difficult and time-consuming to crack.

Without an understanding of reverse engineering and cracking, a defender can't craft effective protections. Conversely, an attacker who understands and can manipulate how a program works can bypass and overcome these protections more effectively.

Structure of the Book

This book is organized based on these three core capabilities and skill sets. The structure is as follows:

Part 1: Background
Topics: History and legal considerations; x86 crash course
Goal: Understand x86 and learn to move quickly.

Part 2: Software Reverse Engineering
Topics: Reconnaissance; key checkers; key generators; process monitoring; resource manipulation; static analysis; dynamic analysis; writing key gens; cracking software
Goal: Master the tools, approaches, and mindset required to take software apart and understand its inner workings.

Part 3: Software Cracking
Topics: Manual patching; automated patchers; advanced dynamic analysis; execution tracing; advanced static analysis; trial periods; nag screens; more key gens; more cracks
Goal: Master the tools, approaches, and mindset necessary to isolate behavior and modify software.

Part 4: Defenses, Countermeasures, and Advanced Topics
Topics: Obfuscation/deobfuscation; anti-debugging/anti-anti-debugging; packing/unpacking; cryptors/decryptors; architectural defenses; legal; timeless debugging; binary instrumentation; intermediate representations; decompiling; automatic structure recovery; visualization; theorem provers; symbolic analysis; cracking extravaganza
Goal: Master defenses and counter-defenses. Evaluate defensive posture and tradeoffs. Explore advanced topics. Exercise reverse engineering and cracking tools, techniques, and mindset.

Hands-On Experience and Labs

The best way to learn reverse engineering and software cracking is by doing it. For this reason, this book will include several hands-on labs that demonstrate the concepts described in the text.

The goal of this book isn't to teach a particular set of tools and techniques. While the focus is on x86 software running in Windows, many of the approaches and techniques will translate to other platforms. This book will attempt to demonstrate a wide range of tools, including open-source, freeware, shareware, and commercial solutions. With an understanding of what tools are available and their relative strengths and weaknesses, you can more effectively select the right tool for the job.

Hands-on labs and exercises will also focus on reverse engineering and cracking a variety of different targets, including the following:

Real software: Some exercises will use real-world software carefully selected to avoid copyright violations.

Manufactured examples: Software written specifically for this book to illustrate concepts that are impractical to demonstrate with real-world examples.

Crackmes: Manufactured software developed by crackers to illustrate a concept or challenge others.

Companion Download Files

The book mentions some additional files, such as labs or tools. These items are available for download from https://github.com/DazzleCatDuo/X86-SOFTWARE-REVERSE-ENGINEERING-CRACKING-AND-COUNTER-MEASURES.

History

Before diving into the nitty-gritty details of cracking and reverse engineering, it is useful to understand its history. Software protections and the tricks and techniques used to overcome them have been evolving for decades.

The First Software Protections

The first software copy protections emerged in the 1970s. Some of the early movers in the space were as follows:

Apple II: The Apple II incorporated proprietary disk drivers that would allow writing at half-tracks, writing extra rings, and staggering and overlapping sectors. The purpose of this was to make the disks unusable by non-Apple machines and software that wouldn't know to read and write at these odd offsets.

Atari 800: Atari 800 systems would intentionally include bad sectors in their disks and attempt to load these sectors. If these loads didn't return a “bad sector” error, then the software knew it wasn't a valid disk and would halt execution.

Commodore 64: Legitimate Commodore 64 software was distributed only on read-only disks. The software would attempt to overwrite the disk, and, if it succeeded, it knew the disk was counterfeit.

These protections all depended on unusual behavior by the software, such as the use of invalid memory or attempting to overwrite the program's own code. Defeating these protections required an understanding of how the software worked.
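The Atari-style and Commodore-style checks described above can be sketched as a small simulation. This is only an illustration under stated assumptions: `SimulatedDisk`, `BadSectorError`, and the sector numbers are hypothetical names and values invented here, not a real drive interface or the code any vendor shipped.

```python
class BadSectorError(Exception):
    """Raised when the simulated drive cannot read a sector."""


class SimulatedDisk:
    """Hypothetical stand-in for a floppy disk; not a real drive API."""

    def __init__(self, sectors, bad_sectors=(), read_only=False):
        self.sectors = dict(sectors)
        self.bad_sectors = set(bad_sectors)
        self.read_only = read_only

    def read(self, index):
        if index in self.bad_sectors:
            raise BadSectorError(index)
        return self.sectors.get(index, b"")

    def write(self, index, data):
        if self.read_only:
            return False  # the write is refused
        self.sectors[index] = data
        return True


def bad_sector_check(disk, expected_bad_sector):
    """Atari-style check: an original disk must FAIL to read this sector."""
    try:
        disk.read(expected_bad_sector)
    except BadSectorError:
        return True   # read failed as expected: looks like an original
    return False      # read succeeded: the copy lacks the bad sector


def write_check(disk, scratch_sector):
    """Commodore-style check: an original disk must REFUSE the write."""
    return not disk.write(scratch_sector, b"probe")


original = SimulatedDisk({0: b"boot"}, bad_sectors={7}, read_only=True)
copy = SimulatedDisk({0: b"boot", 7: b"\x00" * 16})

print(bad_sector_check(original, 7), write_check(original, 9))  # True True
print(bad_sector_check(copy, 7), write_check(copy, 9))          # False False
```

Both checks come down to a single branch on an observable result, which is why understanding how the software worked was enough to defeat them.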

The Rise of Cracking and Reverse Engineering

The rise of cracking and reverse engineering began in the 1980s. However, these early crackers weren't in it for the money. Cracking was a contest to determine who could figure out and bypass software protections the quickest.

Over the next several decades, the reverse engineering and cracking scene evolved. These are some of the key dates in the history of reverse engineering:

1987: Bacchus forms Fairlight, one of the first operational groups. Fairlight will later come to prominence in the FBI crackdowns of the early 2000s. For more historical details, visit www.fairlight.to and csdb.dk.

1990: Elliot J. Chikofsky and James H. Cross II define reverse engineering as “the process of analyzing a subject system to identify the system's components and their interrelationships and to create representations of the system in another form or at a higher level of abstraction” (“Reverse Engineering and Design Recovery: A Taxonomy,” IEEE Software, Vol. 7, Issue 1, Jan 1990).

1997: Old Red Cracker (handle +ORC) founds the Internet-based High Cracking University (+HCU) to allow everyone to learn about cracking. +ORC releases “how to crack” lessons online and authors academic papers. +HCU students have handles that begin with a +.

1997–2009: The “warez scene” emerges with groups competing to be the first to release copyrighted material. Insiders (aka “suppliers”) provide early access to their groups, “crackers” break the protections, and “couriers” distribute cracked software to FTP sites. Between 2003 and 2009, approximately 3,164 active groups are on “the scene,” competing primarily for pride and bragging rights, not money.

2004: The FBI and law enforcement agencies in other countries begin raids against “the scene.” Operation Fastlink (2004) leads to the conviction of 60 warez members, and Operation Site Down (2005) takes down 25 warez groups.

The arms race between software protections and crackers continues to rage, and reverse engineering is an invaluable skill set on both sides. Crackers need to understand how a program works to manipulate it and bypass defenses. On the defensive side, it's important to understand the latest cracking techniques to develop defenses that protect intellectual property and other sensitive data.

Legal

The best way to learn is by doing. This is why this book includes labs and exercises with real-world software as well as manufactured examples and crackmes. We are not lawyers, and those with concerns should consult a lawyer. We recommend the Electronic Frontier Foundation (www.eff.org). Chapter 15 covers legal topics because we feel it's important for everyone to understand the US-based laws that affect this area. There are two main laws to be aware of: the Copyright Act and the Digital Millennium Copyright Act (DMCA).

The Fair Use Clause of the Copyright Act (Copyright Act, 17 U.S.C. § 107) states that reverse engineering falls under “fair use” when done for “…purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research….” This exception is balanced against “the effect of the use upon the potential market for or value of the copyrighted work.” In essence, reverse engineering used for educational purposes is legal if you don't share or sell the cracked software.

In October 2016, the DMCA also added an exception for good faith security research. It states, “accessing a computer program solely for purposes of good-faith testing, …where such activity is carried out in a controlled environment designed to avoid any harm to individuals or the public, …and is not used or maintained in a manner that facilitates copyright infringement.”

The software examined in this book and used in exercises was carefully selected to fall under the fair use and DMCA exceptions. If you are planning to reverse engineer and crack software for anything other than self-education, you should consult a lawyer. The legal considerations of reverse engineering will also be explored in greater detail in a later chapter.

Software reverse engineering and cracking have a rich history, and this skill set has both offensive and defensive applications. However, it is important to understand the laws around these disciplines and ensure that your activities fall under the good-faith testing and fair use exemptions.

This book is designed to provide a strong foundation in the skills and tools used for software reverse engineering and cracking. Beginning with the fundamentals, the book will move on through sections on software reverse engineering and cracking to end with an exploration of advanced offensive and defensive techniques.

Chapter 1: Decompilation and Architecture

An effective reverse engineer or cracker is one who understands the systems they are analyzing. Software is designed to run in a particular environment, and if you don't understand how that environment works, you will struggle to understand the software.

This chapter explores the steps necessary to get started reverse engineering an application. Decompilation is crucial to transforming an application from machine code to something that can be read and understood by humans. To actually analyze the resulting code, it is also necessary to understand the architecture of the computers that it is designed to run on.

Decompilation

Most programmers write using a higher-level programming language like C/C++ or Java, which is designed to be human-readable. However, computers are designed to run machine code, which represents instructions in binary.

Compilation is the process of converting a programming language to machine code. This means decompilation would be the process of taking machine code back to the original programming language, recovering the original source code. When available, this is the easiest approach to reverse engineering because source code is designed to be read and interpreted by a human. The majority of this book will focus on the more typical case when decompilation is not possible. But for the purposes of learning, it is important to understand that sometimes you can decompile back to the source code, and when that is an option, you should take it.

When Is Decompilation Useful?

For many programming languages, full decompilation is impossible. These languages build code directly to machine code, and some information, such as variable names, is lost in the process. While some advanced decompilers can build pseudocode for these languages, the process isn't perfect.

However, some programming languages use what's called just-in-time (JIT) compilation. When programs written in JIT languages are “built,” they are converted from the source code into an intermediate language (IL), not machine code. The program is stored and distributed in this IL until it is run, at which point a JIT compiler converts the code to machine code. Examples of JIT platforms include Java, Dalvik (Android), and .NET.

For example, Java is well-known for being largely platform-agnostic, and the reason for this is its use of an IL (Java bytecode) and the Java Virtual Machine (JVM). The program code is distributed as bytecode, and the JVM translates it at runtime into machine code specific to the machine it's running on. While this approach can negatively impact file size and performance, it pays off in portability.

JIT compilation also makes reverse engineering these applications much easier. These intermediate languages are similar enough to the original source code that they can be decompiled or converted back into usable source code. Source code is designed to be human-readable, making it far easier to understand the application's logic and identify software protections or other embedded secrets.
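To see why an intermediate language gives so much away, a short sketch helps. Python is not one of the JIT platforms named above, and the standard CPython interpreter has traditionally interpreted its bytecode rather than JIT-compiling it. It is used here only as a convenient stand-in, because it also compiles source code to a bytecode that keeps names and constants. The function `check_license` and the key value are made up for illustration.

```python
import dis


def check_license(key):
    secret = "DAZZLE-1234"  # made-up value for illustration
    return key == secret


code = check_license.__code__

# Names and constants are stored alongside the bytecode itself.
print(code.co_name)       # the function name
print(code.co_varnames)   # argument and local variable names
print(code.co_consts)     # constants, including the embedded string

# Human-readable listing of the intermediate instructions.
dis.dis(check_license)
```

The function name, the variable names, and the embedded string all survive the build step, which is the kind of metadata a decompiler relies on to reconstruct readable source.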

Decompiling JIT Programming Languages

For JIT languages like .NET, several free decompilers are available. One widely used .NET decompiler is JetBrains dotPeek, which is available from www.jetbrains.com/decompiler. Figure 1.1 shows an example of .NET code decompiled in dotPeek.

As shown in the figure, the .NET code is easily readable after decompilation because the intermediate language encodes a wealth of information as metadata, enabling more accurate reconstruction of the source code. Any sensitive information or trade secrets contained within the code are easily accessible to a reverse engineer.

Figure 1.1: JetBrains dotPeek .NET decompiler

Defending JIT Languages

Unlike true machine code programs, JIT-compiled programs can often be converted back to source code. Because this lowers the bar for reverse engineering so far, many of the x86 anti-reverse-engineering defenses discussed in later chapters are unnecessary overkill for these programs.

For decompilable languages, a commonly used defense against reverse engineering is obfuscation. Figure 1.2 shows an example of a .NET application before and after obfuscation.

The top half of the figure contains code before obfuscation occurs, where the function and variable names and strings are easily readable. The information in these variable names makes it easier for a reverse engineer to understand the purpose of each function and how the application works as a whole.

In the bottom half of the image, we see the obfuscated version of the same code. Now, function names, variable names, and strings are all mangled, making it much harder to understand the purpose of the function shown, let alone the application as a whole.
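The name-mangling part of this idea can be sketched in a few lines. The example below is a toy, not how a commercial .NET obfuscator works: it uses Python's `ast` module (Python 3.9 or later for `ast.unparse`) to rename identifiers in a made-up function, `is_valid_serial`. It does not mangle strings, and it would break code that passes arguments by keyword.

```python
import ast
import builtins


class RenameIdentifiers(ast.NodeTransformer):
    """Replace function, argument, and variable names with meaningless ones."""

    def __init__(self):
        self.mapping = {}

    def _mangle(self, name):
        if hasattr(builtins, name):
            return name  # leave built-ins such as len or print alone
        if name not in self.mapping:
            self.mapping[name] = f"_{len(self.mapping):x}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._mangle(node.name)
        self.generic_visit(node)  # continue into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._mangle(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._mangle(node.id)
        return node


SOURCE = """
def is_valid_serial(serial, owner):
    checksum = len(owner) * 7
    return serial == checksum
"""

tree = RenameIdentifiers().visit(ast.parse(SOURCE))
obfuscated = ast.unparse(tree)
print(obfuscated)

# The mangled code still behaves the same as the original.
namespace = {}
exec(obfuscated, namespace)
print(namespace["_0"](35, "alice"))  # True: len("alice") * 7 == 35
```

The logic is untouched, so a determined reverse engineer can still work out what the function does. The renaming only removes the hints that made it quick.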

Another important security best practice is to avoid writing security- or privacy-critical code in JIT languages where reverse engineering is easy. Instead, write this code in a compiled language, such as C/C++, where reverse engineering is significantly more difficult. This code can be included in DLLs that are linked to the executable containing the nonsensitive code written in a JIT language.

Figure 1.2: Obfuscation in JetBrains dotPeek

Lab 1: Decompiling

This is the first hands-on lab for this book. Labs and all associated instructions can be found in their corresponding folder here:

https://github.com/DazzleCatDuo/X86-SOFTWARE-REVERSE-ENGINEERING-CRACKING-AND-COUNTER-MEASURES

For this lab, please locate Lab Decompiling and follow the provided instructions.

Skills to Practice

Every lab in this book is designed to teach and provide hands-on experience with certain skills. This lab's skills to practice include the following:

Decompiling

Performing introductory reverse engineering

To learn these skills, you'll be using JetBrains dotPeek to reverse engineer and modify a .NET application.

Takeaways

Decompiling is a powerful and easy approach to understanding and modifying a program. However, it doesn't work on every program. While programs written in languages such as C/C++ can be decompiled using tools such as IDA's Hex-Rays Decompiler or Ghidra, the result is often low-quality and difficult to use.

When developing applications that contain sensitive information or that you don't want modified, it's better to use a language that isn't easily decompiled. For example, C/C++ is a better choice for sensitive functionality than a .NET language such as C#.

Architecture

Decompilation is the easy approach to reverse engineering because it gets you back to higher-level languages and logic structures. However, this easy path is not often available. For languages that build to machine code, we need to go deeper and understand how computer architectures and machine and assembly code work.

Computer Architecture

It's generally thought that the average programmer doesn't need an in-depth understanding of how computers work. When writing a program in a procedural language, the operating system handles all of the low-level operations. A program runs as a process that appears to have access to the processor, memory, and file system whenever it needs them. Processes appear to have their own contiguous memory spaces, and files are just a sequence of bytes to read and write.

However, none of this is actually true, and your operating system has been abstracting the truth from you (to make it easier to program). A solid understanding of how computer architecture actually works is essential for a reverse engineer. Figure 1.3 shows the main components that make up a computer, including the central processing unit, bridge, memory, and peripherals.

Figure 1.3: Computer architecture

The Central Processing Unit

The central processing unit (CPU) is where processing occurs on a computer. Inside the CPU are the following components:

Arithmetic logic unit (ALU):

The ALU performs mathematical operations within the computer, such as addition and multiplication.

Registers:

Registers provide temporary data storage and are used as the primary inputs and outputs of x86 instructions. Registers provide extremely fast access to a single word of data and are typically accessed by name.

Control units:

Control units execute code. This includes reading instructions and orchestrating the operations of other elements within a computer.

Bridges and Peripherals

The CPU is connected via a bus to a bridge. The purpose of the bridge is to connect the CPU to other components of the system, including memory and the I/O bus, which is where peripherals such as the keyboard, mouse, and speakers are connected to the system. While information flows over a bus, the bridge is responsible for controlling this traffic and ensuring that traffic flowing in over one bus is routed out over the appropriate bus.

Peripherals, connected via the I/O bus, allow the computer to communicate with the outside world. This includes sending and receiving data from the graphics card, keyboard, mouse, speakers, and other systems.

Memory and Registers

As its name suggests, memory is where data is stored on the computer. Data is stored as a linear series of bytes that are accessed via their address. This design allows moderately fast access to data stored on the system.

When a program wants to access data in memory, the CPU sends a request via a bus to the bridge, which forwards it to the memory, where the data at the indicated address is accessed. The requested data then needs to retrace that route and return to the CPU before it can be used by the program. In contrast, a register is physically located within the CPU, making it far more accessible.

Registers are storage that lives inside of the CPU and, unlike memory, are not a linear series of bytes. Registers are specifically named and have set sizes associated with each.

Registers and memory both serve the same purpose: they store data. However, they have different specializations (quality versus quantity). Registers are few in number and expensive, but they provide extremely fast access to data. Memory is cheap and plentiful but offers slower access speeds.

The bulk of the data associated with a program, the code itself and its data, will be stored in memory. While the program is running, small chunks of data will be copied to the registers for processing.
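The division of labor between memory and registers can be sketched with a toy model. The names below (memory, registers, load, store) are hypothetical and exist only for illustration. The sketch assumes 32-bit values stored in little-endian byte order, which is how x86 lays out multi-byte values in memory.

```python
# Toy model (hypothetical names): memory is a flat, byte-addressable array,
# while registers are a small set of named, fixed-size slots inside the CPU.
memory = bytearray(256)            # 256 bytes of "RAM", addresses 0..255
registers = {"eax": 0, "ebx": 0}   # two 32-bit registers, accessed by name


def load(reg: str, address: int) -> None:
    """Copy a 32-bit little-endian value from memory into a register."""
    registers[reg] = int.from_bytes(memory[address:address + 4], "little")


def store(address: int, reg: str) -> None:
    """Copy a register's 32-bit value back out to memory."""
    memory[address:address + 4] = registers[reg].to_bytes(4, "little")


# Place the value 41 at address 100, copy it into a register,
# work on it there, and write the result back to memory.
memory[100:104] = (41).to_bytes(4, "little")
load("eax", 100)
registers["eax"] = (registers["eax"] + 1) & 0xFFFFFFFF  # stay within 32 bits
store(100, "eax")
```

Memory is reached by a numeric address and a register by its name, and the arithmetic happens only on the copy held in the register.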

Assembly

Computers run on binary, digital logic. Everything is either on (1) or off (0). This includes programs running on a computer. All high-level languages are eventually converted into a series of bits called machine code. This machine code defines the set of instructions that the computer executes to perform a desired function.

Introduction to Machine Code

Every programmer begins learning a language with a “hello world” program. In x86, the machine code for “hello world” is as follows:

55 89 e5 83 e4 f0 83 ec 10 b8 b0 84 04 08 89 04 24 e8 1a ff ff ff b8 00 00 00 00 c9 c3 90

This machine code is written in hexadecimal for readability and compactness, but its true value is a binary string of 1s and 0s. This binary string contains instructions to flip transistors to calculate information, fetch data from memory, send signals over the system buses, interact with the graphics card, and, finally, print out the “hello world” text. If this string of characters seems a bit short to accomplish all this, it's because these instructions trigger the operating system (in this example Linux) to help out.
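To see the relationship between the hexadecimal listing and the underlying bits, a short sketch can expand each byte into its eight binary digits. The helper name to_bits is hypothetical. The byte string is the “hello world” listing above.

```python
# The "hello world" machine code from the text, written as hex pairs.
HELLO_HEX = (
    "55 89 e5 83 e4 f0 83 ec 10 b8 b0 84 04 08 89 04 24 "
    "e8 1a ff ff ff b8 00 00 00 00 c9 c3 90"
)


def to_bits(hex_text: str) -> str:
    """Render each byte of a hex string as eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in bytes.fromhex(hex_text))


code = bytes.fromhex(HELLO_HEX)   # the raw bytes the processor actually sees
first_two = to_bits("55 89")      # the first two bytes, shown as bits
```

Each pair of hexadecimal digits stands for exactly one byte, so the 30 bytes of this program are 240 bits.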

Machine code controls the processor at the most detailed possible level. Some of the functions that machine code performs include the following:

Moving data in and out of memory

Moving data in and out of registers

Controlling the system bus

Controlling the ALU, control unit, and other components

This low-level control means that applications written in machine code can be incredibly powerful and efficient. However, while memorizing and inputting various series of bits to perform certain tasks is pretty awesome, it is inefficient and prone to error.

From Machine Code to Assembly

In machine code, a series of bits represents a particular action. For example, in x86 the byte 0x81 (10000001 in binary) begins one encoding of an instruction that adds two values together and stores the result at a particular location.

Assembly code is designed to be a human-readable version of machine code. Instead of memorizing a binary or hexadecimal string like 0x81 or 10000001, a programmer can use add. The assembler maps the add mnemonic to the appropriate opcode, such as 0x81, so this shorthand makes programming easier without losing any of the benefits of writing in machine code.

Translating machine code to assembly code makes it much easier to understand. For example, the previous “hello world” example code can be converted into a series of comprehensible instructions.

MACHINE CODE        ASSEMBLY
55                  push ebp
89 e5               mov ebp, esp
83 e4 f0            and esp, 0xfffffff0
83 ec 10            sub esp, 0x10
b8 b0 84 04 08      mov eax, 0x80484b0
89 04 24            mov [esp], eax
e8 1a ff ff ff      call 80482f4
b8 00 00 00 00      mov eax, 0x0
c9                  leave
c3                  ret
90                  nop

If you understand machine code, writing directly in it can be fun, and there are cases where it may make sense. However, the majority of the time, it is inefficient and impractical. Writing in assembly provides the same benefits as writing in machine code but is much more practical.

After code has been written in assembly, it can be translated to machine code by an assembler in a process called assembling. A program already in machine code can be disassembled into assembly code by a disassembler.

DEFINITION

Assemblers convert assembly code to machine code. Disassemblers convert machine code to assembly.
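What a disassembler does can be sketched with a toy table-driven decoder. It is a minimal illustration, not a real disassembler: it recognizes only the handful of byte patterns that occur in the “hello world” listing. The names FIXED, IMM8, and disassemble are hypothetical. BASE is an assumed load address, chosen so that the relative call resolves to the 80482f4 target shown in the listing.

```python
import struct

HELLO = bytes.fromhex(
    "55 89 e5 83 e4 f0 83 ec 10 b8 b0 84 04 08 89 04 24 "
    "e8 1a ff ff ff b8 00 00 00 00 c9 c3 90"
)
BASE = 0x80483C4  # assumed address of the first byte (not given in the text)

# Byte patterns that always decode to the same text.
FIXED = {
    b"\x55": "push ebp",
    b"\x89\xe5": "mov ebp, esp",
    b"\x89\x04\x24": "mov [esp], eax",
    b"\xc9": "leave",
    b"\xc3": "ret",
    b"\x90": "nop",
}
# Two-byte patterns followed by a sign-extended 8-bit immediate.
IMM8 = {b"\x83\xe4": "and esp", b"\x83\xec": "sub esp"}


def disassemble(code: bytes, base: int = BASE) -> list[str]:
    """Decode the few instruction forms used in the example listing."""
    out, i = [], 0
    while i < len(code):
        if code[i] == 0xB8:                      # mov eax, imm32
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov eax, {imm:#x}")
            i += 5
        elif code[i] == 0xE8:                    # call rel32
            (rel,) = struct.unpack_from("<i", code, i + 1)
            # The target is relative to the address of the next instruction.
            out.append(f"call {base + i + 5 + rel:#x}")
            i += 5
        elif code[i:i + 2] in IMM8:              # and/sub esp, imm8
            (imm,) = struct.unpack_from("<b", code, i + 2)
            out.append(f"{IMM8[code[i:i + 2]]}, {imm & 0xFFFFFFFF:#x}")
            i += 3
        else:
            for pattern, text in FIXED.items():
                if code.startswith(pattern, i):
                    out.append(text)
                    i += len(pattern)
                    break
            else:
                raise ValueError(f"unknown byte {code[i]:#04x} at offset {i}")
    return out


listing = disassemble(HELLO)
```

Real disassemblers such as those in IDA or Ghidra decode the full x86 encoding rules rather than matching fixed patterns, but the principle is the same: read bytes, recognize an instruction, and emit its mnemonic and operands.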

Many programmers don't write in machine code or assembly. Instead, they use higher-level languages that abstract away more of the details. For example, the following pseudocode is similar to many high-level procedural languages.

int x=1, y=2, z=x+y;

During the compiling process, these higher-level languages are converted into assembly code similar to the following:

mov [ebp-4], 0x1

mov [ebp-8], 0x2

mov eax, [ebp-8]

mov edx, [ebp-4]

lea eax, [edx+1*eax]

mov [ebp-0xc], eax

An assembler can then be used to convert the assembly code into the following machine code that a computer can use:

c7 45 fc 01 00 00 00 c7 45 f8 02 00 00 00 8b 45 f8 8b 55 fc 8d 04 02 89 45 f4
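The link between the assembly and these bytes can be checked by hand for the first instruction, c7 45 fc 01 00 00 00, which corresponds to mov [ebp-4], 0x1. The sketch below splits it into its fields. The variable names are hypothetical, and the field layout (opcode, ModRM byte, 8-bit displacement, 32-bit immediate) is the standard x86 encoding for this form of mov.

```python
import struct

instruction = bytes.fromhex("c7 45 fc 01 00 00 00")  # mov [ebp-4], 0x1

opcode = instruction[0]   # 0xc7: mov to a register or memory operand, imm32
modrm = instruction[1]    # 0x45: describes the memory operand

# The ModRM byte packs three fields into eight bits.
mod = modrm >> 6              # 0b01  -> an 8-bit displacement follows
reg = (modrm >> 3) & 0b111    # 0b000 -> opcode extension for this mov form
rm = modrm & 0b111            # 0b101 -> the base register is ebp

# 0xfc is a signed byte, so it means -4: the operand is [ebp-4].
(disp8,) = struct.unpack_from("<b", instruction, 2)

# The value 1 is stored little-endian, least significant byte first.
(imm32,) = struct.unpack_from("<I", instruction, 3)
```

Reading the fields back gives a displacement of -4 from ebp and an immediate of 1, which is exactly the first line of the assembly listing.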

Instruction Set Architectures and Microarchitectures

The word computer covers a wide range of systems. A smartwatch and a desktop computer both work in similar ways. However, their internal components can differ significantly.

An instruction set architecture (ISA) describes the ecosystems where programs run. Some of the factors that an ISA defines include the following:

Registers:

The ISA specifies whether a processor has a single register or hundreds. It also defines the size of these registers, whether they contain 8 bits or 128 bits.

Addresses and data formats:

The ISA specifies the format for addresses used to access data in memory. It also defines how many bytes the system can grab from memory at a time.

Machine instructions:

Different ISAs may support different sets of instructions. The ISA defines whether addition, subtraction, equality, halt, and other instructions are supported.

By defining the capabilities of the physical system, the ISA also indirectly defines the assembly language. The ISA specifies which low-level instructions are available and what those instructions do.

A microarchitecture describes how a particular ISA is implemented on a processor. Figure 1.4 shows an example of the Intel Core 2 architecture.

Together, an ISA and microarchitecture define the computer architecture. The existence of thousands of ISAs and thousands of microarchitectures means that there are thousands of computer architectures as well.

Figure 1.4: Intel Core 2 architecture

DEFINITION

An instruction set architecture defines how registers, addresses, data formats, and machine instructions work. Microarchitectures implement ISAs on a processor. Together, an ISA and microarchitecture define a computer architecture.

RISC vs. CISC Computer Architectures

While thousands of computer architectures exist, they can be broadly divided into two main categories. Reduced instruction set computing (RISC) architectures define a small number of simpler instructions. In general, RISC architectures are cheaper and easier to create, and the hardware is physically smaller and consumes less power.

In contrast, a complex instruction set computing (CISC) architecture defines a larger number of more powerful instructions. CISC processors are more expensive and difficult to create and are typically larger and consume more power.

While CISC architectures may seem objectively worse than RISC ones, their main benefit lies in the ease and efficiency of programming. For example, consider a hypothetical example where a program wants to multiply a value by 5 in a RISC versus CISC system.

CISC                RISC
mul [100], 5        load r0, 100
                    mov r1, r0
                    add r1, r0
                    add r1, r0
                    add r1, r0
                    add r1, r0
                    mov [100], r1

In this example, a CISC processor can perform the calculation in a single instruction if it has a multiplication instruction that can load a value from memory, multiply it, and store the result at the same memory location. However, a RISC processor may lack a multiplication instruction because it is a complex operation. Instead, the RISC processor loads the value from memory, copies it, adds the original value to the copy four times, and stores the result in the same memory location, for a total of seven instructions.
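The hypothetical comparison can be traced in a few lines of code. This is a sketch of the example only, not a model of any real processor. The function names are invented for illustration, and memory is modeled as a dictionary with a single address, 100.

```python
def cisc_mul(memory: dict, address: int, factor: int) -> int:
    """Model 'mul [100], 5': one instruction does load, multiply, store."""
    memory[address] *= factor
    return 1  # number of instructions executed


def risc_times_five(memory: dict, address: int) -> int:
    """Model the seven-instruction RISC sequence from the example."""
    executed = 0
    r0 = memory[address]      # load r0, 100
    executed += 1
    r1 = r0                   # mov r1, r0
    executed += 1
    for _ in range(4):        # add r1, r0 (four times)
        r1 += r0
        executed += 1
    memory[address] = r1      # mov [100], r1
    executed += 1
    return executed


cisc_memory = {100: 7}
risc_memory = {100: 7}
cisc_count = cisc_mul(cisc_memory, 100, 5)
risc_count = risc_times_five(risc_memory, 100)
```

Both versions leave the same value at address 100. Only the number of instructions differs, and the instruction count alone says nothing about how long each instruction takes or how much power it uses.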

RISC and CISC architectures both have their advantages, disadvantages, and use cases. For example, a RISC processor may take 100 instructions to perform the same operation that a CISC processor can perform in one. However, that single CISC instruction may take 100× as long to run or consume 100× the power.

Both RISC and CISC instruction sets are in common use today. Some examples of widely used RISC architectures include the following:

ARM (used by phones, tablets)

MIPS (used by embedded systems and networking equipment)

PowerPC (used by older Macs and the Xbox 360)

In this book, we focus on x86, a CISC architecture, and its assembly language. This architecture is used in most modern PCs and servers and is supported by all the main operating systems (Windows, macOS, Linux) and even some gaming systems, such as the Xbox One, making it one of the most powerful to learn for software cracking.

Summary

The machine code that actually runs on computers isn't designed for humans to read and understand. To be usable, it needs to be converted into a different form.

One option for this is decompilation, which produces a result that is similar or identical to the original source code. However, decompilation is not always possible.

For fully compiled languages, such as C/C++, and many other languages, it is necessary to disassemble a compiled executable and analyze it in assembly. However, this requires a much deeper understanding of the computer's architecture and how it actually works than writing and reading code in a higher-level language. Now that we know the role decompilation can play and the need for disassembly, in the next few chapters we'll look at how computers work, so we can learn to disassemble like a pro.

CHAPTER 2 x86 Assembly: Data, Modes, Registers, and Memory Access

Most software reverse engineering requires disassembling a compiled executable and analyzing the result. This disassembly results in assembly code, not a higher-level language.

While many assembly languages exist, x86 is one of the most widely used. This chapter introduces some of the key concepts of x86 assembly, providing a foundation for later chapters.

Introduction to x86

Thousands of computer architectures exist. They all work similarly (a computer is a computer), but there are minor or major differences between them.

To study reverse engineering, we need to select an architecture to focus on. In this book, we'll be using x86, which was selected for a few different reasons:

Ubiquity:

x86 is one of the most widely used assembly languages, making it widely applicable for reverse engineering.

Computer support:

x86 applications can be built, run, and reverse engineered on any desktop, laptop, or server.

Market share:

x86 is supported by the major operating systems (Windows, Linux, and macOS), so it is used in billions of systems.

The x86 architecture has been around for decades and has evolved significantly over the years. Its lineage began at Intel with the 8-bit 8080 in 1974, and the x86 name comes from the 16-bit 8086 that followed in 1978. Some of the main milestones in the history of x86 include the following:

Intel 8080:

8-bit microprocessor, introduced in 1974

Intel 8086:

16-bit microprocessor, introduced in 1978

Intel 80386:

32-bit microprocessor, introduced in 1985

Intel Prescott, AMD Opteron, and Athlon 64:

64-bit microprocessors, introduced in 2003/2004