A crystal-clear and practical blueprint to software disassembly
x86 Software Reverse-Engineering, Cracking, and Counter-Measures is centered around the world of disassembling software. It starts with the basics of the x86 assembly language and progresses to how that knowledge empowers you to reverse-engineer and circumvent software protections. No knowledge of assembly, reverse engineering, or software cracking is required.
The book begins with a bootcamp on x86, in which you learn how to read, write, and build in the assembly that powers a massive amount of the world's computers. Then the book shifts to reverse engineering applications using a handful of industry favorites such as IDA, Ghidra, Olly, and more. Next, we move to cracking with techniques such as patching and key generation, all harnessing the power of assembly and reverse engineering. Lastly, we examine cracking from a defensive perspective, providing learners with techniques to be a better defender of their own software, or knowledge to crack these techniques more effectively.
* Assembly: computer architecture, x86, system calls, building and linking, ASCII, condition codes, GDB, control flow, stack, calling conventions
* Reverse Engineering: reconnaissance, strings, RE strategy, stripping, linking, optimizations, compilers, industry tools
* Cracking: patching, key checkers, key generators, resource hacking, dependency walking
* Defense: anti-debugging, anti-tamper, packing, cryptors/decryptors, whitelist, blacklist, RASP, code signing, obfuscation
A practical and hands-on resource for everyone from security professionals to hobbyists, this book is for anyone who wants to learn to take apart, understand, and modify black-box software. x86 Software Reverse-Engineering, Cracking, and Counter-Measures is a vital resource for security researchers, reverse engineers, and defenders who analyze, research, crack, or defend software applications.
Page count: 372
Year of publication: 2024
Cover
Table of Contents
Title Page
Introduction
Who Should Read This Book
What to Expect from This Book
History
Legal
Chapter 1: Decompilation and Architecture
Decompilation
Lab 1: Decompiling
Architecture
Summary
Chapter 2: x86 Assembly: Data, Modes, Registers, and Memory Access
Introduction to x86
Assembly Syntax
Data Representation
Registers
Memory Access
Addressing Modes
Summary
Chapter 3: x86 Assembly: Instructions
x86 Instruction Format
x86 Instructions
Putting It All Together
Common x86 Instruction Mistakes
Summary
Chapter 4: Building and Running Assembly Programs
Output
System Calls
Building and Linking
objdump
Lab: Hello World
ASCII
Summary
Chapter 5: Understanding Condition Codes
Condition Codes
Summary
Chapter 6: Analyzing and Debugging Assembly Code
Binary Analysis
Breakpoints
gdb
Segmentation Faults
Lab: Shark Sim 3000
Tuning Out the Noise
Summary
Chapter 7: Functions and Control Flow
Control Flow
Logic Constructs in x86
Stack
Function Calls and Stack Frames
Summary
Chapter 8: Compilers and Optimizers
Finding Starting Code
Compilers
Summary
Chapter 9: Reverse Engineering: Tools and Strategies
Lab: RE Bingo
Basic REconnaissance
Reverse Engineering Strategy
Summary
Chapter 10: Cracking: Tools and Strategies
Key Checkers
Key Generators
Lab: Introductory Keygen
Procmon
Resource Hacker
Patching
Other Debuggers
Debugging with Immunity
Lab: Cracking with Immunity
Summary
Chapter 11: Patching and Advanced Tooling
Patching in 010 Editor
CodeFusion Patching
Cheat Engine
Lab: Cracking LaFarge
IDA Introduction
IDA Patching
Lab: IDA Logic Flows
Ghidra
Lab: Cracking with IDA
Summary
Chapter 12: Defense
Obfuscation
Lab: Obfuscation
Anti-Debugging
Lab: Anti-Debugging
Summary
Chapter 13: Advanced Defensive Techniques
Tamper-Proofing
Packing
Lab: Detecting and Unpacking
Virtualization
Cryptors/Decryptors
Summary
Chapter 14: Detection and Prevention
CRC
Code Signing
RASP
Allowlisting
Blocklisting
Remote Authentication
Lab: ProcMon
Summary
Chapter 15: Legal
U.S. Laws Affecting Reverse Engineering
Summary
Chapter 16: Advanced Techniques
Timeless Debugging
Summary
Chapter 17: Bonus Topics
Stack Smashing
Connecting C and x86
Summary
Conclusion
Index
Copyright
About the Authors
About the Technical Writer
About the Technical Editor
End User License Agreement
Chapter 4
Table 4.1: x86 Ports
Table 4.2: sys_write
Table 4.3: sys_exit
Chapter 7
Table 7.1: x86 conditional jump instructions
Table 7.2: Pushing a variable onto the stack
Table 7.3: Popping a variable from the stack
Table 7.4: Stack trace examples
Table 7.5: Function calls and the stack
Table 7.6: Program stack after calling a
Table 7.7: Program stack after calling b
Table 7.8: Program stack after calling c
Table 7.9: Stack in add function
Table 7.10: Effects of function prologue on stack
Table 7.11: Effects of function epilogue on stack
Table 7.12: Stack locations for common values
Table 7.13: Stack content at points 1, 2, and 3 in the program
Table 7.14: Stack frame of one_up program
Table 7.15: Stack locations for local variables
Table 7.16: Complete function stack frame
Table 7.17: Two types of prologues
Table 7.18: Two types of epilogues
Chapter 1
Figure 1.1: JetBrains dotPeek .NET decompiler
Figure 1.2: Obfuscation in JetBrains dotPeek
Figure 1.3: Computer architecture
Figure 1.4: Intel Core 2 architecture
Chapter 2
Figure 2.1: Hexadecimal
Figure 2.2: Base conversions in the Windows calculator
Figure 2.3: Bit and byte significance labels
Figure 2.4: Endianness
Figure 2.5: x86 registers
Figure 2.6: Pieces of the eax register
Figure 2.7: Common x64 registers
Figure 2.8: Pieces of the r8 register
Figure 2.9: Comparing differently sized mov instructions
Chapter 3
Figure 3.1: mov instructions
Chapter 4
Figure 4.1: Binary wristwatch
Figure 4.2: ASCII table
Figure 4.3: Program output
Figure 4.4: ASCII uppercase and lowercase values
Chapter 5
Figure 5.1: Effects of add al,bl with various inputs
Figure 5.2: Effects of sub al, bl with various inputs
Figure 5.3: cmp truth table
Chapter 6
Figure 6.1: The gdb command
Figure 6.2: Disassembly in gdb
Figure 6.3: Setting a breakpoint in gdb
Figure 6.4: gdb info files command
Figure 6.5: gdb info register command
Figure 6.6: gdb info variable command
Figure 6.7: gdb stepi command
Figure 6.8: gdb x command
Figure 6.9: Printing 10 bytes with the gdb x command
Chapter 7
Figure 7.1: Example jump table
Figure 7.2: Stack address growth
Figure 7.3: Stack frames for hack and drink functions
Chapter 8
Figure 8.1: Application without debugging symbols in gdb
Figure 8.2: .text disassembly in gdb
Figure 8.3: Main function disassembly in gdb
Figure 8.4: Unoptimized code in a disassembler
Figure 8.5: Speed and space-optimized code in a disassembler
Figure 8.6: Space-optimized code in a disassembler
Figure 8.7: Application debugging symbols
Figure 8.8: Linked libraries in “hello world” program
Chapter 9
Figure 9.1: objdump options
Figure 9.2: Sample objdump output
Figure 9.3: strace output for echo hello!
Figure 9.4: Kitten cursor applications
Figure 9.5: Examining registry modifications in Dependency Walker
Chapter 10
Figure 10.1: Halting Process Monitor
Figure 10.2: Filtering events in Procmon
Figure 10.3: Defining a filter in Procmon
Figure 10.4: Filtering on Registry events in Procmon
Figure 10.5: Including and excluding event categories in Procmon
Figure 10.6: Notepad font change registry event
Figure 10.7: Event properties in Procmon
Figure 10.8: Stack view in Procmon's Properties window
Figure 10.9: Stack trace for notepad.exe
Figure 10.10: File operations in Procmon
Figure 10.11: Security Registry queries in Procmon
Figure 10.12: Sample application in Resource Hacker
Figure 10.13: Password window
Figure 10.14: String search in Resource Hacker
Figure 10.15: Identifying a dialog box in Resource Hacker
Figure 10.16: Microsoft Calculator
Figure 10.17: Searching for Calculator in ResHack
Figure 10.18: Calculator window in Resource Hacker
Figure 10.19: Compiling the modified application
Figure 10.20: Modified window in Resource Hacker
Figure 10.21: Saving the modified application in ResHack
Figure 10.22: Immunity debugger window
Figure 10.23: Assembly code in Immunity debugger
Figure 10.24: Executable modules in the Immunity debugger
Figure 10.25: Strings in Immunity debugger
Figure 10.26: String references in Immunity debugger
Figure 10.27: Launching an executable in Immunity debugger
Figure 10.28: Single-stepping in Immunity debugger
Figure 10.29: Stepping over instructions in Immunity debugger
Figure 10.30: Exceptions in Immunity debugger
Figure 10.31: noping out code in Immunity debugger
Figure 10.32: noped code in Immunity debugger
Figure 10.33: Reverting modified code in Immunity debugger
Figure 10.34: Saving a modified file in Immunity debugger
Chapter 11
Figure 11.1: Viewing a file in 010 Editor
Figure 11.2: Inspector pane in 010 Editor
Figure 11.3: Searching in 010 Editor
Figure 11.4: Jumping to an address in 010 Editor
Figure 11.5: CodeFusion start screen
Figure 11.6: Loading a file in CodeFusion
Figure 11.7: Adding patch information in CodeFusion
Figure 11.8: Launching the patched executable in CodeFusion
Figure 11.9: Opening a process in Cheat Engine
Figure 11.10: Viewing memory in Cheat Engine
Figure 11.11: Memory Viewer pane in Cheat Engine
Figure 11.12: String references in Cheat Engine
Figure 11.13: noping out instruction...
Figure 11.14: Reverting changes in Cheat Engine
Figure 11.15: Copying bytes in Cheat Engine
Figure 11.16: Loading a file in IDA
Figure 11.17: IDA graph view
Figure 11.18: Opening strings view in IDA
Figure 11.19: Strings view in IDA
Figure 11.20: String cross-references in IDA
Figure 11.21: Strings in IDA code view
Figure 11.22: Basic blocks in IDA
Figure 11.23: Function arguments in IDA
Figure 11.24: Local variables in IDA
Figure 11.25: Local variables and function arguments in IDA
Figure 11.26: IDA comment window
Figure 11.27: Searching for comments in IDA
Figure 11.28: Search results in IDA
Figure 11.29: Code paths in IDA
Figure 11.30: Showing opcode bytes in IDA
Figure 11.31: Password-checking code in IDA
Figure 11.32: IDA Patch Bytes window
Figure 11.33: Password-checking logic in IDA after patching
Chapter 12
Figure 12.1: Control flow flattening in IDA
Figure 12.2: Opaque predicates in IDA
Chapter 13
Figure 13.1: Packed code in IDA
Figure 13.2: Identifying packers with PEiD
Chapter 14
Figure 14.1: Windows warning of unverified program
Chapter 17
Figure 17.1: Function stack frame before strcpy
Figure 17.2: Function stack after strcpy
Stephanie Domas
Christopher Domas
Reverse engineering and software cracking are disciplines with a long, rich history. For decades, software developers have attempted to build defenses into their applications to protect intellectual property or to prevent modifications to the program code. The art of cracking has been around nearly as long, with reverse engineers examining and modifying code for fun or profit.
Before diving into the details of how reverse engineering works, it is useful to understand the context in which these disciplines reside. This chapter describes what to expect from this book and dives into the history and legal considerations of software reverse engineering and cracking.
From security professionals to hobbyists, this book is for anyone who wants to learn to take apart, understand, and modify black-box software. This book takes a curious, security-minded individual behind the curtain to see how software cracking and computers work. Learning how an x86 computer works is not only powerful from a reverse-engineering and cracking perspective, but will also make each reader a stronger developer, with advanced knowledge they can apply to code optimization, efficiency, debugging, compiler settings, and chip selection. Then the curtain continues to pull back as readers learn how software cracking happens. Readers will learn about tools and techniques that real-world software crackers use, and they will put their newfound knowledge to the test by cracking real-world applications of their own in numerous hands-on labs. We then circle back to understand defensive techniques for combating software cracking. By learning both the offensive and defensive techniques, readers will walk away as strong software crackers or software defenders.
This book is based on these three core tenets of reverse engineering:
There is no such thing as uncrackable software.
The goal in offense is to try to go faster.
The goal in defense is to try to slow down.
Based on this philosophy, any software can be reverse engineered and have its secrets stolen and protections circumvented. It's just a matter of time.
Like other areas of cybersecurity, both offensive and defensive reverse engineers benefit from having a similar set of skills. This book is designed to provide an introduction to these three interrelated skill sets:
Reverse engineering: Reverse engineering is the process of taking software apart and figuring out how it works.
Cracking: Cracking builds on reverse engineering by manipulating a program's internals to get it to do something that it was not intended to.
Defense: While all software is crackable, defenses can make a program more difficult and time-consuming to crack.
Both offensive and defensive reverse engineers benefit from the same set of skills. Without an understanding of reverse engineering and cracking, a defender can't craft effective protections. On the other hand, an attacker can more effectively bypass and overcome these protections if they can understand and manipulate how a program works.
This book is organized based on these three core capabilities and skill sets. The structure is as follows:
Part 1: Background
Topics: History and legal considerations; x86 crash course
Goal: Understand x86 and learn to move quickly.
Part 2: Software Reverse Engineering
Topics: Reconnaissance; Key checkers; Key generators; Process monitoring; Resource manipulation; Static analysis; Dynamic analysis; Writing key gens; Cracking software
Goal: Master the tools, approaches, and mindset required to take software apart and understand its inner workings.
Part 3: Software Cracking
Topics: Manual patching; Automated patchers; Advanced dynamic analysis; Execution tracing; Advanced static analysis; Trial periods; Nag screens; More key gens; More cracks
Goal: Master the tools, approaches, and mindset necessary to isolate behavior and modify software.
Part 4: Defenses, Countermeasures, and Advanced Topics
Topics: Obfuscation/deobfuscation; Anti-debugging/anti-anti-debugging; Packing/unpacking; Cryptors/decryptors; Architectural defenses; Legal; Timeless debugging; Binary instrumentation; Intermediate representations; Decompiling; Automatic structure recovery; Visualization; Theorem provers; Symbolic analysis; Cracking extravaganza
Goal: Master defenses and counter-defenses. Evaluate defensive posture and tradeoffs. Explore advanced topics. Exercise reverse engineering and cracking tools, techniques, and mindset.
The best way to learn reverse engineering and software cracking is by doing it. For this reason, this book will include several hands-on labs that demonstrate the concepts described in the text.
The goal of this book isn't to teach a particular set of tools and techniques. While the focus is on x86 software running in Windows, many of the approaches and techniques will translate to other platforms. This book will attempt to demonstrate a wide range of tools, including open-source, freeware, shareware, and commercial solutions. With an understanding of what tools are available and their relative strengths and weaknesses, you can more effectively select the right tool for the job.
Hands-on labs and exercises will also focus on reverse engineering and cracking a variety of different targets, including the following:
Real software: Some exercises will use real-world software carefully selected to avoid copyright violations.
Manufactured examples: Software written specifically for this book to illustrate concepts that are impractical to demonstrate with real-world examples.
Crackmes: Manufactured software developed by crackers to illustrate a concept or challenge others.
The book mentions some additional files, such as labs or tools. These items are available for download from https://github.com/DazzleCatDuo/X86-SOFTWARE-REVERSE-ENGINEERING-CRACKING-AND-COUNTER-MEASURES.
Before diving into the nitty-gritty details of cracking and reverse engineering, it is useful to understand its history. Software protections and the tricks and techniques used to overcome them have been evolving for decades.
The first software copy protections emerged in the 1970s. Some of the early movers in the space were as follows:
Apple II: The Apple II incorporated proprietary disk drivers that would allow writing at half-tracks, writing extra rings, and staggering and overlapping sectors. The purpose of this was to make the disks unusable by non-Apple machines and software that wouldn't know to read and write at these odd offsets.
Atari 800: Atari 800 systems would intentionally include bad sectors in their disks and attempt to load these sectors. If these loads didn't return a “bad sector” error, then the software knew it wasn't a valid disk and would halt execution.
Commodore 64: Legitimate Commodore 64 software was distributed only on read-only disks. The software would attempt to overwrite the disk, and, if it succeeded, it knew the disk was counterfeit.
These protections all depended on unusual behavior by the software, such as the use of invalid memory or attempting to overwrite the program's own code. Defeating these protections required an understanding of how the software worked.
The rise of cracking and reverse engineering began in the 1980s. However, these early crackers weren't in it for the money. Cracking was a contest to determine who could figure out and bypass software protections the quickest.
Over the next several decades, the reverse engineering and cracking scene evolved. These are some of the key dates in the history of reverse engineering:
1987: Fairlight's formation in 1987 by Bacchus marks one of the first operational groups. Fairlight would later come to prominence in the FBI crackdowns of the early 2000s. For more historical details, visit www.fairlight.to and csdb.dk.
1990: Elliot J. Chikofsky and James H. Cross II defined reverse engineering as “the process of analyzing a subject system to identify the system's components and their interrelationships and to create representations of the system in another form or at a higher level of abstraction” (“Reverse Engineering and Design Recovery: A Taxonomy,” IEEE Software, Vol. 7, Issue 1, Jan 1990).
1997: Old Red Cracker (handle +ORC) founds the Internet-based High Cracking University (+HCU) to allow everyone to learn about cracking. +ORC released “how to crack” lessons online and authored academic papers. +HCU students had handles that began with a +.
1997–2009: The “warez scene” emerges with groups competing to be the first to release copyrighted material. Insiders (aka “suppliers”) provided early access to their groups, “crackers” broke the protections, and “couriers” distributed cracked software to FTP sites. Between 2003 and 2009, approximately 3,164 active groups were on “the scene”, competing primarily for pride and bragging rights, not money.
2004: The FBI and authorities in other countries begin raids against “the scene”. Operation Fastlink (2004) led to the conviction of 60 warez members, and Operation Site Down (2005) took down 25 warez groups.
The arms race between software protections and crackers continues to rage, and reverse engineering is an invaluable skill set on both sides. Crackers need to understand how a program works to manipulate it and bypass defenses. On the defensive side, it's important to understand the latest cracking techniques to develop defenses that protect intellectual property and other sensitive data.
The best way to learn is by doing. This is why this book includes labs and exercises with real-world software as well as manufactured examples and crackmes. We are not lawyers, and those with concerns should consult a lawyer. We recommend the Electronic Frontier Foundation (www.eff.org). Chapter 15 covers legal topics because we feel it's important for everyone to understand the US-based laws that affect this area. There are two main laws to be aware of: the Copyright Act and the Digital Millennium Copyright Act (DMCA).
The Fair Use Clause of the Copyright Act (Copyright Act, 17 U.S.C. § 107) states that reverse engineering falls under “fair use” when done for “…purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research….” This exception is balanced against “the effect of the use upon the potential market for or value of the copyrighted work.” In essence, reverse engineering used for educational purposes is legal if you don't share or sell the cracked software.
In October 2016, the DMCA also added an exception for good faith security research. It states, “accessing a computer program solely for purposes of good-faith testing, …where such activity is carried out in a controlled environment designed to avoid any harm to individuals or the public, …and is not used or maintained in a manner that facilitates copyright infringement.”
The software examined in this book and used in exercises was carefully selected to fall under the fair use and DMCA exceptions. If you are planning to reverse engineer and crack software for anything other than self-education, you should consult a lawyer. The legal considerations of reverse engineering will also be explored in greater detail in a later chapter.
Software reverse engineering and cracking have a rich history, and this skill set has both offensive and defensive applications. However, it is important to understand the laws around these disciplines and ensure that your activities fall under the good-faith testing and fair use exemptions.
This book is designed to provide a strong foundation in the skills and tools used for software reverse engineering and cracking. Beginning with the fundamentals, the book will move on through sections on software reverse engineering and cracking to end with an exploration of advanced offensive and defensive techniques.
An effective reverse engineer or cracker is one who understands the systems they are analyzing. Software is designed to run in a particular environment, and if you don't understand how that environment works, you will struggle to understand the software.
This chapter explores the steps necessary to get started reverse engineering an application. Decompilation is crucial to transforming an application from machine code to something that can be read and understood by humans. To actually analyze the resulting code, it is also necessary to understand the architecture of the computers that it is designed to run on.
Most programmers write using a higher-level programming language like C/C++ or Java, which is designed to be human-readable. However, computers are designed to run machine code, which represents instructions in binary.
Compilation is the process of converting a programming language to machine code. This means decompilation would be the process of taking machine code back to the original programming language, recovering the original source code. When available, this is the easiest approach to reverse engineering because source code is designed to be read and interpreted by a human. The majority of this book will focus on the more typical case when decompilation is not possible. But for the purposes of learning, it is important to understand that sometimes you can decompile back to the source code, and when that is an option, you should take it.
For many programming languages, full decompilation is impossible. These languages build code directly to machine code, and some information, such as variable names, is lost in the process. While some advanced decompilers can build pseudocode for these languages, the process isn't perfect.
However, some programming languages use what's called just-in-time (JIT) compilation. When programs written in JIT languages are “built,” they are converted from the source code into an intermediate language (IL), not machine code. JIT compilers store a copy of the code in this IL until the program is run, at which point the code is converted to machine code. Examples of JIT languages include Java, Dalvik (Android), and .NET.
For example, Java is well-known for being largely platform-agnostic, and the reason for this is its use of an IL (Java bytecode) and the Java Virtual Machine (JVM). Because the program code is distributed as bytecode and compiled only at runtime, the JVM can translate from the Java IL to machine code specific to the machine it's running on. While this approach can negatively impact file size and performance, it pays off in portability.
JIT compilation also makes reverse engineering these applications much easier. These intermediate languages are similar enough to the original source code that they can be decompiled or converted back into usable source code. Source code is designed to be human-readable, making it far easier to understand the application's logic and identify software protections or other embedded secrets.
For JIT languages like .NET, several free decompilers are available. One widely used .NET decompiler is JetBrains dotPeek, which is available from www.jetbrains.com/decompiler. Figure 1.1 shows an example of .NET code decompiled in dotPeek.
As shown in the figure, the .NET code is easily readable after decompilation because the intermediate language encodes a wealth of information as metadata, enabling more accurate reconstruction of the source code. Any sensitive information or trade secrets contained within the code are easily accessible to a reverse engineer.
Figure 1.1: JetBrains dotPeek .NET decompiler
Unlike true machine code programs, JIT-compiled programs can often be converted to source code. Lowering the bar for reverse engineering the code makes many of the x86 anti-reverse engineering defenses discussed in later chapters unnecessary and overkill.
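To make the point about metadata concrete, here is a minimal sketch in Python, used only as a stand-in because it also compiles source to an intermediate bytecode. It is not .NET IL or Java bytecode, and the function name check_license and the key string are hypothetical values invented for this illustration. The sketch compiles a small function and then reads identifiers and string constants straight back out of the compiled code object.

```python
import types

# Hypothetical source that a developer might consider "hidden" once compiled.
source = '''
def check_license(license_key):
    expected_key = "ABCD-1234"
    return license_key == expected_key
'''

# Compile the source to bytecode without running it.
module_code = compile(source, "<example>", "exec")

# The function body is stored as a nested code object among the module's constants.
function_code = next(
    const for const in module_code.co_consts if isinstance(const, types.CodeType)
)

# The compiled form still carries the function name, variable names, and literals.
recovered_function_name = function_code.co_name
recovered_names = function_code.co_varnames
recovered_constants = function_code.co_consts

print("function:", recovered_function_name)
print("variables:", recovered_names)
print("constants:", recovered_constants)
```

Nothing here required the original source file: the names and the embedded string are all present in the compiled artifact, which is the same property that makes IL-based programs comparatively easy to decompile.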
For decompilable languages, a commonly used defense against reverse engineering is obfuscation. Figure 1.2 shows an example of a .NET application before and after obfuscation.
The top half of the figure contains code before obfuscation occurs, where the function and variable names and strings are easily readable. The information in these variable names makes it easier for a reverse engineer to understand the purpose of each function and how the application works as a whole.
In the bottom half of the image, we see the obfuscated version of the same code. Now, function names, variable names, and strings are all mangled, making it much harder to understand the purpose of the function shown, let alone the application as a whole.
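The renaming half of this idea can be sketched in a few lines. The following is a simplified illustration in Python (assuming Python 3.9 or later for ast.unparse), not the algorithm of any particular .NET obfuscator. The class name RenameIdentifiers and the sample function is_valid_password are hypothetical. The sketch only renames identifiers; it leaves the string literal readable, whereas the obfuscated code in the figure also mangles strings.

```python
import ast

# Hypothetical readable source, standing in for the "before" half of the figure.
original_source = '''
def is_valid_password(password):
    secret = "hunter2"
    return password == secret
'''

class RenameIdentifiers(ast.NodeTransformer):
    """Replace every function, argument, and variable name with an opaque one."""

    def __init__(self):
        self.mapping = {}

    def _opaque(self, name):
        # Reuse the same replacement each time a name reappears.
        if name not in self.mapping:
            self.mapping[name] = f"a{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._opaque(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._opaque(node.arg)
        return node

    def visit_Name(self, node):
        # Note: this would also rename builtins, so the sample avoids using any.
        node.id = self._opaque(node.id)
        return node

renamer = RenameIdentifiers()
tree = renamer.visit(ast.parse(original_source))
obfuscated_source = ast.unparse(tree)
print(obfuscated_source)

# The obfuscated code behaves the same; only the helpful names are gone.
namespace = {}
exec(obfuscated_source, namespace)
obfuscated_function = namespace[renamer.mapping["is_valid_password"]]
print(obfuscated_function("hunter2"), obfuscated_function("guess"))
```

The logic is unchanged, so a determined reverse engineer can still work out what the function does. The renaming removes the hints that made that work quick.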
Another important security best practice is to avoid writing security- or privacy-critical code in JIT languages where reverse engineering is easy. Instead, write this code in a language that compiles to machine code, such as C/C++, where reverse engineering is significantly more difficult. This code can be included in DLLs that are linked to the executable containing the nonsensitive code written in a JIT language.
Figure 1.2: Obfuscation in JetBrains dotPeek
This is the first hands-on lab for this book. Labs and all associated instructions can be found in their corresponding folder here:
https://github.com/DazzleCatDuo/X86-SOFTWARE-REVERSE-ENGINEERING-CRACKING-AND-COUNTER-MEASURES
For this lab, please locate Lab Decompiling and follow the provided instructions.
Every lab in this book is designed to teach and provide hands-on experience with certain skills. This lab's skills to practice include the following:
Decompiling
Performing introductory reverse engineering
To learn these skills, you'll be using JetBrains dotPeek to reverse engineer and modify a .NET application.
Decompiling is a powerful and easy approach to understanding and modifying a program. However, it doesn't work on every program. While programs written in languages such as C/C++ can be decompiled using tools such as IDA's Hex-Rays Decompiler or Ghidra, the result is often low-quality and difficult to use.
When developing applications that contain sensitive information or that you don't want modified, it's better to use a language that isn't easily decompiled. For example, C/C++ is a better choice for sensitive functionality than a .NET language such as C#.
Decompilation is the easy approach to reverse engineering because it gets you back to higher-level languages and logic structures. However, this easy path is not often available. For languages that build to machine code, we need to go deeper and understand how computer architectures and machine and assembly code work.
It's generally thought that the average programmer doesn't need an in-depth understanding of how computers work. When writing a program in a procedural language, the operating system handles all of the low-level operations. A program appears as a process that has access to the processor, memory, and file system whenever it needs them. Processes appear to have their own contiguous memory spaces, and files are just a sequence of bytes to read and write.
However, none of this is actually true, and your operating system has been abstracting the truth from you (to make it easier to program). A solid understanding of how computer architecture actually works is essential for a reverse engineer. Figure 1.3 shows the main components that make up a computer, including the central processing unit, bridge, memory, and peripherals.
Figure 1.3: Computer architecture
The central processing unit (CPU) is where processing occurs on a computer. Inside the CPU are the following components:
Arithmetic logic unit (ALU): The ALU performs mathematical operations within the computer, such as addition and multiplication.
Registers: Registers perform temporary data storage and are used as the primary inputs and outputs of x86 instructions. Registers provide extremely fast access to a single word of data and are typically accessed by name.
Control units: Control units execute code. This includes reading instructions and orchestrating the operations of other elements within a computer.
The CPU is connected via a bus to a bridge. The purpose of the bridge is to connect the CPU to other components of the system, including memory and the I/O bus, which is where peripherals such as the keyboard, mouse, and speakers are connected to the system. While information flows over a bus, the bridge is responsible for controlling this traffic and ensuring that traffic flowing in over one bus is routed out over the appropriate bus.
Peripherals, connected via the I/O bus, allow the computer to communicate with the outside world. This includes sending and receiving data from the graphics card, keyboard, mouse, speakers, and other systems.
As its name suggests, memory is where data is stored on the computer. Data is stored as a linear series of bytes that are accessed via their address. This design allows moderately fast access to data stored on the system.
When a program wants to access data in memory, the CPU sends a request via a bus to the bridge, which forwards it to the memory, where the data at the indicated address is accessed. The requested data then needs to retrace that route and return to the CPU before it can be used by the program. In contrast, a register is physically located within the CPU, making it far more accessible.
Registers are storage that lives inside of the CPU and, unlike memory, are not a linear series of bytes. Registers are specifically named and have set sizes associated with each.
Registers and memory both serve the same purpose: they store data. However, they have different specializations (quality versus quantity). Registers are few in number and expensive, but they provide extremely fast access to data. Memory is cheap and plentiful but offers slower access speeds.
The bulk of the data associated with a program, the code itself and its data, will be stored in memory. While the program is running, small chunks of data will be copied to the registers for processing.
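As a rough illustration of this division of labor, the following Python sketch models a toy machine in which memory is a flat array of bytes accessed by address and registers are a handful of named slots. The addresses, values, and helper names (load and store) are invented for this illustration and do not correspond to any real hardware interface.

```python
# Toy model: memory is a linear series of bytes; registers are named slots.
memory = bytearray(256)             # 256 bytes, addressed 0..255
registers = {"eax": 0, "edx": 0}    # two named 32-bit registers

def store(address, value):
    """Write a 32-bit value to memory in little-endian byte order (as x86 does)."""
    memory[address:address + 4] = value.to_bytes(4, "little")

def load(address):
    """Read a 32-bit little-endian value from memory."""
    return int.from_bytes(memory[address:address + 4], "little")

# The program's data lives in memory...
store(100, 1)
store(104, 2)

# ...and small pieces are copied into registers for processing.
registers["eax"] = load(100)
registers["edx"] = load(104)
registers["eax"] = (registers["eax"] + registers["edx"]) & 0xFFFFFFFF

# The result is written back to memory.
store(108, registers["eax"])
print(load(108))  # prints 3
```

Note that memory is reached by number (an address), while a register is reached by name. That difference shows up again and again when reading x86 assembly.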
Computers run on binary, digital logic. Everything is either on (1) or off (0). This includes programs running on a computer. All high-level languages are eventually converted into a series of bits called machine code. This machine code defines the set of instructions that the computer executes to perform a desired function.
Every programmer begins learning a language with a “hello world” program. In x86, the machine code for “hello world” is as follows:
55 89 e5 83 e4 f0 83 ec 10 b8 b0 84 04 08 89 04 24 e8 1a ff ff ff b8 00 00 00 00 c9 c3 90
This machine code is written in hexadecimal for readability and compactness, but its true value is a binary string of 1s and 0s. This binary string contains instructions to flip transistors to calculate information, fetch data from memory, send signals over the system buses, interact with the graphics card, and, finally, print out the “hello world” text. If this string of characters seems a bit short to accomplish all this, it's because these instructions trigger the operating system (in this example Linux) to help out.
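To make the relationship between the hexadecimal listing and the underlying bits concrete, here is a small Python sketch that expands the same bytes into binary. The variable names are ours; the bytes are copied from the listing above.

```python
# The "hello world" machine code from the text, written in hexadecimal.
machine_code = bytes.fromhex(
    "55 89 e5 83 e4 f0 83 ec 10 b8 b0 84 04 08 89 04 24 "
    "e8 1a ff ff ff b8 00 00 00 00 c9 c3 90"
)

# Each hexadecimal pair is one byte, which is eight binary digits.
bits = " ".join(f"{byte:08b}" for byte in machine_code)

print(len(machine_code))  # prints 30
print(bits[:26])          # prints 01010101 10001001 11100101
```

Thirty bytes become 240 bits, which is why hexadecimal is the usual way to write machine code down.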
Machine code controls the processor at the most detailed possible level. Some of the functions that machine code performs include the following:
Moving data in and out of memory
Moving data in and out of registers
Controlling the system bus
Controlling the ALU, control unit, and other components
This low-level control means that applications written in machine code can be incredibly powerful and efficient. However, while memorizing and inputting various series of bits to perform certain tasks is pretty awesome, it is inefficient and prone to error.
In machine code, a series of bits represents a particular action. For example, 0x81 (10000001 in binary) is the opcode byte for one form of an instruction that adds two values together and stores the result at a particular location.
Assembly code is designed to be a human-readable version of machine code. Instead of memorizing a binary or hexadecimal string like 0x81 or 10000001, a programmer can use add. The assembler maps the add mnemonic to the matching opcode (0x81 in this case), so this shorthand makes programming easier without losing any of the benefits of writing in machine code.
Translating machine code to assembly code makes it much easier to understand. For example, the previous “hello world” example code can be converted into a series of comprehensible instructions.
MACHINE CODE        ASSEMBLY
55                  push ebp
89 e5               mov ebp, esp
83 e4 f0            and esp, 0xfffffff0
83 ec 10            sub esp, 0x10
b8 b0 84 04 08      mov eax, 0x80484b0
89 04 24            mov [esp], eax
e8 1a ff ff ff      call 0x80482f4
b8 00 00 00 00      mov eax, 0x0
c9                  leave
c3                  ret
90                  nop
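The operands in this listing can be recovered from the raw bytes by hand. The Python sketch below decodes two of them: the 32-bit immediate that follows the b8 opcode, and the signed relative offset that follows the e8 (call) opcode. The address arithmetic at the end works backward from the call target shown in the listing. The listing does not state where the call instruction itself sits, so that value is inferred here.

```python
# Bytes of two instructions from the listing above.
mov_eax = bytes.fromhex("b8 b0 84 04 08")   # opcode b8 + 32-bit immediate
call = bytes.fromhex("e8 1a ff ff ff")      # opcode e8 + 32-bit relative offset

# x86 stores multi-byte values little-endian (least significant byte first).
immediate = int.from_bytes(mov_eax[1:], "little")
print(hex(immediate))        # prints 0x80484b0

# The call offset is signed and relative to the address of the next instruction.
offset = int.from_bytes(call[1:], "little", signed=True)
print(offset)                # prints -230

# Working backward from the target shown in the listing (0x80482f4):
target = 0x80482F4
next_instruction = target - offset
call_address = next_instruction - len(call)
print(hex(call_address))     # prints 0x80483d5
```

Reading the bytes b0 84 04 08 in reverse order gives the value loaded into eax. The negative call offset shows that the called code sits at a lower address than the caller.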
If you understand machine code, writing directly in it can be fun, and there are cases where it may make sense. However, the majority of the time, it is inefficient and impractical. Writing in assembly provides the same benefits as writing in machine code but is much more practical.
After code has been written in assembly, it can be translated to machine code by an assembler in a process called assembling. A program already in machine code can be disassembled into assembly code by a disassembler.
Assemblers convert assembly code to machine code. Disassemblers convert machine code to assembly.
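The round trip between the two forms can be sketched in a few lines of Python. This toy handles only the four single-byte instructions from the "hello world" listing. Real assemblers and disassemblers must also deal with operands and variable-length encodings, which this sketch deliberately ignores. The function names are ours.

```python
# Single-byte x86 instructions taken from the "hello world" listing.
OPCODES = {"push ebp": 0x55, "leave": 0xC9, "ret": 0xC3, "nop": 0x90}
MNEMONICS = {byte: text for text, byte in OPCODES.items()}

def assemble(lines):
    """Translate assembly text into machine code, one byte per instruction."""
    return bytes(OPCODES[line.strip()] for line in lines)

def disassemble(code):
    """Translate machine code back into assembly text."""
    return [MNEMONICS[byte] for byte in code]

source = ["push ebp", "leave", "ret", "nop"]
code = assemble(source)

print(code.hex(" "))        # prints 55 c9 c3 90
print(disassemble(code))    # prints ['push ebp', 'leave', 'ret', 'nop']
```

Because each mnemonic maps to exactly one encoding in this toy, assembling and then disassembling returns the original text.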
Many programmers don't write in machine code or assembly. Instead, they use higher-level languages that abstract away more of the details. For example, the following pseudocode is similar to many high-level procedural languages.
int x=1, y=2, z=x+y;
During the compiling process, these higher-level languages are converted into assembly code similar to the following:
mov dword [ebp-4], 0x1
mov dword [ebp-8], 0x2
mov eax, [ebp-8]
mov edx, [ebp-4]
lea eax, [edx+1*eax]
mov [ebp-0xc], eax
An assembler can then be used to convert the assembly code into the following machine code that a computer can use:
c7 45 fc 01 00 00 00 c7 45 f8 02 00 00 00 8b 45 f8 8b 55 fc 8d 04 02 89 45 f4
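To see how the assembly lines up with these bytes, the following Python sketch splits the machine code into its six instructions and decodes the stack offsets. It assumes the standard x86 ModRM layout, in which a mod field of 01 and an rm field of 101 mean "ebp plus a signed 8-bit displacement". The helper signed8 is a name of our own, and the operand size (dword) is written out explicitly in the assembly text.

```python
# Each instruction's bytes paired with its assembly text.
listing = [
    ("c7 45 fc 01 00 00 00", "mov dword [ebp-4], 0x1"),
    ("c7 45 f8 02 00 00 00", "mov dword [ebp-8], 0x2"),
    ("8b 45 f8", "mov eax, [ebp-8]"),
    ("8b 55 fc", "mov edx, [ebp-4]"),
    ("8d 04 02", "lea eax, [edx+1*eax]"),
    ("89 45 f4", "mov [ebp-0xc], eax"),
]

def signed8(byte):
    """Interpret one byte as a signed (two's complement) 8-bit value."""
    return byte - 256 if byte >= 128 else byte

displacements = []
for hex_bytes, assembly in listing:
    code = bytes.fromhex(hex_bytes)
    modrm = code[1]                       # second byte describes the operands
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b01 and rm == 0b101:       # [ebp + 8-bit displacement]
        displacement = signed8(code[2])
        displacements.append(displacement)
        print(f"{assembly:24} ebp{displacement:+d}")

print(displacements)  # prints [-4, -8, -8, -4, -12]
```

The bytes fc, f8, and f4 are simply -4, -8, and -12 in two's complement, which is where the three local variables x, y, and z live relative to ebp.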
The word computer covers a wide range of systems. A smartwatch and a desktop computer both work in similar ways. However, their internal components can differ significantly.
An instruction set architecture (ISA) describes the ecosystems where programs run. Some of the factors that an ISA defines include the following:
Registers:
The ISA specifies whether a processor has a single register or hundreds. It also defines the size of these registers, whether they contain 8 bits or 128 bits.
Addresses and data formats:
The ISA specifies the format for addresses used to access data in memory. It also defines how many bytes the system can grab from memory at a time.
Machine instructions:
Different ISAs may support different sets of instructions. The ISA defines whether addition, subtraction, equality, halt, and other instructions are supported.
By defining the capabilities of the physical system, the ISA also indirectly defines the assembly language. The ISA specifies which low-level instructions are available and what those instructions do.
A microarchitecture describes how a particular ISA is implemented on a processor. Figure 1.4 shows an example of the Intel Core 2 architecture.
Together, an ISA and microarchitecture define the computer architecture. The existence of thousands of ISAs and thousands of microarchitectures means that there are thousands of computer architectures as well.
Figure 1.4: Intel Core 2 architecture
An instruction set architecture defines how registers, addresses, data formats, and machine instructions work. Microarchitectures implement ISAs on a processor. Together, an ISA and microarchitecture define a computer architecture.
While thousands of computer architectures exist, they can be broadly divided into two main categories. Reduced instruction set computing (RISC) architectures define a small number of simpler instructions. In general, RISC architectures are cheaper and easier to create, and the hardware is physically smaller and consumes less power.
In contrast, a complex instruction set computing (CISC) architecture defines a larger number of more powerful instructions. CISC processors are more expensive and difficult to create and are typically larger and consume more power.
While CISC architectures may seem objectively worse than RISC ones, their main benefit lies in the ease and efficiency of programming. For example, consider a hypothetical example where a program wants to multiply a value by 5 in a RISC versus CISC system.
CISC                RISC
mul [100], 5        load r0, 100
                    mov r1, r0
                    add r1, r0
                    add r1, r0
                    add r1, r0
                    add r1, r0
                    mov [100], r1
In this example, a CISC processor can perform the calculation in a single instruction if it has a multiplication operation that can load a value from memory, multiply it, and store the result at the same memory location. However, a RISC processor may lack a multiplication instruction because it is a complex operation. Instead, the RISC processor loads the value from memory, adds it to itself four times, and stores the result in the same memory location across seven steps.
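The two hypothetical programs can be compared with a small Python simulation. Both instruction sets here are as invented as the ones in the example, and the final RISC step is named store in the sketch for clarity. The point is only to show that both routes reach the same result with very different instruction counts.

```python
def run_cisc(memory):
    """One complex instruction: mul [100], 5."""
    memory[100] = memory[100] * 5
    return 1  # instructions executed

def run_risc(memory):
    """Seven simple instructions that compute the same result."""
    program = [
        ("load", "r0", 100),    # r0 = value at address 100
        ("mov", "r1", "r0"),    # r1 = r0
        ("add", "r1", "r0"),    # r1 = 2 * r0
        ("add", "r1", "r0"),    # r1 = 3 * r0
        ("add", "r1", "r0"),    # r1 = 4 * r0
        ("add", "r1", "r0"),    # r1 = 5 * r0
        ("store", 100, "r1"),   # value at address 100 = r1
    ]
    regs = {"r0": 0, "r1": 0}
    executed = 0
    for op, dst, src in program:
        if op == "load":
            regs[dst] = memory[src]
        elif op == "mov":
            regs[dst] = regs[src]
        elif op == "add":
            regs[dst] += regs[src]
        elif op == "store":
            memory[dst] = regs[src]
        executed += 1
    return executed

cisc_memory = {100: 7}
risc_memory = {100: 7}
cisc_count = run_cisc(cisc_memory)
risc_count = run_risc(risc_memory)

print(cisc_memory[100], cisc_count)  # prints 35 1
print(risc_memory[100], risc_count)  # prints 35 7
```

Instruction count alone does not decide which design is faster, since each simple instruction may cost far less time or power than the single complex one.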
RISC and CISC architectures both have their advantages, disadvantages, and use cases. For example, a RISC processor may take 100 instructions to perform the same operation that a CISC processor can perform in one. However, that single CISC instruction may take 100× as long to run or consume 100× the power.
Both RISC and CISC instruction sets are in common use today. Some examples of widely used RISC architectures include the following:
ARM (used by phones, tablets)
MIPS (used by embedded systems and networking equipment)
PowerPC (used by older Macs and the Xbox 360)
In this book, we focus on x86, which is a CISC architecture, and its assembly language. This architecture is used by the vast majority of PCs and servers and is supported by all the main operating systems (Windows, macOS, Linux) and even some gaming systems, such as the Xbox One, making it one of the most valuable to learn for software cracking.
The machine code that actually runs on computers isn't designed for humans to read and understand. To be usable, it needs to be converted into a different form.
One option for this is decompilation, which produces a result that is similar or identical to the original source code. However, decompilation is not always possible.
For fully compiled languages, such as C/C++, and many other languages, it is necessary to disassemble a compiled executable and analyze it in assembly. However, this requires a much deeper understanding of the computer's architecture and how it actually works than writing and reading code in a higher-level language. Now that we know the role decompilation can play and the need for disassembly, in the next few chapters we'll look at how computers work, so we can learn to disassemble like a pro.
Most software reverse engineering requires disassembling a compiled executable and analyzing the result. This disassembly results in assembly code, not a higher-level language.
While a few assembly languages exist, x86 is one of the most widely used. This chapter introduces some of the key concepts of x86 assembly, providing a foundation for later chapters.
Thousands of computer architectures exist. While they all work similarly (a computer is a computer), there are minor or major differences between them.
To study reverse engineering, we need to select an architecture to focus on. In this book, we'll be using x86, which was selected for a few different reasons:
Ubiquity:
x86 is one of the most widely used assembly languages, making it widely applicable for reverse engineering.
Computer support:
x86 applications can be built, run, and reverse engineered on any desktop, laptop, or server.
Market share:
The major operating systems (Windows, Linux, and macOS) all run on x86, so it is used in billions of systems.
The x86 architecture has been around for decades and has evolved significantly over the years. Its lineage traces back to Intel's 8-bit 8080, introduced in 1974, and the x86 family proper began with the 16-bit 8086 in 1978. Some of the main milestones in the history of x86 include the following:
Intel 8080:
8-bit microprocessor, introduced in 1974
Intel 8086:
16-bit microprocessor, introduced in 1978
Intel 80386:
32-bit microprocessor, introduced in 1985
Intel Prescott, AMD Opteron, and Athlon 64:
64-bit microprocessors, introduced in 2003/2004