Embedded Cryptography provides a comprehensive exploration of cryptographic techniques tailored for embedded systems, addressing the growing importance of security in devices such as mobile systems and IoT. The book explores the evolution of embedded cryptography since its inception in the mid-90s, covers both theoretical and practical aspects, and discusses the implementation of cryptographic algorithms such as AES, RSA, ECC and post-quantum algorithms.
The work is structured into three volumes, spanning forty chapters and nine parts, and is enriched with pedagogical materials and real-world case studies, designed for researchers, professionals, and students alike, offering insights into both foundational and advanced topics in the field.
Embedded Cryptography 1 is dedicated to software side-channel attacks, hardware side-channel attacks and fault injection attacks.
Cover
Table of Contents
Title Page
Copyright Page
Preface
Part 1. Software Side-Channel Attacks
Chapter 1. Timing Attacks
1.1. Foundations
1.2. Example attacks
1.3. Example mitigations
1.4. Notes and further references
1.5. References
Chapter 2. Microarchitectural Attacks
2.1. Background
2.2. The Prime+Probe attack
2.3. The Flush+Reload attack
2.4. Attacking other microarchitectural components
2.5. Constant-time programming
2.6. Covert channels
2.7. Transient-execution attacks
2.8. Summary
2.9. Notes and further references
2.10. References
Part 2. Hardware Side-Channel Attacks
Chapter 3. Leakage and Attack Tools
3.1. Introduction
3.2. Data-dependent physical emissions
3.3. Measuring a side-channel
3.4. Leakage modeling
3.5. Notes and further references
3.6. References
Chapter 4. Supervised Attacks
4.1. General framework
4.2. Building a model
4.3. Controlling the dimensionality
4.4. Building de-synchronization-resistant models
4.5. Summary of the chapter
4.6. Notes and further references
4.7. References
Chapter 5. Unsupervised Attacks
5.1. Introduction
5.2. Distinguishers
5.3. Likelihood distinguisher
5.4. Mutual information
5.5. Correlation
5.6. A priori knowledge synthesis
5.7. Conclusion on statistical tools
5.8. Exercise solutions
5.9. Notes and further references
5.10. References
Chapter 6. Quantities to Judge Side Channel Resilience
6.1. Introduction
6.2. Metrics for comparing the effectiveness of specific attack vectors
6.3. Metrics for evaluating the leakage (somewhat) independent of a specific attack vector
6.4. Metrics for evaluating the remaining effort of an adversary
6.5. Leakage detection as a radical alternative to attack driven evaluations
6.6. Formal evaluation schemes
6.7. References
Chapter 7. Countermeasures and Advanced Attacks
7.1. Introduction
7.2. Misalignment of traces
7.3. Masking
7.4. Combination of countermeasures
7.5. To go further
7.6. References
Chapter 8. Mode-Level Side-Channel Countermeasures
8.1. Introduction
8.2. Building blocks
8.3. Security definitions
8.4. Leakage models
8.5. Constructions
8.6. Acknowledgments
8.7. Notes and further references
8.8. References
Part 3. Fault Injection Attacks
Chapter 9. An Introduction to Fault Injection Attacks
9.1. Fault injection attacks, disturbance of electronic components
9.2. Practical examples of fault injection attacks
9.3. Notes and further references
9.4. References
Chapter 10. Fault Attacks on Symmetric Cryptography
10.1. Introduction
10.2. Differential fault analysis
10.3. Automation of DFA
10.4. DFA countermeasures: general idea and taxonomy
10.5. Advanced FA
10.6. Leakage assessment in fault attacks
10.7. Chapter summary
10.8. Notes and further references
10.9. References
Chapter 11. Fault Attacks on Public-key Cryptographic Algorithms
11.1. Introduction
11.2. Preliminaries
11.3. Attacking the RSA using the Chinese remainder theorem
11.4. Attacking a modular exponentiation
11.5. Attacking the ECDSA
11.6. Other attack strategies
11.7. Countermeasures
11.8. Conclusion
11.9. Notes and further references
11.10. References
Chapter 12. Fault Countermeasures
12.1. Anatomy of a fault attack
12.2. Understanding the attacker
12.3. Taxonomy of fault countermeasures
12.4. Fault countermeasure principles
12.5. Fault countermeasure examples
12.5.1. Algorithm level countermeasures
12.6. ISA level countermeasures
12.7. RTL-level countermeasures
12.8. Circuit-level countermeasures
12.9. Design automation of fault countermeasures
12.10. Notes and further references
12.11. References
List of Authors
Index
Summary of Volume 2
Summary of Volume 3
End User License Agreement
Chapter 2
Table 2.1. Typical cache parameters for many Intel Core processors
Chapter 3
Table 3.1. Power consumption of a CMOS inverter gate observed from the VDD rail
Chapter 5
Table 5.1. Synthesis of classic assumptions and properties
Table 5.2. Summary of the characteristics of the...
Chapter 8
Table 8.1. CIML2 game
Table 8.2. The game
Table 8.3. Strong unpredictability with leakage in...
Chapter 9
Table 9.1. Features of laser focusing lenses
Table 9.2. Two instruction replay fault model....
Chapter 10
Table 10.1. Fault template for the χ3 S-box
Table 10.2. Template for attacking TI PRESENT (middle round). The black cells...
Table 10.3. Summary of results
Chapter 1
Figure 1.1. Two descriptions of bubble sort
Figure 1.2. Attack models relating to the examples presented in section 1.2
Chapter 2
Figure 2.1. The typical cache hierarchy of a four-core Intel processor....
Figure 2.2. Prime+Probe on AES. The shade indicates the relative probe....
Figure 2.3. Flush+Reload on the square-and-multiply implementation of....
Chapter 3
Figure 3.1. Dynamic currents in a CMOS inverter gate due to 0 →....
Figure 3.2. Current consumption in a CMOS inverter (seen from VDD)
Figure 3.3. Static power measurement by stopping the clock signal CLK....
Figure 3.4. Generic power analysis setup model....
Figure 3.5. Probing methodologies for power analysis measure: shunt....
Figure 3.6. Example side-channel trace: beginning of an AES
Figure 3.7. Example side-channel trace: SNR results. Left to right:...
Figure 3.8. Three boards described in this chapter. From the top and clock....
Chapter 4
Figure 4.1. Supervised attack scenario
Figure 4.2. Example of Gaussian distributions for B = 2, D....
Figure 4.3. A sketch of MLP....
Figure 4.4. Principle of an SNR (relative scale for the y-axes)....
Figure 4.5. Two convolutional layers. Left: W = 2...
Figure 4.6. A Max Pooling layer....
Figure 4.7. Synthesis of generative and discriminative models....
Chapter 5
Figure 5.1. Principle of an unsupervised attack
Figure 5.2. Scenario of an unsupervised attack
Chapter 7
Figure 7.1. Examples of side-channel traces and SNR....
Figure 7.2. Examples of side-channel traces and SNR....
Figure 7.3. Examples of consumption and SNR traces for...
Figure 7.4. Examples of side-channel traces and SNR for...
Chapter 8
Figure 8.1. Leakage-resilient PRG
Figure 8.2. Leakage-resilient PRF
Figure 8.3. CBC-MAC authenticates a message M = m1...
Figure 8.4. The HBC MAC together with its tag verification operation
Figure 8.5. The CTR mode used to encrypt a message M = m1...
Figure 8.6. A CPAL secure encryption mode
Figure 8.7. Exemplary leakage-resistant AE
Chapter 9
Figure 9.1. Mask in black paint to reveal only the parts of the component...
Figure 9.2. Fault injection device using light disturbances: use of camera...
Figure 9.3. Basic internal architecture of a digital integrated circuit
Figure 9.4. Illustration of the fault injection mechanism by violation of setup...
Figure 9.5. Architecture of the AES encryption block
Figure 9.6. Illustration of the overclocking fault injection mechanism
Figure 9.7. AES-128: demonstrating the data dependency of fault injection by...
Figure 9.8. Distribution of single-bit faults injected by increasing the clock...
Figure 9.9. Representation of a nominal clock signal (upper part) and a clock...
Figure 9.10. Clock glitch fault injection
Figure 9.11. Evolution of the critical time of the AES-128 co-processor data...
Figure 9.12. Evolution of the critical time of three data paths as a function...
Figure 9.13. Representation and effect of a negative voltage glitch on AES...
Figure 9.14. Representation of the theoretical (left) and real (right)...
Figure 9.15. Joint effect of temperature and supply voltage variations...
Figure 9.16. Injection probes EM: (left) detail of the injection probe;...
Figure 9.17. Distribution of AES encryption block elements on the FPGA...
Figure 9.18. Sampling faults injected by EM disturbance into AES-128...
Figure 9.19. Open microcontroller component for laser access: acid...
Figure 9.20. Illustration of the fault injection mechanism using...
Figure 9.21. Presentation of the areas of an SRAM cell sensitive...
Figure 9.22. Laser fault injection sensitivity maps of SRAM memories,...
Figure 9.23. Sensitivity maps for laser fault injection of D flip-flops...
Figure 9.24. General architecture of a fault injection bench
Figure 9.25. Traditional power glitch generator using a transistor
Figure 9.26. EM pulse fault injection devices in integrated circuits: principle...
Figure 9.27. Laser fault injection bench
Figure 9.28. Architecture of a laser source based on diode technology
Figure 9.29. Laser-induced instruction skip in a microcontroller test program,...
Figure 9.30. Trace of a microcontroller’s power consumption during start-up...
Figure 9.31. Laser fault injection in instructions and data read from Flash mem...
Figure 9.32. Laser-induced single and multiple instruction skips during...
Figure 9.33. Replay and instruction skips obtained by laser illumination of...
Figure 9.34. Resynchronization of 12 DES key loading processes
Figure 9.35. Generation of seven flashes on the 12 DES key loading loops
Figure 9.36. verifyPIN() PIN code verification
Figure 9.37. byteArrayCompare() function for comparing user...
Figure 9.38. Assembly instructions for assigning the Boolean value auth_status...
Figure 9.39. Instruction skip, corresponding to the replacement of the instruc...
Figure 9.40. PIN code bypass obtained by four consecutive laser shots,...
Figure 9.41. Illustration of a brute-force attack on the PIN verifica...
Chapter 10
Figure 10.1. Fault propagation path in AES for a byte fault injection....
Figure 10.2. Diagonal fault attack with the faults injected at the di...
Figure 10.3. ExpFault: conceptual view
Figure 10.4. Partial CDG of AES from ninth-round MixColumns till the...
Figure 10.5. Taxonomy of notable FA countermeasures
Figure 10.6. Illustrating the root cause of SIFA: (a) the possible....
Figure 10.7. Fault propagation: (a) XOR gate; (b) AND gate. The....
Figure 10.8. ALAFA leakage assessment test
Figure 10.9. t-test scores: (a) infection countermeasure, Gierlichs....
Chapter 11
Figure 11.1. Result of the SIFA targeting the most significant byte....
Figure 11.2. Detective (left) and infective (right) countermeasures
Figure 11.3. CRT-RSA with a final verification
Chapter 12
Figure 12.1. Taxonomy of fault countermeasures along three dimensions
SCIENCES
Computer Science, Field Director – Jean-Charles Pomerol
Cryptography, Data Security, Subject Head – Damien Vergnaud
Coordinated by
Emmanuel Prouff
Guénaël Renault
Matthieu Rivain
Colin O’Flynn
First published 2025 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK
www.iste.co.uk
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
www.wiley.com
© ISTE Ltd 2025. The rights of Emmanuel Prouff, Guénaël Renault, Matthieu Rivain and Colin O’Flynn to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s), contributor(s) or editor(s) and do not necessarily reflect the views of ISTE Group.
Library of Congress Control Number: 2024945145
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78945-213-6
ERC code:
PE6 Computer Science and Informatics
PE6_5 Security, privacy, cryptology, quantum cryptography
Emmanuel PROUFF¹, Guénaël RENAULT², Matthieu RIVAIN³ and Colin O’FLYNN⁴
¹ LIP6, Sorbonne Université, Paris, France
² Agence nationale de la sécurité des systèmes d’information, Paris, France
³ CryptoExperts, Paris, France
⁴ Dalhousie University and NewAE Technology Inc, Halifax, Canada
The idea for this project was born during a discussion with Damien Vergnaud. Damien had been asked to propose a series of volumes covering the different domains of modern cryptography for the SCIENCES series. He offered us the opportunity to take charge of the Embedded Cryptography books, which sounded like a great challenge to take on. In particular, we thought it was perfectly timely as the field was gaining increasing importance with the growing development of complex mobile systems and the Internet of Things.
The field of embedded cryptography, as a research domain, was born in the mid-1990s. Until that time, the evaluation of a cryptosystem and the underlying attacker model were usually agnostic of implementation aspects, whether the cryptosystem was deployed on a computer or on some embedded hardware like a smart card. Indeed, the attacker was assumed to have no other information than the final results of a computation and, possibly, the corresponding inputs. In this black box context, defining a cryptanalytic attack and evaluating resistance to it essentially consisted of finding flaws in the abstract definition of the cryptosystem.
In the 1990s, teams of researchers published the first academic results, highlighting very effective means of attack against embedded systems. These attacks were based on the observation that a system’s behavior during a computation strongly depends on the values of the data manipulated (which was previously known and exploited by intelligence services). Consequently, a device performing cryptographic computation does not behave like a black box whose inputs and outputs are the only known factors. The power consumption of the device, its electromagnetic radiation or its running time are indeed other sources that provide the observer with information on the intermediate results of the computation. Teams of researchers have also shown that it was possible to disrupt a computation using external energy sources such as lasers or electromagnetic pulses.
Among these so-called physical attacks, two main families emerge. The first gathers the (passive) side-channel attacks, including timing attacks proposed by Kocher (1996) and power analysis attacks proposed by Kocher et al. (1999), as well as the microarchitectural attacks that have considerably developed after the publication of the Spectre and Meltdown attacks in 2018 (Kocher et al. 2018). This first family of attacks focuses on the impact that the data manipulated by the system have on measurable physical quantities such as time, current consumption or energy dissipation related to state changes in memories. The second family gathers the (active) fault injection attacks, whose first principles were introduced by Boneh et al. (1997). These attacks aim to put the targeted system into an abnormal state of functioning. They consist, for example, of ensuring that certain parts of a code are not executed or that operations are replaced by others. Using attacks from either of these families, an adversary might learn sensitive information by exploiting the physical leakage or the faulted output of the system.
Since their inception, side-channel attacks and fault injection attacks, along with their countermeasures, have significantly evolved. Initially, the embedded systems industry and a limited number of academic labs responded with ad hoc countermeasures. Given the urgency of responding to the newly published attacks, these countermeasures were reasonably adequate at the time. Subsequently, the invalidation of many of these countermeasures and the increasing sophistication of attack techniques highlighted the need for a more formalized approach to security in embedded cryptography. A community was born from this observation in the late 1990s and gathered around a dedicated conference known as Cryptographic Hardware and Embedded Systems (CHES). Since then, the growth of this research domain has been very significant, driven by the strong stakes for industrial players and the scientific interest of the open security issues. Nowadays, physical attacks involve state-of-the-art equipment capable of targeting nanoscale technologies used in the semiconductor industry. Attackers routinely use advanced statistical analyses or signal processing, while the defenders designing countermeasures call on concepts from algebra, probability theory or formal methods. More recently, and notably with the publication of the Spectre and Meltdown attacks, side-channel attacks have extended to so-called microarchitectural attacks, exploiting very common optimization techniques in modern CPUs such as out-of-order execution or speculative execution. Twenty-five years after the foundational work, there is now a large community of academic and industrial scientists dedicated to these problems. Embedded cryptography has gradually become a classic topic in cryptography and computer security, as illustrated by the increasing importance of this field in major cryptography and security conferences besides CHES, such as CRYPTO, Eurocrypt, Asiacrypt, Usenix Security, IEEE S&P or ACM CCS.
For this work, it seemed important to us to have both scientifically ambitious and pedagogical content. We indeed wanted this book to appeal not only to researchers in embedded cryptography but also to Master’s students interested in the subject and curious to take their first steps. It was also important to us that the concepts and notions developed in the book be as illustrated as possible and therefore accompanied by a pedagogical base. In addition to the numerous illustrations proposed in the chapters, we have made pedagogical material available (attack scripts, implementation examples, etc.) to test and deepen the various concepts. These can be found on the following GitHub organization: https://github.com/embeddedcryptobook.
This book provides a comprehensive exploration of embedded cryptography. It comprises 40 chapters grouped into nine main parts, and spanning three volumes. The book primarily addresses side-channel and fault injection attacks as well as their countermeasures. Part 1 of this volume is dedicated to Software Side-Channel Attacks, namely, timing attacks and microarchitectural attacks, primarily affecting software; whereas Part 2 is dedicated to Hardware Side-Channel Attacks, which exploit hardware physical leakages, such as power consumption and electromagnetic emanations. Part 3 focuses on the second crucial family of physical attacks against embedded systems, namely, Fault Injection Attacks.
Part 1 of Volume 2 is then fully dedicated to Masking, a widely used countermeasure against side-channel attacks that has become an important research topic since its introduction in 1999. This part covers a variety of masking techniques, their security proofs and their formal verification. Besides general masking techniques, efficient and secure embedded cryptographic implementations are very dependent on the underlying algorithm. Consequently, Part 2, Cryptographic Implementations, is dedicated to the implementation of specific cryptographic algorithm families, namely, AES, RSA, ECC, and post-quantum cryptography. This part also covers hardware acceleration and constant-time implementations. Secure embedded cryptography needs to rely on secure hardware and secure randomness generation. In cases where hardware alone is insufficient for security, we must rely on additional software techniques to protect cryptographic keys; the latter approach is known as white-box cryptography. The next three parts of the book address those aspects. Part 3 of Volume 2, Hardware Security, covers invasive attacks, hardware countermeasures and physically unclonable functions (PUF).
Part 1 of Volume 3 is dedicated to White-Box Cryptography: it covers general concepts, practical attack tools, automatic (gray-box) attacks and countermeasures as well as code obfuscation, which is often considered as a complementary measure to white-box cryptography. Part 2 is dedicated to Randomness and Key Generation in embedded cryptography. It covers both true and pseudo randomness generation as well as randomness generation for specific cryptographic algorithms (prime numbers for RSA, random nonces for ECC signatures and random errors for post-quantum schemes).
Finally, we wanted to include concrete examples of real-world attacks against embedded cryptosystems. The final part of this series of books contains those examples of Real World Applications and Attacks in the Wild. While not exhaustive, we selected representative examples illustrating the practical exploitation of the attacks presented in this book, hence demonstrating the necessity of the science of embedded cryptography.
This series of books results from a collaborative work and many persons from the embedded cryptography community have contributed to its development. We have tried to cover (as broadly as possible) the field of embedded cryptography and the many research directions related to this field. This has not been an easy task, given the dynamism and growth of the field over the past 25 years. Some experts from the community kindly shared their insights on the preliminary plan of the book, namely, Sonia Belaïd, Pierre-Alain Fouque, Marc Joye, Victor Lomné and Yannick Sierra. We would like to thank them for their insightful comments.
For each of the identified topics we wanted the book to cover, we have called upon expert researchers from the community, who have honored us by joining this project. The essence of this work is theirs. Dear Guillaume Barbu, Lejla Batina, Sonia Belaïd, Davide Bellizia, Florent Bernard, Begül Bilgin, Eleonora Cagli, Lukasz Chmielewski, Jessy Clédière, Brice Colombier, Jean-Sébastien Coron, Jean-Luc Danger, Lauren De Meyer, Cécile Dumas, Jean-Max Dutertre, Viktor Fischer, Pierre Galissant, Benedikt Gierlichs, Louis Goubin, Vincent Grosso, Sylvain Guilley, Patrick Haddad, Laurent Imbert, Ján Jančár, Marc Joye, Matthias Kannwischer, Stefan Katzenbeisser, Victor Lomné, José Lopes-Esteves, Ange Martinelli, Pedro Massolino, Loïc Masure, Nele Mentens, Debdeep Mukhopadhyay, Camille Mutschler, Ruben Niederhagen, Colin O’Flynn, Elisabeth Oswald, Dan Page, Pascal Paillier, Louisa Papachristodoulou, Thomas Peeters, Olivier Pereira, Thomas Pornin, Bart Preneel, Thomas Prest, Jean-René Reinhard, Thomas Roche, Francisco Rodríguez-Henríquez, Franck Rondepierre, Eyal Ronen, Melissa Rossi, Mylene Rousselet, Sylvain Ruhault, Ulrich Ruhrmair, Sayandeep Saha, Patrick Schaumont, Sebastian Schrittwieser, Peter Schwabe, Richa Singh, Sergei Skorobogatov, François-Xavier Standaert, Petr Svenda, Marek Sys, Akira Takahashi, Abdul Rahman Taleb, Yannick Teglia, Philippe Teuwen, Adrian Thillard, Mehdi Tibouchi, Mike Tunstall, Aleksei Udovenko, David Vigilant, Lennert Wouters, Yuval Yarom and Rina Zeitoun: thank you so much for the hard work!
To accompany these authors, we also relied on numerous reviewers who kindly shared their remarks on the preliminary versions of the chapters. Their behind-the-scenes work allowed us to greatly improve the technical and editorial quality of the books. We express our gratitude to them, namely, Davide Alessio, Sébastien Bardin, Sonia Belaïd, Eloi Benoist-Vanderbeken, Gaëtan Cassiers, Jean-Sebastien Coron, Debayan Das, Cécile Dumas, Julien Eynard, Wieland Fischer, Thomas Fuhr, Daniel Genkin, Dahmun Goudarzi, Eliane Jaulmes, Victor Lomné, Loïc Masure, Bart Mennink, Stjepan Picek, Thomas Pornin, Thomas Prest, Jurgen Pulkus, Michaël Quisquater, Thomas Roche, Franck Rondepierre, Franck Salvador, Tobias Schneider, Okan Seker, Pierre-Yves Strub, Akira Takahashi, Abdul Rahman Taleb, Mehdi Tibouchi, Aleksei Udovenko, Gilles Van Assche, Damien Vergnaud, Vincent Verneuil and Gabriel Zaid.
October 2024
Boneh, D., DeMillo, R.A., Lipton, R.J. (1997). On the importance of checking cryptographic protocols for faults. In Proceedings of Eurocrypt’97. Springer, Heidelberg, 1233, 37–51.
Kocher, P.C. (1996). Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Proceedings of CRYPTO’96. Springer-Verlag, Heidelberg, 104–113.
Kocher, P.C., Jaffe, J., Jun, B. (1999). Differential power analysis. In Proceedings of CRYPTO’99. Springer-Verlag, Heidelberg, 1666, 388–397.
Kocher, P.C., Genkin, D., Gruss, D., Haas, W., Hamburg, M., Lipp, M., Mangard, S., Prescher, T., Schwarz, M., Yarom, Y. (2018). Spectre attacks: Exploiting speculative execution. ArXiv [Online]. Available at: http://arxiv.org/abs/1801.01203.
Daniel PAGE
University of Bristol, United Kingdom
Set within the context of cryptographic engineering, timing attacks are a class of side-channel attack: if the execution latency of some target operation depends on security-critical data, this may allow an attacker to leverage passive execution latency measurements in any attempt to recover said data. Timing attacks have a rich history, and, in part because variation in execution latency can arise from a wide range of sources, remain surprisingly stubborn. Although a known problem, they remain far from a solved problem as evidenced by related instances continuing to appear more than 20 years since initial publications on the topic. The aim of this chapter is to offer an introduction to that topic, covering both a set of foundational concepts from a more theoretical perspective and a set of example attacks and their mitigation (via countermeasures) from a more applied perspective. As a result, the content should enable you to understand and apply the concepts covered in a broader context, for example, within your own work, and beyond the specific, limited set of examples used.
In this section, we focus on explanation of the central concepts. The aim, in fairly non-technical terms, is to first explain what a timing attack is, and then establish a terminology and model for characterizing and reasoning about them. Doing so offers a foundation, which we will then build on in section 1.2 through an investigation of more concrete, example attacks.
Figure 1.1. Two descriptions of bubble sort
Given a computational process P, the term execution latency is a measure of the delay between P starting and finishing execution (whether it completes normally or is terminated abnormally).
We intentionally leave the unit of measurement undefined. That is, within different contexts it might be appropriate to use different units, for example, an abstract unit such as computational steps, a concrete, human-oriented unit such as seconds, or a concrete, machine-oriented unit such as clock cycles. If the unit relates to time in a meaningful way, then it is common to see execution time used synonymously with execution latency: both terms capture the idea that we measure how much time elapses between starting and finishing execution.
The bubble sort algorithm is conceptually simple, so is an appealing, generic example to use as a vehicle to explore this concept further. The algorithm, described in Figure 1.1(a) using pseudo-code, repeatedly iterates over an n-element input array X, comparing adjacent elements and swapping them in-place if they occur in the wrong order; this is repeated until the array is sorted, which we know is true when no more swaps are required.
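Figure 1.1 itself is not reproduced in this extract; purely as a stand-in, the following minimal C sketch captures the algorithm as just described (its line numbering does not match that of the book’s Figure 1.1(b)):

```c
#include <stdbool.h>
#include <stddef.h>

/* Bubble sort: repeatedly sweep over X, swapping adjacent elements
   that occur in the wrong order, until a sweep performs no swap. */
void bubble_sort(int *X, size_t n) {
  bool swapped = true;
  while (swapped) {
    swapped = false;
    for (size_t i = 1; i < n; i++) {
      if (X[i - 1] > X[i]) {  /* data-dependent control flow */
        int t = X[i - 1];     /* swap in place               */
        X[i - 1] = X[i];
        X[i] = t;
        swapped = true;
      }
    }
  }
}
```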
So, given an input array, what is the execution latency of this algorithm? That is, how much time elapses between starting (using X and n as input) and finishing execution (meaning a sorted X is produced as output) of the algorithm? Analysis of computational complexity (or, more specifically, time complexity) offers one approach to providing an answer. Doing so gives a result in terms of abstract computational steps, focused on the problem size n. By using such analysis, we conclude that in the worst case and average case, the algorithm will perform O(n²) comparison operations and O(n²) swap operations; in the best case, the algorithm will perform O(n) comparison operations and O(1) swap operations.
The execution latency of a computational process P is termed data dependent if it depends on, and so may vary as a result of, the input data. Otherwise, it is termed data independent: this implies that the execution latency is the same, irrespective of the input data for every execution of P.
Two facts should be clear from our analysis of bubble sort. First, and more obviously, the length of X impacts the execution latency: in all cases, the number of steps will be greater for an X with many elements (i.e. larger n) than it will for an X with few elements (i.e. smaller n). Second, and less obviously, the content of X also impacts the execution latency: as suggested by the differing best and worst cases, the number of steps will be fewer for an X that is already (close to being) sorted than for one that is not. Either way, the execution latency of the bubble sort algorithm would therefore be classified as data dependent.
In concrete terms, data-dependent execution latency arises from (at least) the following cases: (1) which operations are executed, as determined by control-flow decisions made during execution; and (2) how operations are executed, or, put another way, what the execution latency of each executed operation is and whether or not it is data dependent.
Although instances of the first case can be identified within an abstract algorithm, instances of the second case are dependent on concrete details of an associated implementation and platform it is executed on. Conceptually at least, we could view this shift as including rather than (intentionally) excluding constant factors in analysis using computational complexity. Doing so is clearly imperative, because we ultimately aim to reason about concrete, real-world versus abstract, on-paper scenarios.
We can identify instances of both cases using Figure 1.1(b), which describes an implementation of bubble sort using C source code. For example, the if statement on line 8 represents an example of the first case: the flow of control is managed so lines 9–13 are either executed or not, depending on whether or not X[ i - 1 ] > X[ i ]. Likewise, the memory accesses implied by the expressions X[ i - 1 ] and X[ i ] on line 8 represent an example of the second case: the latencies of said accesses depend, for example, on the memory hierarchy (e.g. the existence and state of any caches within it). More generally, the platform will optimize instruction execution; it may do so for both memory access and computational instructions (for example, although this implementation does not include any, shift, multiplication and division instructions sometimes exhibit data-dependent execution latency). An implication of this fact is that robust analysis of data (in)dependence cannot arise from the (static) source code alone. The Instruction Set Architecture (ISA) said source code ultimately targets will typically only capture functional properties: how instructions will be (dynamically) executed, and therefore the behavioral properties of the platform (i.e. a specific micro-architectural implementation of that ISA), also need to be assessed.
In a sense, bubble sort is as efficient as it can be: it only performs the steps required to produce a correct output from an associated input, by exiting the outer loop as soon as no more swaps are required. Adopting a perspective where efficiency is the metric of interest, such an approach is clearly positive. But do any negatives stem from it? If we adopt a different perspective where security is the (or at least a) metric of interest, the answer is yes.
Put simply, the fact that execution latency is data dependent implies some form of correlation: measuring the execution latency will provide information, of some form, about the data it depends on. Set in some scenario where an adversary has access to execution latency measurements, this can be hugely problematic. Specifically, consider a scenario where the data involved are security critical, for example, but without loss of generality, some cryptographic key material. Using bubble sort as an example starts to become tenuous at this point, so instead consider the following generic attack model:
which helps fix various concepts and notation. The target 𝒯 is using k to perform some computation f. Having first accepted an input x from the adversary 𝒜, 𝒯 computes and then provides r = f(k, x) as output. 𝒜 can also measure the execution latency associated with computation by 𝒯, which we denote Λ. On the one hand, k is not and should not be known by 𝒜. On the other hand, however, Λ may provide information about k if the execution latency of f is data dependent, and therefore may potentially be exploited in an attack aiming to recover it. This is the crux of a side-channel attack based on execution latency, or, less formally, a timing attack. We say that information about the security-critical data is leaked (or is leakage) through a side-channel represented by execution latency. Such attacks are potent because, as a result of exploiting the additional information afforded by the side-channel, they can often be significantly more efficient than alternatives (e.g. traditional cryptanalysis).
Now focused on the topic of side-channel attacks, we need to address two important points regarding terminology. First, in relation to data (in)dependence, the term data is qualified by the fact that we almost always mean security-critical data. There is limited value in learning information about data we already know, for example, and so no implication in execution latency depending on it. So, from here on, read “data” as “security-critical data” unless otherwise stated. Second, it is very common to see the term constant-time (respectively, variable-time) used as a synonym for data-independent (respectively, data-dependent) execution latency in this context (ignoring the clash with O(1) in computational complexity). We avoid these terms, deeming data (in)dependence more reflective of what is actually intended. For example, the execution latency of a given algorithm may vary and need not reflect any specific constant: we only demand that it does not vary, and so is (some) constant, with respect to the security-critical data involved. Likewise, use of the term constant-time suggests execution time (or latency) is the only pertinent issue; in reality, and more generally, we need to address the challenge of data-dependent resource use (both in terms of which resources are used, and the length of time they are used for). We can find examples of this fact investigated in Chapter 2.
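To make the distinction concrete, the following C sketch (our own illustration, not taken from the chapter) contrasts a selection whose control flow, and hence latency, may depend on a secret bit with a branchless, data-independent alternative:

```c
#include <stdint.h>

/* Data-DEPENDENT: which path executes is decided by the secret bit,
   so the execution latency may leak it. */
uint32_t select_branch(uint32_t bit, uint32_t a, uint32_t b) {
  return bit ? a : b;
}

/* Data-INDEPENDENT: the same instructions execute for either value of
   bit; the secret flows only through arithmetic, not control flow. */
uint32_t select_masked(uint32_t bit, uint32_t a, uint32_t b) {
  uint32_t mask = (uint32_t)0 - (bit & 1); /* 0x00000000 or 0xFFFFFFFF */
  return (a & mask) | (b & ~mask);
}
```

Note that `bit ? a : b` may be compiled to a branch or to a conditional move depending on the compiler and target, which is exactly the point made above: data (in)dependence cannot be judged from source code alone.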
Having presented the high-level concept, we now refine various lower level details of our attack model. We do so in a piecemeal manner below, framing the discussion around a set of concise definitions which, in combination, help to explain and characterize specific attacks (such as those used as examples in section 1.2).
The attack strategy employed by 𝒜 may be described as differential or non-differential.
At a high level, a non-differential attack will typically use few (even just one) execution latency measurements in isolation. In contrast, a differential attack will typically use many execution latency measurements in combination, for example, through analysis of when and how they differ as the result of data-dependent behavior by 𝒯.
The attack strategy employed by 𝒜 may be assessed, where relevant, in relation to efficacy and/or efficiency.
Although exceptions might exist, efficacy is normally a binary assessment: either the attack by 𝒜 is successful with respect to some stated aim or it is not. In contrast, there are many ways to assess efficient resource use by 𝒜 during said attack. Focusing on time, for example, we might view the number of interactions with 𝒯, the high-level, algorithm-related attack latency (e.g. stated in terms of computational complexity) or the low-level, implementation-related attack latency (e.g. stated in terms of wall-clock time), as indicative of how efficient 𝒜 is.
If the interaction between 𝒜 and 𝒯 is direct, the attack is described as local; otherwise, if the interaction is indirect, the attack is described as remote.
There are other, reasonable ways to capture a similar concept, for example, using physical proximity (implying physical access) as the measure of localness. We opt for the above instead, because 𝒜 and 𝒯 may be in close physical proximity but still physically or logically isolated from each other; this would imply an indirect form of interaction. Either way, two points are important. First, although the attack model above at least suggests that 𝒜 and 𝒯 interact remotely, this is not a limitation. For example, 𝒜 may leverage physical access to some device 𝒯 in order to mount a local attack; 𝒜 and 𝒯 may be software processes (co-)resident and so executing locally on the same platform; or 𝒜 and 𝒯 may be software processes resident and so executing on different platforms, interacting remotely via an intermediate network. Second, if/when a remote attack is feasible, this is often viewed as an advantage. For example, such an attack typically removes the need for physical access to 𝒯, avoids any reliance on additional equipment (for example, see power- or EM-based side-channels) and allows attacks to be applied at greater scale (in the sense there are likely to be more accessible instances of 𝒯 to attack).
From the perspective of 𝒜, computation by 𝒯 may be described as synchronous or asynchronous.
The synchronous case suggests that 𝒜 knows or even controls when 𝒯 starts or finishes execution; in contrast, the asynchronous case suggests that 𝒯 operates independently from 𝒜 (and, as a result, is often described as free-running). In the asynchronous case, one typically attempts to identify some form of trigger or reference event that indicates the start and/or finish of execution. Such an event might be observable via a secondary side-channel, or simply via observation of when and how 𝒯 interacts with external parties (such as 𝒜) or resources (such as memory or peripheral devices).
The execution latency measurements taken by are characterized by their accuracy, that is, how close the measured and actual execution latencies are, and precision, that is, how close two execution latency measurements for the same computation are. The former case typically relates to systematic error, for example, due to the impact of quantization; whereas the latter case typically relates to random error, for example, due to the impact of noise.
Figure 1.2. Attack models relating to the examples presented in section 1.2
We deem any formal exploration of what noise is beyond our scope; it suffices to understand that noise will degrade accuracy and precision, in the sense that execution latency of the same computation within different interactions between 𝒜 and 𝒯 may differ (or fluctuate). The term Signal-to-Noise Ratio (SNR) is often used to capture the relative strength of the signal of interest, that is, the actual execution latency, versus any noise that causes the measured execution latency to differ. Both statistical and non-statistical techniques can typically be harnessed in an attempt to reduce or even remove noise, that is, yield higher SNR; examples include averaging and filtering. Doing so can be important, because, ultimately, SNR will typically have some impact on the efficiency and/or efficacy of most attacks.
In this section, we focus on application of the central concepts: the aim is to illustrate how timing attacks are mounted. Harnessing the foundation offered by section 1.1, we do so by considering a small set of examples each framed within a simple but plausible scenario; we structure each example in the same way by (1) defining the attack model, (2) analyzing the attack model, for example, to characterize the source of data dependence and then, finally, (3) presenting an attack based on said analysis.
Consider the attack model depicted in Figure 1.2(a): an adversary 𝒜 interacts with a target device 𝒯 that stores some embedded, security-critical data P (a password). When provided with input G (a guess at the password), 𝒯 first performs the computation detailed by the algorithm MATCH (that is, string matching, or, more specifically, password validation) and then responds with r as output. 𝒜 can also measure the execution latency of computation performed by 𝒯, so it attempts to exploit this information with the aim of recovering P (implying, for example, that it then gains access to 𝒯).
In the same way we reasoned about bubble sort, the MATCH algorithm is as efficient as it can be: as soon as it detects a mismatch between P and G, it immediately returns false rather than needlessly continuing. Such a mismatch can occur for two reasons. First, and resulting in execution of line 3, the mismatch might stem from the length of, that is, number of characters in, P and G: if |P| ≠ |G|, then the two cannot match. Second, and resulting in execution of line 7, the mismatch might arise from the content of P and G: if Pi ≠ Gi for some i, then the two cannot match. This is termed early-abort behavior, and is possible when a special-case condition means the general-case algorithm need not be fully executed in order to provide correct output from the associated input. Viewed from the perspective of efficiency, this behavior is positive; viewed from the perspective of security, the same behavior becomes negative, however: it means that the execution latency of MATCH is data dependent, that is, depends on P and G.
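The MATCH algorithm itself is not shown in this extract; a minimal C sketch consistent with the description above (a length check, then a per-character early abort) might read as follows, with the two early-abort points marked using the line numbers referenced in the text:

```c
#include <stdbool.h>
#include <string.h>

/* Early-abort string matching, per the description of MATCH. */
bool match(const char *P, const char *G) {
  if (strlen(P) != strlen(G)) /* early abort on length  ("line 3") */
    return false;
  for (size_t i = 0; P[i] != '\0'; i++)
    if (P[i] != G[i])         /* early abort on content ("line 7") */
      return false;
  return true;
}
```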
Assume, without loss of generality, that the set of valid characters is A = { 'a', 'b',...,'z'}, and, specifically, that P = "pencil", which means |P| = 6. The attack proceeds in two phases:
In this phase, the attack aims to recover |P|. To do so, we use a sequence of guesses G = "a", "aa", "aaa", and so on during interaction with 𝒯, that is, the ith guess is i + 1 characters in length. The execution latency for guess G = "aaaaaa" will be longer than for the others because |P| = |G|; this means execution for it will progress into the loop, rather than early-abort at line 3. As such, identifying this G recovers |P|.
In this phase, the attack aims to recover P iteratively, that is, character-by-character. To recover Pi in iteration i = 0, we use a sequence of guesses G = "aaaaaa", "baaaaa", ..., "zaaaaa" during interaction with 𝒯, that is, the ith character cycles through all possibilities, while the others remain fixed. The execution latency for guess G = "paaaaa" will be longer than for the others, because |P| = |G| and P0 = G0; this means execution will progress into loop iteration i + 1 = 1, rather than early-abort at line 7 in iteration i = 0. As such, G0 = 'p' is deemed correct in this guess, which we now fix. To recover Pi in iteration i = 1, we use a sequence of guesses G = "paaaaa", "pbaaaa", ..., "pzaaaa" during interaction with 𝒯, that is, the ith character cycles through all possibilities, while the others remain fixed. The execution latency for guess G = "peaaaa" will be longer than for the others, because |P| = |G|, P0 = G0 and P1 = G1; this means execution will progress into loop iteration i + 1 = 2, rather than early-abort at line 7 in iteration i = 1. As such, G1 = 'e' is deemed correct in this guess, which we now fix. The attack continues in the same way for 0 ≤ i < n, until P is eventually fully recovered.
Since this is an explanatory attack, it is easy to assess it in terms of both efficacy and efficiency versus intuitive alternatives. For example, we could consider a brute-force attack: by essentially trying every n-character guess, this strategy assumes knowledge of n, requires O(|A|n) guesses and guarantees recovery of P. Or, we could consider a dictionary attack: by first constructing a dictionary of common (or likely) passwords D = {"password", "qwerty", "admin",…} to use as guesses, this strategy does not assume knowledge of n, requires O(|D|) guesses and does not guarantee recovery of P. In contrast, our side-channel attack does not assume knowledge of n, requires O(|A|·n) guesses and guarantees recovery of P; in a sense, the additional information allows it to offer greater efficiency than the brute-force attack and greater efficacy than the dictionary attack.
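As a hedged illustration of how the per-character phase might be driven in practice (the query_target interface is our own assumption, standing in for one latency measurement of an interaction with 𝒯), the guess loop could look as follows, averaging repeated measurements to suppress noise as discussed in section 1.1:

```c
#include <stdint.h>

/* Hypothetical stand-in: submit guess G to the target and return
   the measured execution latency (e.g. in clock cycles). */
extern uint64_t query_target(const char *G);

/* Average R repeated measurements of one guess to improve the SNR. */
static double mean_latency(const char *G, int R) {
  double sum = 0.0;
  for (int r = 0; r < R; r++)
    sum += (double)query_target(G);
  return sum / R;
}

/* Recover character i of G (all other characters held fixed): the
   candidate with the longest mean latency is deemed correct. */
char recover_char(char *G, int i, int repeats) {
  char best = 'a';
  double best_t = -1.0;
  for (char c = 'a'; c <= 'z'; c++) {
    G[i] = c;
    double t = mean_latency(G, repeats);
    if (t > best_t) { best_t = t; best = c; }
  }
  G[i] = best; /* fix the recovered character */
  return best;
}
```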
Assuming use of AES-128 throughout, consider the attack model depicted in Figure 1.2(b): an adversary 𝒜 interacts with a target device 𝒯, which stores some embedded, security-critical data k (an AES cipher key). When provided with input m (an AES plaintext), 𝒯 first performs the computation detailed by the algorithm AES.ENC (i.e. AES encryption) and then responds with c (an AES ciphertext) as output. 𝒜 can also measure the execution latency of computation performed by 𝒯, so it attempts to exploit this information with the aim of recovering k.
AES is an iterative block cipher based on an SP network; as one of three possible parameterizations, AES-128 uses a 16 byte cipher key and b = 16 byte block size. At a low level, it operates on elements in the finite field 𝔽₂⁸, realized as 𝔽₂[x]/(p(x)), where p(x) = x⁸ + x⁴ + x³ + x + 1. A short-hand is often adopted for elements of this field, meaning, for example, a(x) = x⁴ + x + 1 ≡ 13₍₁₆₎, suggesting that each element can be represented by a byte (whose ith bit represents the ith coefficient). Elements of 𝔽₂⁸ are collected into (4 × 4)-element state and round key matrices; the ith row and jth column of such a matrix relating to round r is denoted s(r)i,j and rk(r)i,j, respectively, with super- and/or subscripts omitted whenever irrelevant. The process of encryption initializes s(0) = m and rk(0) = k, in both cases filling the left-hand matrix in a column-wise manner using content from the right-hand sequence of bytes.
𝒯 adopts an implementation strategy common in constrained platforms, targeting a balance that trades off higher execution latency for lower memory footprint. This can be summarized as follows. First, it pre-computes the S-box: doing so implies it is realized using table look-ups, rather than any computation. Second, it updates (or evolves) the cipher key, computing each round key during encryption rather than pre-computing them. Third, and finally, it uses the algorithm:
to compute the multiplication-by-x operation, that is, a(x) · x (mod p(x)). On lines 3 and 5, the unconditional left-shift captures multiplication by x. On line 5, however, the conditional XOR captures a reduction modulo p(x); note that such a reduction is only required when a₇ = 1. Using this algorithm, it applies a short sequence of xtime and XOR operations to each jth column of the state within MIXCOLUMNS.
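A hedged C rendering of the xtime algorithm described above (our own sketch, not the book’s listing) makes the data-dependent reduction explicit:

```c
#include <stdint.h>

/* Multiplication by x in F_2[x]/(p(x)), p(x) = x^8 + x^4 + x^3 + x + 1.
   The reduction is only performed when a7 = 1, so on many platforms
   the branch makes xtime, and hence MIXCOLUMNS, variable-latency. */
uint8_t xtime(uint8_t a) {
  uint8_t t = (uint8_t)(a << 1); /* unconditional shift: a(x) * x  */
  if (a & 0x80)                  /* a7 = 1: reduce modulo p(x)     */
    t ^= 0x1B;                   /* 0x1B encodes x^4 + x^3 + x + 1 */
  return t;
}
```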
In summary, the execution latency of AES encryption is data dependent: this stems from the implementation strategy adopted, whereby the execution latency of xtime, and therefore MIXCOLUMNS, is data dependent.
The attack strategy aims to recover k iteratively, that is, byte-by-byte. Let mj and kj denote the jth byte of the plaintext and cipher key, respectively. In any tth iteration of the attack, we aim to recover the target byte kt based on the observation that, in the first MIXCOLUMNS invocation, the computation xtime(S-BOX(mt ⊕ kt)) is performed at least once; we term this the target xtime. Doing so involves the following steps, parameterized by N and M:
Compute the (256 × N)-element matrix H such that Hi,j = S-BOX(i ⊕ j)₇, that is, the most-significant bit of the S-box output. Across the rows, each 0 ≤ i < 256 captures a possible value of kt; across the columns, each 0 ≤ j < N captures a possible value of mt (meaning N = 256 will consider all possible values). As such, each Hi,j captures whether or not the target xtime performs a conditional XOR for specific kt and mt.
Compute the (N × M)-element matrix P of plaintexts, where each Pi,j is selected at random subject to the constraint that its tth byte equals i. So, since all plaintexts in the ith row have the same tth byte, they all yield the same input to the target xtime. Let Λi,j denote the execution latency arising from encryption of Pi,j, which we measure via interaction with 𝒯, and Λ̄i denote the mean of execution latencies across the ith row.
The idea is that, provided N and M are large enough, Λ̄i will reflect the target xtime, that is, it will be larger or smaller depending on whether or not the conditional XOR occurs. Put another way, with some probability of error, Λ̄i indicates the value of S-BOX(i ⊕ kt)₇. To recover kt, we compare each row Hi with the vector of mean latencies Λ̄, for example, using the Pearson correlation coefficient: the i which yields the strongest correlation indicates the value of kt.
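A hedged sketch of this final comparison step (our own illustration; H and lam_bar are assumed to have been filled in per the two steps above, with N = 256):

```c
#include <math.h>

#define N 256 /* number of m_t values considered (here: all of them) */

/* Pearson correlation coefficient between two n-element vectors. */
static double pearson(const double *u, const double *v, int n) {
  double mu = 0, mv = 0, suv = 0, suu = 0, svv = 0;
  for (int i = 0; i < n; i++) { mu += u[i]; mv += v[i]; }
  mu /= n; mv /= n;
  for (int i = 0; i < n; i++) {
    suv += (u[i] - mu) * (v[i] - mv);
    suu += (u[i] - mu) * (u[i] - mu);
    svv += (v[i] - mv) * (v[i] - mv);
  }
  return suv / sqrt(suu * svv);
}

/* H[i][j] = S-BOX(i XOR j) bit 7; lam_bar[j] = mean latency for m_t = j.
   The hypothesis i with the strongest correlation is deemed to be k_t. */
int recover_kt(const double H[256][N], const double *lam_bar) {
  int best_i = 0;
  double best_r = -1.0;
  for (int i = 0; i < 256; i++) {
    double r = fabs(pearson(H[i], lam_bar, N));
    if (r > best_r) { best_r = r; best_i = i; }
  }
  return best_i;
}
```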
Consider the attack model depicted in Figure 1.2(c): an adversary 𝒜 interacts with a target device 𝒯, which stores the embedded, security-critical data (N, d) (an RSA private key). When provided with input c (an RSA ciphertext), 𝒯 first performs the computation detailed by the algorithm RSA.DEC (i.e. RSA decryption) and then responds with m (an RSA plaintext) as output. 𝒜 can also measure the execution latency of computation performed by 𝒯, so it attempts to exploit this information with the aim of recovering d.
𝒯 uses the left-to-right, binary (or square-and-multiply) exponentiation algorithm. Note that all modular squaring and multiplication operations, for example, in lines 4 and 6, are based on the use of Montgomery reduction. This requires pre-computation of ω and ρ from N, such that x̂ = x · ρ (mod N) then denotes the Montgomery representation of some x.
The execution latency of modular exponentiation is data dependent, because the execution of line 6 is conditional on di. An execution latency measurement may therefore allow us to infer how often line 6 is executed, and hence the Hamming weight of (or number of bits equal to 1 in) d. Beyond this, we need to analyze the MONTMUL algorithm as used to support the exponentiation algorithm. The steps involved are captured as follows:
where line 2 performs an integer multiplication, line 3 performs a Montgomery reduction, and then, finally, line 4 performs a conditional subtraction to fully reduce r modulo N. From this, we conclude that the execution latency of MONTMUL is also data dependent, because the execution of line 4 is dependent on both the operands x and y and the modulus N; as used to support exponentiation, this implies dependence on c (because it forms one of the operands), and, crucially, d (because it controls when, where and how MONTMUL is invoked).
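As a hedged, toy illustration of where that conditional subtraction sits (a single-word sketch with R = 2³², not the multi-precision routine an RSA implementation would actually use; it requires N < 2³¹ so the intermediate sum fits in 64 bits):

```c
#include <stdint.h>

/* Toy single-word Montgomery multiplication: returns x*y*R^{-1} mod N
   for R = 2^32, N odd, N < 2^31, with n_inv = -N^{-1} mod 2^32
   precomputed from N. */
uint32_t montmul(uint32_t x, uint32_t y, uint32_t N, uint32_t n_inv) {
  uint64_t t = (uint64_t)x * y;             /* integer multiplication */
  uint32_t m = (uint32_t)t * n_inv;         /* Montgomery quotient    */
  uint64_t u = (t + (uint64_t)m * N) >> 32; /* Montgomery reduction   */
  if (u >= N)    /* conditional subtraction: the data-dependent step  */
    u -= N;      /* whose presence or absence the attack exploits     */
  return (uint32_t)u;
}
```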
The attack strategy aims to recover d iteratively, that is, bit-by-bit, working from the most-significant toward the least-significant bit. In any tth iteration of the attack, d is partially known and partially unknown, which we can depict as follows:
That is, we already know some more-significant bits of d (noting that it must be the case that d|d|−1 = 1; otherwise d = 0) and so aim to recover the next less-significant, as yet unknown target bit dt. Let c[k] denote some kth ciphertext, randomly selected modulo N for 0 ≤ k < n; for each such c[k], we use (partial) knowledge of d to simulate a (partial) decryption as would be performed by 𝒯. Doing so involves the following steps:
Start by setting c = c[k], then simulating lines 2 and 3, that is, computing the Montgomery representation of c and the initial value of the accumulator.
Step the simulation forward until we reach the start of loop iteration t (between lines 4 and 5). We can do so because di is known for t < i < |d|, meaning the simulation can perform exactly the same computation that 𝒯 would.
Step the simulation forward until we reach the end of loop iteration t (between lines 8 and 9). We can do so because although di is unknown for i = t, we can fork the simulation to reflect guessing both possibilities: for the simulation fork that guesses di = 0, line 6 is not executed, so we do not update the accumulator; for the simulation fork that guesses di = 1, line 6 is executed, so we do update the accumulator.
Step the simulation forward until we reach the middle of loop iteration t + 1 (between lines 5 and 6). That is, we update the accumulator via the squaring and, in doing so, record whether or not the conditional subtraction, that is, line 4 of MONTMUL, is executed.
Classify c by placing it into a set as follows: into S0,no if the simulation fork guessed di = 0 and no conditional subtraction is executed; into S0,sub if the simulation fork guessed di = 0 and a conditional subtraction is executed; into S1,no if the simulation fork guessed di = 1 and no conditional subtraction is executed; into S1,sub if the simulation fork guessed di = 1 and a conditional subtraction is executed.
This means c is placed into exactly two sets: either S0,no or S0,sub, arising from the simulation fork that guessed di = 0, and either S1,no or S1,sub, arising from the simulation fork that guessed di = 1.
The idea is that we have intentionally constructed the sets so there should be a difference between the execution latency for the ciphertexts in S0,no versus those in S0,sub, and the ciphertexts in S1,no versus those in S1,sub. Under the assumption that conditional subtractions will occur more or less at random during exponentiation, ciphertexts in the latter will, on average, execute an extra conditional subtraction versus those in the former. However, only one of the guesses made by the simulation is actually correct, so that difference should only be evident in either the first or second case. Put another way, for the correct (respectively, incorrect) guess, the simulated computation will (respectively, will not) match the actual computation by 𝒯; therefore, for the correct (respectively, incorrect) guess, the number of conditional subtractions that occur during exponentiation of ciphertexts in one set will differ from (respectively, be approximately equal to) those in the other set. So, let
Λ̄(S) denote the average execution latency of ciphertexts in set S, where Λ(c) denotes the execution latency for some ciphertext c: we measure this by interacting with 𝒯. One of three cases will apply: Λ̄(S1,sub) − Λ̄(S1,no) may be significantly larger than Λ̄(S0,sub) − Λ̄(S0,no), significantly smaller, or approximately equal to it.
In the first and second cases, the difference we expect will allow us to infer the value of dt: the fork exhibiting the larger gap is the correct one, so the first case implies dt = 1 and the second implies dt = 0. In the third case, however, the differences are too close to allow confident inference of dt; this may be due to, for example, use of too small an n or the impact of noise.
Having recovered dt, we proceed with the next, (t + 1)th attack iteration and recover the next, (t + 1)th bit of d in the same way; this continues until d is eventually fully recovered.
Assuming use of AES-128 throughout, consider the attack model depicted in Figure 1.2(d): an adversary 𝒜 interacts with a target device 𝒯, which stores the embedded, security-critical data k (some key material). When provided with input (iv, c), 𝒯 first performs the computation detailed by the algorithm PROCESS, then responds with r as output. 𝒜 obtains a ciphertext (iv*, c*), which is the encryption of some unknown plaintext m* under k. It can also measure the execution latency of computation performed by 𝒯, so 𝒜 attempts to exploit this information with the aim of recovering m*.
Let x[i]j denote the jth byte within the ith block of some n-block plaintext or ciphertext x; this means 0 ≤ i < n and 0 ≤ j < b, assuming a b-byte block size.
At a high level, PROCESS can be described line-by-line as follows. Line 2 decrypts the ciphertext c, parsing the result m′ as a plaintext m, some padding ρ and a tag τ. Then, line 3 checks whether ρ is valid, aborting if not; line 4 checks whether τ is valid, aborting if not; and, finally, line 5 processes m using some function f to produce r. At a low level, some further details are important: note that line 2 decrypts c using AES in Cipher Block Chaining (CBC) mode using the key k1, and line 4 assumes τ is a SHA1-based HMAC Message Authentication Code (MAC) tag and checks validity accordingly using the key k2. The padding scheme used to specify ρ is also important, but not detailed within the description of PROCESS. Although alternatives exist, assume a TLS-like scheme is used: let ρ(α) denote a valid padding sequence, which contains α byte(s) equal to α and then 1 compulsory byte equal to α, which records the total padding length. Note that if m || τ is l bytes long, then ρ = ρ((b − (l + 1)) mod b).
The execution latency of PROCESS is data dependent: if t3 denotes the execution latency when 𝒯 early-aborts on line 3 based on ρ, t4 denotes the execution latency when 𝒯 early-aborts on line 4 based on τ, and t5 denotes the execution latency when 𝒯 executes line 5, then, irrespective of their concrete values, t3 < t4 < t5. In fact, we can focus on the ability to distinguish between these cases by defining a so-called padding oracle: for the scenario at hand, this would be something like an oracle O(iv, c) that returns ⊤ if the measured latency indicates the padding was deemed valid (i.e. 𝒯 did not early-abort on line 3) and ⊥ otherwise.
Doing so can be viewed as an abstraction of the underlying mechanism used to measure execution latency, thus separating it from the attack strategy.
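A hedged sketch of such an oracle as a simple latency threshold (query_target and the calibrated threshold are our own assumptions, not part of the chapter):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: submit (iv, c) to the target and return
   the measured execution latency of PROCESS. */
extern uint64_t query_target(const uint8_t *iv,
                             const uint8_t *c, size_t clen);

/* Padding oracle: an invalid padding makes PROCESS early-abort on
   line 3 (latency ~ t3), so a latency above a threshold calibrated
   between t3 and t4 means the padding was deemed valid. */
bool padding_oracle(const uint8_t *iv, const uint8_t *c, size_t clen,
                    uint64_t threshold) {
  return query_target(iv, c, clen) > threshold;
}
```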
Use of AES in CBC mode means that 𝒯 will compute m′[i] = AES.DEC(k1, c[i]) ⊕ xi, where xi = iv if i = 0 and xi = c[i − 1] otherwise, for 0 ≤ i < n. The resulting plaintext m′ is then parsed into the fields m, τ and ρ, which we can depict as follows:
From this specific example, we can infer several more general facts. First, ρ will occur in m′[n − 1], that is, the last, (n − 1)th block of m′. Second, although |ρ| is unknown, we do know |ρ| ≤ b because the point of including it at all is to ensure |m′| = 0 (mod b). Third, the definition of CBC mode means that m′[n − 1] = AES.DEC(k1, c[n − 1]) ⊕ xn−1, with xn−1 = c[n − 2] when n ≥ 2 and xn−1 = iv when n = 1.
As such, by controlling xi (meaning either iv or c[n − 2], depending on i), we can control m′[n − 1] and hence also the ρ then checked for validity by 𝒯. For example, consider a two-block ciphertext c. If we construct an input (iv, c′) where c′ = (c[0] ⊕ δ) || c[1], for some δ, that is, c′[0] = c[0] ⊕ δ and c′[1] = c[1], then 𝒯 will compute m′[0] = AES.DEC(k1, c[0] ⊕ δ) ⊕ iv and m′[1] = AES.DEC(k1, c[1]) ⊕ c[0] ⊕ δ. That is, 𝒯 will compute an m′ where m′[0] is randomized by the decryption of the altered block and m′[1] is controlled by the δ selected; since m′[1] is the (n − 1)th block in m′, δ will also control ρ and hence the padding oracle result, that is, the value of O(iv, c′).
Based on these facts, and assuming again that |c*| = 2, the attack proceeds in two phases:
In this phase, the attack strategy constructs a so-called “recover last byte(s)” oracle. First, select a random b-byte block δ and search for a value of δb−1 such that the padding is deemed valid, that is, O(iv*, (c*[0] ⊕ δ) || c*[1]) = ⊤.
Since the padding of m′ is deemed to be valid, we know that one of ρ(0), ρ(1), ..., ρ(b − 1) terminates m′: however, we do not know which one. So, to recover one (or more) byte(s) of m′, we need |ρ|, that is, the padding length. To recover it, start from t = 0 and alter (e.g. increment) δt. If the padding oracle now returns ⊥, then t marks the initial padding byte (because we just altered it, thus invalidating the padding), and this implies that |ρ| = b − t; otherwise, increment t and try again.
In this phase, the attack strategy constructs a so-called “block decryption” oracle. As mentioned above, we have already recovered the most-significant b − t