Embedded Cryptography 1

Emmanuel Prouff
Description

Embedded Cryptography provides a comprehensive exploration of cryptographic techniques tailored for embedded systems, addressing the growing importance of security in devices such as mobile systems and IoT. The books explore the evolution of embedded cryptography since its inception in the mid-1990s, cover both theoretical and practical aspects, and discuss the implementation of cryptographic algorithms such as AES, RSA, ECC and post-quantum algorithms.

The work is structured into three volumes, spanning forty chapters and nine parts, and is enriched with pedagogical materials and real-world case studies, designed for researchers, professionals, and students alike, offering insights into both foundational and advanced topics in the field.

Embedded Cryptography 1 is dedicated to software side-channel attacks, hardware side-channel attacks and fault injection attacks.

Page count: 680

Publication year: 2025




Table of Contents

Cover

Table of Contents

Title Page

Copyright Page

Preface

Part 1. Software Side-Channel Attacks

Chapter 1. Timing Attacks

1.1. Foundations

1.2. Example attacks

1.3. Example mitigations

1.4. Notes and further references

1.5. References

Chapter 2. Microarchitectural Attacks

2.1. Background

2.2. The Prime+Probe attack

2.3. The Flush+Reload attack

2.4. Attacking other microarchitectural components

2.5. Constant-time programming

2.6. Covert channels

2.7. Transient-execution attacks

2.8. Summary

2.9. Notes and further references

2.10. References

Part 2. Hardware Side-Channel Attacks

Chapter 3. Leakage and Attack Tools

3.1. Introduction

3.2. Data-dependent physical emissions

3.3. Measuring a side-channel

3.4. Leakage modeling

3.5. Notes and further references

3.6. References

Chapter 4. Supervised Attacks

4.1. General framework

4.2. Building a model

4.3. Controlling the dimensionality

4.4. Building de-synchronization-resistant models

4.5. Summary of the chapter

4.6. Notes and further references

4.7. References

Chapter 5. Unsupervised Attacks

5.1. Introduction

5.2. Distinguishers

5.3. Likelihood distinguisher

5.4. Mutual information

5.5. Correlation

5.6. A priori knowledge synthesis

5.7. Conclusion on statistical tools

5.8. Exercise solutions

5.9. Notes and further references

5.10. References

Chapter 6. Quantities to Judge Side Channel Resilience

6.1. Introduction

6.2. Metrics for comparing the effectiveness of specific attack vectors

6.3. Metrics for evaluating the leakage (somewhat) independent of a specific attack vector

6.4. Metrics for evaluating the remaining effort of an adversary

6.5. Leakage detection as a radical alternative to attack driven evaluations

6.6. Formal evaluation schemes

6.7. References

Chapter 7. Countermeasures and Advanced Attacks

7.1. Introduction

7.2. Misalignment of traces

7.3. Masking

7.4. Combination of countermeasures

7.5. To go further

7.6. References

Chapter 8. Mode-Level Side-Channel Countermeasures

8.1. Introduction

8.2. Building blocks

8.3. Security definitions

8.4. Leakage models

8.5. Constructions

8.6. Acknowledgments

8.7. Notes and further references

8.8. References

Part 3. Fault Injection Attacks

Chapter 9. An Introduction to Fault Injection Attacks

9.1. Fault injection attacks, disturbance of electronic components

9.2. Practical examples of fault injection attacks

9.3. Notes and further references

9.4. References

Chapter 10. Fault Attacks on Symmetric Cryptography

10.1. Introduction

10.2. Differential fault analysis

10.3. Automation of DFA

10.4. DFA countermeasures: general idea and taxonomy

10.5. Advanced FA

10.6. Leakage assessment in fault attacks

10.7. Chapter summary

10.8. Notes and further references

10.9. References

Chapter 11. Fault Attacks on Public-key Cryptographic Algorithms

11.1. Introduction

11.2. Preliminaries

11.3. Attacking the RSA using the Chinese remainder theorem

11.4. Attacking a modular exponentiation

11.5. Attacking the ECDSA

11.6. Other attack strategies

11.7. Countermeasures

11.8. Conclusion

11.9. Notes and further references

11.10. References

Chapter 12. Fault Countermeasures

12.1. Anatomy of a fault attack

12.2. Understanding the attacker

12.3. Taxonomy of fault countermeasures

12.4. Fault countermeasure principles

12.5. Fault countermeasure examples

12.5.1. Algorithm level countermeasures

12.6. ISA level countermeasures

12.7. RTL-level countermeasures

12.8. Circuit-level countermeasures

12.9. Design automation of fault countermeasures

12.10. Notes and further references

12.11. References

List of Authors

Index

Summary of Volume 2

Summary of Volume 3

End User License Agreement

List of Tables

Chapter 2

Table 2.1.

Typical cache parameters for many Intel Core processors

Chapter 3

Table 3.1.

Power consumption of a CMOS inverter gate observed from the VDD rail

Chapter 5

Table 5.1.

Synthesis of classic assumptions and properties

Table 5.2.

Summary of the characteristics of the...

Chapter 8

Table 8.1.

CIML2 game

Table 8.2.

The game

Table 8.3.

Strong unpredictability with leakage in...

Chapter 9

Table 9.1.

Features of laser focusing lenses

Table 9.2.

Two instruction replay fault model....

Chapter 10

Table 10.1.

Fault template for the χ3 S-Box

Table 10.2.

Template for attacking TI PRESENT (middle round). The black cells...

Table 10.3.

Summary of results


List of Figures

Chapter 1

Figure 1.1.

Two descriptions of bubble sort

Figure 1.2.

Attack models relating to the examples presented in section 1.2

Chapter 2

Figure 2.1.

The typical cache hierarchy of a four-core Intel processor....

Figure 2.2.

Prime+Probe on AES. The shade indicates the relative probe....

Figure 2.3.

Flush+Reload on the square-and-multiply implementation of....

Chapter 3

Figure 3.1.

Dynamic currents in a CMOS inverter gate due to 0 →....

Figure 3.2.

Current consumption in a CMOS inverter (seen from VDD)

Figure 3.3.

Static power measurement by stopping the clock signal CLK....

Figure 3.4.

Generic power analysis setup model....

Figure 3.5.

Probing methodologies for power analysis measure: shunt....

Figure 3.6.

Example side-channel trace: beginning of an AES

Figure 3.7.

Example side-channel trace: SNR results. Left to right:...

Figure 3.8.

Three boards described in this chapter. From the top and clock....

Chapter 4

Figure 4.1.

Supervised attack scenario

Figure 4.2.

Example of Gaussian distributions for B = 2, D....

Figure 4.3.

A sketch of MLP....

Figure 4.4.

Principle of an SNR (relative scale for the y-axes)....

Figure 4.5.

Two convolutional layers. Left: W = 2...

Figure 4.6.

A Max Pooling layer....

Figure 4.7.

Synthesis of generative and discriminative models....

Chapter 5

Figure 5.1.

Principle of an unsupervised attack

Figure 5.2.

Scenario of an unsupervised attack

Chapter 7

Figure 7.1.

Examples of side-channel traces and SNR....

Figure 7.2.

Examples of side-channel traces and SNR....

Figure 7.3.

Examples of consumption and SNR traces for...

Figure 7.4.

Examples of side-channel traces and SNR for...

Chapter 8

Figure 8.1.

Leakage-resilient PRG

Figure 8.2.

Leakage-resilient PRF

Figure 8.3.

CBC-MAC authenticates a message M = m1...

Figure 8.4.

The HBC MAC together with its tag verification operation

Figure 8.5.

The CTR mode used to encrypt a message M = m1...

Figure 8.6.

A CPAL-secure encryption mode

Figure 8.7.

Exemplary leakage-resistant AE

Chapter 9

Figure 9.1.

Mask in black paint to reveal only the parts of the component...

Figure 9.2.

Fault injection device using light disturbances: use of camera...

Figure 9.3.

Basic internal architecture of a digital integrated circuit

Figure 9.4.

Illustration of the fault injection mechanism by violation of setup...

Figure 9.5.

Architecture of the AES encryption block

Figure 9.6.

Illustration of the overclocking fault injection mechanism

Figure 9.7.

AES-128: demonstrating the data dependency of fault injection by...

Figure 9.8.

Distribution of single-bit faults injected by increasing the clock...

Figure 9.9.

Representation of a nominal clock signal (upper part) and a clock...

Figure 9.10.

Clock glitch fault injection

Figure 9.11.

Evolution of the critical time of the AES-128 co-processor data...

Figure 9.12.

Evolution of the critical time of three data paths as a function...

Figure 9.13.

Representation and effect of a negative voltage glitch on AES...

Figure 9.14.

Representation of the theoretical (left) and real (right)...

Figure 9.15.

Joint effect of temperature and supply voltage variations...

Figure 9.16.

Injection probes EM: (left) detail of the injection probe;...

Figure 9.17.

Distribution of AES encryption block elements on the FPGA...

Figure 9.18.

Sampling faults injected by EM disturbance into AES-128...

Figure 9.19.

Open microcontroller component for laser access: acid...

Figure 9.20.

Illustration of the fault injection mechanism using...

Figure 9.21.

Presentation of the areas of an SRAM cell sensitive...

Figure 9.22.

Laser fault injection sensitivity maps of SRAM memories,...

Figure 9.23.

Sensitivity maps for laser fault injection of D flip-flops...

Figure 9.24.

General architecture of a fault injection bench

Figure 9.25.

Traditional power glitch generator using a transistor

Figure 9.26.

EM pulse fault injection devices in integrated circuits: principle...

Figure 9.27.

Laser fault injection bench

Figure 9.28.

Architecture of a laser source based on diode technology

Figure 9.29.

Laser-induced instruction skip in a microcontroller test program,...

Figure 9.30.

Trace of a microcontroller’s power consumption during start-up...

Figure 9.31.

Laser fault injection in instructions and data read from Flash mem...

Figure 9.32.

Laser-induced single and multiple instruction skips during...

Figure 9.33.

Replay and instruction skips obtained by laser illumination of...

Figure 9.34.

Resynchronization of 12 DES key loading processes

Figure 9.35.

Generation of seven flashes on the 12 DES key loading loops

Figure 9.36.

verifyPIN() PIN code verification

Figure 9.37.

byteArrayCompare() function for comparing user...

Figure 9.38.

Assembly instructions for assigning the Boolean value auth_status...

Figure 9.39.

Instruction skip, corresponding to the replacement of the instruc...

Figure 9.40.

PIN code bypass obtained by four consecutive laser shots,...

Figure 9.41.

Illustration of a brute-force attack on the PIN verifica...

Chapter 10

Figure 10.1.

Fault propagation path in AES for a byte fault injection....

Figure 10.2.

Diagonal fault attack with the faults injected at the di...

Figure 10.3.

ExpFault: conceptual view

Figure 10.4.

Partial CDG of AES from ninth-round MixColumns till the...

Figure 10.5.

Taxonomy of notable FA countermeasures

Figure 10.6.

Illustrating the root cause of SIFA: (a) the possible....

Figure 10.7.

Fault propagation: (a) XOR gate; (b) AND gate. The....

Figure 10.8.

ALAFA leakage assessment test

Figure 10.9.

t-test scores: (a) infection countermeasure, Gierlichs....

Chapter 11

Figure 11.1.

Result of the SIFA targeting the most significant byte....

Figure 11.2.

Detective (left) and infective (right) countermeasures

Figure 11.3.

CRT-RSA with a final verification

Chapter 12

Figure 12.1.

Taxonomy of fault countermeasures along three dimensions



SCIENCES

Computer Science, Field Director – Jean-Charles Pomerol

Cryptography, Data Security, Subject Head – Damien Vergnaud

Embedded Cryptography 1

Coordinated by

Emmanuel Prouff

Guénaël Renault

Matthieu Rivain

Colin O’Flynn

First published 2025 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the under mentioned address:

ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK

www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA

www.wiley.com

© ISTE Ltd 2025

The rights of Emmanuel Prouff, Guénaël Renault, Matthieu Rivain and Colin O’Flynn to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s), contributor(s) or editor(s) and do not necessarily reflect the views of ISTE Group.

Library of Congress Control Number: 2024945145

British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78945-213-6

ERC code: PE6 Computer Science and Informatics
PE6_5 Security, privacy, cryptology, quantum cryptography

Preface

Emmanuel PROUFF1, Guénaël RENAULT2, Matthieu RIVAIN3, and Colin O’FLYNN4

1LIP6, Sorbonne Université, Paris, France

2Agence nationale de la sécurité des systèmes d’information, Paris, France

3CryptoExperts, Paris, France

4Dalhousie University and NewAE Technology Inc, Halifax, Canada

The idea for this project was born during a discussion with Damien Vergnaud. Damien had been asked to propose a series of volumes covering the different domains of modern cryptography for the SCIENCES series. He offered us the opportunity to take charge of the Embedded Cryptography books, which sounded like a great challenge to take on. In particular, we thought it was perfectly timely as the field was gaining increasing importance with the growing development of complex mobile systems and the Internet of Things.

The field of embedded cryptography, as a research domain, was born in the mid-1990s. Until that time, the evaluation of a cryptosystem and the underlying attacker model were usually agnostic of implementation aspects, whether the cryptosystem was deployed on a computer or on some embedded hardware like a smart card. Indeed, the attacker was assumed to have no other information than the final results of a computation and, possibly, the corresponding inputs. In this black box context, defining a cryptanalytic attack and evaluating resistance to it essentially consisted of finding flaws in the abstract definition of the cryptosystem.

In the 1990s, teams of researchers published the first academic results, highlighting very effective means of attack against embedded systems. These attacks were based on the observation that a system’s behavior during a computation strongly depends on the values of the data manipulated (which was previously known and exploited by intelligence services). Consequently, a device performing cryptographic computation does not behave like a black box whose inputs and outputs are the only known factors. The power consumption of the device, its electromagnetic radiation or its running time are indeed other sources that provide the observer with information on the intermediate results of the computation. Teams of researchers have also shown that it was possible to disrupt a computation using external energy sources such as lasers or electromagnetic pulses.

Among these so-called physical attacks, two main families emerge. The first gathers the (passive) side-channel attacks, including timing attacks proposed by Kocher (1996) and power analysis attacks proposed by Kocher et al. (1999), as well as the microarchitectural attacks that have considerably developed after the publication of the Spectre and Meltdown attacks in 2018 (Kocher et al. 2018). This first family of attacks focuses on the impact that the data manipulated by the system have on measurable physical quantities such as time, current consumption or energy dissipation related to state changes in memories. The second family gathers the (active) fault injection attacks, whose first principles were introduced by Boneh et al. (1997). These attacks aim to put the targeted system into an abnormal state of functioning. They consist, for example, of ensuring that certain parts of a code are not executed or that operations are replaced by others. Using attacks from either of these families, an adversary might learn sensitive information by exploiting the physical leakage or the faulted output of the system.

Since their inception, side-channel attacks and fault injection attacks, along with their countermeasures, have significantly evolved. Initially, the embedded systems industry and a limited number of academic labs responded with ad hoc countermeasures. Given the urgency of responding to the newly published attacks, these countermeasures were reasonably adequate at the time. Subsequently, the invalidation of many of these countermeasures and the increasing sophistication of attack techniques highlighted the need for a more formalized approach to security in embedded cryptography. A community was born from this observation in the late 1990s and gathered around a dedicated conference known as cryptographic hardware and embedded systems (CHES). Since then, the growth of this research domain has been very significant, driven by the strong stakes for industrial players and by the scientific interest of the open security issues. Nowadays, physical attacks involve state-of-the-art equipment capable of targeting nanoscale technologies used in the semiconductor industry. The attackers routinely use advanced statistical analyses or signal processing, while the defenders designing countermeasures call on concepts from algebra, probability theory or formal methods. More recently, and notably with the publication of the Spectre and Meltdown attacks, side-channel attacks have extended to so-called microarchitectural attacks, exploiting very common optimization techniques in modern CPUs such as out-of-order execution or speculative execution. Twenty-five years after the foundational work, there is now a large community of academic and industrial scientists dedicated to these problems. Embedded cryptography has gradually become a classic topic in cryptography and computer security, as illustrated by the increasing importance of this field in major cryptography and security conferences besides CHES, such as CRYPTO, Eurocrypt, Asiacrypt, Usenix Security, IEEE S&P or ACM CCS.

Pedagogical material

For this work, it seemed important to us to have both scientifically ambitious and pedagogical content. We indeed wanted this book to appeal not only to researchers in embedded cryptography but also to Master’s students interested in the subject and curious to take their first steps. It was also important to us that the concepts and notions developed in the book be as illustrated as possible and therefore accompanied by a pedagogical base. In addition to the numerous illustrations proposed in the chapters, we have made pedagogical material available (attack scripts, implementation examples, etc.) to test and deepen the various concepts. These can be found on the following GitHub organization: https://github.com/embeddedcryptobook.

Content

This book provides a comprehensive exploration of embedded cryptography. It comprises 40 chapters grouped into nine main parts and spanning three volumes. The book primarily addresses side-channel and fault injection attacks as well as their countermeasures. Part 1 of this volume is dedicated to Software Side-Channel Attacks, namely, timing attacks and microarchitectural attacks, primarily affecting software; whereas Part 2 is dedicated to Hardware Side-Channel Attacks, which exploit hardware physical leakages, such as power consumption and electromagnetic emanations. Part 3 focuses on the second crucial family of physical attacks against embedded systems, namely, Fault Injection Attacks.

A full part of the book is then dedicated to Masking in Part 1 of Volume 2, which is a widely used countermeasure against side-channel attacks and which has become an important research topic since its introduction in 1999. This part covers a variety of masking techniques, their security proofs and their formal verification. Besides general masking techniques, efficient and secure embedded cryptographic implementations are very dependent on the underlying algorithm. Consequently, Part 2, Cryptographic Implementations, is dedicated to the implementation of specific cryptographic algorithm families, namely, AES, RSA, ECC, and post-quantum cryptography. This part also covers hardware acceleration and constant-time implementations. Secure embedded cryptography needs to rely on secure hardware and secure randomness generation. In cases where hardware alone is insufficient for security, we must rely on additional software techniques to protect cryptographic keys. The latter is known as white-box cryptography. The next three parts of the book address those aspects. Part 3, Volume 2, Hardware Security, covers invasive attacks, hardware countermeasures and physically unclonable functions (PUF).

Part 1 of Volume 3 is dedicated to White-Box Cryptography: it covers general concepts, practical attack tools, automatic (gray-box) attacks and countermeasures as well as code obfuscation, which is often considered as a complementary measure to white-box cryptography. Part 2 is dedicated to Randomness and Key Generation in embedded cryptography. It covers both true and pseudo randomness generation as well as randomness generation for specific cryptographic algorithms (prime numbers for RSA, random nonces for ECC signatures and random errors for post-quantum schemes).

Finally, we wanted to include concrete examples of real-world attacks against embedded cryptosystems. The final part of this series of books contains those examples of Real World Applications and Attacks in the Wild. While not exhaustive, we selected representative examples illustrating the practical exploitation of the attacks presented in this book, hence demonstrating the necessity of the science of embedded cryptography.

Acknowledgments

This series of books results from a collaborative work and many persons from the embedded cryptography community have contributed to its development. We have tried to cover (as broadly as possible) the field of embedded cryptography and the many research directions related to this field. This has not been an easy task, given the dynamism and growth of the field over the past 25 years. Some experts from the community kindly shared their insights on the preliminary plan of the book, namely, Sonia Belaïd, Pierre-Alain Fouque, Marc Joye, Victor Lomné and Yannick Sierra. We would like to thank them for their insightful comments.

For each of the identified topics we wanted the book to cover, we have called upon expert researchers from the community, who have honored us by joining this project. The essence of this work is theirs. Dear Guillaume Barbu, Lejla Batina, Sonia Belaïd, Davide Bellizia, Florent Bernard, Begül Bilgin, Eleonora Cagli, Lukasz Chmielewski, Jessy Clédière, Brice Colombier, Jean-Sébastien Coron, Jean-Luc Danger, Lauren De Meyer, Cécile Dumas, Jean-Max Dutertre, Viktor Fischer, Pierre Galissant, Benedikt Gierlichs, Louis Goubin, Vincent Grosso, Sylvain Guilley, Patrick Haddad, Laurent Imbert, Ján Jančár, Marc Joye, Matthias Kannwischer, Stefan Katzenbeisser, Victor Lomné, José Lopes-Esteves, Ange Martinelli, Pedro Massolino, Loïc Masure, Nele Mentens, Debdeep Mukhopadhyay, Camille Mutschler, Ruben Niederhagen, Colin O’Flynn, Elisabeth Oswald, Dan Page, Pascal Paillier, Louisa Papachristodoulou, Thomas Peeters, Olivier Pereira, Thomas Pornin, Bart Preneel, Thomas Prest, Jean-René Reinhard, Thomas Roche, Francisco Rodríguez-Henríquez, Franck Rondepierre, Eyal Ronen, Melissa Rossi, Mylene Rousselet, Sylvain Ruhault, Ulrich Ruhrmair, Sayandeep Saha, Patrick Schaumont, Sebastian Schrittwieser, Peter Schwabe, Richa Singh, Sergei Skorobogatov, François-Xavier Standaert, Petr Svenda, Marek Sys, Akira Takahashi, Abdul Rahman Taleb, Yannick Teglia, Philippe Teuwen, Adrian Thillard, Mehdi Tibouchi, Mike Tunstall, Aleksei Udovenko, David Vigilant, Lennert Wouters, Yuval Yarom and Rina Zeitoun: thank you so much for the hard work!

To accompany these authors, we also relied on numerous reviewers who kindly shared their remarks on the preliminary versions of the chapters. Their behind-the-scenes work allowed us to greatly improve the technical and editorial quality of the books. We express our gratitude to them, namely, Davide Alessio, Sébastien Bardin, Sonia Belaïd, Eloi Benoist-Vanderbeken, Gaëtan Cassiers, Jean-Sebastien Coron, Debayan Das, Cécile Dumas, Julien Eynard, Wieland Fischer, Thomas Fuhr, Daniel Genkin, Dahmun Goudarzi, Eliane Jaulmes, Victor Lomné, Loïc Masure, Bart Mennink, Stjepan Picek, Thomas Pornin, Thomas Prest, Jurgen Pulkus, Michaël Quisquater, Thomas Roche, Franck Rondepierre, Franck Salvador, Tobias Schneider, Okan Seker, Pierre-Yves Strub, Akira Takahashi, Abdul Rahman Taleb, Mehdi Tibouchi, Aleksei Udovenko, Gilles Van Assche, Damien Vergnaud, Vincent Verneuil and Gabriel Zaid.

October 2024

References

Boneh, D., DeMillo, R.A., Lipton, R.J. (1997). On the importance of checking cryptographic protocols for faults. In Proceedings of Eurocrypt’97. Springer, Heidelberg, 1233, 37–51.

Kocher, P.C. (1996). Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Proceedings of CRYPTO’96. Springer-Verlag, Heidelberg, 104–113.

Kocher, P.C., Jaffe, J., Jun, B. (1999). Differential power analysis. In Proceedings of CRYPTO’99. Springer-Verlag, Heidelberg, 1666, 388–397.

Kocher, P.C., Genkin, D., Gruss, D., Haas, W., Hamburg, M., Lipp, M., Mangard, S., Prescher, T., Schwarz, M., Yarom, Y. (2018). Spectre attacks. Exploiting speculative execution. ArXiv [Online]. Available at: http://arxiv.org/abs/1801.01203.

PART 1. Software Side-Channel Attacks

1. Timing Attacks

Daniel PAGE

University of Bristol, United Kingdom

Set within the context of cryptographic engineering, timing attacks are a class of side-channel attack: if the execution latency of some target operation depends on security-critical data, this may allow an attacker to leverage passive execution latency measurements in an attempt to recover said data. Timing attacks have a rich history, and, in part because variation in execution latency can arise from a wide range of sources, remain surprisingly stubborn. Although a known problem, they are far from a solved one, as evidenced by related instances continuing to appear more than 20 years since initial publications on the topic. The aim of this chapter is to offer an introduction to that topic, covering both a set of foundational concepts from a more theoretical perspective and a set of example attacks and their mitigation (via countermeasures) from a more applied perspective. As a result, the content should enable you to understand and apply the concepts covered in a broader context, for example, within your own work, and beyond the specific, limited set of examples used.

1.1. Foundations

In this section, we focus on explanation of the central concepts. The aim, in fairly non-technical terms, is to first explain what a timing attack is, and then establish a terminology and model for characterizing and reasoning about them. Doing so offers a foundation, which we will then build on in section 1.2 through an investigation of more concrete, example attacks.

Figure 1.1. Two descriptions of bubble sort

1.1.1. Execution latency in theory

DEFINITION 1.1.–

Given a computational process P, the term execution latency is a measure of the delay between P starting and finishing execution (whether it completes normally or is terminated abnormally).

We intentionally leave the unit of measurement undefined. That is, within different contexts it might be appropriate to use different units, for example, an abstract unit such as computational steps, a concrete, human-oriented unit such as seconds, or a concrete, machine-oriented unit such as clock cycles. If the unit relates to time in a meaningful way, then it is common to see execution time used synonymously with execution latency: both terms capture the idea that we measure how much time elapses between starting and finishing execution.

The bubble sort algorithm is conceptually simple, so is an appealing, generic example to use as a vehicle to explore this concept further. The algorithm, described in Figure 1.1(a) using pseudo-code, repeatedly iterates over an n-element input array X, comparing adjacent elements and swapping them in-place if they occur in the wrong order; this is repeated until the array is sorted, which we know is true when no more swaps are required.
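For concreteness, the following is a minimal C sketch in the spirit of Figure 1.1(b); it is an illustrative rendering only, so the figure's own listing and line numbers may differ.

#include <stddef.h>

/* Bubble sort: repeatedly sweep over X, swapping adjacent,
   out-of-order elements; stop once a full pass performs no swap. */
void bubble_sort(int *X, size_t n) {
  int swapped = 1;
  while (swapped) {
    swapped = 0;
    for (size_t i = 1; i < n; i++) {
      if (X[i - 1] > X[i]) { /* data-dependent control flow */
        int t = X[i - 1];    /* swap the adjacent pair      */
        X[i - 1] = X[i];
        X[i] = t;
        swapped = 1;         /* another pass is required    */
      }
    }
  }
}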

So, given an input array, what is the execution latency of this algorithm? That is, how much time elapses between starting (using X and n as input) and finishing execution (meaning a sorted X is produced as output) of the algorithm? Analysis of computational complexity (or, more specifically, time complexity) offers one approach to providing an answer. Doing so gives a result in terms of abstract computational steps, focused on the problem size n. By using such analysis, we conclude that in the worst case and average case, the algorithm will perform O(n²) comparison operations and O(n²) swap operations; in the best case, the algorithm will perform O(n) comparison operations and O(1) swap operations.

DEFINITION 1.2.–

The execution latency of a computational process P is termed data dependent if it depends on, and so may vary as a result of, the input data. Otherwise, it is termed data independent: this implies that the execution latency is the same, irrespective of the input data for every execution of P.

Two facts should be clear from our analysis of bubble sort. First, and more obviously, the length of X impacts the execution latency: in all cases, the number of steps will be greater for an X with many elements (i.e. larger n) than it will for an X with few elements (i.e. smaller n). Second, and less obviously, however, the content of X also impacts the execution latency: as suggested by the differing best and worst cases, the number of steps will be less for an X which is already (close to being) sorted than it will for an X, which is not. Either way, execution latency of the bubble sort algorithm would therefore be classified as data dependent.

1.1.2. Execution latency in practice

DEFINITION 1.3.–

In concrete terms, data-dependent execution latency arises from (at least) the following cases: (1) which operations are executed, as determined by control-flow decisions made during execution; and (2) how operations are executed, or, put another way, what the execution latency of each executed operation is and whether or not it is data dependent.

Although instances of the first case can be identified within an abstract algorithm, instances of the second case are dependent on concrete details of an associated implementation and platform it is executed on. Conceptually at least, we could view this shift as including rather than (intentionally) excluding constant factors in analysis using computational complexity. Doing so is clearly imperative, because we ultimately aim to reason about concrete, real-world versus abstract, on-paper scenarios.

We can identify instances of both cases using Figure 1.1(b), which describes an implementation of bubble sort using C source code. For example, the if statement on line 8 represents an example of the first case: the flow of control is managed so lines 9–13 are either executed or not, depending on whether or not X[ i - 1 ] > X[ i ]. Likewise, the memory access implied by the expressions X[ i - 1 ] and X[ i ] on line 8 represents an example of the second case: the latencies of said accesses depend, for example, on the memory hierarchy (e.g. the existence and state of any caches that exist within it). More generally, the platform will optimize instruction execution; it may do so for both memory access and computational instructions (for example, although this implementation does not include any, shift, multiplication, and division instructions sometimes exhibit data-dependent execution latency). An implication of this fact is that robust analysis of data (in)dependence cannot arise from the (static) source code alone. The Instruction Set Architecture (ISA) that said source code ultimately targets will typically capture only functional properties: how instructions will be (dynamically) executed, and therefore the behavioral properties of the platform (i.e. a specific micro-architectural implementation of that ISA), also need to be assessed.

1.1.3. Attacks that exploit data-dependent execution latency

In a sense, bubble sort is as efficient as it can be: it only performs the steps required to produce a correct output from an associated input by exiting the outer loop as soon as no more swaps are required. Adopting a perspective where efficiency is the metric of interest, such an approach is clearly positive. But do any negatives stem from it? If we adopt a different perspective where security is the (or at least a) metric of interest, the answer is yes.

Put simply, the fact that execution latency is data dependent implies some form of correlation: measuring the execution latency will provide information, of some form, about the data it depends on. Set in some scenario where an adversary has access to execution latency measurements, this can be hugely problematic. Specifically, consider a scenario where the data involved are security critical, for example, but without loss of generality, some cryptographic key material. Using bubble sort as an example starts to become tenuous at this point, so instead consider the following generic attack model, which helps fix various concepts and notation.

A target 𝒯 is using some security-critical data k to perform some computation f. Having first accepted an input x from an adversary 𝒜, 𝒯 computes and then provides r = f(k, x) as output. 𝒜 can also measure the execution latency associated with computation by 𝒯, which we denote Λ. On the one hand, k is not and should not be known by 𝒜. On the other hand, however, Λ may provide information about k if the execution latency of f is data dependent, and therefore may potentially be exploited in an attack aiming to recover it. This is the crux of a side-channel attack based on execution latency, or, less formally, a timing attack. We say that information about the security-critical data is leaked (or is leakage) through a side-channel represented by execution latency. Such attacks are potent, because, as a result of exploiting the additional information afforded by the side-channel, they can often be significantly more efficient than alternatives (e.g. traditional cryptanalysis).

Now focused on the topic of side-channel attacks, we need to address two important points regarding terminology. First, in relation to data (in)dependence, the term data is qualified by the fact that we almost always mean security-critical data. There is limited value in learning information about data we already know, for example, and so no implication for execution latency depending on it. So, from here on, read “data” as “security-critical data” unless otherwise stated. Second, it is very common to see the term constant-time (respectively, variable-time) used as a synonym for data-independent (respectively, data-dependent) execution latency in this context (ignoring the clash with O(1) in computational complexity). We avoid these terms, deeming data (in)dependence more reflective of what is actually intended. For example, the execution latency of a given algorithm may vary and need not reflect any specific constant: we only demand that it does not vary, and so is (some) constant with respect to security-critical data involved. Likewise, use of the term constant-time suggests execution time (or latency) is the only pertinent issue; in reality, and more generally, we need to address the challenge of data-dependent resource use (both in terms of which resources are used, and the length of time they are used for). We can find examples of this fact investigated in Chapter 2.
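To make the distinction concrete, the following C sketch (an illustrative example, not taken from the chapter) contrasts a branching selection, whose executed instructions depend on a secret bit b, with a masked alternative that executes the same instructions for any b.

#include <stdint.h>

/* Variable-time: the branch makes the instructions executed, and so
   the execution latency, depend on the (secret) bit b. */
uint32_t select_branch(uint32_t b, uint32_t x, uint32_t y) {
  if (b) return x; else return y;
}

/* Data-independent: mask is all-ones when b == 1 and all-zeros when
   b == 0, so the same instructions execute irrespective of b. */
uint32_t select_masked(uint32_t b, uint32_t x, uint32_t y) {
  uint32_t mask = (uint32_t)(-(int32_t)b);
  return (x & mask) | (y & ~mask);
}

/* NB: a compiler may still transform the masked form into a branch;
   per section 1.1.2, source code alone cannot guarantee behavior. */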

Having presented the high-level concept, we now refine various lower level details of our attack model. We do so in a piecemeal manner below, framing the discussion around a set of concise definitions which, in combination, help to explain and characterize specific attacks (such as those used as examples in section 1.2).

DEFINITION 1.4.–

The attack strategy employed by 𝒜 may be described as differential or non-differential.

At a high level, a non-differential attack will typically use few (even just one) execution latency measurements in isolation. In contrast, a differential attack will typically use many execution latency measurements in combination, for example, through analysis of when and how they differ as the result of data-dependent behavior by 𝒯.

DEFINITION 1.5.–

The attack strategy employed by 𝒜 may be assessed, where relevant, in relation to efficacy and/or efficiency.

Although exceptions might exist, efficacy is normally a binary assessment: either the attack by 𝒜 is successful with respect to some stated aim or not. In contrast, there are many ways to assess efficient resource use by 𝒜 during said attack. Focusing on time, for example, we might view the number of interactions with 𝒯, the high-level, algorithm-related attack latency (e.g. stated in terms of computational complexity) or the low-level, implementation-related attack latency (e.g. stated in terms of wall-clock time), as indicative of how efficient 𝒜 is.

DEFINITION 1.6.–

If the interaction between 𝒜 and 𝒯 is direct, the attack is described as local; otherwise, if the interaction is indirect, the attack is described as remote.

There are other, reasonable ways to capture a similar concept, for example, using physical proximity (implying physical access) as the measure of localness. We opt for the above instead, because 𝒜 and 𝒯 may be in close physical proximity but still physically or logically isolated from each other; this would imply an indirect form of interaction. Either way, two points are important. First, although the attack model above at least suggests that 𝒜 and 𝒯 interact remotely, this is not a limitation. For example, 𝒜 may leverage physical access to some device 𝒯 in order to mount a local attack; 𝒜 and 𝒯 may be software processes (co-)resident and so executing locally on the same platform; or 𝒜 and 𝒯 may be software processes resident and so executing on different platforms, interacting remotely via an intermediate network. Second, if/when a remote attack is feasible, this is often viewed as an advantage. For example, such an attack typically removes the need for physical access to 𝒯, avoids any reliance on additional equipment (for example, see power- or EM-based side-channels) and allows attacks to be applied at greater scale (in the sense there are likely to be more accessible instances of 𝒯 to attack).

DEFINITION 1.7.–

From the perspective of 𝒜, computation by 𝒯 may be described as synchronous or asynchronous.

The synchronous case suggests that 𝒜 knows or even controls when 𝒯 starts or finishes execution; in contrast, the asynchronous case suggests that 𝒯 operates independently from 𝒜 (and, as a result, is often described as free-running). In the asynchronous case, one typically attempts to identify some form of trigger or reference event that indicates the start and/or finish of execution. Such an event might be observable via a secondary side-channel, or simply observation of when and how 𝒯 interacts with external parties (such as 𝒜) or resources (such as memory or peripheral devices).

DEFINITION 1.8.–

The execution latency measurements taken by 𝒜 are characterized by their accuracy, that is, how close the measured and actual execution latencies are, and precision, that is, how close two execution latency measurements for the same computation are. The former case typically relates to systematic error, for example, due to the impact of quantization; whereas the latter case typically relates to random error, for example, due to the impact of noise.

Figure 1.2. Attack models relating to the examples presented in section 1.2

We deem any formal exploration of what noise is beyond our scope; it suffices to understand that noise will degrade accuracy and precision, in the sense that execution latency of the same computation within different interactions between 𝒜 and 𝒯 may differ (or fluctuate). The term Signal-to-Noise Ratio (SNR) is often used to capture the relative strength of the signal of interest, that is, the actual execution latency, versus any noise that causes the measured execution latency to differ. Both statistical and non-statistical techniques can typically be harnessed in an attempt to reduce or even remove noise, that is, yield higher SNR; examples include averaging and filtering. Doing so can be important, because, ultimately, SNR will typically have some impact on the efficiency and/or efficacy of most attacks.
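As an illustration of averaging, the following sketch of a measurement harness assumes a POSIX platform with clock_gettime; cycle counters are a common, higher-precision alternative.

#include <stdint.h>
#include <time.h>

/* Return one wall-clock latency measurement of op(), in nanoseconds. */
static uint64_t measure_once(void (*op)(void)) {
  struct timespec t0, t1;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  op();
  clock_gettime(CLOCK_MONOTONIC, &t1);
  return (uint64_t)(t1.tv_sec - t0.tv_sec) * 1000000000u
       + (uint64_t)(t1.tv_nsec - t0.tv_nsec);
}

/* Average m repetitions to suppress random error and so improve SNR. */
static double measure_mean(void (*op)(void), int m) {
  double sum = 0.0;
  for (int i = 0; i < m; i++) sum += (double)measure_once(op);
  return sum / (double)m;
}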

1.2. Example attacks

In this section, we focus on application of the central concepts: the aim is to illustrate how timing attacks are mounted. Harnessing the foundation offered by section 1.1, we do so by considering a small set of examples each framed within a simple but plausible scenario; we structure each example in the same way by (1) defining the attack model, (2) analyzing the attack model, for example, to characterize the source of data dependence and then, finally, (3) presenting an attack based on said analysis.

1.2.1. Example 1.1: an explanatory attack on password validation

1.2.1.1. Scenario

Consider the attack model depicted in Figure 1.2(a): an adversary 𝒜 interacts with a target device 𝒯 that stores some embedded, security-critical data P (a password). When provided with input G (a guess at the password), 𝒯 first performs the computation detailed by the algorithm MATCH (that is, string matching, or, more specifically, password validation) and then responds with r as output. 𝒜 can also measure the execution latency of computation performed by 𝒯, so it attempts to exploit this information with the aim of recovering P (implying, for example, that it then gains access to 𝒯).

1.2.1.2. Analysis

In the same way we reasoned about bubble sort, the MATCH algorithm is as efficient as it can be: as soon as it detects a mismatch between P and G, it immediately returns false rather than needlessly continuing. Such a mismatch can occur for two reasons. First, and resulting in execution of line 3, the mismatch might stem from the length of, that is, number of characters in, P and G: if |P| ≠ |G|, then the two cannot match. Second, and resulting in execution of line 7, the mismatch might arise from the content of P and G: if Pi ≠ Gi for some i, then the two cannot match. This is termed early-abort behavior, and is possible when a special-case condition means the general-case algorithm need not be fully executed in order to provide correct output from the associated input. Viewed from the perspective of efficiency, this behavior is positive; viewed from the perspective of security, however, the same behavior becomes negative: it means that the execution latency of MATCH is data dependent, that is, depends on P and G.
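For concreteness, a minimal C sketch of such an early-abort MATCH follows; names and line numbering are illustrative, not the chapter's own.

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Early-abort string match: returns as soon as a mismatch in either
   the length or the content of P and G is detected. */
bool match(const char *P, const char *G) {
  if (strlen(P) != strlen(G))   /* length mismatch: early-abort  */
    return false;
  for (size_t i = 0; P[i] != '\0'; i++)
    if (P[i] != G[i])           /* content mismatch: early-abort */
      return false;
  return true;
}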

1.2.1.3. Exploitation

Assume, without loss of generality, that the set of valid characters is A = { 'a', 'b',...,'z'}, and, specifically, that P = "pencil", which means |P| = 6. The attack proceeds in two phases:

In this phase, the attack aims to recover |P|. To do so, we use a sequence of guesses

G ∈ { "a", "aa", "aaa", … }

during interaction with 𝒯, that is, the ith guess is i + 1 characters in length. The execution latency for guess G = "aaaaaa" will be longer than for the others because |P| = |G|; this means execution for it will progress into the loop, rather than early-abort at line 3. As such, identifying this G recovers |P|.

In this phase, the attack aims to recover P iteratively, that is, character-by-character. To recover Pi in iteration i = 0, we use a sequence of guesses

G ∈ { "aaaaaa", "baaaaa", … , "zaaaaa" }

during interaction with 𝒯, that is, the ith character cycles through all possibilities, while the others remain fixed. The execution latency for guess G = "paaaaa" will be longer than for the others, because |P| = |G| and P0 = G0; this means execution will progress into loop iteration i + 1 = 1, rather than early-abort at line 7 in iteration i = 0. As such, G0 = 'p' is deemed correct in this guess, which we now fix. To recover Pi in iteration i = 1, we use a sequence of guesses

G ∈ { "paaaaa", "pbaaaa", … , "pzaaaa" }

during interaction with 𝒯, that is, the ith character cycles through all possibilities, while the others remain fixed. The execution latency for guess G = "peaaaa" will be longer than for the others, because |P| = |G|, P0 = G0 and P1 = G1; this means execution will progress into loop iteration i + 1 = 2, rather than early-abort at line 7 in iteration i = 1. As such, G1 = 'e' is deemed correct in this guess, which we now fix. The attack continues in the same way for 0 ≤ i < n, until P is eventually fully recovered.
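A sketch of the two-phase guessing loop just described, assuming a hypothetical helper latency(G) that submits the guess G to 𝒯 and returns the measured execution latency (averaged measurements, per section 1.1, would be used in practice):

#include <stddef.h>
#include <string.h>

extern double latency(const char *G); /* hypothetical: query target, measure Λ */

#define MAX_LEN 64

void recover_password(char *P_out) {
  char G[MAX_LEN + 1];
  size_t n = 0;
  double best = -1.0;

  /* Phase 1: recover |P| as the guess length with the largest
     latency (it alone passes the length check into the loop). */
  for (size_t len = 1; len <= MAX_LEN; len++) {
    memset(G, 'a', len); G[len] = '\0';
    double l = latency(G);
    if (l > best) { best = l; n = len; }
  }

  /* Phase 2: recover P character-by-character; in iteration i, the
     candidate surviving one more loop iteration is the slowest. */
  memset(G, 'a', n); G[n] = '\0';
  for (size_t i = 0; i < n; i++) {
    best = -1.0;
    char best_c = 'a';
    for (char c = 'a'; c <= 'z'; c++) {
      G[i] = c;
      double l = latency(G);
      if (l > best) { best = l; best_c = c; }
    }
    G[i] = best_c; /* fix the recovered character */
  }
  strcpy(P_out, G);
}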

Since this is an explanatory attack, it is easy to assess it in terms of both efficacy and efficiency versus intuitive alternatives. For example, we could consider a brute-force attack: by essentially trying every n-character guess, this strategy assumes knowledge of n, requires O(|A|ⁿ) guesses and guarantees recovery of P. Or, we could consider a dictionary attack: by first constructing a dictionary of common (or likely) passwords D = {"password", "qwerty", "admin",…} to use as guesses, this strategy does not assume knowledge of n, requires O(|D|) guesses and does not guarantee recovery of P. In contrast, our side-channel attack does not assume knowledge of n, requires O(|A|·n) guesses and guarantees recovery of P; in a sense, the additional information allows it to offer greater efficiency than the brute-force attack and greater efficacy than the dictionary attack.

1.2.2. Example 1.2: an attack on xtime-based AES

1.2.2.1. Scenario

Assuming use of AES-128 throughout, consider the attack model depicted in Figure 1.2(b): an adversary 𝒜 interacts with a target device 𝒯, which stores some embedded, security-critical data k (an AES cipher key). When provided with input m (an AES plaintext), 𝒯 first performs the computation detailed by the algorithm AES.ENC (i.e. AES encryption) and then responds with c (an AES ciphertext) as output. 𝒜 can also measure the execution latency of computation performed by 𝒯, so it attempts to exploit this information with the aim of recovering k.

1.2.2.2. Analysis

AES is an iterative block cipher based on an SP network; as one of three possible parameterizations, AES-128 uses a 16 byte cipher key and b = 16 byte block size. At a low level, it operates on elements in the finite field F₂[x]/(p(x)), where p(x) = x⁸ + x⁴ + x³ + x + 1. A short-hand is often adopted for elements of this field, meaning, for example, a(x) = x⁴ + x + 1 ≡ 13(16) (i.e. hexadecimal 13), suggesting that each element can be represented by a byte (whose ith bit represents the ith coefficient). Elements of the field are collected into (4 × 4)-element state and round key matrices; the ith row and jth column of such a matrix relating to round r is denoted s(r)i,j and rk(r)i,j, respectively, with super- and/or subscripts omitted whenever irrelevant. The process of encryption initializes s(0) = m and rk(0) = k, in both cases filling the left-hand matrix in a column-wise manner using content from the right-hand sequence of bytes.

𝒯 adopts an implementation strategy common on constrained platforms, targeting a balance that trades off higher execution latency for lower memory footprint. This can be summarized as follows. First, it pre-computes the S-box: doing so implies it is realized using table look-ups, rather than any computation. Second, it updates (or evolves) the cipher key, computing each round key during encryption rather than pre-computing them. Third, and finally, it uses an xtime algorithm

to compute the multiplication-by-x operation, that is, a(x) · x (mod p(x)). On lines 3 and 5 of the associated listing, an unconditional left-shift captures multiplication by x. On line 3, however, a conditional XOR captures a reduction modulo p(x); note that such a reduction is only required when a7 = 1. Using this algorithm, it applies the usual MIXCOLUMNS linear transformation to each jth column of the state.

In summary, the execution latency of AES encryption is data dependent: this stems from the implementation strategy adopted, whereby the execution latency of xtime and therefore MIXCOLUMNS is data dependent.
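For concreteness, a minimal C sketch of such a data-dependent xtime, consistent with the description above (the chapter's own listing may differ):

#include <stdint.h>

/* Multiplication by x in the field F_2[x]/(p(x)), where
   p(x) = x^8 + x^4 + x^3 + x + 1; 0x1B encodes the reduction
   of x^8 modulo p(x). */
uint8_t xtime(uint8_t a) {
  if (a & 0x80)                        /* a7 = 1: reduction needed    */
    return (uint8_t)((a << 1) ^ 0x1B); /* shift, then conditional XOR */
  else
    return (uint8_t)(a << 1);          /* a7 = 0: shift only          */
}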

1.2.2.3. Exploitation

The attack strategy aims to recover k iteratively, that is, byte-by-byte. Let mj and kj denote the jth byte of the plaintext and cipher key, respectively. In any tth iteration of the attack, we aim to recover the target byte kt based on the observation that, in the first MIXCOLUMNS invocation, the computation

xtime(S-BOX(mt ⊕ kt))

is performed at least once; we term this the target xtime. Doing so involves the following steps, parameterized by N and M:

Compute the (256 × N)-element matrix X such that

Xi,j = S-BOX(j ⊕ i)7.

Across the rows, each 0 ≤ i < 256 captures a possible value of kt; across the columns, each 0 ≤ j < N captures a possible value of mt (meaning N = 256 will consider all possible values). As such, each Xi,j captures whether or not the target xtime performs a conditional XOR for specific kt and mt.

Compute an (N × M)-element matrix of plaintexts, whose (i, j)th entry m(i,j) has its tth byte fixed to i and every other byte selected at random. So, since all plaintexts in the ith row have the same tth byte, they all yield the same input to the target xtime. Let Λi,j denote the execution latency arising from encryption of m(i,j), which we measure via interaction with 𝒯, and Λ̄i denote the mean of execution latencies across the ith row.

The idea is that, provided N and M are large enough, Λ̄i will reflect the target xtime, that is, it will be larger or smaller depending on whether or not the conditional XOR occurs. Put another way, with some probability of error, Λ̄i indicates the value of S-BOX(i ⊕ kt)7. To recover kt, we compare each hypothesis row Xi with the measured vector (Λ̄0, … , Λ̄N−1), for example, using the Pearson correlation coefficient: the i which yields the strongest correlation indicates the value of kt.
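A sketch of this final comparison step, assuming hyp[i][j] holds the predicted conditional-XOR bit Xi,j and lam[j] holds the measured mean latency Λ̄j (names hypothetical):

#include <math.h>
#include <stddef.h>

/* Pearson correlation coefficient between x[0..n-1] and y[0..n-1]. */
static double pearson(const double *x, const double *y, size_t n) {
  double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  for (size_t i = 0; i < n; i++) {
    sx += x[i]; sy += y[i];
    sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
  }
  double cov = sxy - sx * sy / (double)n; /* n times the covariance    */
  double vx  = sxx - sx * sx / (double)n; /* n times the variance of x */
  double vy  = syy - sy * sy / (double)n; /* n times the variance of y */
  return cov / sqrt(vx * vy);
}

/* Return the key-byte candidate i whose hypothesis row correlates most
   strongly (in absolute value) with the measured mean latencies. */
int best_candidate(double hyp[256][256], const double *lam, size_t N) {
  int best_i = 0;
  double best_r = -1.0;
  for (int i = 0; i < 256; i++) {
    double r = fabs(pearson(hyp[i], lam, N));
    if (r > best_r) { best_r = r; best_i = i; }
  }
  return best_i;
}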

1.2.3. Example 1.3: an attack on Montgomery-based RSA

1.2.3.1. Scenario

Consider the attack model depicted in Figure 1.2(c): an adversary 𝒜 interacts with a target device 𝒯, which stores the embedded, security-critical data (N, d) (an RSA private key). When provided with input c (an RSA ciphertext), 𝒯 first performs the computation detailed by the algorithm RSA.DEC (i.e. RSA decryption) and then responds with m (an RSA plaintext) as output. 𝒜 can also measure the execution latency of computation performed by 𝒯, so it attempts to exploit this information with the aim of recovering d.

1.2.3.2. Analysis

𝒯 uses the left-to-right, binary (or square-and-multiply) exponentiation algorithm. Note that all modular squaring and multiplication operations, for example, in lines 4 and 6, are based on the use of Montgomery reduction. This requires pre-computation of ω and ρ from N, such that x̂ = x · ρ (mod N) then denotes the Montgomery representation of some x.

The execution latency of modular exponentiation is data dependent, because the execution of line 6 is conditional on di. An execution latency measurement may therefore allow us to infer how often line 6 is executed, and hence the Hamming weight of (or number of bits equal to 1 in) d. Beyond this, we need to analyze the MONTMUL algorithm as used to support the exponentiation algorithm. Its steps can be summarized as follows: line 2 performs an integer multiplication, line 3 performs a Montgomery reduction, and then, finally, line 4 performs a conditional subtraction to fully reduce r modulo N. From this, we conclude that the execution latency of MONTMUL is also data dependent, because the execution of line 4 is dependent on both the operands x and y and the modulus N; as used to support exponentiation, this implies dependence on c (because it forms one of the operands), and, crucially, d (because it controls when, where and how MONTMUL is invoked).
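A toy, single-limb C sketch of MONTMUL and the surrounding square-and-multiply loop, consistent with the analysis above; it assumes the non-standard (but widely supported) __uint128_t type, and that omega = −N⁻¹ (mod 2⁶⁴) and rho2 = ρ² (mod N) are pre-computed:

#include <stdint.h>

/* Single-limb Montgomery multiplication: returns x * y * 2^-64 (mod N),
   for odd N < 2^63 and pre-computed omega = -N^-1 (mod 2^64). */
uint64_t montmul(uint64_t x, uint64_t y, uint64_t N, uint64_t omega) {
  __uint128_t t = (__uint128_t)x * y;             /* integer multiplication */
  uint64_t m = (uint64_t)t * omega;               /* Montgomery quotient    */
  __uint128_t u = (t + (__uint128_t)m * N) >> 64; /* Montgomery reduction   */
  if (u >= N) u -= N; /* conditional subtraction: the data-dependent step */
  return (uint64_t)u;
}

/* Left-to-right binary exponentiation: returns c^d (mod N), where
   d_bits[len - 1] is the most-significant bit of d. The branch on
   d_bits[i] is the conditional multiplication discussed above. */
uint64_t modexp(uint64_t c, const uint8_t *d_bits, int len,
                uint64_t N, uint64_t omega, uint64_t rho2) {
  uint64_t chat = montmul(c, rho2, N, omega); /* c, in Montgomery form  */
  uint64_t r = montmul(1, rho2, N, omega);    /* 1, in Montgomery form  */
  for (int i = len - 1; i >= 0; i--) {
    r = montmul(r, r, N, omega);              /* square                 */
    if (d_bits[i])
      r = montmul(r, chat, N, omega);         /* conditional multiply   */
  }
  return montmul(r, 1, N, omega);             /* out of Montgomery form */
}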

1.2.3.3. Exploitation

The attack strategy aims to recover d iteratively, that is, bit-by-bit, working from the most-significant toward the least-significant bit. In any tth iteration of the attack, d is partially known and partially unknown. That is, we already know some more-significant bits of d (noting that it must be the case that d|d|−1 = 1; otherwise d = 0) and so aim to recover the next less-significant, as yet unknown target bit dt. Let c[k] denote some kth ciphertext, randomly selected modulo N for 0 ≤ k < n; for each such c[k], we use (partial) knowledge of d to simulate a (partial) decryption as would be performed by 𝒯. Doing so involves the following steps:

Start by setting c = c[k], then simulating lines 2 and 3, that is, computing ĉ = c · ρ (mod N) and r̂ = 1 · ρ (mod N).

Step the simulation forward until we reach the start of loop iteration t (between lines 4 and 5). We can do so because di is known for t < i < |d|, meaning the simulation can perform exactly the same computation that 𝒯 would.

Step the simulation forward until we reach the end of loop iteration t (between lines 8 and 9). We can do so because, although di is unknown for i = t, we can fork the simulation to reflect guessing both possibilities:

for the simulation fork that guesses di = 0, line 6 is not executed, so we do not update r̂;

for the simulation fork that guesses di = 1, line 6 is executed, so we update r̂.

Step the simulation forward until we reach the middle of loop iteration t + 1 (between lines 5 and 6). That is, we update r̂ via the squaring and, in doing so, record whether or not the conditional subtraction, that is, line 4 of MONTMUL, is executed.

Classify c by placing it into one of four sets:

S0, if the simulation fork guessed di = 0 and no conditional subtraction is executed;

S0′, if the simulation fork guessed di = 0 and a conditional subtraction is executed;

S1, if the simulation fork guessed di = 1 and no conditional subtraction is executed;

S1′, if the simulation fork guessed di = 1 and a conditional subtraction is executed.

This means c is placed into exactly two sets: either S0 or S0′ arising from the simulation fork that guessed di = 0, and either S1 or S1′ arising from the simulation fork that guessed di = 1.

The idea is that we have intentionally constructed the sets so there should be a difference between the execution latency for the ciphertexts in S0 versus those in S0′, and the ciphertexts in S1 versus those in S1′. Under the assumption that conditional subtractions will occur more or less at random during exponentiation, ciphertexts in the latter will, on average, execute an extra conditional subtraction versus those in the former. However, only one of the guesses made by the simulation is actually correct, so that difference should only be evident in either the first or second case. Put another way, for the correct (respectively, incorrect) guess, the simulated computation will (respectively, will not) match the actual computation by 𝒯; therefore, for the correct (respectively, incorrect) guess, the number of conditional subtractions that occur during exponentiation of ciphertexts in one set will differ from (respectively, be approximately equal to) those in the other set. So, let

Λ̄(S) = (1 / |S|) Σc∈S Λ(c)

denote the average execution latency of ciphertexts in a set S, where Λ(c) denotes the execution latency for some ciphertext c: we measure this by interacting with 𝒯. One of three cases will apply:

In the first case, Λ̄(S0′) − Λ̄(S0) is significantly larger than Λ̄(S1′) − Λ̄(S1), allowing us to infer dt = 0; in the second case, Λ̄(S1′) − Λ̄(S1) is significantly larger, allowing us to infer dt = 1. In the third case, however, the two differences are too close to allow confident inference of dt; this may be due to, for example, use of too small an n or the impact of noise.

Having recovered dt, we proceed with the next, (t + 1)th attack iteration and recover the next, (t + 1)th bit of d in the same way; this continues until d is eventually fully recovered.
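A sketch of the resulting per-bit decision, using the set labels S0, S0′, S1, S1′ introduced above and a threshold tau to model the inconclusive third case (names hypothetical):

/* Decide the target bit d_t from the four set averages. Returns 0 or 1,
   or -1 when the separation is below the threshold tau (e.g. too few
   ciphertexts, or too much noise): the inconclusive third case. */
int decide_bit(double avg_S0, double avg_S0p,  /* Λ̄(S0), Λ̄(S0′) */
               double avg_S1, double avg_S1p,  /* Λ̄(S1), Λ̄(S1′) */
               double tau) {
  double delta0 = avg_S0p - avg_S0; /* evident only if d_t = 0 is correct */
  double delta1 = avg_S1p - avg_S1; /* evident only if d_t = 1 is correct */
  if (delta0 - delta1 > tau) return 0;
  if (delta1 - delta0 > tau) return 1;
  return -1;
}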

1.2.4. Example 1.4: a padding oracle attack on AES-CBC

1.2.4.1. Scenario

Assuming use of AES-128 throughout, consider the attack model depicted in Figure 1.2(d): an adversary 𝒜 interacts with a target device 𝒯, which stores the embedded, security-critical data k (some key material). When provided with input (iv, c), 𝒯 first performs the computation detailed by the algorithm PROCESS, then responds with r as output. 𝒜 obtains a ciphertext (iv*, c*), which is the encryption of some unknown plaintext m* under k. It can also measure the execution latency of computation performed by 𝒯, so it attempts to exploit this information with the aim of recovering m*.

1.2.4.2. Analysis

Let x[i]j denote the jth byte within the ith block of some n-block plaintext or ciphertext x; this means 0 ≤ i < n and 0 ≤ j < b, assuming a b-byte block size.

At a high level, PROCESS can be described line-by-line as follows. Line 2 decrypts the ciphertext c, parsing the result m′ as a plaintext m, some padding ρ and a tag τ. Then, line 3 checks whether ρ is valid, aborting if not; line 4 checks whether τ is valid, aborting if not; and, finally, line 5 processes m using some function f to produce r. At a low level, some further details are important: note that line 2 decrypts c using AES in Cipher Block Chaining (CBC) mode using the key k1, and line 4 assumes τ is a SHA1-based HMAC Message Authentication Code (MAC) tag and checks validity accordingly using the key k2. The padding scheme used to specify ρ is also important, but not detailed within the description of PROCESS. Although alternatives exist, assume a TLS-like scheme is used: let

ρ(α) = α, α, … , α (a sequence of α + 1 bytes)

denote a valid padding sequence, which contains α byte(s) equal to α and then 1 compulsory byte, also equal to α, which records the total padding length. Note that if m || τ is l bytes long, then ρ = ρ(b − (l + 1) mod b).
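A C sketch of a validity check for this padding scheme (an illustrative rendering; note that its own early-abort behavior is exactly the kind of data dependence at issue):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* TLS-like padding check: the final byte alpha must be preceded by
   alpha further bytes, each also equal to alpha. */
bool pad_check(const uint8_t *m, size_t len) {
  if (len == 0) return false;
  uint8_t alpha = m[len - 1];
  if ((size_t)alpha + 1 > len) return false;   /* padding longer than data */
  for (size_t i = 0; i <= alpha; i++)
    if (m[len - 1 - i] != alpha) return false; /* note: early-abort, again */
  return true;
}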

The execution latency of PROCESS is data dependent: if t3 denotes the execution latency when 𝒯 early-aborts on line 3 based on ρ, t4 denotes the execution latency when 𝒯 early-aborts on line 4 based on τ, and t5 denotes the execution latency when 𝒯 executes line 5, then, irrespective of their concrete values, t3 < t4 < t5. In fact, we can focus on the ability to distinguish between these cases by defining a so-called padding oracle: for the scenario at hand, this would be something like O(iv, c) = 1 if the measured execution latency exceeds t3 (that is, the padding was deemed valid), and O(iv, c) = 0 otherwise.

Doing so can be viewed as an abstraction of the underlying mechanism used to measure execution latency, thus separating it from the attack strategy.

1.2.4.3. Exploitation

Use of AES in CBC mode means that 𝒯 will compute

m′[i] = AES.DEC(k1, c[i]) ⊕ xi, where xi = iv if i = 0 and xi = c[i − 1] otherwise,

for 0 ≤ i < n. The resulting plaintext m′ is then parsed into the fields m, τ and ρ. From this, we can infer several more general facts. First, ρ will occur in m′[n − 1], that is, the last, (n − 1)th block of m′. Second, although |ρ| is unknown, we do know |ρ| ≤ b because the point of including it at all is to ensure |m′| ≡ 0 (mod b). Third, the definition of CBC mode means that

m′[n − 1] = AES.DEC(k1, c[n − 1]) ⊕ xn−1.

As such, by controlling xn−1 (meaning either iv or c[n − 2], depending on n), we can control m′[n − 1] and hence also the ρ then checked for validity by 𝒯. For example, consider a two-block ciphertext c. If we construct an input (iv, c′) where c′ = (c[0] ⊕ δ) || c[1], for some δ, that is, c′[0] = c[0] ⊕ δ and c′[1] = c[1], then 𝒯 will compute

m′[0] = AES.DEC(k1, c[0] ⊕ δ) ⊕ iv and m′[1] = AES.DEC(k1, c[1]) ⊕ c[0] ⊕ δ = m[1] ⊕ δ.

That is, 𝒯 will compute an m′ where m′[0] is randomized by the perturbed decryption and m′[1] is controlled by the δ selected; since m′[1] is the (n − 1)th block in m′, δ will also control ρ and hence the padding oracle result, that is, O(iv, c′).
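A sketch of this ciphertext manipulation for the two-block case, assuming b = 16 and a hypothetical helper oracle(iv, c, len) that submits the input to 𝒯 and reports the padding oracle result:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define B 16 /* block size in bytes */

/* Hypothetical: submit (iv, c) to the target and report O(iv, c). */
extern bool oracle(const uint8_t iv[B], const uint8_t *c, size_t len);

/* Build c' = (c[0] XOR delta) || c[1]: by the CBC equations above,
   m'[1] = m[1] XOR delta, while m'[0] is randomized. */
bool query_with_delta(const uint8_t iv[B], const uint8_t c[2 * B],
                      const uint8_t delta[B]) {
  uint8_t cp[2 * B];
  for (int j = 0; j < B; j++)
    cp[j] = c[j] ^ delta[j];  /* perturb the first ciphertext block */
  memcpy(cp + B, c + B, B);   /* second block is unchanged          */
  return oracle(iv, cp, 2 * B);
}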

Based on these facts, and assuming again that |c*| = 2, the attack proceeds in two phases:

In this phase, the attack strategy constructs a so-called “recover last byte(s)” oracle. First, select a random b-byte block δ and search for a value of δb−1 such that O(iv, c′) = 1 for c′ = (c*[0] ⊕ δ) || c*[1]. Since the padding of m′ is deemed to be valid, we know that one of

m′[1]b−1 = 0, or m′[1]b−2 = m′[1]b−1 = 1, or … , or m′[1]0 = ⋯ = m′[1]b−1 = b − 1

is true: however, we do not know which one. So, to recover one (or more) byte(s) of m′, we need |ρ|, that is, the padding length. To recover it, start from t = 0 and alter (e.g. increment) δt. If O(iv, c′) = 0, then t marks the initial padding byte (because we just altered it, thus invalidating the padding) and implies that |ρ| = b − t; otherwise, increment t and try again.

In this phase, the attack strategy constructs a so-called “block decryption” oracle. As mentioned above, we have already recovered the most-significant b − t