Reinforcement Learning for Cyber Operations - Abdul Rahman - E-Book

Description

A comprehensive and up-to-date application of reinforcement learning concepts to offensive and defensive cybersecurity

In Reinforcement Learning for Cyber Operations: Applications of Artificial Intelligence for Penetration Testing, a team of distinguished researchers delivers an incisive and practical discussion of reinforcement learning (RL) in cybersecurity that combines intelligence preparation for battle (IPB) concepts with multi-agent techniques. The authors explain how to conduct path analyses within networks, how to use sensor placement to increase the visibility of adversarial tactics and the efficacy of cyber defenders, and how to improve your organization’s cyber posture with RL while illuminating the most probable adversarial attack paths in your networks.

Containing entirely original research, this book outlines findings and real-world scenarios that have been modeled and tested against custom generated networks, simulated networks, and data. You’ll also find:

  • A thorough introduction to modeling actions within post-exploitation cybersecurity events, including Markov Decision Processes employing warm-up phases and penalty scaling
  • Comprehensive explorations of penetration testing automation, including how RL is trained and tested over a standard attack graph construct
  • Practical discussions of both red and blue team objectives in their efforts to exploit and defend networks, respectively
  • Complete treatment of how reinforcement learning can be applied to real-world cybersecurity operational scenarios

Perfect for practitioners working in cybersecurity, including cyber defenders and planners, network administrators, and information security professionals, Reinforcement Learning for Cyber Operations: Applications of Artificial Intelligence for Penetration Testing will also benefit computer science researchers.

Page count: 476

Publication year: 2024




Table of Contents

Cover

Table of Contents

Title Page

Copyright

Dedication

List of Figures

About the Authors

Foreword

Preface

Acknowledgments

Acronyms

Introduction

References

1 Motivation

1.1 Introduction

1.2 Attack Graphs

1.3 Cyber Terrain

1.4 Penetration Testing

1.5 AI Reinforcement Learning Overview

1.6 Organization of the Book

References

2 Overview of Penetration Testing

2.1 Penetration Testing

2.2 Importance of Data

2.3 Conclusion

References

3 Reinforcement Learning: Theory and Application

3.1 An Introduction to Reinforcement Learning (RL)

3.2 RL and Markov Decision Processes

3.3 Learnable Functions for Agents

3.4 Enter Deep Learning

3.5 Q-Learning and Deep Q-Learning

3.6 Advantage Actor-Critic (A2C)

3.7 Proximal Policy Optimization

3.8 Conclusion

References

4 Motivation for Model-driven Penetration Testing

4.1 Introduction

4.2 Limits of Modern Attack Graphs

4.3 RL for Penetration Testing

4.4 Modeling MDPs

4.5 Conclusion

References

5 Operationalizing RL for Cyber Operations

5.1 A High-Level Architecture

5.2 Layered Reference Model

5.3 Key Challenges for Operationalizing RL

5.4 Conclusions

References

6 Toward Practical RL for Pen-Testing

6.1 Current Challenges to Practicality

6.2 Practical Scalability in RL

6.3 Model Realism

6.4 Examples of Applications

6.5 Realism and Scale

References

7 Putting it Into Practice: RL for Scalable Penetration Testing

7.1 Crown Jewels Analysis

7.2 Discovering Exfiltration Paths

7.3 Discovering Command and Control Channels

7.4 Exposing Surveillance Detection Routes

7.5 Enhanced Exfiltration Path Analysis

References

8 Using and Extending These Models

8.1 Supplementing Penetration Testing

8.2 Risk Scoring

8.3 Further Modeling

8.4 Generalization

References

9 Model-driven Penetration Testing in Practice

9.1 Recap

9.2 The Case for Model-driven Cyber Detections

References

Notes

A Appendix

Index

End User License Agreement

List of Tables

Chapter 2

Table 2.1 Levels of testing compared by speed, accuracy, and depth.

Chapter 4

Table 4.1 Attack graph generation methods.

Table 4.2 Penetration Testing Using RL with Attack Graphs. From ref. [14], 2...

Chapter 5

Table 5.1 Table of classification of elements in LRM-RAG.

Chapter 7

Table 7.1 Experimental results from CJA RL.

Table 7.2 Experimental results for exfiltration path discovery.




IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

 

IEEE Press Editorial Board
Sarah Spurgeon, Editor-in-Chief

 

Moeness Amin

Jón Atli Benediktsson

Adam Drobot

James Duncan

Ekram Hossain

Brian Johnson

Hai Li

James Lyke

Joydeep Mitra

Desineni Subbaram Naidu

Tony Q. S. Quek

Behzad Razavi

Thomas Robertazzi

Diomidis Spinellis

Reinforcement Learning for Cyber Operations

 

Applications of Artificial Intelligence for Penetration Testing

 

Abdul Rahman, PhD

Washington, D.C., USA

Christopher Redino, PhD

New York, New York, USA

Dhruv Nandakumar

Boston, MA, USA

Tyler Cody, PhD

Virginia Tech, USA

Sachin Shetty, PhD

Old Dominion University

Suffolk, Virginia, USA

Dan Radke

Arlington, VA, USA

 

 

 

 

 

Copyright © 2025 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data applied for:

Hardback ISBN: 9781394206452

Cover Design: Wiley
Cover Image: © Israel Sebastian/Getty Images

 

 

 

 

To the courageous pioneers in cybersecurity, whose unwavering commitment and bold actions protect our digital frontier: this book is humbly dedicated, with the utmost respect and gratitude, to helping your cause.

List of Figures

 

Figure 1.1 MITRE ATT&CK framework identifying the tactics for both network and end-point detection

Figure 1.2 Mapping of physical network into an attack graph

Figure 2.1 Red, blue, and purple teams working together to secure the network

Figure 2.2 Red, blue, and purple team in various industries

Figure 2.3 Machine-based penetration testing assistance workflow

Figure 2.4 Pentesting lifecycle and where machine learning can assist

Figure 2.5 Top used ports on the Internet

Figure 2.6 Exploitation paths

Figure 2.7 Ports responding to scans, revealing their state

Figure 2.8 Screen of all Nessus reporting options (file/data formats)

Figure 2.9 Nessus normal text report format example screenshot

Figure 2.10 Nessus CSV report format example screenshot

Figure 2.11 Nessus XML report format example screenshot

Figure 2.12 OpenVAS normal text report format example screenshot

Figure 2.13 OpenVAS CSV report format screenshot

Figure 3.1 RL agent interaction with environment

Figure 3.2 The process of training a deep learning model

Figure 3.3 Python code for initializing the environment and parameters in CartPole-v1 using PyTorch and OpenAI Gym

Figure 3.4 Definition of the DQN class and network architecture

Figure 3.5 Python function implementing a Boltzmann policy

Figure 3.6 Python class implementing a replay buffer

Figure 3.7 Saving a PyTorch model

Figure 3.8 A2C initial python imports

Figure 3.9 Python code implementing the actor and critic networks in A2C

Figure 3.10 Python function for generalized action estimation

Figure 4.1 Layered processes with which RL agents interact

Figure 4.2 A taxonomy for attack graph generation and analysis

Figure 4.3 A depiction of whole campaign emulation (WCE)

Figure 4.4 An example of whole campaign emulation (WCE)

Figure 5.1 The layered reference model for RL with attack graphs (LRM-RAG). Structure and behavior refer to network (and path) structure and behavior

Figure 5.2 A layered reference model for automating penetration testing using RL and attack graphs

Figure 6.1 Convergent vs. non-convergent learning. When a model does not converge on an effective policy, the number of steps per episode always bounces around some high value. For a non-convergent model, where the agent never completes its mission, the reward bounces around some noisy floor value; for convergent agents that complete their mission goals, it plateaus around an optimal value

Figure 6.2 The rough maximum number of hosts an RL methodology for cyberattack simulation has successfully scaled to over time. These limits often indicate the maximum size of network for which a model can even converge on policy and are independent of how fast a policy can be learned. Only recently are methodologies for realistically sized networks even feasible

Figure 6.3 A visualization of a hierarchical action space. At the top level, an agent only has to decide among a few options, and with each subsequent action the agent moves down the diagram, never having too many options presented to it at once

Figure 6.4 A double agent architecture has two sets of action spaces, but both act on a single environment with state variables separately relevant to each agent. Rewards resulting from the exploit agent’s actions feed back to both agents, but the structuring agent’s actions feed rewards back only to itself

Figure 6.5 Navigating terrain in traditional and cyber warfare has many parallels. The information and options available are asymmetric between attackers and defenders. Defenses and obstacles may be obvious or implicit, and proper reconnaissance, planning, and strategy can greatly affect the success of a mission

Figure 6.6 A surveillance detection route is a method for detecting potential surveillance activities, or, put another way, a method for surveillance of surveillance. This is done by exploiting an asymmetry in information. The hooded figure with the spyglass is trying to learn where he may be detected. The dark alley (a possible attack path) is dotted with streetlights, a patrol car, and a watchdog: various forms of defense the figure would have to plan around if he were to navigate the alley

Figure 6.7 As the penalty scale is increased, an agent acts in a more risk-averse manner. In the case of a reconnaissance mission, the agent will explore less of the network

Figure 6.8 Diagram of network nodes within a few hops of a high-value target (crown jewels); blue nodes are two hops or less from the target

Figure 6.9 The shortest path to exfiltrate data from an infected host to the internet is often safer, as it crosses fewer defenses; the path on the right above, for example, crosses fewer firewalls than the path on the left

Figure 6.10 An attacker that sticks to a particular protocol preferentially may see longer exfiltration paths, but they may also avoid detection by blending in with benign or otherwise unmonitored traffic over the network

Figure 6.11 A diagram of a typical ransomware incident. An otherwise healthy machine (or network of machines) acquires malware in some way; the malware phones home to an attacker, who in turn encrypts high-value assets, which remain encrypted until the attacker receives payment from the victims

Figure 6.12 Overall view of multi-objective RL with preference-adjusted rewards for structuring and exploiting agents

Figure 7.1 The CJA algorithm and network setup

Figure 7.2 Training metrics and plots for exfiltration path discovery

Figure 7.3 Reward accrual across episodes for command and control channel discovery

Figure 7.4 Times of upload actions taken by agent

Figure 7.5 Training plots for SDR path analysis

Figure 7.6 SDR paths at varying penalty scales between models

Figure 7.7 Enhanced exfiltration path training on first network [1]

Figure 7.8 Enhanced exfiltration path training on second network [1]

Figure 7.9 Enhanced exfiltration path analysis

Figure 8.1 UI Mockup of penetration testing planning application

Figure 8.2 An example diagram of how multiple cyber defense models could interact, as compartmentalized parts of understanding how an attack actually happens. This looks advanced and complicated compared to the current state today, but even this aspirational image is still analogous to having separate brains for walking and chewing gum

Figure 8.3 An example of metric learning, specifically triplet loss, where examples belonging to similar classes (in blue) are pulled together in the latent space as the model learns, while examples from different classes (in red here) are simultaneously pushed farther apart. The technique is named for computing the loss over sets of three examples: an anchor “A” for reference, a positive “P” of the same class, and a negative “N” of some other class

Figure 8.4 The aggregate result of metric learning is a latent space with meaningful arrangement and distances (metrics) between examples such that the space encodes a conceptual understanding. The colored dots representing examples belonging to different classes are sorted and separated out as the underlying conceptual idea of the different classes becomes more crisp and defined in the model

Figure 9.1 Training performance of DAA and A2C agents with different penalty scales

Figure 9.2 Network diagram showing the SDR for various penalty scale factors

Figure 9.3 MDPS from attack graphs

Figure 9.4 Overview of stages involved in the deployment of command and control (C2) infrastructure

Figure 9.5 RL recommendation of best path to build for C2 channel based on visibility considerations

About the Authors

Dr. Abdul Rahman holds PhDs in physics, mathematics, and information technology (cybersecurity), and has expertise in cybersecurity, big data, blockchain, and analytics (AI/ML).

Dr. Christopher Redino holds a PhD in theoretical physics and has extensive data science experience in every part of the AI/ML lifecycle.

Mr. Dhruv Nandakumar has extensive data science expertise in deep learning.

Dr. Tyler Cody is an Assistant Research Professor at the Virginia Tech National Security Institute.

Dr. Sachin Shetty is a Professor in the Electrical and Computer Engineering Department at Old Dominion University and the Executive Director of the Center for Secure and Intelligent Critical Systems at the Virginia Modeling, Analysis and Simulation Center.

Mr. Dan Radke is an Information Security professional with extensive experience in both offensive and defensive cybersecurity.

Foreword

I first met Dr. Abdul Rahman when he was serving as research faculty at the Hume Center at Virginia Tech, and I was building an AI Center of Excellence at one of the largest companies in the world. Abdul’s energy and creativity were the first things I noticed, quickly followed by his expertise, acquired through decades of experience as a cybersecurity professional. All of this provides a continuous source of unique insights into where AI/ML (machine learning) can make the biggest impact in protecting organizations from cyber threats. What has been equally impressive to me is his ability to build teams, partner, and collaborate. This book and the group of talented authors assembled to write it are a great example of his ability to build impressive and productive teams.

This work is timely as the nature of cyber threats is evolving at an accelerated pace as attack surfaces continue to expand, nation-state actors become more sophisticated, and ML provides a force multiplier for adversaries. As ML (a subset of artificial intelligence [AI]) becomes more sophisticated, organizations are going to be required to use ML techniques to respond. ML will be a force multiplier for organizations as a small number of operators with AI tools can defend against a higher volume of activities with faster response times. In addition, AI/ML will provide detection capabilities outside the ability of existing rules-based technologies and humans to detect. The norm will be using “AI to combat AI.”

Reinforcement learning (RL), as an ML technique, thrives in situations where a physical environment is being modeled. Trial and failure, or reward and penalties, are central to constructing RL models. A network is an excellent environment for applying RL: it’s intuitively easy to understand the concept of an agent navigating a network and receiving rewards for remaining undetected and penalties when encountering well-secured nodes.

I’ve had the pleasure of working with the authors on a number of ML research projects designed to improve the ability of organizations to detect adversarial campaigns. They have published numerous papers in peer-reviewed scientific forums, a few of which I have had the pleasure of coauthoring. The work is industry-leading and has been deployed at commercial, government, and nonprofit organizations. This esteemed group of authors has successfully bridged the gap between academic research and commercial deployment in a powerful way. The contents of this book are not just academic; these techniques have been applied successfully in the real world. As the age of AI continues to evolve, rapidly translating research into practical commercial tools is going to be critical.

The authors have done a wonderful job of telling the story of the research, development, and application journey that we have been on for the last few years. I hope that you enjoy the book and take away a few of the many useful lessons contained within. Happy reading.

Edward Bowen, PhD

Preface

Artificial intelligence (AI)-driven penetration testing using reinforcement learning (RL) to support cyber defense operations (i.e., blue teaming) enables workflow optimizations that increase visibility, streamline analyst workflows, and improve cyber detections. Similarly, teams dedicated to developing cyber resilience plans through penetration testing (i.e., red–purple–blue teaming) can leverage this type of RL to identify exploitable weaknesses and improve an organization’s cyber posture.

                                                      

Abdul Rahman

Christopher Redino

Dhruv Nandakumar

Tyler Cody

Sachin Shetty

Dan Radke

USA

Acknowledgments

We thank our esteemed professional colleagues. We also thank the fantastic editorial staff at Wiley-IEEE: Indirakumari S., Mary Hatcher, and Victoria Bradshaw, for their patience, dedication, and professionalism. We thank our reviewers for their comments and recommendations on improving the manuscript. We also thank our families and each other for the effort, time, and dedication required to see this project through. We are grateful for Dr. Edward Bowen’s unwavering support in incubating and nurturing our efforts from their inception. Dr. Sachin Shetty encouraged our group at the outset to pursue this manuscript; we are very grateful for his collaboration and support. Special thanks to Irfan Saif, Emily Mossberg, Mike Morris, Will Burns, Eric Dull, Dr. Laura Freeman, Dr. Peter Beling, Adnan Amjad, Deb Golden, Eric Hipkins, Joe Nehila, Joe Tobolski, and Ranjith Raghunath for their encouragement and support.

Dr. Abdul Rahman is grateful for his family, coauthors, colleagues, and the great staff at Wiley for supporting this effort.

Dr. Christopher Redino thanks Ren Redino, his own little neural network that he did not design, but he does try to train.

Dhruv Nandakumar would like to thank his family and friends for their invaluable support through the authorship process. He would also like to thank his incredibly talented co-authors without whom this would have not been possible.

Dr. Tyler Cody is grateful to his wife, Bingxu, and their loyal dog, Meimei.

Dr. Sachin Shetty is grateful for his family and coauthors.

Dan Radke extends his deepest thanks to his incredible wife, Danah, for her love and infinite patience. She is the steadfast anchor to reality and the brilliant spark that fuels his journey to dream big and venture into the wonderfully weird.

                    

Abdul Rahman

Christopher Redino

Dhruv Nandakumar

Tyler Cody

Sachin Shetty

Dan Radke

Acronyms

A2C

advantage actor-critic

AD

active directory

ALE

annualized loss exposure

APT

advanced persistent threat

AV

antivirus

C2

command and control

CISO

chief information security officer

CPE

common platform enumeration

CVE

common vulnerabilities and exposures

CVSS

common vulnerability scoring system

CWE

common weakness enumeration

DAA

double agent architecture

DLP

data loss prevention

DQN

deep Q-network

EDR

endpoint detection and response

EPP

endpoint protection

HID

host-based intrusion detection

HIP

host-based intrusion prevention

IAPTS

intelligent automated penetration testing system

IDS

intrusion detection system

IPB

intelligence preparation of the battlefield

IPS

intrusion prevention system

LDAP

lightweight directory access protocol

LEF

loss event frequency

LM

loss magnitude

LRM-RAG

layered reference model for RL with attack graphs

MDP

Markov decision process

MOL

multi objective learning

MORL

multi objective reinforcement learning

MTL

multi task learning

Nmap

network mapper

OSINT

open source intelligence

PII

personally identifiable information

PPO

proximal policy optimization

PT

penetration testing

RL

reinforcement learning

SDA

situational domain awareness

SDR

surveillance detection routes

SMB

server message block

SNMP

simple network management protocol

TEF

threat event frequency

TTP

tactics, techniques, and procedures

YAML

YAML ain’t markup language

Introduction

This book focuses on using reinforcement learning (RL), a type of artificial intelligence (AI), to learn optimal network-path traversal and surface valuable operational details for cyber analysts. Bad-actor intent inspires network path traversal toward adversarial campaign goals. Among an adversary’s key objectives is moving through the target network while minimizing detection. Cyber defenders, in turn, strive to place sensors at key network locations and to optimize the visibility that facilitates detecting hackers, threats, and malicious software (malware). The AI proposed in this book is designed to serve both goals.

A network consists of devices (switches, routers, firewalls, intrusion detection systems, intrusion prevention systems, endpoints, and various security appliances) deployed to maximize protection of the organization’s information technology (IT) resources. Anticipating vulnerabilities and potential exploits to enhance cyber defense, commonly called the “blue picture,” typically requires analyzing data collected from sensors and agents across diverse networks. Understanding patterns in observed network activity involves careful curation and processing of this data to support developing, training, testing, and validating AI/machine learning (ML)/analytics models that detect nefarious behaviors. Although this traditional approach has yielded results in the past, this book presents a different methodology for pinpointing weaknesses in networks: it discusses how to translate physical networks into logical representations called attack graphs and how to employ RL models to predict vulnerabilities, optimal visibility, and weaknesses within those network topologies.
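As a rough illustration of that translation (the host names, CVE identifiers, and scores below are hypothetical, not drawn from the book’s experiments), a physical topology plus per-host vulnerability findings can be flattened into attack-graph edges:

```python
# Sketch: turning a physical topology plus vulnerability-scan findings into an
# attack graph. All hosts, CVE identifiers, and scores here are illustrative.

# Physical connectivity: which hosts can reach which over the network.
topology = {
    "workstation": ["fileserver", "webserver"],
    "webserver": ["dbserver"],
    "fileserver": ["dbserver"],
    "dbserver": [],
}

# Exploitable services found per host (e.g., from a Nessus or OpenVAS scan),
# as (CVE identifier, CVSS score) pairs.
vulns = {
    "webserver": [("CVE-XXXX-0001", 7.5)],
    "fileserver": [("CVE-XXXX-0002", 9.8)],
    "dbserver": [("CVE-XXXX-0003", 8.1)],
}

def build_attack_graph(topology, vulns):
    """An edge (src, dst, cve, score) exists when src can reach dst over the
    network and dst exposes an exploitable vulnerability."""
    edges = []
    for src, neighbors in topology.items():
        for dst in neighbors:
            for cve, score in vulns.get(dst, []):
                edges.append((src, dst, cve, score))
    return edges

edges = build_attack_graph(topology, vulns)
for edge in edges:
    print(edge)
```

Real attack-graph generators also encode exploit preconditions and postconditions; this sketch captures only the reachability-plus-vulnerability backbone that the RL agent later walks.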

Adversarial attack campaigns vary by intention, from establishing a foothold to full disruption. A core premise in most advanced persistent threat (APT) and other sophisticated adversaries’ playbooks is developing an understanding of the environment (hosts, devices, network, etc.), including the roles and relative positioning of relevant devices in the network. Adversaries build “connectivity maps” while developing attack campaigns to inform, shape, and focus their goals. Among these goals is learning about the environment via undetected reconnaissance to ultimately map out the attack paths, systems, and applications leading to the compromise of key assets (also called crown jewels) or the theft of the data within them (also called exfiltration).

In this book, the physical network has to be translated into a logical structure called an attack graph [2] so the RL AI can provide useful predictions. In RL, a reward is attributed to the detection value of devices in the network, including their inherent “cyber terrain” characterization [1]. The RL AI presented in this book, inspired by adversarial attack campaigns, motivates different reward systems based on the network’s topology and cyber terrain, informed by likely key cyber campaign objectives. Examples of these objectives include optimal paths to exfiltrate data or the best location in the network from which to perform reconnaissance without being detected (e.g., a surveillance detection route). We will discuss the requirements for building the RL AI to support deeper and broader coverage within the book, along with several cyber-specific use cases that are useful for cyber analysts, threat researchers, cyber hunters, and cyber operations teams.

Traditional network penetration tests are one of the best ways to turn these network representations into network protection realities. Specialized teams with highly technical expertise perform these tests. This expertise spans software design, network transmission protocols and standards, and how businesses and private persons use computers. The human element of the testing is irreplaceable, the variables of the vulnerabilities are complex, and no network is like any other. These complexities require using various technologies, tools, and applied knowledge during a test. However, the results of the tests are often acquired through manual and tedious means due to the overabundance of information, logs, and network endpoints that must be sufficiently aggregated to hunt for vulnerabilities. Using automated tooling to collect information is not a new concept, yet network defenders and red teams still face the significant challenge of analyzing post-collection intelligence to enable “sense-making”. Meanwhile, AI capabilities have drastically improved over the past five years. The RL AI model approach proposed in this book can be used to sift through mountains of previously ignored data to find patterns, anomalies, and graph-linked insights, helping analysts and operators accelerate their ability to thwart bad actors with malicious intent.

References

  

1

Greg Conti and David Raymond.

On cyber: towards an operational art for cyber conflict

. Kopidion Press, 2018.

  

2

Xinming Ou, Wayne F Boyer, and Miles A McQueen. A scalable approach to attack graph generation. In

Proceedings of the 13th ACM Conference on Computer and Communications Security

, CCS ’06, pages 336–345, New York, NY, USA, 2006. Association for Computing Machinery.

1 Motivation

1.1 Introduction

Reinforcement learning (RL) applied to penetration testing has demonstrated feasibility, especially when considering constraints on the representation of attack graphs, such as scale and observability [8]. As adversaries build high-fidelity maps of target networks through reconnaissance methods that populate network topology pictures, a deeper understanding of optimal paths for conducting (cyber) operations emerges. In this respect, cyber defenders (blue teams) and cyberattackers (red teams) both employ principles of visibility optimization to improve or diminish detection efficacy. Protecting or exploiting targets within complex network topologies requires understanding that detection evasion depends on keenly traversing paths that reduce visibility. Conversely, knowledge of such paths empowers blue teams by identifying “blind spots” and weaknesses in the network where defenses can be improved. This book is motivated by the prevailing belief that an attack campaign is a well-planned orchestration by well-equipped adversaries involving keen traversal of network paths to avoid detection. Figure 1.1 depicts the “bookends” of the MITRE ATT&CK framework as the tactics that can be detected through network detection (ND), whereas the “internal” part of the framework covers tactics that can be detected through end-point (EP) detection. In this methodology, artificial intelligence (AI) models can learn through direct interaction with attack graphs enriched with scan data derived from EP and network information, rather than relying on a fixed and carefully curated dataset. This book aims to offer blue team and red team operations staff the ability to utilize the RL AI methods developed here to improve visibility, optimize operations, clarify cyber planning objectives, and improve overall cybersecurity posture.

Figure 1.1 MITRE ATT&CK framework identifying the tactics for both network and end-point detection.

This chapter introduces the key technical elements that provide a foundation for the use of AI. MITRE’s Fort Meade Experiment (FMX) research environment [25] was used to systematically categorize adversary behavior during structured emulation exercises within test networks. In 2010, FMX served as a dynamic testing ground, providing researchers with a “living lab” capability to deploy tools and refine ideas for more effective threat detection within the MITRE corporate network [25]. MITRE initiated research within FMX to expedite the detection of advanced persistent threats (APTs) under an “assume breach” mindset. Periodic cyber game exercises emulated adversaries in a closely monitored environment, while threat hunting tested analytic hypotheses against collected data, all with the overarching goal of enhancing post-compromise threat detection in enterprise networks through telemetry sensing and behavioral analytics [1, 16, 25].

ATT&CK played a central role in the FMX research, initially crafted in September 2013 with a primary focus on the Windows enterprise environment. Over time, it underwent refinement through internal research and development, leading to its public release in May 2015 with 96 techniques organized under 9 tactics. After its release, ATT&CK experienced substantial growth, fueled by contributions from the cybersecurity community. MITRE introduced additional ATT&CK-based models, expanding beyond Windows to include Mac and Linux in 2017 (ATT&CK for Enterprise). Other models, such as PRE-ATT&CK (2017), ATT&CK for Mobile (2017), ATT&CK for Cloud (2019), and ATT&CK for ICS (2020), addressed specific domains [16, 25].

ATT&CK, functioning as a knowledge base for cyber adversary behavior and taxonomy, consists of two parts: ATT&CK for Enterprise (covering behavior against enterprise IT networks and cloud) and ATT&CK for Mobile (focusing on behavior against mobile devices). Its inception in 2013 aimed to document common tactics, techniques, and procedures (TTPs) employed by APTs on Windows enterprise networks within the context of the FMX research project.

The framework’s significance lies in documentation and in providing behavioral observables for detecting attacks by analyzing cyber artifacts. It employs the structure of TTP to help analysts understand adversarial actions, organize procedures, and fortify defenses. Despite highlighting numerous techniques, ATT&CK falls short in offering insights into how adversaries combine techniques, emphasizing the need for well-defined technique associations for constructing TTP chains. The TTP structure enables analysts to categorize adversarial actions into specific procedures related to particular techniques and tactics, facilitating an understanding of an adversary’s objectives and enhancing defense strategies. In addition, these techniques and procedures also serve as indicators of behavior for detecting attacks by scrutinizing cyber artifacts obtained from network and end-system sources [16].

While MITRE ATT&CK comprehensively outlines various techniques an adversary may employ, the necessary associations between techniques for constructing TTP chains remain insufficiently specified. Establishing these associations is crucial as they assist analysts and operators in reasoning about adversarial behavior and predicting unobserved techniques based on those observed in the TTP chain (i.e., unknown behavior). Without well-defined technique associations, cybersecurity professionals face challenges in efficiently navigating the growing search space, especially as the number of TTP chains expands exponentially with the increasing variety of techniques [1]. Given the limited exploration of technique correlations to date, ATT&CK concentrates on acquiring knowledge about the associations between attack techniques, revealing interdependencies, and relationships through analyzing real-life attack data [1, 16, 25].

In classifying reported attacks, the framework distinguishes between APTs and software attacks. APT attacks align with MITRE’s threat actor “groups,” while software attacks encompass various malicious code forms. Each comprises post-exploit techniques constituting the TTP chain of APTs or software. They use discrete variables, specifically asymmetrical binary variables, with outcomes of 0 or 1 representing negative or positive occurrences of a technique in an attack instance, respectively [1].
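
The binary encoding described above can be sketched in a few lines of Python. The technique IDs and attack instances here are illustrative placeholders, not entries from the ATT&CK corpus:

```python
# Encode attack instances as asymmetric binary vectors over a fixed
# vocabulary of ATT&CK technique IDs: 1 = technique observed in the
# attack instance, 0 = not observed.
TECHNIQUES = ["T1059", "T1078", "T1021", "T1041"]  # example technique IDs

def encode(observed: set) -> list:
    """Map an attack instance's observed techniques to a 0/1 vector."""
    return [1 if t in observed else 0 for t in TECHNIQUES]

# Two hypothetical attack instances (an APT group and a software attack)
apt_instance = encode({"T1059", "T1041"})
malware_instance = encode({"T1078"})

print(apt_instance)      # [1, 0, 0, 1]
print(malware_instance)  # [0, 1, 0, 0]
```

Vectors of this form are what allow association analysis across attack instances, since co-occurrence of techniques reduces to comparing positions in the vectors.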

The notable limitations outlined by MITRE include a subset representation of techniques and potential biases in heuristic mappings. There is a need to characterize APT and software attacks driven by a requirement to continually evaluate the evolving threat landscape dynamics. MITRE acknowledges a constraint in their data collection process, emphasizing that APT and software attacks may not represent the full spectrum of techniques employed by associated threat actors. Instead, the framework offers a subset based on publicly available reporting, making it challenging to ascertain the true techniques employed in an adversarial attack campaign (i.e., the definition of operational ground truth). Second, the framework is subject to mapping biases, where heuristics and automated mappings of threat reports to techniques may inadvertently exhibit bias. Recognizing these limitations, MITRE ATT&CK and the processes/workflows to keep it up to date embody an approach for continually refining the characterization of APT and software attacks informing on the scope of possible adversarial TTPs in an attack campaign [1]. While not optimal, to date, this represents one of the few broadly accepted methodologies for characterizing adversarial workflows [25].

1.1.1 Cyberattack Campaigns via MITRE ATT&CK

A cyberattack campaign can follow a workflow captured within the MITRE ATT&CK [16] framework. The workflow progresses from left to right, where the left column represents the collection of tactics for preplanning. Figure 1.1 depicts the “bookends” of the kill chain as being relegated to ND capabilities, while the middle portion typically aligns with activities on EPs. Attack campaigns start with a goal that may involve data exfiltration from key information technology (IT) assets, typically called crown jewels (CJs). These systems, databases, and devices are of high value within an organization. While a large emphasis is placed on detecting both known and unknown threats through agents deployed on EPs, new low signal-to-noise (STN) sophisticated adversarial attacks evade most detection capabilities. Unfortunately, efforts to build better EP detections that trigger indicators of compromise (IOCs) for new threats are typically hit or miss [12, 13, 15, 23, 26].

1.2 Attack Graphs

The flaw hypothesis model outlines a general process involving the gathering of information about a system, formulating a list of hypothetical flaws (generated, e.g., through domain expert brainstorming), prioritizing the list, testing hypothesized flaws sequentially, and addressing those that are discovered. McDermott emphasizes the model’s applicability to almost all penetration testing scenarios [14]. The attack tree model introduces a tree structure to the information-gathering process, hypothesis generation, etc., offering a standardized approach to manual penetration testing and providing a foundation for automated penetration testing methods [19, 20]. The attack graph model introduces a network structure, distinguishing itself from the attack tree model in terms of the richness of topology, and the corresponding amount of information required to specify the model [7, 14, 18].

Automated penetration testing, integrated into practice [24], relies on the attack tree and attack graph models as their foundation. In RL, these models involve constructing network topologies, treating machines (i.e., servers and network devices) as vertices and links between machines as edges. Variations include additional details about subnetworks and services. In the case of attack trees, probabilities are assigned to branches between parent and child nodes. For attack graphs, transition probabilities between states are assigned to each edge. This is described in more detail in Chapters 3 and 4.
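
As a concrete, deliberately tiny illustration of the attack graph model, the sketch below treats machines as vertices and assigns a transition probability to each directed edge. The host names and probability values are hypothetical assumptions, not a construction used elsewhere in this book:

```python
# Minimal attack-graph sketch: machines are vertices, exploitable links are
# directed edges, and each edge carries an (assumed) transition probability,
# i.e., the chance that an attack action along that edge succeeds.
attack_graph = {
    "internet":    {"web_server": 0.9},
    "web_server":  {"app_server": 0.6, "workstation": 0.4},
    "app_server":  {"database": 0.5},
    "workstation": {"database": 0.3},
    "database":    {},  # target (crown jewel): no outgoing edges
}

def path_success_probability(path: list) -> float:
    """Probability that every hop on the path succeeds (hops independent)."""
    p = 1.0
    for src, dst in zip(path, path[1:]):
        p *= attack_graph[src][dst]
    return p

# 0.9 * 0.6 * 0.5 ≈ 0.27
print(path_success_probability(["internet", "web_server", "app_server", "database"]))
```

In an attack tree the same information would hang off a single root; the graph form admits multiple entry points and converging paths, which is exactly the richer topology referred to above.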

While many advantageous properties of attack trees persist in attack graphs, it remains uncertain whether attack graphs can outperform attack trees in systems that are largely undocumented, i.e., systems with partial observability [14]. RL for penetration testing utilizes the attack graph model, treating the environment either as a Markov decision process (MDP), reflecting classical planning with deterministic actions and known network structure, or as a partially observable Markov decision process (POMDP), where action outcomes are stochastic and network structure and configuration are uncertain [9, 21, 28].
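
To make the POMDP setting concrete, the toy sketch below shows how an agent with uncertain knowledge of a host's configuration might update its belief after a noisy scan observation using Bayes' rule. The prior and sensor rates are illustrative assumptions, not values used in later chapters:

```python
# Toy POMDP-style belief update: the agent is unsure whether a host runs a
# vulnerable service. A scan observation is noisy, so the belief over the
# hidden state is revised with Bayes' rule.

prior = 0.3                     # assumed prior P(host is vulnerable)
p_report_given_vuln = 0.8       # assumed true-positive rate of the scan
p_report_given_safe = 0.1       # assumed false-positive rate of the scan

def update_belief(belief: float, scan_says_vulnerable: bool) -> float:
    """Posterior P(vulnerable | scan result) via Bayes' rule."""
    if scan_says_vulnerable:
        num = p_report_given_vuln * belief
        den = num + p_report_given_safe * (1 - belief)
    else:
        num = (1 - p_report_given_vuln) * belief
        den = num + (1 - p_report_given_safe) * (1 - belief)
    return num / den

b = update_belief(prior, scan_says_vulnerable=True)
print(round(b, 3))  # 0.774: one positive scan sharply raises the belief
```

In the MDP setting this machinery is unnecessary because the state is fully observed; the belief update is what the POMDP formulation adds when network structure and configuration are uncertain.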

1.3 Cyber Terrain

The foundational concept of terrain is integral to intelligence preparation of the battlefield (IPB) [5, 10]. In the physical realm, terrain pertains to land and its features. In [5], a definition of cyber terrain is given as “the systems, devices, protocols, data, software, processes, cyber personas, and other network entities that comprise, supervise, and control cyberspace.” Cyber terrain emphasizes operations at the strategic, operational, and tactical levels, encompassing elements such as transatlantic cables and satellite constellations, telecommunications offices and regional data centers, and wireless spectrum and local area network protocols. The use of RL engages in the logical plane of cyber terrain, which includes the data link, network, network transport, session, presentation, and application layers (i.e., layers 2–7) of the open systems interconnection (OSI) model [11].

Terrain analysis typically follows the OAKOC framework, consisting of observation and fields of fire (O), avenues of approach (A), key and decisive terrain (K), obstacles (O), and cover and concealment (C). These notions from traditional terrain analysis can be applied to cyber terrain [2, 5]. For example, fields of fire may concern all that is network reachable (i.e., line of sight), and avenues of approach may consider network paths inclusive of available bandwidth [5]. In previous work, we used obstacles to demonstrate how our methodology can bring the first part of the OAKOC framework to attack graph construction for RL [8]; the RL described in this book builds on that approach. Cyber terrain functions for each type of device are annotated into the attack graphs prior to the RL running over them (Figure 1.2).
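
One way to picture this annotation step is the sketch below, where terrain attributes (an obstacle firewall, a key-terrain crown jewel) are attached to attack-graph vertices and folded into the reward an RL agent would receive for entering a node. The device roles, terrain labels, and numeric values are illustrative assumptions only:

```python
# Sketch of annotating cyber terrain onto attack-graph vertices before RL
# runs over them. Following the OAKOC obstacle idea, defensive devices such
# as firewalls impose a traversal penalty (extra cost / detection risk),
# while key terrain (a crown jewel) carries the goal reward.
nodes = {
    "web_server":  {"role": "server",   "terrain": None},
    "fw_internal": {"role": "firewall", "terrain": "obstacle"},
    "database":    {"role": "server",   "terrain": "key_terrain"},
}

TERRAIN_PENALTY = {"obstacle": -5.0, "key_terrain": 0.0, None: 0.0}
GOAL_REWARD = {"key_terrain": 100.0}

def step_reward(node: str) -> float:
    """Reward for entering a node: terrain penalty plus a large bonus
    if the node is a key-terrain goal (crown jewel)."""
    terrain = nodes[node]["terrain"]
    return TERRAIN_PENALTY[terrain] + GOAL_REWARD.get(terrain, 0.0)

print(step_reward("fw_internal"))  # -5.0 (obstacle penalizes traversal)
print(step_reward("database"))     # 100.0 (reaching the crown jewel)
```

Because the penalties are baked into the graph before training, the RL agent learns to route around heavily defended terrain without any terrain-specific logic in the learning algorithm itself.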

Figure 1.2 Mapping of physical network into an attack graph.

1.4 Penetration Testing

Penetration testing (pen testing) is defined by Denis et al. as, “a simulation of an attack to verify the security of a system or environment to be analyzed... through physical means utilizing hardware, or through social engineering” [6]. They continue by emphasizing that penetration testing is different from port scanning. Specifically, if port scanning is looking through binoculars at a house to identify entry points, penetration testing is having someone break into the house.

Pen testing is part of broader vulnerability detection and analysis, which typically combines penetration testing with static analysis [3, 4, 22]. Penetration testing models have historically taken the form of the flaw hypothesis model [17, 27], the attack tree model [19, 20], or the attack graph model [7, 14, 18]. A detailed discussion of penetration testing is covered in Chapter 2.

1.5 AI Reinforcement Learning Overview

In the sophisticated arena of cybersecurity, the integration of AI and, more specifically, RL heralds a transformative approach to enhancing penetration testing within network systems. This introductory section foreshadows the comprehensive exploration in Chapter 3, setting the stage for an in-depth discussion on the pivotal role of RL in devising more efficient and robust penetration testing methodologies.

RL emerges as a quintessential paradigm for penetration testing, attributed to its inherent adaptability and sophisticated decision-making properties. While conventional machine learning methodologies excel in prediction-based tasks, they often falter in the face of dynamic and unpredictable scenarios characteristic of network security. In contrast, RL excels by learning to formulate strategies through interaction, thereby making it exceptionally suitable for the multifaceted and unpredictable domain of network security.

Central to RL is the concept of an agent that learns to make decisions by interacting with its environment to achieve a defined objective. This paradigm mirrors the process of a penetration tester navigating through a network, making strategic decisions at each juncture to delve deeper while remaining undetected. With each interaction, the agent acquires knowledge, incrementally refining its strategy to maximize efficacy – akin to how a human tester enhances their techniques through experience.

Chapter 3 is dedicated to elucidating the theoretical and mathematical underpinnings of RL. It commences with a delineation of the environment, states, actions, and rewards, progressing to dissect the MDPs. MDPs offer a mathematical framework to model decision-making scenarios where outcomes are influenced by both randomness and the decisions of the agent, resonating deeply with the unpredictable nature of penetration testing.
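
As a preview of the machinery Chapter 3 formalizes, the sketch below runs tabular Q-learning over a toy, deterministic attack-graph MDP: states are hosts, actions are moves along edges, and the agent pays a small cost per step while earning a large reward for reaching the target. The topology, rewards, and hyperparameters are illustrative assumptions only:

```python
import random

# Tabular Q-learning on a toy attack-graph MDP (all values illustrative).
EDGES = {
    "entry":       ["web_server"],
    "web_server":  ["app_server", "workstation"],
    "app_server":  ["database"],
    "workstation": ["database"],
    "database":    [],            # terminal target node (crown jewel)
}
STEP_COST, GOAL_REWARD = -1.0, 100.0
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s, acts in EDGES.items() for a in acts}

def choose_action(state: str) -> str:
    """Epsilon-greedy action selection over the available edges."""
    actions = EDGES[state]
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

random.seed(0)
for _ in range(500):                   # training episodes
    state = "entry"
    while EDGES[state]:                # until a terminal node is reached
        action = choose_action(state)
        next_state = action            # deterministic move along the edge
        reward = STEP_COST + (GOAL_REWARD if next_state == "database" else 0.0)
        future = max((Q[(next_state, a)] for a in EDGES[next_state]), default=0.0)
        Q[(state, action)] += ALPHA * (reward + GAMMA * future - Q[(state, action)])
        state = next_state

# The learned policy greedily follows the highest-valued edge from each state.
policy = {s: max(acts, key=lambda a: Q[(s, a)]) for s, acts in EDGES.items() if acts}
print(policy)
```

Even this toy loop exhibits the structure the chapter will dissect: an agent, an environment, a reward signal, and a value table that converges toward the discounted return of the best attack path.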

The discourse will extend to deep reinforcement learning (DRL), highlighting how neural networks are employed to manage high-dimensional inputs and complex policy formulations. The capability of DRL to process and make informed decisions based on extensive and intricate data is indispensable for navigating and exploiting sophisticated network architectures.

As readers embark on the journey through Chapter 3, it is imperative to recognize that the exploration of RL is not merely about comprehending algorithms but about appreciating their transformative potential in redefining penetration testing. The forthcoming chapter will furnish the technical acumen necessary to fully understand these concepts and explore their applicability to real-world security challenges.

The application of RL in penetration testing signifies a substantial advancement, offering a methodology that learns, adapts, and dynamically optimizes strategies. As readers proceed, they should remain cognizant of the potential of these techniques not only to understand network vulnerabilities but also to anticipate and mitigate evolving security threats. Chapter 3 promises to unfold these concepts meticulously, paving the path for a new paradigm of intelligent and autonomous penetration testing.

1.6 Organization of the Book

Chapter 1 introduces the book’s focus on the intersection of AI, in particular, RL, with cybersecurity. Chapter 2 discusses current approaches to penetration testing followed by a review of RL in Chapter 3. Chapter 4 discusses the motivation for using RL for penetration testing followed by how to operationalize these RL models in Chapter 5. RL for penetration testing from a practical standpoint is covered in Chapter 6 followed by scaling considerations in Chapter 7. Extending and using these models is covered in Chapter 8 followed by the conclusion in Chapter 9.

References

  

1

Rawan Al-Shaer, Jonathan M Spring, and Eliana Christou. Learning the associations of MITRE ATT&CK adversarial techniques. In

2020 IEEE Conference on Communications and Network Security (CNS)

, pages 1–9. IEEE, 2020.

  

2

Scott D Applegate, Christopher L Carpenter, and David C West. Searching for digital hilltops.

Joint Force Quarterly

, 84(1):18–23, 2017.

  

3

Aileen G Bacudio, Xiaohong Yuan, Bei-Tseng B Chu, and Monique Jones. An overview of penetration testing.

International Journal of Network Security & Its Applications

, 3(6):19, 2011.

  

4

Brian Chess and Gary McGraw. Static analysis for security.

IEEE Security & Privacy

, 2(6):76–79, 2004.

  

5

Greg Conti and David Raymond.

On cyber: towards an operational art for cyber conflict

. Kopidion Press, 2018.

  

6

Matthew Denis, Carlos Zena, and Thaier Hayajneh. Penetration testing: concepts, attack methods, and defense strategies. In

2016 IEEE Long Island Systems, Applications and Technology Conference (LISAT)

, pages 1–6. IEEE, 2016.

  

7

Bing Duan, Yinqian Zhang, and Dawu Gu. An easy-to-deploy penetration testing platform. In

2008 The 9th International Conference for Young Computer Scientists

, pages 2314–2318. IEEE, 2008.

  

8

Rohit Gangupantulu, Tyler Cody, Paul Park, Abdul Rahman, Logan Eisenbeiser, Dan Radke, and Ryan Clark. Using cyber terrain in reinforcement learning for penetration testing. In

2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS)

, pages 1–8, 2022.

  

9

Mohamed C Ghanem and Thomas M Chen. Reinforcement learning for intelligent penetration testing. In

2018 Second World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4)

, pages 185–192. IEEE, 2018.

10

Jeffrey Guion and Mark Reith. Cyber terrain mission mapping: tools and methodologies. In

2017 International Conference on Cyber Conflict (CyCon U.S.)

, pages 105–111, 2017. doi: 10.1109/CYCONUS.2017.8167504.

11

ISO/IEC 7498-1:1994. Information technology – open systems interconnection – basic reference model: the basic model, 1999. URL

https://www.iso.org/standard/20269.html

.

12

Jason Kick. Cyber exercise playbook. Technical report MP140714, MITRE Corporation, November 2014.

https://www.mitre.org

.

13

Lachlan MacKinnon, Liz Bacon, Diane Gan, Georgios Loukas, David Chadwick, and Dimitrios Frangiskatos. Chapter 20 - Cyber security countermeasures to combat cyber terrorism. In Babak Akhgar and Simeon Yates, editors,

Strategic intelligence management

, pages 234–257. Butterworth-Heinemann, 2013. ISBN 978-0-12-407191-9. doi: 10.1016/B978-0-12-407191-9.00020-X. URL

https://www.sciencedirect.com/science/article/pii/B978012407191900020X

.

14

James P McDermott. Attack net penetration testing. In

Proceedings of the 2000 Workshop on New Security Paradigms

, pages 15–21, 2001.

15

MITRE. A practical guide to adversary engagement. Technical report, MITRE Corporation, February 2022. URL

https://engage.mitre.org

.

16

MITRE. MITRE ATT&CK framework, 2023. URL

https://attack.mitre.org

.

17

Charles P Pfleeger, Shari L Pfleeger, and Mary F Theofanos. A methodology for penetration testing.

Computers & Security

, 8(7):613–620, 1989.

18

Hadar Polad, Rami Puzis, and Bracha Shapira. Attack graph obfuscation. In

International Conference on Cyber Security Cryptography and Machine Learning

, pages 269–287. Springer, 2017.

19

Chris Salter, O Sami Saydjari, Bruce Schneier, and Jim Wallner. Toward a secure system engineering methodology. In

Proceedings of the 1998 Workshop on New Security Paradigms

, pages 2–10, 1998.

20

Bruce Schneier. Attack trees.

Dr. Dobb’s Journal

, 24(12):21–29, 1999.

21

Jonathon Schwartz and Hanna Kurniawati. Autonomous penetration testing using reinforcement learning.

arXiv preprint arXiv:1905.05965

, 2019.

22

Sugandh Shah and Babu M Mehtre. An overview of vulnerability assessment and penetration testing techniques.

Journal of Computer Virology and Hacking Techniques

, 11(1):27–49, 2015.

23

Nivedita Shinde and Priti Kulkarni. Cyber incident response and planning: a flexible approach.

Computer Fraud and Security

, 2021(1):14–19, Jan 2021. doi: 10.1016/s1361-3723(21)00009-9.

24

Yaroslav Stefinko, Andrian Piskozub, and Roman Banakh. Manual and automated penetration testing. Benefits and drawbacks. Modern tendency. In

2016 13th International Conference on Modern Problems of Radio Engineering, Telecommunications and Computer Science (TCSET)

, pages 488–491. IEEE, 2016.

25

Blake E Strom, Andy Applebaum, Doug P Miller, Kathryn C Nickels, Adam G Pennington, and Cody B Thomas. MITRE ATT&CK design and philosophy. Technical report MP180360R1, MITRE Corporation, March 2020.

26

Unal Tatar, Bilge Karabacak, and Adrian Gheorghe. An assessment model to improve national cyber security governance. In

11th International Conference on Cyber Warfare and Security: ICCWS2016

, page 312, 2016.

27

Clark Weissman. Penetration testing.

Information Security: An Integrated Collection of Essays

, 6:269–296, 1995.

28

Mehdi Yousefi, Nhamo Mtetwa, Yan Zhang, and Huaglory Tianfield. A reinforcement learning approach for attack graph analysis. In

2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE)

, pages 212–217. IEEE, 2018.

2 Overview of Penetration Testing

2.1 Penetration Testing

This chapter introduces penetration testing and provides an overview of what it involves. We will examine modern cybersecurity practices at a high level and highlight where the processes are manual, slow, expensive, and less than ideal. After a quick look at some terms and historical activities, we will delve into the specifics and see how reinforcement learning (RL) can streamline these processes to better aid the security professional. Links to tools and example methodologies are provided to help the reader understand the concepts more quickly. By the end of this chapter, you should grasp red and blue team functions, how each plays a role in providing greater cybersecurity for their clients, and how the current process can be enhanced using strategic RL techniques.

2.1.1 Introduction to Red Teaming

In this section, we define the various teams that may be involved in penetration testing, briefly look into its history, and review some of the high-level concepts involved and desired objectives or outcomes of the teams’ efforts.

2.1.1.1 Why? Reasons for Red Team Penetration Testing

Red team penetration testing is a strategic and operational cybersecurity practice used to test an organization’s defenses and find its vulnerabilities thoroughly. Through simulating real-world attack scenarios and using the same sophisticated attack techniques as the malicious actors, red teaming exposes weaknesses that traditional security measures might miss. This proactive approach enables organizations to identify risks to critical assets, prioritize their limited security efforts, and fine-tune their defense mechanisms to respond to the latest threats.

Many corporations operate in strictly regulated data protection and privacy sectors, where high stakes and extremely sensitive information are subject to regulatory compliance mandates, and failing to maintain standards results in steep penalties and consequences. These regulations are in place to protect cyber infrastructure and the sensitive information contained within. Industry leaders and the government maintain and enforce most regulations. Ranging from the Payment Card Industry (PCI) standards [18] and the Health Insurance Portability and Accountability Act (HIPAA) [8] to broad government cybersecurity standards [9], the regulations are continuously developed and updated to protect consumers and citizens.

Red teaming is integral in evaluating whether the company is meeting these standards. Ultimately, the red team penetration test empowers organizations to stay ahead of current and evolving threats, minimizing the impact of potential breaches and keeping their leadership informed. It may seem complicated, but maintaining cybersecurity is essential for continued safety and for the advancement of our digital lives. Some real-world benefits achieved by maintaining a policy of using well-executed red team simulations include:

2.1.1.1.1 Realistic Threat Simulation Unlike static security assessments or in-place methodologies, red team penetration testing simulates real-world attack scenarios. These scenarios closely mimic the tactics, techniques, and procedures (TTPs) of actual threat actors targeting networks. Using red team tactics allows organizations to understand vulnerabilities from an attacker’s perspective and assess the level of risk to their assets and their ability to detect and respond to advanced threats.

2.1.1.1.2 Advanced Threat Detection Company networks contain and handle valuable and sensitive data, ranging from personally identifiable information (PII) to trade secrets, and are extremely attractive to cyber criminals and state actors. Red teaming helps identify weaknesses that traditional security measures might not reveal, such as vulnerabilities in networks, applications, appliances, cloud environments, and physical security. Detecting these vulnerabilities and remediating them before an attacker notices them secures the network from future breaches.

2.1.1.1.3 Evaluating Security Defenses Red teaming is not just a realistic test; it also evaluates the effectiveness of an organization’s security controls. It assesses the capabilities of firewalls, intrusion detection systems (IDS