The Reign of Botnets

David Senecal
Description

A top-to-bottom discussion of website bot attacks and how to defend against them.

In The Reign of Botnets: Defending Against Abuses, Bots and Fraud on the Internet, fraud and bot detection expert David Senecal delivers a timely and incisive presentation of the contemporary bot threat landscape and the latest defense strategies used by leading companies to protect themselves. The author uses plain language to lift the veil on bots and fraud, making a topic critical to your website's security easy to understand and even easier to act on. You'll learn how attackers think, what motivates them, how their strategies have evolved over time, and how website owners have changed their own behaviors to keep up with their adversaries. You'll also discover how you can best respond to patterns and incidents that pose a threat to your site, your business, and your customers.

The book includes:

* A description of common bot detection techniques, exploring the difference between positive and negative security strategies and other key concepts
* A method for assessing and analyzing bot activity, to evaluate the accuracy of the detection and understand botnet sophistication
* A discussion of the challenge of collecting data for security purposes while balancing the ever-present need for user privacy

Ideal for web security practitioners and website administrators, The Reign of Botnets is the perfect resource for anyone interested in learning more about web security. It's a can't-miss book for experienced professionals and total novices alike.


Page count: 377

Publication year: 2024




Table of Contents

Cover

Table of Contents

Title Page

Introduction

Who Should Read This Book

1 A Short History of the Internet

From ARPANET to the Metaverse

The Different Layers of the Web

The Emergence of New Types of Abuses

The Proliferation of Botnets

Quantifying the Bot Traffic Volume on the Internet

Botnets Are Unpredictable

Bot Activity and Law Enforcement

Summary

2 The Most Common Attacks Using Botnets

Account Takeover

Account Opening Abuse

Web Scraping

Scalping: Hype Events

Carding Attacks

Spam and Abusive Language

Summary

3 The Evolution of Botnet Attacks

Incentive vs. Botnet Sophistication

HTTP Headers 101

The Six Stages of a Botnet Evolution

Botnets with CAPTCHA-Solving Capabilities

AI Botnets

The Botnet Market

Summary

4 Detection Strategy

Data Collection Strategy

Positive vs. Negative Security

The Evolution of the Internet Ecosystem

The Evolution of Detection Methods

Transparent Detection Methods

Risk Scoring

Summary

5 Assessing Detection Accuracy

Prerequisites

High-Level Assessment

Quantitative Assessment (Volume)

Feedback Loop

Response Strategy Assessment

Low-Level Assessment

Assessment Guidelines

Identifying Botnets

Botnet Case Study

Summary

6 Defense and Response Strategy

Developing a Defense Strategy

Technology Stack to Defend Against Bots and Fraud

Response Strategies

Operationalization

Defending Against CAPTCHA Farms

Summary

7 Internet User Privacy

The Privacy vs. Security Conundrum

The State of Privacy and Its Effect on Web Security

The Private Access Token Approach

Summary

References

Index

Copyright

Dedication

About the Author

About the Technical Editor

Acknowledgments

End User License Agreement

List of Illustrations

Chapter 1

Figure 1.1 The different layers of the Web

Figure 1.2 The AlphaBay marketplace on the dark web

Figure 1.3 The high-level architecture of a botnet

Figure 1.4 Traffic distribution for a sportswear company in the United State...

Figure 1.5 Traffic distribution for an auto part retailer in the United Stat...

Figure 1.6 Traffic distribution for a home improvement company in the United...

Figure 1.7 Typical traffic distribution for a website

Figure 1.8 Continuous and persistent bot activity

Figure 1.9 Persistent bot with long periods of inactivity

Figure 1.10 Punctual bot activity

Figure 1.11 One-off bot activity

Chapter 2

Figure 2.1 The three steps of account takeover

Figure 2.2 The haveibeenpwned.com portal

Figure 2.3 The Sentry MBA user interface

Figure 2.4 Phishing attack against a bank

Figure 2.5 The impact of account takeover by industry

Figure 2.6 A credential stuffing attack against an e-commerce website...

Figure 2.7 Attack distribution by country of origin

Figure 2.8 Attack distribution by botnet

Figure 2.9 An example of an HTTP header signature during an attack

Figure 2.10 Account opening fraud scheme by industry

Figure 2.11 A fraudster uses invalid email addresses when the site doesn't e...

Figure 2.12 Account opening abuse with email validation

Figure 2.13 Hotmailbox.me reselling Outlook email accounts

Figure 2.14 Temp-mail.org disposable email service portal

Figure 2.15 Large attack using many random email domains

Figure 2.16 Account creation pattern with emails from the domain Gmail.com

Figure 2.17 Account creation pattern with emails from the domain iCloud.com...

Figure 2.18 An account creation pattern with emails from the domain cantuenz...

Figure 2.19 An account creation pattern with emails from the domain cpzmars....

Figure 2.20 Account creation pattern with emails from the domain yahoo.gr

Figure 2.21 Account opening abuse ring with advanced data input validation

Figure 2.22 The outcome of scraping activity by industry

Figure 2.23 Good bot traffic distribution on a retailer site

Figure 2.24 Scraping activity from Googlebot

Figure 2.25 Scraping activity from Bingbot

Figure 2.26 Scraping activity from Applebot powering Siri

Figure 2.27 Scraping activity from OpenAI powering ChatGPT

Figure 2.28 Bargain hunting life cycle

Figure 2.29 Example of product resale on the marketplace at a premium

Figure 2.30 The business intelligence life cycle

Figure 2.31 Scalping with hype events

Figure 2.32 The checkout process supported by most botnets used for scalping

Figure 2.33 Impact of the hype event on the login endpoint

Figure 2.34 Increases in bot configuration activity preceding the events, as...

Figure 2.35 A significant increase in activity on the checkout endpoint

Figure 2.36 Excessive bot activity that subsides before the event

Figure 2.37 Gift cards used as payment method during the holiday shopping se...

Chapter 3

Figure 3.1 Attack sophistication vs. revenue potential

Figure 3.2 A simple botnet with a handful of nodes

Figure 3.3 A more advanced botnet with browser impersonation and broader dep...

Figure 3.4 A botnet with increased scale leveraging cloud-based proxy servic...

Figure 3.5 A headless browser botnet leveraging advanced proxy services

Figure 3.6 Advanced botnet with CAPTCHA-solving capabilities

Figure 3.7 The workforce distribution by country according to AntiCaptcha

Figure 3.8 The CAPTCHA solver workflow

Figure 3.9 Decision tree of an AI botnet

Chapter 4

Figure 4.1 Coverage of the attack surface with negative and positive securit...

Figure 4.2 Example of word CAPTCHAs

Figure 4.3 Examples of image CAPTCHAs

Figure 4.4 Examples of mini-game CAPTCHAs

Figure 4.5 Behavioral CAPTCHA

Figure 4.6 Detection methods included in a web application firewall

Figure 4.7 Detection methods included in bot management products

Figure 4.8 Detection methods included in an advanced bot management product...

Figure 4.9 Detection methods included in fraud detection product

Figure 4.10 High-level bot management architecture

Figure 4.11 Perceived end-user friction by type of detection and challenge...

Figure 4.12 The PoW detection workflow

Figure 4.13 Human versus bot mouse pointer trajectory

Figure 4.14 Human vs. bot mouse pointer movement over time

Figure 4.15 Human vs. bot mouse pointer velocity

Figure 4.16 Human vs. bot mouse pointer acceleration

Figure 4.17 User-behavior anomaly detection

Figure 4.18 A simple identity graph profile

Figure 4.19 Interconnection between profiles

Figure 4.20 Advanced profile interconnection

Figure 4.21 Output of the whois command on a disposable domain

Figure 4.22 Risk score band and risk classification

Chapter 5

Figure 5.1 Low- and high-risk traffic timeline

Figure 5.2 A repeating circadian pattern based on a seven-day time frame...

Figure 5.3 A circadian pattern with lower amplitude

Figure 5.4 A continuous and persistent bot activity pattern

Figure 5.5 An intermittent bot activity pattern

Figure 5.6 A sporadic bot activity pattern

Figure 5.7 A spiky bot activity pattern

Figure 5.8 A poorly simulated bot circadian pattern

Figure 5.9 A wave bot activity pattern

Figure 5.10 Traffic volume variation throughout the week for a banking site...

Figure 5.11 A seven-day traffic pattern with anomalies

Figure 5.12 Bots are unable to complete the challenge.

Figure 5.13 Humans are able to clear the challenge.

Figure 5.14 Legitimate traffic pattern by country

Figure 5.15 Bot traffic pattern by country

Figure 5.16 Traffic pattern from cloud services’ AS number

Figure 5.17 Traffic pattern from residential and mobile ISPs

Figure 5.18 Evenly distributed synchronized bot activity on several IP addre...

Figure 5.19 Sporadic activity from legitimate users

Figure 5.20 Top five User-Agents for high-risk traffic

Figure 5.21 Top five User-Agents for low-risk traffic

Figure 5.22 High-risk traffic assessment decision tree

Figure 5.23 Low-risk traffic assessment decision tree

Figure 5.24 Daily recurring and intense bot activity

Figure 5.25 Daily bot activity with a circadian pattern

Figure 5.26 Nightly square shape bot activity

Figure 5.27 Occasional but intense bot activity

Chapter 6

Figure 6.1 Bot detection life cycle

Figure 6.2 The Forrester Wave, Bot Management, Q2 2022

Figure 6.3 Defense-in-depth architecture

Figure 6.4 Bot and fraud management component architecture

Figure 6.5 Traffic distribution by risk category

Figure 6.6 Traffic distribution by risk category

Chapter 7

Figure 7.1 IP address diversity throughout the day

Figure 7.2 Communication through a proxy service or VPN

Figure 7.3 Communication through a private relay

Figure 7.4 User activity tracking through third-party cookies

Figure 7.5 Requests blocked by Disconnect.me in Firefox strict privacy mode...

Figure 7.6 Requests flagged by Disconnect.me in Firefox standard privacy mod...

Figure 7.7 The global browser software market share in 2023

Figure 7.8 The PAT workflow when the user has no tokens to redeem

Figure 7.9 The PAT workflow when the user has tokens to redeem



The Reign of Botnets

Defending Against Abuses, Bots and Fraud on the Internet

 

David Sénécal

Introduction

I've been interested in technology since a very young age, with a particular attraction to computers, even if in the late 1980s and 1990s their capabilities were limited compared to what we have today. When I finished high school, the Internet existed but was not widely available. When it came time for me to choose a major for my college application, I looked for something that would allow me to learn and work with this emerging technology. I graduated from Paul Sabatier University in Toulouse, in the South of France, in 1998, majoring in electrical engineering with a specialty in computer networking and telecommunications. Armed with this unusual high-tech degree and my knowledge of network protocols and computer programming, I started my career as a network administrator for a major insurance company in France (Les Mutuelles du Mans Assurances – MMA), overseeing and enhancing the headquarters' network and supporting more than 5,000 users. After a few years, with a solid understanding of networks and telecommunications, I felt I needed a new challenge. I moved to England to work as a multilingual technical support engineer for Azlan, a distributor of networking equipment later acquired by Tech Data. Remotely helping customers configure and install their switches, routers, and firewalls was occasionally challenging. Doing so in French, English, and German while dealing with multiple regional accents made things even more interesting. Not only did I have to learn several products, but I also sometimes helped customers configure them in unexpected ways.

Several years later, I felt like making a change in my life again, and I moved to the United States, where I started working for Akamai Technologies. There, I became more familiar with the intricacies of the Internet. My focus was initially on helping companies accelerate their websites. I worked with the top brands on the Internet from various industries, including e-commerce, travel and hospitality, media, social media, healthcare, and banking. My role quickly evolved to include helping secure their websites as well. It rapidly became apparent to me that most of the traffic on any website came from bots, causing stability issues. The tools available at the time to defend against such activity (mainly web application firewalls) were only partially effective. New tools needed to be developed to deal with the problem more effectively. So, once more, I decided to get out of my comfort zone and started building a product focused mostly on bot detection. After all, how hard could it be? This started a new phase of my career as a product architect. At the time, I thought I'd work on solving this problem for a couple of years and then move on to the next challenge. I certainly managed to address the original threat, but I did not anticipate how it would evolve. More than 10 years later, I am still working on bot management.

Bot management products evolved rapidly and grew in complexity while becoming a must-have for protecting life online. However, existing knowledge on bot and fraud detection is fragmented and surrounded by misconceptions fueled by marketing pitches, myths, and sometimes outdated best practices. This makes the subject confusing and frustrating for web security professionals and website owners to understand. That lack of understanding prevents them from dealing with the problem effectively, ultimately benefiting fraudsters.

While building bot management products, educating security professionals became a big part of my mission. My peers, the sales force, the product support staff, and, more importantly, customers looking to use my products to protect their online businesses needed to be trained. Good content that gets to the heart of the problem in simple terms is hard to find, verging on nonexistent. So, I thought: maybe I should write a book! Because, after all, how hard could it be? It turns out it's not easy, but it is somewhat easier and less time-consuming than building a bot management product! I persevered and wrote this book to fill the knowledge gap on the threat landscape and defense strategies. I want to unveil the mystery, clear up some misconceptions, clarify best practices, and make bot and fraud detection more accessible. This book focuses on bot management concepts and applies to any product, whether from a vendor or homegrown.

This book provides a comprehensive overview of the threat landscape and defense strategies. It offers insight into the evolution of attacks and defenses over time, the motivations of attackers, how detection methods work, and how to analyze traffic to assess detection accuracy and decide on the most appropriate response strategy. The knowledge acquired from this book will help security teams regain their advantage over attackers.

Who Should Read This Book

The target audience for this book includes web security professionals, website administrators, and anyone interested in or wanting to learn more about web security and, more specifically, bot management and automated fraud detection. No specific prior knowledge or experience is required to understand the content of this book.

Beginners will learn the basics of the Internet and web security while progressively diving deeper into bot, fraud, and abuse detection and mitigation. Web security practitioners with intermediate or advanced knowledge will better understand the threat's evolution and the methods and best practices to mitigate attacks consistently and successfully. Executives and decision-makers will come away with an appreciation of the topic free of the usual vendor buzzwords and marketing bias, which will help them ask the right questions and make informed buy-or-build decisions. Technology managers (product managers) and implementers (security architects, developers, solution architects) will better understand the context of the bot problem and the best practices for integrating and using bot management technology to drive the best outcome. Data scientists, data analysts, and security operations staff monitoring and evaluating detected activity will be able to interpret the data with a full understanding of the problem and help make data- and context-driven decisions that support the needs of their organization. Students in the field of computer science who are attracted to the cybersecurity space will gain a general understanding of the most critical security issues affecting online businesses today.

Any online business that generates significant revenue is at risk of fraudsters attacking its website using botnets to steal information, take over its users' identities, and make off with any assets stored in their accounts. E-commerce sites (Amazon, Nike, Macy's), social media and dating sites (Facebook, LinkedIn, Match.com), fintech/banking sites (Bank of America, U.S. Bank, Wells Fargo), digital media (Netflix, Hulu, NBC), and gaming websites (Roblox, Electronic Arts, Epic Games) are all targets of bot and fraud attacks abusing the resources available on the website.

1 A Short History of the Internet

Our journey begins with a description of the evolution of the Internet and the emergence of a new type of fraud and abuse that leverages botnets.

From ARPANET to the Metaverse

The Internet is so ingrained in our day-to-day life that it seems as though it's always been around. However, the Internet is a relatively new invention—and it keeps evolving. The precursor of the Internet, called the Advanced Research Projects Agency Network (ARPANET), was invented in the 1960s, in the middle of the Cold War, to ensure the continued availability of network and computing resources even if a portion of the network were removed or destroyed. It also let government researchers share information quickly without having to travel to another location. ARPANET was a closed system using proprietary protocols, and only explicitly authorized people could access it. The idea of a network where one could share information and computing resources sparked the interest of academics, and the need for standardized communication protocols arose. Various communication protocols, including Transmission Control Protocol/Internet Protocol (TCP/IP), Hypertext Transfer Protocol (HTTP), and the Domain Name System (DNS), were developed in the 1980s, marking the birth of the Internet as we know it today. TCP/IP defines how information is exchanged between two machines on the Internet. DNS, the equivalent of a phone book, transforms a hostname into the IP address where the service can be found. HTTP defines how web content is requested and shared between the browser running on the client and the web server. These protocols enabled systems from different vendors to communicate and connect. Secure Sockets Layer (SSL) and, later, Transport Layer Security (TLS) added a layer of security to HTTP. Newer languages like HyperText Markup Language (HTML) and JavaScript were invented to help develop websites and make content available in a structured and dynamic way.
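To make these roles concrete, here is a minimal sketch in Python (the language of choice of many bot operators mentioned later in this book) that walks through the DNS and HTTP-over-TLS steps described above. The hostname example.com is purely illustrative.

import socket
import http.client

hostname = "example.com"  # illustrative hostname, not tied to any site in this book

# DNS: the "phone book" lookup that translates the hostname into an IP address.
ip_address = socket.gethostbyname(hostname)
print(f"{hostname} resolves to {ip_address}")

# HTTP over TLS: TCP/IP carries the bytes, TLS encrypts them, and HTTP
# structures the request and the response.
connection = http.client.HTTPSConnection(hostname, timeout=10)
connection.request("GET", "/")
response = connection.getresponse()
print(response.status, response.reason)
connection.close()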

Initially, the Internet was reserved for the technical elite who knew the protocols, had the right equipment, understood how to connect to the network, and knew how to query it to retrieve information. The development of web browser software in the 1990s, like Netscape and Internet Explorer, compatible with all of the aforementioned protocols and languages, made the Internet accessible to all. Web search engines such as AltaVista, Yahoo! Search, and Google Search also made it easier to query and find information online. When I was a college student in the 1990s, the Internet was in its infancy. All you could do was visit various websites to find information. Most news outlets had a website with the latest sports results or world events. Major retailers started to create websites to showcase their products, and airlines advertised their flights. But e-commerce wasn't quite a thing just yet, and we still had to go to a brick-and-mortar shop to buy products or to a travel agency to book a flight.

Rapid technological advancement, including faster modems and the expansion of network infrastructure, supported the growth of the Internet. As the Internet grew more popular, investors started pouring money into a multitude of Internet companies with the hope of turning a profit one day. These companies' valuations, based purely on speculative future earnings and profits, surged in the late 1990s with record-breaking initial public offerings (IPOs) that saw stocks triple within a day. These events fueled irrational investment by venture capital firms, driven by fear of missing out, in companies that sometimes had neither a strong business plan nor viable products. In March 2000, large stock sell orders from leading high-tech companies like Cisco and Dell triggered panic selling and marked the bursting of the “Internet bubble.” Investors became more rational, and capital became harder to find for startups that were not profitable. Many of these cash-strapped startups disappeared rapidly. Companies that reorganized and refocused their efforts on developing valuable services and products survived, and some, like Akamai Technologies, Google, Amazon, and Apple, became very successful and key players in the development of the Internet.

When the bubble burst, it felt like a setback, but eventually, the Internet not only survived but started to thrive. As the quality of the Internet network improved, so did the content. The classic dial-up modem connection that had a maximum speed of 56Kbps was soon replaced by a more advanced and reliable network and telecom infrastructure. Integrated Services Digital Network (ISDN) offered speeds of up to 128Kbps, more than double what a dial-up modem could achieve. At the turn of the century, digital subscriber lines (DSLs), which offered high-speed Internet, became more widely available through conventional telephone networks, cable, and fiber optics. Today, Internet service providers offer connections as fast as 10Gbps, which is 178,571 times faster than the fastest dial-up modem. Advancements in mobile telecommunication and the emergence of smartphones meant that consumers could access the Internet from anywhere at any time for the first time. Mobile network expansion also helped expand the reach of the Internet to rural areas. Today, one can even browse the Internet while on a plane or cruising on the ocean, thanks to satellite networks.

As more and more people were drawn to the Internet, the distribution of rich content became a real issue. The networks that carried the Internet traffic did not always have adequate capacity to handle the demand. Telecom operators would do their best to route the traffic, but frequent congestion and often long distances between the client and the server led to slow page loads or stream buffering for Internet users, especially during popular events. Content Delivery Network (CDN) companies like Akamai Technologies, Fastly, and Cloudflare, to name a few, became the backbone of the Internet. CDNs fixed the problem by avoiding transporting content over long distances and instead bringing it closer to the user, making the Internet faster and more reliable. I've worked on and off for the biggest CDN company in the world, Akamai Technologies, since 2006 and saw the Internet evolve from a front-row seat.

Let's look at different types of websites and services that became available on the Internet and how they managed to turn their online presence into a revenue stream.

Social Media  The first decade of the 21st century saw the emergence of social networks with Myspace, Hi5, Friendster, Tagged, Bebo, Pinterest, Instagram, Facebook, Twitter (now X), Google+, YouTube, and LinkedIn. Storing and delivering user-generated content (photos, videos, articles) to someone's restricted circle was challenging and costly. CDN providers like Akamai had to adapt to the new trend and develop a multi-tier caching strategy to store and deliver content efficiently. Many of the early social media companies did not survive, mainly because they could not figure out how to monetize their content. Facebook, Instagram, X, YouTube, and LinkedIn fared the best and remain the biggest social networks in America and Europe. But these established platforms are getting competitive pressure from new entrants like TikTok, favored by younger crowds. The primary source of revenue for social media companies comes from online advertisements or premium memberships.

Dating Websites   Dating sites such as Match, eharmony, and Tinder piggybacked on the social networking model. The business model and monetization aspects were much more straightforward for them. Instead of flooding their users with ads, they would charge a monthly subscription fee to give them access to millions of profiles and connect them with compatible people who share their interests.

Media Websites   Websites belonging to the largest broadcasters, like NBC, first published news articles or content about shows on their sites. Then, progressively, they started streaming their programs online or making them available on demand. It took broadcasters a while to find a way to monetize the Internet, but, in the end, the solution for monetizing free content consisted of building technology to interrupt playback or a live stream with commercials. Later, media sites also introduced online subscriptions or pay-per-view for premium content. What was challenging initially for broadcasters was the need to support multiple proprietary formats such as QuickTime from Apple, Windows Media, and Adobe Flash. It was also a significant headache for CDN companies, as they had to maintain several networks to support all these formats. Standardization on protocols like HTTP Live Streaming (HLS) or WebRTC normalized the streaming methods. What made things even more challenging was that the screens users watched the content on became bigger and bigger with the introduction of smart TVs. Media websites wanted to offer the same quality of picture whether the user watched through traditional cable or satellite services or online. The image quality also had to be the same whether the user watched from a phone connected to a mobile network, a tablet connected to a medium-speed residential Internet service provider (ISP), or a large-screen TV connected to a high-speed Internet connection. This required CDN companies to support different bitrates and ever-increasing quality standards, from standard definition to High Definition (HD) and 4K, along with new distribution models such as UltraViolet and Over-the-Top (OTT) delivery.

Retailers   The Internet allowed retailers to have a point of sale open 24/7. Consumers could browse products and shop from the comfort of their homes at any time of the day or night, and the product would be delivered a few days later. It required retailers to open new fulfillment centers to process the orders, but this proved a very lucrative business, generating millions of dollars daily. It took a few more years for grocery stores to offer an online shopping experience with a delivery service, since dealing with fresh produce is more complex. E-commerce transformed how retailers interacted with customers and opened new opportunities to expand their brands beyond their traditional audiences or borders. I've seen many well-known brands offer international shipping over time, turning their websites into global stores overnight.

Gaming   In the early days of the Internet, the gaming industry used the Web to advertise its products, which were available only as physical media to run on a personal computer or game console. In the past, if someone wanted to play video games with their friends, they needed a game that supported multiplayer play, a game console, and a trip to their friend's house. Little by little, consoles gained the ability to connect to the Internet, and new games were developed to support playing online with other people. At the same time, gaming applications for mobile devices that offered an online experience by default emerged. An avatar represents each player, and a player can even buy digital content to outfit their avatar and interact with others who may live on the other side of the planet. While selling the game itself used to be the main revenue stream for video game studios, online gaming opened up new opportunities to sell additional packages or extensions that enhance the playing experience. For the increasing number of free games available on the Apple App Store or Google Play, in-game purchases, which generally unlock a more exciting or interactive experience, have become the publisher's only source of revenue.

Banks and Financial Institutions   Banks and financial institutions adopted the Web and offered online services in the mid-1990s. However, because the industry is heavily regulated, and given the nature of their business, they took a more conservative attitude toward adopting CDN technologies to accelerate and secure user transactions. Their main hesitation was allowing the CDN service to handle their certificates. The reluctance was understandable, considering the risk associated with the transactions and content they were dealing with. They needed to ensure their users' identities and life savings would stay safe and secure.

Healthcare Providers   Like the banking industry, healthcare providers were conservative in adopting the Web to ensure they could stay compliant with regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and not inadvertently leak their patients' medical records.

Looking back over the last 20 years, the Internet has changed everything: it has altered the way we shop, bank, interact with our healthcare providers, interact with each other, book our vacations, explore the world, and even work! In the past, starting a new business required opening a physical location, and selling products meant operating points of sale at strategic locations that consumers visited. Now, one can quickly open an online shop and make products available worldwide. A company doesn't even need physical offices anymore. With collaboration tools like Zoom, Webex, Teams, and Skype, it is easier to meet virtually from anywhere and run a business. This was a lifesaver that allowed businesses to survive during the COVID-19 pandemic. The rare few companies that did not have a solid online strategy struggled or disappeared.

Now that the Internet is very well established, we have started to hear about the Metaverse. But what is the Metaverse? The Metaverse is a vision of what the Internet and the online experience may become in the future. But not everyone has the same idea of what that looks like. Some say virtual reality (VR) or augmented reality (AR) technology is key to the Metaverse, where the experience will be similar to playing games through an Oculus device. The Metaverse may include some existing components, such as cryptocurrency, that would represent the main means of buying goods or services. But what would those goods be? Are we talking about real products that will be delivered to our doorstep or a virtual shirt and hat to dress an avatar? Maybe all of the above? What about people? Will we interact with real people or AI entities trained to converse on various topics?

During the pandemic, I participated as a speaker and attendee in virtual trade shows and security conferences where one could attend pre-recorded or live presentations and ask questions of the presenter through a chat application. One could also visit virtual booths on the virtual show floor to discuss products solving various security problems with a sales representative. The experience was a mix of video games, social media, video streaming, Zoom meetings, and a good dose of awkwardness. One could navigate the various areas of the conference and join group activities, presentations, and chatrooms to meet other professionals and discuss topics, or even join a virtual happy hour to sip wine sent by mail by the event sponsor. The experience was so different from an in-person conference that it fits what the Metaverse means to me.

Now, one could argue that none of this is it, since everything I described already exists. It's always difficult to envision the technology of the future, as we can only imagine it based on what we know is possible today. What the Metaverse will be also depends on what technology or experience users will be the most comfortable with and adopt. Further technological advancement in computing will also be required to support the complex and rich interactions.

If we look at how our parents' generation envisioned the future 30 years ago, they got a few things right but also many things wrong. For example, I remember seeing a documentary about the house of the future when I was a kid, and it featured a device that looked like a tablet where one could read the news. That part became a reality with tablets and smartphones, which is how most people consume news today. That same documentary also described something like virtual reality. However, they made it sound like holograms would be common by now, and no one ever talked about wearing those bulky goggles. What things did they get wrong? Well, nothing to do with the Internet, but where's my hoverboard, and what about flying cars? It's probably a good thing they got those concepts wrong!

The Different Layers of the Web

The Web is not uniform, and not all content is readily visible and accessible by all. Figure 1.1 presents the different layers of the Web as an iceberg.

The visible and easy-to-access part is known as the surface web. Its content is discovered, indexed, and found through web search engines. It comprises all e-commerce, travel, news, social media, gaming websites, and more. The content can be accessed freely. This part of the Web represents only about 4% of the Web.

Next comes the deep web. Access to its content and resources is generally restricted behind authentication mechanisms, and it may contain private information. Most industries are represented in this part of the Web. In the case of e-commerce, for example, the deep web corresponds to a user's purchase history, shopping rewards, or tailored services. For social media, it consists of posts and pictures that users publish and share only with their network. In banking, it corresponds to an account holder's bank statements; in healthcare, to a patient's medical records. The deep web represents about 90% of the Web and is legitimate: access is restricted to users subscribing to specific services, and web search engines cannot discover and index the content.

Figure 1.1 The different layers of the Web

Finally comes the mysterious dark web, often associated with criminal and illegal activity. That reputation is partly deserved but also somewhat of a misconception, since the dark web was initially designed to provide Internet users with anonymity. For example, the infamous The Onion Router (TOR) network was originally invented to protect American intelligence communications online. The code was released in 2004 under a free license and later launched as a free service for all Internet users who wanted to browse without being tracked. Anonymity is, of course, what criminals and hackers look for so that they can carry out their schemes without being disturbed or traced. The dark web is where one can find marketplaces to acquire stolen data from various security breaches, accounts harvested through credential stuffing attacks, or stolen credit cards. You'll also find forums where hackers exchange methods and best practices for carrying out various fraud schemes without being detected or caught. The dark web accounts for about 6% of the Internet. Sellers offer multiple categories of products and services on the dark web. Figure 1.2 shows the now-defunct AlphaBay marketplace, which once offered various categories of products, including fraud, hacking and spam, malware, drugs, and illegal chemicals. The FBI announced the takedown of AlphaBay in 2017.

Figure 1.2 The AlphaBay marketplace on the dark web

The Emergence of New Types of Abuses

When companies develop a website, they usually focus on specific use cases. For example, suppose a retailer makes its inventory available online. In that case, its focus is on showcasing its products, giving its users the best possible experience, and making the shopping experience as easy as possible to increase its revenue. Security is not part of its core business, and it would not necessarily think about how someone with not-so-good intentions could exploit the site to defraud the company and its users. The expansion of commerce on the Internet provided many opportunities for new abuses. In retrospect, it's easy to criticize past design choices made by software companies or website developers when new vulnerabilities that seem evident and outrageous today are found, especially when one realizes that the vulnerability existed for years. However, as a longtime product and software designer, I can attest to the difficulty of anticipating how a feature or workflow could be misused and exploited to commit fraud or abuse a resource. Some hackers specialize in developing viruses that exploit vulnerabilities in the operating system or browser running on the user's machine. Computer viruses have been omnipresent for years, and the ever-increasing ease of exchanging information through a growing number of communication channels has made their spread easier. Viruses can serve multiple purposes, including spying on the user's activity and collecting information like credentials or credit card numbers by logging key presses, stealing or encrypting the content of a disk (ransomware), or serving as a relay in a botnet. Antivirus software is part of the enterprise security strategy and is beyond the scope of this book. Other hackers choose to exploit public websites' resources and application programming interfaces (APIs) to collect information and defraud users. Protecting against such attacks is within the realm of application security, which we'll discuss further in the following chapters.

As product architects and developers, we follow certain design principles to achieve a specific goal, and overall, we want to keep things as simple as possible to make a product as easy as possible to use and maintain. Thankfully, the software development community has learned about common exploits and past mistakes over time. Most developers also go through yearly secure coding training. Security awareness has improved along with coding practices, but all this knowledge is relatively new and did not exist when the first e-commerce sites started to appear.

How attacks were classified changed over time based on the improved collective understanding of the attacker's motivations. The 2000s saw the rise of what is now known as application-layer attacks. SQL injection, cross-site scripting (XSS), command injection, and cross-site request forgery (CSRF) are commonly used to exploit vulnerabilities on a site to steal information or money or sometimes deface a website. The Open Worldwide Application Security Project (OWASP), MITRE’s Common Vulnerabilities and Exposures (CVE) Program, and other organizations have been vital in spreading awareness of new vulnerabilities, providing guidance on how to solve them, and encouraging better coding practices. The development of open-source web application firewalls (WAFs) such as ModSecurity or equivalent commercial offerings has also been vital in proactively preventing such exploits even when vulnerabilities exist on the site.
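To make the first of these exploit classes concrete, here is a minimal sketch using Python's built-in sqlite3 module; the users table and payload are invented for illustration. It shows how concatenating user input into a query opens the door to SQL injection, and how a parameterized query, one of the coding practices organizations like OWASP promote, closes it.

import sqlite3

# In-memory database with an illustrative users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

user_input = "' OR '1'='1"  # a classic injection payload

# Vulnerable: concatenating user input lets the payload rewrite the
# WHERE clause so that it matches every row in the table.
vulnerable_query = "SELECT username FROM users WHERE username = '" + user_input + "'"
print(conn.execute(vulnerable_query).fetchall())  # returns [('alice',)]: data leaked

# Safe: a parameterized query treats the input strictly as data, never as SQL.
safe_query = "SELECT username FROM users WHERE username = ?"
print(conn.execute(safe_query, (user_input,)).fetchall())  # returns []: no match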

By the early 2010s, there was much talk about denial-of-service (DoS) attacks and their distributed denial-of-service (DDoS) variant. DoS/DDoS attacks come in two primary flavors. First, there are activist attacks, like those of the infamous Anonymous group, which targeted companies or organizations because of their positions on specific issues. I'm not going to debate whether Anonymous' motivations were right or wrong, but they were undoubtedly disruptive when they managed to rally a large crowd to their cause. Those with experience dealing with them probably remember the infamous Low Orbit Ion Cannon (LOIC) and High Orbit Ion Cannon (HOIC) DoS tools, both point-and-shoot attack tools that offered various options. Sites that were not well protected would soon get overwhelmed by the load and essentially be taken offline, which sometimes meant I had to do an emergency integration of my then-employer's web security product to protect a site. It often happened on a Friday evening. (Come on, guys, have some compassion for the hard-working web security community here!) A well-executed DDoS attack against a retailer on a critical day like Black Friday can lead to millions of dollars in losses. When these attacks became more common, web security companies extended their WAFs to offer DoS/DDoS protection. This mainly consisted of rate-limiting features, IP reputation, and a set of rules designed to detect known attack tool signatures like LOIC or HOIC, most of the time coupled with a dedicated security analyst who could craft new rules to deal with newer attack signatures or so-called zero-day attacks.
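To give a sense of what the rate-limiting part of that protection boils down to, here is a minimal sliding-window sketch in Python. Keying on the client IP and the specific thresholds are illustrative assumptions, not a prescription from this book; real products combine this logic with IP reputation and signature rules.

import time
from collections import defaultdict

WINDOW_SECONDS = 10   # illustrative window size
MAX_REQUESTS = 100    # illustrative request budget per window

request_log = defaultdict(list)  # client IP -> recent request timestamps

def allow_request(client_ip):
    """Return False once a client exceeds its request budget for the window."""
    now = time.time()
    # Keep only the timestamps that still fall within the window.
    request_log[client_ip] = [t for t in request_log[client_ip]
                              if now - t < WINDOW_SECONDS]
    if len(request_log[client_ip]) >= MAX_REQUESTS:
        return False  # over budget: block, delay, or challenge this client
    request_log[client_ip].append(now)
    return True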

Beyond activist attacks, a lot of what is perceived as a DDoS is actually overzealous botnet traffic: scraping a site to collect product descriptions, pricing, and inventory; running a credential stuffing attack to test whether username/password combinations are valid; or other types of attacks we'll discuss in detail in Chapter 2, “The Most Common Attacks Using Botnets.” In the early 2010s, no one talked much about bot attacks. But I saw firsthand that a lot of the traffic causing site availability issues was poorly calibrated bot activity. Remember that in the case of content scraping or a credential stuffing attack, an attacker who takes the site offline makes their attack less effective, since it increases the time it takes to complete the task and, ultimately, their cost. So, a denial of service caused by a botnet is an unintended consequence: the intent is not to take the site down but rather to exploit it. Botnet operators quickly learned how to work around the DDoS detection in place, rendering it ineffective.

At first glance, excessive activity on the login API, for example, can prevent real users from logging in and purchasing products. This can be perceived as a DDoS attack from an activist group wanting to impact the revenue of the targeted company. When looking closer into the attack, one will discover that each request is for a different username/password combination. Let's look at it from the attacker's point of view and reflect for a minute on why login is such a popular target. In general, e-commerce sites encourage their customers to create an account to get discounts, facilitate their purchases, and improve their overall experience when visiting the site. Consumers can then manage their accounts remotely and review their purchases and credits. They also provide their personal information, including phone number, postal address, credit card or bank account number…basically the type of information or services hackers can exploit or monetize.
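The observation above, that each request carries a different username/password pair, suggests a simple warning signal. The sketch below is a hypothetical illustration rather than a method prescribed in this book: it flags clients that attempt logins for an unusually high number of distinct usernames.

from collections import defaultdict

DISTINCT_USERNAME_THRESHOLD = 20  # illustrative threshold

usernames_by_client = defaultdict(set)  # client IP -> usernames attempted

def looks_like_credential_stuffing(client_ip, username):
    """Flag clients attempting logins for many distinct usernames."""
    usernames_by_client[client_ip].add(username)
    return len(usernames_by_client[client_ip]) > DISTINCT_USERNAME_THRESHOLD

In practice, such a counter would be only one signal among many, combined with the header and network characteristics discussed in later chapters.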

Gone are the days when criminals would break into a facility or home, storm the filing cabinet or safe, and steal whatever paperwork or money they could find. Now, this source of information is potentially available right at their fingertips, with far less risk involved. What's more, we're no longer necessarily looking at hardened criminals. Anyone can be a hacker, including computer-savvy teenagers who may be committing crimes without realizing that what they are doing is illegal. The tool of choice is no longer a crowbar to break a door or window. A laptop, some programming skills, a well-crafted Python script, a few virtual machines running in the cloud, and an armada of proxy servers are the essential ingredients of botnets, and they have become more effective tools for criminals. There were only a few bot management offerings in the early 2010s, which gave me the idea to look closer at the characteristics of attack traffic, and I started building my first bot management product.

The Proliferation of Botnets

The term bot is short for robot, which generally designates software running on a computer designed to perform a specific task. In the web context, this task mainly consists of collecting data available on the Web or automating certain transactions. In the fraud and abuse context, the task consists of abusing various site functionalities and performing attacks such as account takeover, credential stuffing, or account opening abuse, to name a few. Chapter 2 discusses in greater detail the various types of attacks where bots are used.

A botnet is a network of bots that run the same software designed to accomplish a specific task. A command-and-control center coordinates the activity of the individual bots. The botnet's size can vary depending on the task and the level of sophistication required to defeat any protection in front of the targeted website. Figure 1.3 represents the high-level structure of a botnet.