Intelligent Security Systems - Leon Reznik - E-Book


Description

INTELLIGENT SECURITY SYSTEMS

Dramatically improve your cybersecurity using AI and machine learning

In Intelligent Security Systems, distinguished professor and computer scientist Dr. Leon Reznik delivers an expert synthesis of artificial intelligence, machine learning and data science techniques, applied to computer security to assist readers in hardening their computer systems against threats. Emphasizing practical and actionable strategies that can be immediately implemented by industry professionals and computer device owners, the author explains how to install and harden firewalls, intrusion detection systems, attack recognition tools, and malware protection systems. He also explains how to recognize and counter common hacking activities.

This book bridges the gap between cybersecurity education and new data science programs, discussing how cutting-edge artificial intelligence and machine learning techniques can work for and against cybersecurity efforts.

Intelligent Security Systems includes supplementary resources on an author-hosted website, such as classroom presentation slides, sample review, test and exam questions, and practice exercises to make the material contained practical and useful.

The book also offers:

* A thorough introduction to computer security, artificial intelligence, and machine learning, including basic definitions and concepts like threats, vulnerabilities, risks, attacks, protection, and tools
* An exploration of firewall design and implementation, including firewall types and models, typical designs and configurations, and their limitations and problems
* Discussions of intrusion detection systems (IDS), including architecture topologies, components, and operational ranges, classification approaches, and machine learning techniques in IDS design
* A treatment of malware and vulnerabilities detection and protection, including malware classes, history, and development trends

Perfect for undergraduate and graduate students in computer security, computer science and engineering, Intelligent Security Systems will also earn a place in the libraries of students and educators in information technology and data science, as well as professionals working in those fields.

Page count: 818

Publication year: 2021




IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board
Ekram Hossain, Editor in Chief

Jón Atli Benediktsson  

Xiaoou Li  

Jeffrey Reed  

Anjan Bose  

Lian Yong  

Diomidis Spinellis  

David Alan Grier  

Andreas Molisch  

Sarah Spurgeon  

Elya B. Joffe  

Saeid Nahavandi  

Ahmet Murat Tekalp  

Intelligent Security Systems

How Artificial Intelligence, Machine Learning and Data Science Work For and Against Computer Security

Leon Reznik

Rochester Institute of Technology, Rochester, New York, USA

Copyright © 2022 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 750‐4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging‐in‐Publication Data Applied for:

ISBN: 9781119771531

Cover Design: Wiley
Cover Image: © AerialPerspective Works/E+/Getty Images

To my family, Alexandra, Dmitry, Michelle, and my students

Acknowledgments

Many organizations and individuals helped this book to appear, including my colleagues, students, editorial staff, family, and friends. I would like to thank all of you but have to limit the list of names. Thank you very much: Adam, Adrian, Adwait, Aileen, Akhil, Akshay, Alex, Alok, Amit, Arpit, Ashwin, Andrew, Andrey, Ankan, Anna, Anthony, Asif, Ayush, Benjamin, Brian, Carl, Chinmay, Christian, Darrell, Devang, Dhaval, Dhivya, Dileep, Dinesh, Dmitry, Elisa, Forum, Gaurav, George, Greg, Howard, Igor, James, Jeffrey, Jeton, Jinesh, Jody, Joe, Josh, Juan, Juliet, Justin, Karl, Karteek, Krishna, Kurt, Maninder, Mansha, Matthew, Michael, Michelle, Milan, Mohammed, Mohan, Ninad, Ninel, Olga, Omar, Parinitha, Parth, Pooja, Praful, Punit, Qiaoran, Raja, Ravina, Renzil, Richard, Rishi, Robert, Rohan, Rohit, Roman, Ron, Sagar, Sahil, Salil, Samir, Sanjay, Sandhya, Saransh, Saurabh, Scott, Sergey, Shashank, Shravya, Simran, Stanislaw, Sudhish, Suraj, Suresh, Swati, Tayeb, Tejas, Utsav, Vanessa, Virendra, and Vladik.

Last but not least, I want to acknowledge that some research reported in this book was supported in part by recent grants provided by the National Science Foundation (award # ACI‐1547301), the National Security Agency (award # H98230‐I7‐l‐0200), and the US Military Academy/DoD (award # W911NF2010337).

Introduction

I.1 Who Is This Book For?

This book’s main goal is to provide help to its readers and users:

students in computer security, science, engineering, IT and information systems‐related fields, both undergraduate and graduate, looking for a textbook to gain the knowledge and skills at the intersection of computer security and artificial intelligence, machine learning, and data science domains;

their instructors at the universities, colleges, and institutions of higher education looking for a textbook and curriculum materials (review and test questions, notes, exercises, slides) to use in developing new and modifying existing courses;

professionals in the computer security area looking for a reference book to upgrade their skills and better understand intelligent techniques;

professionals and researchers in the field of artificial intelligence and data science looking for advice on where and how to apply their knowledge and skills in the computer security domain.

While a general background in computing, networking, security, and artificial intelligence is desirable, the book is self‐contained and starts with a review of computer security and intelligent techniques that should provide a sufficient foundation for further study.

I.2 What Is This Book About?

This book aims to help its readers better understand how to apply artificial intelligence, machine learning, and data science in the computer security domain. It introduces readers to the current state of the application of intelligent methodologies in the design of computer security and information assurance systems. As the design and operation of most computer security systems and tools are based on the application of intelligent techniques, gaining a deeper understanding and practical skills in this field will allow readers to be better prepared either to enter the workforce or to upgrade their skills. The book merges the most advanced methodologies of artificial intelligence and machine learning with their applications in cybersecurity. Readers will gain knowledge in one of the hottest areas of current computer science and will be able to employ it in solving cybersecurity problems.

Unfortunately, a gap currently exists between computer security practice, where professionals mostly employ various tools, often without a deep understanding of their design and functional principles, and the comprehension of computer science methods and algorithms in general, and of artificial intelligence, machine learning techniques, and data science in particular. Students and even professionals do not realize that most tools they employ in computer security have been designed based on the application of intelligent methodologies. This lack of knowledge prevents them from designing better tools and even from employing existing ones more effectively and efficiently. The unique approach of this book is that it is designed to fill this gap by concentrating on the design features of computer security tools and mechanisms on the one hand and discussing how intelligent procedures are employed in industrial practice on the other.

The idea of this book is innovative and unique. It merges knowledge areas as diverse as artificial intelligence and machine learning techniques and computer security systems and applications. By crossing the traditional borders between disciplines, it allows readers to acquire unique knowledge in the very intense domain where intelligent methods intersect with computer security applications and to become much better prepared for the challenges of computer security practice. It aims at developing theoretical knowledge as well as research and practical skills.

The book doubles as both a textbook and a reference book. From the education perspective, the book bridges education in the cybersecurity domain with computer science and new data science programs, helping to advance all of them together. The content ranges from an explanation of basic concepts to brief descriptions of available tools. The writing style combines a traditional narrative with the formulation and answering of essential questions that guide the presentation. The questions will help in self‐education and will assist instructors who might like to use them in their courses to get better prepared for possible student inquiries. The book includes exercises. Slides will be available on the author’s website, https://www.cs.rit.edu/~lr/. Instructors will be provided with the list of suggested test and exam questions.

I.3 What Is This Book Not About?

The book is oriented toward computer security practice, not its mathematical foundations. The book will teach how to design widespread computer security systems and tools such as firewalls, intrusion detection systems, anti‐malware protection systems, and tools for recognizing hacking activities and attacks. Readers will gain a deeper understanding of the design of those systems and tools. While discussing machine learning and data science algorithms, the book does not go deep into mathematical details but prefers to concentrate on possible applications.

Some other manuscripts claim to provide comprehensive coverage of either the computer security or the artificial intelligence, machine learning, or data science domain. Given both domains’ extremely wide content areas, this book does not aim at a full review of two of the currently hottest areas in modern engineering and technology. Instead, the book is fully devoted to the applications of artificial intelligence, machine learning, and data science in the design and analysis of computer security systems, mechanisms, and tools as well as in solving other security problems. It will discuss the application of intelligent techniques in firewalls, intrusion detection, malware detection, hacking activity recognition, and system security evaluation. It will review various attacks against computer security, ranging from simple phishing inquiries to sophisticated attacks against intelligent classifiers based on machine learning techniques. While not giving 100% coverage of the computer security or artificial intelligence domains, the book deals with the most important growing areas of both fields. The coverage ratio will increase as a bigger and bigger part of real computer security activities becomes more and more dependent on artificial intelligence. With this knowledge, readers will become frontrunners in the design of novel cybersecurity tools and mechanisms needed to protect computer networks and systems and national infrastructure.

I.4 Book Organization and Navigation

The book consists of six big chapters (see Figure I.1) covering the specialized topics including:

review of the modern state of the computer security and artificial intelligence, machine learning, and data science applications in the area;

firewall design;

intrusion detection systems;

anti‐malware methods and tools;

hacking activities, attack recognition, and prevention;

adversarial attacks against AI‐based computer security tools and systems.

Figure I.1 Book organization.

The book will be accompanied by presentation slides as well as samples of exercises, test and exam questions, research, and tool assignments.

From the computer security perspective, the book moves the reader from a review of the current situation through the traditional first line of defense (firewalls) and the second line of defense (intrusion detection systems) to a discussion of modern malware families and anti‐malware protection, then to hackers’ and ordinary users’ profiles and typical activities, and finishes with privacy protection systems and adversarial attacks that use machine learning techniques.

From the artificial intelligence perspective, the book starts with the review of artificial intelligence, machine learning, and data science techniques and technologies, then discusses the logic of the rules‐based and expert systems, and proceeds with machine learning and data science applications in the computer security domain. It presents multiple algorithms and methods, especially focusing on artificial neural networks, including shallow learning models, deep learning procedures, and generative adversarial networks.

While the book covers major security mechanisms as well as the intelligent techniques they employ, these are distributed over all chapters. With respect to the techniques, the book generally moves from older (and possibly simpler) methods to newer (and possibly more sophisticated) ones. However, each chapter is self‐contained and can be studied separately from the others.

In particular:

Chapter 1 discusses the basic concepts of computer security as well as the taxonomy and classification of the fundamental algorithms in the domains of artificial intelligence, machine learning, and data science in relation to their applications in computer security. It reviews the sources of security threats and attacks, concentrating on the area of IoT and wireless devices, and examines the possible protection mechanisms and tools. The chapter provides a general classification of intelligent approaches and their relationship to various computer security fields. It focuses on an introduction of the major intelligent techniques and technologies in computer security, such as expert systems, fuzzy logic, machine learning, artificial neural networks, and genetic algorithms. While presenting multiple techniques, the text emphasizes their advantages in comparison to each other as well as the obstacles to their further progress. Short algorithm descriptions and code examples are included.

Chapter 2 introduces a firewall as the first line of defense. It provides its definition and discusses its functions, possible architectures, and operational models, concentrating on their advantages and drawbacks. It includes a step‐by‐step guide to the firewall design and implementation process, ranging from planning to deployment and maintenance. The major emphasis in this chapter is placed on using rules to set up, configure, and modify the firewall’s policy. Both generic and specific rules are discussed, as well as their formulation and editing with firewall tools. Substantial rule design principles and conflict avoidance and resolution are presented.

Chapter 3 develops knowledge and practical skills in the design, analysis, implementation, and use of intrusion detection and prevention systems (IDS). It presents the IDS definition and discusses their goals and functions as well as their progress from a historical perspective. It advances the reader’s design and analysis skills in the computer security domain by discussing artificial intelligence and machine learning techniques and their application in IDS design and implementation, as well as in classifying IDS, evaluating IDS performance, choosing IDS design tools, and employing them in a practical design exercise. Algorithm and code examples are provided.

Chapter 4 discusses malware types and malware detection and recognition techniques and tools. It provides an extensive classification of various malware and virus families and discusses their taxonomy, basic composition, and the comparison between them. Beyond pure malware examples, it also reviews spam and software vulnerabilities. Multiple real‐life cases and examples are provided. Then, it moves to presenting malware detection principles, algorithms and techniques, and anti‐malware tools and technologies. Examples and use cases are included.

Chapter 5 starts by discussing how hackers’ demographics and culture have been changing over recent years. Then, it proceeds to present hacking attacks, techniques, and tools as well as anti‐hacking protection mechanisms. In the second part, it moves to ordinary users’ profiles and authentication. Here, we show how to employ data science and statistical approaches to discover and analyze users’ characteristics and their influence on the security level of their computer practice. The chapter presents computer device security evaluation. It discusses how to conduct the analysis and presents observations, results, and recommendations for users to improve their overall security practices and the security of their devices. It also examines website fingerprinting attacks, which utilize machine learning, against the Tor privacy protection technology, as well as possible protection mechanisms. Examples and use cases are included.

Chapter 6 introduces novel adversarial machine learning attacks, in which machine learning is used against AI‐based classifiers to make them fail, and their taxonomy. It investigates the possible influence of data corruption and quality decrease on classifier performance. The chapter proposes data restoration procedures and other measures to protect against adversarial attacks. Generative adversarial networks are introduced, and their use is discussed. Multiple algorithm examples and use cases are included.

I.5 Glossary of Basic Terms

This section lists standard terms used within the book and where to learn more about them.

Term

Additional term

Definition

Definition source

Book section to learn more

Example

Offense

Attack

Any kind of malicious activity that attempts to collect, disrupt, deny, degrade, or destroy information system resources or the information itself.

NIST SP 800‐12;

1.4

Cyber attack

An attack, via cyberspace, targeting an enterprise’s use of cyberspace for the purpose of disrupting, disabling, destroying, or maliciously controlling a computing environment/infrastructure; or destroying the integrity of the data or stealing controlled information.

NIST SP 800‐30 Rev. 1

5.1.5

Advanced persistent threat (APT)

An adversary with sophisticated levels of expertise and significant resources, allowing it through the use of multiple different attack vectors (e.g. cyber, physical, and deception) to generate opportunities to achieve its objectives, which are typically to establish and extend footholds within the information technology infrastructure of organizations for purposes of continually exfiltrating information and/or to undermine or impede critical aspects of a mission, program, or organization, or place itself in a position to do so in the future; moreover, the advanced persistent threat pursues its objectives repeatedly over an extended period of time, adapting to a defender’s efforts to resist it, and with determination to maintain the level of interaction needed to execute its objectives.

NIST SP 800‐39

1.6

Adversarial machine learning (AML)

AML is concerned with the design of ML algorithms that can resist security challenges, the study of the capabilities of attackers, and the understanding of attack consequences.

NISTIR 8269 (DRAFT)

6

Attack signature

A specific sequence of events indicative of an unauthorized access attempt.

NIST SP 800‐12 Rev. 1;

4.5

Brute force

A method of accessing an obstructed device by attempting multiple combinations of numeric/alphanumeric passwords.

NIST 800‐101

5.1.5.2

Colluded applications

Attack performed by two or more cooperating applications, when an application that individually incorporates only harmless permissions expends them by sending and receiving requests to a collaborating application.

5.1.8

Denial of Service

The prevention of authorized access to resources or the delaying of time‐critical operations. (Time‐critical may be milliseconds or it may be hours, depending upon the service provided.)

NIST 800‐12

5.1.5.2

Ex. 5.4

Eavesdropping

An attack in which an attacker listens passively to the authentication protocol to capture information that can be used in a subsequent active attack to masquerade as the claimant.

NIST 800‐63‐3

5.1.5.2

Impersonation

A scenario where the attacker impersonates the verifier in an authentication protocol, usually to capture information that can be used to masquerade as a claimant to the real verifier.

NIST 800‐63‐2

5.1.5.2

Phishing

Fraudulent attempt to obtain sensitive information or data by posing as a trustworthy entity in a digital communication.

5.1.5.2

Ex. 5.3

Spoofing

Faking the sending address of a transmission to gain illegal entry into a secure system.

CNSSI 4009‐2015

5.1.5.2.

Ex. 5.7

Website fingerprinting

Attack that allows an adversary to learn information about a user's web browsing activity by recognizing patterns in his traffic.

5.4.2

Ex. 5.8

Zero day

An attack that exploits a previously unknown hardware, firmware, or software vulnerability.

CNSSI 4009‐2015

5.1.5.3

Cyber crime

Criminal activities carried out by means of computers or the Internet.

1.1

Ex. 5.2

Hacker

Unauthorized user who attempts to or gains access to an information system.

NIST SP 800‐12

5.1

Ex. 5.1, 5.2

Malware

Hardware, firmware, or software that is intentionally included or inserted in a system for a harmful purpose.

NIST SP 800‐12

4

Adware

Software that automatically displays or downloads advertising material (often unwanted) when a user is online.

4.2.6

Ex. 4.11

Botnet

Attack conducted with the help of more traditional malware types, such as worms and Trojans.

4.2.9

Ex. 4.15, 4.16

Ransomware

Type of malware, which prevents users from accessing their system functionality or data, either by locking the system's screen or by locking the users' files unless a ransom is paid.

1.3, 4.2.7

Ex. 1.3, 1.4, 4.12, 4.13

Rootkit

A set of tools used by an attacker after gaining root‐level access to a host to conceal the attacker’s activities on the host and permit the attacker to maintain root‐level access to the host through covert means.

NIST SP 800‐150

4.2.8

Ex. 4.14

Spyware

Software that is secretly or surreptitiously installed into a system to gather information on individuals or organizations without their knowledge; a type of malicious code.

NIST SP 800‐12 1.3

4.2.5

Ex. 4.10

Trojan horse

A computer program that appears to have a useful function, but also has a hidden and potentially malicious function that evades security mechanisms, sometimes by exploiting legitimate authorizations of a system entity that invokes the program.

NIST SP 800‐12

4.2.4

Ex. 4.9

Virus

A computer program that can copy itself and infect a computer without permission or knowledge of the user. A virus might corrupt or delete data on a computer, use email programs to spread itself to other computers, or even erase everything on a hard disk.

NIST 800‐12

4.2.2

Ex. 4.3, 4.4, 4.5, 4.6

Worm

A computer program that can run independently, can propagate a complete working version of itself onto other hosts on a network, and may consume computer resources destructively.

NIST 800‐82

4.2.3

Ex. 4.1, 4.2, 4.7, 4.8

Risk

The risk to organizational operations (including mission, functions, image, reputation), organizational assets, individuals, other organizations, and the Nation due to the potential for unauthorized access, use, disclosure, disruption, modification, or destruction of information and/or a system.

NIST 800‐12

1.3

Spam

Electronic junk mail or the abuse of electronic messaging systems to indiscriminately send unsolicited bulk messages.

NIST 800‐12

4.3

Threat

Any circumstance or event with the potential to adversely impact organizational operations (including mission, functions, image, or reputation), organizational assets, individuals, other organizations, or the Nation through a system via unauthorized access, destruction, disclosure, modification of information, and/or denial of service.

NIST 800‐12

1.2

Destruction

The process of overwriting, erasing, or physically destroying information (e.g. a cryptographic key) so that it cannot be recovered.

NIST 800‐88

Disclosure

Divulging of, or provision of access to, data.

NISTIR 8053

Unauthorized access

A person gains logical or physical access without permission to a network, system, application, data, or other resource.

NIST 800‐82

1.3

Ex. 5.2

Vulnerability

Weakness in an information system, system security procedures, internal controls, or implementation that could be exploited or triggered by a threat source.

NIST 800‐53

4.4.

Ex. 4.17, 4.18

Defense

Computer security or Cybersecurity

The ability to protect or defend the use of cyberspace from cyberattacks.

NISTIR 8170 under Cybersecurity CNSSI 4009

1.1

Computer security policy

Security policies define the objectives and constraints for the security program. Policies are created at several levels, ranging from organization or corporate policy to specific operational constraints (e.g. remote access). In general, policies provide answers to the questions “what” and “why” without dealing with “how.” Policies are normally stated in terms that are technology‐independent.

NIST 800‐82

1.1

Confidentiality

Preserving authorized restrictions on information access and disclosure, including means for protecting personal privacy and proprietary information.

NIST 800‐53

1.2

Integrity

Guarding against improper information modification or destruction, and includes ensuring information non‐repudiation and authenticity.

NIST 800‐53

1.2

Availability

Ensuring timely and reliable access to and use of information.

NIST 800‐53

1.2

Firewall

A device or program that controls the flow of network traffic between networks or hosts that employ differing security postures.

NIST SP 800‐41 Rev. 1

2

Application proxy

A firewall capability that combines lower‐layer access control with upper‐layer functionality, and includes a proxy agent that acts as an intermediary between two hosts that wish to communicate with each other.

NIST 800‐41

2.2, 2.3

Demilitarized zone (DMZ)

An interface on a routing firewall that is similar to the interfaces found on the firewall’s protected side. Traffic moving between the DMZ and other interfaces on the protected side of the firewall still goes through the firewall and can have firewall protection policies applied.

NIST 800‐41

2.1, 2.3

Network address translation (NAT)

A routing technology used by many firewalls to hide internal system addresses from an external network through use of an addressing schema.

NIST 800‐41

2.2

Packet filter

A routing device that provides access control functionality for host addresses and communication sessions.

NIST 800‐41

2.2, 2.3

Stateful inspection

Packet filtering that also tracks the state of connections and blocks packets that deviate from the expected state.

NIST 800‐41

2.2, 2.3

Virtual Private Network (VPN)

Protected information system link utilizing tunneling, security controls, and endpoint address translation giving the impression of a dedicated line.

NIST 800‐53

2.1, 2.3

Intrusion detection system (IDS)

A security service that monitors and analyzes network or system events for the purpose of finding, and providing real‐time or near real‐time warning of, attempts to access system resources in an unauthorized manner.

NIST 800‐82

3

Intrusion protection system (IPS)

A system that can detect an intrusive activity and can also attempt to stop the activity, ideally before it reaches its targets.

NIST 800‐82

Rule set

A collection of rules or signatures that network traffic or system activity is compared against to determine an action to take, such as forwarding or rejecting a packet, creating an alert, or allowing a system event.

NIST 800‐115

Ex. 3.1.

False negative or Missing attack

Incorrectly classifying malicious activity as benign.

NIST 800‐83

3.5

False positive or False alarm

Incorrectly classifying benign activity as malicious.

3.5

User authentication

Verifying the identity of a user, process, or device, often as a prerequisite to allowing access to resources in an information system.

NIST 800‐53

5.3
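The false negative and false positive entries above can be made concrete with a small counting sketch. This is an illustration only, not the book's evaluation procedure: the function name `detection_rates`, the dictionary keys, and the sample labels are all hypothetical, and `True` is assumed to mean "malicious".

```python
def detection_rates(actual, predicted):
    """Count detector outcomes; True means 'malicious' (an assumption of this sketch)."""
    tp = fp = tn = fn = 0
    for is_attack, flagged in zip(actual, predicted):
        if is_attack and flagged:
            tp += 1          # attack correctly flagged
        elif is_attack and not flagged:
            fn += 1          # false negative: missed attack
        elif not is_attack and flagged:
            fp += 1          # false positive: false alarm
        else:
            tn += 1          # benign activity correctly ignored
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        # Share of benign events that raised an alarm.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of malicious events that went undetected.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical labels for five observed events.
actual = [True, True, False, False, False]
predicted = [True, False, True, False, False]
rates = detection_rates(actual, predicted)
print(rates["fp"], rates["fn"])
```

Reporting the two rates separately matters because a detector tuned to miss fewer attacks will usually raise more false alarms, and the reverse.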

Techniques and Technologies

Internet

The single interconnected worldwide system of commercial, government, educational, and other computer networks that share the set of protocols specified by the Internet Architecture Board (IAB) and the name and address spaces managed by the Internet Corporation for Assigned Names and Numbers (ICANN).

NIST SP 800‐82 Rev. 2 RFC 4949

Algorithm

Formulae given to a computer in order for it to complete a task (i.e. a set of rules for a computer).

Conventional Techniques

String pattern search

Aho–Corasick

Dictionary‐matching algorithm that locates elements of a finite set of strings (the “dictionary”) within an input text and attempts to match all strings simultaneously.

Ex. 4.20.

Boyer and Moore

An efficient string‐searching algorithm that is the standard benchmark for practical string‐search literature.

Alg 3.4

Knuth, Pratt, and Morris

An algorithm that checks the characters from left to right; when a sub‐pattern appears more than once within the pattern, it uses that property to improve the time complexity.

Alg 3.2

Naïve (brute force)

Very general problem‐solving technique and algorithmic paradigm that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem's statement.

Alg 3.1

Rabin and Karp

A string‐searching algorithm that uses hashing to find patterns in strings.

Alg 3.3.

AI, ML, and Data Science

Artificial intelligence (AI)

Interdisciplinary field, usually regarded as a branch of computer science, dealing with models and systems for the performance of functions generally associated with human intelligence, such as reasoning and learning.

1.5.2.

Fuzzy logic

A form of logic that is much closer to human reasoning and natural language than traditional binary logic.

1.5.6, 5.2.4

Expert systems

Intelligent computer program that uses knowledge and inference procedures to solve problems that are difficult enough to require significant human expertise for their solution.

1.5.5, 2.5, 5.2.4

Knowledge‐based

A knowledge‐based system is a computer program that reasons and uses a knowledge base to solve complex problems.

1.5.5, 2.5

Ex. 3.1

Artificial neural networks

A computing system, made up of a number of simple, highly interconnected processing elements, which processes information by its dynamic state response to external inputs.

1.5.8, 3.6.5

Autoencoders

The model that aims to reconstruct data from the input layer into the output layer with a minimal amount of distortion.

Backpropagation

Shorthand for “backward propagation of errors,” a method of training an ANN in which the system’s initial output is compared to the desired output, and the system is then adjusted until the difference between the outputs becomes minimal.

1.5.8

Convolutional

Multilayer topology with a few hidden layers, where each neuron receives its input only from a subset of neurons of the previous layer.

1.5.8, 5.4.2, 6.5.2

Ex. 5.8

Deep belief

Composition of Restricted Boltzmann Machines (RBM), a class of neural networks with no output layer.

1.5.8

Generative adversarial networks (GAN)

Unsupervised learning technique that is capable of generating data with selected properties similar to those of a chosen dataset.

6.5

Long Short Term Memory

Special type of recurrent topology, which has memory cells that maintain information in memory for a longer period.

5.1.8.3

Multilayer perceptron (MLP)

An ANN model in which neurons are organized into layers and the layers are connected to each other according to certain connectivity rules.

3.6.5.3, 4.6.3

Ex. 4.22

Modified time‐based multilayer perceptron (MTBMLP)

ANN topology that consists of multiple time‐based MLPs, all connected to a single‐end MLP, with time series used as inputs.

3.6.5.3, 4.6.3

Ex. 4.22

Radial basis function (RBF)

ANN that uses radial basis functions as activation functions, producing an output, which is a linear combination of radial basis functions of the inputs and neuron parameters.

3.6.5.3

Recurrent

A multilayer topology that includes a feedback loop connecting its output to its inputs.

1.5.8, 5.1.8.3

Data science

The field that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.

1.5.3.

Machine learning

A subfield of AI that studies algorithms capable of improving themselves automatically through experience, solving problems without external instructions by using previously trained models.

1.5.3.

Intelligent agents

An autonomous entity that acts upon an environment, observing it through sensors and acting on it through actuators, and that directs its activity toward achieving goals.

3.6.5.4

Deep learning

Machine learning method based on learning characterizations (representations) of data.

1.5.3

Reinforcement learning

Algorithms in which an agent decides what to do in performing a given task so as to maximize a given function.

1.5.7

Shallow learning

Techniques that separate the process of feature extraction from learning itself.

3.6.5.1

Supervised learning

Algorithms, which develop a mathematical model from the input data and known desired outputs.

1.5.7

Alg. 1.1.

Unsupervised learning

Algorithms that take a set of data consisting only of inputs and attempt to cluster the data objects based on the similarities or dissimilarities among them.

1.5.7.

Alg. 1.2.

Decision tree

Tree structure resembling a flowchart, where every node represents a test on an attribute, each branch represents a possible outcome of that test, and the leaves represent the class labels.

J48

Open source Java implementation of the C4.5 algorithm that builds decision trees from a set of training data using the concept of information entropy.

6.6.4

Genetic/evolutionary algorithms

Set of evolutionary algorithms that take inspiration from genetic evolution theories.

3.6.4, 3.6.5.4

Alg. 1.3

Hidden Markov models

Algorithm that builds up a set of states producing outputs with different probabilities with the goal to find out the sequence of states that results in the observed outputs.

K‐means

Clustering algorithm that uses a distance function to distribute all data pieces between k clusters defined by their centroid position in the feature space.

3.6.2

K‐nearest neighbor

Classification algorithm that uses a distance function in order to determine to which class to assign the new element by finding K closest elements in the feature space.

3.6.3, 5.3.5.4

Naive Bayes

Algorithm that consists of applying the Bayes theorem in order to find a distribution of conditional probabilities among class labels, with the assumption of independence between features.

Random forest

An ensemble learning method that builds a large group of independent decision trees, and outputs the mode of the label predictions of all the trees.

6.6.4

Sec.6.6.4

Support vector machine

Binary classification algorithm that creates a hyperplane separating the data into two classes, with the objective of maximizing the gap perpendicular to the plane, which allows better generalization.

Please note: I realize that various definitions, and even various understandings, of these terms exist. I have chosen to follow first the definitions given in the publications of the NIST Computer Security Resource Center (see https://csrc.nist.gov/glossary and Section I.6) and then those from other sources (see Section I.7). Even those publications are ambiguous in some cases and give differing meanings. I have chosen the ones that are followed in this book. I do not intend this list to be all‐inclusive or exclusive.
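As an illustration of one of the string pattern search entries above, the following is a minimal Python sketch of the Rabin and Karp idea: hash the pattern, slide a window over the text while updating a rolling hash, and compare characters only when the hashes agree. It is not the book's Alg 3.3; the function name `rabin_karp` and the `base` and `mod` parameters are assumptions chosen for this example.

```python
def rabin_karp(text: str, pattern: str,
               base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return the start index of every occurrence of pattern in text.

    base and mod are illustrative choices for the polynomial rolling
    hash; they are assumptions of this sketch, not values from the book.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character of a window of length m.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window of the text.
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for start in range(n - m + 1):
        # Equal hashes may be a collision, so confirm character by character.
        if w_hash == p_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Slide the window: drop text[start], append text[start + m].
            w_hash = ((w_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return matches
```

A call such as `rabin_karp("abracadabra", "abra")` returns the positions at which the pattern starts. The explicit substring comparison keeps the result correct even when two different strings happen to share a hash value.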

I.6 The Cited NIST Publications

NIST SP 800‐12 An Introduction to Information Security, June 2017, available free of charge from: https://doi.org/10.6028/NIST.SP.800‐12r1

NIST SP 800‐30 Guide for Conducting Risk Assessments NIST, Sep. 2012, available at https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800‐30r1.pdf

NIST SP 800‐39 Managing Information Security Risk, March 2011, available at https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800‐39.pdf

NIST SP 800‐41 Rev. 1 Guidelines on Firewalls and Firewall Policy NIST, September 2009, available at https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800‐41r1.pdf

NIST SP 800‐53 Rev. 5 CNSSI 4009 Security and Privacy Controls for Information Systems and Organizations, September 2020, available at doi.org/10.6028/NIST.SP.800‐53r5

NIST 800‐63 Digital Identity Guidelines, June 2017, available at https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800‐63‐3.pdf

NIST SP 800‐82 Rev. 2 RFC 4949, Guide to Industrial Control Systems (ICS) Security, May 2015, available from: http://dx.doi.org/10.6028/NIST.SP.800‐82r2

NIST 800‐83 Revision 1 Guide to Malware Incident Prevention and Handling for Desktops and Laptops, July 2013, available at https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800‐83r1.pdf

NIST 800‐88, Revision 1: Guidelines for Media Sanitization, 5 February 2015, available at https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800‐88r1.pdf

NIST Special Publication 800‐101 Guidelines on Mobile Device Forensics, May 2014, available at http://dx.doi.org/10.6028/NIST.SP.800‐101r1

NIST Special Publication 800‐115 Technical Guide to Information Security Testing and Assessment, Sep. 2008, available at https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800‐115.pdf

NIST Special Publication 800‐150 Guide to Cyber Threat Information Sharing, October 2016, available at https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800‐150.pdf

NISTIR 8053 De‐Identification of Personal Information, October 2015, available at https://nvlpubs.nist.gov/nistpubs/ir/2015/NIST.IR.8053.pdf

NISTIR 8170 Approaches for Federal Agencies to Use the Cybersecurity Framework, NIST, March 2020, available at https://doi.org/10.6028/NIST.IR.8170

NISTIR 8269 (DRAFT) A taxonomy and terminology of adversarial machine learning, May 2019, available at https://nvlpubs.nist.gov/nistpubs/ir/2019/NIST.IR.8269‐draft.pdf

I.7 Data and Information Sources Used

I.7.1 Glossaries in the Area of Cybersecurity

National Institute of Standards and Technology (NIST) provides a keyword searchable glossary of more than 6700 security‐related terms with references to a particular NIST publication. This Glossary consists of terms and definitions extracted verbatim from NIST's cybersecurity‐ and privacy‐related Federal Information Processing Standards (FIPS), NIST Special Publications (SPs), and NIST Internal/Interagency Reports (IRs), as well as from Committee on National Security Systems (CNSS) Instruction CNSSI‐4009 – see

https://csrc.nist.gov/glossary/

The National Initiative for Cybersecurity Careers and Studies portal of the Department of Homeland Security provides a cybersecurity lexicon to serve the cybersecurity communities of practice and interest in both the public and private sectors. It complements other lexicons such as the NISTIR 7298 Glossary of Key Information Security Terms. The objectives of the lexicon are to enable clearer communication and a common understanding of cybersecurity terms through the use of plain English and annotations on the definitions. The lexicon will evolve through ongoing feedback from end users and stakeholders – see

https://niccs.cisa.gov/about‐niccs/cybersecurity‐glossary#

SANS Institute glossary of terms – see

https://www.sans.org/security‐resources/glossary‐of‐terms/

Canadian Centre for Cyber Security’s glossary – see

https://cyber.gc.ca/en/glossary

I.7.2 Glossaries in the Area of Artificial Intelligence

Council of Europe, Artificial Intelligence Glossary – see

https://www.coe.int/en/web/artificial‐intelligence/glossary

Wikipedia, Glossary of Artificial Intelligence – see

https://en.wikipedia.org/wiki/Glossary_of_artificial_intelligence

I.7.3 Other Data and Information Sources Used

I.7.3.1 Antimalware Tools List and Comparison

Wikipedia https://en.wikipedia.org/wiki/Comparison_of_antivirus_software

Anti‐malware Reviews at http://www.antimalwarereviews.com

Common Vulnerabilities and Exposures (CVE)® is a list of records – each containing an identification number, a description, and at least one public reference – for publicly known cybersecurity vulnerabilities – https://cve.mitre.org and CVE Details (https://www.cvedetails.com)

Comparison of computer viruses, Wikipedia, accessed at https://en.wikipedia.org/wiki/Comparison_of_computer_viruses – contains a unified list (currently a few dozen virus families) of computer viruses with their origin, isolation dates, and short descriptions

Computer worms, Wikipedia, accessed at https://en.wikipedia.org/wiki/List_of_computer_worms – contains a unified list (currently a few dozen) of computer worms with their origin, isolation dates, and short descriptions

Clam antivirus signature database, www.clamav.net.

Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your data science goals. Their website provides a transparent repository for public datasets – https://www.kaggle.com/datasets

Spam datasets: LingSpam (csmining.org/index.php/ling‐spam‐datasets.html) and SpamAssassin (http://spamassassin.org/publiccorpus).

1 Computer Security with Artificial Intelligence, Machine Learning, and Data Science Combination: What? How? Why? And Why Now and Together?

1.1 The Current Security Landscape

Computer and network security, also called cybersecurity, is one of the most significant subjects to consider when dealing with computers, networking, and data. As data and digital technology become part of everyone's life, their security becomes more important too. More than two billion people are estimated to use the Internet on a regular basis at present, and the amount of sensitive and private data collected and stored by government and nongovernment organizations that needs to be protected grows every day. At the same time, computer systems and communication networks have always been vulnerable to a myriad of threats that can inflict different types of damage and result in significant losses. The damage can range from a data entry error that violates data integrity to a planted virus that destroys an entire database, and its source can range from an outside hacker to an untrustworthy inside employee or a simple human error.

A new professional community of computer security specialists, who are responsible for protecting systems against adversarial attacks and preventing damage, has formed. Cybersecurity significantly changes the protection landscape and the range of defense mechanisms. Nowadays, attackers have a wide selection of devices that they can target and infect through local and global networks, such as an entire computer infrastructure, mobile phones, and even computerized automobiles. With each passing year, attacks seem to escalate, and the threats evolve as attackers invent new ways to steal, harm, and destroy. Criminals aim at stealing private as well as financial information, and even government and political entities are not immune. Hence, there is a rising need to safeguard sensitive information and computer resources from these complex and malicious threats. Hacking has been around for decades, but present‐day crimes are often motivated not only financially but also politically or personally. Hackers' activities such as identity theft and cyber terrorism create a clear danger to private citizens and threaten the fabric of society as a whole. The possible sponsorship of hacking by some government and nongovernment organizations around the world makes the ulterior motives and goals even more frightening.

Computer security is an ever‐evolving field, especially if one takes into account current technological and societal developments. Two major interrelated trends are observed in modern technological development: computerization, or digitization, and interconnection through the Internet (Figure 1.1). Over the last couple of decades, the number of devices connected to the Internet has grown enormously. New computerized products released nowadays have an Internet connection capability, and they use it at a high rate. These devices generate and collect an astronomical amount of data, which needs to be stored, processed, communicated, and accessed (Figure 1.2). At the same time, more and more complex exploits are being built.

Figure 1.1 Security threats get close to you through networking.

Computer security has always been an essential aspect of technology. Security has grown stronger compared with a few decades ago, but so have the threats. New kinds of threats and attackers have kept emerging and testing the resistance of computer and information systems (Figure 1.3). Now, with the wide range of electronic gadgets such as mobile phones and data storage devices, and with the immense adoption of computer technology in all kinds of sectors, the need for sound security control mechanisms and tools is growing faster than ever before. Initially, attackers targeted large‐scale organizations in the financial sector, but that has changed. Nowadays, they are coming after smaller organizations too. Attackers are looking for any form of asset they can get access to in order to steal or modify it.

Big data and pervasive computing are the new areas that cyberattackers set out to exploit. Mobile phones, tablets, and laptops are the most vulnerable. Immense research is carried out in these fields to make them secure. Not long ago, the picture looked different. There was a limited number of devices, and the Internet was not so widespread. Cloud computing was only a concept. Not all data were stored in the cloud. Viruses were not smart enough to exploit the available vulnerabilities. Nor did hackers have tools so powerful that people with limited or no knowledge of computing could get into other owners' systems. Figure 1.3 demonstrates the growth in the diversity and sophistication of attacks and, at the same time, the decrease in the knowledge required to conduct cyberattacks owing to the availability of automated tools.

Figure 1.2 New technologies and applications, such as self‐driving cars and bikes dramatically increase the data security and privacy protection requirements.

Source: Courtesy of @ L. Reznik and I. Khokhlov.

The major reasons that are commonly given to explain the growing cost of cybercrime are:

quick adoption of new technologies by cyber criminals,

the increased number of new users online, who nowadays mostly come from low‐income communities and countries with weak cybersecurity education and implemented protection mechanisms,

an increased ease of committing cybercrime, with the growth of cyber‐crime‐as‐a‐service development and criminals attempting to monetize their breach success,

an expanding number of cybercrime centers around the world that might get supported by their government agencies,

a growing financial sophistication among top‐tier cyber criminals and availability of cryptocurrencies that, among other things, make crime monetization easier.

The Report to the President on Enhancing the Resilience of the Internet and Communications Ecosystem Against Botnets and Other Automated, Distributed Threats, released by the US Department of Commerce and the Department of Homeland Security on 22 May 2018, grouped the opportunities and challenges in working toward dramatically reducing threats from automated, distributed attacks into six principal themes and determined possible organizational measures to address them.

Automated, distributed attacks present a global problem.

The majority of the compromised devices in recent noteworthy botnets have been geographically located outside the United States. To increase the resilience of the Internet and communications ecosystem against these threats, we must continue to work closely with international partners.

Effective tools exist, but are not widely used.

While there remains room for improvement, the tools, processes, and practices required to significantly enhance the resilience of the Internet and communications ecosystem are widely available, and are routinely applied in selected market sectors. However, they do not form the common practices for product development and deployment in many other sectors for a variety of reasons, including (but not limited to) lack of awareness, cost avoidance, insufficient technical expertise, and lack of market incentives.

Figure 1.3 Attack sophistication vs. Intruder knowledge.

Source: Courtesy of @ L. Reznik and I. Khokhlov. Modified from https://www.eleceng.adelaide.edu.au/students/wiki/projects/images/c/cd/Fig2.png.

Products should be secured during all stages of the life cycle.