Become a master at penetration testing using machine learning with Python
Key Features
Book Description
Cyber security is crucial for both businesses and individuals. As systems get smarter, machine learning is disrupting computer security. With machine learning being adopted in upcoming security products, it's important for pentesters and security researchers to understand how these systems work, and how to breach them for testing purposes.
This book begins with the basics of machine learning and the algorithms used to build robust systems. Once you've gained a fair understanding of how security products leverage machine learning, you'll dive into the core concepts of breaching such systems. Through practical use cases, you'll see how to find loopholes and bypass a self-learning security system.
As you make your way through the chapters, you'll focus on topics such as network intrusion detection and AV and IDS evasion. We'll also cover the best practices when identifying ambiguities, and extensive techniques to breach an intelligent system.
By the end of this book, you will be well versed in identifying loopholes in a self-learning security system and will be able to efficiently breach a machine learning system.
What you will learn
Who this book is for
This book is for pen testers and security professionals who are interested in learning techniques to break an intelligent security system. Basic knowledge of Python is needed, but no prior knowledge of machine learning is necessary.
Page count: 177
Year of publication: 2018
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Vijin Boricha
Acquisition Editor: Heramb Bhavsar
Content Development Editor: Nithin George Varghese
Technical Editor: Komal Karne
Copy Editor: Safis Editing
Project Coordinator: Virginia Dias
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Tom Scaria
Production Coordinator: Aparna Bhagat
First published: June 2018
Production reference: 1260618
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78899-740-9
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Chiheb Chebbi is an InfoSec enthusiast who has experience in various aspects of information security, focusing on the investigation of advanced cyber attacks and researching cyber espionage and APT attacks. Chiheb is currently pursuing an engineering degree in computer science at TEK-UP university in Tunisia.
His core interests are infrastructure penetration testing, deep learning, and malware analysis. In 2016, he was included in the Alibaba Security Research Center Hall Of Fame. His talk proposals were accepted by DeepSec 2017, Blackhat Europe 2016, and many world-class information security conferences.
Aditya Mukherjee is a proficient information security professional, cybersecurity speaker, entrepreneur, cybercrime investigator, and columnist.
He has 10+ years of experience in different leadership roles across information security domains with various reputed organizations, specializing in the implementation of cybersecurity solutions, cyber transformation projects, and solving problems associated with security architecture, framework, and policies.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Mastering Machine Learning for Penetration Testing
Dedication
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introduction to Machine Learning in Pentesting
Technical requirements
Artificial intelligence and machine learning  
Machine learning models and algorithms 
Supervised
Bayesian classifiers
Support vector machines
Decision trees 
Semi-supervised
Unsupervised
Artificial neural networks 
Linear regression 
Logistic regression
Clustering with k-means 
Reinforcement
Performance evaluation 
Dimensionality reduction
Improving classification with ensemble learning 
Machine learning development environments and Python libraries
NumPy
SciPy
TensorFlow
Keras
pandas
Matplotlib
scikit-learn
NLTK
Theano
Machine learning in penetration testing - promises and challenges
Deep Exploit
Summary
Questions
Further reading
Phishing Domain Detection
Technical requirements
Social engineering overview
Social Engineering Engagement Framework
Steps of social engineering penetration testing
Building real-time phishing attack detectors using different machine learning models
Phishing detection with logistic regression
Phishing detection with decision trees
NLP in-depth overview
Open source NLP libraries
Spam detection with NLTK
Summary
Questions
Malware Detection with API Calls and PE Headers
Technical requirements
Malware overview
Malware analysis      
Static malware analysis
Dynamic malware analysis
Memory malware analysis
Evasion techniques
Portable Executable format files 
Machine learning malware detection using PE headers 
Machine learning malware detection using API calls
Summary
Questions
Further reading
Malware Detection with Deep Learning
Technical requirements
Artificial neural network overview
Implementing neural networks in Python
Deep learning model using PE headers
Deep learning model with convolutional neural networks and malware visualization
Convolutional Neural Networks (CNNs)
Recurrent Neural Networks (RNNs)
Long Short Term Memory networks
Hopfield networks
Boltzmann machine networks
Malware detection with CNNs
Promises and challenges in applying deep learning to malware detection
Summary
Questions
Further reading
Botnet Detection with Machine Learning
Technical requirements
Botnet overview
Building a botnet detector model with multiple machine learning techniques
How to build a Twitter bot detector
Visualization with seaborn
Summary
Questions
Further reading
Machine Learning in Anomaly Detection Systems
Technical requirements
An overview of anomaly detection techniques
Static rules technique
Network attacks taxonomy
The detection of network anomalies
HIDS
NIDS
Anomaly-based IDS
Building your own IDS
The Kale stack
Summary
Questions
Further reading
Detecting Advanced Persistent Threats
Technical requirements
Threats and risk analysis
Threat-hunting methodology
The cyber kill chain
The diamond model of intrusion analysis
Threat hunting with the ELK Stack
Elasticsearch
Kibana
Logstash
Machine learning with the ELK Stack using the X-Pack plugin
Summary
Questions
Evading Intrusion Detection Systems
Technical requirements
Adversarial machine learning algorithms
Overfitting and underfitting
Overfitting and underfitting with Python
Detecting overfitting
Adversarial machine learning
Evasion attacks
Poisoning attacks
Adversarial clustering
Adversarial features
CleverHans
The AML library 
EvadeML-Zoo
Evading intrusion detection systems with adversarial network systems
Summary
Questions
Further reading
Bypassing Machine Learning Malware Detectors
Technical requirements
Adversarial deep learning
Foolbox
Deep-pwning
EvadeML
Bypassing next generation malware detectors with generative adversarial networks
The generator
The discriminator
MalGAN
Bypassing machine learning with reinforcement learning
Reinforcement learning
Summary
Questions
Further reading
Best Practices for Machine Learning and Feature Engineering
Technical requirements
Feature engineering in machine learning
Feature selection algorithms
Filter methods
Pearson's correlation
Linear discriminant analysis
Analysis of variance
Chi-square
Wrapper methods
Forward selection
Backward elimination
Recursive feature elimination
Embedded methods
Lasso linear regression L1
Ridge regression L2
Tree-based feature selection
Best practices for machine learning
Information security datasets
Project Jupyter
Speed up training with GPUs
Selecting models and learning curves
Machine learning architecture
Coding
Data handling
Business contexts
Summary
Questions
Further reading
Assessments
Chapter 1 – Introduction to Machine Learning in Pentesting 
Chapter 2 – Phishing Domain Detection
Chapter 3 – Malware Detection with API Calls and PE Headers 
Chapter 4 – Malware Detection with Deep Learning 
Chapter 5 – Botnet Detection with Machine Learning 
Chapter 6 – Machine Learning in Anomaly Detection Systems 
Chapter 7 – Detecting Advanced Persistent Threats 
Chapter 8 – Evading Intrusion Detection Systems with Adversarial Machine Learning
Chapter 9 – Bypass Machine Learning Malware Detectors
Chapter 10 – Best Practices for Machine Learning and Feature Engineering
Other Books You May Enjoy
Leave a review - let other readers know what you think
Currently, machine learning techniques are some of the hottest trends in information technology. They impact every aspect of our lives, and they affect every industry and field. Machine learning is a cyber weapon for information security professionals. In this book, you will not only explore the fundamentals of machine learning techniques, but will also learn the secrets to building a fully functional machine learning security system. We will not stop at building defensive layers; we will also explore how to attack machine learning models with adversarial learning. Mastering Machine Learning for Penetration Testing will provide educational as well as practical value.
Mastering Machine Learning for Penetration Testing is for pen testers and security professionals who are interested in learning techniques for breaking an intelligent security system. A basic knowledge of Python is needed, but no prior knowledge of machine learning is necessary.
Chapter 1, Introduction to Machine Learning in Pentesting, introduces readers to the fundamental concepts of the different machine learning models and algorithms, and to how to evaluate them. It then shows us how to prepare a machine learning development environment using many data science Python libraries.
Chapter 2, Phishing Domain Detection, guides us on how to build machine learning models to detect phishing emails and spam attempts using different algorithms and natural language processing (NLP).
Chapter 3, Malware Detection with API Calls and PE Headers, explains the different approaches to analyzing malware and malicious software, and later introduces us to some different techniques for building a machine learning-based malware detector.
Chapter 4, Malware Detection with Deep Learning, extends what we learned in the previous chapter to explore how to build artificial neural networks and deep learning to detect malware.
Chapter 5, Botnet Detection with Machine Learning, demonstrates how to build a botnet detector using the previously discussed techniques and publicly available botnet traffic datasets.
Chapter 6, Machine Learning in Anomaly Detection Systems, introduces us to the most important terminologies in anomaly detection and guides us to build machine learning anomaly detection systems.
Chapter 7, Detecting Advanced Persistent Threats, shows us how to build a fully working real-world threat hunting platform using the ELK stack, which comes loaded with machine learning capabilities.
Chapter 8, Evading Intrusion Detection Systems with Adversarial Machine Learning, demonstrates how to bypass machine learning systems using adversarial learning and studies some real-world cases, including bypassing next-generation intrusion detection systems.
Chapter 9, Bypass Machine Learning Malware Detectors, teaches us how to bypass machine learning-based malware detectors with adversarial learning and generative adversarial networks.
Chapter 10, Best Practices for Machine Learning and Feature Engineering, explores different feature engineering techniques, in addition to introducing readers to machine learning best practices to build reliable systems.
We assume that the readers of this book are familiar with basic information security concepts and Python programming. Some of the demonstrations in this book require more practice and online research to delve into the concepts discussed.
If you encounter any bugs, typos, or errors, check the GitHub repository of this book for updated code.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Machine-Learning-for-Penetration-Testing. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it from https://www.packtpub.com/sites/default/files/downloads/MasteringMachineLearningforPenetrationTesting_ColorImages.pdf.
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Currently, machine learning techniques are some of the hottest trends in information technology. They impact every aspect of our lives, and they affect every industry and field. Machine learning is a cyber weapon for information security professionals. In this book, readers will not only explore the fundamentals behind machine learning techniques, but will also learn the secrets to building a fully functional machine learning security system. We will not stop at building defensive layers; we will illustrate how to build offensive tools to attack and bypass security defenses. By the end of this book, you will be able to bypass machine learning security systems and use the models constructed in penetration testing (pentesting) missions.
In this chapter, we will cover:
Machine learning models and algorithms
Performance evaluation metrics
Dimensionality reduction
Ensemble learning
Machine learning development environments and Python libraries
Machine learning in penetration testing – promises and challenges
In this chapter, we are going to build a development environment. Therefore, we are going to install the following Python machine learning libraries:
NumPy
SciPy
TensorFlow
Keras
pandas
Matplotlib
scikit-learn
NLTK
Theano
You will also find all of the scripts and installation guides used in this GitHub repository: https://github.com/PacktPublishing/Mastering-Machine-Learning-for-Penetration-Testing/tree/master/Chapter01.
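As a quick sanity check after installation, the following minimal sketch reports which of the libraries listed above can be found by the current Python interpreter. It uses only the standard library; the names LIBRARIES and check_environment are hypothetical helpers introduced for illustration, not part of the book's scripts.

```python
import importlib.util

# Import names for the libraries listed above. scikit-learn is imported
# as "sklearn"; the other import names are the lowercase package names.
LIBRARIES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "pandas": "pandas",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "NLTK": "nltk",
    "Theano": "theano",
}


def check_environment(libraries=LIBRARIES):
    """Return {display name: bool} telling whether each module can be located."""
    status = {}
    for name, module in libraries.items():
        try:
            # find_spec locates the module without importing it.
            status[name] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:<14}{'installed' if found else 'missing'}")
```

Any library reported as missing can then be installed by following the installation guides in the repository.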
Making a machine think like a human is one of the oldest dreams. Machine learning techniques are used to help make predictions based on experiences and data.
To teach machines to solve a large number of problems by themselves, we need to consider the different machine learning models. Because a model must be fed with data, machine learning models are divided, based on the kind of input dataset, into four major categories: supervised learning, semi-supervised learning, unsupervised learning, and reinforcement learning. In this section, we describe each category in detail and explore the best-known algorithms used in each. Before building machine learning systems, we need to know how things work beneath the surface.
We talk about supervised machine learning when we have both the input variables and the output variables. In this case, the task is to learn the function (or pattern) that maps the inputs to the outputs. The following are some of the most often used supervised machine learning algorithms.
According to the Cambridge English Dictionary, bias is the action of supporting or opposing a particular person or thing in an unfair way, allowing personal opinions to influence your judgment. Bayesian machine learning refers to having a prior belief, and updating it later by using data. Mathematically, it is based on the Bayes formula:
P(A|B) = P(B|A) P(A) / P(B)
Here, P(A) is the prior belief, P(B|A) is the likelihood of the data given that belief, and P(A|B) is the updated (posterior) belief.
One of the simplest Bayesian problems is randomly tossing a coin and trying to predict whether the outcome will be heads or tails. That is why we can identify Bayesian methodology as being probabilistic. Naive Bayes is very useful when you are using a small amount of data.
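To make the idea of updating a prior belief concrete, here is a minimal sketch that applies Bayes' formula, P(A|B) = P(B|A) P(A) / P(B), to a single piece of evidence. The function name posterior and all the probabilities are illustrative assumptions, not values from the book; P(B) is expanded with the law of total probability.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes' formula for a binary hypothesis A and an observation B.

    prior                = P(A)
    likelihood           = P(B | A)
    likelihood_given_not = P(B | not A)
    """
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Illustrative numbers only: 20% of emails are spam, a suspicious word
# appears in 60% of spam emails and in 5% of legitimate emails.
p_spam = 0.2
p_word_given_spam = 0.6
p_word_given_ham = 0.05

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word_given_ham)
print(f"P(spam | word) = {p_spam_given_word:.2f}")  # prints P(spam | word) = 0.75
```

Seeing the word raises the belief that the email is spam from the prior of 0.2 to a posterior of 0.75. A Naive Bayes classifier repeats this kind of update for many features, under the simplifying assumption that the features are independent.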
A support vector machine (SVM) is a supervised machine learning model that works by identifying a hyperplane that separates the data, which can be represented in a multidimensional space. SVMs are widely used for classification. An SVM uses the hyperplane that best separates the different classes. When several hyperplanes separate the classes, the correct one is identified using something called a margin, or a gap. The margin is the distance between the hyperplane and the nearest data points. You can take a look at the following representation to check for the margin:
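The margin can also be computed by hand. The sketch below, using made-up 2D points and two hypothetical candidate lines, measures the distance from each line to its nearest point and keeps the line with the larger margin. It only compares two fixed candidates; a real SVM searches over all separating hyperplanes for the one with the maximum margin.

```python
import math


def margin(w, b, points):
    """Smallest distance from any point to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        abs(sum(wi * xi for wi, xi in zip(w, p)) + b) / norm
        for p in points
    )


# Toy data (assumed for illustration): two well-separated classes.
class_a = [(1.0, 1.0), (2.0, 1.0)]
class_b = [(4.0, 4.0), (5.0, 4.0)]
points = class_a + class_b

# Two hypothetical lines, each of which separates class_a from class_b.
candidates = {
    "x + y - 5 = 0": ((1.0, 1.0), -5.0),
    "y - 1.5 = 0": ((0.0, 1.0), -1.5),
}

for name, (w, b) in candidates.items():
    print(f"{name}: margin = {margin(w, b, points):.3f}")

# Keep the candidate whose nearest point is farthest away.
best = max(candidates, key=lambda name: margin(*candidates[name], points))
print("Selected hyperplane:", best)
```

Both lines classify the toy points correctly, but the second one passes very close to class_a, so a slightly shifted new sample could easily land on the wrong side.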
The hyperplane with the largest margin will be selected. If we choose the hyperplane with the smallest margin, we might face misclassification problems later. Don't be distracted by the previous graph; the boundary between the classes will not always be linear. Consider a case like the following:
In the preceding situation, we can add a new axis, called the z axis, and apply a transformation where z = x^2 + y^2, a technique known as the kernel trick. If you apply the transformation, the new graph will be as follows:
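The effect of the transformation can be checked numerically. The following minimal sketch uses made-up points, with one class near the origin and the other on a surrounding ring, and an explicit helper called lift (a hypothetical name) that adds the z = x^2 + y^2 coordinate. Note that it computes the mapping explicitly for clarity, whereas SVM libraries typically apply kernels implicitly.

```python
def lift(point):
    """Map a 2D point (x, y) to 3D by adding z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)


# Toy data (assumed for illustration): no straight line in the x-y plane
# separates the inner cluster from the ring around it.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]

z_inner = [lift(p)[2] for p in inner]
z_outer = [lift(p)[2] for p in outer]

# In the lifted space, the horizontal plane z = threshold separates the classes.
threshold = (max(z_inner) + min(z_outer)) / 2
separable = max(z_inner) < threshold < min(z_outer)

print("z values, inner class:", z_inner)
print("z values, outer class:", z_outer)
print("Separable by the plane z =", threshold, ":", separable)
```

After the transformation, every inner point lies below the plane and every outer point lies above it, so a flat hyperplane is enough.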
Now, we can identify the right hyperplane. The transformation is called a kernel
