Learning Python for Forensics - Preston Miller - E-Book

Learning Python for Forensics E-Book

Preston Miller

Description

Design, develop, and deploy innovative forensic solutions using Python

Key Features

  • Discover how to develop Python scripts for effective digital forensic analysis
  • Master the skills of parsing complex data structures with Python libraries
  • Solve forensic challenges through the development of practical Python scripts

Book Description

Digital forensics plays an integral role in solving complex cybercrimes and helping organizations make sense of cybersecurity incidents. This second edition of Learning Python for Forensics illustrates how Python can be used to support these digital investigations and permits the examiner to automate the parsing of forensic artifacts to spend more time examining actionable data.

The second edition of Learning Python for Forensics will illustrate how to develop Python scripts using an iterative design. Further, it demonstrates how to leverage the various built-in and community-sourced forensics scripts and libraries available for Python today. This book will help strengthen your analysis skills and efficiency as you creatively solve real-world problems through instruction-based tutorials.

By the end of this book, you will build a collection of Python scripts capable of investigating an array of forensic artifacts and master the skills of extracting metadata and parsing complex data structures into actionable reports. Most importantly, you will have developed a foundation upon which to build as you continue to learn Python and enhance your efficacy as an investigator.

What you will learn

  • Learn how to develop Python scripts to solve complex forensic problems
  • Build scripts using an iterative design
  • Design code to accommodate present and future hurdles
  • Leverage built-in and community-sourced libraries
  • Understand the best practices in forensic programming
  • Learn how to transform raw data into customized reports and visualizations
  • Create forensic frameworks to automate analysis of multiple forensic artifacts
  • Conduct effective and efficient investigations through programmatic processing

Who this book is for

If you are a forensics student, hobbyist, or professional seeking to increase your understanding of forensics through the use of a programming language, then Learning Python for Forensics is for you. You are not required to have previous experience in programming to learn and master the content within this book. This material, created by forensic professionals, was written with a unique perspective and understanding for examiners who wish to learn programming.

Preston Miller is a consultant at an internationally recognized risk management firm. Preston holds an undergraduate degree from Vassar College and a master's degree in digital forensics from Marshall University. While at Marshall, Preston unanimously received the prestigious J. Edgar Hoover Foundation's scientific scholarship. Preston is a published author, recently of Python Digital Forensics Cookbook, which won the Forensic 4:cast Digital Forensics Book of the Year award in 2018. Preston is a member of the GIAC advisory board and holds multiple industry-recognized certifications in his field.

Chapin Bryce is a consultant at a global firm that is a leader in digital forensics and incident response investigations. After graduating from Champlain College with a bachelor's degree in computer and digital forensics, Chapin dove into the field of digital forensics and incident response joining the GIAC advisory board and earning four GIAC certifications: GCIH, GCFE, GCFA, and GNFA. As a member of multiple ongoing research and development projects, he has authored several books and articles in professional and academic publications, including Python Digital Forensics Cookbook (Forensic 4:Cast Digital Forensics Book of the Year, 2018), Learning Python for Forensics, First Edition, and Digital Forensic Magazine.

Page count: 658

Year of publication: 2019

Learning Python for Forensics, Second Edition

Leverage the power of Python in forensic investigations

Preston Miller
Chapin Bryce

BIRMINGHAM - MUMBAI

Learning Python for Forensics Second Edition

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

 

Commissioning Editor: Gebin George
Acquisition Editor: Joshua Nadar
Content Development Editor: Chris D'cruz
Technical Editor: Dinesh Pawar
Copy Editor: Safis Editing
Project Coordinator: Namrata Swetta
Proofreader: Safis Editing
Indexer: Rekha Nair
Graphics: Tom Scaria
Production Coordinator: Nilesh Mohite

First published: May 2016
Second edition: January 2019

Production reference: 1310119

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78934-169-0

www.packtpub.com

 
mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

Packt.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

Contributors

About the authors

Preston Miller is a consultant at an internationally recognized risk management firm. Preston holds an undergraduate degree from Vassar College and a master's degree in digital forensics from Marshall University. While at Marshall, Preston unanimously received the prestigious J. Edgar Hoover Foundation's scientific scholarship. Preston is a published author, recently of Python Digital Forensics Cookbook, which won the Forensic 4:cast Digital Forensics Book of the Year award in 2018. Preston is a member of the GIAC advisory board and holds multiple industry-recognized certifications in his field.

To my grandfather, who taught me the value of hard work, dedication, and the pursuit of excellence, without whose love and support I would not be the person I am today.

 

 

 

 

Chapin Bryce is a consultant at a global firm that is a leader in digital forensics and incident response investigations. After graduating from Champlain College with a bachelor's degree in computer and digital forensics, Chapin dove into the field of digital forensics and incident response joining the GIAC advisory board and earning four GIAC certifications: GCIH, GCFE, GCFA, and GNFA. As a member of multiple ongoing research and development projects, he has authored several books and articles in professional and academic publications, including Python Digital Forensics Cookbook (Forensic 4:Cast Digital Forensics Book of the Year, 2018), Learning Python for Forensics, First Edition, and Digital Forensic Magazine.

To Alexa, who I hope will learn Python in the near future.

About the reviewer

Marek Chmel is an IT consultant and trainer with more than 10 years' experience. He is a frequent speaker, focusing on Microsoft SQL Server, Azure, and security topics. Marek writes for Microsoft's TechnetCZSK blog and has been an MVP: Data Platform since 2012. He has earned numerous certifications, including MCSE: Data Management and Analytics, EC Council Certified Ethical Hacker, and several eLearnSecurity certifications.

Marek earned his MSc (business and informatics) degree from Nottingham Trent University. He started his career as a trainer for Microsoft server courses. Later, he joined AT&T as a principal database administrator specializing in MSSQL Server, data platforms, and machine learning.

 

 

 

 

 

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Learning Python for Forensics Second Edition

About Packt

Why subscribe?

Packt.com

Contributors

About the authors

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Now for Something Completely Different

When to use Python

Development life cycle

Getting started

The omnipresent print() function

Standard data types

Strings and Unicode

Integers and floats

Boolean and none

Structured data types

Lists

Dictionaries

Sets and tuples

Data type conversions

Files

Variables

Understanding scripting flow logic

Conditionals

Loops

The for loop

The while loop

Functions

Summary

Python Fundamentals

Advanced data types and functions

Iterators

datetime objects

Libraries

Installing third-party libraries

Libraries in this book

Python packages

Classes and object-oriented programming

Try and except

The raise function

Creating our first script – unix_converter.py

User input

Using the raw input method and the system module – user_input.py

Understanding Argparse – argument_parser.py

Forensic scripting best practices

Developing our first forensic script – usb_lookup.py

Understanding the main() function

Interpreting the search_key() function

Running our first forensic script

Troubleshooting

Challenge

Summary

Parsing Text Files

Setup API

Introducing our script

Overview

Our first iteration – setupapi_parser_v1.py

Designing the main() function

Crafting the parse_setupapi() function

Developing the print_output() function

Running the script

Our second iteration – setupapi_parser_v2.py

Improving the main() function

Tuning the parse_setupapi() function

Modifying the print_output() function

Running the script

Our final iteration – setupapi_parser.py

Extending the main() function

Adding to the parse_setup_api() function

Creating the parse_device_info() function

Forming the prep_usb_lookup() function

Constructing the get_device_names() function

Enhancing the print_output() function

Running the script

Challenge

Summary

Working with Serialized Data Structures

Serialized data structures

A simple Bitcoin web API

Our first iteration – bitcoin_address_lookup.v1.py

Exploring the main() function

Understanding the get_address() function

Working with the print_transactions() function

The print_header() helper function

The get_inputs() helper function

Running the script

Our second iteration – bitcoin_address_lookup.v2.py

Modifying the main() function

Improving the get_address() function

Elaborating on the print_transactions() function

Running the script

Mastering our final iteration – bitcoin_address_lookup.py

Enhancing the parse_transactions() function

Developing the csv_writer() function

Running the script

Challenge

Summary

Databases in Python

An overview of databases

Using SQLite3

Using SQL

Designing our script

Manually manipulating databases with Python – file_lister.py

Building the main() function

Initializing the database with the init_db() function

Checking for custodians with the get_or_add_custodian() function

Retrieving custodians with the get_custodian() function

Understanding the ingest_directory() function

Exploring the os.stat() method

Developing the format_timestamp() helper function

Configuring the write_output() function

Designing the write_csv() function

Composing the write_html() function

Running the script

Automating databases further – file_lister_peewee.py

Peewee setup

Jinja2 setup

Updating the main() function

Adjusting the init_db() function

Modifying the get_or_add_custodian() function

Improving the ingest_directory() function

A closer look at the format_timestamp() function

Converting the write_output() function

Simplifying the write_csv() function

Condensing the write_html() function

Running our new and improved script

Challenge

Summary

Extracting Artifacts from Binary Files

UserAssist

Understanding the ROT-13 substitution cipher – rot13.py

Evaluating code with timeit

Working with the yarp library

Introducing the struct module

Creating spreadsheets with the xlsxwriter module

Adding data to a spreadsheet

Building a table

Creating charts with Python

The UserAssist framework

Developing our UserAssist logic processor – userassist_parser.py

Evaluating the main() function

Defining the create_dictionary() function

Extracting data with the parse_values() function

Processing strings with the get_name() function

Writing Excel spreadsheets – xlsx_writer.py

Controlling output with the excel_writer() function

Summarizing data with the dashboard_writer() function

Writing artifacts in the userassist_writer() function

Defining the file_time() function

Processing integers with the sort_by_count() function

Processing datetime objects with the sort_by_date() function

Writing generic spreadsheets – csv_writer.py

Understanding the csv_writer() function

Running the UserAssist framework

Challenge

Summary

Fuzzy Hashing

Background on hashing

Hashing files in Python

Hashing large files – hashing_example.py

Creating fuzzy hashes

Context Triggered Piecewise Hashing (CTPH)

Implementing fuzzy_hasher.py

Starting with the main() function

Creating our fuzzy hashes

Generating our rolling hash

Preparing signature generation

Providing the output

Running fuzzy_hasher.py

Using ssdeep in Python – ssdeep_python.py

Revisiting the main() function

Redesigning our output() function

Running ssdeep_python.py

Additional challenges

References

Summary

The Media Age

Creating frameworks in Python

Introduction to EXIF metadata

Introducing the Pillow module

Introduction to ID3 metadata

Introducing the Mutagen module

Introduction to Office metadata

Introducing the lxml module

The Metadata_Parser framework overview

Our main framework controller – metadata_parser.py

Controlling our framework with the main() function

Parsing EXIF metadata – exif_parser.py

Understanding the exif_parser() function

Developing the get_tags() function

Adding the dms_to_decimal() function

Parsing ID3 metadata – id3_parser.py

Understanding the id3_parser() function

Revisiting the get_tags() function

Parsing Office metadata – office_parser.py

Evaluating the office_parser() function

The get_tags() function for the last time

Moving on to our writers

Writing spreadsheets – csv_writer.py

Plotting GPS data with Google Earth – kml_writer.py

Supporting our framework with processors

Creating framework-wide utility functions – utility.py

Framework summary

Additional challenges

Summary

Uncovering Time

About timestamps

What's an epoch?

Using a GUI

Basics of TkInter objects

Implementing the TkInter GUI

Using frame objects

Using classes in TkInter

Developing the date decoder GUI – date_decoder.py

The DateDecoder class setup and __init__() method

Executing the run() method

Implementing the build_input_frame() method

Creating the build_output_frame() method

Building the convert() method

Defining the convert_unix_seconds() method

Conversion using the convert_win_filetime_64() method

Converting with the convert_chrome_time() method

Designing the output method

Running the script

Additional challenges

Summary

Rapidly Triaging Systems

Understanding the value of system information

Querying OS-agnostic process information with psutil

Using WMI

What does the pywin32 module do?

Rapidly triaging systems – pysysinfo.py

Understanding the get_process_info() function

Learning about the get_pid_details() function

Extracting process connection properties with the read_proc_connections() function

Obtaining more process information with the read_proc_files() function

Extracting Windows system information with the wmi_info() function

Writing our results with the csv_writer() function

Executing pysysinfo.py

Challenges

Summary

Parsing Outlook PST Containers

The PST file format

An introduction to libpff

How to install libpff and pypff

Exploring PSTs – pst_indexer.py

An overview

Developing the main() function

Evaluating the make_path() helper function

Iteration with the folder_traverse() function

Identifying messages with the check_for_msgs() function

Processing messages in the process_msg() function

Summarizing data in the folder_report() function

Understanding the word_stats() function

Creating the word_report() function

Building the sender_report() function

Refining the heat map with the date_report() function

Writing the html_report() function

The HTML template

Running the script

Additional challenges

Summary

Recovering Transient Database Records

SQLite WAL files

WAL format and technical specifications

The WAL header

The WAL frame

The WAL cell and varints

Manipulating large objects in Python

Regular expressions in Python

TQDM – a simpler progress bar

Parsing WAL files – wal_crawler.py

Understanding the main() function

Developing the frame_parser() function

Processing cells with the cell_parser() function

Writing the dict_helper() function

The Python debugger – pdb

Processing varints with the single_varint() function

Processing varints with the multi_varint() function

Converting serial types with the type_helper() function

Writing output with the csv_writer() function

Using regular expression in the regular_search() function

Executing wal_crawler.py

Challenge

Summary

Coming Full Circle

Frameworks

Building a framework to last

Data standardization

Forensic frameworks

Colorama

FIGlet

Exploring the framework – framework.py

Exploring the Framework object

Understanding the Framework __init__() constructor

Creating the Framework run() method

Iterating through files with the Framework _list_files() method

Developing the Framework _run_plugins() method

Exploring the Plugin object

Understanding the Plugin __init__() constructor

Working with the Plugin run() method

Handling output with the Plugin write() method

Exploring the Writer object

Understanding the Writer __init__() constructor

Understanding the Writer run() method

Our Final CSV writer – csv_writer.py

The writer – xlsx_writer.py

Changes made to plugins

Executing the framework

Additional challenges

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

At the outset of writing Learning Python for Forensics, we had one goal: to teach the use of Python for forensics in such a way that readers with little to no programming experience could follow along immediately and develop practical code for use in casework. That's not to say that this book is intended only for the Python neophyte; throughout, we ease the reader into progressively more challenging code and end by incorporating many of the scripts from previous chapters into a forensic framework. This book makes a few assumptions about the reader's programming experience, and where it does, there will often be a detailed explanation with examples and a list of resources to help bridge the gap in knowledge.

The majority of the book will focus on developing code for various forensic artifacts; however, the first two chapters will teach the basics of the language. This will level the playing field for readers of all skill levels. We intend for the complete Python novice to be able to develop forensically sound and relevant scripts by the end of this book.

Much like in the real world, code development will follow a modular design. Initially, a script might be written one way before being rewritten in another to show off the advantages (or disadvantages) of various techniques. Immersing you in this fashion will help build and strengthen the neural links required to retain the process of script design. To allow Python development to become second nature, please retype the exercises shown throughout the chapters yourself to practice and learn common Python tropes. Never be afraid to modify the code; you will not break anything (except maybe your version of the script) and will have a better understanding of the inner workings of the code as a result.

Who this book is for

If you are a forensics student, hobbyist, or professional who is seeking to increase your understanding of forensics through the use of a programming language, then this book is for you.

You are not required to have previous experience of programming to learn and master the content within this book. This material, created by forensic professionals, was written with a unique perspective to help examiners learn programming.

What this book covers

Chapter 1, Now for Something Completely Different, is an introduction to common Python objects, built-in functions, and tropes. We will also cover basic programming concepts.

Chapter 2, Python Fundamentals, is a continuation of the basics learned in the previous chapter and the development of our first forensic script.

Chapter 3, Parsing Text Files, discusses a basic setup API log parser that identifies first use times for USB devices and introduces the iterative development cycle.

Chapter 4, Working with Serialized Data Structures, shows how serialized data structures such as JSON files can be used to store or retrieve data in Python. We will parse JSON-formatted data from the Bitcoin blockchain containing transaction details.

Chapter 5, Databases in Python, shows how databases can be used to store and retrieve data via Python. We will use two different database modules to demonstrate different versions of a script that creates an active file listing with a database backend.

Chapter 6, Extracting Artifacts from Binary Files, is an introduction to the struct module, which will become every examiner's friend. We use the struct module to parse binary data into Python objects from a forensically-relevant source. We will parse the UserAssist key in the registry for user application execution artifacts.

Chapter 7, Fuzzy Hashing, explores how ssdeep compatible hashes are generated and how to use the pre-built ssdeep module to perform similarity analysis.

Chapter 8, The Media Age, helps us understand embedded metadata and parse it from forensic sources. In this chapter, we introduce and design an embedded metadata framework in Python.

Chapter 9, Uncovering Time, provides a first look at GUI development with Python by building a tool to decode commonly encountered timestamps. This is our introduction to GUI and Python class development.

Chapter 10, Rapidly Triaging Systems, shows how you can use Python to collect volatile and other useful information from popular operating systems. This includes an introduction to a very powerful Windows-specific Python API.

Chapter 11, Parsing Outlook PST Containers, demonstrates how to read, index, and report on the contents of an Outlook PST container.

Chapter 12, Recovering Transient Database Records, introduces SQLite Write-Ahead Logs and how to extract data, including deleted data, from these files.

Chapter 13, Coming Full Circle, is an aggregation of scripts written in previous chapters into a forensic framework. We explore concepts and methods for designing these larger projects.

To get the most out of this book

To follow along with the examples in this book, you will need the following:

A computer with an internet connection

Python 2.7.15 or Python 3.7.1

Optionally, an IDE for Python

In addition to these requirements, you will need to install various third-party modules that we will make use of in our code. We will indicate which modules need to be installed, the correct version, and, often, how to install them.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packt.com.

Select the SUPPORT tab.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

7-Zip/WinRAR for Windows

Keka/Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-Python-for-Forensics-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781789341690_ColorImages.pdf.

 

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "This chapter outlines the basics of Python, from Hello World to core scripting concepts."

A block of code is set as follows:

# open the database
# read from the database using the sqlite3 library
# store in variable called records
for record in records:
    # process database records here

Any command-line input or output is written as follows:

>>> type('what am I?')

<class 'str'>

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Now for Something Completely Different

This book presents Python as a necessary tool to optimize digital forensic analysis—written from an examiner's perspective. In the first two chapters, we introduce the basics of Python in preparation for the remainder of this book, where we will develop scripts to accomplish forensic tasks. While focused on the use of the language as a tool, we will also explore the advantages of Python and how they allow many individuals in the field to create solutions for complex forensic challenges. Like Monty Python, Python's namesake, the next 12 chapters aim to present something completely different.

In this fast-paced field, a scripting language provides flexible problem solving in an automated fashion, allowing the examiner additional time to investigate other artifacts that, due to time constraints, may not have been analyzed as thoroughly otherwise. Admittedly, Python may not always be the right tool to complete the task at hand, but it is an invaluable tool to add to anyone's DFIR arsenal. Should you undertake the task of mastering Python, it will more than repay the time investment, as you will increase your analysis capabilities manyfold and greatly diversify your skill set. This chapter outlines the basics of Python, from Hello World to core scripting concepts.

This chapter will cover the following topics:

An introduction to Python and healthy development practices

Basic programming concepts

Manipulating and storing objects in Python

Creating simple conditionals, loops, and functions

When to use Python

Python is a powerful forensic tool. However, before deciding to develop a script, it is important to consider the type of analysis that's required and the project timeline. In the examples that follow, we will outline situations where Python is invaluable and, conversely, when it is not worth the development effort. Though rapid development makes it easy to deploy a solution in a tough situation, Python is not always the best tool to implement. If a tool exists that performs the task at hand, and is available, it may be the more appropriate method for analysis.

Python is a preferred programming language for forensics due to its ease of use, library support, detailed documentation, and interoperability among operating systems. There are two main types of programming languages: those that are interpreted and those that are compiled. Compiling converts source code into machine language, a lower-level form that the computer can execute more efficiently. Interpreted languages are not as fast as compiled languages at runtime, but do not require compilation, which can take some time. Because Python is an interpreted language, we can make modifications to our code and immediately run and view the results. With a compiled language, we would have to wait for our code to recompile before viewing the effect of our modifications. For this reason, Python may not run as quickly as a compiled language, but it allows for rapid prototyping.

An incident response case presents an excellent example of when to use Python in a real-life setting. For example, let's consider that a client calls, panicked, reporting a data breach and is unsure of how many files were exfiltrated over the past 24 hours from their file server. Once on site, you are instructed to perform the fastest count of files accessed in the past 24 hours as this count, and the list of compromised files, will determine the course of action.

Python fits this bill quite nicely here. Armed with just a laptop, you can open a text editor and begin writing a solution. Python can be built and designed without the need for a fancy editor or toolset. The build process of your script may look like this, with each step building upon the previous one:

 Make the script read a single file's last accessed timestamp

Write a loop that steps through directories and subdirectories

 Test each file to see if that timestamp is from the past 24 hours

 If it has been accessed within 24 hours, then create a list of affected files to display file paths and access times

The process here would result in a script that recurses over the entire server and outputs the files with a last accessed time in the past 24 hours for manual review. This script would likely be approximately 20 lines of code and take an intermediate scripter 10 minutes or less to develop and validate, which is clearly more efficient than manually reviewing timestamps on the filesystem.
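The four build steps above can be sketched roughly as follows. This is a minimal illustration rather than a validated tool: the function name recently_accessed is hypothetical, and last accessed times depend on how the filesystem records them, so the output would still need to be validated before being relied upon.

```python
import os
import time


def recently_accessed(root, hours=24):
    """Return (path, access time) pairs for files accessed within `hours`."""
    cutoff = time.time() - hours * 3600
    hits = []
    # Step through directories and subdirectories under root
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Read a single file's last accessed timestamp
                atime = os.path.getatime(path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            # Test whether the timestamp is from the past `hours` hours
            if atime >= cutoff:
                # Record the file path and a readable access time
                hits.append((path, time.ctime(atime)))
    return hits
```

Calling recently_accessed('.') returns the list for the current directory; each entry can then be printed or written to a report for manual review.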

Before deploying any developed code, it is imperative that you validate its capability first. As Python is not a compiled language, we can easily run the script after adding new lines of code to ensure we haven't broken anything. This approach is known as test-then-code, a method commonly used in script development. Any software, regardless of who wrote it, should be scrutinized and evaluated to ensure accuracy and precision. Validation ensures that the code is operating properly, and although more time-consuming, provides reliable results that are capable of withstanding the courtroom, an important aspect in forensics.

A situation where Python may not be the best tool is for general case analysis. If you are handed a hard drive and asked to find evidence without additional insight, then a pre-existing tool will be the better solution. Python is invaluable for targeted solutions, such as analyzing a given file type and creating a metadata report. Developing a custom all-in-one solution for a given filesystem requires too much time to create when other tools, both paid and free, exist that support such generic analysis.

Python is useful in pre-processing automation. If you find yourself repeating the same tasks for each piece of evidence, it may be worthwhile to develop a system that automates those steps. A great example of suites that perform such analysis is ManTech's analysis and triage system (mantaray: http://github.com/mantarayforensics), which leverages a series of tools to create general reports that can speed up analysis when there is no scope of what data may exist.

When considering whether to commit resources to develop Python scripts, either on the fly or for larger projects, it is important to consider what solutions already exist, the time available to create a solution, and the time saved through automation. Despite best intentions, the development of solutions can go on for much longer than initially conceived without a strong design plan.

Development life cycle

The development cycle involves at least five steps:

Identify

Plan

Program

Validate

Bugs

The first step is self-explanatory; before you develop, you must identify the problem that needs to be solved. Planning is perhaps the most crucial step in the development cycle.

Good planning will help later by decreasing the amount of code required and the number of bugs. Planning becomes even more vital during the learning process. A forensic programmer must begin to answer the following questions: how will data be ingested, what Python data types are most appropriate, are third-party libraries necessary, and how will the results be displayed to the examiner? In the beginning, just as if we were writing a term paper, it is a good idea to write, or draw, an outline of your program. As you become more proficient in Python, planning will become second nature, but initially, it is recommended to create an outline or write pseudocode.

Pseudocode is an informal way of writing code before filling in the details with actual code. Pseudocode can represent the bare bones of the program, such as defining pertinent variables and functions while describing how they will all fit together within the script's framework. Pseudocode for a function might look like this:

# open the database
# read from the database using the sqlite3 library
# store in variable called records
for record in records:
    # process database records here
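To show how such an outline might later be filled in, here is one possible sketch. The in-memory database, the visits table, and its columns are hypothetical stand-ins so that the example is self-contained; a real script would open an evidence file instead.

```python
import sqlite3

# open the database (hypothetical in-memory database for illustration;
# a real script would pass a file path to sqlite3.connect())
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (url TEXT, visit_count INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("http://example.com", 3), ("http://example.org", 1)],
)

# read from the database using the sqlite3 library
# store in variable called records
records = conn.execute(
    "SELECT url, visit_count FROM visits ORDER BY url"
).fetchall()

processed = []
for record in records:
    # process database records here
    url, visit_count = record
    processed.append("{} visited {} time(s)".format(url, visit_count))

conn.close()
print(processed)
```

Each comment from the pseudocode survives in the finished code, which is one reason writing the outline first tends to pay off.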

After identifying and planning, the next three steps make up the largest part of the development cycle. Once your program has been sufficiently planned, it is time to start writing code! Once the code is written, break in your new program with as much test data as possible. Especially in forensics, it is critical to thoroughly test your code instead of relying on the results of one example. Without comprehensive debugging, the code can crash when it encounters something unexpected, or, even worse, it could provide the examiner with false information and lead them down the wrong path. After the code has been tested, it is time to release it and prepare for bug reports. We are not talking about insects here! Despite a programmer's best efforts, there will always be bugs in the code. Bugs have a nasty way of multiplying even as you squash one, causing the programming cycle to begin all over again.

The omnipresent print() function

Printing in Python is a very common technique as it allows the developer to display text to the console as the script executes. While there are many differences between Python 2 and 3, the way printing is called is the most obvious change, and is the reason why our previous example primarily works with Python 3 as it is currently written. With Python 3, print became a function rather than a statement, as it was in Python 2. Let's revisit our previous script and see the slight difference.

Note the following for Python 3:

print("Hello World!")

Note the following for Python 2:

print "Hello World!"

The difference is seemingly minor. In Python 2, where print is a statement, you do not need to wrap what is being printed in parentheses. It would be disingenuous to say the difference is just semantics; however, for now just understand that print is written in two different ways, depending on the version of Python being used. The ramifications of this minor change mean that legacy Python 2 scripts that use print as a statement cannot be executed by Python 3.

Where possible, our scripts will be written to be compatible with both versions of Python. This goal, while seemingly impossible due to the difference in print, can be accomplished by importing from a special Python module, called __future__, which changes print from a statement to a function. To do this, we import print_function from the __future__ module and then write all print commands as function calls.

The following script executes in both Python 2 and 3:

from __future__ import print_function

print("Hello World!")

In both Python 2.7.15 and Python 3.7.1, this script prints Hello World! to the console.

Standard data types

With our first script complete, it is time to understand the basic data types of Python. These data types are similar to those found in other programming languages, but are invoked with a simple syntax, which is described in the following table and sections. For a full list of standard data types available in Python, visit the official documentation at https://docs.python.org/3/library/stdtypes.html:

| Data Type | Description | Example |
|---|---|---|
| Str | String | str(), "Hello", 'Hello' |
| Unicode | Unicode characters (Python 2) | unicode(), u'hello', "world".decode('utf-8') |
| Int | Integer | int(), 1, 55 |
| Float | Decimal precision numbers | float(), 1.0, .032 |
| Bool | Boolean values | bool(), True, False |
| List | List of elements | list(), [3, 'asd', True, 3] |
| Dictionary | Set of key:value pairs used to structure data | dict(), {'element': 'Mn', 'Atomic Number': 25, 'Atomic Mass': 54.938} |
| Set | Unordered collection of unique elements | set(), {3, 4, 'hello'} |
| Tuple | Ordered, immutable list of elements | tuple(), (2, 'Hello World!', 55.6, ['element1']) |
| File | A file object | open('write_output.txt', 'w') |

We are about to dive into the usage of data types in Python, and recommend that you repeat this section as needed to help with comprehension. While reading through how data types are handled is important, please be at a computer where you can run Python when you work through it the first few times. We invite you to explore the data types further in your interpreter and test them to see what they are capable of.

You will find that most of our scripts can be accomplished using only the standard data types Python offers. Before we take a look at one of the most common data types, strings, we will introduce comments.

Something that is always said, and can never be said enough, is to comment your code. In Python, a comment begins with the pound symbol (#), more recently known as the hashtag. When Python encounters this symbol, it skips the remainder of the line and proceeds to the next line. For comments that span multiple lines, we can use three single or double quotes to mark the beginning and end of the comment rather than using a single pound symbol for every line. What follows are examples of types of comments in a file called comments.py. When running this script, we should only see 10 printed to the console as all comments are ignored:

# This is a comment
print(5 + 5) # This is an inline comment.
# Everything to the right of the # symbol
# does not get executed
"""We can use three quotes to create
multi-line comments."""

The output is as follows:

10

When this code is executed, we only see the preceding value at the console.

Strings and Unicode

Strings are a data type that can contain any character, including alphanumeric characters, symbols, and characters from Unicode and other encodings. With the vast amount of information that can be stored as a string, it is no surprise they are one of the most common data types. Examples of areas where strings are found include reading arguments at the command line, user input, data from files, and outputting data. To begin, let us look at how we can define a string in Python.

There are three ways to create a string: with single quotes, double quotes, or the built-in str() constructor. Note that there is no difference between single- and double-quoted strings. Having multiple ways to create a string is advantageous, as it allows us to include literal quotes within a string. For example, in the 'I hate when people use "air-quotes"!' string, we use the single quotes to demarcate the beginning and end of the main string. The double quotes inside the string will not cause any issues with the Python interpreter. Let's verify with the type() function that both single and double quotes create the same type of object:

>>> type('Hello World!')

<class 'str'>

>>> type("Foo Bar 1234")

<class 'str'>

As we saw with comments, a block string can be defined by three single or double quotes to create multi-line strings. The only difference is whether we do something with the block-quoted value or not:

>>> """This is also a string"""

'This is also a string'

>>> '''it

... can span

... several lines'''

'it\ncan span\nseveral lines'

The \n character in the returned line signifies a line feed or a new line. The output in the interpreter displays these newline characters as \n, though when printed to the console or written to a file, a new line is created. The \n character is one of the common escape characters in Python. Escape characters are denoted by a backslash followed by a specific character. Other common escape characters include \t for horizontal tabs, \r for carriage returns, and \', \", and \\ for literal single quotes, double quotes, and backslashes, among others. Literal characters allow us to use these characters without unintentionally using their special meaning in Python's context.
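As a brief illustration of these escape characters, the following sketch compares what print() displays with what repr() reports. The variable names and the example path are arbitrary and chosen only for this demonstration.

```python
tabbed = "path\tsize"
print(tabbed)        # the tab is rendered as horizontal whitespace
print(repr(tabbed))  # repr() shows the escape sequence itself
print(len("\n"))     # an escape sequence is stored as a single character

# Escaped quotes let either quote style hold the same text
quoted = 'It\'s a "test"'
same = "It's a \"test\""
print(quoted == same)

# Each doubled backslash is stored as one literal backslash
windows_path = "C:\\Users\\examiner"
print(windows_path)
```

The last example matters when handling Windows paths, where an unescaped backslash could otherwise be read as the start of an escape sequence.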

We can also use the add (+) or multiply (*) operators with strings. The add operator is used to concatenate strings together, and the multiply operator will repeat the provided string values:

>>> 'Hello' + ' ' + 'World'

'Hello World'

>>> "Are we there yet? " * 3

'Are we there yet? Are we there yet? Are we there yet? '

Let's look at some common functions we use with strings. We can remove characters from the beginning or end of a string using the strip() function. The strip() function accepts the characters we want to remove as its input; otherwise, it removes whitespace by default. Similarly, the replace() function takes two inputs: the characters to replace and what to replace them with. The major difference between these two functions is that strip() only looks at the beginning and end of a string:

# This will remove the colon (`:`) from the beginning and end of the line

>>> ':HelloWorld:'.strip(':')

'HelloWorld'

# This will remove the colon (`:`) from the line and place a

# space (` `) in its place

>>> 'Hello:World'.replace(':', ' ')

'Hello World'

We can check if a character or characters are in a string using the in operator. Or, we can be more specific, and check if a string startswith() or endswith() one or more specific characters instead (you know a language is easy to understand when you can create sensible sentences out of functions). These checks return True or False Boolean objects:

>>> 'a' in 'Chapter 2'

True

>>> 'Chapter 1'.startswith('Chapter')

True

>>> 'Chapter 1'.endswith('1')

True
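These checks become more useful when combined. The following sketch, which uses a made-up list of file names and a loop and conditional of the kind covered later in this chapter, keeps only the names that begin with IMG_ and end with .jpg.

```python
# Hypothetical file names, used only for illustration
names = ['report.docx', 'IMG_0001.jpg', 'IMG_0002.jpg', 'IMG_notes.txt']

images = []
for name in names:
    # Both checks must be True for the name to be kept
    if name.startswith('IMG_') and name.endswith('.jpg'):
        images.append(name)

print(images)
# The in operator looks for the substring anywhere in the string
print('IMG' in 'report.docx')
```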

We can quickly split a string into a list based on some delimiter. This can be helpful to quickly convert data separated by a delimiter into a list. For example, comma-separated values (CSV) data is separated by commas and could be split on that value:

>>> print("Hello, World!".split(','))

['Hello', ' World!']
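Note that the split leaves the space after the comma in place. One way to tidy the result, sketched here with a made-up record, is to call strip() on each field after splitting.

```python
# Hypothetical comma-separated record with stray whitespace
line = "evidence.E01, 2048 ,  acquired\n"

fields = []
for field in line.split(','):
    # strip() with no input removes surrounding whitespace and newlines
    fields.append(field.strip())

print(fields)
```

For real CSV data, which may contain quoted commas, the csv module in the standard library is generally a safer choice than splitting by hand.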

Formatting parameters can be used on strings to manipulate them and convert them based on provided values. With the .format() function, we can insert values into strings, pad numbers, and display patterns with simple formatting. This chapter will highlight a few examples of the .format() method, and we will introduce more complex features of it throughout this book. The .format() method replaces curly brackets with the provided values in order.

This is the most basic operation for inserting values into a string dynamically:

>>> "{} {} {} {}".format("Formatted", "strings", "are", "easy!")

'Formatted strings are easy!'

Our second example displays some of the expressions we can use to manipulate a string. Inside the curly brackets, we place a colon, which indicates that we are going to specify a format for interpretation. Following this colon, we specify that at least six characters should be printed. If the supplied input is shorter than six characters, zeroes are prepended to the input. Lastly, the d character specifies that the input will be a base 10 integer:

>>> "{:06d}".format(42)

'000042'

Our last example demonstrates how we can easily print a string of 20 equal signs by stating that our fill character is the equals symbol, followed by the caret (to center the symbols in the output), and the number of times to repeat the symbol. By providing this format string, we can quickly create visual separators in our outputs:

>>> "{:=^20}".format('')

'===================='

We will introduce more advanced features of the .format() method later in this book; in the meantime, the site https://pyformat.info/ is a great resource for learning more about the capabilities of Python's string formatting.
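As a small illustration of how these format specifications can be combined, the following sketch prints a two-column report. The file names and sizes are invented for the example, and the column widths of 15 and 12 are arbitrary choices.

```python
# Hypothetical (file name, size in bytes) pairs
rows = [("report.docx", 15360), ("image.jpg", 2097152)]

# < left-aligns and > right-aligns within the given width
header = "{:<15}{:>12}".format("File", "Size (KB)")
print(header)
# Fill 27 characters with dashes to underline the header
print("{:-^27}".format(""))

lines = []
for name, size in rows:
    # .1f formats the value as a float with one decimal place
    lines.append("{:<15}{:>12.1f}".format(name, size / 1024.0))
    print(lines[-1])
```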

Integers and floats

The integer is another valuable data type that is frequently used—an integer is any whole positive or negative number. The float data type is similar, but allows us to use numbers requiring decimal-level precision. With integers and floats, we can use standard mathematical operations, such as: +, -, *, and /. These operations return slightly different results based on the object's type (for example, integer or float).

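How division behaves depends on the version of Python. In Python 2, dividing two integers with / rounds the result down to a whole number and returns an integer, so 3/2 is 1. In Python 3, / always returns a float, so 3/2 is 1.5, and the floor division operator, //, returns the whole number result instead. In both versions, using one float in the equation, even one that has the same value as the integer, results in a float; for example, 3/2.0 is 1.5. The following are examples of integer operations that behave the same way in both versions:

>>> type(1010)

<class 'int'>

>>> 127*66

8382

>>> 66//10

6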

>>> 10 * (10 - 8)

20

We can use ** to raise an integer to a power. For example, in the following code, we raise 11 to the power of 2. In programming, it can be helpful to determine the remainder resulting from the division between two integers. For this, we use the modulus or percent (%) symbol. With Python, negative numbers are those with a dash character (-) preceding the value. We can use the built-in abs() function to get the absolute value of an integer or float:

>>> 11**2

121

>>> 11 % 2 # 11 divided by 2 is 5.5 or 5 with a remainder of 1

1

>>> abs(-3)

3

A float is defined by any number with a decimal. Floats follow the same rules and operations as we saw with integers, with the exception of the division behavior described previously:

>>> type(0.123)

<class 'float'>

>>> 1.23 * 5.23

6.4329

>>> 27/8.0

3.375
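To tie these operators together, here is a short sketch that breaks a number of seconds into days, hours, minutes, and seconds using floor division (//) and the modulus. The starting value is arbitrary; the // operator is used so that the arithmetic behaves the same in Python 2 and 3.

```python
elapsed = 93784  # hypothetical number of seconds

days = elapsed // 86400             # whole days
remainder = elapsed % 86400         # seconds left after removing days
hours = remainder // 3600           # whole hours
minutes = (remainder % 3600) // 60  # whole minutes
seconds = remainder % 60            # leftover seconds

print("{}d {}h {}m {}s".format(days, hours, minutes, seconds))
# Dividing by a float keeps the fractional part instead
print(elapsed / 3600.0)
```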

Boolean and none

The integers 0 and 1 can also represent Boolean values in Python. These values are the Boolean False or True objects, respectively. To define a Boolean, we can use the bool() constructor. These data types are used extensively in program logic to evaluate statements for conditionals, as covered later in this chapter.

Another built-in data type is the null type, which is defined by the keyword None. When used, it represents an empty object, and it evaluates to False when tested as a Boolean. This is helpful when initializing a variable that may use several data types throughout execution. By assigning a null value, the variable remains sanitized until reassigned:

>>> bool(0)

False

>>> bool(1)

True

>>> None

>>>
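The following sketch shows one common way None is used as a placeholder, together with a conditional of the kind covered later in this chapter. The variable names are illustrative only.

```python
result = None  # placeholder until a real value is assigned

# `is None` tests specifically for the null object
if result is None:
    status = "not yet processed"
else:
    status = "processed"
print(status)

# None, 0, and empty objects all evaluate to False
print(bool(None), bool(0), bool(''), bool([]))
# However, None is not equal to False or 0
print(None == False, None == 0)
```

Because 0 and empty strings also evaluate to False, testing with is None avoids mistaking a legitimate empty value for a missing one.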

Structured data types

There are several data types that are more complex and allow us to create structures of raw data. These include lists, dictionaries, sets, and tuples. Most of these structures are composed of the previously mentioned data types. These structures are very useful in creating powerful units of values, allowing raw data to be stored in a manageable manner.

Lists

Lists are a series of ordered elements. Lists support any data type as an element and will maintain the order of data as it is appended to the list. Elements can be called by position or a loop can be used to step through each item. In Python, unlike other languages, printing a list takes one line. In languages like Java or C++, it can take three or more lines to print a list. Lists in Python can be as long as needed and can expand or contract on the fly, another feature uncommon in other languages.

We can create lists by using brackets with elements separated by commas. Or, we can use the list() class constructor with an iterable object. List elements can be accessed by index where 0 is the first element. To access an element by position, we place the desired index in brackets following the list object. Rather than needing to know how long a list is (which can be accomplished with the len() function), we can use negative index numbers to access list elements in reference to the end (that is, -3 would retrieve the third to last element):

>>> type(['element1', 2, 6.0, True, None, 234])

<class 'list'>

>>> list((4, 'element 2', None, False, .2))

[4, 'element 2', None, False, 0.2]

>>> len([0,1,2,3,4,5,6])

7

>>> ['hello_world', 'foo bar'][0]

'hello_world'

>>> ['hello_world', 'foo_bar'][-1]

'foo_bar'

We can add, remove, or check if a value is in a list using a couple of different functions. The append() method adds data to the end of the list. Alternatively, the insert() method allows us to specify an index when adding data to the list. For example, we can add the string fish to the beginning, or 0 index, of our list:

>>> ['cat', 'dog'].append('fish')

# The list becomes: ['cat', 'dog', 'fish']

>>> ['cat', 'dog'].insert(0, 'fish')

# The list becomes: ['fish', 'cat', 'dog']
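In practice, these methods are called on a list stored in a variable, since both modify the list in place and return None. A brief sketch, with an arbitrary variable name:

```python
animals = ['cat', 'dog']

# append() changes the list itself and returns None
returned = animals.append('fish')
print(returned)
print(animals)

# insert() places the new element at the given index
animals.insert(0, 'bird')
print(animals)
```

A common mistake is writing animals = animals.append('fish'), which replaces the list with None.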

The pop() and remove() functions delete data from a list either by index or by a specific object, respectively. If an index is not supplied with the pop() function, the last element in the list is popped. Note that the remove() function only gets rid of the first instance of the supplied object in the list:

>>> [0, 1, 2].pop()

2

# The list is now [0, 1]

>>> [3, 4, 5].pop(1)

4

# The list is now [3, 5]

>>> [1, 1, 2, 3].remove(1)

# The list becomes: [1, 2, 3]
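The same operations are sketched below on a list stored in a variable, using made-up file names. Note that remove() raises a ValueError if the object is not in the list, so this example checks with the in operator first.

```python
queue = ['a.txt', 'b.txt', 'a.txt']

# pop() returns the element it removes
last = queue.pop()
print(last, queue)

# remove() deletes only the first matching element
queue.remove('a.txt')
print(queue)

# Guard against removing an object that is not present
if 'missing.txt' in queue:
    queue.remove('missing.txt')
print(queue)
```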

We can use the in statement to check if some object is in the list. The count() function tells us how many instances of an object are in the list:

>>> 'cat' in ['mountain lion', 'ox', 'cat']

True

>>> ['fish', 920.5, 3, 5, 3].count(3)

2

If we want to access a subset of elements, we can use list slice notation. Other objects, such as strings, also support this same slice notation to obtain a subset of data. Slice notation has the following format, where a is our list or string object:

a[x:y:z]

In the preceding example, x represents the start of the slice, y represents the end of the slice, and z represents the step of the slice. Note that each segment is separated by colons and enclosed in square brackets. A negative step is a quick way to reverse the contents of an object that supports slice notation and would be triggered by a negative number as z. Each of these arguments is optional. In the first example, our slice returns everything from the element at index 2 up to, but not including, the element at index 5. Using just one of these slice elements returns a list containing everything from index 2 forward or everything up to, but not including, index 5:

>>> [0,1,2,3,4,5,6][2:5]

[2, 3, 4]

>>> [0,1,2,3,4,5,6][2:]

[2, 3, 4, 5, 6]

>>> [0,1,2,3,4,5,6][:5]

[0, 1, 2, 3, 4]

Using the third slice element, we can skip every other element or simply reverse the list with a step of -1. We can use a combination of these slice elements to specify how to carve a subset of data from the list:

>>> [0,1,2,3,4,5,6][::2]

[0, 2, 4, 6]

>>> [0,1,2,3,4,5,6][::-1]

[6, 5, 4, 3, 2, 1, 0]
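Because strings support the same notation, slices are a convenient way to carve fixed-width values apart. The sketch below uses a made-up timestamp string and also combines a start, end, and step in one slice.

```python
stamp = '2018-11-05 14:30:00'  # hypothetical timestamp string

date_part = stamp[:10]   # everything before index 10
time_part = stamp[11:]   # everything from index 11 forward
year = stamp[0:4]        # the first four characters
print(date_part, time_part, year)

numbers = [0, 1, 2, 3, 4, 5, 6]
print(numbers[-3:])      # the last three elements
print(numbers[1:6:2])    # every other element from index 1 up to index 6
print(stamp[::-1])       # the string reversed
```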

Sets and tuples