Python is an easy-to-learn, extensible programming language that allows any manner of secret agent to work with a variety of data. Agents from beginners to seasoned veterans will benefit from Python's simplicity and sophistication. The standard library provides numerous packages that move beyond simple beginner missions. The Python ecosystem of related packages and libraries supports deep information processing.
This book will guide you through the process of upgrading your Python-based toolset for intelligence gathering, analysis, and communication. You'll explore the ways Python is used to analyze web logs to discover the trails of activities that can be found in web and database servers. We'll also look at how we can use Python to discover details of the social network by looking at the data available from social networking websites.
Finally, you'll see how to extract history from PDF files, which opens up new sources of data, and you’ll learn about the ways you can gather data using an Arduino-based sensor device.
Number of pages: 278
Year of publication: 2015
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2014
Second edition: December 2015
Production reference: 1011215
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78528-340-6
www.packtpub.com
Author
Steven F. Lott
Reviewer
Shubham Sharma
Commissioning Editor
Julian Ursell
Acquisition Editor
Subho Gupta
Content Development Editor
Riddhi Tuljapurkar
Technical Editor
Danish Shaikh
Copy Editor
Vibha Shukla
Project Coordinator
Sanchita Mandal
Proofreader
Safis Editing
Indexer
Priya Sane
Graphics
Kirk D'Penha
Production Coordinator
Komal Ramchandani
Cover Work
Komal Ramchandani
Steven F. Lott has been programming since the 70s, when computers were large, expensive, and rare. As a contract software developer and architect, he has worked on hundreds of projects from very small to very large. He's been using Python to solve business problems for over 10 years.
He's currently leveraging Python to implement microservices and ETL pipelines.
His other titles with Packt Publishing include Python Essentials, Mastering Object-Oriented Python, Functional Python Programming, and Python for Secret Agents.
Steven is currently a technomad who lives in various places on the East Coast of the U.S. His technology blog is http://slott-softwarearchitect.blogspot.com.
Shubham Sharma holds a bachelor's degree in computer science engineering with specialization in business analytics and optimization from UPES, Dehradun. He has a strong skill set in programming languages. He also has experience in web, Android, and ERP development and works as a freelancer.
Shubham also loves writing and blogs at www.cyberzonec.in/blog. He is currently working on Python for the optimal specifications and identifications of mobile phones from customer reviews.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Secret agents are dealers and brokers of information. Information that's rare or difficult to acquire has the most value. Getting, analyzing, and sharing this kind of intelligence requires a skilled use of specialized tools. This often includes programming languages such as Python and its vast ecosystem of add-on libraries.
The best agents keep their toolkits up to date. This means downloading and installing the very latest in updated software. An agent should be able to analyze logs and other large sets of data to locate patterns and trends. Social network applications such as Twitter can reveal a great deal of useful information.
An agent shouldn't find themselves stopped by arcane or complex document formats. With some effort, the data in a PDF file can be as accessible as the data in a plain text file. In some cases, agents need to build specialized devices to gather data. A small processor such as an Arduino can gather raw data for analysis and dissemination; it moves the agent into the Internet of Things.
Chapter 1, New Missions – New Tools, addresses the tools that we're going to use. It's imperative that agents use the latest and most sophisticated tools. We'll guide field agents through the procedures required to get Python 3.4. We'll install the Beautiful Soup package, which helps you analyze and extract data from HTML pages. We'll install the Twitter API so that we can extract data from the social network. We'll add PDFMiner3K so that we can dig data out of PDF files. We'll also add the Arduino IDE so that we can create customized gadgets based on the Arduino processor.
Chapter 2, Tracks, Trails, and Logs, looks at the analysis of bulk data. We'll focus on the kinds of logs produced by web servers as they have an interesting level of complexity and contain valuable information on who's providing intelligence data and who's gathering this data. We'll leverage Python's regular expression module, re, to parse log data files. We'll also look at ways in which we can process compressed files using the gzip module.
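As a small preview of that approach, here is a minimal sketch that combines re and gzip. It assumes that the server writes the Common Log Format; the pattern, the field names, and the parse_line and read_log helpers are illustrative assumptions, not the code developed in the chapter.

```python
import gzip
import re

# Assumption: each line follows the Common Log Format, for example:
# host ident user [timestamp] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict of named fields, or None when the line doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def read_log(path):
    """Yield parsed rows from a plain or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as source:
        for line in source:
            row = parse_line(line)
            if row is not None:
                yield row

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_line(sample)["request"])  # GET /apache_pb.gif HTTP/1.0
```

Opening the file in "rt" mode lets the same loop handle compressed and uncompressed logs as text. Lines that don't match the pattern are skipped silently here; a real mission would probably want to count them.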
Chapter 3, Following the Social Network, discusses one of the social networks. A field agent should know who's communicating and what they're communicating about. A network such as Twitter will reveal social connections based on who's following whom. We can also extract meaningful content from a Twitter stream, including text and images.
Chapter 4, Dredging Up History, provides you with essential pointers on extracting useful data from PDF files. Many agents find that a PDF file is a kind of dead-end because the data is inaccessible. There are tools that allow us to extract useful data from PDF. As PDF is focused on high-quality printing and display, it can be challenging to extract data suitable for analysis. We'll show some techniques with the PDFMiner package that can yield useful intelligence. Our goal is to transform a complex file into a simple CSV file, very much similar to the logs that we analyzed in Chapter 2, Tracks, Trails, and Logs.
Chapter 5, Data Collection Gadgets, expands the field agent's scope of operations to the Internet of Things (IoT). We'll look at ways to create simple Arduino sketches in order to read a typical device; in this case, an infrared distance sensor. We'll look at how we will gather and analyze raw data to do instrument calibration.
A field agent needs a computer over which they have administrative privileges. We'll be installing additional software. A secret agent without the administrative password may have trouble installing Python 3 or any of the additional packages that we'll be using.
For agents using Windows, most of the packages will come prebuilt using the .EXE installers.
For agents using Linux, the complete suite of developer's tools is generally required. The GNU Compiler Collection (GCC) is the backbone of these tools.
For agents using Mac OS X, the developer's tool, Xcode, is required and can be found at https://developer.apple.com/xcode/. We'll also need to install a tool called Homebrew (http://brew.sh) to help us add Linux packages to Mac OS X.
Python 3 is available from the Python download page at https://www.python.org/download.
We'll download and install several things beyond Python 3.4 itself:
This book is for field agents who know a little bit of Python and are very comfortable installing new software. Agents must be ready, willing, and able to write some new and clever programs in Python. An agent who has never done any programming before may find some of this a bit advanced; a beginner's tutorial in the basics of Python may be helpful as preparation.
We'll expect that an agent using this book is comfortable with simple mathematics. This involves some basic statistics and elementary geometry.
We expect that secret agents using this book will be doing their own investigations as well. The book's examples are designed to get the agent started down the road to develop interesting and useful applications. Each agent will have to explore further afield on their own.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
The espionage job is to gather and analyze data. This requires us to use computers and software tools.
However, a secret agent's job is not limited to collecting data. It involves processing, filtering, and summarizing data, and also involves confirming the data and assuring that it contains meaningful and actionable information.
Any aspiring agent would do well to study the history of the World War II secret agent code-named Garbo, a Spaniard who worked for British intelligence. This is an inspiring and informative story of how secret agents operated in wartime.
We're going to look at a variety of complex missions, all of which will involve Python 3 to collect, analyze, summarize, and present data. Due to our previous successes, we've been asked to expand our role in a number of ways.
HQ's briefings are going to help agents make some technology upgrades. We're going to locate and download new tools for new missions that we're going to be tackling. While we're always told that a good agent doesn't speculate, the most likely reason for new tools is a new kind of mission and dealing with new kinds of data or new sources. The details will be provided in the official briefings.
Field agents are going to be encouraged to branch out into new modes of data acquisition. The Internet of Things leads to a number of interesting sources of data. HQ has identified some sources that will push the field agents in new directions. We'll be asked to push the edge of the envelope.
We'll look at the following topics:
This will give us the tools for a number of data gathering and analytical missions.
The organization responsible for tools and technology is affectionately known as The Puzzle Palace. They have provided some suggestions on what we'll need for the missions that we've been assigned. We'll start with an overview of the state of the art in Python tools, handed down from one of the puzzle solvers.
Some agents have already upgraded to Python 3.4. However, not all agents have done this. It's imperative that we use the latest and greatest tools.
There are four good reasons for this, as follows:
Some agents may want to start looking at Python 3.5. This release is anticipated to include some optional features to provide data type hints. We'll look at this in a few specific cases as we go forward with the mission briefings. The type-analysis features can lead to improvements in the quality of the Python programming that an agent creates. The Puzzle Palace report is based on intelligence gathered at PyCon 2015 in Montreal, Canada. Agents are advised to follow the Python Enhancement Proposals (PEPs) closely. Refer to https://www.python.org/dev/peps/.
We'll focus on Python 3.4. For any agent who hasn't upgraded to Python 3.4.3, we'll look at the best way to approach this.
If you're comfortable with working on your own, you can try to move further and download and install Python 3.5. The warning here is that it's very new and may not be quite as robust as Python 3.4. Refer to PEP 478 (https://www.python.org/dev/peps/pep-0478/) for more information about this release.
It's important to consider each major release of Python as an add-on and not a replacement. Any release of Python 2 should be left in place. Most field agents will have several side-by-side versions of Python on their computers. The following are the two common scenarios:
We have to distinguish between the major, minor, and micro versions of Python. Python 3.4.3 and 3.4.2 have the same minor version (3.4). We can replace the micro version 3.4.2 with 3.4.3 without a second thought; they're always compatible with each other. However, we don't treat the minor versions quite so casually. We often want to leave 3.3 in place.
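As a minimal sketch of this distinction, we can treat version numbers as tuples and compare only the major and minor parts. The parse_version and same_minor helpers are hypothetical names invented for this illustration; only sys.version_info comes from the standard library.

```python
import sys

def parse_version(text):
    """Turn a string such as '3.4.3' into the tuple (3, 4, 3)."""
    return tuple(int(part) for part in text.split("."))

def same_minor(left, right):
    """True when two versions share the major and minor numbers."""
    return parse_version(left)[:2] == parse_version(right)[:2]

# The running interpreter reports its own major, minor, and micro numbers.
major, minor, micro = sys.version_info[:3]
print("Running Python {0}.{1}.{2}".format(major, minor, micro))

print(same_minor("3.4.2", "3.4.3"))  # True: only the micro version differs
print(same_minor("3.3.6", "3.4.3"))  # False: the minor versions differ
```

When same_minor() is true, one release can replace the other; when it's false, the two belong side by side.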
Generally, we do a field upgrade as shown in the following:
At this point, we should be able to confirm that our basic toolset works. Linux and Mac OS agents can use the following command:
This should confirm that we've downloaded and installed Python and made it a part of our OS settings. The greeting will show which micro version of Python 3.4 we have installed.
For Windows, the command's name is usually just python. It would look similar to the following:
The Mac OS X interaction should include the version; it will look similar to the following code:
We've entered the python3.4 command. This shows us that things are working very nicely. We have Python 3.4.3 successfully installed.
We don't want to make a habit of using the python or python3 commands in order to run Python from the command line. These names are too generic and we could accidentally use Python 3.3 or Python 3.5, depending on what we have installed. We need to be intentional about using the python3.4 command.
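One way to catch an accidental switch of interpreters is to have a script check the version that is actually running it. The following is only a sketch: the EXPECTED constant and the is_expected_python function are hypothetical names, and the choice of 3.4 simply follows the focus of these briefings.

```python
import sys

# Assumption: the missions in this book target Python 3.4.
EXPECTED = (3, 4)

def is_expected_python(version_info=None, expected=EXPECTED):
    """True when the major and minor numbers match the expected release."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(expected)

if not is_expected_python():
    found = ".".join(str(part) for part in sys.version_info[:3])
    print("Warning: expected Python 3.4 but found " + found)
```

The check ignores the micro version on purpose, as any 3.4.x release is acceptable. An agent who prefers a hard stop could raise an exception instead of printing a warning.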
The first time that we try to use pip3.4, we may see an interaction as shown in the following:
The version numbers may be slightly different; this is not too surprising. The packaged version of pip isn't always the latest and greatest version. Once we've installed the Python package, we can upgrade pip3.4 to the recent release. We'll use pip to upgrade itself.
It looks similar to the following code:
We've run the pip installer to upgrade pip. We're shown some details about the files that are downloaded and the new version that is installed. We were able to do this with a simple pip3.4 under Mac OS X.
Some packages will require system privileges that are available via the sudo command. While it's true that a few packages don't require system privileges, it's easy to assume that privileges are always required. For Windows, of course, we don't use sudo at all.
On Mac OS X, we'll often need to use sudo -H instead of simply using sudo. This option will make sure that the proper HOME environment variable is used to manage a cache directory.
Note that your actual results may differ from this example, depending on how out-of-date your copy of pip turns out to be. This pip install --upgrade pip is a pretty frequent operation as the features advance.
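As this upgrade is so frequent, some agents may prefer to script it. The following sketch builds the upgrade command for one specific interpreter by running pip as a module, so there's no doubt about which copy of pip gets upgraded. The pip_upgrade_command and upgrade_pip functions are hypothetical helpers; the sketch assumes that pip is already installed for that interpreter and that any required privileges are in place.

```python
import subprocess
import sys

def pip_upgrade_command(python=sys.executable):
    """Build the argument list that upgrades pip for one interpreter."""
    return [python, "-m", "pip", "install", "--upgrade", "pip"]

def upgrade_pip(python=sys.executable):
    """Run the upgrade and return the exit status.

    This changes the installed pip, so it is not called automatically here.
    """
    return subprocess.call(pip_upgrade_command(python))

# Show the command without running it.
print(" ".join(pip_upgrade_command("python3.4")))
# python3.4 -m pip install --upgrade pip
```

Calling upgrade_pip() with no argument upgrades pip for whichever interpreter is running the script, which keeps the Python and pip versions paired.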