36,59 €
Learn to implement complex data structures and algorithms using Python
Key Features
Book Description
Data structures allow you to store and organize data efficiently. They are critical to any problem, provide a complete solution, and act like reusable code. Hands-On Data Structures and Algorithms with Python teaches you the essential Python data structures and the most common algorithms for building easy and maintainable applications.
This book helps you to understand the power of linked lists, double linked lists, and circular linked lists. You will learn to create complex data structures, such as graphs, stacks, and queues. As you make your way through the chapters, you will explore the application of binary searches and binary search trees, along with learning common techniques and structures used in tasks such as preprocessing, modeling, and transforming data. In the concluding chapters, you will get to grips with organizing your code in a manageable, consistent, and extendable way. You will also study how to bubble sort, selection sort, insertion sort, and merge sort algorithms in detail.
By the end of the book, you will have learned how to build components that are easy to understand, debug, and use in different applications. You will get insights into Python implementation of all the important and relevant algorithms.
What you will learn
Who this book is for
This book is for developers who want to learn data structures and algorithms in Python to write complex and flexible programs. Basic Python programming knowledge is expected.
Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:
Seitenzahl: 478
Veröffentlichungsjahr: 2018
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Richa TripathiAcquisition Editor: Denim PintoContent Development Editor: Tiksha SarangTechnical Editor: Mehul SinghCopy Editor:Safis EditingProject Coordinator: Prajakta NaikProofreader: Safis EditingIndexer: Rekha NairGraphics: Jisha ChirayilProduction Coordinator: Shraddha Falebhai
First published: May 2017 Second edition: October 2018
Production reference: 2221118
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78899-557-3
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with skill plans designed especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Dr. Basant Agarwal works as an associate professor at Swami Keshvanand Institute of Technology, Management, and Gramothan, India. He has been awarded an M.Tech and Ph.D. from MNIT, Jaipur, India, and has more than 8 years' experience in academia and research. He has been awarded the prestigious PostDoc Fellowship by ERCIM (the European Research Consortium for Informatics and Mathematics) through the Alain Bensoussan Fellowship Programme. He has also worked at Temasek Laboratories, the National University of Singapore. He has authored a book on sentiment analysis in the Springer Book Series: Socio-Affective Computing series, and is published in more than 50 reputed conferences and journals. His research interests are focused on NLP, machine learning, and deep learning.
Benjamin Baka works as a software developer who considers himself to be language agnostic and thus seeks out the elegant solutions to which his toolset can enable him to accomplish. Notable amongst ones are C, Java, Python, and Ruby. With a huge interest in algorithms, he seeks to always write code that borrows from Dr. Knuth's words, both simple and elegant. He also enjoys playing the bass guitar and listening to silence. He currently works with mPedigree Network.
David Julian has written two books Designing Machine Learning Systems with Python, and Deep Learning with Pytorch Quickstart Guide both published by Packt. He has worked for Urban Ecological Systems Pty Ltd on a project to detect insect outbreaks in greenhouse environments using machine learning. He currently works as a technical consultant and information technology trainer for several private and non-government organizations.
Yogendra Sharma is a developer with experience in architecture, design, and the development of scalable and distributed applications, with a core interest in microservices and Spring. He is currently working as an IoT and cloud architect at Intelizign Engineering Services, Pune. He also has hands-on experience with technologies such as AWS Cloud, IoT, Python, J2SE, J2EE, NodeJS, Angular, MongoDB, and Docker. He is constantly exploring technical novelties, and is open-minded and eager to learn about new technologies and frameworks.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
This book would not have been possible without the contribution of many individuals, to whom I would like to express my sincere appreciation and gratitude. First of all, I would like to acknowledge the great support of the Packt Publishing team. I am very grateful to the editor of the book, Tiksha Sarang, whose support throughout its development was marvelous. I would like to express my sincere gratitude to the acquisition editor of the book, Denim Pinto, who has given me the chance to contribute to this book. I would also like to thank Benjamin Baka for great work in the first edition of the book.
I would especially like to thank the editors and technical reviewers of the book for performing the extensive review process. I would like to express my thanks to all the reviewers for their constructive comments and suggestions. I am also grateful to Mehul Singh for his fantastic efforts as a technical reviewer. I would also like to express my gratitude to all those who were involved in the copy editing, proofreading, and production of this book.
I am grateful to the Swami Keshvanand Institute of Technology for providing me with a fantastic work environment and their kind cooperation. I would also like to express my sincere gratitude and thanks to Prof. S. L. Surana, Director (Academics), SKIT, for being a constant source of support and motivation. I am especially thankful to Prof. C. M. Choudhary, head of the Department of Computer Science and Engineering, for his help, advice, support, and encouragement. Special thanks also to all my wonderful friends and colleagues for helping me in eradicating the minor errors and for proofreading the book. I would also like to thank Dr. Mukesh Gupta, Dr. S.R. Dogiwal, and Gaurav Arora for their help.
And last, but by no means least, my sincere thanks go to my family and friends, for their endless encouragement and support throughout the production of this book. They always motivated me to work harder on the book. For any inadequacies that may remain in this book, the responsibility is entirely my own.
Title Page
Copyright and Credits
Hands-On Data Structures and Algorithms with Python Second Edition
Dedication
Packt Upsell
Why subscribe?
Packt.com
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Acknowledgments
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Python Objects, Types, and Expressions
Technical requirements
Installing Python
Understanding data structures and algorithms
Python for data
The Python environment
Variables and expressions
Variable scope
Flow control and iteration
Overview of data types and objects
Strings
Lists
Functions as first class objects
Higher order functions
Recursive functions
Generators and co-routines
Classes and object programming
Special methods
Inheritance
Data encapsulation and properties
Summary
Further reading
Python Data Types and Structures
Technical requirements
Built-in data types
None type
Numeric types
Representation error
Membership, identity, and logical operations
Sequences
Learning about tuples
Beginning with dictionaries
Python
Sorting dictionaries
Dictionaries for text analysis
Sets
Immutable sets
Modules for data structures and algorithms
Collections
Deques
ChainMap objects
Counter objects
Ordered dictionaries
defaultdict
Learning about named tuples
Arrays
Summary
Principles of Algorithm Design
Technical requirements
An introduction to algorithms
Algorithm design paradigms
Recursion and backtracking
Backtracking
Divide and conquer – long multiplication
The recursive approach
Runtime analysis
Asymptotic analysis
Big O notation
Composing complexity classes
Omega notation (Ω)
Theta notation (ϴ )
Amortized analysis
Summary
Lists and Pointer Structures
Technical requirements
Beginning with an example
Arrays
Pointer structures
Nodes
Finding endpoints
Node class
Other node types
Introducing lists
Singly linked lists
Singly linked list class
The append operation
A faster append operation
Getting the size of the list
Improving list traversal
Deleting nodes
List search
Clearing a list
Doubly linked lists
A doubly linked list node
Doubly linked list class
Append operation
The delete operation
List search
Circular lists
Appending elements
Deleting an element in a circular list
Iterating through a circular list
Summary
Stacks and Queues
Technical requirements
Stacks
Stack implementation
Push operation
Pop operation
Peek operation
Bracket-matching application
Queues
List-based queues
The enqueue operation
The dequeue operation
Stack-based queues
Enqueue operation
Dequeue operation
Node-based queues
Queue class
The enqueue operation
The dequeue operation
Application of queues
Media player queues
Summary
Trees
Technical requirements
Terminology
Tree nodes
Tree traversal
Depth-first traversal
In-order traversal and infix notation
Pre-order traversal and prefix notation
Post-order traversal and postfix notation
Breadth-first traversal
Binary trees
Binary search trees
Binary search tree implementation
Binary search tree operations
Finding the minimum and maximum nodes
Inserting nodes
Deleting nodes
Searching the tree
Benefits of a binary search tree
Balancing trees
Expression trees
Parsing a reverse Polish expression
Heaps
Ternary search tree
Summary
Hashing and Symbol Tables
Technical requirements
Hashing
Perfect hashing functions
Hash tables 
Storing elements in a hash table
Retrieving elements from the hash table
Testing the hash table
Using [] with the hash table
Non-string keys
Growing a hash table
Open addressing
Chaining
Symbol tables
Summary
Graphs and Other Algorithms
Technical requirements
Graphs
Directed and undirected graphs
Weighted graphs
Graph representations
Adjacency lists
Adjacency matrices
Graph traversals
Breadth-first traversal 
Depth-first search
Other useful graph methods
Priority queues and heaps
Insert operation
Pop operation
Testing the heap
Selection algorithms
Summary
Searching
Technical requirements
Introduction to searching
Linear search
Unordered linear search
Ordered linear search
Binary search
Interpolation search
Choosing a search algorithm
Summary
Sorting
Technical requirements
Sorting algorithms
Bubble sort algorithms
Insertion sort algorithms
Selection sort algorithms
Quick sort algorithms
List partitioning
Pivot selection
An illustration with an example
Implementation
Heap sort algorithms
Summary
Selection Algorithms
Technical requirements
Selection by sorting
Randomized selection
Quick select
Understanding the partition step
Deterministic selection
Pivot selection
Median of medians
Partitioning step
Summary
String Algorithms and Techniques
Technical requirements
String notations and concepts
Pattern matching algorithms
The brute-force algorithm
The Rabin-Karp algorithm
Implementing the Rabin-Karp algorithm
The Knuth-Morris-Pratt algorithm
The prefix function
Understanding KMP algorithms
Implementing the KMP algorithm
The Boyer-Moore algorithm
Understanding the Boyer-Moore algorithm
Bad character heuristic
Good suffix heuristic
Implementing the Boyer-Moore algorithm
Summary
Design Techniques and Strategies
Technical requirements
Classification of algorithms
Classification by implementation
Recursion
Logic
Serial or parallel algorithms
Deterministic versus nondeterministic algorithms
Classification by complexity
Complexity curves
Classification by design
Divide and conquer
Dynamic programming
Greedy algorithms
Technical implementation
Implementation using dynamic programming
Memoization
Tabulation
The Fibonacci series
The memoization technique
The tabulation technique
Implementation using divide and conquer
Divide
Conquer
Merge
Merge sort
Implementation using greedy algorithms
Coin-counting problem
Shortest path algorithm
Complexity classes
P versus NP
NP-Hard
NP-Complete
Summary
Implementations, Applications, and Tools
Technical requirements
Knowledge discovery in data
Data preprocessing
Processing raw data
Missing data
Feature scaling
Min-max scalar form of normalization
Standard scalar
Binarizing data
Learning about machine learning
Types of machine learning
The hello classifier
A supervised learning example
Gathering data
Bag of words
Prediction
An unsupervised learning example
K-means algorithm
Prediction
Data visualization
Bar chart
Multiple bar charts
Box plot
Pie chart
Bubble chart
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Data structures and algorithms are two of the most important core subjects in the study of information technology and computer science engineering. This book aims to provide in-depth knowledge, along with programming implementation experience, of data structures and algorithms. It is designed for graduates and undergraduates who are studying data structures with Python programming at beginner and intermediate level, and explains the complex algorithms through the use of examples.
In this book, you will learn the essential Python data structures and the most common algorithms. This book will provide a basic knowledge of Python and give the reader an insight into data algorithms. In it, we provide Python implementations and explain them in relation to almost every important and popular data structure algorithm. We will look at algorithms that provide solutions to the most common problems in data analysis, including searching and sorting data, as well as being able to extract important statistics from data. With this easy-to-read book, you will learn how to create complex data structures, such as linked lists, stacks, heaps, and queues, as well as sort algorithms, including bubble sort, insertion sort, heapsort, and quicksort. We also describe a variety of selection algorithms, including randomized and deterministic selection. We provide a detailed discussion of various data structure algorithms and design paradigms, such as greedy algorithms, divide-conquer algorithms, and dynamic programming, along with how they can be used in real-time applications. In addition, complex data structures, such as trees and graphs, are explained using straightforward pictorial examples to explore the concepts of these useful data structures. You will also learn various important string processing and pattern matching algorithms, such as KMP, and Boyer-Moore algorithms, along with their easy implementation in Python. You will learn the common techniques and structures used in tasks, including preprocessing, modeling, and transforming data.
The importance of having a good understanding of data structures and algorithms cannot be overemphasized. It is an important arsenal to have at your disposal in order to understand new problems and find elegant solutions to them. By gaining a deeper understanding of algorithms and data structures, you may find uses for them in many more ways than originally intended. You will develop a consideration for the code you write and how it affects the amount of memory. Python has further opened the door to many professionals and students to come to appreciate programming. The language is fun to work with and concise in its description of problems. We leverage the language's mass appeal to examine a number of widely studied and standardized data structures and algorithms. The book begins with a concise tour of the Python programming language. As such, it is not required that you know Python before picking up this book.
This book is intended for Python developers who are studying courses concerned with data structures and algorithms at a beginner or intermediate level. The book is also designed for all those undergraduate and graduate engineering students who attend, or have attended, courses on data structures and algorithms, as it covers almost all the algorithms, concepts, and designs that are studied in this course. Thus, this book can also be adapted as a textbook for data structure and algorithm courses. This book is also a useful tool for generic software developers who want to deploy various applications using a specific data structure as it provides efficient ways of storing the relevant data. It also provides a practical and straightforward approach to learning complex algorithms.
It is assumed that the reader has some basic knowledge of Python. However, it is not mandatory as we provide a quick overview of Python and its object-oriented concepts in this book. There is no requirement to have any prior knowledge of computer-related concepts in order to understand this book, since all the concepts and algorithms are explained in sufficient detail, with a good number of examples and pictorial representations. Most of the concepts are explained with the help of everyday scenarios to make the concepts and algorithms easy to understand.
Chapter 1, Python Objects, Types, and Expressions, introduces you to the basic types and objects of Python. We will give an overview of the language features, execution environment, and programming styles. We will also review the common programming techniques and language functionality.
Chapter 2, Python Data Types and Structures, explains Python's various built-in data types. It also describes each of the five numeric and five sequence data types, as well as one mapping and two set data types, and examines the operations and expressions applicable to each type. We will also provide many examples of typical use cases.
Chapter 3, Principles of Algorithm Design, covers various important data structure design paradigms, such as greedy algorithms, dynamic programming, divide and conquer, recursion, and backtracking. In general, the data structures we create need to conform to a number of principles. These principles include robustness, adaptability, reusability, and separating the structure from a function. We look at the role iteration plays and introduce recursive data structures. We also discuss various Big O notations and complexity classes.
Chapter 4, Lists and Pointer Structures, covers linked lists, which are one of the most common data structures, and are often used to implement other structures, such as stacks and queues. In this chapter, we describe their operation and implementation. We compare their behavior to arrays and discuss the relative advantages and disadvantages of each.
Chapter 5, Stacks and Queues, discusses the behavior of these linear data structures and demonstrates a number of implementations. We give examples of typical real-life example applications.
Chapter 6, Trees, looks at how to implement a binary tree. Trees form the basis of many of the most important advanced data structures. We will examine how to traverse trees and retrieve and insert values. We discuss binary and ternary search trees. We will also look at how to create structures such as heaps.
Chapter 7, Hashing and Symbol Tables, describes symbol tables, gives some typical implementations, and discusses various applications. We will look at the process of hashing, provide an implementation of a hash table, and discuss the various design considerations.
Chapter 8, Graphs and Other Algorithms, looks at some of the more specialized structures, including graphs and spatial structures. Representing data as a set of nodes and vertices is convenient in a number of applications and, from this, we can create structures including directed and undirected graphs. We will also introduce a number of other structures and concepts, such as priority queues, heaps, and selection algorithms.
Chapter 9, Searching, discusses the most common searching algorithms, for example, binary search and interpolation searching algorithms. We also give examples of their uses in relation to various data structures. Searching for a data structure is a key task and there are a number of different approaches.
Chapter 10, Sorting, looks at the most common approaches to sorting. These approaches include bubble sort, insertion sort, selection sort, quick sort, and heap sort algorithms. This chapter provides a detailed explanation of each, along with their Python implementation.
Chapter 11, Selection Algorithms, covers algorithms that involve finding statistics, such as the minimum, maximum, or median elements in a list. The chapter also discusses various selection algorithms for locating a specific element in a list by sorting, as well as randomized and deterministic selection algorithms.
Chapter 12, String Algorithms and Techniques, covers basic concepts and definitions related to strings. Various string and pattern matching algorithms are discussed in detailed, such as the naïve approach, Knuth-Morris-Pratt (KMP), and Boyer-Moore pattern matching algorithms.
Chapter 13, Design Techniques and Strategies, relates to how we look for solutions for similar problems when we are trying to solve a new problem. Understanding how we can classify algorithms and the types of problem that they most naturally solve is a key aspect of algorithm design. There are many ways in which we can classify algorithms, but the most useful classifications tend to revolve around either the implementation method or the design method. This chapter explains various algorithm design paradigms using many important applications, such as mergesort, Dijkstra's shortest path algorithm, and the coin-counting problem.
Chapter 14, Implementations, Applications, and Tools, discusses a variety of real-world applications. These include data analysis, machine learning, prediction, and visualization. In addition, there are libraries and tools that make our work with algorithms more productive and enjoyable.
The code in the book will require you to run on Python 3.7 or higher.
The Python interactive environment can also be used to run the code snippets.
Readers are advised to learn the algorithms and concepts by executing the codes provided in the book that are designed to facilitate understanding of the algorithms.
The book aims to give readers practical exposure, so it is recommended that you carry out programming for all the algorithms in order to get the maximum out of this book.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at
www.packt.com
.
Select the
SUPPORT
tab.
Click on
Code Downloads & Errata
.
Enter the name of the book in the
Search
box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Data-Structures-and-Algorithms-with-Python-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781788995573_ColorImages.pdf
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
Data structures and algorithms are two of the core elements of a large and complex software project. They are a systematic way of storing and organizing data in software so that it can be used efficiently. Python has efficient high-level data structures and an effective object-oriented programming language. Python is the language of choice for many advanced data tasks, for a very good reason. It is one of the easiest advanced programming languages to learn. Intuitive structures and semantics mean that for people who are not computer scientists, but maybe biologists, statisticians, or the directors of a start-up, Python is a straightforward way to perform a wide variety of data tasks. It is not just a scripting language, but a full-featured, object-oriented programming language.
In Python, there are many useful data structures and algorithms built into the language. Also, because Python is an object-based language, it is relatively easy to create custom data objects. In this book, we will examine Python's internal libraries and some of the external libraries, and we'll learn how to build your own data objects from first principles.
In this chapter, we will look at the following topics:
Obtaining a general working knowledge of data structures and algorithms
Understanding core data types and their functions
Exploring the object-oriented aspects of the Python programming language
The data structures and algorithms are presented using the Python programming language (version 3.7) in this book. This book does assume that you know Python. However, if you are a bit rusty, coming from another language, or do not know Python at all, don't worry—this first chapter should get you quickly up to speed.
The following is the GitHub link: https://github.com/PacktPublishing/Hands-On-Data-Structures-and-Algorithms-with-Python-Second-Edition/tree/master/Chapter01.
To install Python, we use the following method.
Python is an interpreted language, and statements are executed line by line. A programmer can typically write down the series of commands in a source code file. For Python, the source code is stored in a file with a .py file extension.
Python is fully integrated and usually already installed on most of the Linux and Mac operating systems. Generally, the pre-installed Python version is 2.7. You can check the version installed on the system using the following commands:
>>> import sys>>> print(sys.version)3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:06:47) [MSC v.1914 32 bit (Intel)]
You can also install a different version of Python using the following commands on Linux:
Open the Terminal
sudo apt-get update
sudo apt-get install -y python3-pip
pip3 install <package_name>
Python has to be installed on systems with Windows operating systems, as it is not pre-installed, unlike Linux/macOS. Any version of Python can be downloaded from this link: https://www.python.org/downloads/. You can download the software installer and run it—select Install for all users and then click on Next. You need to specify the location where you want to install the package, then click Next. After that, select the option Add Python to environment variables in the Customize Python dialog box, then just click Next again for final installation. When the installation is finished, you can confirm the installation by opening up Command Prompt and typing the following command:
python -V
The latest stable Python version is Python 3.7.0. The Python program can be executed by typing the following in the command line:
python <sourcecode_filename>.py
Algorithms and data structures are the most fundamental concepts in computing. They are the main building blocks from which complex software is built. Having an understanding of these foundation concepts is extremely important in software design and this involves the following three characteristics:
How algorithms manipulate information contained within data structures
How data is arranged in memory
What the performance characteristics of particular data structures are
In this book, we will examine the topic from several perspectives. Firstly, we will look at the fundamentals of the Python programming language from the perspective of data structures and algorithms. Secondly, it is important that we have the correct mathematical tools. We need to understand the fundamental concepts of computer science and for this we need mathematics. By taking a heuristic approach, developing some guiding principles means that, in general, we do not need any more than high school mathematics to understand the principles of these key ideas.
Another important aspect is an evaluation. Measuring the performance of algorithms requires an understanding of how the increase in data size affects operations on that data. When we are working on large datasets or real-time applications, it is essential that our algorithms and structures are as efficient as they can be.
Finally, we need a strong experimental design strategy. Being able to conceptually translate a real-world problem into the algorithms and data structures of a programming language involves being able to understand the important elements of a problem and a methodology for mapping these elements to programming structures.
To better understand the importance of algorithmic thinking, let's consider a real-world example. Imagine we are at an unfamiliar market and we are given the task of purchasing a list of items. We assume that the market is laid out randomly, each vendor sells a random subset of items, and some of these items may be on our list. Our aim is to minimize the price for each item we buy, as well as minimize the time spent at the market. One way to approach this problem is to write an algorithm like the following:
1. Does the vendor have items that are on our list and the cost is less than a predicted cost for that item?
2. If yes, buy and remove from list; if no, move on to the next vendor.
3. If no more vendors, end.
This is a simple iterator, with a decision and an action. If we have to implement this using programming language, we would need data structures to define and store in memory both the list of items we want to buy and the list of items the vendor is selling. We would need to determine the best way of matching items in each list and we need some sort of logic to decide whether to purchase or not.
There are several observations that we can make regarding this algorithm. Firstly, since the cost calculation is based on a prediction, we don't know what the real cost is. As such, we do not purchase an item because we underpredicted the cost of the item, and we reach the end of the market with items remaining on our list. To handle this situation, we need an effective way of storing the data so that we can efficiently backtrack to the vendor with the lowest cost.
Also, we need to understand the time taken to compare items on our shopping list with the items being sold by each vendor. It is important because as the number of items on our shopping list, or the number of items sold by each vendor, increases, searching for an item takes a lot more time. The order in which we search through items and the shape of the data structures can make a big difference to the time it takes to do a search. Clearly, we would like to arrange our list as well as the order we visit each vendor in such a way that we minimize the search time.
Also, consider what happens when we change the buy condition to purchase at the cheapest price, not just the below-average predicted price. This changes the problem entirely. Instead of sequentially going from one vendor to the next, we need to traverse the market once and, with this knowledge, we can order our shopping list with regards to the vendors we want to visit.
Obviously, there are many more subtleties involved in translating a real-world problem into an abstract construct such as a programming language. For example, as we progress through the market, our knowledge of the cost of a product improves, so our predicted average-price variable becomes more accurate until, by the last stall, our knowledge of the market is perfect. Assuming any kind of backtracking algorithm incurs a cost, we can see cause to review our entire strategy. Conditions such as high price variability, the size and shape of our data structures, and the cost of backtracking all determine the most appropriate solution. The whole discussion clearly demonstrates the importance of data structures and algorithms in building a complex solution.
Python has several built-in data structures, including lists, dictionaries, and sets, which we use to build customized objects. In addition, there are a number of internal libraries, such as collections and math object, which allow us to create more advanced structures as well as perform calculations on those structures. Finally, there are the external libraries such as those found in the SciPy packages. These allow us to perform a range of advanced data tasks such as logistic and linear regression, visualization, and mathematical calculations, such as operations on matrices and vectors. External libraries can be very useful for an out-of-the-box solution. However, we must also be aware that there is often a performance penalty compared to building customized objects from the ground up. By learning how to code these objects ourselves, we can target them to specific tasks, making them more efficient. This is not to exclude the role of external libraries and we will look at this in Chapter 12, Design Techniques and Strategies.
To begin, we will take an overview of some of the key language features that make Python such a great choice for data programming.
Python is one of the most popular and extensively used programming languages all over the world due to its readability and flexibility. A feature of the Python environment is its interactive console, allowing you to both use Python as a desktop-programmable calculator and also as an environment to write and test snippets of code.
The read...evaluate...print loop of the console is a very convenient way to interact with a larger code base, such as to run functions and methods or to create instances of classes. This is one of the major advantages of Python over compiled languages such as C/C++ or Java, where the write...compile...test...recompile cycle can increase development time considerably compared to Python's read...evaluate...print loop. Being able to type in expressions and get an immediate response can greatly speed up data science tasks.
There are some excellent distributions of Python apart from the official CPython version. Two of the most popular are available at: Anaconda (https://www.continuum.io/downloads) and Canopy (https://www.enthought.com/products/canopy/). Most distributions come with their own developer environments. Both Canopy and Anaconda include libraries for scientific, machine learning, and other data applications. Most distributions come with an editor.
There are also a number of implementations of the Python console, apart from the CPython version. Most notable among these is the IPython/Jupyter platform which is based on a web-based computational environment.
To solve a real-world problem through algorithm implementation, we first have to select the variables and then apply the operations on these variables. Variables are labels that are attached to the objects. Variables are not objects nor containers for objects; they only act as a pointer or a reference to the object. For example, consider the following code:
Here, we have created a variable, a, that points to a list object. We create another variable, b, that points to this same list object. When we append an element to this list object, this change is reflected in both a and b.
In Python, variable names are attached to different data types during the program execution; it is not required to first declare the datatype for the variables. Each value is of a type (for example, a string or integer); however, the variable name that points to this value does not have a specific type. More specifically, variables point to an object that can change their type depending on the kind of values assigned to them. Consider the following example:
In the preceding code example, the type of a is changed from int to float, depending upon the value stored in the variable.
Python contains various built-in data types. These include four numeric types (int, float, complex, bool), four sequence types (str, list, tuple, range), one mapping type (dict), and two set types. It is also possible to create user-defined objects, such as functions or classes. We will look at the string and the list data types in this chapter and the remaining built-in types in the next chapter.
All data types in Python are objects. In fact, pretty much everything is an object in Python, including modules, classes, and functions, as well as literals such as strings and integers. Each object in Python has a type, a value, and an identity. When we write greet= "helloworld", we are creating an instance of a string object with the value "hello world" and the identity of greet. The identity of an object acts as a pointer to the object's location in memory. The type of an object, also known as the object's class, describes the object's internal representation, as well as the methods and operations it supports. Once an instance of an object is created, its identity and type cannot be changed.
We can get the identity of an object by using the built-in function id(). This returns an identifying integer and on most systems, this refers to its memory location, although you should not rely on this in any of your code.
Also, there are a number of ways to compare objects; for example, see the following:
if a==b: # a and b have the same valueif a is b: # if a and b are the same objectif type(a) is type(b): #a and b are the same type
An important distinction needs to be made between mutable and immutable objects. Mutable objects such as lists can have their values changed. They have methods, such as insert() or append(), that change an object's value. Immutable objects such as strings cannot have their values changed, so when we run their methods, they simply return a value rather than change the value of an underlying object. We can, of course, use this value by assigning it to a variable or using it as an argument in a function. For example, the int class is immutable—once an instance of it is created, its value cannot be changed, however, an identifier referencing this object can be reassigned another value.
Strings are immutable sequence objects, with each character representing an element in the sequence. As with all objects, we use methods to perform operations. Strings, being immutable, do not change the instance; each method simply returns a value. This value can be stored as another variable or given as an argument to a function or method.
The following table is a list of some of the most commonly used string methods and their descriptions:
Method
Description
s.capitalize
Returns a string with only the first character capitalized, the rest remaining lowercase.
s.count(substring,[start,end])
Counts occurrences of a substring.
s.expandtabs([tabsize])
Replaces tabs with spaces.
s.endswith(substring,[start, end]
Returns
True
if a string ends with a specified substring.
s.find(substring,[start,end])
Returns index of first presence of a substring.
s.isalnum()
Returns
True
if all chars are alphanumeric of string
s
.
s.isalpha()
Returns
True
if all chars are alphabetic of string
s
.
s.isdigit()
Returns
True
if all chars are digits in the string.
s.split([separator],[maxsplit])
Splits a string separated by whitespace or an optional separator. Returns a list.
s.join(t)
Joins the strings in sequence
t
.
s.lower()
Converts the string to all lowercase.
s.replace(old, new[maxreplace])
Replaces old substring with a new substring.
s.startswith(substring, [start, end]])
Returns
True
if the string starts with a specified substring.
s.swapcase()
Returns a copy of the string with swapped case in the string.
s.strip([characters])
Removes whitespace or optional characters.
s.lstrip([characters])
Returns a copy of the string with leading characters removed.
Strings, like all sequence types, support indexing and slicing. We can retrieve any character from a string by using its index s[i]. We can retrieve a slice of a string by using s[i:j], where i and j are the start and end points of the slice. We can return an extended slice by using a stride, as in the following—s[i:j:stride]. The following code should make this clear:
The first two examples are pretty straightforward, returning the character located at index 1 and the first seven characters of the string, respectively. Notice that indexing begins at 0. In the third example, we are using a stride of 2. This results in every second character being returned. In the final example, we omit the end index and the slice returns every second character in the entire string.
You can use any expression, variable, or operator as an index as long as the value is an integer:
Another common operation is traversing through a string with a loop:
Given that strings are immutable, a common question that arises is how we perform operations such as inserting values. Rather than changing a string, we need to think of ways to build new string objects for the results we need. For example, if we wanted to insert a word into our greeting, we could assign a variable to the following:
As this code shows, we use the slice operator to split the string at index position 5 and use + to concatenate. Python never interprets the contents of a string as a number. If we need to perform mathematical operations on a string, we need to first convert them to a numeric type:
List is one of the most commonly used built-in data structures, as they can store any number of different data types. They are simple representations of objects and are indexed by integers starting from zero, as we saw in the case of strings.
The following table contains the most commonly used list methods and their descriptions:
Method
Description
list(s)
Returns a list of sequence s.
s.append(x)
Appends element x at the end of list s.
s.extend(x)
Appends list x at the end of list s.
s.count(x)
Returns the count of the occurrence of x in list s.
s.index(x,[start],[stop])
Returns the smallest index i, where s[i]==x. We can include an optional start and stop index for the lookup.
s.insert(i,e)
Inserts x at index i.
s.pop(i)
Returns the element i and removes it from the list s.
s.remove(x)
Removes element x from the list s.
s.reverse()
Reverses the order of list s.
s.sort(key,[reverse])
Sorts list s with optional key and reverses it.
In Python, lists implementation is different when compared to other languages. Python does not create multiple copies of a variable. For example, when we assign a value of one variable in another variable, both variables point to the same memory address where the value is stored. A copy would only be allocated if the variables change their values. This feature makes Python memory efficient, in the sense that it only creates multiple copies when it is required.
This has important consequences for mutable compound objects such as lists. Consider the following code:
In the preceding code, both the list1 and list2 variables are pointing to the same memory location. However, when we change the y through list2 to 4, we are actually changing the same y variable that list1 is pointing to as well.
An important feature of list is that it can contain nested structures; that is, list can contain other lists. For example, in the following code, list items contains three other lists:
We can access the values of the list using the bracket operators and, since lists are mutable, they are copied in place. The following example demonstrates how we can use this to update elements; for example, here we are raising the price of flour by 20 percent:
We can create a list from expressions using a very common and intuitive method; that is, list comprehensions. It allows us to create a list through an expression directly into the list. Consider the following example, where a list l is created using this expression:
List comprehensions can be quite flexible; for example, consider the following code. It essentially shows two different ways to performs a function composition, where we apply one function (x*4) to another (x*2). The following code prints out two lists representing the function composition of f1 and f2, calculated first using a for loop and then using a list comprehension:
def f1(x): return x*2 def f2(x): return x*4lst=[]for i in range(16): lst.append(f1(f2(i)))print(lst)print([f1(x) for x in range(64) if x in [f2(j) for j in range(16)]])
The first line of output is from the for loop construct. The second is from the list comprehension expression:
List comprehensions can also be used to replicate the action of nested loops in a more compact form. For example, we multiply each of the elements contained within list1 with each other:
We can also use list comprehensions with other objects such as strings, to build more complex structures. For example, the following code creates a list of words and their letter count:
As we will see, lists form the foundation of many of the data structures we will look at. Their versatility, ease of creation, and use enable them to build more specialized and complex data structures.
In Python, it is not only data types that are treated as objects. Both functions and classes are what are known as first class objects, allowing them to be manipulated in the same ways as built-in data types. By definition, first class objects are the following:
Created at runtime
Assigned as a variable or in a data structure
Passed as an argument to a function
Returned as the result of a function
In Python, the term first class object is a bit of a misnomer, since it implies some sort of hierarchy, whereas all Python objects are essentially first class.
To have a look at how this works, let's define a simple function:
def greeting(language): if language=='eng': return 'hello world' if language =='fr' return 'Bonjour le monde' else: return 'language not supported'
Since user-defined functions are objects, we can do things such as include them in other objects, such as lists:
Functions can also be used as arguments for other functions. For example, we can define the following function:
Here, callf() takes a function as an argument, sets a language variable to 'eng', and then calls the function with the language variable as its argument. We could see how this would be useful if, for example, we wanted to produce a program that returns specific sentences in a variety of languages, perhaps for some sort of natural language application. Here, we have a central place to set the language. As well as our greeting function, we could create similar functions that return different sentences. By having one point where we set the language, the rest of the program logic does not have to worry about this. If we want to change the language, we simply change the language variable and we can keep everything else the same.
Functions that take other functions as arguments, or that return functions, are called higher order functions. Python 3 contains two built-in higher order functions—filter() and map(). Note that in earlier versions of Python, these functions returned lists; in Python 3, they return an iterator, making them much more efficient. The map() function provides an easy way to transform each item into an iterable object. For example, here is an efficient, compact way to perform an operation on a sequence. Note the use of the lambda anonymous function:
Similarly, we can use the filter built-in function to filter items in a list:
