E-Book
31,19 €

Hands-On Data Science and Python Machine Learning E-Book

Frank Kane

Publisher: Packt Publishing
Category: Lifestyle
Language: English

Description

Join Frank Kane, who worked on Amazon and IMDb’s machine learning algorithms, as he guides you on your first steps into the world of data science. Hands-On Data Science and Python Machine Learning gives you the tools that you need to understand and explore the core topics in the field, and the confidence and practice to build and analyze your own machine learning models. With the help of interesting and easy-to-follow practical examples, Frank Kane explains potentially complex topics such as Bayesian methods and K-means clustering in a way that anybody can understand.
Based on Frank’s successful data science course, Hands-On Data Science and Python Machine Learning empowers you to conduct data analysis and perform efficient machine learning using Python. Let Frank help you unearth the value in your data using the various data mining and data analysis techniques available in Python, and to develop efficient predictive models to predict future results. You will also learn how to perform large-scale machine learning on Big Data using Apache Spark. The book covers preparing your data for analysis, training machine learning models, and visualizing the final data analysis.

Details

You can read this e-book in Legimi apps or in any app that supports the following formats:

EPUB

MOBI

Pages: 558

Year of publication: 2017

Sample

Hands-On Data Science and Python Machine Learning

Perform data mining and machine learning efficiently using Python and Spark

Frank Kane

BIRMINGHAM - MUMBAI

Hands-On Data Science and Python Machine Learning

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2017

Production reference: 1300717

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-78728-074-8

www.packtpub.com

Credits

Author

Frank Kane

Proofreader

Safis Editing

Acquisition Editor

Ben Renow-Clarke

Indexer

Tejal Daruwale Soni

Content Development Editor

Khushali Bhangde

Graphics

Jason Monteiro

Technical Editor

Nidhisha Shetty

Production Coordinator

Arvindkumar Gupta

Copy Editor

Tom Jacob

About the Author

My name is Frank Kane. I spent nine years at amazon.com and imdb.com, wrangling millions of customer ratings and customer transactions to produce things such as personalized recommendations for movies and products and "people who bought this also bought." I tell you, I wish we had Apache Spark back then, when I spent years trying to solve these problems there. I hold 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, I left to start my own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details. At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787280748.

If you'd like to join our team of regular reviewers, you can email us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Preface

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Getting Started

Installing Enthought Canopy

Giving the installation a test run

If you occasionally get problems opening your IPYNB files

Using and understanding IPython (Jupyter) Notebooks

Python basics - Part 1

Understanding Python code

Importing modules

Data structures

Experimenting with lists

Pre colon

Post colon

Negative syntax

Adding list to list

The append function

Complex data structures

Dereferencing a single element

The sort function

Reverse sort

Tuples

Dereferencing an element

List of tuples

Dictionaries

Iterating through entries

Python basics - Part 2

Functions in Python

Lambda functions - functional programming

Understanding boolean expressions

The if statement

The if-else loop

Looping

The while loop

Exploring activity

Running Python scripts

More options than just the IPython/Jupyter Notebook

Running Python scripts in command prompt

Using the Canopy IDE

Summary

Statistics and Probability Refresher, and Python Practice

Types of data

Numerical data

Discrete data

Continuous data

Categorical data

Ordinal data

Mean, median, and mode

Mean

Median

The factor of outliers

Mode

Using mean, median, and mode in Python

Calculating mean using the NumPy package

Visualizing data using matplotlib

Calculating median using the NumPy package

Analyzing the effect of outliers

Calculating mode using the SciPy package

Some exercises

Standard deviation and variance

Variance

Measuring variance

Standard deviation

Identifying outliers with standard deviation

Population variance versus sample variance

The Mathematical explanation

Analyzing standard deviation and variance on a histogram

Using Python to compute standard deviation and variance

Try it yourself

Probability density function and probability mass function

The probability density function and probability mass functions

Probability density functions

Probability mass functions

Types of data distributions

Uniform distribution

Normal or Gaussian distribution

The exponential probability distribution or Power law

Binomial probability mass function

Poisson probability mass function

Percentiles and moments

Percentiles

Quartiles

Computing percentiles in Python

Moments

Computing moments in Python

Summary

Matplotlib and Advanced Probability Concepts

A crash course in Matplotlib

Generating multiple plots on one graph

Saving graphs as images

Adjusting the axes

Adding a grid

Changing line types and colors

Labeling axes and adding a legend

A fun example

Generating pie charts

Generating bar charts

Generating scatter plots

Generating histograms

Generating box-and-whisker plots

Try it yourself

Covariance and correlation

Defining the concepts

Measuring covariance

Correlation

Computing covariance and correlation in Python

Computing correlation – The hard way

Computing correlation – The NumPy way

Correlation activity

Conditional probability

Conditional probability exercises in Python

Conditional probability assignment

My assignment solution

Bayes' theorem

Summary

Predictive Models

Linear regression

The ordinary least squares technique

The gradient descent technique

The co-efficient of determination or r-squared

Computing r-squared

Interpreting r-squared

Computing linear regression and r-squared using Python

Activity for linear regression

Polynomial regression

Implementing polynomial regression using NumPy

Computing the r-squared error

Activity for polynomial regression

Multivariate regression and predicting car prices

Multivariate regression using Python

Activity for multivariate regression

Multi-level models

Summary

Machine Learning with Python

Machine learning and train/test

Unsupervised learning

Supervised learning

Evaluating supervised learning

K-fold cross validation

Using train/test to prevent overfitting of a polynomial regression

Activity

Bayesian methods - Concepts

Implementing a spam classifier with Naïve Bayes

Activity

K-Means clustering

Limitations to k-means clustering

Clustering people based on income and age

Activity

Measuring entropy

Decision trees - Concepts

Decision tree example

Walking through a decision tree

Random forests technique

Decision trees - Predicting hiring decisions using Python

Ensemble learning – Using a random forest

Activity

Ensemble learning

Support vector machine overview

Using SVM to cluster people by using scikit-learn

Activity

Summary

Recommender Systems

What are recommender systems?

User-based collaborative filtering

Limitations of user-based collaborative filtering

Item-based collaborative filtering

Understanding item-based collaborative filtering

How item-based collaborative filtering works?

Collaborative filtering using Python

Finding movie similarities

Understanding the code

The corrwith function

Improving the results of movie similarities

Making movie recommendations to people

Understanding movie recommendations with an example

Using the groupby command to combine rows

Removing entries with the drop command

Improving the recommendation results

Summary

More Data Mining and Machine Learning Techniques

K-nearest neighbors - concepts

Using KNN to predict a rating for a movie

Activity

Dimensionality reduction and principal component analysis

Dimensionality reduction

Principal component analysis

A PCA example with the Iris dataset

Activity

Data warehousing overview

ETL versus ELT

Reinforcement learning

Q-learning

The exploration problem

The simple approach

The better way

Fancy words

Markov decision process

Dynamic programming

Summary

Dealing with Real-World Data

Bias/variance trade-off

K-fold cross-validation to avoid overfitting

Example of k-fold cross-validation using scikit-learn

Data cleaning and normalisation

Cleaning web log data

Applying a regular expression on the web log

Modification one - filtering the request field

Modification two - filtering post requests

Modification three - checking the user agents

Filtering the activity of spiders/robots

Modification four - applying website-specific filters

Activity for web log data

Normalizing numerical data

Detecting outliers

Dealing with outliers

Activity for outliers

Summary

Apache Spark - Machine Learning on Big Data

Installing Spark

Installing Spark on Windows

Installing Spark on other operating systems

Installing the Java Development Kit

Installing Spark

Spark introduction

It's scalable

It's fast

It's young

It's not difficult

Components of Spark

Python versus Scala for Spark

Spark and Resilient Distributed Datasets (RDD)

The SparkContext object

Creating RDDs

Creating an RDD using a Python list

Loading an RDD from a text file

More ways to create RDDs

RDD operations

Transformations

Using map()

Actions

Introducing MLlib

Some MLlib Capabilities

Special MLlib data types

The vector data type

LabeledPoint data type

Rating data type

Decision Trees in Spark with MLlib

Exploring decision trees code

Creating the SparkContext

Importing and cleaning our data

Creating a test candidate and building our decision tree

Running the script

K-Means Clustering in Spark

Within set sum of squared errors (WSSSE)

Running the code

TF-IDF

TF-IDF in practice

Using TF-IDF

Searching wikipedia with Spark MLlib

Import statements

Creating the initial RDD

Creating and transforming a HashingTF object

Computing the TF-IDF score

Using the Wikipedia search engine algorithm

Running the algorithm

Using the Spark 2.0 DataFrame API for MLlib

How Spark 2.0 MLlib works

Implementing linear regression

Summary

Testing and Experimental Design

A/B testing concepts

A/B tests

Measuring conversion for A/B testing

How to attribute conversions

Variance is your enemy

T-test and p-value

The t-statistic or t-test

The p-value

Measuring t-statistics and p-values using Python

Running A/B test on some experimental data

When there's no real difference between the two groups

Does the sample size make a difference?

Sample size increased to six-digits

Sample size increased to seven-digits

A/A testing

Determining how long to run an experiment for

A/B test gotchas

Novelty effects

Seasonal effects

Selection bias

Auditing selection bias issues

Data pollution

Attribution errors

Summary

Preface

Being a data scientist in the tech industry is one of the most rewarding careers on the planet today. I went and studied actual job descriptions for data scientist roles at tech companies and I distilled those requirements down into the topics that you'll see in this course.

Hands-On Data Science and Python Machine Learning is really comprehensive. We'll start with a crash course on Python and do a review of some basic statistics and probability, but then we're going to dive right into over 60 topics in data mining and machine learning. That includes things such as Bayes' theorem, clustering, decision trees, regression analysis, experimental design; we'll look at them all. Some of these topics are really fun.

We're going to develop an actual movie recommendation system using actual user movie rating data. We're going to create a search engine that actually works for Wikipedia data. We're going to build a spam classifier that can correctly classify spam and nonspam emails in your email account, and we also have a whole section on scaling this work up to a cluster that runs on big data using Apache Spark.

If you're a software developer or programmer looking to transition into a career in data science, this course will teach you the hottest skills without all the mathematical notation and pretense that comes along with these topics. We're just going to explain these concepts and show you some Python code that actually works that you can dive in and mess around with to make those concepts sink home, and if you're working as a data analyst in the finance industry, this course can also teach you to make the transition into the tech industry. All you need is some prior experience in programming or scripting and you should be good to go.

The general format of this book is as follows: I'll start with each concept, explaining it in a few sections with graphical examples. I will introduce you to some of the notation and fancy terminology that data scientists like to use so you can talk the same language, but the concepts themselves are generally pretty simple. After that, I'll throw you into some Python code that actually works, which we can run and mess around with, and that will show you how to apply these ideas to real data. The code is presented as IPython Notebook files, a format where I can intermix code with notes that explain what's going on. You can take these notebook files with you after going through this book and use them as a handy quick reference later on in your career. At the end of each concept, I'll encourage you to dive into that Python code, make some modifications, and see the effects they have, so that you gain more familiarity by getting hands-on.

Who this book is for

If you are a budding data scientist or a data analyst who wants to analyze and gain actionable insights from data using Python, this book is for you. Programmers with some experience in Python who want to enter the lucrative world of Data Science will also find this book to be very useful.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply email feedback@packtpub.com, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Hover the mouse pointer on the SUPPORT tab at the top.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box.

Select the book for which you're looking to download the code files.

Choose from the drop-down menu where you purchased this book from.

Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Data-Science-and-Python-Machine-Learning. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/HandsOnDataScienceandPythonMachineLearning_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, whether in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at copyright@packtpub.com with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at questions@packtpub.com, and we will do our best to address the problem.

Getting Started

Since there's going to be code associated with this book, and sample data that you need to get as well, let's get some setup out of the way first. Once you have the code and the data, you can play along and actually have some code to mess around with. The easiest way to do that is to work through this Getting Started chapter.

In this chapter, we will first install and set up a working Python environment:

Installing Enthought Canopy

Installing Python libraries

How to work with the IPython/Jupyter Notebook

How to use, read and run the code files for this book

Then we'll dive into a crash course in understanding Python code:

Python basics - part 1

Understanding Python code

Importing modules

Experimenting with lists

Tuples

Python basics - part 2

Running Python scripts

You'll have everything you need for an amazing journey into data science with Python, once we've set up your environment and familiarized you with Python in this chapter.

Installing Enthought Canopy

Let's dive right in and get what you need installed to actually develop Python code with data science on your desktop. I'm going to walk you through installing a package called Enthought Canopy which has both the development environment and all the Python packages you need pre-installed. It makes life really easy, but if you already know Python you might have an existing Python environment already on your PC, and if you want to keep using it, maybe you can.

The most important thing is that your Python environment has Python 3.5 or newer, that it supports Jupyter Notebooks (because that's what we're going to use in this course), and that you have the key packages you need for this book installed on your environment. I'll explain exactly how to achieve a full installation in a few simple steps - it's going to be very easy.

Let's first look at those key packages, most of which Canopy will install for us automatically. Canopy will install Python 3.5 for us, along with some further packages we need, including scikit_learn, xlrd, and statsmodels. We'll need to use the pip command manually to install a package called pydotplus. And that will be it - it's very easy with Canopy!

Once the following installation steps are complete, we'll have everything we need to actually get up and running, and so we'll open up a little sample file and do some data science for real. Now let's get you set up with everything you need to get started as quickly as possible:

The first thing you will need is a development environment, called an IDE, for Python code. What we're going to use for this book is Enthought Canopy. It's a scientific computing environment, and it's going to work well with this book:

To get Canopy installed, just go to www.enthought.com and click on DOWNLOADS: Canopy.

Enthought Canopy is free for the Canopy Express edition, which is what you want for this book. You must then select your operating system and architecture. For me, that's Windows 64-bit, but you'll want to click on the corresponding Download button for your operating system, with the Python 3.5 option:

We don't have to give them any personal information at this step. There's a pretty standard Windows installer, so just let that download:

After that's downloaded we go ahead and open up the Canopy installer, and run it! You might want to read the license before you agree to it, that's up to you, and then just wait for the installation to complete.

Once you hit the Finish button at the end of the install process, allow it to launch Canopy automatically. You'll see that Canopy then sets up the Python environment by itself, which is great, but this will take a minute or two.

Once the installer is done setting up your Python environment, you should get a screen that looks like the one below. It says Welcome to Canopy and shows a bunch of big friendly buttons:

The beautiful thing is that pretty much everything you need for this book comes pre-installed with Enthought Canopy, that's why I recommend using it!

There is just one last thing we need to set up, so go ahead and click the Editor button there on the Canopy Welcome screen. You'll then see the Editor screen come up, and if you click down in the window at the bottom, I want you to just type in:

!pip install pydotplus

Here's how that's going to look on your screen as you type the above line in at the bottom of the Canopy Editor window; don't forget to press the Return button of course:

Once you hit the Return button, this will install the one extra module that we need for later on in the book, when we get to talking about decision trees and rendering decision trees.

Once it has finished installing pydotplus, it should come back and say it's successfully installed and, voila, you have everything you need now to get started! The installation is done at this point, but let's just take a few more steps to confirm our installation is running nicely.
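If you'd like a quick sanity check before moving on, here is a minimal sketch (not something from the book's code bundle) that confirms the Python version and looks for the packages mentioned above. The import names are my assumptions: scikit-learn is imported as sklearn, and the helper name check_environment is just made up for illustration.

```python
import importlib.util
import sys

# Import names are assumptions; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["sklearn", "xlrd", "statsmodels", "pydotplus"]


def check_environment(modules=REQUIRED_MODULES, min_version=(3, 5)):
    """Return (python_ok, missing) for the interpreter running this code."""
    python_ok = sys.version_info[:2] >= min_version
    # find_spec returns None when a top-level module cannot be found.
    missing = [name for name in modules
               if importlib.util.find_spec(name) is None]
    return python_ok, missing


python_ok, missing = check_environment()
print("Python 3.5 or newer:", python_ok)
print("Missing packages:", missing or "none")
```

If anything shows up as missing, installing it with pip from the same prompt you used for pydotplus should take care of it.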

Giving the installation a test run

Let's now give your installation a test run. The first thing to do is actually to entirely close the Canopy window! This is because we're not actually going to be editing and using our code within this Canopy editor. Instead we're going to be using something called an IPython Notebook, which is also now known as the Jupyter Notebook.

Let me show you how that works. Open a window in your operating system to view the accompanying book files that you downloaded, as described in the Preface of this book. It should look something like this, with the set of .ipynb code files you downloaded for this book:

Now go down to the Outliers file in the list, that's the Outliers.ipynb file, and double-click it. What should happen is that it starts up Canopy first and then kicks off your web browser! This is because IPython/Jupyter Notebooks actually live within your web browser. There can be a small pause at first, and it can be a little bit confusing the first time, but you'll soon get used to the idea.

You should soon see Canopy come up, and then your default web browser, which for me is Chrome. You should see the following Jupyter Notebook page, since we double-clicked on the Outliers.ipynb file:

If you see this screen, it means that everything's working great in your installation and you're all set for the journey across the rest of this book!

If you occasionally get problems opening your IPYNB files

Just occasionally, I've noticed that things can go a little bit wrong when you double-click on a .ipynb file. Don't panic! Sometimes Canopy can get a little bit flaky, and you might see a screen that asks for a password or token, or a screen that says it can't connect at all. If either of those things happens to you, it's just a random quirk: sometimes things don't start up in the right order, or don't start up in time on your PC, and that's okay.

All you have to do is go back and try to open that file a second time. Sometimes it takes two or three tries to actually get it loaded up properly, but if you do it a couple of times it should pop up eventually, and you should see a Jupyter Notebook screen like the Dealing with Outliers one we saw previously.

Python basics - Part 1

If you already know Python, you can probably skip the next two sections. However, if you need a refresher, or if you haven't done Python before, you'll want to go through these. There are a few quirky things about the Python scripting language that you need to know, so let's dive in and just jump into the pool and learn some Python by writing some actual code.

Like I said before, in the requirements for this book, you should have some sort of programming background to be successful here. I don't care whether you've coded in C++, Java, or a scripting language such as JavaScript, as long as you've coded in something. If you're new to Python, I'm going to give you a little bit of a crash course here. I'm just going to dive right in and go right into some examples in this section.

There are a few quirks about Python that are a little bit different than other languages you might have seen; so I just want to walk through what's different about Python from other scripting languages you may have worked with, and the best way to do that is by looking at some real examples. Let's dive right in and look at some Python code:

If you open up the DataScience folder for this class, which you downloaded in the earlier section, you should find a Python101.ipynb file; go ahead and double-click on that. It should open right up in Canopy if you have everything installed properly, and it should look something like the following screenshot:

New versions of Canopy will open the code in your web browser, not the Canopy editor! This is okay!

One cool thing about Python is that there are several ways to run code with Python. You can run it as a script, like you would with a normal programming language. You can also write in this thing called the IPython Notebook, which is what we're using here. So it's this format where you actually have a web browser-like view where you can actually write little notations and notes to yourself in HTML markup stuff, and you can also embed actual code that really runs using the Python interpreter.

Pre colon

If, for example, you want to take the first three elements of a list, everything before element number 3, we can say :3 to get the first three elements, 1, 2, and 3, and if you think about what's going on there, as far as indices go, like in most languages, we start counting from 0. So element 0 is 1, element 1 is 2, and element 2 is 3. Since we're saying we want everything before element 3, that's what we're getting.

So, you know, never forget that in most languages, you start counting at 0 and not 1.

Now this can confuse matters, but in this case, it does make intuitive sense. You can think of :3 as meaning I want the first three elements, and I could change that to four just to make the point that we're actually doing something real here:

x[:4]

The output of the above code example is as follows:

[1, 2, 3, 4]
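Here is a small sketch you can paste into a notebook cell to convince yourself of this. I'm assuming x holds the numbers 1 through 6, which matches the outputs shown in this section.

```python
x = [1, 2, 3, 4, 5, 6]  # assumed contents, matching the outputs in the text

first_three = x[:3]  # elements at indices 0, 1, 2
first_four = x[:4]   # elements at indices 0, 1, 2, 3

print(first_three)  # [1, 2, 3]
print(first_four)   # [1, 2, 3, 4]

# The stop index itself is excluded, so x[:n] has exactly n elements
# as long as the list has at least n elements.
print(len(x[:3]))  # 3
```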

Post colon

Now if I put the colon on the other side of the 3, that says I want element number 3 and everything after it. If I say x[3:], that's giving me element 3 (counting 0, 1, 2, 3) and everything after it. So that's going to return 4, 5, and 6 in that example, OK?

x[3:]

The output is as follows:

[4, 5, 6]

You might want to keep this IPython/Jupyter Notebook file around. It's a good reference, because sometimes it can get confusing as to whether the slicing operator includes the element at the index you give it or stops just short of it. So the best way is to just play around with it here and remind yourself.
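One way to remember the rule is that the same number on either side of the colon splits the list cleanly in two. This is just a sketch, again assuming x holds 1 through 6; head and tail are names I made up.

```python
x = [1, 2, 3, 4, 5, 6]  # assumed contents

head = x[:3]  # stops just before index 3
tail = x[3:]  # starts at index 3

print(head, tail)  # [1, 2, 3] [4, 5, 6]

# Element 3 (the value 4) lands in the tail only,
# so nothing is duplicated and nothing is lost.
print(head + tail == x)  # True
```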

Negative syntax

One more thing you can do is have this negative syntax:

x[-2:]

The output is as follows:

[5, 6]

By saying x[-2:], I'm saying that I want the last two elements in the list. This means go backwards two from the end, and that will give me 5 and 6, because those are the last two things in my list.
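Here's a quick sketch of a few other things you can do with negative indices, again assuming x holds 1 through 6. The variable names are hypothetical, picked only to label what each line does.

```python
# Assumed list, same as in the slicing examples
x = [1, 2, 3, 4, 5, 6]

last_two = x[-2:]           # start two back from the end, run to the end
all_but_last_two = x[:-2]   # everything before that same point
last_item = x[-1]           # a single negative index, no colon

print(last_two)           # [5, 6]
print(all_but_last_two)   # [1, 2, 3, 4]
print(last_item)          # 6
```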

Adding list to list

You can also change lists around. Let's say I want to add a list to the list. I can use the extend function for that, as shown in the following code block:

x.extend([7,8])
x

The output of the above code is as follows:

[1, 2, 3, 4, 5, 6, 7, 8]

I have my list of 1, 2, 3, 4, 5, 6. If I want to extend it, I can say I have a new list here, [7, 8], and the brackets indicate that this is a new list in itself. This could be a list written inline like that, or it could be referred to by another variable. You can see that once I do that, my list actually has 7 and 8 added on to the end of it. So I've extended my list with another list.
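One detail worth seeing for yourself is that extend changes the list in place rather than handing back a new one. The sketch below assumes the same starting list; more, result, and combined are hypothetical names used only for this illustration.

```python
x = [1, 2, 3, 4, 5, 6]   # assumed starting list
more = [7, 8]            # hypothetical second list, held in a variable

result = x.extend(more)  # extend modifies x itself...
print(result)            # None  (...and returns nothing)
print(x)                 # [1, 2, 3, 4, 5, 6, 7, 8]

# The + operator, by contrast, builds a brand-new list
# and leaves both of its inputs alone.
combined = [1, 2, 3] + [4, 5]
print(combined)          # [1, 2, 3, 4, 5]
```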

The append function

If you want to just add one more thing to that list, you can use the append function. So I just want to stick the number 9 at the end, there we go:

x.append(9)
x

The output of the above code is as follows:

[1, 2, 3, 4, 5, 6, 7, 8, 9]
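It's easy to mix up append and extend, so here's a small sketch that puts them side by side. The lists a, b, and c are made up for this comparison and aren't part of the running example.

```python
a = [1, 2, 3]
a.append(4)         # append adds exactly one item
print(a)            # [1, 2, 3, 4]

b = [1, 2, 3]
b.append([4, 5])    # the whole list goes in as ONE new item
print(b)            # [1, 2, 3, [4, 5]]
print(len(b))       # 4

c = [1, 2, 3]
c.extend([4, 5])    # extend adds each item of the other list
print(c)            # [1, 2, 3, 4, 5]
```

If you end up with a list nested inside your list, chances are you wanted extend rather than append.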

Dereferencing a single element

If you want to dereference a single element of the list, you can just use brackets like this:

y[1]

The output of the above code is as follows:

11

So y[1] will return element 1. Remember that y had 10, 11, 12 in it (see the previous example), and we start counting from 0, so element 1 will actually be the second element in the list, or the number 11 in this case, alright?
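Here's a small sketch you can run to check the counting for yourself. It assumes y holds 10, 11, 12 as described above; the try/except at the end is my own addition, just to show what happens if you ask for an element that isn't there.

```python
y = [10, 11, 12]   # contents as described in the text

print(y[0])    # 10
print(y[1])    # 11
print(y[-1])   # 12

# There is no element 3 in a three-item list, so Python raises IndexError
try:
    y[3]
except IndexError as err:
    message = str(err)
    print("Oops:", message)
```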

Reverse sort

z.sort(reverse=True)
z

The output of the above code is as follows:

[3, 2, 1]

If you need to do a reverse sort, you can just pass reverse=True as a parameter to the sort function, and that will put it back to 3, 2, 1.
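As a rough sketch of how the sorting options fit together, assuming z starts out as 3, 2, 1: sort works in place, while Python's built-in sorted function gives you back a new list. The name ascending_copy is hypothetical.

```python
z = [3, 2, 1]            # assumed starting values

z.sort()                 # in place, ascending
print(z)                 # [1, 2, 3]

z.sort(reverse=True)     # in place, descending
print(z)                 # [3, 2, 1]

# sorted() returns a new sorted list and leaves z untouched
ascending_copy = sorted(z)
print(ascending_copy)    # [1, 2, 3]
print(z)                 # [3, 2, 1]
```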

If you need to let that sink in a little bit, feel free to go back and read it a little bit more.

Iterating through entries

for ship in captains:
    print(ship + ": " + captains[ship])

The output of the above code is as follows:

Let's look at a little example of iterating through the entries in a dictionary. If I want to iterate through every ship that I have in my dictionary and print out its captain, I can type for ship in captains, and this will iterate through every single key in my dictionary. Then I can print out the lookup value of each ship's captain, and that's the output that I get there.
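To make that loop something you can run on its own, here's a minimal sketch. The dictionary entries below are placeholders I made up purely for illustration, not the ones from the notebook. It also shows the items() method, which is another common way to walk through a dictionary.

```python
# Placeholder entries, invented only so the loop has something to print
captains = {
    "Ship A": "Captain A",
    "Ship B": "Captain B",
}

# Looping over a dictionary directly gives you its keys
for ship in captains:
    print(ship + ": " + captains[ship])

# items() hands you each key and its value together,
# so you don't need the extra lookup
for ship, captain in captains.items():
    print(ship + ": " + captain)
```

Both loops print the same lines; which one you use is mostly a matter of taste.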

There you have it. These are basically the main data structures that you'll encounter in Python. There are some others, such as sets, but we'll not really use them in this book, so I think that's enough to get you started. Let's dive into some more Python nuances in our next section.

Python basics - Part 2

Building on Python Basics - Part 1, let us now try to grasp more Python concepts in detail.

Functions in Python

Let's talk about functions in Python. Like with other languages, you can have functions that let you repeat a set of operations over and over again with different parameters. In Python, the syntax for doing that looks like this:

def SquareIt(x):
    return x * x

print(SquareIt(2))

The output of the above code is as follows:

4

You declare a function using the def keyword. It just says this is a function, and we'll call this function SquareIt, followed by the parameter list inside parentheses. This particular function only takes one parameter that we'll call x. Again, remember that whitespace is important in Python. There aren't going to be any curly brackets or anything enclosing this function. It's strictly defined by whitespace. So we have a colon that says that this function declaration line is over, and then it's the fact that the next line is indented that tells the interpreter that we are in fact within the SquareIt function.

So we have def SquareIt(x):, then an indented return x * x, and that will return the square of x. We can go ahead and give that a try. print(SquareIt(2)) is how we call that function. It looks just like it would in any other language, really. This should return the number 4; we run the code, and in fact it does. Awesome! That's pretty simple, that's all there is to functions. Obviously, I could have more than one parameter if I wanted to, as many parameters as I need.
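To make the point about multiple parameters concrete, here's a small sketch. SquareIt is the function from above; AddThenSquare is a hypothetical function I'm making up just for this illustration.

```python
def SquareIt(x):
    return x * x

# Hypothetical two-parameter function, not from the original notebook
def AddThenSquare(a, b):
    # The indented lines are what belong to the function body
    total = a + b
    return SquareIt(total)

print(SquareIt(2))           # 4
print(AddThenSquare(1, 2))   # 9
```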

Now there are some weird things you can do with functions in Python that are kind of cool. One thing you can do is pass functions around as parameters. Let's take a closer look at this example:

#You can pass functions around as parameters
def DoSomething(f, x):
    return f(x)

print(DoSomething(SquareIt, 3))

The output of the preceding code is as follows:

9

Now I have a function called DoSomething, and it takes two parameters, one that I'll call f and the other I'll call x, and if I want to, I can actually pass in a function for one of these parameters. So, think about that for a minute. Here, DoSomething(f, x) will return f of x; it will basically call the f function with x as a parameter. Python doesn't check parameter types ahead of time, so we have to just kind of make sure that what we are passing in for that first parameter is in fact a function for this to work properly.

For example, we'll say print(DoSomething(SquareIt, 3)): for the first parameter, we'll pass in SquareIt, which is actually another function, and for the second, the number 3. What this should do is call the SquareIt function with 3 as its parameter, so it will return SquareIt(3), and 3 squared last time I checked was 9, and sure enough, that does in fact work.

This might be a little bit of a new concept to you, passing functions around as parameters, so if you need to stop for a minute there, wait and let that sink in, play around with it, please feel free to do so. Again, I encourage you to stop and take this at your own pace.
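If you'd like something to play around with, here's a minimal sketch. SquareIt and DoSomething are the functions from above; DoubleIt and the variable op are hypothetical, added only to show that any function can be passed in the same way. The built-in callable check is one way (my suggestion, not something from the notebook) to confirm that a value really is a function before you call it.

```python
def SquareIt(x):
    return x * x

def DoSomething(f, x):
    return f(x)

# Hypothetical second function to pass in
def DoubleIt(x):
    return x + x

print(DoSomething(SquareIt, 3))   # 9
print(DoSomething(DoubleIt, 3))   # 6

# A function is just a value, so you can store it in a variable too
op = SquareIt
print(op(4))                      # 16

# callable() tells you whether something can be called like a function
print(callable(SquareIt))         # True
print(callable(3))                # False
```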

Lambda functions - functional programming

One more thing that's kind of a Python-ish sort of a thing to do, which you might not see in some other languages, is the concept of lambda functions, which comes out of what's called functional programming. The idea is that you can define a simple function inline, right where you pass it into another function. This makes the most sense with an example:

#Lambda functions let you inline simple functions
print(DoSomething(lambda x: x * x * x, 3))

The output of the above code is as follows:

27

We'll print DoSomething, and remember that our first parameter is a function, so instead of passing in a named function, I can declare this function inline using the lambda keyword. Lambda basically means that I'm defining an unnamed function that just exists for now. It's transitory, and it takes a parameter x. In the syntax here, lambda