Join Frank Kane, who worked on Amazon and IMDb’s machine learning algorithms, as he guides you on your first steps into the world of data science. Hands-On Data Science and Python Machine Learning gives you the tools that you need to understand and explore the core topics in the field, and the confidence and practice to build and analyze your own machine learning models. With the help of interesting and easy-to-follow practical examples, Frank Kane explains potentially complex topics such as Bayesian methods and K-means clustering in a way that anybody can understand.
Based on Frank’s successful data science course, Hands-On Data Science and Python Machine Learning empowers you to conduct data analysis and perform efficient machine learning using Python. Let Frank help you unearth the value in your data using the various data mining and data analysis techniques available in Python, and to develop efficient predictive models to predict future results. You will also learn how to perform large-scale machine learning on Big Data using Apache Spark. The book covers preparing your data for analysis, training machine learning models, and visualizing the final data analysis.
Page count: 558
Year of publication: 2017
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: July 2017
Production reference: 1300717
ISBN 978-1-78728-074-8
www.packtpub.com
Author
Frank Kane
Proofreader
Safis Editing
Acquisition Editor
Ben Renow-Clarke
Indexer
Tejal Daruwale Soni
Content Development Editor
Khushali Bhangde
Graphics
Jason Monteiro
Technical Editor
Nidhisha Shetty
Production Coordinator
Arvindkumar Gupta
Copy Editor
Tom Jacob
My name is Frank Kane. I spent nine years at amazon.com and imdb.com, wrangling millions of customer ratings and customer transactions to produce things such as personalized recommendations for movies and products and "people who bought this also bought." I tell you, I wish we had Apache Spark back then, when I spent years trying to solve these problems there. I hold 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, I left to start my own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.
For support files and downloads related to your book, please visit www.PacktPub.com. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details. At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787280748.
If you'd like to join our team of regular reviewers, you can email us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Getting Started
Installing Enthought Canopy
Giving the installation a test run
If you occasionally get problems opening your IPYNB files
Using and understanding IPython (Jupyter) Notebooks
Python basics - Part 1
Understanding Python code
Importing modules
Data structures
Experimenting with lists
Pre colon
Post colon
Negative syntax
Adding list to list
The append function
Complex data structures
Dereferencing a single element
The sort function
Reverse sort
Tuples
Dereferencing an element
List of tuples
Dictionaries
Iterating through entries
Python basics - Part 2
Functions in Python
Lambda functions - functional programming
Understanding boolean expressions
The if statement
The if-else loop
Looping
The while loop
Exploring activity
Running Python scripts
More options than just the IPython/Jupyter Notebook
Running Python scripts in command prompt
Using the Canopy IDE
Summary
Statistics and Probability Refresher, and Python Practice
Types of data
Numerical data
Discrete data
Continuous data
Categorical data
Ordinal data
Mean, median, and mode
Mean
Median
The factor of outliers
Mode
Using mean, median, and mode in Python
Calculating mean using the NumPy package
Visualizing data using matplotlib
Calculating median using the NumPy package
Analyzing the effect of outliers
Calculating mode using the SciPy package
Some exercises
Standard deviation and variance
Variance
Measuring variance
Standard deviation
Identifying outliers with standard deviation
Population variance versus sample variance
The Mathematical explanation
Analyzing standard deviation and variance on a histogram
Using Python to compute standard deviation and variance
Try it yourself
Probability density function and probability mass function
The probability density function and probability mass functions
Probability density functions
Probability mass functions
Types of data distributions
Uniform distribution
Normal or Gaussian distribution
The exponential probability distribution or Power law
Binomial probability mass function
Poisson probability mass function
Percentiles and moments
Percentiles
Quartiles
Computing percentiles in Python
Moments
Computing moments in Python
Summary
Matplotlib and Advanced Probability Concepts
A crash course in Matplotlib
Generating multiple plots on one graph
Saving graphs as images
Adjusting the axes
Adding a grid
Changing line types and colors
Labeling axes and adding a legend
A fun example
Generating pie charts
Generating bar charts
Generating scatter plots
Generating histograms
Generating box-and-whisker plots
Try it yourself
Covariance and correlation
Defining the concepts
Measuring covariance
Correlation
Computing covariance and correlation in Python
Computing correlation – The hard way
Computing correlation – The NumPy way
Correlation activity
Conditional probability
Conditional probability exercises in Python
Conditional probability assignment
My assignment solution
Bayes' theorem
Summary
Predictive Models
Linear regression
The ordinary least squares technique
The gradient descent technique
The co-efficient of determination or r-squared
Computing r-squared
Interpreting r-squared
Computing linear regression and r-squared using Python
Activity for linear regression
Polynomial regression
Implementing polynomial regression using NumPy
Computing the r-squared error
Activity for polynomial regression
Multivariate regression and predicting car prices
Multivariate regression using Python
Activity for multivariate regression
Multi-level models
Summary
Machine Learning with Python
Machine learning and train/test
Unsupervised learning
Supervised learning
Evaluating supervised learning
K-fold cross validation
Using train/test to prevent overfitting of a polynomial regression
Activity
Bayesian methods - Concepts
Implementing a spam classifier with Naïve Bayes
Activity
K-Means clustering
Limitations to k-means clustering
Clustering people based on income and age
Activity
Measuring entropy
Decision trees - Concepts
Decision tree example
Walking through a decision tree
Random forests technique
Decision trees - Predicting hiring decisions using Python
Ensemble learning – Using a random forest
Activity
Ensemble learning
Support vector machine overview
Using SVM to cluster people by using scikit-learn
Activity
Summary
Recommender Systems
What are recommender systems?
User-based collaborative filtering
Limitations of user-based collaborative filtering
Item-based collaborative filtering
Understanding item-based collaborative filtering
How item-based collaborative filtering works?
Collaborative filtering using Python
Finding movie similarities
Understanding the code
The corrwith function
Improving the results of movie similarities
Making movie recommendations to people
Understanding movie recommendations with an example
Using the groupby command to combine rows
Removing entries with the drop command
Improving the recommendation results
Summary
More Data Mining and Machine Learning Techniques
K-nearest neighbors - concepts
Using KNN to predict a rating for a movie
Activity
Dimensionality reduction and principal component analysis
Dimensionality reduction
Principal component analysis
A PCA example with the Iris dataset
Activity
Data warehousing overview
ETL versus ELT
Reinforcement learning
Q-learning
The exploration problem
The simple approach
The better way
Fancy words
Markov decision process
Dynamic programming
Summary
Dealing with Real-World Data
Bias/variance trade-off
K-fold cross-validation to avoid overfitting
Example of k-fold cross-validation using scikit-learn
Data cleaning and normalisation
Cleaning web log data
Applying a regular expression on the web log
Modification one - filtering the request field
Modification two - filtering post requests
Modification three - checking the user agents
Filtering the activity of spiders/robots
Modification four - applying website-specific filters
Activity for web log data
Normalizing numerical data
Detecting outliers
Dealing with outliers
Activity for outliers
Summary
Apache Spark - Machine Learning on Big Data
Installing Spark
Installing Spark on Windows
Installing Spark on other operating systems
Installing the Java Development Kit
Installing Spark
Spark introduction
It's scalable
It's fast
It's young
It's not difficult
Components of Spark
Python versus Scala for Spark
Spark and Resilient Distributed Datasets (RDD)
The SparkContext object
Creating RDDs
Creating an RDD using a Python list
Loading an RDD from a text file
More ways to create RDDs
RDD operations
Transformations
Using map()
Actions
Introducing MLlib
Some MLlib Capabilities
Special MLlib data types
The vector data type
LabeledPoint data type
Rating data type
Decision Trees in Spark with MLlib
Exploring decision trees code
Creating the SparkContext
Importing and cleaning our data
Creating a test candidate and building our decision tree
Running the script
K-Means Clustering in Spark
Within set sum of squared errors (WSSSE)
Running the code
TF-IDF
TF-IDF in practice
Using TF-IDF
Searching Wikipedia with Spark MLlib
Import statements
Creating the initial RDD
Creating and transforming a HashingTF object
Computing the TF-IDF score
Using the Wikipedia search engine algorithm
Running the algorithm
Using the Spark 2.0 DataFrame API for MLlib
How Spark 2.0 MLlib works
Implementing linear regression
Summary
Testing and Experimental Design
A/B testing concepts
A/B tests
Measuring conversion for A/B testing
How to attribute conversions
Variance is your enemy
T-test and p-value
The t-statistic or t-test
The p-value
Measuring t-statistics and p-values using Python
Running A/B test on some experimental data
When there's no real difference between the two groups
Does the sample size make a difference?
Sample size increased to six-digits
Sample size increased to seven-digits
A/A testing
Determining how long to run an experiment for
A/B test gotchas
Novelty effects
Seasonal effects
Selection bias
Auditing selection bias issues
Data pollution
Attribution errors
Summary
Being a data scientist in the tech industry is one of the most rewarding careers on the planet today. I went and studied actual job descriptions for data scientist roles at tech companies and I distilled those requirements down into the topics that you'll see in this course.
Hands-On Data Science and Python Machine Learning is really comprehensive. We'll start with a crash course on Python and do a review of some basic statistics and probability, but then we're going to dive right into over 60 topics in data mining and machine learning. That includes things such as Bayes' theorem, clustering, decision trees, regression analysis, experimental design; we'll look at them all. Some of these topics are really fun.
We're going to develop an actual movie recommendation system using actual user movie rating data. We're going to create a search engine that actually works for Wikipedia data. We're going to build a spam classifier that can correctly classify spam and nonspam emails in your email account, and we also have a whole section on scaling this work up to a cluster that runs on big data using Apache Spark.
If you're a software developer or programmer looking to transition into a career in data science, this course will teach you the hottest skills without all the mathematical notation and pretense that comes along with these topics. We're just going to explain these concepts and show you some Python code that actually works that you can dive in and mess around with to make those concepts sink home, and if you're working as a data analyst in the finance industry, this course can also teach you to make the transition into the tech industry. All you need is some prior experience in programming or scripting and you should be good to go.
The general format of this book is that I'll start with each concept, explaining it across a few sections with graphical examples. I will introduce you to some of the notation and fancy terminology that data scientists like to use so you can talk the same language, but the concepts themselves are generally pretty simple. After that, I'll throw you into some working Python code that we can run and mess around with, and that will show you how to apply these ideas to real data. The code is presented as IPython Notebook files, a format where I can intermix code with notes that explain what's going on. You can take these notebook files with you after going through this book and use them as a handy quick reference later on in your career. At the end of each concept, I'll encourage you to dive into that Python code, make some modifications, and gain more familiarity by getting hands-on and seeing the effects your changes have.
If you are a budding data scientist or a data analyst who wants to analyze and gain actionable insights from data using Python, this book is for you. Programmers with some experience in Python who want to enter the lucrative world of Data Science will also find this book to be very useful.
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply email feedback@packtpub.com, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register to our website using your email address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Data-Science-and-Python-Machine-Learning. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/HandsOnDataScienceandPythonMachineLearning_ColorImages.pdf.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, whether in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at copyright@packtpub.com with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at questions@packtpub.com, and we will do our best to address the problem.
Since there's code associated with this book, along with sample data that you need to get as well, let me first show you where to get that, and then we'll be good to go. First things first: let's get the code and the data that you need for this book so you can play along and actually have some code to mess around with. The easiest way to do that is to work through this Getting Started chapter.
In this chapter, we will first install and set up a working Python environment:
Installing Enthought Canopy
Installing Python libraries
How to work with the IPython/Jupyter Notebook
How to use, read and run the code files for this book
Then we'll dive into a crash course into understanding Python code:
Python basics - part 1
Understanding Python code
Importing modules
Experimenting with lists
Tuples
Python basics - part 2
Running Python scripts
You'll have everything you need for an amazing journey into data science with Python, once we've set up your environment and familiarized you with Python in this chapter.
Let's dive right in and get what you need installed to actually develop Python code with data science on your desktop. I'm going to walk you through installing a package called Enthought Canopy which has both the development environment and all the Python packages you need pre-installed. It makes life really easy, but if you already know Python you might have an existing Python environment already on your PC, and if you want to keep using it, maybe you can.
The most important thing is that your Python environment has Python 3.5 or newer, that it supports Jupyter Notebooks (because that's what we're going to use in this course), and that you have the key packages you need for this book installed in your environment. I'll explain exactly how to achieve a full installation in a few simple steps - it's going to be very easy.
Let's first look over those key packages, most of which Canopy will install for us automatically. Canopy will install Python 3.5 for us, along with some further packages we need, including scikit_learn, xlrd, and statsmodels. We'll need to use the pip command manually to install a package called pydotplus. And that will be it - it's very easy with Canopy!
Once the following installation steps are complete, we'll have everything we need to actually get up and running, and so we'll open up a little sample file and do some data science for real. Now let's get you set up with everything you need to get started as quickly as possible:
The first thing you will need is a development environment, called an IDE, for Python code. What we're going to use for this book is Enthought Canopy. It's a scientific computing environment, and it's going to work well with this book:
To get Canopy installed, just go to www.enthought.com and click on DOWNLOADS: Canopy:
Enthought Canopy is free in its Canopy Express edition, which is what you want for this book. You must then select your operating system and architecture. For me, that's Windows 64-bit, but you'll want to click on the corresponding Download button for your operating system, with the Python 3.5 option:
We don't have to give them any personal information at this step. There's a pretty standard Windows installer, so just let that download:
After that's downloaded we go ahead and open up the Canopy installer, and run it! You might want to read the license before you agree to it, that's up to you, and then just wait for the installation to complete.
Once you hit the Finish button at the end of the install process, allow it to launch Canopy automatically. You'll see that Canopy then sets up the Python environment by itself, which is great, but this will take a minute or two.
Once the installer is done setting up your Python environment, you should get a screen that looks like the one below. It says Welcome to Canopy and shows a bunch of big friendly buttons:
The beautiful thing is that pretty much everything you need for this book comes pre-installed with Enthought Canopy, that's why I recommend using it!
There is just one last thing we need to set up, so go ahead and click the Editor button there on the Canopy Welcome screen. You'll then see the Editor screen come up, and if you click down in the window at the bottom, I want you to just type in:
!pip install pydotplus
Here's how that's going to look on your screen as you type the above line in at the bottom of the Canopy Editor window; don't forget to press the Return button of course:
Once you hit the Return button, this will install that one extra module that we need for later on in the book, when we get to talking about decision trees, and rendering decision trees.
Once it has finished installing pydotplus, it should come back and say it's successfully installed and, voila, you have everything you need now to get started! The installation is done at this point - but let's just take a few more steps to confirm our installation is running nicely.
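If you'd rather confirm the setup from code, here's a minimal sketch that checks the Python version and looks for the packages named above. The import names are my assumption (scikit-learn is imported as sklearn, the others under their package names), and check_environment is just a helper name made up for this illustration:

```python
import importlib.util
import sys

# Assumed import names: scikit-learn is imported as "sklearn";
# the other packages are imported under their own names.
REQUIRED_MODULES = ["sklearn", "xlrd", "statsmodels", "pydotplus"]


def check_environment(modules=REQUIRED_MODULES, min_version=(3, 5)):
    """Return (python_ok, {module_name: is_installed})."""
    python_ok = sys.version_info[:2] >= min_version
    # find_spec returns None when a top-level module cannot be found.
    found = {name: importlib.util.find_spec(name) is not None
             for name in modules}
    return python_ok, found


python_ok, found = check_environment()
print("Python 3.5 or newer:", python_ok)
for name, is_installed in found.items():
    print(name, "installed" if is_installed else "MISSING")
```

Anything reported as MISSING can usually be added with the same !pip install approach used for pydotplus.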
Let's now give your installation a test run. The first thing to do is actually to entirely close the Canopy window! This is because we're not actually going to be editing and using our code within this Canopy editor. Instead we're going to be using something called an IPython Notebook, which is also now known as the Jupyter Notebook.
Let me show you how that works. Open a window in your operating system to view the accompanying book files that you downloaded, as described in the Preface of this book. It should look something like this, with the set of .ipynb code files you downloaded for this book:
Now go down to the Outliers file in the list, that's the Outliers.ipynb file, and double-click it. What should happen is that it starts up Canopy first and then kicks off your web browser! This is because IPython/Jupyter Notebooks actually live within your web browser. There can be a small pause at first, and it can be a little bit confusing the first time, but you'll soon get used to the idea.
You should soon see Canopy come up and for me my default web browser Chrome comes up. You should see the following Jupyter Notebook page, since we double-clicked on the Outliers.ipynb file:
If you see this screen, it means that everything's working great in your installation and you're all set for the journey through the rest of this book!
Just occasionally, I've noticed that things can go a little bit wrong when you double-click on a .ipynb file. Don't panic! Canopy can sometimes get a little bit flaky, and you might see a screen that asks for a password or token, or one that says it can't connect at all. These are just random quirks: sometimes things don't start up in the right order, or don't start up in time on your PC, and that's okay.
All you have to do is go back and try to open that file a second time. Sometimes it takes two or three tries to actually get it loaded up properly, but if you do it a couple of times it should pop up eventually, and a Jupyter Notebook screen like the one we saw previously about Dealing with Outliers, is what you should see.
If you already know Python, you can probably skip the next two sections. However, if you need a refresher, or if you haven't done Python before, you'll want to go through these. There are a few quirky things about the Python scripting language that you need to know, so let's dive in and just jump into the pool and learn some Python by writing some actual code.
As I said in the requirements for this book, you should have some sort of programming background to be successful here. It doesn't matter whether that's a scripting language such as JavaScript or a language such as C++ or Java; if you've coded in something but you're new to Python, I'm going to give you a little bit of a crash course here. I'm just going to dive right in and go through some examples in this section.
There are a few quirks about Python that are a little bit different than other languages you might have seen; so I just want to walk through what's different about Python from other scripting languages you may have worked with, and the best way to do that is by looking at some real examples. Let's dive right in and look at some Python code:
If you open up the DataScience folder for this class, which you downloaded in the earlier section, you should find a Python101.ipynb file; go ahead and double-click on that. It should open right up in Canopy if you have everything installed properly, and it should look something like the following screenshot:
One cool thing about Python is that there are several ways to run code with Python. You can run it as a script, like you would with a normal programming language. You can also write in this thing called the IPython Notebook, which is what we're using here. So it's this format where you actually have a web browser-like view where you can actually write little notations and notes to yourself in HTML markup stuff, and you can also embed actual code that really runs using the Python interpreter.
If, for example, you want to take the first three elements of a list, everything before element number 3, we can say :3 to get the first three elements, 1, 2, and 3, and if you think about what's going on there, as far as indices go, like in most languages, we start counting from 0. So element 0 is 1, element 1 is 2, and element 2 is 3. Since we're saying we want everything before element 3, that's what we're getting.
Now this can confuse matters, but in this case, it does make intuitive sense. You can think of that colon as meaning I want everything, I want the first three elements, and I could change that to four just again to make the point that we're actually doing something real here:
x[:4]
The output of the above code example is as follows:
[1, 2, 3, 4]
Now if I put the colon on the other side of the 3, that says I want everything from element 3 onward. Counting from 0 (0, 1, 2, 3), x[3:] starts at the element with index 3, which is the fourth item in the list, and includes everything after it. So that's going to return 4, 5, and 6 in that example, OK?
x[3:]
The output is as follows:
[4, 5, 6]
You might want to keep this IPython/Jupyter Notebook file around. It's a good reference, because it can get confusing as to whether the slicing operator includes the element at the index you give or stops just before it. So the best way is to just play around with it here and remind yourself.
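As a quick reminder you can paste into a notebook cell, here's a small sketch of the slicing rules just described, using the same list of 1 through 6. The variable names other than x are just for illustration:

```python
x = [1, 2, 3, 4, 5, 6]

first_three = x[:3]   # indices 0, 1, 2      -> [1, 2, 3]
from_three = x[3:]    # index 3 to the end   -> [4, 5, 6]
middle = x[1:4]       # indices 1, 2, 3      -> [2, 3, 4]

# The start index is included and the stop index is excluded,
# so splitting at the same index never duplicates or drops an element.
assert x[:3] + x[3:] == x

print(first_three, from_three, middle)
```

The rule to remember is that the number before the colon is included, and the number after the colon is where the slice stops, without being included itself.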
One more thing you can do is have this negative syntax:
x[-2:]
The output is as follows:
[5, 6]
By saying x[-2:], I'm asking for the last two elements in the list: go backwards two from the end and take everything from there. That gives me 5 and 6, because those are the last two things in my list.
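Negative numbers work on either side of the colon, and on their own as a single index. Here's a short sketch on the same list; the variable names are just for illustration:

```python
x = [1, 2, 3, 4, 5, 6]

last_two = x[-2:]           # start two from the end  -> [5, 6]
all_but_last_two = x[:-2]   # stop two from the end   -> [1, 2, 3, 4]
last_element = x[-1]        # a single element        -> 6

print(last_two, all_but_last_two, last_element)
```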
You can also change lists around. Let's say I want to add a list to the list. I can use the extend function for that, as shown in the following code block:
x.extend([7,8])
x
The output of the above code is as follows:
[1, 2, 3, 4, 5, 6, 7, 8]
I have my list of 1, 2, 3, 4, 5, 6. If I want to extend it, I can pass in a new list, [7, 8]; the brackets indicate that this is a list in its own right. Here the list is written inline, but it could just as well be referred to by another variable. You can see that once I do that, x has the elements 7 and 8 appended on to the end of it. Note that extend changes x in place rather than creating a separate list.
If you want to just add one more thing to that list, you can use the append function. So I just want to stick the number 9 at the end, there we go:
x.append(9)
x
The output of the above code is as follows:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
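It's easy to mix up extend and append, so here's a small sketch of the difference. The nested and result variables are made up for this illustration:

```python
x = [1, 2, 3, 4, 5, 6]
x.extend([7, 8])   # adds each element of the argument: 7, then 8
x.append(9)        # adds its argument as one single element
print(x)           # [1, 2, 3, 4, 5, 6, 7, 8, 9]

nested = [1, 2, 3]
nested.append([4, 5])   # the whole list becomes one element
print(nested)           # [1, 2, 3, [4, 5]]
print(len(nested))      # 4

# Both methods modify the list in place and return None.
result = [1, 2].append(3)
print(result)           # None
```

So if you want the individual values added, use extend; if you append a list, you get a list inside a list.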
If you want to dereference a single element of the list you can just use the bracket like that:
y[1]
The output of the above code is as follows:
11
So y[1] will return element 1. Remember that y had 10, 11, 12 in it - observe the previous example, and we start counting from 0, so element 1 will actually be the second element in the list, or the number 11 in this case, alright?
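The same bracket syntax chains when lists contain other lists. Here's a minimal sketch, assuming a y list of 10, 11, 12 as described above; the short x list and the list_of_lists name are just my choices for this illustration:

```python
x = [1, 2, 3]          # shortened list, for illustration only
y = [10, 11, 12]
list_of_lists = [x, y]

second_of_y = y[1]                 # element 1 of y            -> 11
same_value = list_of_lists[1][1]   # element 1 of element 1    -> 11
first_row = list_of_lists[0]       # the whole inner list      -> [1, 2, 3]

print(second_of_y, same_value, first_row)
```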
z.sort(reverse=True)
z
The output of the above code is as follows:
[3, 2, 1]
If you need to do a reverse sort, you can just pass reverse=True as a parameter to that sort function, and that will put it back to 3, 2, 1.
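One thing worth knowing is that sort changes the list in place. As a minimal sketch, assuming z starts out as [3, 2, 1], here is how that compares with the built-in sorted function, which leaves the original alone and hands back a new list. The name ascending is just illustrative.

```python
# Assumes z starts as [3, 2, 1] for this sketch.
z = [3, 2, 1]

ascending = sorted(z)   # returns a new sorted list; z is unchanged
print(ascending)        # [1, 2, 3]
print(z)                # [3, 2, 1]

z.sort()                # sorts z itself, in place
print(z)                # [1, 2, 3]

z.sort(reverse=True)    # in-place sort, largest first
print(z)                # [3, 2, 1]
```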
If you need to let that sink in a little bit, feel free to go back and read it a little bit more.
for ship in captains:
    print(ship + ": " + captains[ship])
The output of the above code is as follows:
Let's look at a little example of iterating through the entries in a dictionary. If I want to iterate through every ship that I have in my dictionary and print out captains, I can type for ship in captains, and this will iterate through every single key in my dictionary. Then I can print out the lookup value of each ship's captain, and that's the output that I get there.
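Here is a self-contained sketch of that loop, along with an alternative that uses the dictionary's items method to get each key and value together. The two entries in captains below are placeholders assumed for this illustration, not necessarily the contents of the original dictionary.

```python
# Placeholder entries, assumed only for this sketch.
captains = {"Enterprise": "Kirk", "Voyager": "Janeway"}

# Iterating over a dictionary gives you its keys.
for ship in captains:
    print(ship + ": " + captains[ship])

# items() gives you each key and its value as a pair.
for ship, captain in captains.items():
    print(ship + ": " + captain)
```

Both loops print the same lines; the second one just saves you the extra lookup.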
There you have it. These are basically the main data structures that you'll encounter in Python. There are some others, such as sets, but we'll not really use them in this book, so I think that's enough to get you started. Let's dive into some more Python nuances in our next section.
In addition to Python Basics - Part 1, let us now try to grasp more Python concepts in detail.
Let's talk about functions in Python. Like with other languages, you can have functions that let you repeat a set of operations over and over again with different parameters. In Python, the syntax for doing that looks like this:
def SquareIt(x):
    return x * x

print(SquareIt(2))
The output of the above code is as follows:
4
You declare a function using the def keyword. It just says this is a function, and we'll call this function SquareIt, and the parameter list then follows inside parentheses. This particular function only takes one parameter that we'll call x. Again, remember that whitespace is important in Python. There's not going to be any curly brackets or anything enclosing this function. It's strictly defined by whitespace. So we have a colon that says that this function declaration line is over, but then it's the fact that the next line is indented by one or more tabs that tells the interpreter that we are in fact within the SquareIt function.
So it's def SquareIt(x):, then a tab, then return x * x, and that will return the square of x in this function. We can go ahead and give that a try. print(SquareIt(2)) is how we call that function. It looks just like it would in any other language, really. This should return the number 4; we run the code, and in fact it does. Awesome! That's pretty simple, that's all there is to functions. Obviously, I could have more than one parameter if I wanted to, even as many parameters as I need.
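To illustrate that last point, here is a minimal sketch of a function with two parameters. MultiplyThem is a made-up name for this example and does not appear in the original notebook.

```python
# Hypothetical two-parameter function, named in the style of SquareIt.
def MultiplyThem(x, y):
    return x * y

print(MultiplyThem(3, 4))  # 12
```

The parameters are just separated by commas, both in the definition and in the call.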
Now there are some weird things you can do with functions in Python, that are kind of cool. One thing you can do is to pass functions around as though they were parameters. Let's take a closer look at this example:
# You can pass functions around as parameters
def DoSomething(f, x):
    return f(x)

print(DoSomething(SquareIt, 3))
The output of the preceding code is as follows:
9
Now I have a function called DoSomething, def DoSomething, and it will take two parameters, one that I'll call f and the other I'll call x, and if I want, I can actually pass in a function for one of these parameters. So, think about that for a minute. Let's walk through the example so it makes a bit more sense. Here, DoSomething(f, x) will return f of x; it will basically call the f function with x as a parameter. Python doesn't check parameter types ahead of time, so we have to just kind of make sure that what we are passing in for that first parameter is in fact a function for this to work properly.
For example, we'll say print(DoSomething(SquareIt, 3)): for the first parameter, we'll pass in SquareIt, which is actually another function, and for the second, the number 3. What this should do is call DoSomething with the SquareIt function and the 3 parameter, and that will return SquareIt(3), and 3 squared last time I checked was 9, and sure enough, that does in fact work.
This might be a little bit of a new concept to you, passing functions around as parameters, so if you need to stop for a minute there, wait and let that sink in, play around with it, please feel free to do so. Again, I encourage you to stop and take this at your own pace.
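If you'd like something to play around with, here is a self-contained sketch. SquareIt and DoSomething are repeated from above so the block runs on its own; CubeIt is a hypothetical extra function added for this illustration, and abs is one of Python's built-in functions.

```python
def SquareIt(x):
    return x * x

def DoSomething(f, x):
    return f(x)

# Hypothetical helper, added only for this sketch.
def CubeIt(x):
    return x * x * x

print(DoSomething(SquareIt, 3))  # 9
print(DoSomething(CubeIt, 3))    # 27
print(DoSomething(abs, -3))      # 3, built-in functions work too

# Passing something that isn't a function fails only when it gets called.
try:
    DoSomething(5, 3)
except TypeError as err:
    print("Not callable:", err)
```

Notice that we pass SquareIt without parentheses. Writing SquareIt(3) would call it right away, whereas SquareIt on its own refers to the function itself.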
One more thing that's kind of a Python-ish sort of a thing to do, which you might not see in other languages, is the concept of lambda functions, which comes from what's called functional programming. The idea is that you can define a simple function inline, right where you pass it into another function. This makes the most sense with an example:
# Lambda functions let you inline simple functions
print(DoSomething(lambda x: x * x * x, 3))
The output of the above code is as follows:
27
We'll print DoSomething, and remember that our first parameter is a function, so instead of passing in a named function, I can declare this function inline using the lambda keyword. Lambda basically means that I'm defining an unnamed function that just exists for now. It's transitory, and it takes a parameter x. In the syntax here, lambda