Python: Real-World Data Science - Dusty Phillips - E-Book

Dusty Phillips

Description

Unleash the power of Python and its robust data science capabilities

About This Book

  • Unleash the power of Python 3 objects
  • Learn to use powerful Python libraries for effective data processing and analysis
  • Harness the power of Python to analyze data and create insightful predictive models
  • Unlock deeper insights into machine learning with this vital guide to cutting-edge predictive analytics

Who This Book Is For

Entry-level analysts who want to enter the data science world will find this course useful for getting acquainted with Python's data science capabilities for real-world data analysis.

What You Will Learn

  • Install and set up Python
  • Implement objects in Python by creating classes and defining methods
  • Get acquainted with NumPy to use it with arrays and array-oriented computing in data analysis
  • Create effective visualizations for presenting your data using Matplotlib
  • Process and analyze data using the time series capabilities of pandas
  • Interact with different kinds of data stores, such as text files, binary disk formats, MongoDB, and Redis
  • Apply data mining concepts to real-world problems
  • Compute on big data, including real-time data from the Internet
  • Explore how to use different machine learning models to ask different questions of your data

In Detail

The Python: Real-World Data Science course will take you on a journey to become an efficient data science practitioner by thoroughly understanding the key concepts of Python. This learning path is divided into four modules, and each module is a mini-course in its own right; as you complete each one, you'll have gained key skills and be ready for the material in the next module.

The course begins with getting your Python fundamentals nailed down. Once you are familiar with Python's core concepts, it's time to dive into the field of data science. In the second module, you'll learn how to perform data analysis using Python in a practical and example-driven way. The third module will teach you how to design and develop data mining applications using a variety of datasets, from basic classification and affinity analysis to more complex data types including text, images, and graphs. Machine learning and predictive analytics have become some of the most important approaches to uncovering data gold mines. In the final module, we'll discuss the necessary details regarding machine learning concepts, offering intuitive yet informative explanations on how machine learning algorithms work, how to use them, and most importantly, how to avoid the common pitfalls.

Style and approach

This course includes all the resources that will help you jump into the data science field with Python and learn how to make sense of data. The aim is to create a smooth learning path that will teach you how to get started with powerful Python libraries and perform various data science techniques in depth.

Page count: 1516

Year of publication: 2016

Table of Contents

Python: Real-World Data Science
Meet Your Course Guide
What's so cool about Data Science?
Course Structure
Course Journey
The Course Roadmap and Timeline
1. Course Module 1: Python Fundamentals
1. Introduction and First Steps – Take a Deep Breath
A proper introduction
Enter the Python
About Python
Portability
Coherence
Developer productivity
An extensive library
Software quality
Software integration
Satisfaction and enjoyment
What are the drawbacks?
Who is using Python today?
Setting up the environment
Python 2 versus Python 3 – the great debate
What you need for this course
Installing Python
Installing IPython
Installing additional packages
How you can run a Python program
Running Python scripts
Running the Python interactive shell
Running Python as a service
Running Python as a GUI application
How is Python code organized
How do we use modules and packages
Python's execution model
Names and namespaces
Scopes
Guidelines on how to write good code
The Python culture
A note on the IDEs
2. Object-oriented Design
Introducing object-oriented
Objects and classes
Specifying attributes and behaviors
Data describes objects
Behaviors are actions
Hiding details and creating the public interface
Composition
Inheritance
Inheritance provides abstraction
Multiple inheritance
Case study
3. Objects in Python
Creating Python classes
Adding attributes
Making it do something
Talking to yourself
More arguments
Initializing the object
Explaining yourself
Modules and packages
Organizing the modules
Absolute imports
Relative imports
Organizing module contents
Who can access my data?
Third-party libraries
Case study
4. When Objects Are Alike
Basic inheritance
Extending built-ins
Overriding and super
Multiple inheritance
The diamond problem
Different sets of arguments
Polymorphism
Abstract base classes
Using an abstract base class
Creating an abstract base class
Demystifying the magic
Case study
5. Expecting the Unexpected
Raising exceptions
Raising an exception
The effects of an exception
Handling exceptions
The exception hierarchy
Defining our own exceptions
Case study
6. When to Use Object-oriented Programming
Treat objects as objects
Adding behavior to class data with properties
Properties in detail
Decorators – another way to create properties
Deciding when to use properties
Manager objects
Removing duplicate code
In practice
Case study
7. Python Data Structures
Empty objects
Tuples and named tuples
Named tuples
Dictionaries
Dictionary use cases
Using defaultdict
Counter
Lists
Sorting lists
Sets
Extending built-ins
Queues
FIFO queues
LIFO queues
Priority queues
Case study
8. Python Object-oriented Shortcuts
Python built-in functions
The len() function
Reversed
Enumerate
File I/O
Placing it in context
An alternative to method overloading
Default arguments
Variable argument lists
Unpacking arguments
Functions are objects too
Using functions as attributes
Callable objects
Case study
9. Strings and Serialization
Strings
String manipulation
String formatting
Escaping braces
Keyword arguments
Container lookups
Object lookups
Making it look right
Strings are Unicode
Converting bytes to text
Converting text to bytes
Mutable byte strings
Regular expressions
Matching patterns
Matching a selection of characters
Escaping characters
Matching multiple characters
Grouping patterns together
Getting information from regular expressions
Making repeated regular expressions efficient
Serializing objects
Customizing pickles
Serializing web objects
Case study
10. The Iterator Pattern
Design patterns in brief
Iterators
The iterator protocol
Comprehensions
List comprehensions
Set and dictionary comprehensions
Generator expressions
Generators
Yield items from another iterable
Coroutines
Back to log parsing
Closing coroutines and throwing exceptions
The relationship between coroutines, generators, and functions
Case study
11. Python Design Patterns I
The decorator pattern
A decorator example
Decorators in Python
The observer pattern
An observer example
The strategy pattern
A strategy example
Strategy in Python
The state pattern
A state example
State versus strategy
State transition as coroutines
The singleton pattern
Singleton implementation
The template pattern
A template example
12. Python Design Patterns II
The adapter pattern
The facade pattern
The flyweight pattern
The command pattern
The abstract factory pattern
The composite pattern
13. Testing Object-oriented Programs
Why test?
Test-driven development
Unit testing
Assertion methods
Reducing boilerplate and cleaning up
Organizing and running tests
Ignoring broken tests
Testing with py.test
One way to do setup and cleanup
A completely different way to set up variables
Skipping tests with py.test
Imitating expensive objects
How much testing is enough?
Case study
Implementing it
14. Concurrency
Threads
The many problems with threads
Shared memory
The global interpreter lock
Thread overhead
Multiprocessing
Multiprocessing pools
Queues
The problems with multiprocessing
Futures
AsyncIO
AsyncIO in action
Reading an AsyncIO future
AsyncIO for networking
Using executors to wrap blocking code
Streams
Executors
Case study
2. Course Module 2: Data Analysis
1. Introducing Data Analysis and Libraries
Data analysis and processing
An overview of the libraries in data analysis
Python libraries in data analysis
NumPy
pandas
Matplotlib
PyMongo
The scikit-learn library
2. NumPy Arrays and Vectorized Computation
NumPy arrays
Data types
Array creation
Indexing and slicing
Fancy indexing
Numerical operations on arrays
Array functions
Data processing using arrays
Loading and saving data
Saving an array
Loading an array
Linear algebra with NumPy
NumPy random numbers
3. Data Analysis with pandas
An overview of the pandas package
The pandas data structure
Series
The DataFrame
The essential basic functionality
Reindexing and altering labels
Head and tail
Binary operations
Functional statistics
Function application
Sorting
Indexing and selecting data
Computational tools
Working with missing data
Advanced uses of pandas for data analysis
Hierarchical indexing
The Panel data
4. Data Visualization
The matplotlib API primer
Line properties
Figures and subplots
Exploring plot types
Scatter plots
Bar plots
Contour plots
Histogram plots
Legends and annotations
Plotting functions with pandas
Additional Python data visualization tools
Bokeh
MayaVi
5. Time Series
Time series primer
Working with date and time objects
Resampling time series
Downsampling time series data
Upsampling time series data
Timedeltas
Time series plotting
6. Interacting with Databases
Interacting with data in text format
Reading data from text format
Writing data to text format
Interacting with data in binary format
HDF5
Interacting with data in MongoDB
Interacting with data in Redis
The simple value
List
Set
Ordered set
7. Data Analysis Application Examples
Data munging
Cleaning data
Filtering
Merging data
Reshaping data
Data aggregation
Grouping data
3. Course Module 3: Data Mining
1. Getting Started with Data Mining
Introducing data mining
A simple affinity analysis example
What is affinity analysis?
Product recommendations
Loading the dataset with NumPy
Implementing a simple ranking of rules
Ranking to find the best rules
A simple classification example
What is classification?
Loading and preparing the dataset
Implementing the OneR algorithm
Testing the algorithm
2. Classifying with scikit-learn Estimators
scikit-learn estimators
Nearest neighbors
Distance metrics
Loading the dataset
Moving towards a standard workflow
Running the algorithm
Setting parameters
Preprocessing using pipelines
An example
Standard preprocessing
Putting it all together
Pipelines
3. Predicting Sports Winners with Decision Trees
Loading the dataset
Collecting the data
Using pandas to load the dataset
Cleaning up the dataset
Extracting new features
Decision trees
Parameters in decision trees
Using decision trees
Sports outcome prediction
Putting it all together
Random forests
How do ensembles work?
Parameters in Random forests
Applying Random forests
Engineering new features
4. Recommending Movies Using Affinity Analysis
Affinity analysis
Algorithms for affinity analysis
Choosing parameters
The movie recommendation problem
Obtaining the dataset
Loading with pandas
Sparse data formats
The Apriori implementation
The Apriori algorithm
Implementation
Extracting association rules
Evaluation
5. Extracting Features with Transformers
Feature extraction
Representing reality in models
Common feature patterns
Creating good features
Feature selection
Selecting the best individual features
Feature creation
Creating your own transformer
The transformer API
Implementation details
Unit testing
Putting it all together
6. Social Media Insight Using Naive Bayes
Disambiguation
Downloading data from a social network
Loading and classifying the dataset
Creating a replicable dataset from Twitter
Text transformers
Bag-of-words
N-grams
Other features
Naive Bayes
Bayes' theorem
Naive Bayes algorithm
How it works
Application
Extracting word counts
Converting dictionaries to a matrix
Training the Naive Bayes classifier
Putting it all together
Evaluation using the F1-score
Getting useful features from models
7. Discovering Accounts to Follow Using Graph Mining
Loading the dataset
Classifying with an existing model
Getting follower information from Twitter
Building the network
Creating a graph
Creating a similarity graph
Finding subgraphs
Connected components
Optimizing criteria
8. Beating CAPTCHAs with Neural Networks
Artificial neural networks
An introduction to neural networks
Creating the dataset
Drawing basic CAPTCHAs
Splitting the image into individual letters
Creating a training dataset
Adjusting our training dataset to our methodology
Training and classifying
Back propagation
Predicting words
Improving accuracy using a dictionary
Ranking mechanisms for words
Putting it all together
9. Authorship Attribution
Attributing documents to authors
Applications and use cases
Attributing authorship
Getting the data
Function words
Counting function words
Classifying with function words
Support vector machines
Classifying with SVMs
Kernels
Character n-grams
Extracting character n-grams
Using the Enron dataset
Accessing the Enron dataset
Creating a dataset loader
Putting it all together
Evaluation
10. Clustering News Articles
Obtaining news articles
Using a Web API to get data
Reddit as a data source
Getting the data
Extracting text from arbitrary websites
Finding the stories in arbitrary websites
Putting it all together
Grouping news articles
The k-means algorithm
Evaluating the results
Extracting topic information from clusters
Using clustering algorithms as transformers
Clustering ensembles
Evidence accumulation
How it works
Implementation
Online learning
An introduction to online learning
Implementation
11. Classifying Objects in Images Using Deep Learning
Object classification
Application scenario and goals
Use cases
Deep neural networks
Intuition
Implementation
An introduction to Theano
An introduction to Lasagne
Implementing neural networks with nolearn
GPU optimization
When to use GPUs for computation
Running our code on a GPU
Setting up the environment
Application
Getting the data
Creating the neural network
Putting it all together
12. Working with Big Data
Big data
Application scenario and goals
MapReduce
Intuition
A word count example
Hadoop MapReduce
Application
Getting the data
Naive Bayes prediction
The mrjob package
Extracting the blog posts
Training Naive Bayes
Putting it all together
Training on Amazon's EMR infrastructure
13. Next Steps…
Chapter 1 – Getting Started with Data Mining
Scikit-learn tutorials
Extending the IPython Notebook
Chapter 2 – Classifying with scikit-learn Estimators
More complex pipelines
Comparing classifiers
Chapter 3 – Predicting Sports Winners with Decision Trees
More on pandas
Chapter 4 – Recommending Movies Using Affinity Analysis
The Eclat algorithm
Chapter 5 – Extracting Features with Transformers
Vowpal Wabbit
Chapter 6 – Social Media Insight Using Naive Bayes
Natural language processing and part-of-speech tagging
Chapter 7 – Discovering Accounts to Follow Using Graph Mining
More complex algorithms
Chapter 8 – Beating CAPTCHAs with Neural Networks
Deeper networks
Reinforcement learning
Chapter 9 – Authorship Attribution
Local n-grams
Chapter 10 – Clustering News Articles
Real-time clusterings
Chapter 11 – Classifying Objects in Images Using Deep Learning
Keras and Pylearn2
Mahotas
Chapter 12 – Working with Big Data
Courses on Hadoop
Pydoop
Recommendation engine
More resources
4. Course Module 4: Machine Learning
1. Giving Computers the Ability to Learn from Data
How to transform data into knowledge
The three different types of machine learning
Making predictions about the future with supervised learning
Classification for predicting class labels
Regression for predicting continuous outcomes
Solving interactive problems with reinforcement learning
Discovering hidden structures with unsupervised learning
Finding subgroups with clustering
Dimensionality reduction for data compression
An introduction to the basic terminology and notations
A roadmap for building machine learning systems
Preprocessing – getting data into shape
Training and selecting a predictive model
Evaluating models and predicting unseen data instances
Using Python for machine learning
2. Training Machine Learning Algorithms for Classification
Artificial neurons – a brief glimpse into the early history of machine learning
Implementing a perceptron learning algorithm in Python
Training a perceptron model on the Iris dataset
Adaptive linear neurons and the convergence of learning
Minimizing cost functions with gradient descent
Implementing an Adaptive Linear Neuron in Python
Large scale machine learning and stochastic gradient descent
3. A Tour of Machine Learning Classifiers Using scikit-learn
Choosing a classification algorithm
First steps with scikit-learn
Training a perceptron via scikit-learn
Modeling class probabilities via logistic regression
Logistic regression intuition and conditional probabilities
Learning the weights of the logistic cost function
Training a logistic regression model with scikit-learn
Tackling overfitting via regularization
Maximum margin classification with support vector machines
Maximum margin intuition
Dealing with the nonlinearly separable case using slack variables
Alternative implementations in scikit-learn
Solving nonlinear problems using a kernel SVM
Using the kernel trick to find separating hyperplanes in higher dimensional space
Decision tree learning
Maximizing information gain – getting the most bang for the buck
Building a decision tree
Combining weak to strong learners via random forests
K-nearest neighbors – a lazy learning algorithm
4. Building Good Training Sets – Data Preprocessing
Dealing with missing data
Eliminating samples or features with missing values
Imputing missing values
Understanding the scikit-learn estimator API
Handling categorical data
Mapping ordinal features
Encoding class labels
Performing one-hot encoding on nominal features
Partitioning a dataset in training and test sets
Bringing features onto the same scale
Selecting meaningful features
Sparse solutions with L1 regularization
Sequential feature selection algorithms
Assessing feature importance with random forests
5. Compressing Data via Dimensionality Reduction
Unsupervised dimensionality reduction via principal component analysis
Total and explained variance
Feature transformation
Principal component analysis in scikit-learn
Supervised data compression via linear discriminant analysis
Computing the scatter matrices
Selecting linear discriminants for the new feature subspace
Projecting samples onto the new feature space
LDA via scikit-learn
Using kernel principal component analysis for nonlinear mappings
Kernel functions and the kernel trick
Implementing a kernel principal component analysis in Python
Example 1 – separating half-moon shapes
Example 2 – separating concentric circles
Projecting new data points
Kernel principal component analysis in scikit-learn
6. Learning Best Practices for Model Evaluation and Hyperparameter Tuning
Streamlining workflows with pipelines
Loading the Breast Cancer Wisconsin dataset
Combining transformers and estimators in a pipeline
Using k-fold cross-validation to assess model performance
The holdout method
K-fold cross-validation
Debugging algorithms with learning and validation curves
Diagnosing bias and variance problems with learning curves
Addressing overfitting and underfitting with validation curves
Fine-tuning machine learning models via grid search
Tuning hyperparameters via grid search
Algorithm selection with nested cross-validation
Looking at different performance evaluation metrics
Reading a confusion matrix
Optimizing the precision and recall of a classification model
Plotting a receiver operating characteristic
The scoring metrics for multiclass classification
7. Combining Different Models for Ensemble Learning
Learning with ensembles
Implementing a simple majority vote classifier
Combining different algorithms for classification with majority vote
Evaluating and tuning the ensemble classifier
Bagging – building an ensemble of classifiers from bootstrap samples
Leveraging weak learners via adaptive boosting
8. Predicting Continuous Target Variables with Regression Analysis
Introducing a simple linear regression model
Exploring the Housing Dataset
Visualizing the important characteristics of a dataset
Implementing an ordinary least squares linear regression model
Solving regression for regression parameters with gradient descent
Estimating the coefficient of a regression model via scikit-learn
Fitting a robust regression model using RANSAC
Evaluating the performance of linear regression models
Using regularized methods for regression
Turning a linear regression model into a curve – polynomial regression
Modeling nonlinear relationships in the Housing Dataset
Dealing with nonlinear relationships using random forests
Decision tree regression
Random forest regression
A. Reflect and Test Yourself! Answers
Module 2: Data Analysis
Chapter 1: Introducing Data Analysis and Libraries
Chapter 2: Object-oriented Design
Chapter 3: Data Analysis with pandas
Chapter 4: Data Visualization
Chapter 5: Time Series
Chapter 6: Interacting with Databases
Chapter 7: Data Analysis Application Examples
Module 3: Data Mining
Chapter 1: Getting Started with Data Mining
Chapter 2: Classifying with scikit-learn Estimators
Chapter 3: Predicting Sports Winners with Decision Trees
Chapter 4: Recommending Movies Using Affinity Analysis
Chapter 5: Extracting Features with Transformers
Chapter 6: Social Media Insight Using Naive Bayes
Chapter 7: Discovering Accounts to Follow Using Graph Mining
Chapter 8: Beating CAPTCHAs with Neural Networks
Chapter 9: Authorship Attribution
Chapter 10: Clustering News Articles
Chapter 11: Classifying Objects in Images Using Deep Learning
Chapter 12: Working with Big Data
Module 4: Machine Learning
Chapter 1: Giving Computers the Ability to Learn from Data
Chapter 2: Training Machine Learning
Chapter 3: A Tour of Machine Learning Classifiers Using scikit-learn
Chapter 4: Building Good Training Sets – Data Preprocessing
Chapter 5: Compressing Data via Dimensionality Reduction
Chapter 6: Learning Best Practices for Model Evaluation and Hyperparameter Tuning
Chapter 7: Combining Different Models for Ensemble Learning
Chapter 8: Predicting Continuous Target Variables with Regression Analysis
B. Bibliography
Index

Python: Real-World Data Science

A course in four modules

Unleash the power of Python and its robust data science capabilities with your Course Guide Ankita Thakur

Learn to use powerful Python libraries for effective data processing and analysis

To contact your Course Guide

Email: <[email protected]>

Meet Your Course Guide

Hello and welcome to this Data Science with Python course. You now have a clear pathway from learning Python core features right through to getting acquainted with the concepts and techniques of the data science field—all using Python!

This course has been planned and created for you by me, Ankita Thakur. I am your Course Guide, and I am here to help you have a great journey along the pathways of learning that I have planned for you.

I've developed and created this course for you and you'll be seeing me through the whole journey, offering you my thoughts and ideas behind what you're going to learn next and why I recommend each step. I'll provide tests and quizzes to help you reflect on your learning, and code challenges that will be pitched just right for you through the course.

If you have any questions along the way, you can reach out to me by e-mail or telephone, and I'll make sure you get everything from the course that we've planned so that you can start your career in the field of data science. Details of how to contact me are included on the first page of this course.

What's so cool about Data Science?

What is Data Science, and why is there so much buzz about it? Is it of great importance? Well, the following sentence will answer all such questions:

 

"This hot new field promises to revolutionize industries from business to government, health care to academia."

  --The New York Times

The world is generating data at an increasing pace. Consumers, sensors, and scientific experiments emit data points every day. In finance, business, administration, and the natural or social sciences, working with data can make up a significant part of the job. Being able to work efficiently with small or large datasets has become a valuable skill. We also live in a world of connected things where tons of data are generated, and it is humanly impossible to analyze all the incoming data and make decisions. Thanks to the field of data science, human decisions are increasingly replaced by decisions made by computers.

Data science has penetrated deeply into our connected world, and there is a growing demand in the market for people who not only understand data science algorithms thoroughly, but are also capable of programming them. It is a field at the intersection of many others, including data mining, machine learning, and statistics, to name a few. This puts an immense burden on data scientists at all levels, from those aspiring to enter the field to those who are already practitioners.

Treating these algorithms as a black box and using them in decision-making systems will lead to counterproductive results. With tons of algorithms and innumerable problems out there, it requires a good grasp of the underlying algorithms in order to choose the best one for any given problem.

Python as a programming language has evolved over the years and today, it is the number one choice for a data scientist. Python has become the most popular programming language for data science because it allows us to forget about the tedious parts of programming and offers us an environment where we can quickly jot down our ideas and put concepts directly into action. It has been used in industry for a long time, but it has been popular among researchers as well.

In contrast to more specialized applications and environments, Python is not only about data analysis. The list of industrial-strength libraries for many general computing tasks is long, which makes working with data in Python even more compelling. Whether your data lives inside SQL or NoSQL databases or is out there on the Web and must be crawled or scraped first, the Python community has already developed packages for many of those tasks.

Course Structure

Frankly speaking, it's a wise decision to know the nitty-gritty of Python as it's a trending language. I'm sure you'll gain a lot of knowledge through this course and be able to put it all into practice. However, I want to highlight that the road ahead may be bumpy on occasion, and some topics may be more challenging than others, but I hope that you will embrace this opportunity and focus on the reward. Remember that we are on this journey together, and throughout this course, we will add many powerful techniques to your arsenal that will help you solve even the toughest problems the data-driven way.

I've created this learning path for you, and it consists of four modules. Each of these modules is a mini-course in its own right, and as you complete each one, you'll have gained key skills and be ready for the material in the next module.

So let's now look at the pathway these modules create: basically, all the topics that we will be exploring in this learning journey.

Course Journey

We start the course with our very first module, Python Fundamentals, to help you get familiar with Python. Installing Python correctly is half the job done, so this module starts with the installation of Python, IPython, and all the necessary packages. Then, we'll see the fundamentals of object-oriented programming, because Python itself is an object-oriented programming language. Finally, we'll make friends with some of the core concepts of Python, so that you get the programming basics nailed down.

Then we'll move on to the analysis part. The second module, Data Analysis, will get you started with Python data analysis in a practical and example-driven way. You'll see how we can use Python libraries for effective data processing and analysis. So, if you want to get started with basic data processing tasks or time series, you can find a lot of hands-on knowledge in the examples of this module.

The third module, Data Mining, is designed to give you a good understanding of the basics, some best practices for solving problems with data mining, and some pointers on the next steps you can take. With these, you can harness the power of Python to analyze data and create insightful predictive models.

Finally, we'll move on to more advanced topics. Sometimes an analysis task is too complex to program by hand. Machine learning is a modern technique that enables computers to discover patterns and draw conclusions for themselves. The aim of our fourth module, Machine Learning, is to discuss the necessary details of machine learning concepts, offering intuitive yet informative explanations on how machine learning algorithms work, how to use them, and most importantly, how to avoid the common pitfalls. So, if you want to become a machine learning practitioner, a better problem solver, or maybe even consider a career in machine learning research, I'm sure there is a lot for you in this module!

The Course Roadmap and Timeline

Here's a view of the entire course plan before we begin. This grid gives you a topic overview of the whole course and its modules, so you can see how we will move through particular phases of learning to use Python, what skills you'll be learning along the way, and what you can do with those skills at each point. I also offer you an estimate of the time you might want to take for each module, although a lot depends on your learning style and how much time you're able to give the course each week!

Part 1. Course Module 1: Python Fundamentals

Chapter 1. Introduction and First Steps – Take a Deep Breath

 

"Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime."

  --Chinese proverb

According to Wikipedia, computer programming is:

"...a process that leads from an original formulation of a computing problem to executable computer programs. Programming involves activities such as analysis, developing understanding, generating algorithms, verification of requirements of algorithms including their correctness and resources consumption, and implementation (commonly referred to as coding) of algorithms in a target programming language".

In a nutshell, coding is telling a computer to do something using a language it understands.

Computers are very powerful tools, but unfortunately, they can't think for themselves. So they need to be told everything. They need to be told how to perform a task, how to evaluate a condition to decide which path to follow, how to handle data that comes from a device such as the network or a disk, and how to react when something unforeseen happens, say, something is broken or missing.
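To make this concrete, here is a minimal sketch of a program that is told each of those things explicitly: how to perform a task, how to evaluate a condition, how to handle data that comes from a disk, and how to react when something is missing or broken. The function name `load_settings`, the JSON file format, and the default values are assumptions made up for this illustration; they do not come from the course material.

```python
import json
from pathlib import Path


def load_settings(path):
    """Read settings from a JSON file (hypothetical example task).

    Falls back to built-in defaults if the file is missing or broken.
    """
    defaults = {"language": "en"}

    try:
        # Handle data that comes from a device, here the disk.
        text = Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        # React when something is missing.
        return defaults

    try:
        loaded = json.loads(text)
    except json.JSONDecodeError:
        # React when something is broken.
        return defaults

    # Evaluate a condition to decide which path to follow.
    if not isinstance(loaded, dict):
        return defaults

    # Perform the task: values from the file override the defaults.
    return {**defaults, **loaded}
```

Nothing here happens unless the code says so: without the two `except` branches, a missing or malformed file would simply stop the program with an error.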

You can code in many different styles and languages. Is it hard? I would say "yes" and "no". It's a bit like writing. Everybody can learn how to write, and you can too. But what if you wanted to become a poet? Then writing alone is not enough. You have to acquire a whole other set of skills and this will take a longer and greater effort.

In the end, it all comes down to how far you want to go down the road. Coding is not just putting together some instructions that work. It is so much more!

Good code is short, fast, elegant, easy to read and understand, simple, easy to modify and extend, easy to scale and refactor, and easy to test. It takes time to be able to write code that has all these qualities at the same time, but the good news is that you're taking the first step towards it at this very moment by reading this module. And I have no doubt you can do it. Anyone can. In fact, we all program all the time, only we aren't aware of it.

Would you like an example?

Say you want to make instant coffee. You have to get a mug, the instant coffee jar, a teaspoon, water, and the kettle. Even if you're not aware of it, you're evaluating a lot of data. You're making sure that there is water in the kettle, that the kettle is plugged in, that the mug is clean, and that there is enough coffee in the jar. Then, you boil the water and maybe in the meantime you put some coffee in the mug. When the water is ready, you pour it into the mug, and stir.

So, how is this programming?

Well, we gathered resources (the kettle, coffee, water, teaspoon, and mug) and we verified some conditions on them (kettle is plugged-in, mug is clean, there is enough coffee). Then we started two actions (boiling the water and putting coffee in the mug), and when both of them were completed, we finally ended the procedure by pouring water in the mug and stirring.

Can you see it? I have just described the high-level functionality of a coffee program. It wasn't that hard because this is what the brain does all day long: evaluate conditions, decide to take actions, carry out tasks, repeat some of them, and stop at some point. Clean objects, put them back, and so on.
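The coffee routine above can be written down as a short Python sketch. Every name in it (make_coffee and its parameters) is invented for illustration; the only point is to show resources, condition checks, and ordered actions in code form.

```python
def make_coffee(water_in_kettle, kettle_plugged_in, mug_clean, spoons_in_jar):
    """Return the list of steps taken to make one mug of instant coffee."""
    # Verify the conditions on our resources before doing anything.
    if not water_in_kettle:
        raise ValueError("fill the kettle first")
    if not kettle_plugged_in:
        raise ValueError("plug the kettle in first")
    if not mug_clean:
        raise ValueError("wash the mug first")
    if spoons_in_jar < 1:
        raise ValueError("not enough coffee in the jar")

    # Carry out the actions in order.
    steps = ["boil the water", "put coffee in the mug"]
    steps.append("pour the water into the mug")
    steps.append("stir")
    return steps


for step in make_coffee(True, True, True, spoons_in_jar=5):
    print(step)
```

Don't worry about the syntax yet. Notice only that the checks come first and the actions follow in a fixed order, just as they do in the kitchen.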

All you need now is to learn how to deconstruct all those actions you do automatically in real life so that a computer can actually make some sense of them. And you need to learn a language as well, to instruct it.

So this is what this module is for. I'll tell you how to do it and I'll try to do that by means of many simple but focused examples (my favorite kind).

A proper introduction

I love to make references to the real world when I teach coding; I believe they help people retain the concepts better. However, now is the time to be a bit more rigorous and see what coding is from a more technical perspective.

When we write code, we're instructing a computer on the things it has to do. Where does the action happen? In many places: the computer memory, hard drives, network cables, CPU, and so on. It's a whole "world", which most of the time is the representation of a subset of the real world.

If you write a piece of software that allows people to buy clothes online, you will have to represent real people, real clothes, real brands, sizes, and so on and so forth, within the boundaries of a program.

In order to do so, you will need to create and handle objects in the program you're writing. A person can be an object. A car is an object. A pair of socks is an object. Luckily, Python understands objects very well.

The two main features any object has are properties and methods. Let's take a person object as an example. Typically in a computer program, you'll represent people as customers or employees. The properties that you store against them are things like the name, the SSN, the age, if they have a driving license, their e-mail, gender, and so on. In a computer program, you store all the data you need in order to use an object for the purpose you're serving. If you are coding a website to sell clothes, you probably want to store the height and weight as well as other measures of your customers so that you can suggest the appropriate clothes for them. So, properties are characteristics of an object. We use them all the time: "Could you pass me that pen?" – "Which one?" – "The black one." Here, we used the "black" property of a pen to identify it (most likely amongst a blue and a red one).

Methods are things that an object can do. As a person, I have methods such as speak, walk, sleep, wake up, eat, dream, write, read, and so on. All the things that I can do could be seen as methods of the object that represents me.

So, now that you know what objects are and that they expose methods that you can run and properties that you can inspect, you're ready to start coding. Coding in fact is simply about managing those objects that live in the subset of the world that we're reproducing in our software. You can create, use, reuse, and delete objects as you please.
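To make properties and methods a little more concrete, here is a minimal sketch of a person object. The class name, its attributes, and the sample values are all hypothetical, chosen to echo the clothes shop example; classes are covered properly later on.

```python
class Person:
    """A hypothetical customer record for an online clothes shop."""

    def __init__(self, name, age, height_cm):
        # Properties: characteristics stored on the object.
        self.name = name
        self.age = age
        self.height_cm = height_cm

    def speak(self, words):
        # Method: something the object can do.
        return "{} says: {}".format(self.name, words)


customer = Person("Alice", 34, 168)
print(customer.height_cm)       # inspect a property
print(customer.speak("hello"))  # run a method
```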

According to the Data Model chapter on the official Python documentation:

"Objects are Python's abstraction for data. All data in a Python program is represented by objects or by relations between objects."

We'll take a closer look at Python objects in the upcoming chapter. For now, all we need to know is that every object in Python has an ID (or identity), a type, and a value.

Once created, the identity of an object is never changed. It's a unique identifier for it, and it's used behind the scenes by Python to retrieve the object when we want to use it.

The type as well, never changes. The type tells what operations are supported by the object and the possible values that can be assigned to it.

The value can either change or not. If it can, the object is said to be mutable, while when it cannot, the object is said to be immutable.
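Identity, type, and value can all be inspected from Python itself. The following sketch uses only the built-in id() and type() functions; the variable names are made up for the example.

```python
numbers = [1, 2, 3]          # a list is mutable
before = id(numbers)
numbers.append(4)            # the value changes...
same_object = id(numbers) == before   # ...but the identity does not

print(type(numbers))         # the type never changes either
print(same_object)

point = (1, 2)               # a tuple is immutable
try:
    point[0] = 10
except TypeError as error:
    tuple_error = str(error)
    print("cannot modify a tuple:", tuple_error)
```

The list keeps the same identity while its value grows, whereas any attempt to change the tuple is rejected.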

How do we use an object? We give it a name of course! When you give an object a name, then you can use the name to retrieve the object and use it.

In a more generic sense, objects such as numbers, strings (text), collections, and so on are associated with a name. Usually, we say that this name is the name of a variable. You can see the variable as being like a box, which you can use to hold data.

So, you have all the objects you need: what now? Well, we need to use them, right? We may want to send them over a network connection or store them in a database. Maybe display them on a web page or write them into a file. In order to do so, we need to react to a user filling in a form, or pressing a button, or opening a web page and performing a search. We react by running our code, evaluating conditions to choose which parts to execute, how many times, and under which circumstances.

And to do all this, basically we need a language. That's what Python is for. Python is the language we'll use together throughout this module to instruct the computer to do something for us.

Now, enough of this theoretical stuff, let's get started.

Enter the Python

Python is the marvelous creature of Guido van Rossum, a Dutch computer scientist and mathematician who decided to gift the world with a project he was playing around with over Christmas 1989. The language appeared to the public somewhere around 1991, and since then has evolved to be one of the leading programming languages used worldwide today.

I started programming when I was 7 years old, on a Commodore VIC 20, which was later replaced by its bigger brother, the Commodore 64. The language was BASIC. Later on, I landed on Pascal, Assembly, C, C++, Java, JavaScript, Visual Basic, PHP, ASP, ASP .NET, C#, and other minor languages I cannot even remember, but only when I landed on Python, I finally had that feeling that you have when you find the right couch in the shop. When all of your body parts are yelling, "Buy this one! This one is perfect for us!"

It took me about a day to get used to it. Its syntax is a bit different from what I was used to, and in general, I very rarely worked with a language that defines scoping with indentation. But after getting past that initial feeling of discomfort (like having new shoes), I just fell in love with it. Deeply. Let's see why.

About Python

Before we get into the gory details, let's get a sense of why someone would want to use Python (I would recommend you to read the Python page on Wikipedia to get a more detailed introduction).

To my mind, Python exposes the following qualities.

Portability

Python runs everywhere, and porting a program from Linux to Windows or Mac is usually just a matter of fixing paths and settings. Python is designed for portability and it takes care of operating system (OS) specific quirks behind interfaces that shield you from the pain of having to write code tailored to a specific platform.
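As a small illustration of such an interface, the pathlib module (in the standard library since Python 3.4) builds file paths from their parts, so you don't hard-code a separator. The file location below is hypothetical, and the two "pure" path classes are used only so that the sketch prints the same thing on every system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Build the same relative path from its parts instead of hard-coding a separator.
parts = ("data", "raw", "sales.csv")   # hypothetical file location

posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path)     # data/raw/sales.csv
print(windows_path)   # data\raw\sales.csv
```

In everyday code you would normally use pathlib.Path, which picks the right flavor for the operating system the program is running on.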

Coherence

Python is extremely logical and coherent. You can see it was designed by a brilliant computer scientist. Most of the time you can just guess how a method is called, if you don't know it.

You may not realize how important this is right now, especially if you are at the beginning, but this is a major feature. It means less cluttering in your head, less skimming through the documentation, and less need for mapping in your brain when you code.

Developer productivity

According to Mark Lutz (Learning Python, 5th Edition, O'Reilly Media), a Python program is typically one-fifth to one-third the size of equivalent Java or C++ code. This means the job gets done faster. And faster is good. Faster means a faster response on the market. Less code not only means less code to write, but also less code to read (and professional coders read much more than they write), less code to maintain, to debug, and to refactor.

Another important aspect is that Python runs without the need of lengthy and time consuming compilation and linkage steps, so you don't have to wait to see the results of your work.

An extensive library

Python has an incredibly wide standard library (it's said to come with "batteries included"). If that wasn't enough, the Python community all over the world maintains a body of third party libraries, tailored to specific needs, which you can access freely at the Python Package Index (PyPI). When you code Python and you realize that you need a certain feature, in most cases, there is at least one library where that feature has already been implemented for you.
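Here is a tiny taste of those batteries. The sketch parses some JSON text and counts items using only standard library modules; the payload is made up and stands in for data you might read from a file or a web service.

```python
import json
from collections import Counter

# A tiny, made-up JSON payload standing in for data read from a file or API.
payload = '{"orders": ["socks", "shirt", "socks", "hat", "socks"]}'

orders = json.loads(payload)["orders"]   # parse JSON text into Python objects
counts = Counter(orders)                 # count how often each item appears

print(counts.most_common(1))   # [('socks', 3)]
```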

Software quality

Python is heavily focused on readability, coherence, and quality. The language uniformity allows for high readability and this is crucial nowadays where code is more of a collective effort than a solo experience. Another important aspect of Python is its intrinsic multi-paradigm nature. You can use it as scripting language, but you also can exploit object-oriented, imperative, and functional programming styles. It is versatile.
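To give a feel for this multi-paradigm nature, the following sketch solves one small problem (summing the squares of the even numbers in a list) in three styles. The data and the class name are invented for the example, and none of the three is meant as the "right" way.

```python
from functools import reduce

values = [1, 2, 3, 4, 5, 6]

# Imperative style: spell out each step with a loop and a running total.
total_imperative = 0
for value in values:
    if value % 2 == 0:
        total_imperative += value ** 2

# Functional style: compose filter, map, and reduce.
total_functional = reduce(
    lambda acc, square: acc + square,
    map(lambda v: v ** 2, filter(lambda v: v % 2 == 0, values)),
    0,
)


# Object-oriented style: wrap the data and the behavior in a class.
class EvenSquares:
    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(v ** 2 for v in self.values if v % 2 == 0)


total_oo = EvenSquares(values).total()
print(total_imperative, total_functional, total_oo)   # 56 56 56
```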

Software integration

Another important aspect is that Python can be extended and integrated with many other languages, which means that even when a company is using a different language as their mainstream tool, Python can come in and act as a glue agent between complex applications that need to talk to each other in some way. This is kind of an advanced topic, but in the real world, this feature is very important.

Satisfaction and enjoyment

Last but not least, the fun of it! Working with Python is fun. I can code for 8 hours and leave the office happy and satisfied, alien to the struggle other coders have to endure because they use languages that don't provide them with the same amount of well-designed data structures and constructs. Python makes coding fun, no doubt about it. And fun promotes motivation and productivity.

These are the main reasons why I would recommend Python to everyone. Of course, there are many other technical and advanced features that I could have talked about, but they don't really pertain to an introductory section like this one. They will come up naturally, chapter after chapter, in this module.

What are the drawbacks?

Probably, the only drawback that one could find in Python, which is not due to personal preferences, is the execution speed. Typically, Python is slower than its compiled brothers. The standard implementation of Python produces, when you run an application, a compiled version of the source code called byte code (with the extension .pyc), which is then run by the Python interpreter. The advantage of this approach is portability, which we pay for with a slowdown due to the fact that Python is not compiled down to machine level as are other languages.
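You can peek at byte code without leaving Python. This sketch compiles a one-line, made-up program in memory (no .pyc file is written) and uses the standard dis module to list the instructions; the exact listing varies between Python versions, so no output is shown for it.

```python
import dis

source = "total = price * quantity"   # hypothetical one-line program

# compile() turns source text into a code object that holds byte code.
code = compile(source, "<example>", "exec")
print(type(code.co_code))   # the raw byte code is a bytes object

# dis shows a human-readable listing of the instructions.
dis.dis(code)

# The interpreter then executes the byte code.
namespace = {"price": 3, "quantity": 4}
exec(code, namespace)
print(namespace["total"])   # 12
```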

However, Python speed is rarely a problem today, hence its wide use regardless of this suboptimal feature. What happens is that in real life, hardware cost is no longer a problem, and usually it's easy enough to gain speed by parallelizing tasks. When it comes to number crunching though, one can switch to faster Python implementations, such as PyPy, which provides an average 7-fold speedup by implementing advanced compilation techniques (check http://pypy.org/ for reference).

When doing data science, you'll most likely find that the libraries you use with Python, such as pandas and NumPy, run at close to native speed, because their core routines are implemented in compiled code.

If that wasn't a good enough argument, you can always consider that Python is driving the backend of services such as Spotify and Instagram, where performance is a concern. Nonetheless, Python does its job perfectly adequately.

Who is using Python today?

Not yet convinced? Let's take a very brief look at the companies that are using Python today: Google, YouTube, Dropbox, Yahoo, Zope Corporation, Industrial Light & Magic, Walt Disney Feature Animation, Pixar, NASA, NSA, Red Hat, Nokia, IBM, Netflix, Yelp, Intel, Cisco, HP, Qualcomm, and JPMorgan Chase, just to name a few.

Even games such as Battlefield 2, Civilization 4, and QuArK are implemented using Python.

Python is used in many different contexts, such as system programming, web programming, GUI applications, gaming and robotics, rapid prototyping, system integration, data science, database applications, and much more.

Setting up the environment

Before we talk about installing Python on your system, let me tell you about which Python version I'll be using in this module.

Python 2 versus Python 3 – the great debate

Python comes in two main versions: Python 2, which is the past, and Python 3, which is the present. The two versions, though very similar, are incompatible in some respects.

In the real world, Python 2 is actually quite far from being the past. In short, even though Python 3 has been out since 2008, the transition phase is still far from being over. This is mostly due to the fact that Python 2 is widely used in the industry, and of course, companies aren't so keen on updating their systems just for the sake of updating, following the if it ain't broke, don't fix it philosophy. You can read all about the transition between the two versions on the Web.

Another issue that was hindering the transition is the availability of third-party libraries. Usually, a Python project relies on tens of external libraries, and of course, when you start a new project, you need to be sure that there is already a version 3 compatible library for any business requirement that may come up. If that's not the case, starting a brand new project in Python 3 means introducing a potential risk, which many companies are not happy to take.

At the time of writing, the majority of the most widely used libraries have been ported to Python 3, and it's quite safe to start a project in Python 3 for most cases. Many of the libraries have been rewritten so that they are compatible with both versions, mostly harnessing the power of the six (2 x 3) library, which helps introspecting and adapting the behavior according to the version used.
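The basic idea behind such compatibility code can be sketched with the standard library alone. To be clear, this is not the six API, just a hand-rolled illustration of adapting behavior to the running version; the to_text helper is a hypothetical name.

```python
import sys

IS_PYTHON_3 = sys.version_info[0] >= 3

# Pick the type used for Unicode text on the running interpreter.
# type(u"") is `unicode` on Python 2 and `str` on Python 3.
text_type = str if IS_PYTHON_3 else type(u"")


def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding bytes if necessary."""
    if isinstance(value, bytes) and not isinstance(value, text_type):
        return value.decode(encoding)
    return value


print(IS_PYTHON_3)
print(to_text(b"hello") == u"hello")
```

A library such as six packages up many checks of this kind so that you don't have to write them yourself.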

All the examples in this module will be run using Python 3.4.0. Most of them will also run in Python 2 (I have version 2.7.6 installed as well), and those that won't will just require some minor adjustments to cater for the small incompatibilities between the two versions.

Don't worry about this version thing though: it's not that big an issue in practice.

Note

If any of the URLs or resources I'll point you to are no longer there by the time you read this course, just remember: Google is your friend.

What you need for this course

As you've seen, there are quite a few requirements to get started, so I've prepared a table that will give you an overview of what you'll need for each module of the course:

Module 1

All the examples in this module rely on the Python 3 interpreter. Some of the examples in this module rely on third-party libraries that do not ship with Python. These are introduced within the module at the time they are used, so you do not need to install them in advance. However, for completeness, here is a list:

  • pip
  • requests
  • pillow
  • bitarray

Module 2

While all the examples can be run interactively in a Python shell, we recommend using IPython for this module. The versions of the libraries used in this module are:

  • NumPy 1.9.2
  • pandas 0.16.2
  • matplotlib 1.4.3
  • tables 3.2.2
  • pymongo 3.0.3
  • redis 2.10.3
  • scikit-learn 0.16.1

Module 3

Any modern processor (from about 2010 onwards) and 4 GB of RAM will suffice, and you can probably run almost all of the code on a slower system too.

The exception here is with the final two chapters. In these chapters, I step through using Amazon Web Services (AWS) to run the code. This will probably cost you some money, but the advantage is less system setup than running the code locally.

If you don't want to pay for those services, the tools used can all be set up on a local computer, but you will definitely need a modern system to run them. A processor built in 2012 or later and more than 4 GB of RAM are necessary.

Module 4

Although the code examples will also be compatible with Python 2.7, it's better if you have the latest version of Python 3 (3.4.3 or newer).

Installing Python

Python is a fantastic, versatile, and easy-to-use language. It's available for all three major operating systems (Microsoft Windows, Mac OS X, and Linux), and the installer, as well as the documentation, can be downloaded from the official Python website: https://www.python.org.

Note

Windows users will need to set an environment variable in order to use Python from the command line. First, find where Python 3 is installed; the default location is C:\Python34. Next, enter this command into the command line (cmd program): set PATH=%PATH%;C:\Python34. Remember to change C:\Python34 if Python is installed in a different directory.

Once you have Python running on your system, you should be able to open a command prompt and run the following code:

$ python3
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on Linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Hello, world!")
Hello, world!
>>> exit()

Note that we will be using the dollar sign ($) to denote that a command is to be typed into the terminal (also called a shell or cmd on Windows). You do not need to type this character (or the space that follows it). Just type in the rest of the line and press Enter.

After you have the above "Hello, world!" example running, exit the program and move on to installing a more advanced environment to run Python code, the IPython Notebook.

Installing IPython

IPython is a platform for Python development that contains a number of tools and environments for running Python and has more features than the standard interpreter. It contains the powerful IPython Notebook, which allows you to write programs in a web browser. It also formats your code, shows output, and allows you to annotate your scripts. It is a great tool for exploring datasets.

To install IPython on your computer, you can type the following into a command-line prompt (not into Python):

$ pip install ipython[all]

You will need administrator privileges to install this system-wide. If you do not want to (or can't) make system-wide changes, you can install it for just the current user by running this command:

$ pip install --user ipython[all]

This will install the IPython package into a user-specific location—you will be able to use it, but nobody else on your computer can. If you are having difficulty with the installation, check the official documentation for more detailed installation instructions: http://ipython.org/install.html.

With the IPython Notebook installed, you can launch it with the following:

$ ipython3 notebook

This will do two things. First, it will create an IPython Notebook instance that will run in the command prompt you just used. Second, it will launch your web browser and connect to this instance, allowing you to create a new notebook. It will look something similar to the following screenshot (where home/bob will be replaced by your current working directory):

To stop the IPython Notebook from running, open the command prompt that has the instance running (the one you used earlier to run the IPython command). Then, press Ctrl + C and you will be prompted Shutdown this notebook server (y/[n])?. Type y and press Enter and the IPython Notebook will shut down.

Installing additional packages

Python 3.4 includes a program called pip, a package manager that helps you install new libraries on your system. You can verify that pip is working by running the $ pip3 freeze command, which lists the packages you have installed.

The additional packages can be installed via the pip installer program, which has shipped with Python since Python 3.4. More information about pip can be found at https://docs.python.org/3/installing/index.html.

After we have successfully installed Python, we can execute pip from the command-line terminal to install additional Python packages:

pip install SomePackage

Already installed packages can be updated via the --upgrade flag:

pip install SomePackage --upgrade

A highly recommended alternative Python distribution for scientific computing is Anaconda by Continuum Analytics. Anaconda is a free (including for commercial use), enterprise-ready Python distribution that bundles all the essential Python packages for data science, math, and engineering in one user-friendly cross-platform distribution. The Anaconda installer can be downloaded at http://continuum.io/downloads#py34, and an Anaconda quick-start guide is available at https://store.continuum.io/static/img/Anaconda-Quickstart.pdf.

After successfully installing Anaconda, we can install new Python packages using the following command:

conda install SomePackage

Existing packages can be updated using the following command:

conda update SomePackage

The major Python packages that were used for writing this course are listed here:

  • NumPy
  • SciPy
  • scikit-learn
  • matplotlib
  • pandas
  • tables
  • pymongo
  • redis

As these packages are all hosted on PyPI, the Python package index, they can be easily installed with pip. To install NumPy, you would run:

$ pip install numpy

To install scikit-learn, you would run:

$ pip3 install -U scikit-learn

Note

Important

Windows users may need to install the NumPy and SciPy libraries before installing scikit-learn. Installation instructions are available at www.scipy.org/install.html for those users.

Users of major Linux distributions such as Ubuntu or Red Hat may wish to install the official package from their package manager. Not all distributions have the latest versions of scikit-learn, so check the version before installing it.

Those wishing to install the latest version by compiling the source, or view more detailed installation instructions, can go to http://scikit-learn.org/stable/install.html to view the official documentation on installing scikit-learn.

Most libraries will have an attribute for the version, so if you already have a library installed, you can quickly check its version:

>>> import redis
>>> redis.__version__
'2.10.3'

This works well for most libraries. A few, such as pymongo, use a different attribute (pymongo uses just version, without the underscores).
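If you check versions often, a small helper that tries both attribute names can save some typing. This is only a sketch: get_version is a hypothetical function, and the two stand-in modules below are fakes created so that the example runs without any third-party library installed.

```python
import types


def get_version(module):
    """Return a module's version string, or None if it exposes neither attribute."""
    for attribute in ("__version__", "version"):
        value = getattr(module, attribute, None)
        if isinstance(value, str):
            return value
    return None


# Stand-in modules so the sketch runs without any third-party installs.
fake_redis = types.ModuleType("fake_redis")
fake_redis.__version__ = "2.10.3"
fake_pymongo = types.ModuleType("fake_pymongo")
fake_pymongo.version = "3.0.3"

print(get_version(fake_redis))    # 2.10.3
print(get_version(fake_pymongo))  # 3.0.3
```

With the real libraries installed, you would pass the imported module itself, for example get_version(redis).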

How you can run a Python program

There are a few different ways in which you can run a Python program.

Running Python scripts

Python can be used as a scripting language, and it proves very useful in that role. Scripts are files (usually small ones) that you execute to carry out a task. Many developers end up having their own arsenal of tools that they fire when they need to perform a task. For example, you can have scripts to parse data in one format and render it into another. Or you can use a script to work with files and folders. You can create or modify configuration files, and much more. Technically, there is not much that cannot be done in a script.

It's quite common to have scripts running at a precise time on a server. For example, if your website database needs cleaning every 24 hours (for example, the table that stores the user sessions, which expire pretty quickly but aren't cleaned automatically), you could set up a cron job that fires your script at 3:00 A.M. every day.

Note

According to Wikipedia, the software utility Cron is a time-based job scheduler in Unix-like computer operating systems. People who set up and maintain software environments use cron to schedule jobs (commands or shell scripts) to run periodically at fixed times, dates, or intervals.
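A session-cleaning script of the kind mentioned above might look roughly like this. Everything specific here is an assumption made for the sketch: the sessions table, its expires_at column, the use of SQLite, and the file path in the crontab comment.

```python
import sqlite3
import time


def purge_expired_sessions(connection, now):
    """Delete sessions whose expiry time has passed; return how many were removed."""
    cursor = connection.execute(
        "DELETE FROM sessions WHERE expires_at < ?", (now,)
    )
    connection.commit()
    return cursor.rowcount


# Demo with an in-memory database; a real script would open the site's database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sessions (id TEXT, expires_at REAL)")
now = time.time()
connection.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("old", now - 60), ("older", now - 3600), ("fresh", now + 3600)],
)
removed = purge_expired_sessions(connection, now)
print(removed, "expired sessions removed")

# A crontab entry to run such a script at 3:00 A.M. every day might look like:
# 0 3 * * * /usr/bin/python3 /path/to/clean_sessions.py
```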

I have Python scripts to do all the menial tasks that would take me minutes or more to do manually, and at some point, I decided to automate. For example, I have a laptop that doesn't have a Fn key to toggle the touchpad on and off. I find this very annoying, and I don't want to go clicking about through several menus when I need to do it, so I wrote a small script that is smart enough to tell my system to toggle the touchpad active state, and now I can do it with one simple click from my launcher. Priceless.

Running the Python interactive shell

Another way of running Python is by calling the interactive shell. This is something we already saw when we typed python on the command line of our console.

So open a console, activate your virtual environment (which by now should be second nature to you, right?), and type python. You will be presented with a couple of lines that should look like this (if you are on Linux):

Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Those >>> are the prompt of the shell. They tell you that Python is waiting for you to type something. If you type a simple instruction, something that fits in one line, that's all you'll see. However, if you type something that requires more than one line of code, the shell will change the prompt to ..., giving you a visual clue that you're typing a multiline statement (or anything that would require more than one line of code).

Go on, try it out, let's do some basic maths:

>>> 2 + 4
6
>>> 10 / 4
2.5
>>> 2 ** 1024
179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216

The last operation is showing you something incredible. We raise 2 to the power of 1024, and Python is handling this task with no trouble at all. Try to do it in Java, C++, or C#. It won't work, unless you use special libraries to handle such big numbers.
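The reason this works is that Python integers are not limited to a fixed number of bits. The short sketch below contrasts that with floating point numbers, which are fixed-size and do overflow; the variable names are just for illustration.

```python
big = 2 ** 1024                 # Python integers grow as large as memory allows
print(big.bit_length())         # 1025 bits are needed to store it
print(len(str(big)))            # 309 decimal digits

# Floats, by contrast, are fixed-size 64-bit values and do overflow.
try:
    2.0 ** 1024
    float_overflowed = False
except OverflowError:
    float_overflowed = True
print(float_overflowed)         # True
```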

I use the interactive shell every day. It's extremely useful to debug very quickly, for example, to check if a data structure supports an operation. Or maybe to inspect or run a piece of code.

When you use Django (a web framework), the interactive shell is coupled with it and allows you to work your way through the framework tools, to inspect the data in the database, and many more things. You will find that the interactive shell will soon become one of your dearest friends on the journey you are embarking on.

Another solution, which comes in a much nicer graphic layout, is to use IDLE (Integrated DeveLopment Environment). It's quite a simple IDE, which is intended mostly for beginners. It has a slightly larger set of capabilities than the naked interactive shell you get in the console, so you may want to explore it. It comes for free in the Windows Python installer and you can easily install it in any other system. You can find information about it on the Python website.

Guido van Rossum named Python after the British comedy group Monty Python, so it's rumored that the name IDLE was chosen in honor of Eric Idle, one of Monty Python's founding members.

Running Python as a service

Apart from being run as a script, and within the boundaries of a shell, Python can be coded and run as proper software. We'll see many examples throughout the module about this mode. And we'll understand more about it in a moment, when we'll talk about how Python code is organized and run.

Running Python as a GUI application

Python programs can also be run as GUI (Graphical User Interface) applications. There are several frameworks available, some of which are cross-platform and others platform-specific.

Tk is a graphical user interface toolkit that takes desktop application development to a higher level than the conventional approach. It is the standard GUI for Tool Command Language (Tcl), but also for many other dynamic languages, and it can produce rich native applications that run seamlessly under Windows, Linux, Mac OS X, and more.

Tkinter comes bundled with Python, therefore it gives the programmer easy access to the GUI world, and for these reasons, I have chosen it to be the framework for the GUI examples that I'll present in this module.

Among the other GUI frameworks, we find that the following are the most widely used:

  • PyQt
  • wxPython
  • PyGtk

Describing them in detail is outside the scope of this module, but you can find all the information you need on the Python website in the GUI Programming section. If GUIs are what you're looking for, remember to choose the one you want according to some principles. Make sure they:

  • Offer all the features you may need to develop your project
  • Run on all the platforms you may need to support
  • Rely on a community that is as wide and active as possible
  • Wrap graphic drivers/tools that you can easily install/access

How is Python code organized

Let's talk a little bit about how Python code is organized. In this section, we'll go a little further down the rabbit hole and introduce some more technical names and concepts.

Starting with the basics, how is Python code organized? Of course, you write your code into files. When you save a file with the extension .py, that file is said to be a Python module.

Note

If you're on Windows or Mac, which typically hide file extensions from the user, please make sure you change the configuration so that you can see the complete name of the files. This is not strictly a requirement, but a hearty suggestion.

It would be impractical to save all the code that is required for a piece of software to work within a single file. That solution works for scripts, which are usually no longer than a few hundred lines (and often much shorter than that).

A complete Python application can be made of hundreds of thousands of lines of code, so you will have to scatter it through different modules. Better, but not nearly good enough. It turns out that even like this it would still be impractical to work with the code. So Python gives you another structure, called a package, which allows you to group modules together. A package is nothing more than a folder that contains a special file, __init__.py, which doesn't need to hold any code but whose presence tells Python that the folder is not just some folder, but actually a package (note that as of Python 3.3, __init__.py is not strictly required any more).

As always, an example will make all of this much clearer. I have created an example structure in my module project, and when I type in my Linux console:

$ tree -v example

I get a tree representation of the contents of the ch1/example folder, which holds the code for the examples of this chapter. Here's what the structure of a really simple application could look like:

example/
├── core.py
├── run.py
└── util
    ├── __init__.py
    ├── db.py
    ├── math.py
    └── network.py

You can see that within the root of this example, we have two modules, core.py and run.py, and one package: util. Within core.py, there may be the core logic of our application. On the other hand, within the run.py module, we can probably find the logic to start the application. Within the util package, I expect to find various utility tools, and in fact, we can guess that the modules there are called by the type of tools they hold: db.py would hold tools to work with databases, math.py would of course hold mathematical tools (maybe our application deals with financial data), and network.py would probably hold tools to send/receive data on networks.

As explained before, the __init__.py file is there just to tell Python that util is a package and not just a mere folder.
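If you'd like to experiment, the following sketch recreates that layout in a temporary folder and imports one module from the util package. The files are empty placeholders rather than the chapter's actual example code, and the double function written into math.py is invented purely so that there is something to call.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the example layout in a temporary folder.
root = Path(tempfile.mkdtemp()) / "example"
(root / "util").mkdir(parents=True)

files = {
    "core.py": "",
    "run.py": "",
    "util/__init__.py": "",   # empty file that marks util as a package
    "util/db.py": "",
    "util/math.py": "def double(x):\n    return 2 * x\n",   # placeholder content
    "util/network.py": "",
}
for relative_name, content in files.items():
    with open(str(root / relative_name), "w") as handle:
        handle.write(content)

# Make the example root importable, then import a module from the package.
sys.path.insert(0, str(root))
util_math = importlib.import_module("util.math")

print(util_math.__name__)     # util.math
print(util_math.double(21))   # 42
```

Because the module is reached through its package as util.math, it does not clash with the math module in the standard library.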

Had this software been organized within modules only, it would have been much harder to infer its structure. I put a module only