Difference and Differential Equations with Applications in Queueing Theory

Aliakbar Montazer Haghighi

Description

A Useful Guide to the Interrelated Areas of Differential Equations, Difference Equations, and Queueing Models Difference and Differential Equations with Applications in Queueing Theory presents the unique connections between the methods and applications of differential equations, difference equations, and Markovian queues. Featuring a comprehensive collection of topics that are used in stochastic processes, particularly in queueing theory, the book thoroughly discusses the relationship to systems of linear differential difference equations. The book demonstrates the applicability that queueing theory has in a variety of fields including telecommunications, traffic engineering, computing, and the design of factories, shops, offices, and hospitals. Along with the needed prerequisite fundamentals in probability, statistics, and Laplace transform, Difference and Differential Equations with Applications in Queueing Theory provides: * A discussion on splitting, delayed-service, and delayed feedback for single-server, multiple-server, parallel, and series queue models * Applications in queue models whose solutions require differential difference equations and generating function methods * Exercises at the end of each chapter along with select answers The book is an excellent resource for researchers and practitioners in applied mathematics, operations research, engineering, and industrial engineering, as well as a useful text for upper-undergraduate and graduate-level courses in applied mathematics, differential and difference equations, queueing theory, probability, and stochastic processes.


Page count: 528

Publication year: 2013




Table of Contents

Title page

Copyright page

Dedication

Preface

CHAPTER ONE: Probability and Statistics

1.1. Basic Definitions and Concepts of Probability

1.2. Discrete Random Variables and Probability Distribution Functions

1.3. Moments of a Discrete Random Variable

1.4. Continuous Random Variables

1.5. Moments of a Continuous Random Variable

1.6. Continuous Probability Distribution Functions

1.7. Random Vector

1.8. Continuous Random Vector

1.9. Functions of a Random Variable

1.10. Basic Elements of Statistics

1.11. Inferential Statistics

1.12. Hypothesis Testing

1.13. Reliability

Exercises

CHAPTER TWO: Transforms

2.1. Fourier Transform

2.2. Laplace Transform

2.3. Z-Transform

2.4. Probability Generating Function

Exercises

CHAPTER THREE: Differential Equations

3.1. Basic Concepts and Definitions

3.2. Existence and Uniqueness

3.3. Separable Equations

3.4. Linear Differential Equations

3.5. Exact Differential Equations

3.6. Solution of the First-Order ODE by Substitution Method

3.7. Applications of the First-Order ODEs

3.8. Second-Order Homogeneous ODE

3.9. The Second-Order Nonhomogeneous Linear ODE with Constant Coefficients

3.10. Miscellaneous Methods for Solving ODE

3.11. Applications of the Second-Order ODE

3.12. Introduction to PDE: Basic Concepts

Exercises

CHAPTER FOUR: Difference Equations

4.1. Basic Terms

4.2. Linear Homogeneous Difference Equations with Constant Coefficients

4.3. Linear Nonhomogeneous Difference Equations with Constant Coefficients

4.4. System of Linear Difference Equations

4.5. Differential–Difference Equations

4.6. Nonlinear Difference Equations

Exercises

CHAPTER FIVE: Queueing Theory

5.1. Introduction

5.2. Markov Chain and Markov Process

5.3. Birth and Death (B-D) Process

5.4. Introduction to Queueing Theory

5.5. Single-Server Markovian Queue, M/M/1

5.6. Finite Buffer Single-Server Markovian Queue: M/M/1/N

5.7. M/M/1 Queue with Feedback

5.8. Single-Server Markovian Queue with State-Dependent Balking

5.9. Multiserver Parallel Queue

5.10. Many-Server Parallel Queues with Feedback

5.11. Many-Server Queues with Balking and Reneging

5.12. Single-Server Markovian Queueing System with Splitting and Delayed Feedback

Exercises

Appendix

The Poisson Probability Distribution

The Chi-Square Distribution

The Chi-Square Distribution (continued)

The Standard Normal Probability Distribution

The Standard Normal Probability Distribution (continued)

The (Student's) t Probability Distribution

References and Further Readings

Answers/Solutions to Selected Exercises

Index

Copyright © 2013 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Haghighi, Aliakbar Montazer.

Difference and differential equations with applications in queueing theory / Aliakbar Montazer Haghighi Department of Mathematics, Prairie View A&M University, Prairie View, Texas, Dimitar P. Mishev Department of Mathematics, Prairie View A&M University, Prairie View, Texas.

pages cm

Includes bibliographical references and index.

ISBN 978-1-118-39324-6 (hardback) 1. Queuing theory. 2. Difference equations. 3. Differential equations. I. Mishev, D. P. (Dimiter P.) II. Title.

QA274.8.H338 2013

519.8'2–dc23

2013001302

To my better half, Shahin,

who has given me three beautiful grandchildren:

Maya Haghighi, Kayvan Haghighi, and Leila Madison Grant.

—A. M. Haghighi

To my mother, wife, and three children.

—D. P. Mishev

Preface

Topics in difference and differential equations with applications in queueing theory typically span five subject areas: (1) probability and statistics, (2) transforms, (3) differential equations, (4) difference equations, and (5) queueing theory. These are addressed in at least four separate textbooks and taught in four different courses across as many semesters. Because of this arrangement, students must postpone some important and fundamental required courses until much later than should be necessary. Additionally, based on our long experience teaching at the university level, we find that perhaps not all topics in one subject are necessary for a degree. Hence, perhaps we as faculty and administrators should rethink our traditional way of developing and offering courses. This is another reason for the content of this book, as with the authors' previous one: to offer several related topics in one textbook. This gives the instructor the freedom to choose the topics he or she wishes to emphasize, yet cover enough of each subject for students to continue to the next course, if necessary.

The methodological content of this textbook is not exactly novel, as “mathematics for engineers” textbooks have long reflected this approach. However, such a textbook may cover some topics that an engineering student already knows; with this textbook, those subjects are instead reinforced. Common practice has generally ignored some striking relations that exist between seemingly separate areas of a subject, for instance, statistical concepts such as the estimation of parameters of distributions used in queueing theory that are derived from differential–difference equations. These concepts commonly appear in queueing theory, for instance, in measures of effectiveness of queueing models.

All engineering and mathematics majors at colleges and universities take at least one course in ordinary differential equations, and some go further to take courses in partial differential equations. As mentioned earlier, there are many books on “mathematics for engineers” on the market, some of which contain applications using Laplace and Fourier transforms. Some, such as the one by these authors, also include topics from probability and statistics. However, there is a lack of applications of probability and statistics that use differential equations, although we included such applications in our book. Hence, we felt there is an urgent need for a textbook that recognizes the corresponding relationships between the various areas, and for a matching cohesive course. In particular, the theories of queues and reliability are two such topics, and this book is designed to achieve just that. Its five chapters, while retaining their individual integrity, flow from selected topics in probability and statistics to differential and difference equations to stochastic processes and queueing theory.

Chapter 1 establishes a strong foundation for what follows in Chapter 2 and beyond. Classical Fourier and Laplace transforms as well as Z-transforms and generating functions are included in Chapter 2. Partial differential equations are often used to construct models of the most basic theories underlying physics and engineering, such as the system of partial differential equations known as Maxwell's equations, from which one can derive the entire theory of electricity and magnetism, including light. In particular, elegant mathematics can be used to describe the vibrating circular membrane. However, our goal here is to develop the most basic ideas from the theory of partial differential equations and to apply them to the simplest models arising from physics and from queueing. Detailed topics of ordinary and partial differential and difference equations are included in Chapters 3 and 4, which complete the necessary tools for Chapter 5, which discusses stochastic processes and queueing models. We have also included the power series method of solution of differential equations, which can be applied to, for instance, Bessel's equation.

In our previous book, we required two semesters of calculus and a semester of ordinary differential equations for a reader to comprehend its contents. For this book, however, knowledge of at least two semesters of calculus is needed, including some familiarity with terminology such as the gradient, divergence, and curl, and the integral theorems that relate them to each other. We discuss not only topics in differential equations, but also difference equations, which have vast applications in the theory of signal processing, stochastic analysis, and queueing theory.

Few instructors teach these combined subject areas together, due to the difficulty of handling such a rigorous course with so much material. Instructors can solve this issue by teaching the class as a multi-instructor course.

We should note that throughout the book, we use boldface letters, Greek or Roman (lowercase or capital) for vectors and matrices. We shall write P(n) or Pn to mean P as a function of a discrete parameter n. Thus, we want to make sure that students are well familiar with functions of discrete variables as well as continuous ones. For instance, a vibrating string can be regarded as a continuous object, yet if we look at a fine enough scale, the string is made up of molecules, suggesting a discrete model with a large number of variables. There are many cases in which a discrete model may actually provide a better description of the phenomenon under study than a continuous one.

We also want to make sure that students realize that solution of some problems requires the ability to carry out lengthy calculations with confidence. Of course, all of these skills are necessary for a thorough understanding of the mathematical terminology that is an essential foundation for the sciences and engineering. We further note that subjects discussed in each chapter could be studied in isolation; however, their cohesiveness comes from a thorough understanding of applications, as discussed in this book.

We hope this book will be an interesting and useful one to both students and faculty in science, technology, engineering, and mathematics.

Aliakbar Montazer Haghighi

Dimitar P. Mishev

Houston, Texas

April 2013

CHAPTER ONE

Probability and Statistics

The organization of this book is such that by the time the reader gets to the last chapter, all necessary terminology and standard methods of solution have been covered. Thus, we start the book with the basics of probability and statistics, although we could have placed this chapter later, since some chapters are independent of the others.

In this chapter, the basics of probability and some important properties of the theory of probability, such as discrete and continuous random variables and distributions, as well as conditional probability, are covered.

After the presentation of the basics of probability, we will discuss statistics. Note that there is still a dispute as to whether statistics is a subject of its own or a branch of mathematics. Regardless, statistics deals with gathering, analyzing, and interpreting data; it is an important discipline that no science can do without. Statistics is divided into two parts: descriptive statistics and inferential statistics. Descriptive statistics includes some important basic terms that are widely used in our day-to-day lives. Inferential statistics is based on probability theory; to discuss this part of statistics, we include point estimation, interval estimation, and hypothesis testing.

We will discuss one more topic related to both probability and statistics, which is extremely necessary for business and industry, namely reliability of a system. This concept is also needed in applications such as queueing networks, which will be discussed in the last chapter.

In this chapter, we cover as much probability and statistics as we will need in this book, except some parts that are added for the sake of completeness of the subject.

1.1. Basic Definitions and Concepts of Probability

Nowadays, it is established in the scientific world that, since needed quantities are quite often not predictable in advance, randomness should be accounted for in any realistic model of a real-world phenomenon; that is why we consider random experiments in this book.

Determining probability, or chance, means quantifying the variability in the outcome or outcomes of a random experiment whose exact outcome or outcomes cannot be predicted with certainty. Satellite communication systems, such as radar, are built of electronic components such as transistors, integrated circuits, and diodes. However, as any engineer would testify, the installed components rarely function exactly as the designer anticipated. Thus, not only must the probability of failure be considered, but the reliability of the system is also quite important, since a failure of the system may cause not only economic losses but other damage as well. With probability theory, one may answer the question, “How reliable is the system?”

Definition 1.1.1. Basics
(a) Any result of performing an experiment is called an outcome of that experiment. A set of outcomes is called an event.
(b) If occurrences of outcomes are not certain or completely predictable, the experiment is called a chance or random experiment.
(c) In a random experiment, sets of outcomes that cannot be broken down into smaller sets are called elementary (or simple or fundamental) events.
(d) An elementary event is, usually, just a singleton (a set with a single element, such as {e}). Hence, a combination of elementary events is just an event.
(e) When any element (or outcome) of an event happens, we say that the event occurred.
(f) The union (set of all elements, with no repetition) of all events for a random experiment (or the set of all possible outcomes) is called the sample space.
(g) In “set” terminology, an event is a subset of the sample space. Two events A1 and A2 are called mutually exclusive if their intersection is the empty set, that is, they are disjoint subsets of the sample space.
(h) Let A1, A2, … , An be mutually exclusive events such that A1 ∪ A2 ∪ … ∪ An = Ω. The set of {A1, A2, … , An} is then called a partition of the sample space Ω.
(i) For an experiment, a collection or a set of all individuals, objects, or measurements of interest is called a (statistical) population. For instance, to determine the average grade in the differential equations course for all mathematics majors in four-year colleges and universities in Texas, the totality of students majoring in mathematics in the colleges and universities of Texas constitutes the population for the study. Usually, studying the whole population may not be practically or economically feasible, because it may be quite time-consuming, too costly, and/or impossible to identify all members of it. In such cases, sampling is used.
(j) A portion, subset, or part of the population of interest (with a finite or infinite number of members) is called a sample. Of course, the sample must be representative of the entire population in order to make any prediction about the population.
(k) An element of the sample is called a sample point. By quantification of the sample we mean changing the sample points to numbers.
(l) The range is the difference between the smallest and the largest sample points.
(m) A sample selected such that each element or unit in the population has the same chance to be selected is called a random sample.
(n) The probability of an event A, denoted by P(A), is a number between 0 and 1 (inclusive) describing likelihood of the event A to occur.
(o) An event with probability 1 is called an almost sure event. An event with probability 0 is called a null or an impossible event.
(p) For a sample space with n (finitely many) elements, if all elements or outcomes have the same chance of occurring, then we assign probability 1/n to each member. In this case, the sample space is called equiprobable. For instance, to choose a digit at random from 1 to 5, we mean that every digit of {1, 2, 3, 4, 5} has the same chance of being picked, that is, all elementary events {1}, {2}, {3}, {4}, and {5} are equiprobable. In that case, we may associate probability 1/5 with each digit singleton.
(q) If a random experiment is repeated, then the chance of occurrence of an outcome, intuitively, will be approximated by the ratio of occurrences of the outcome to the total number of repetitions of the experiment. This ratio is called the relative frequency.
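The relative-frequency idea in (q) can be illustrated with a short simulation. The following Python sketch (our own illustration, with a hypothetical fair-die setup and a fixed seed for reproducibility) estimates the probability of rolling a 3 as the ratio of occurrences to repetitions:

```python
import random

def relative_frequency(trials: int, seed: int = 42) -> float:
    """Estimate P(rolling a 3) with a fair die as the ratio of
    occurrences of the outcome to the number of repetitions."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 3)
    return hits / trials

print(relative_frequency(100_000))  # close to 1/6 ≈ 0.1667
```

As the number of repetitions grows, the relative frequency settles near the assigned probability 1/6, which is the intuition behind (q).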
Axioms of Probabilities of Events

We now state properties of the probability of an event A through the axioms of probability. The Russian mathematician Kolmogorov originated these axioms in the early part of the twentieth century. By an axiom is meant a statement that cannot be proved or disproved. Although all probabilists accept the three axioms of probability, there are axioms in mathematics that are still controversial, such as the axiom of choice, which is not accepted by some prominent mathematicians.

Let Ω be the sample space, ℬ the collection of all possible events drawn from Ω, and P the probability of an event. The triplet (Ω, ℬ, P) is then called the probability space. Later, after we define a random variable, we will discuss this space more rigorously.

Axioms of Probability
Axiom A1. 0 ≤ P(A) ≤ 1 for each event A in ℬ.
Axiom A2. P(Ω) = 1.
Axiom A3. If A1 and A2 are mutually exclusive events in ℬ, then:

P(A1 ∪ A2) = P(A1) + P(A2),

where mutually exclusive events are events that have no sample point in common, and the symbol ∪ means the union of two sets, that is, the set of all elements belonging to either set, without repetition.

Note that the axioms stated earlier are for events. Later, we will define another set of axioms of probability involving random variables.

If the occurrence of an event has influence on the occurrence of other events under consideration, then the probabilities of those events change.

Definition 1.1.2

Suppose (Ω, ℬ, P) is a probability space and B is an event (i.e., B ∈ ℬ) with positive probability, P(B) > 0. The conditional probability of A given B, denoted by P(A|B), defined on ℬ, is then given by:

P(A|B) = P(A ∩ B)/P(B).    (1.1.1)

If P(B) = 0, then P(A|B) is not defined. Under the condition given, we will have a new triplet, that is, a new probability space (Ω, ℬ, P(·|B)). This space is called the conditional probability space induced on ℬ, given B.

Definition 1.1.3

For any two events A and B with conditional probability P(B | A) or P(A | B), we have the multiplicative law, which states:

P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B).    (1.1.2)

We leave it as an exercise to show that for n events A1, A2, … , An, we have:

P(A1 ∩ A2 ∩ ⋯ ∩ An) = P(A1)P(A2|A1)P(A3|A1 ∩ A2) ⋯ P(An|A1 ∩ A2 ∩ ⋯ ∩ An−1).    (1.1.3)

Definition 1.1.4

We say that events A and B are independent if and only if:

P(A ∩ B) = P(A)P(B).    (1.1.4)

It will be left as an exercise to show that if events A and B are independent and P(B) > 0, then:

P(A|B) = P(A).    (1.1.5)

It can be shown that if P(B) > 0 and (1.1.5) is true, then A and B are independent. For proof, see Haghighi et al. (2011a, p. 139).

The concept of independence can be extended to a finite number of events.

Definition 1.1.5

Events A1, A2, … , An are independent if and only if the probability of the intersection of any subset of them is equal to the product of corresponding probabilities, that is, for every subset {i1, … , ik} of {1, … , n} we have:

P(Ai1 ∩ Ai2 ∩ ⋯ ∩ Aik) = P(Ai1)P(Ai2) ⋯ P(Aik).    (1.1.6)

As one of the very important applications of conditional probability, we state the following theorem, whose proof may be found in Haghighi et al. (2011a):

Theorem 1.1.1. The Law of Total Probability

Let A1, A2, … , An be a partition of the sample space Ω. For any given event B, we then have:

P(B) = Σ_{i=1}^{n} P(Ai)P(B|Ai).    (1.1.7)

Theorem 1.1.1 leads us to another important application of conditional probability. A proof of this theorem may also be found in Haghighi et al. (2011a).

Theorem 1.1.2. Bayes' Formula

Let A1, A2, … , An be a partition of the sample space Ω. If an event B occurs, the probability of any event Aj given B is:

P(Aj|B) = P(Aj)P(B|Aj) / Σ_{i=1}^{n} P(Ai)P(B|Ai).    (1.1.8)

Example 1.1.1

Suppose in a factory three machines A, B, and C produce the same type of products. The percent shares of these machines are 20, 50, and 30, respectively. It is observed that machines A, B, and C produce 1%, 4%, and 2% defective items, respectively. For the purpose of quality control, a produced item is chosen at random from the total items produced in a day. Two questions to answer:

1. What is the probability of the item being defective?
2. Given that the item chosen was defective, what is the probability that it was produced by machine B?
Answers

To answer the first question, we denote the event of defectiveness of the item chosen by D. By the law of total probability, we will then have:

P(D) = P(A)P(D|A) + P(B)P(D|B) + P(C)P(D|C) = (0.20)(0.01) + (0.50)(0.04) + (0.30)(0.02) = 0.028.

Hence, the probability of the produced item chosen at random being defective is 2.8%.

To answer the second question, let the conditional probability in question be denoted by P(B | D). By Bayes' formula and the answer to the first question, we then have:

P(B | D) = P(B)P(D | B)/P(D) = (0.50)(0.04)/0.028 = 0.020/0.028 ≈ 0.714.

Thus, the probability that the defective item chosen was produced by machine B is approximately 71.4%.
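The two computations in Example 1.1.1 can be checked numerically. The following Python sketch (our own illustration, with dictionary names of our choosing) applies the law of total probability (1.1.7) and Bayes' formula (1.1.8) to the percentages from the example:

```python
# Shares of production and defective rates from Example 1.1.1
priors = {"A": 0.20, "B": 0.50, "C": 0.30}   # P(A), P(B), P(C)
defect = {"A": 0.01, "B": 0.04, "C": 0.02}   # P(D | machine)

# Law of total probability: P(D) = sum over machines M of P(M) P(D | M)
p_d = sum(priors[m] * defect[m] for m in priors)

# Bayes' formula: P(B | D) = P(B) P(D | B) / P(D)
p_b_given_d = priors["B"] * defect["B"] / p_d

print(round(p_d, 4))          # 0.028
print(round(p_b_given_d, 3))  # 0.714
```

This reproduces the 2.8% defect rate and the 71.4% posterior probability for machine B.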

Example 1.1.2

Suppose there are three urns that contain black and white balls as follows:

(1.1.9)

A ball is drawn randomly and it is “white.” Discuss possible probabilities.

Discussion

The sample space Ω is the set of all pairs (·,·), where the first entry represents the urn number (1, 2, or 3) and the second represents the color (black or white). Let U1, U2, and U3 denote the events that the ball was drawn from urn 1, 2, and 3, respectively. Assuming that the urns are identical and the balls have equal chances of being chosen, we will then have:

P(U1) = P(U2) = P(U3) = 1/3.    (1.1.10)

Also, U1 = {(1, ·)}, U2 = {(2, ·)}, and U3 = {(3, ·)}.

Let W denote the event that a white ball was drawn, that is, W = {(·, w)}. From (1.1.9), we have the following conditional probabilities:

(1.1.11)

From Bayes' rule, (1.1.9), (1.1.10), and (1.1.11), we have:

(1.1.12)

(1.1.13)

Note that the denominator of (1.1.12) is:

(1.1.14)

Using (1.1.14), we have:

(1.1.15)

Again, using (1.1.14), we have:

(1.1.16)

Now, observing from (1.1.13), (1.1.15), and (1.1.16), there is a better chance that the ball was drawn from the second urn. Hence, if we assume that the ball was drawn from the second urn, there is one white ball that remains in it. That is, we will have the three urns with 0, 1, and 1 white ball, respectively, in urns 1, 2, and 3.

1.2. Discrete Random Variables and Probability Distribution Functions

As we have seen so far, elements of a sample space are not necessarily numbers. However, for convenience, we would rather have them so. This is done through what is called a random variable. In other words, a random variable quantifies the sample space. That is, a random variable assigns numerical (or set) labels to the sample points. Formally, we define a random variable as follows:

Definition 1.2.1

A random variable is a function (or a mapping) on the sample space.

We note that a random variable is really neither a variable (in the usual sense of an independent variable) nor random; as mentioned, it is just a function. Also note that sometimes the range of a random variable may not consist of numbers. This is simply because we defined a random variable as a mapping: it maps elements of one set to elements of another set, and the elements of either set do not necessarily have to be numbers.

There are two main types of random variables, namely, discrete and continuous. We will discuss each in detail.

Definition 1.2.2

A discrete random variable is a function, say X, from a countable sample space, Ω (that could very well be a numerical set), into the set of real numbers.

Example 1.2.1

Suppose we are to select two digits from 1 to 6 such that the sum of the two numbers selected equals 7. Assume that repetition is not allowed. The sample space under consideration will then be S = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, which is discrete. This set can also be described as S = {(i, j): i + j = 7, i, j = 1, 2, … , 6}.

Now, the random variable X can be defined by X((i, j)) = i, i = 1, 2, … , 6. That is, the range of X is the set {1, 2, 3, 4, 5, 6} such that, for instance, X((1, 6)) = 1, X((2, 5)) = 2, X((3, 4)) = 3, X((4, 3)) = 4, X((5, 2)) = 5, and X((6, 1)) = 6. In other words, the discrete random variable X has quantified the set of ordered pairs S into the set of positive integers from 1 to 6.

Example 1.2.2

Toss a fair coin three times. Denoting heads by H and tails by T, the sample space will then contain eight triplets: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Each toss results in either heads or tails. Thus, we might define the random variable Xj to take the values 1 and 0 for heads and tails, respectively, on the jth toss, j = 1, 2, 3. In other words,

Xj = 1 if the jth toss results in heads, and Xj = 0 otherwise.

Hence, P{Xj = 0} = 1/2 and P{Xj = 1} = 1/2. Now, from the sample space we see that the probability of the element HTH is:

P({HTH}) = 1/8.    (1.2.1)

In contrast, product of individual probabilities is:

P{X1 = 1}P{X2 = 0}P{X3 = 1} = (1/2)(1/2)(1/2) = 1/8.    (1.2.2)

From (1.2.1) and (1.2.2), we see that X1, X2, and X3 are mutually independent.

Now suppose we define X and Y as the total number of heads and tails, respectively, after the three tosses. The probability of three heads and three tails together is obviously zero, since these two events cannot occur at the same time; that is, P{X = 3, Y = 3} = 0. However, from the sample space, the probabilities of the individual events are P{X = 3} = 1/8 and P{Y = 3} = 1/8. Thus, the product is:

P{X = 3}P{Y = 3} = (1/8)(1/8) = 1/64 ≠ 0 = P{X = 3, Y = 3}.

Hence, X and Y, in this case, are not independent.
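The claims in Example 1.2.2 can be verified by enumerating the eight equally likely outcomes. The following Python sketch (our own illustration; the helper `prob` is ours) computes the relevant probabilities directly from the sample space:

```python
from itertools import product

outcomes = list(product("HT", repeat=3))  # the 8 equally likely triples
p = 1 / len(outcomes)                     # probability 1/8 each

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum(p for w in outcomes if event(w))

# Heads on toss 1 and heads on toss 3 behave independently:
lhs = prob(lambda w: w[0] == "H" and w[2] == "H")
rhs = prob(lambda w: w[0] == "H") * prob(lambda w: w[2] == "H")
print(lhs == rhs)  # True

# X = total heads, Y = total tails: not independent
print(prob(lambda w: w.count("H") == 3 and w.count("T") == 3))  # 0
print(prob(lambda w: w.count("H") == 3) * prob(lambda w: w.count("T") == 3))  # 0.015625
```

The joint probability P{X = 3, Y = 3} = 0 while the product of marginals is 1/64, confirming that X and Y are not independent.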

One useful concept involving random variables is the indicator function (or indicator random variable), which we define next.

Definition 1.2.3

Let A be an event from the sample space Ω. The random variable IA(ω), defined for ω ∈ Ω as:

IA(ω) = 1 if ω ∈ A, and IA(ω) = 0 if ω ∉ A,    (1.2.3)

is called the indicator function (or indicator random variable) of A.

Note that for every ω ∈ Ω, IΩ(ω) = 1 and Iϕ(ω) = 0.
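In code, the indicator function of Definition 1.2.3 is a simple membership test. The sketch below (our own illustration, on a hypothetical equiprobable die space) also shows the standard fact that averaging I_A over an equiprobable sample space recovers P(A):

```python
def indicator(A):
    """Return the indicator function I_A: 1 on A, 0 elsewhere."""
    return lambda w: 1 if w in A else 0

omega = {1, 2, 3, 4, 5, 6}     # equiprobable sample space of a fair die
even = indicator({2, 4, 6})    # I_A for the event A = "even outcome"

print(even(4), even(5))  # 1 0
# Averaging I_A over the equiprobable space gives P(A):
print(sum(even(w) for w in omega) / len(omega))  # 0.5
```

Here the average of I_A equals 3/6 = 1/2, which is exactly P(A) for the event of an even outcome.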

We leave it as an exercise for the reader to show the following properties of random variables:

(a) if X and Y are two discrete random variables, then X ± Y and XY are also random variables, and
(b) if {Y = 0} is empty, X/Y is also a random variable.

The way probabilities of a random variable are distributed across the possible values of that random variable is generally referred to as the probability distribution of that random variable. The following is the formal definition.

Definition 1.2.4

Let X be a discrete random variable defined on a sample space Ω and x is a typical element of the range of X. Let px denote the probability that the random variable X takes the value x, that is,

pX(x) = px = P([X = x]),    (1.2.4)

where pX is called the probability mass function (pmf) of X and also referred to as the (discrete) probability density function (pdf) of X.

Note that Σx pX(x) = 1, where x varies over all possible values of X.

Example 1.2.3

Suppose a machine is either in “good working condition” or “not good working condition.” Let us denote “good working condition” by 1 and “not good working condition” by 0. The sample space of the states of this machine will then be Ω = {0, 1}. Using a random variable X, we define P([X = 1]) as the probability that the machine is in “good working condition” and P([X = 0]) as the probability that it is not. Now if P([X = 1]) = 4/5 and P([X = 0]) = 1/5, then we have a distribution for X.

Definition 1.2.5

Suppose X is a discrete random variable, and let x be a real number. Let us define FX(x) as:

FX(x) = P([X ≤ x]) = Σ_{n ≤ x} pn,    (1.2.5)

where pn is defined as P([X = n]) or P(X = n). FX(x) is then called the cumulative distribution function (cdf) of X.

Note that from the set of axioms of probability mentioned earlier, for all x, we have:

0 ≤ FX(x) ≤ 1.    (1.2.6)
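Since a cdf accumulates pmf values, it is straightforward to compute. The following Python sketch (our own illustration, using a hypothetical pmf of our choosing) implements (1.2.5) as a step function:

```python
# A hypothetical pmf on the values {0, 1, 2, 3}
pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

def cdf(x):
    """F_X(x) = P(X <= x): the sum of p_n over all n <= x."""
    return sum(p for n, p in pmf.items() if n <= x)

print(cdf(1.5))  # 0.1 + 0.3 = 0.4
print(cdf(-1))   # 0: below the smallest value of X
print(cdf(3))    # 1.0: the whole distribution is accumulated
```

Between the values of X the cdf is constant, and it jumps by pn at each value n, which is the step-function behavior described by (1.2.5).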

We now discuss selected important discrete probability distribution functions. Before that, we note that a random experiment is sometimes called a trial.

Definition 1.2.6

A Bernoulli trial is a trial with exactly two possible outcomes. The two possible outcomes of a Bernoulli trial are often referred to as success and failure denoted by s and f, respectively. If a Bernoulli trial is repeated independently n times with the same probabilities of success and failure on each trial, then the process is called Bernoulli trials.

Notes:
(1) From Definition 1.2.6, if the probability of s is p, 0 ≤ p ≤ 1, then, by the second axiom of probability, the probability of f will be q = 1 − p.
(2) By its definition, in a Bernoulli trial, the sample space for each trial has two sample points.
Definition 1.2.7

Now, let X be a random variable taking values 1 and 0, corresponding to success and failure, respectively, of the possible outcome of a Bernoulli trial, with p (p > 0) as the probability of success and q as probability of failure. We will then have:

(1.2.7)

Formula (1.2.7) is the probability distribution function (pmf) of the Bernoulli random variable X.

Note that (1.2.7) defines a pmf because, first, pkq1−k > 0, and second, .

Example 1.2.4

Suppose we test 6 different objects for strength, where the probability of breakdown for each object is 0.2. What is the probability that only the third object tested is successful, that is, does not break down?

Answer

In this case, we have a sequence of six Bernoulli trials. Let us write 1 for a success and 0 for a failure. The outcome of interest is then symbolized by the 6-tuple (001000). Hence, the probability would be (0.2)(0.2)(0.8)(0.2)(0.2)(0.2) = 0.000256.
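As a quick numerical check of Example 1.2.4 (our own Python sketch, not part of the text), the probability of the specific outcome can be computed by multiplying the per-trial probabilities:

```python
# Six independent Bernoulli trials: P(breakdown) = 0.2, so P(success) = 0.8.
# We want the probability of the specific outcome (0, 0, 1, 0, 0, 0),
# i.e., success only on the third trial.
p_break, p_success = 0.2, 0.8

outcome = (0, 0, 1, 0, 0, 0)  # 1 = success (no breakdown)
prob = 1.0
for bit in outcome:
    prob *= p_success if bit == 1 else p_break

print(round(prob, 6))  # 0.000256, matching the text
```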

Now suppose we repeat a Bernoulli trial independently finitely many times. We would then be interested in the probability that one of the two possible outcomes occurs a given number of times, regardless of the order of occurrences. Therefore, we have the following definition:

Definition 1.2.8

Suppose Xn is the random variable representing the number of successes in n independent Bernoulli trials. Denote the pmf of Xn by Bk = b(k; n, p). Bk = b(k; n, p) is called the binomial distribution function with parameters n and p of the random variable X, where the parameters n, p and the number k refer to the number of independent trials, probability of success in each trial, and the number of successes in n trials, respectively. In this case, X is called the binomial random variable. The notation X ∼ b(k; n, p) is used to indicate that X is a binomial random variable with parameters n and p.

We leave it as an exercise to prove that:

(1.2.8)

where q = 1 − p.

Example 1.2.5

Suppose two identical machines run together, each choosing a digit from 1 to 9 at random, five times. We want to know the probability that a sum of 6 or 9 appears k times (k = 0, 1, 2, 3, 4, 5).

Answer

To answer the question, note that we have five independent trials. The sample space in this case for one trial has 81 sample points and can be written in a matrix form as follows:

There are 13 sample points, where the sum of the components is 6 or 9. They are:

Hence, the probability of getting a sum as 6 or 9 on one selection of both machines together (i.e., probability of a success) is p = 13/81. Now let X be the random variable representing the total times a sum as 6 or 9 is obtained in 5 trials. Thus, from (1.2.8), we have:

For instance, the probability that a sum of 6 or 9 does not appear at all will be (68/81)^5 ≈ 0.42; that is, there is a (100 − 42)% = 58% chance that we get at least one sum of 6 or 9 during the five trials.
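The whole binomial pmf of Example 1.2.5 can be sketched numerically with (1.2.8); this is our own Python check, with `pmf` as a helper name:

```python
from math import comb

# X ~ b(k; 5, 13/81): a "success" is one joint selection whose sum is 6 or 9.
n, p = 5, 13 / 81
q = 1 - p

pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

print(round(pmf[0], 2))      # P(no success) = (68/81)^5 ≈ 0.42
print(round(1 - pmf[0], 2))  # ≈ 0.58, at least one sum of 6 or 9
```

Summing `pmf` over all k returns 1, as a pmf must.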

Based on a sequence of independent Bernoulli trials, we now define two other important discrete random variables. Consider a sequence of independent Bernoulli trials with probability of success in each trial as p, 0 ≤ p ≤ 1. Suppose we are interested in the total number of trials required to have the rth success, r being a fixed positive integer. The answer is in the following definition:

Definition 1.2.9

Let X be a random variable with pmf as:

(1.2.9)

Formula (1.2.9) is then called a negative binomial (or Pascal) probability distribution function (or binomial waiting time). In particular, if r = 1 in (1.2.9), then we will have:

(1.2.10)

The pmf given by (1.2.10) is called a geometric probability distribution function.

Example 1.2.6

As an example, suppose a satellite company finds that 40% of the service calls it receives need advanced technology service. Suppose also that on a particularly busy day, all tickets written are put in a pool and requests are drawn randomly for service. Finally, suppose that on that particular day there are four advanced-service personnel available. We want to find the probability that the fourth request for advanced technology service is found on the sixth ticket drawn from the pool.

Answer

In this problem, we have independent trials with p = 0.4 as the probability of success, that is, of needing advanced technology service, on any trial. Let X represent the number of the ticket on which the fourth such request is found. Thus,
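The value can be checked with the standard closed form of the negative binomial pmf, which is consistent with the derivation in Example 1.2.7; the function name below is our own:

```python
from math import comb

# Negative binomial (Pascal) pmf: probability that the r-th success
# occurs on trial x. There are C(x-1, r-1) ways to place the first r-1
# successes among the first x-1 trials, and the sequence ends in a success.
def neg_binomial_pmf(x, r, p):
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# Example 1.2.6: 4th advanced-technology request on the 6th ticket,
# r = 4, p = 0.4.
print(round(neg_binomial_pmf(6, 4, 0.4), 5))  # C(5,3)*0.4^4*0.6^2 = 0.09216
```

Setting r = 1 recovers the geometric pmf (1.2.10).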

Example 1.2.7

We now want to derive (1.2.9) differently. Suppose treatment of a cancer patient may result in “response” or “no response.” Let the probability of a response be p and of no response be 1 − p. Hence, the sample space in this case has two outcomes, namely, “response” and “no response.” We now repeatedly treat other patients with the same medicine and observe the reactions. Suppose we are looking for the probability distribution of the number of trials required to have exactly k “responses.”

Answer

Denoting the sample space by S, S = {response, no response}. Let us define the random variable X on S to denote the number of trials needed to have exactly k responses. Let A be the event, in S, of observing k − 1 responses in the first x − 1 treatments. Let B be the event of observing a response at the xth treatment. Let also C be the event of treating x patients to obtain exactly k responses. Hence, C = A ∩ B. The probability of C is:

On the other hand, P(B | A) = p and:

Moreover, P(X = x) = P(C). Hence:

(1.2.11)

We leave it as an exercise to show that (1.2.11) is equivalent to (1.2.9).

Definition 1.2.10

Let n denote the size of a sample (drawn without replacement) from a finite population of size N that consists of two types of items: n1 of “defective,” say, and n2 of “nondefective,” say, with n1 + n2 = n. Suppose we are interested in the probability of selecting x “defective” items from the sample. For this, n1 must be at least as large as x; hence, x must be less than or equal to the smaller of n and n1. Thus,

(1.2.12)

defines the general form of hypergeometric pmf of the random variable X.

Notes:
i. If the sampling had been with replacement, the distribution would have been binomial.
ii.px is the probability of waiting time for the occurrence of exactly x “defective” outcomes. We could think of this scenario as an urn containing N white and green balls. From the urn, we select a random sample (a sample selected such that each element has the same chance to be selected) of size n, one ball at a time without replacement. The sample consists of n1 white and n2 green balls, n1 + n2 = n. What is the probability of having x white balls drawn in a row? This model is called an urn model.
iii. If we let xi equal 1 if a defective item is selected and 0 if a nondefective item is selected, and let x be the total number of defectives selected, then . Now, if we consider selection of a defective item as a success, for instance, then we could also interpret (1.2.12) as:

(1.2.13)

Example 1.2.8

Suppose we have 100 balls in a box and 10 of them are red. If we randomly take out 40 of them (without replacement), what is the probability that we will have at least 6 red balls?

Answer

In this example, which involves a hypergeometric distribution, if we assume that all 40 balls are withdrawn at the same time, then N = 100, n = 40, n1 = 10 with “defective” replaced by “red,” and n2 = 90 with “nondefective” replaced by “nonred.” The question is to find the probability of selecting at least six red balls. To find it, we either calculate the probabilities of 6, 7, 8, 9, and 10 red balls and sum them, or calculate p ≡ 1 − P{number of red balls ≤ 5}. To do this, we could use statistical software such as Stata. However, using (1.2.12) or (1.2.13), we have:

Thus, p = 1 − 0.846 = 0.154 = 15.4%.

We caution that the Excel formula “=HYPGEOMDIST(6,40,10,100),” which works out to about 10%, is not the answer to this question: it returns only the point probability P(X = 6), not the cumulative probability of at least six red balls.
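The cumulative computation itself can be sketched directly from (1.2.12); this is our own Python check, with `hypergeom_pmf` as a helper name:

```python
from math import comb

# Example 1.2.8: N = 100 balls, n1 = 10 red, draw n = 40 without
# replacement; we want P(at least 6 red) = 1 - P(X <= 5).
N, n, n1 = 100, 40, 10

def hypergeom_pmf(x):
    # Number of ways to pick x red and n - x nonred, over all samples.
    return comb(n1, x) * comb(N - n1, n - x) / comb(N, n)

p_at_most_5 = sum(hypergeom_pmf(x) for x in range(6))
print(round(1 - p_at_most_5, 3))  # the text computes this as 0.154
```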

As our final important discrete random variable, we define the Poisson probability distribution function.

Definition 1.2.11

A Poisson random variable is a nonnegative random variable X such that:

(1.2.14)

where λ is a constant. Formula (1.2.14) is called a Poisson probability distribution function with parameter λ.

Example 1.2.9

Suppose that the number of telephone calls arriving to a switchboard of an institution every working day has a Poisson distribution with parameter 20. What is the probability that there will be:

(a) 20 calls in one day?
(b) at least 30 calls in one day?
(c) at most 30 calls in one day?
Answers

Using λ = 20 in (1.2.14) we will have:

(a).
(b).
(c)
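The three answers can be sketched numerically from (1.2.14); this is our own Python check, with `poisson_pmf` as a helper name:

```python
from math import exp, factorial

# Example 1.2.9: Poisson distribution with lambda = 20.
lam = 20.0
def poisson_pmf(k):
    return exp(-lam) * lam**k / factorial(k)

p_exactly_20  = poisson_pmf(20)                           # part (a)
p_at_least_30 = 1 - sum(poisson_pmf(k) for k in range(30))  # part (b)
p_at_most_30  = sum(poisson_pmf(k) for k in range(31))      # part (c)

print(round(p_exactly_20, 4))   # ≈ 0.0888
print(round(p_at_least_30, 4))
print(round(p_at_most_30, 4))
```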

Let X be a binomial random variable with distribution function Bk, and let λ = np be fixed. We leave it as an exercise to show that:

(1.2.15)
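The limit in (1.2.15), the Poisson approximation to the binomial with λ = np held fixed, can be illustrated numerically; this comparison is our own sketch:

```python
from math import comb, exp, factorial

# With lambda = n*p fixed, the binomial pmf b(k; n, p) approaches the
# Poisson pmf as n grows.
lam, k = 3.0, 2
poisson = exp(-lam) * lam**k / factorial(k)

for n in (10, 100, 1000):
    p = lam / n
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, round(binom, 5), round(poisson, 5))  # binom -> poisson ≈ 0.22404
```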

1.3. Moments of a Discrete Random Variable

We now discuss some properties of a discrete distribution.

Definition 1.3.1

Suppose X is a discrete random variable defined on a sample space Ω with pmf pX. The mathematical expectation or simply expectation of X, or expected value of X, or the mean of X, or the first moment of X, denoted by E(X), is then defined as follows: If Ω is finite and the range of X is {x1, x2, … , xn}, then:

(1.3.1)

and if Ω is infinite and the range of X is {x1, x2, … , xn, …}, then:

(1.3.2)

provided that the series converges. If Ω is finite and , i = 1, 2, … , n, is constant for all i's, say 1/n, then the right-hand side of (1.3.1) will become (x1 + x2 + … + xn)/n. This expression is denoted by and is called the arithmetic average of x1, x2, … , xn, that is,

(1.3.3)

, i = 1, 2, … , n in (1.3.1), (1.3.2), and (1.3.3) is called the weight for the values of the random variable X. Hence, in (1.3.1) and (1.3.2), the weights vary and E(X) is called the weighted average, while in (1.3.3) the weights are the same and is called the arithmetic average or simply the average.
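The distinction between the weighted and the arithmetic average can be sketched numerically; the values and weights below are our own illustration:

```python
# Weighted vs. arithmetic average, following (1.3.1) and (1.3.3).
values  = [1, 2, 3, 4]
weights = [0.1, 0.2, 0.3, 0.4]   # a pmf: nonnegative, sums to 1

e_x  = sum(x * w for x, w in zip(values, weights))  # weighted average E(X)
mean = sum(values) / len(values)                    # equal weights 1/n

print(round(e_x, 6))   # 3.0
print(round(mean, 6))  # 2.5
```

The two agree only when every weight equals 1/n.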

We next state some properties of the first moment without proof. We leave the proof as exercises.

Properties of the First Moment
1. The expected value of the indicator function IA(ω) defined in (1.2.3) is P(A), that is,

(1.3.4)

2. If c is a constant, then E(c) = c.
3. If c, c1, and c2 are constants and X and Y are two random variables, then:

(1.3.5)

4. If X1, X2, … , Xn are n random variables, then:

(1.3.6)

5. Let X1 and X2 be two independent random variables with marginal mass (density) functions and , respectively. If E(X1) and E(X2) exist, we will then have:

(1.3.7)

6. For a finite number of random variables, that is, if X1, X2, … , Xn are n independent random variables, then:

(1.3.8)

We now extend the concept of moments.

Definition 1.3.2

Let X be a discrete random variable and r a positive integer. E(Xr) is then called the rth moment of X or moment of order r of X. In symbols:

(1.3.9)

Note that if r = 1, E(Xr) = E(X), that is, the moment of first order or the first moment of X is just the expected value of X. The second moment, that is, E(X2) is also important, as we will see later.

Let us denote E(X) by μ, that is, E(X) ≡ μ. It is clear that if X is a random variable, so is X − μ, where μ is a constant. However, since E(X − μ) = E(X) − E(μ) = μ − μ = 0, we can center X by choosing the new random variable X − μ. This leads to the following definition.

Definition 1.3.3

The rth moment of the random variable X − μ, denoted by μr(X) is defined by E[(X − μ)r], and is called the rth central moment of X, that is,

(1.3.10)

Note that the random variable X − μ measures the deviation of X from its mean. Thus, we have the next definition:

Definition 1.3.4

The variance of a random variable, X, denoted by Var(X) or equivalently by σ2(X), or if there is no fear of confusion, just σ2, is defined as the second central moment of X, that is,

(1.3.11)

The positive square root of the variance of a random variable X is called the standard deviation and is denoted by σ(X).

It can easily be shown that if X is a random variable and μ is finite, then:

(1.3.12)

It can also be easily proven that if X is a random variable and c is a real number, then:

(1.3.13)

(1.3.14)

Example 1.3.1

Consider the Indicator function defined in (1.2.3). That is,

The expected value of IA(ω) is:

Example 1.3.2

Consider the Bernoulli random variable defined in Definition 1.2.7. Thus, the random variable X takes two values 1 and 0, for instance, for success and failure, respectively. The probability of success is assumed to be p. Thus, the expected value of X is:

To find the variance, note that:

Hence,

Example 1.3.3

We want to find the mean and variance of the random variable X having binomial distribution defined in (1.2.8).

Answer

From (1.2.8), we have:

We leave it as an exercise to show that the Var(X) = np(1 − p).
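A quick numerical check of E(X) = np and Var(X) = np(1 − p), computed directly from the pmf (1.2.8) (our own sketch, not a solution to the exercise):

```python
from math import comb

# Mean and variance of X ~ b(k; n, p), summed from the pmf.
n, p = 12, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
var  = sum(k**2 * pk for k, pk in enumerate(pmf)) - mean**2

print(round(mean, 6))  # np = 3.6
print(round(var, 6))   # np(1-p) = 2.52
```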

Example 1.3.4

Consider the Poisson distribution defined by (1.2.14). We want to find the mean and variance of the random variable X having Poisson pmf as given in (1.2.14).

Answer

1.4. Continuous Random Variables

So far we have been discussing discrete random variables, discrete distribution functions, and some of their properties. We now discuss continuous cases.

Definition 1.4.1

When the values of outcomes of a random experiment are real numbers (not necessarily integers or rational), the sample space, Ω, is a called a continuous sample space, that is, Ω is the entire real number set or a subset of it (i.e., an interval or a union of intervals).

The set consisting of all subsets of real numbers is extremely large, and it would be impossible to assign probabilities to all of them. It has been shown in the theory of probability that a smaller set, say , may be chosen that contains all events of our interest. In this case, is, loosely, referred to as the Borel field. We now pause to discuss the Borel field more rigorously.

Definition 1.4.2

A nonempty collection of sets is called a σ-algebra if it is closed under complements and under finite or countable unions, that is,

(i)A1, , then A1 ∪ A2 and , and
(ii), i ≥ 1, then .

Note that axioms (i) and (ii) imply that, should be closed under finite and countable intersections as well.

Example 1.4.1

The power set of a set X, , is a σ-algebra.

Definition 1.4.3

Earlier in this chapter we defined “function.” As defined there, it was a “point function,” since values were assigned to each point of a set. A set function F assigns values to sets or regions of the space.

Definition 1.4.4

A measure, μ, on a set is a set function that assigns a real number to each subset of , (which intuitively determines the size of the set ) such that:

i.μ(ϕ) = 0, where ϕ is the empty set,
ii.μ(A) ≥ 0, , that is, nonnegative, and
iii. if is a finite or countable sequence of mutually disjoint sets in , then , the countably additive axiom, where is the set of integers.
Notes:
1. What the third axiom says is that the measure of a “large” subset (the union of subsets Ai's) that can be partitioned into a finite (or countable) number of “smaller” disjoint subsets is the sum of the measures of the “smaller” subsets.
2. Generally speaking, if we were to associate a size with each subset of a given set consistently while satisfying the other axioms of a measure, only trivial examples such as the counting measure would be available. To remove this barrier, a measure is defined only on a subcollection of all subsets, the so-called measurable subsets, which are required to form a σ-algebra. In other words, elements of the σ-algebra are called measurable sets. This means that countable unions, countable intersections, and complements of measurable subsets are measurable.
3. The existence of a nonmeasurable set involves the axiom of choice.
4. The main applications of measures are in the foundations of the Lebesgue integral and in Kolmogorov's axiomatization of probability theory, including ergodic theory. (Andrey Kolmogorov was a Russian mathematician who was a pioneer of probability theory.)
5. It can be proven that a measure is monotone, that is, if A is a subset of B, then the measure of A is less than or equal to the measure of B.

Let us now consider the following example.

Example 1.4.2

Consider the life span of a patient with cancer who is under treatment. Hence, the duration of the patient's life is a positive real number. This number is actually an outcome of our treatment (experiment). Let us denote this outcome by ω. Thus, the sample space is the set of all real numbers (of course, in reality, truncated positive real line). Now, we could include the nonpositive part of the real line to our sample space as long as probabilities assigned to them are zeros. Thus, the sample space would become just the real line. While the treatment goes on, we might ask, what is the probability that the patient dies before a preassigned time, say ωt?

It might also be of interest to know the time interval, say , in which the dose level of a medicine needs to show a reaction.

To answer questions of these types, we would have to consider intervals (−∞, ωt) and as events. Thus, we need to consider a family of sets, called Borel sets.

Definition 1.4.5

Let Ω be a sample space, and consider a family (or a collection) of subsets of Ω satisfying the following axioms:

Axiom B1.,
Axiom B2. If , then , that is, is closed under complement, and
Axiom B3. If is a finite or countable family of subsets of Ω in , then also , that is, is closed under the union of at most countably many of its members, called the class of events. The class of events satisfying Axioms B1–B3 is called a Borel field or a σ-field (read as sigma-field; it is a σ-algebra).
Example 1.4.3

The family of events satisfies axioms B1–B3 stated in Definition 1.4.5 and hence is a Borel field. In fact, this is the smallest family that satisfies the axioms. This family is called the trivial Borel field.

Example 1.4.4

Let A be a nonempty set such that ϕ ⊂ A ⊂ Ω. Thus, {ϕ, A, Ac, Ω} is the smallest Borel field that contains A.

Definition 1.4.6

The smallest Borel field of subsets of the real line that contains all intervals (−∞, ω) is called the collection of Borel sets.

Notes:
1. A subset of the real line is a Borel set if and only if it belongs to the Borel field mentioned in Definition 1.4.5.
2. Since intersection of σ-algebras is again a σ-algebra, the Borel sets are intersection of all σ-algebras containing the collection of open sets in .
3. We may define Borel sets as follows: let be a topological space. The smallest σ-algebra containing the open sets in is then called the collection of Borel sets in X, if such a collection exists.
Definition 1.4.7

Let X be a set and be a Borel set. The ordered pair is then called a measurable space.

Definition 1.4.8

Let Ω be a sample space. Also let P(·) be a nonnegative function defined on a Borel set . The function P(·) is then called a probability measure, if and only if the following axioms M1–M3 are satisfied:

Axiom M1.P(Ω) = 1.
Axiom M2.P(A) ≥ 0, .
Axiom M3. If is a finite or countable sequence of mutually dis­joint sets (i.e., Ai ∩ Aj = ϕ, for each i ≠ j) in , then , the countably additive axiom.
Example 1.4.5

Let Ω be a countable set and the set of all subsets of Ω. Next, let , where p(ω) ≥ 0 and . The function P(·), then, is a probability measure.

Definition 1.4.9

The triplet , where Ω is a sample space, is a Borel set, and is a probability measure, is called the probability space.

Note that a probability space is a measure space with a probability measure.

Definition 1.4.10

A measure λ on the real line such that λ((a, b]) = b − a, ∀a < b, and , is called a Lebesgue measure.

Example 1.4.6

The Lebesgue measure of the interval [0, 1] is its length, that is, 1.

Note that a particularly important example is the Lebesgue measure on a Euclidean space, which assigns the conventional length, area, and volume of Euclidean geometry to suitable subsets of the n-dimensional Euclidean space .

We now return to the discussion of a continuous random variable.

If A1, A2, A3, … is a sequence of mutually exclusive events represented as intervals of and P(Ai) is the probability of the event Ai, i = 1, 2, … , then, by the third axiom of probability (countable additivity), we will have:

(1.4.1)

For a random variable, X, defined on a continuous sample space, Ω, the probability associated with the sample points for which the values of X falls on the interval [a, b] is denoted by P(a ≤ X ≤ b).

Definition 1.4.11

Suppose the function f(x) is defined on the set of real numbers, , such that f(x) ≥ 0 for all real x, and . Then, f(x) is called a continuous probability density function (pdf) (or just density function) on and is denoted by fX(x). If X is a random variable whose probability is described by a continuous pdf as:

(1.4.2)

then X is called a continuous random variable. The probability distribution function of X, denoted by FX(x), is defined as:

(1.4.3)

Notes:
1. There is a significant difference between a discrete pdf and a continuous pdf. For a discrete pdf, fX(x) = P(X = x) is a probability, while for a continuous pdf, fX(x) is not a probability. The best we can say is that fX(x)dx ≈ P(x ≤ X ≤ x + dx) for infinitesimally small dx.
2. If there is no fear of confusion, we will suppress the subscript “X” from fX(x) and FX(x) and write f(x) and F(x), respectively.

As can be seen from (1.4.3), the distribution function can be described as the area under the graph of the density function.

Note from (1.4.2) and (1.4.3) that if a = b = x, then:

(1.4.4)

What (1.4.4) says is that if X is a continuous random variable, then the probability of any given point is zero. That is, for a continuous random variable to have a positive probability, we have to choose an interval.

Notes:
(a) From (1.4.4), we will have:

(1.4.5)

(b) Since, by the fundamental theorem of integral calculus, fX(x) = dFX(x)/dx, the density function of a continuous random variable can be obtained as the derivative of the distribution function, that is, . Conversely, the cumulative distribution function can be recovered from the probability density function via (1.4.3).
(c)FX(x) collects all the probabilities of values of X up to and including x. Thus, it is the cumulative distribution function (cdf) of X.
(d) For x1 < x, the intervals (−∞, x1] and (x1, x] are disjoint and their union is (−∞, x]. Hence, from (1.4.2) and (1.4.3), if a ≤ b, then:

(1.4.6)

(e) It can easily be verified that FX(−∞) = 0 and FX(∞) = 1 (why?).
(f) As can be seen from (1.4.3), the concept and definition of the cdf apply to both discrete and continuous random variables. If the random variable is discrete, FX(x) is a sum of the pmf values. However, if the random variable is continuous, the sum becomes a limit and eventually an integral of the density function. The most obvious difference between the cdf for continuous and discrete random variables is that FX is a continuous function if X is continuous, while it is a step function if X is discrete.

1.5. Moments of a Continuous Random Variable

As part of the properties of continuous random variables, we now discuss continuous moments. Before doing that, we note that when the variable of integration in an integral is replaced with a function, say F(x), the integral becomes a Stieltjes integral that looks as , introduced by Stieltjes in the late nineteenth century. If F(x) is a continuous function and its derivative is denoted by f(x), then the Lebesgue–Stieltjes integral becomes . Henri Lebesgue was a French mathematician (1875–1941) and Thomas Joannes Stieltjes was a Dutch astronomer and mathematician (1856–1899).

Definition 1.5.1

Let X be a continuous random variable defined on the probability space with pdf fX(x). The mathematical expectation or simply expectation of X, or expected value of X, or the mean of X or the first moment of X, denoted by E(X), is then defined as the Lebesgue–Stieltjes integral:

(1.5.1)

provided that the integral exists. For an arbitrary (continuous measurable) function of X, say g(X), where X is a bounded random variable with continuous pdf fX(x), formula (1.5.1) becomes:

(1.5.2)

provided that the integral converges absolutely.

Definition 1.5.2

The kth moment of the continuous random variable X with pdf fX(x), denoted by E[Xk], k = 1, 2, … , is defined as:

(1.5.3)

provided that the integral exists, that is,

(1.5.4)

In other words, from (1.5.3) and (1.5.4), the kth moment of X exists if and only if the kth absolute moment of X, E(|X|k), is finite.

Notes:
(1) It can be shown that if the kth (k = 1, 2, …) moment of a random variable (discrete or continuous) exists, then so do all the lower-order moments.
(2) Among standard known continuous distributions, Cauchy probability distribution with density function:

(1.5.5)

is the only one whose kth moments, k = 1, 2, … , do not exist for even values of k, and exist for odd values of k only in the sense that the Cauchy principal values of the integrals exist and are equal to zero.

Several examples will be given in the next section.

1.6. Continuous Probability Distribution Functions

As in the discrete case, we now list a selected number of continuous probability distributions that we may be using in this book.

Definition 1.6.1

A continuous random variable X that has the probability density function:

(1.6.1)

has a uniform distribution over an interval [a, b].

It is left for the reader to show that (1.6.1) defines a probability density function and that the uniform distribution function of X is given by:

(1.6.2)

Note that since the graphs of the uniform density and distribution functions have rectangular shapes, they are sometimes referred to as the rectangular density function and rectangular distribution function, respectively.

Example 1.6.1

Suppose X is distributed uniformly over [0, 10]. We want to find P(3 < X ≤ 7). Thus:

Example 1.6.2

Suppose a counter registers events according to a Poisson distribution with rate 4. The counter begins at 8:00 a.m. and registers 1 event in 30 minutes. What is the probability that the event occurred by 8:20 a.m.?

Answer

We will restate the problem symbolically and then substitute the values of the parameters to answer the question. We have a Poisson distribution with rate λ and exactly 1 event registered in τ minutes. We are to find the distribution of the time of occurrence of that event. Hence, we let N(t) be the number of events registered from the start to time t, and we let T1 be the time of occurrence of the first event. Next, using properties of conditional probability and the Poisson distribution, the cdf is:

Therefore, T1 is a uniform random variable in (0, τ). Hence, the first count happened before 8:20 a.m. with probability (20/30) = 66.67%.
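Both uniform examples can be checked with the cdf (1.6.2); this is our own Python sketch, with `uniform_cdf` as a helper name:

```python
# Uniform cdf on [a, b]: F(x) = (x - a)/(b - a) inside the interval.
def uniform_cdf(x, a, b):
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# Example 1.6.1: X uniform on [0, 10], P(3 < X <= 7) = F(7) - F(3)
print(round(uniform_cdf(7, 0, 10) - uniform_cdf(3, 0, 10), 4))  # 0.4

# Example 1.6.2: T1 uniform on (0, 30), P(T1 <= 20)
print(round(uniform_cdf(20, 0, 30), 4))  # 0.6667, i.e., 66.67%
```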

Definition 1.6.2

A continuous random variable X with pdf

(1.6.3)

and cdf

(1.6.4)

is called a negative exponential (or exponential) random variable. Relations (1.6.3) and (1.6.4) are called the exponential density function and the exponential distribution function, respectively. μ is called the parameter of the pdf and cdf. See Figure 1.6.1 and Figure 1.6.2.

We note that the expected value of exponential distribution is the reciprocal of its parameter. This is because from (1.6.3) we have:

See also Figure 1.6.1 and Figure 1.6.2.

Figure 1.6.1. Exponential pdf with different values for its parameters.

Figure 1.6.2. Exponential cdf as area under pdf with different intervals of t.

Example 1.6.3

Suppose it is known that the lifetime of a light bulb has an exponential distribution with parameter 1/250. We want to find the probability that the bulb works (a) for more than 300 hours, and (b) for more than 300 hours given that it has already worked for 200 hours.

Answer

Let X be the random variable representing the lifetime of a bulb. From (1.6.3) and (1.6.4), we then have:

Therefore,

(a)P(X > 300) = e−1.2 = 0.3012, and
(b)

See Figure 1.6.3.

Figure 1.6.3. Exponential probability for x > 300.
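Both parts of Example 1.6.3 can be sketched numerically from the survival probability P(X > t) = e^(−μt) implied by (1.6.4); this is our own Python check, with `survival` as a helper name:

```python
from math import exp

# Lifetime X ~ exponential with parameter mu = 1/250.
mu = 1 / 250

def survival(t):
    # P(X > t) = exp(-mu * t)
    return exp(-mu * t)

print(round(survival(300), 4))                  # e^-1.2 ≈ 0.3012, part (a)
# Part (b): P(X > 300 | X > 200) = P(X > 100), the memoryless property.
print(round(survival(300) / survival(200), 4))  # e^-0.4 ≈ 0.6703
```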

Definition 1.6.3

A continuous random variable, X, with probability density function f(x) defined as:

(1.6.5)

where Γ(α) is defined by:

(1.6.6)

where μ is a positive number, is called a gamma random variable with parameters μ and α. The corresponding distribution, called the gamma distribution function, will therefore be:

(1.6.7)

Definition 1.6.4

In (1.6.5), if α is a positive integer, say k, then the distribution obtained is called the Erlang distribution of order k, denoted by Ek(μ; x), that is,

(1.6.8)

The pdf in this case, denoted by ek(μ; x), will be:

(1.6.9)

Notes:
(a) We leave it as an exercise to show that fX(x; μ, α), given by (1.6.5), indeed defines a probability density function.
(b) The parameter μ in (1.6.8) is called the scale parameter, since values other than 1 either stretch or compress the pdf in the x-direction.
(c) In (1.6.5), if α = 1, we obtain the exponential density function with parameter μ defined by (1.6.3).
(d) Γ(α) is a positive function of α.
(e) If α is a natural number, say α = n, then we leave it as an exercise to show that:

(1.6.10)

where n! is defined by:

(1.6.11)

(f) We leave it as an exercise to show that from (1.6.5) and (1.6.10), one obtains:

(1.6.12)

(g) Because of (1.6.10), the gamma function defined in (1.6.6) is called the generalized factorial.
(h) We leave it as an exercise to show that using double integration and polar coordinates, we obtain:

(1.6.13)

(i) We leave it as an exercise to show that using (1.6.13), one can obtain the following:

(1.6.14)

(j) Using (1.6.6), we denote the integral , x > 0, by Γ(α, x), that is,

(1.6.15)

The integral in (1.6.15), which is not an elementary integral, is called the incomplete gamma function.
(k) The parameter α in (1.6.6) could be a complex number whose real part must be positive for the integral to converge. There are tables available for values of Γ(α, x), defined by (1.6.15). As x approaches infinity, the integral in (1.6.15) becomes the gamma function defined by (1.6.6).
(l) If k = 1, then (1.6.8) reduces to the exponential distribution function (1.6.4). In other words, the exponential distribution function is a special case of the gamma and Erlang distributions.
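The Erlang cdf has a well-known closed form, Ek(μ; x) = 1 − Σ_{j=0}^{k−1} e^(−μx)(μx)^j/j!, consistent with (1.6.8); the Python sketch below (our own helper name) also illustrates note (l):

```python
from math import exp, factorial

# Erlang-k cdf: 1 minus the probability of fewer than k Poisson events
# of rate mu in time x.
def erlang_cdf(k, mu, x):
    return 1 - sum(exp(-mu * x) * (mu * x)**j / factorial(j)
                   for j in range(k))

# k = 1 reduces to the exponential cdf 1 - e^(-mu x), as note (l) states.
print(round(erlang_cdf(1, 1 / 250, 300), 4))  # 1 - e^-1.2 ≈ 0.6988
print(round(erlang_cdf(3, 0.5, 10), 4))       # an order-3 Erlang value
```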
Definition 1.6.5

In (1.6.5), if α = r/2, where r is a positive integer, and if μ = 1/2, then the random variable X is called the chi-square random variable with r degrees of freedom, denoted by χ2(r). The pdf and cdf in this case with shape parameter r are:

(1.6.16)

and

(1.6.17)

respectively, where Γ(r/2) is the gamma function with parameter r/2 defined in (1.6.6).

Notes:
(a) Due to the importance of the χ2 distribution, tables are available for values of the distribution function (1.6.17) for selected values of r and x.
(b) We leave it as an exercise to show the following properties of a χ2 random variable: mean = r, and variance = 2r.

The next distribution is widely used in many areas of research where statistical analysis is employed; it is particularly central in statistics.

Definition 1.6.6

A continuous random variable X with pdf denoted by f(x; μ, σ2) with two real parameters μ, −∞ < μ < ∞, and σ2, σ > 0, where:

(1.6.18)

is called a Gaussian or normal random variable.

The notation ∼ is read “is distributed as.” The letter N and the character Φ are used for the normal cumulative distribution. Hence,

(1.6.19)

is to show that the random variable X has a normal cumulative probability distribution with parameters μ and σ2. The normal pdf has a bell-shaped curve and is symmetric about the line x = μ (see Figure 1.6.4). The normal pdf is also asymptotic; that is, the tails of the curve on both sides get very close to the horizontal axis but never touch it. Later, we will see that μ is the mean and σ2 is the variance of the normal distribution function. The smaller the variance, the narrower the shape of the “bell”; that is, the data points are clustered around the mean (i.e., the peak).

Figure 1.6.4. Normal pdf with different values for its parameters.

We leave it as an exercise to show that f(x; μ, σ2), defined in (1.6.18), is indeed a pdf.

Definition 1.6.7

A continuous random variable Z with μ = 0 and σ2 = 1 is called a standard normal random variable. From (1.6.19), the cdf of Z, P(Z ≤ z), is denoted by Φ(z). The notation N(0, 1) or Φ(0, 1) is used to show that a random variable has a standard normal distribution function, which means it has the parameters 0 and 1. The pdf of Z, denoted by ϕ(z), therefore, will be:

(1.6.20)

Note that any normally distributed random variable X with parameters μ and σ > 0 can be standardized using a substitution:

(1.6.21)

The cdf of Z is:

(1.6.22)

which is the integral of the pdf defined in (1.6.20). We leave it as an exercise to show that ϕ(x) defined in (1.6.20) is a pdf.

Note that the cumulative distribution function of a normal random variable with parameters μ and σ2, F(x; μ, σ2), whose pdf was given in (1.6.18), may be obtained by (1.6.22) as:

(1.6.23)

A practical way of finding normal probabilities is to compute the value of z from (1.6.21) and then use available tables of values of the area under the curve of ϕ(z).
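In place of a table, Φ(z) can be evaluated with the error function, since Φ(z) = (1 + erf(z/√2))/2; this is our own Python sketch of the standardization (1.6.21)–(1.6.22):

```python
from math import erf, sqrt

# Standard normal cdf via the error function.
def phi(z):
    return (1 + erf(z / sqrt(2))) / 2

def normal_cdf(x, mu, sigma):
    # Substitution (1.6.21): z = (x - mu) / sigma
    return phi((x - mu) / sigma)

print(round(phi(0.0), 4))                  # 0.5, by symmetry
print(round(phi(1.96), 4))                 # ≈ 0.975, a familiar table value
print(round(normal_cdf(110, 100, 15), 4))  # P(X <= 110) for N(100, 15^2)
```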

Figure 1.6.4 shows a graph of normal pdf for different values of its parameters. It shows the standard normal as well as the shifted mean and variety of values for standard deviation. Figure 1.6.5 shows a graph of normal cdf as area under pdf curve with different intervals.

Figure 1.6.5. Normal cdf as area under pdf curve with different intervals.

Definition 1.6.8

A continuous random variable X, on the positive real line, is called a Galton or lognormal