Phase type distributions are widely applicable modeling and statistical tools for non-negative random quantities. They are built on Markov chains, which provide a simple, intuitive stochastic interpretation for their use.
Phase Type Distribution starts from the Markov chain-based definition of phase type distributions and presents many interesting properties, which follow from the basic definition.
As a general family of non-negative distributions with nice analytical properties, phase type distributions can be used for approximating experimental distributions by fitting or by moments matching; and, for discrete event simulation of real world systems with stochastic timing, such as production systems, service operations, communication networks, etc. This book summarizes the up-to-date fitting, matching and simulation methods, and presents the limits of flexibility of phase type distributions of a given order.
Additionally, this book presents numerical examples that support the intuitive understanding of the analytical descriptions, as well as software tools that handle phase type distributions.
Stochastic Models in Computer Science and Telecommunication Networks Set
coordinated by Bruno Sericola
Volume 2
András Horváth
Miklós Telek
First published 2024 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK
www.iste.co.uk
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
www.wiley.com
© ISTE Ltd 2024
The rights of András Horváth and Miklós Telek to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of ISTE Group.
Library of Congress Control Number: 2024941676
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-84821-945-8
During the development of the theory of stochastic processes, many specific models were introduced for describing different real life phenomena. With respect to the contents of this book, the introduction of the Poisson process by A. K. Erlang to model phone calls occurring in a period of time played a seminal role [BRO 60a]. A. K. Erlang also recognized that the inter-arrival time of consecutive call arrivals is exponentially distributed.
Starting from the first specific stochastic processes, various generalizations were introduced to extend their modeling power. One of the simplest analytically tractable general stochastic models, which has become very popular in many application fields, is the class of Markov chains. In continuous time, a characteristic property of Markov chains is that, at any point in time, the next event occurs in an exponentially distributed amount of time, while in discrete time, the next event occurs in a geometrically distributed number of steps. Both of these distributions enjoy the memoryless property, which means that, at any point in time, the remaining time to the next event is independent of the time elapsed since the last one.
Markov chains are very efficient modeling tools for real life processes in which the inter-event times are exponentially (geometrically) distributed, and, as observed by A. K. Erlang in the case of phone call arrivals, many practically important real systems enjoy this property. Unfortunately, it does not apply to all real systems. In spite of this, due to the theoretical simplicity and computational tractability of Markov chains, in many cases, real life processes are modeled by Markov chains even when the inter-event times of the real process are significantly different from being exponentially distributed. One reason for this is that the stochastic behavior of systems with non-exponentially (non-geometrically) distributed inter-event times is often too complex to analyze.
In order to have models that provide both the analytical tractability of Markov chains and the possibility to describe non-exponentially distributed inter-event times, simple stochastic models have been introduced using the exponential distribution as a building block. It was recognized already by A. K. Erlang [BRO 60b] that the sum of two independent exponentially distributed random variables has a distribution that is different from exponential. A generalization of this observation leads to applying serial and parallel compositions of exponential random variables to approximate non-exponential inter-event time distributions. This approach is referred to as the method of phases in [KLE 75], where its applicability was questioned due to the lack of a general theorem unifying the use of special (e.g. serial and parallel) structures.
The general theory of such distributions, which exploits the whole flexibility associated with a Markov chain of a finite number of states (referred to as phases), was introduced by M. Neuts in [NEU 75], and the associated distributions became widely referred to as continuous phase type (PH) distributions.
The introduction of PH distributions resulted in fertile research in the stochastic modeling community. Many essential questions have been investigated in order to understand the applicability of PH distributions regarding both:
basic properties of PH distributions, such as,
- what are the density function and the moments of a PH distribution;
- whether the Markov chain based description of a PH distribution is unique;
- how many parameters identify a PH distribution of n phases;
their use for approximation of a general non-negative distribution, for example,
- are there efficient procedures to find a PH distribution that closely approximates the general one;
- what are the appropriate measures of “goodness” of the approximation.
The complete summary of the literature on PH distributions in a single book is an infeasible task. In this book, we provide a large portion of the obtained results focusing mostly on the application of PH distributions in approximating general non-negative distributions. To this end, we collect qualitative properties of PH distributions that imply limitations in distribution fitting. For example, we present bounds on the moments of PH distributions, the behavior of the tail of PH distributions, etc. These qualitative properties help to set proper expectations about the quality of distribution approximation. For example, if the moments of a distribution are out of the moment bounds of the considered class of PH distributions, then accurate distribution approximation is infeasible even with the best approximation procedure.
We discuss two distribution approximation approaches which are referred to as matching and fitting. Matching takes a set of characterizing parameters (e.g. a set of moments) and composes a PH distribution with exactly the same parameters, if possible. If the selected parameters properly capture the distribution, then the approximation has good precision, but it is still not exact in general.
Fitting is instead an optimization problem. It is based on a distance measure between two distributions and applies an optimization procedure to set the parameters of the PH distribution such that the distance measure is minimized.
Apart from these main lines of the book, we point out various extensions which improve the applicability of PH distributions in different fields. As an example, we discuss the efficient simulation of PH distributions.
Besides the favorable properties of PH distributions in distribution approximation, another essential ingredient of their success and popularity is a set of efficient computational procedures to evaluate the behavior of stochastic models which contain PH distributions. A part of these procedures is referred to as matrix analytic methods [NEU 81c]. The treatment of these methods is out of the scope of this book.
After the introduction, in Chapter 1, we collect most of the mathematical background which we use in later chapters. Readers who are familiar with the analysis of Markov chains might skip this chapter and return only when a piece of background information becomes of interest where it is used.
Chapters 2 and 3 introduce continuous time and discrete time PH distributions, respectively, and investigate their basic properties, like the distribution function, moments, etc. Additionally, many other useful properties are studied that characterize the flexibility of these classes of PH distributions.
Chapter 4 gives an outlook toward more general classes of distributions that have similar applicability as PH distributions but lack a stochastic interpretation based on the associated Markov chains.
The application of PH distributions started with special combinations of exponentially distributed random variables. Still today, certain subclasses of PH distributions play an important role in many applications. Chapter 5 summarizes the most frequently used subclasses and their properties.
Chapters 6 and 7 address the approximation of positive distributions with PH distributions. The first one discusses the matching approaches while the second one discusses the fitting methods and their respective properties.
When PH distributions are used in complex stochastic models whose complexity requires discrete event simulation-based evaluation, the efficient simulation of PH distributions is of essential importance. Chapter 8 is devoted to this subject.
The analytical tractability and flexibility of PH distributions resulted in a line of research where various functions of PH-distributed random variables are applied to approximate distributions with special properties. The final chapter, Chapter 9, summarizes the related results.
Along the text, many numerical examples and related figures help the intuitive understanding of the presented materials. For numerical experimentation with PH distributions, readers can use many program packages that deal with PH distributions in general, for example, BuTools [BUT 15], SMCSolver [BIN 06] and KPC-Toolbox [CAS 08]. There are also a large number of more specific software tools to perform PH distribution fitting tasks, for example, HyperStar [REI 13] and PH-fit [HOR 02]. A short description of some tools is given in Appendix 1.
The previous literature on PH distributions also contains excellent textbooks that overview many properties of PH distributions. Some of these textbooks restrict their attention to specific PH distribution related matters, for example, [BLA 17], which deals with matrix exponential distributions, and many other textbooks, in contrast to the present one, also consider the stochastic processes built from PH distributions (e.g. [NEU 81c], [BUC 14], [HE 14], [LAK 19]).
The basic properties of PH distributions are well summarized in each of these textbooks and we provide them also here for the sake of completeness. For what concerns the more involved properties, we present many specific features of PH distributions which have not been described in previous textbooks, for example, the eigenvalue structure in sections 2.8 and 3.7, the steepest increase property in section 2.9 and the properties of moments in Chapter 6.
In this section, we collect some basic mathematical concepts. We do not intend to provide an exhaustive description of these concepts but to collect only the properties which we will use later on.
If X is a random variable (RV) on a subset of ℝ, then its cumulative distribution function (CDF) is defined as FX(x) = P(X < x). FX(x) is a monotone increasing function.
If FX(x) is continuous, then X is said to be a continuous RV. A continuous RV is characterized by its probability density function (PDF) which is defined as:
Further properties of a continuous RV are:
where the squared coefficient of variation (SCV) is a normalized version of the variance and the remaining lifetime distribution describes the distribution of X − τ (how much X is greater than τ) if X is larger than τ.
If FX(x) is piecewise constant with positive jumps between the constant intervals, then X is said to be a discrete RV. A discrete random variable is characterized by its probability mass function (PMF), which describes the points, xi, where FX(x) has a jump and the probabilities, pi, which are the sizes of the jumps at xi. That is P(X = xi) = pi. Based on the PMF, the moments of X are obtained as:
The variance and the SCV are obtained from mk in the same way as for continuous RVs.
If FX(x) is neither continuous nor piecewise constant, then X is said to be a mixed RV and its analysis is based on its CDF.
For a (non-defective) random variable, m0 = 1. Besides the mean m1, the most often used parameters related to the moments are the variance and the SCV [1.1].
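To make these definitions concrete, here is a minimal Python sketch (the PMF values and variable names are illustrative assumptions, not an example taken from the text) that computes raw moments, the variance and the SCV of a small discrete RV:

```python
import numpy as np

# Hypothetical discrete RV: values x_i with probabilities p_i
x = np.array([1.0, 2.0, 5.0])
p = np.array([0.5, 0.3, 0.2])
assert np.isclose(p.sum(), 1.0)      # m_0 = 1 for a non-defective RV

def moment(k):
    """k-th raw moment m_k = sum_i p_i x_i^k."""
    return np.sum(p * x**k)

m1 = moment(1)                       # mean
var = moment(2) - m1**2              # variance
scv = var / m1**2                    # squared coefficient of variation

print(m1, var, scv)
```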
In this work, we use several variants of the moments for notational convenience:
the central moments
the factorial moments
the reduced moments
the normalized moments
There are some essential relations among moments [STI 94]. For any random variable:
for k ≥ 0. The matrix composed of the moments according to [1.6] is referred to as the Hankel matrix. For k = 1, we have m0m2 − m1^2 ≥ 0, that is, the variance is non-negative. Also, for any non-negative random variable:
for k ≥ 1.
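The moment variants and the k = 1 Hankel condition can be illustrated numerically. In the sketch below, the reduced moments are taken as mk/k! and the normalized moments as mk/(mk−1·m1); these formulas are assumptions made for the sketch (the displayed definitions are not reproduced above), and the exponential test case is arbitrary:

```python
import math
import numpy as np

# Raw moments m_0, m_1, ... of a non-negative RV; here an exponential with rate 2, m_k = k!/2^k
lam = 2.0
m = [math.factorial(k) / lam**k for k in range(5)]          # m[0] = 1

mean = m[1]
# central moments via the binomial expansion of E[(X - m_1)^k]
central = [sum(math.comb(k, j) * m[j] * (-mean)**(k - j) for j in range(k + 1)) for k in range(5)]
reduced = [m[k] / math.factorial(k) for k in range(5)]              # assumed: m_k / k!
normalized = [m[k] / (m[k - 1] * m[1]) for k in range(1, 5)]        # assumed: m_k / (m_{k-1} m_1)

# k = 1 case of the Hankel condition: det [[m_0, m_1], [m_1, m_2]] = m_0 m_2 - m_1^2 >= 0,
# i.e. the variance is non-negative
hankel1 = np.array([[m[0], m[1]], [m[1], m[2]]])
print(central[2], np.linalg.det(hankel1))    # both equal the variance 1/lam^2 = 0.25
```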
The Laplace transform (LT) of function f is defined as:
If the integral of |f(t)| over (0, ∞) is finite, then the integral in [1.8] converges for Re(s) ≥ 0 (where Re(s) denotes the real part of s) because, for Re(s) ≥ 0, we have:
The LT of a non-negative RV X is defined based on its PDF and denoted as:
The LT can be efficiently used in the analysis of non-negative RVs, for example (a short computational sketch follows this list):
- the nth moment of X can be obtained as the nth derivative of the LT at s = 0 multiplied by (−1)n, because:
- when X and Y are independent RVs, the LT of X + Y is the product of the individual LTs:
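Both properties can be checked symbolically. The sketch below uses an exponential RV with an arbitrarily chosen rate; sympy is assumed to be available (it is not a tool referenced in the text):

```python
import sympy as sp

s, t = sp.symbols('s t', positive=True)
lam = sp.Rational(3, 2)                       # rate of a hypothetical exponential RV X

pdf = lam * sp.exp(-lam * t)                  # PDF of X
lt = sp.simplify(sp.integrate(sp.exp(-s * t) * pdf, (t, 0, sp.oo)))   # LT: lam / (lam + s)

# n-th moment from the LT: E[X^n] = (-1)^n * n-th derivative of the LT at s = 0
for n in range(1, 4):
    mom = (-1)**n * sp.diff(lt, s, n).subs(s, 0)
    assert sp.simplify(mom - sp.factorial(n) / lam**n) == 0          # equals n!/lam^n

# LT of the sum of two independent copies of X is the product of the LTs (an Erlang(2, lam) LT)
print(sp.simplify(lt * lt))
```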
The z transform of the sequence {a0, a1, …} is defined as:
If Σi |ai| is finite, then the summation in [1.12] converges for |z| ≤ 1.
If T is a RV with support on the set of natural numbers with PMF ak = P(T = k), then its z transform is defined as:
Analogously to the LT, the z transform can be used for the analysis of discrete RVs (a short computational sketch follows this list):
- the nth factorial moment of T can be computed as the nth derivative of the z transform at z = 1:
- when X and Y are independent RVs, the z transform of Z = X + Y is the product of the individual z transforms:
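A matching symbolic sketch for the z transform, with an arbitrary PMF on {0, 1, 2, 3} (again relying on sympy as an assumed tool):

```python
import sympy as sp

z = sp.symbols('z')

# Hypothetical PMF of a discrete RV T on {0, 1, 2, 3}
a = [sp.Rational(1, 8), sp.Rational(3, 8), sp.Rational(3, 8), sp.Rational(1, 8)]
T_z = sum(a_k * z**k for k, a_k in enumerate(a))          # z transform of the PMF

for n in range(1, 4):
    # n-th factorial moment as the n-th derivative of T(z) at z = 1 ...
    from_transform = sp.diff(T_z, z, n).subs(z, 1)
    # ... checked against the definition E[T(T-1)...(T-n+1)] (ff is the falling factorial)
    direct = sum(a_k * sp.ff(k, n) for k, a_k in enumerate(a))
    assert from_transform == direct

# z transform of the sum of two independent copies is the product of the transforms
print(sp.expand(T_z * T_z))
```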
Matrix functions of square matrices are defined by the series expansion of the function. If Q is a square matrix and F is an analytic function with a series expansion, then:
The most often used matrix function eQt is defined through its series expansion as:
The matrix inverse of square matrices can be introduced and computed in several different ways. We use the relation that element (i, j) of the inverse of matrix M is:
where det(M) is the determinant of matrix M and detij(M) denotes the determinant of the matrix obtained by removing row i and column j from matrix M, multiplied by (−1)^(i+j).
The eigenvalues of a real square matrix Q of size n × n are the roots of the degree n characteristic polynomial CQ(x) = det(xI − Q), where I is the identity matrix.
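The following numeric sketch checks both statements on an arbitrary 3 × 3 matrix: the cofactor-based inverse (written with the standard adjugate index convention, which is assumed to match [1.15]) and the eigenvalues as the roots of the characteristic polynomial:

```python
import numpy as np

M = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
n = M.shape[0]

def minor_det(A, i, j):
    """Determinant of A with row i and column j removed."""
    return np.linalg.det(np.delete(np.delete(A, i, axis=0), j, axis=1))

# element (i, j) of M^{-1} as (-1)^(i+j) det_{ji}(M) / det(M)
inv_cof = np.array([[(-1)**(i + j) * minor_det(M, j, i) for j in range(n)]
                    for i in range(n)]) / np.linalg.det(M)
assert np.allclose(inv_cof, np.linalg.inv(M))

# eigenvalues as the roots of det(xI - M)
char_poly = np.poly(M)            # coefficients of the characteristic polynomial
assert np.allclose(np.sort(np.roots(char_poly)), np.sort(np.linalg.eigvals(M)))
```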
For matrix M of size n × n, row vector β = (b0 b1 … bn−1) and column vector en = (0 0 … 0 1)T of size n, we have:
where matrix M is:
Proof. The characteristic polynomial of matrix M is CM(x) = det(xI − M) = x^n + a_{n−1}x^{n−1} + … + a_1x + a_0.
Applying [1.15] for element (i, n) of (xI – M)−1 for 1 ≤ i ≤ n – 1, we have:
where A of size (i – 1) × (i – 1) and B of size (n – i) × (n – i) are:
Finally, multiplying (xI – M)−1 by the row vector β from the left and by the column vector en from the right, we obtain the statement of the lemma.
For a real square matrix Q of size n × n, the spectral decomposition (also referred to as the Jordan decomposition) is a factorization:
where Ʌ is either a diagonal matrix with potentially complex diagonal elements containing the eigenvalues (in this case Q is referred to as diagonalizable), or Ʌ is a block diagonal matrix, referred to as the Jordan matrix, in the form:
whose diagonal blocks are called Jordan blocks. The size u × u Jordan block associated with eigenvalue λ is:
If all eigenvalues of a real valued square matrix are non-zero, then the matrix is referred to as non-singular or full rank and its inverse exists.
If Q is diagonalizable, then:
where λℓ is the ℓth diagonal element of Ʌ referred to as its ℓth eigenvalue, uℓ is the ℓth column of B−1 referred to as its ℓth right eigenvector, and υℓ is the ℓth row of B referred to as its ℓth left eigenvector. The spectral decomposition is not unique, for example, a permutation of λℓ, uℓ, υℓ results in the same matrix Q. Also the multiplication of uℓ and the division of υℓ by the same non-zero constant leaves the results invariant. When Q is real, the complex eigenvalues appear in complex conjugate pairs and the associated left and right eigenvectors are complex conjugates as well. The eigenvectors associated with real eigenvalues are real (or can be transformed to be real by the multiplication of uℓ and the division of υℓ by the same non-zero constant).
From BB−1 = I, it also follows for the rows of B and the columns of B−1 that υkuℓ = δkℓ where δkℓ is the Kronecker delta (δkℓ = 1 if k = ℓ and zero otherwise). This implies that the eigenvectors associated with different eigenvalues are orthogonal.
The representation of a diagonalizable matrix is also referred to as dyad decomposition because uℓυℓ is a dyadic matrix. The matrix functions of diagonalizable matrices are easy to obtain via dyad decomposition as:
This is because, assuming the series expansion of F(x) introduced above, we can write:
If Q is non-diagonalizable, then the dyad decomposition cannot be used directly. In those cases, the powers of the Jordan blocks play a role in the analysis of matrix functions. For example, the kth power of a Jordan block with eigenvalue λ is:
where (k)+ denotes max(k, 0).
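To illustrate the matrix exponential and the dyad decomposition together, the sketch below computes eQt for an arbitrary diagonalizable matrix (a small CTMC generator chosen for the example) in three ways: by the truncated power series, by a library routine, and by summing the dyads of the eigendecomposition:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical diagonalizable matrix (a small generator with eigenvalues 0, -5, -6)
Q = np.array([[-3.0, 2.0, 1.0],
              [1.0, -4.0, 3.0],
              [2.0, 2.0, -4.0]])
t = 0.7

# (1) truncated power series e^{Qt} ~ sum_{k=0}^{K} (Qt)^k / k!
series, term = np.eye(3), np.eye(3)
for k in range(1, 61):
    term = term @ (Q * t) / k
    series += term

# (2) library routine
direct = expm(Q * t)

# (3) dyad decomposition: e^{Qt} = sum_l e^{lambda_l t} u_l v_l with v_k u_l = delta_kl
lam, U = np.linalg.eig(Q)     # columns of U are right eigenvectors u_l
V = np.linalg.inv(U)          # rows of V are the matching left eigenvectors v_l
dyad = sum(np.exp(lam[l] * t) * np.outer(U[:, l], V[l, :]) for l in range(3))

assert np.allclose(series, direct) and np.allclose(dyad, direct)
```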
The following theorems provide relations between matrices with identical Jordan matrices or matrix blocks.
If the Jordan matrices of matrices R and S (both of size n × n) are identical, that is, with appropriately ordered eigenvalues the Jordan decompositions of R and S are R = G−1ɅG and S = H−1ɅH, then there exists a non-singular matrix W such that R = W−1SW.
Proof. With W = H−1G, we have:
If the Jordan matrix of matrix R of size n × n is ɅR and, with appropriately ordered eigenvalues, the Jordan matrix of matrix S of size m × m with m > n is:
then there exists a matrix W of size n × m such that RW = WS.
Proof. The block structure of the Jordan decomposition of S is:
where U1 is of size m × n and the rest of the block dimensions follow. From HH−1 = I, we have V1U1 = I and V1U2 = 0. Using this block structure, we have:
and
That is, W = G−1V1.
In the case of diagonalizable matrices, theorem 1.3 implies that the eigenvalues of matrix R and matrix S are identical, while theorem 1.4 means that the eigenvalues of matrix R form a subset of the eigenvalues of matrix S.
The Jordan decomposition Q = B−1ɅB is referred to as normalized if B𝟙 = 𝟙, where 𝟙 denotes the column vector of ones.
Not all real matrices have a normalized Jordan decomposition, but the generator matrices associated with PH distributions have a normalized Jordan decomposition, as shown in [TEL 07, theorem 5]. We note that multiplying B𝟙 = 𝟙 with B−1 from the left implies B−1𝟙 = 𝟙.
If the Jordan decompositions of R and S are normalized in theorem 1.3, then the transformation matrix W provided in the proof of the theorem satisfies W𝟙 = 𝟙.
Proof. If the Jordan decompositions of R and S are normalized, then G𝟙 = 𝟙 and H𝟙 = 𝟙, and consequently:
If the Jordan decompositions of R and S are normalized in theorem 1.4, then the transformation matrix satisfies W𝟙 = 𝟙.
Proof. If the Jordan decompositions of R and S are normalized, then G𝟙 = 𝟙 and H𝟙 = 𝟙, and consequently V1𝟙 = 𝟙. Using it, we have:
If matrices R and S satisfy the condition of theorem 1.4, then the proof of the theorem provides a constructive way to compute W using the Jordan decomposition of R and S, which is a computationally sensitive task. If the Jordan decomposition of R and S can be normalized, then W can also be obtained by solving the linear system of equations RW = WS and W𝟙 = 𝟙.
Let p(t) = (p1(t) … pn(t)) be a row vector function of size n and Q be a matrix of size n × n. We often make use of the ordinary differential equation (ODE):
with initial condition p(0) = π. The scalar version of this ODE is:
which indicates that the derivative of the vector function is taken element-wise.
The solution of [1.20] is:
Proof. To show that [1.21] satisfies [1.20], we write:
The derivation in the proof also indicates that eQt and Q commute.
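A numeric sketch of this result, assuming the ODE [1.20] is dp(t)/dt = p(t)Q with p(0) = π and its solution [1.21] is p(t) = πeQt (the generator and the initial vector below are arbitrary choices); the closed form is compared with a direct numerical integration of the ODE:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

Q = np.array([[-3.0, 2.0, 1.0],
              [1.0, -4.0, 3.0],
              [2.0, 2.0, -4.0]])
pi = np.array([0.6, 0.3, 0.1])
t_end = 1.5

p_closed = pi @ expm(Q * t_end)                            # closed form pi e^{Qt}

sol = solve_ivp(lambda t, p: p @ Q, (0.0, t_end), pi,      # integrate dp/dt = p Q
                rtol=1e-10, atol=1e-12)
p_numeric = sol.y[:, -1]

assert np.allclose(p_closed, p_numeric, atol=1e-7)
print(p_closed, p_closed.sum())                            # remains a distribution (sums to 1)
```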
If Q is diagonalizable with the dyad decomposition introduced above, then:
If an eigenvalue has positive real part, then the related function diverges as t tends to infinity. If all eigenvalues have negative real parts, then all functions converge to zero as t tends to infinity. If all eigenvalues have negative real parts except one which is zero, then all functions converge to zero except the one with zero eigenvalue which is constant.
As a result, if all eigenvalues of Q have negative real parts, then the integral of eQt over (0, ∞) converges and:
which is easily obtained from the dyad decomposition if Q is diagonalizable. In general, it follows from:
If at least one eigenvalue of Q has a non-negative real part, then this integral does not converge.
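Reading the convergence statement for the integral of eQt over (0, ∞), it can be checked numerically; for a matrix whose eigenvalues all have negative real parts this integral equals −Q−1 (a standard identity used in the sketch, not quoted from the text; the 2 × 2 matrix is an arbitrary choice):

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import trapezoid

# Hypothetical sub-generator with eigenvalues -2 and -5 (both negative)
Q = np.array([[-3.0, 2.0],
              [1.0, -4.0]])

ts = np.linspace(0.0, 25.0, 5001)            # e^{Qt} is negligible beyond t = 25 here
vals = np.array([expm(Q * t) for t in ts])   # shape (5001, 2, 2)
integral = trapezoid(vals, ts, axis=0)       # entry-wise trapezoidal rule

assert np.allclose(integral, -np.linalg.inv(Q), atol=1e-2)
print(integral)
```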
The exponential distribution plays an important role in the theory of Markov chains. If random variable X is exponentially distributed with parameter λ, then:
The fact that the remaining lifetime of an exponentially distributed RV is independent of τ and is identical to its original distribution is referred to as the memoryless property. The constant hazard rate of the exponential distribution is due to this memoryless property.
The Erlang distribution is closely related to the exponential distribution. The sum of n independent, identically distributed exponential random variables, X = X1 + … + Xn, where Xi is exponentially distributed with parameter λ, is Erlang distributed with shape parameter n and rate parameter λ. We denote this distribution as Erlang(n, λ). If the rate is not specified, we denote the Erlang distribution with shape parameter n as Erlang(n).
The PDF, the LT and the moments of the Erlang(n, λ) distribution are, respectively:
where the LT can be obtained using [1.11].
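A short simulation sketch (shape, rate and sample size are arbitrary) that checks the sum-of-exponentials construction against the closed-form PDF and moments of the Erlang(n, λ) distribution:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
n, lam = 4, 2.5                                   # hypothetical shape and rate

# Erlang(n, lam) as the sum of n independent Exp(lam) variables
samples = rng.exponential(scale=1.0 / lam, size=(200_000, n)).sum(axis=1)

def pdf(x):                                       # lam^n x^{n-1} e^{-lam x} / (n-1)!
    return lam**n * x**(n - 1) * np.exp(-lam * x) / factorial(n - 1)

def moment(k):                                    # (n+k-1)! / ((n-1)! lam^k)
    return factorial(n + k - 1) / (factorial(n - 1) * lam**k)

print(samples.mean(), moment(1))                  # both close to n/lam = 1.6
print((samples**2).mean(), moment(2))             # both close to n(n+1)/lam^2 = 3.2

hist, edges = np.histogram(samples, bins=200, density=True)
i = np.searchsorted(edges, 1.6) - 1
print(hist[i], pdf(1.6))                          # empirical density near x = 1.6 vs the PDF
```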
There are several excellent textbooks introducing Markov chains (e.g. [TRI 01, BOL 06, LAK 19]). For completeness, in this and the next sections, we summarize only those properties which are used later in this book.
Let Xk denote the state of a finite state discrete time Markov chain (DTMC) after k steps of the process, where the state space is composed of n states, that is, Xk ∈ {1, 2, …, n}. For i ∈ {1, 2, …, n}, we denote the state probabilities as pi(k) = Pr(Xk = i), which is the probability that the DTMC is in state i after k steps. The row vector of the state probabilities is p(k) = (p1(k) … pn(k)). The probability that the Markov chain moves from state i to state j in the kth step is denoted by Πij(k), where Πij(k) = Pr(Xk = j|Xk−1 = i). We are interested in Markov chains where this probability does not depend on the time, that is, the process is time homogeneous. In this case, Πij(k) = Πij for all k. The matrix composed of the state transition probabilities, Πij, is denoted by Π and referred to as the one-step state transition probability matrix. The transient behavior of the DTMC with transition probability matrix Π satisfies:
which follows from the law of total probability, conditioning on the state of the process at time k − 1. For the row vector of the state probabilities, we can write:
from which
where p(0) = π denotes the initial distribution of the process.
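A minimal sketch of the transient recursion with an arbitrary 3-state chain (the transition probability matrix Π is written as P in the code):

```python
import numpy as np

# Hypothetical one-step transition probability matrix and initial distribution
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
pi0 = np.array([1.0, 0.0, 0.0])

# closed form: p(k) = pi0 P^k
p5 = pi0 @ np.linalg.matrix_power(P, 5)

# step-by-step recursion p(k) = p(k-1) P gives the same vector
p = pi0.copy()
for _ in range(5):
    p = p @ P
assert np.allclose(p, p5)
print(p5, p5.sum())      # a probability vector (sums to 1)
```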
The transition probability matrix Π and the initial distribution π have the following properties. The elements of π and Π are probabilities, that is, non-negative reals not greater than one. Each row of Π and also π describes a probability distribution, that is, Σj Πij = 1 and Σi πi = 1. With the introduction of the column vector of ones, 𝟙, these summations can be written in the matrix and vector forms Π𝟙 = 𝟙 and π𝟙 = 1. From Π𝟙 = 𝟙, we have that 1 is always an eigenvalue of Π (and 𝟙 is an associated right eigenvector). A matrix with these properties (non-negative with row sums equal to one) is also referred to as a stochastic matrix. The eigenvalues of a stochastic matrix are in the closed unit disk of the complex plane, that is, the absolute value of any eigenvalue is not greater than one.
A DTMC is said to be irreducible if any state j is reachable from any state i by a finite non-zero transition probability sequence through states k1, …, km−1. For an irreducible finite state DTMC, the multiplicity of the eigenvalue 1 is one.
The stationary distribution of an irreducible finite DTMC with transition probability matrix Π, p = limk→∞ p(k), if it exists, can be computed as the unique solution of the linear system of equations:
where p is indeed the normalized left eigenvector associated with eigenvalue one. If limk→∞ p(k) does not exist (due to some periodic behavior which results in eigenvalues on the unit circle, e.g., −1), the Cesàro limit, limk→∞ (1/k)(p(0) + p(1) + … + p(k−1)), still exists.
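For the same kind of chain, the stationary distribution can be obtained from the linear system above by appending the normalization equation; a sketch (reusing the arbitrary matrix from the previous example):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
n = P.shape[0]

# solve p = p P together with the normalization sum_i p_i = 1
A = np.vstack([(P - np.eye(n)).T, np.ones(n)])    # p (P - I) = 0 plus one normalization row
b = np.zeros(n + 1)
b[-1] = 1.0
p_stat, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(p_stat @ P, p_stat) and np.isclose(p_stat.sum(), 1.0)

# cross-check: for this aperiodic chain the transient distribution converges to the same vector
assert np.allclose(np.linalg.matrix_power(P, 200)[0], p_stat)
print(p_stat)
```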
If a DTMC is not irreducible, then it is composed of a number of transient communicating blocks and an absorbing block [KEL 78]. A communicating block is a maximal subset of DTMC states in which any state is reachable from any other state of the subset. A transient communicating block is such that:
if the process leaves the block, it never returns to the block,
if the process is in the block, then the absorbing block is reached after a finite number of steps.
The absorbing block is such that once the process enters the block, it never leaves the block.
The return times of state i of the DTMC are the numbers of steps, k, for which Pr(Xk = i|X0 = i) > 0. Each communicating block has a cycle period. If the cycle period of a block is m