Since they were first formulated in 1972, generalized linear models have enjoyed a veritable boom, with numerous applications in insurance, economics and biostatistics. Today, they are still the subject of a great deal of research.
This book provides an overview of the theory of generalized linear models. Particular attention is paid to the problems of censoring, missing data and excess zeros. Didactic and accessible, Generalized Linear Models is illustrated with exercises and numerous R code examples.
With all the necessary prerequisites introduced in a step-by-step fashion, this book is aimed at students (at master's or engineering school level), as well as teachers and practitioners of mathematics and statistical modeling.
Number of pages: 281
Year of publication: 2025
To my parents, To my daughter Laura, To Marielle and Gaële
Series Editor Nikolaos Limnios
Jean-François Dupuy
First published 2025 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd 27-37 St George’s Road London SW19 4EU UK
www.iste.co.uk
John Wiley & Sons, Inc. 111 River Street Hoboken, NJ 07030 USA
www.wiley.com
© ISTE Ltd 2025
The rights of Jean-François Dupuy to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s), contributor(s) or editor(s) and do not necessarily reflect the views of ISTE Group.
Library of Congress Control Number: 2025932421
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78630-702-6
Since their formulation by Nelder and Wedderburn in the Journal of the Royal Statistical Society: Series A in 1972, generalized linear models (GLMs) have become one of the cornerstones of statistical modeling. They have generated, and continue to generate, an abundant literature, driven by theoretical and methodological questions and by their links to applications. This book is dedicated to them. It does not, of course, aim to give an exhaustive account of this literature. Several of the themes covered in the chapters would deserve a book of their own; this is the case for missing data, and for models with a censored response. The aim of this book (and of its author) is therefore more modest: to describe some recently studied problems involving GLMs (missing data, censored data and excess zeros) and to report, once again without claiming to be exhaustive, the solutions that have been proposed for them.
The subjects covered here therefore do not span the immense variety of contributions to the literature on GLMs. The reader will not find, for example, a chapter dedicated to model validation or to variable selection in high dimensions. The choice of subjects involves some degree of subjectivity and largely reflects the author’s own interests. Another imperative guided the writing of this book and the choice of its content: the methods described here can, for the most part, be implemented using dedicated functions that are readily available in packages of the statistical and data analysis software R (which is free and open source). Certain methods require a little programming work, and R code examples are provided throughout the book.
Note also that most of the problems described here (such as missing or censored data) do not arise only in GLMs. We have therefore tried to describe the proposed solutions in sufficiently general terms that the reader can understand the main principles and apply them, or adapt them, in other contexts. Finally, this book has been written so that it can be approached by readers with different levels of background. It is thus possible to follow the content without dwelling on the theoretical proofs, which are, for the most part, provided in the appendices of each chapter. Although the chapters are written so that they can be read (almost) independently of one another, it is nevertheless recommended to go through them in the order given in the table of contents, which follows the increasing difficulty of the methods described. A data set (available in R) serves as a common thread throughout the book; it is described in section 2.5, which should be read before moving on to the other sections that use this data set.
My warmest thanks go to Nikolaos Limnios, for his constant encouragement since my thesis, for inviting me to write this work and for the additions he has suggested.
I would also like to sincerely thank Ben Brown, who did a tremendous job translating this book from French to English.
Jean-François Dupuy
March 2025
Mean and variance of $X$
$\mathrm{cov}(X, Y)$: Covariance of $X$ and $Y$
Empirical mean of a sample $Y_1, \ldots, Y_n$
$\mathbb{P}_n$: Empirical measure
Empirical process
Convergence in distribution
Convergence in probability
Almost sure convergence
$o_{\mathbb{P}}$, $O_{\mathbb{P}}$: Stochastic-order symbols
Sets of the real numbers, positive reals, strictly positive reals
Binomial, Bernoulli, negative binomial distributions
Poisson and generalized Poisson distributions
Normal, Student’s $t$-, chi-squared, Fisher distributions
Gamma, exponential, inverse-Gaussian distributions
$u_\alpha$, $t_n(\alpha)$, $c_\alpha(q)$, $f_{q,p}(\alpha)$: Quantiles of order $\alpha$ of the standard normal, Student’s $t$-, chi-squared and Fisher distributions, respectively
$\Phi$: Cumulative distribution function of the standard normal distribution
$1\{\cdot\}$: Indicator function
Transpose of the matrix (or vector) $A$
AIC: Akaike’s information criterion
AIPW: Augmented inverse probability weighted
BIC: Bayesian information criterion
EM: Expectation-maximization (algorithm)
GLM: Generalized linear model
i.i.d.: Independent and identically distributed (random variables)
IPW: Inverse probability weighted
LSE: Least-squares estimator
MAR: Missing at random
MCAR: Missing completely at random
MLE: Maximum likelihood estimator
MNAR: Missing not at random
MZINB: Marginal zero-inflated negative binomial
MZIP: Marginal zero-inflated Poisson
RSS: Residual sum of squares
s.e.: Standard error
ZIB: Zero-inflated binomial
ZIGP: Zero-inflated generalized Poisson
ZINB: Zero-inflated negative binomial
ZIP: Zero-inflated Poisson