Spatial Econometrics using Microdata - Jean Dubé - E-Book


Jean Dubé

Description

This book provides an introduction to spatial analyses concerning disaggregated (or micro) spatial data.

Particular emphasis is put on spatial data compilation and the structuring of the connections between the observations. Descriptive analysis methods for spatial data are presented in order to identify and measure global and local spatial dependency.

The authors then focus on autoregressive spatial models, which control for the problem of spatial dependency between the residuals of a basic linear statistical model, a dependency that violates one of the basic hypotheses of the ordinary least squares approach.

This book is an accessible reference for students who want to work with spatialized data but do not have an advanced theoretical background in statistics.


Number of pages: 287

Year of publication: 2014




Contents

Acknowledgments

Preface

P.1. Introduction

P.2. Who is this work aimed at?

P.3. Structure of the book

1 Econometrics and Spatial Dimensions

1.1. Introduction

1.2. The types of data

1.3. Spatial econometrics

1.4. History of spatial econometrics

1.5. Conclusion

2 Structuring Spatial Relations

2.1. Introduction

2.2. The spatial representation of data

2.3. The distance matrix

2.4. Spatial weights matrices

2.5. Standardization of the spatial weights matrix

2.6. Some examples

2.7. Advantages/disadvantages of micro-data

2.8. Conclusion

3 Spatial Autocorrelation

3.1. Introduction

3.2. Statistics of global spatial autocorrelation

3.3. Local spatial autocorrelation

3.4. Some numerical examples of the detection tests

3.5. Conclusion

4 Spatial Econometric Models

4.1. Introduction

4.2. Linear regression models

4.3. Link between spatial and temporal models

4.4. Spatial autocorrelation sources

4.5. Statistical tests

4.6. Conclusion

5 Spatio-Temporal Modeling

5.1. Introduction

5.2. The impact of the two dimensions on the structure of the links: structuring of spatio-temporal links

5.3. Spatial representation of spatio-temporal data

5.4. Graphic representation of the spatial data generating processes pooled over time

5.5. Impacts on the shape of the weights matrix

5.6. The structuring of temporal links: a temporal weights matrix

5.7. Creation of spatio-temporal weights matrices

5.8. Applications of autocorrelation tests and of autoregressive models

5.9. Some spatio-temporal applications

5.10. Conclusion

Conclusion

C.1. A brief review

C.2. Opening

Glossary

Appendix

A.1. Chapter 2 appendix

A.2. Chapter 3 appendix

A.3. Chapter 4 appendix

A.4. Chapter 5 appendix

Bibliography

Index

To the memory of Gilles Dubé.

For Mélanie, Karine, Philippe, Vincent and Mathieu.

First published 2014 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK

www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA

www.wiley.com

© ISTE Ltd 2014
The rights of Jean Dubé and Diègo Legros to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2014945534

British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-84821-468-2

Acknowledgements

While producing a reference book does require a certain amount of time, it is also impossible without the support of partners. Without the help of the publisher, ISTE, it would have been impossible for us to share, on such a great scale, the fruit of our work and thoughts on spatial microdata.

Moreover, without the financial help of the Fonds de Recherche Québecois sur la Société et la Culture (FRQSC) and the Social Sciences and Humanities Research Council (SSHRC), the writing of this work would certainly not have been possible. Therefore, we thank these two financial partners.

The content of this work is largely the result of our thoughts and reflections on the processes that generate individual spatial data1 and the application of the various tests and models from the data available.

We thank the individuals who helped, whether closely or from afar, in the writing of this work by providing comments on some or all of the chapters: Nicolas Devaux (student in regional development), Cédric Brunelle (Professor at Memorial University), Sotirios Thanos (Research Associate at University College London) and Philippe Trempe (master’s student in regional development). Without the invaluable help of these people, the writing of this book would certainly have taken much longer and would have been far more difficult. Their comments helped us orient the book towards an approach that would be more understandable to an audience that did not necessarily have a lot of experience in statistics.

1 Our first works largely focus on the data of real estate transactions: a data collection process that is neither strictly spatial, nor strictly temporal (see Chapter 5).

Preface

P.1. Introduction

Before even bringing up the main subject, it would seem important to define the breadth that we wish to give this book. The title itself is quite evocative: it is an introduction to spatial econometrics when data consist of individual spatial units. The stress is on microdata: observations that are points on a geographical projection rather than geometrical forms that describe the limits (whatever they may be) of a geographical zone. Therefore, we propose to cover the methods of detection and descriptive spatial analysis, and spatial and spatio-temporal modeling.

In no case do we wish this work to substitute for important references in the domain such as Anselin [ANS 88], Anselin and Florax [ANS 95], LeSage [LES 99], or even the more recent reference in this domain: LeSage and Pace [LES 09]. We consider these references to be essential for anyone wishing to become invested in this domain.

The objective of the book is to make a link between existing quantitative approaches (correlation analysis, bivariate analysis and linear regression) and the manner in which we can generalize these approaches to cases where the available data for analysis have a spatial dimension. While equations are presented, our approach is largely based on the description of the intuition behind each of the equations. The mathematical language is vital in statistical and quantitative analyses. However, for many people, the acquisition of the knowledge necessary for a proper reading and understanding of the equations is often off-putting. For this reason, we try to properly establish the links between the intuition behind the equations and their mathematical formalization. In our opinion, too few introductory works place importance on this structure, which is nevertheless the cornerstone of quantitative analysis. After all, the goal of the quantitative approach is to provide a set of powerful tools that allow us to isolate some of the effects that we are looking to identify. However, the amplitude of these effects depends on the type of tool used to measure them.

The originality of the approach is, in our opinion, fourfold. First, the book presents simple fictional examples. These examples allow the readers to follow, for small samples, the detail of the calculations, for each of the steps of the construction of weighting matrices and descriptive statistics. The reader is also able to replicate the calculations in simple programs such as Excel, to make sure he/she understands all of the steps properly. In our opinion, this step allows non-specialist readers to integrate the particularities of the equations, the calculations and the spatial data.

Second, this book aims to make the link between the summation writing (including double summation) of statistics (or models) and matrix writing. Many people have difficulty moving from one to the other. In this work, we present both writings for some spatial indices, stressing the transition from one to the other. Understanding matrix writing is important since it is more compact than summation writing and makes mathematical expressions containing double summations, such as the detection indices of spatial correlation patterns, easier to read; this is particularly useful in the construction of statistics used for the spatial detection of local patterns. The use of matrix calculations and simple examples allows the reader to generalize the calculations to larger datasets, helping their understanding of spatial econometrics. The matrix form also makes the calculations directly transposable into specialized software (such as MATLAB and Mata (Stata)), allowing us to carry out calculations without having to use previously written programs, at least for the construction of the spatial weighting matrices and for the calculation of spatial concentration indices. Presenting the matrix calculations step by step allows the reader to follow each calculation step properly.
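The correspondence between the two writings can be checked numerically. The following is a minimal sketch in Python with NumPy (not one of the programs of the book): it computes the cross-product term that appears in many spatial indices, first as a double summation over i and j, then as a matrix expression. The four-point weights matrix `W` and the values `y` are invented for illustration.

```python
import numpy as np

# Hypothetical 4x4 spatial weights matrix (zero diagonal) and variable y.
W = np.array([
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 0.0, 1.0, 0.5],
    [0.5, 1.0, 0.0, 1.0],
    [0.0, 0.5, 1.0, 0.0],
])
y = np.array([10.0, 12.0, 9.0, 15.0])
z = y - y.mean()  # deviations from the mean

# Summation writing: sum over i and j of w_ij * z_i * z_j.
n = len(z)
double_sum = 0.0
for i in range(n):
    for j in range(n):
        double_sum += W[i, j] * z[i] * z[j]

# Matrix writing: the same quantity as the quadratic form z'Wz.
matrix_form = z @ W @ z

print(double_sum, matrix_form)
```

Both numbers agree up to floating-point rounding. The matrix line is shorter and carries over unchanged to larger datasets.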

Third, in the appendix this work suggests programs that allow the simulation of spatial and spatio-temporal microdata. These programs make it possible to apply the material presented in the chapters to cases where the true process is known in advance. This approach, close to the Monte Carlo experiment, can be beneficial for readers who want to examine the behavior of test statistics as well as the behavior of estimators in some well-defined contexts. The advantages of this approach by simulation are numerous:

– it allows the intuitive establishment of the properties of statistical tools rather than a formal mathematical proof;
– it provides a better understanding of the data generating processes (DGP) and establishes links with the application of statistical models;
– it offers the possibility of testing the impact of omitting one dimension in particular (spatial or temporal) on the estimations and the results;
– it gives the reader the opportunity to run his/her own experiments, with some minor modifications.
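The idea of a simulation where "the reality is known in advance" can be sketched in a few lines. The Python example below is not taken from the appendix programs; it assumes a simple non-spatial linear DGP with an intercept of 1 and a slope of 2 (both chosen arbitrarily), draws 500 samples and looks at how the OLS estimates are distributed around the known values.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the experiment is repeatable
beta_true = np.array([1.0, 2.0])    # assumed "true" intercept and slope
n_obs, n_rep = 100, 500             # sample size and number of replications

estimates = np.empty((n_rep, 2))
for r in range(n_rep):
    # Draw a new sample from the postulated DGP.
    x = rng.uniform(0.0, 10.0, size=n_obs)
    X = np.column_stack([np.ones(n_obs), x])
    y = X @ beta_true + rng.normal(0.0, 1.0, size=n_obs)
    # Estimate the parameters by ordinary least squares.
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

mean_estimate = estimates.mean(axis=0)
std_estimate = estimates.std(axis=0)
print("mean of estimates:  ", mean_estimate)
print("spread of estimates:", std_estimate)
```

Changing the DGP inside the loop (for example, by omitting a variable) and rerunning the experiment shows how the estimator reacts, without any formal proof.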

Finally, the greatest particularity of this book is certainly the stress placed on the use of spatial microdata. Most of the works and applications in spatial econometrics rely on aggregate spatial data. This representation assumes that each observation takes the form of a polygon (a geometric shape) representing the fixed limits of the geographical boundaries surrounding, for example, a country, a region, a town or a neighborhood. The data then represent an aggregate statistic of individual observations (average, median, proportion) rather than the detail of each of the individual observations. In our opinion, applications relying on microdata are the future, not only for putting spatial econometric methods into practice, but also for better understanding several phenomena. Spatial microdata allow us to avoid the classical problem of the ecological error2 [ROB 50], and they respond directly to several criticisms according to which aggregate spatial data cannot capture details that are only observable at a microscale. Moreover, while not exempt from the modifiable areal unit problem (MAUP)3 [ARB 01, OPE 79], they do at least present the advantage of explicitly allowing the effect of spatial aggregation on the results of the analyses to be tested.

Thus, this book acts as an intermediary for non-econometricians and non-statisticians to transition toward reference books in spatial econometrics. Therefore, the book is not a work of theoretical econometrics based on formal mathematical proofs4, but is rather an introductory document for spatial econometrics applied to microdata.

P.2. Who is this work aimed at?

Nevertheless, reading this book assumes a minimal amount of knowledge in statistics and econometrics. It does not require any particular knowledge of geographical information systems (GIS). Even if the work presents programs that allow for the simulation of data in the appendixes, it requires no particular experience or particular aptitudes in programming.

More particularly, this book is addressed especially to master’s and PhD students in the domains linked to regional sciences and economic geography. As the domain of regional sciences is rather large and multidisciplinary, we want to provide some context to those who would like to get into spatial quantitative analysis and go a bit further on this adventure. In our opinion, the application of statistics and statistical models can no longer be done without understanding the spatial reality of the observations. The spatial aspect provides a wealth of information that needs to be considered during quantitative empirical analyses.

The book is also aimed at undergraduate and postgraduate students in economics who wish to introduce the spatial dimension into their analyses. We believe that this book provides excellent context before formally dealing with the theoretical aspects of econometrics that aim to develop the estimators, show the proofs of convergence and develop the detection tests according to the classical approaches (likelihood ratio (LR) test, Lagrange multiplier (LM) test and Wald tests).

We also aim to reach researchers who are not econometricians or statisticians, but wish to learn a bit about the logic and the methods that allow the detection of spatial autocorrelation, as well as the methods for correcting the potential problems that occur in the presence of autocorrelation.

P.3. Structure of the book

The book is split into six chapters that follow a precise logic. Chapter 1 proposes an introduction to spatial analysis related to disaggregated or individual data (spatial microdata). Particular attention is placed on the structure of spatial databases as well as their particularities. It shows why it is essential to take account of the spatial dimension in econometrics if the researcher has data that is geolocalized; it also presents a brief history of the development of the branch of spatial econometrics since its formation.

Chapter 2 is definitely the central piece of the work and of spatial econometrics. It serves as an opening for the other chapters, which use weights matrices in their calculations. It is therefore crucial, which is why particular emphasis is placed on it, with many examples. A fictional example is developed and taken up again in Chapter 3 to demonstrate the calculation of the detection indices of the spatial autocorrelation patterns.

Chapter 3 presents the most commonly used measurements to detect the presence of spatial patterns in the distribution of a given variable. These measurements prove to be particularly crucial to verify the assumption of the absence of spatial correlation between the residuals or error terms of the regression model. The presence of a spatial autocorrelation violates one of the assumptions that ensures the consistency of the estimator of the ordinary least squares (OLS) and can modify the conclusions coming from the statistical model. The detection of such a spatial pattern requires the correction of the regression model and the use of spatial and spatio-temporal regression models. Obviously, the detection indices can also be used as descriptive tools and this chapter is largely based on this fact.

Chapters 4 and 5 present the autoregressive models used in spatial econometrics. The spatial autoregressive models (Chapter 4) can easily be transposed to spatio-temporal applications (Chapter 5) by developing a weights matrix adapted to the analyzed reality. A particular emphasis is put on the intuition behind the use of one type of model rather than another: this is the fundamental idea behind the DGP. Depending on the postulated model, the consequences of the spatial relation detected between the residuals of the regression model can be more or less important, going from an imprecision in the calculation of the estimated variance, to a bias in the estimations of the parameters. The appendixes linked to Chapters 4 (spatial modeling) and 5 (spatio-temporal modeling) are based on the simulation of a given DGP and the estimation of autoregressive models from the weights matrices built previously (see Chapter 2).

Finally, the Conclusion underlines the central role of the construction of the spatial weights matrix in spatial econometrics and the different possible paths allowing the transposition of existing techniques and methods to different definitions of the “distance”.

We hope that this overview of the foundations of spatial econometrics will pique the interest of certain students and researchers, encourage them to use spatial econometric modeling with the goal of getting as much as possible out of their databases, and inspire some of them to propose new original approaches that will complement the methods currently developed. After all, the development of spatial methods notably allows the integration of notions of spatial proximity (and others). This aspect is particularly crucial for certain theoretical schools of thought linked to regional science and new geographical economics (NGE), largely inspired by the works of Krugman [FUJ 04, KRU 91a, KRU 91b, KRU 98], recipient of the 2008 Nobel prize in economics [BEH 09].

Figure P.1. Links between the chapters

Jean DUBÉ and Diègo LEGROS
August 2014

2 The ecological error problem comes from the transposition of conclusions made with aggregate spatial units to individual spatial units that make up the spatial aggregation.

3 The concept of MAUP was proposed by Openshaw and Taylor in 1979 to designate the influence of spatial partitioning (scale and zoning effects) on the results of statistical processing or modeling.

4 Any reader interested in a more formal presentation of spatial econometrics is invited to consult the recent work by LeSage and Pace (2009) [LES 09], which is considered by some researchers as a reference that marks a “big step forward” for spatial econometrics [ELH 10, p. 9].

1

Econometrics and Spatial Dimensions

1.1. Introduction

Does a region specializing in the extraction of natural resources register slower economic growth than other regions in the long term? Does industrial diversification affect the rhythm of growth in a region? Does the presence of a large company in an isolated region have a positive influence on pay levels, compared to the presence of small- and medium-sized companies? Does the distance from highway access affect the value of a commercial, industrial or residential lot? Does the presence of a public transport system affect the price of property? All these are interesting and relevant questions in regional science, but the answers to these are difficult to obtain without using appropriate tools. In any case, statistical modeling (an econometric model) is unavoidable in obtaining elements of these answers.

What is econometrics anyway? It is a domain of study that concerns the application of methods of statistical mathematics and statistical tools with the goal of inferring and testing theories using empirical measurements (data). Economic theory postulates hypotheses that allow the creation of propositions regarding the relations between various economic variables or indicators. However, these propositions are qualitative in nature and provide no information on the intensity of the links that they concern. The role of econometrics is to test these theories and provide numerical estimates of these relations. To summarize, econometrics is the statistical branch of economics: it seeks to quantify the relations between variables using statistical models.

For some, the creation of models is not satisfactory in that they do not take into account the entirety of the complex relations of reality. However, this is precisely one of the goals of models: to formulate in a simple manner the relations that we wish to formalize and analyze. Social phenomena are often complex and the human mind cannot process them in their totality. Thus, the model can then be used to create a summary of reality, allowing us to study it in part. This particular form obviously does not consider all the characteristics of reality, but only those that appear to be linked to the object of the study and that are particularly important for the researcher. A model that is adapted to a certain study often becomes inadequate when the object of the study changes, even if this study concerns the same phenomenon.

We refer to a model in the sense of the mathematical formulation, designed to approximately reproduce the reality of a phenomenon, with the goal of reproducing its function. This simplification aims to facilitate the understanding of complex phenomena, as well as to predict certain behaviors using statistical inference. Mathematical models are, generally, used as part of a hypothetico-deductive process. One class of model is particularly useful in econometrics: these are statistical models. In these models, the question mainly revolves around the variability of a given phenomenon, the origin of which we are trying to understand (dependent variable) by relating it to other variables that we assume to be explicative (or causal) of the phenomenon in question.

Therefore, an econometric model involves the development of a statistical model to evaluate and test theories and relations and guide the evaluation of public policies1. Simply put, an econometric model formalizes the link between a variable of interest, written as y, as being dependent on a set of independent or explicative variables, written as x1, x2,…, xK, where K represents the total number of explicative variables (equation [1.1]). These explicative variables are then suspected as being at the origin of the variability of the dependent or endogenous variable:

$$y = f(x_1, x_2, \ldots, x_K) \qquad [1.1]$$

We still need to be able to propose a form for the relation that links the variables, which means defining the form of the function f(·). We then talk of the choice of functional form. This choice must be made in accordance with the theoretical foundation of the phenomena that we are looking to explain. The researcher thus explicitly hypothesizes on the manner in which the variables are linked together. The researcher is said to be proposing a data generating process (DGP). He/she postulates a relation that links the selected variables without necessarily being sure that the postulated form is right. In fact, the validity of the statistical model relies largely on the DGP postulated. Thus, the estimated effects of the independent variables on the determination of the dependent variable arise largely from the postulated relation, which reinforces the importance of the choice of the functional form. It is important to note that the functional form (or the type of relation) is not necessarily known with certainty during empirical analysis and that, as a result, the DGP is postulated: it is the researcher who defines the form of the relations as a function of the a priori theoretical forms and the subject of interest.

Obviously, since not all of the variables that influence the behavior under study, nor the form of the relation, are always known, it is common practice to include in the statistical model a term that captures this omission. The error of specification is usually designated by the term ε. Some basic assumptions are made on the behavior of this “residual” term (or error term). Violating these basic assumptions can lead to a variety of consequences, ranging from imprecision in the measurement of variance to bias (bad measurement) in the effect we are looking for.

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + \varepsilon_i \qquad [1.2]$$

The linear regression model allows us not only to know whether an explicative variable xk is statistically linked to the dependent variable (βk ≠ 0), but also to check if the two variables vary in the same direction (βk > 0) or in opposite directions (βk < 0). It also allows us to answer the question: “by how much does the variable of interest (explained variable) change when the independent variable (explicative variable) is modified?”. Herein also lies a large part of the goal of regression analysis: to study or simulate the effect of changes or movements of the independent variable on the behavior of the dependent variable (partial analysis). Therefore, the statistical model is a tool that allows us to empirically test certain hypotheses as well as to make inferences from the results obtained.
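To make the reading of the signs of βk concrete, here is a minimal sketch in Python with NumPy. The DGP is an assumption made for the illustration (an intercept of 3, a coefficient of 1.5 on `x1` and of -0.8 on `x2`); the OLS estimator is written in its usual matrix form.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.normal(scale=0.5, size=n)

# Assumed DGP: y rises with x1 and falls with x2.
y = 3.0 + 1.5 * x1 - 0.8 * x2 + eps

# OLS estimator in matrix form: solve (X'X) b = X'y.
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Partial effect: estimated change in y when x1 rises by one unit,
# x2 being held constant.
effect_x1 = beta_hat[1]
print("estimated coefficients:", beta_hat)
print("same direction as x1:", beta_hat[1] > 0)
print("opposite direction to x2:", beta_hat[2] < 0)
```

The estimated coefficient on `x1` is positive and the one on `x2` is negative, which is how the direction of each relation is read from the model.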

The validity of the estimated parameters and, as a result, the validity of the statistical relation and of the hypothesis tests from the model rely on certain assumptions regarding the behavior of the error term. Thus, before going further into the analysis of the results of the econometric model, it is strongly recommended to check whether the following assumptions are respected:

– the expectation of error terms is zero: the assumed model is “true” on average:

$$E(\varepsilon_i) = 0 \quad \forall\, i \qquad [1.3]$$

– the variance of the disturbances is constant for each individual: disturbance homoskedasticity assumption:

$$\mathrm{Var}(\varepsilon_i) = \sigma^2 \quad \forall\, i \qquad [1.4]$$

– the disturbances of the model are independent (non-correlated) among themselves: the variable of interest is not influenced, or structured, by any other variables than the ones retained:

$$\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad \forall\, i \neq j \qquad [1.5]$$

The first assumption is, by definition, globally respected when the model is estimated by the method of ordinary least squares (OLS). However, nothing indicates that, locally, this property is applicable: the errors can be positive (negative) on average for high (low) values of the dependent variable. This behavior usually marks a form of nonlinearity in the relation3. Certain simple approaches allow us to take into account the nonlinearity of the relation: the transformation of variables (logarithm, square root, etc.), the introduction of polynomial forms (x, x², x³, etc.), the introduction of dummy variables and so on.
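The difference between the global and the local behavior of the residuals can be illustrated as follows. This Python sketch assumes a quadratic DGP (an arbitrary choice) to which a straight line is fitted by OLS: the residuals average zero over the whole sample, but not within a sub-range of x. Adding the squared term removes the local pattern.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(0.0, 4.0, size=n)
y = 1.0 + x**2 + rng.normal(scale=0.5, size=n)   # assumed quadratic DGP

def ols_residuals(X, y):
    """Fit y on X by ordinary least squares and return the residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

ones = np.ones(n)
res_linear = ols_residuals(np.column_stack([ones, x]), y)        # misspecified
res_quad = ols_residuals(np.column_stack([ones, x, x**2]), y)    # with x squared

middle = (x > 1.5) & (x < 2.5)   # a "local" subset of the observations
print("global mean, linear model:", res_linear.mean())          # numerically zero
print("local mean, linear model: ", res_linear[middle].mean())  # clearly negative
print("local mean, with x^2:     ", res_quad[middle].mean())    # close to zero
```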

The second assumption concerns the calculation of the variance of the disturbances and its influence on the variance of the estimator of parameter β. Indeed, the application of common statistical tests largely relies on the estimated variance; when this variance is not minimal, the measurement of the variance of parameter β is not correct and the application of classical hypothesis tests is not appropriate. It is then necessary to correct the problem of heteroskedasticity of the variance of the disturbances. The procedures to correct for the presence of heteroskedasticity are relatively simple and well documented.

The third assumption is more important: if it is violated, it can invalidate the results obtained. Depending on the form of the structure between the observations, it can have an influence on the estimation of the variance of the parameters or even on the value of the estimated parameters. This latter consequence is more serious since it potentially invalidates all of the conclusions drawn from the results obtained. Once again, to ensure an accurate interpretation of the results, the researcher must correct the problem of the correlation between the error terms. Here the procedures to correct for correlation among the error terms are more complex and largely depend on the type of data considered.

1.2. The types of data

The models used are largely linked to the structure and the characteristics of the data available for the analysis. Likewise, the violation of one or several assumptions on the error terms is a function of the type of data used. Without loss of generality, it is possible to identify three types of data: cross-sectional data, time series data and spatio-temporal data. The importance of the spatial dimension comes out particularly in the cross-sectional and spatio-temporal data.

The first essential step when working with a quantitative approach is to identify the type of data available to make the analyses. Not only do these data have particular characteristics in terms of violating the assumptions about the structure of the error terms, but they also influence the type of model that must be used. The type of model depends largely on the characterization of the dependent variable. Specific models are designed for dummy variables (logit or probit models), for positive discrete (count) data (Poisson or negative binomial models), for truncated data (Heckman or Tobit models), etc. For the most part, the current demonstration will be focused on the models adapted to the case where the dependent variable is continuous (linear regression model).

1.2.1. Cross-sectional data

For this type of data, the sources of the variation are interobservations, i.e. between the observations. It is then possible that the variation of the dependent variable is linked to some characteristics that are unique to the individuals. In the case where we cannot identify the majority of the factors that influence the variation of the dependent variable, we are faced with a problem of non-homogeneous variance, or heteroskedasticity problem. This behavior violates the second assumption of the behavior of the error terms. The linear regression model must then be corrected so that the estimated variance respects the base assumption so that the usual tests have the correct interpretation.

The best-known tests for the detection of heteroskedasticity are certainly those by Breusch and Pagan [BRE 79] and White [WHI 80]. The former suggests verifying whether there is a significant statistical relation between the squared error terms (an estimation of the variance) and the independent variables of the model. In the case where this relation proves to be significant, we say that the variance is not homogeneous and depends on certain values of the independent variables. The second test is based on a similar approach. The White test suggests regressing the squared error terms on the whole set of the independent variables of the model as well as the crossed terms and quadratic terms of the variables. This addition of the quadratic and crossed terms allows us to consider a certain form of nonlinearity in the explanation of the variance. As in the previous case, the test aims to verify the existence of a significant relation between the variance of the model and some independent variables or more complex terms, in which case we must reject the homogeneity hypothesis of the variance.
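The logic of the first test can be sketched with an auxiliary regression. The Python example below is an illustration under stated assumptions, not the exact procedure of [BRE 79]: it uses the commonly taught n·R² (studentized) variant of the statistic, a simulated DGP in which the standard deviation of the error grows with x, and the 5% critical value of a chi-squared distribution with one degree of freedom (3.841).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
x = rng.uniform(1.0, 5.0, size=n)
# Assumed DGP: the error standard deviation is proportional to x.
y = 2.0 + 0.5 * x + rng.normal(size=n) * x

def ols(X, y):
    """Return the OLS coefficients and residuals of y regressed on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

X = np.column_stack([np.ones(n), x])
_, e = ols(X, y)

# Auxiliary regression: squared residuals on the independent variables.
e2 = e**2
_, u = ols(X, e2)
r2_aux = 1.0 - (u @ u) / ((e2 - e2.mean()) @ (e2 - e2.mean()))

# n * R^2 is approximately chi-squared(1) under homoskedasticity.
lm_stat = n * r2_aux
CHI2_1_5PCT = 3.841   # 5% critical value, one degree of freedom
reject_homoskedasticity = lm_stat > CHI2_1_5PCT
print("LM statistic:", lm_stat, "reject:", reject_homoskedasticity)
```

The White test follows the same pattern, with the squares and cross-products of the independent variables added to the auxiliary regression and the degrees of freedom adjusted accordingly.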

This type of data is largely used in microeconomics and in all the related domains. Spatial data are cross-sectional data with an additional particularity: the error terms can be correlated among themselves in space since they share common localization characteristics. This behavior is then in violation of the third assumption, linked to the independence between the error terms. This is the heart and foundation of spatial econometrics (we will come back to it a bit later).

1.2.2. Time series

With this type of data, the variation studied is intra-observation, i.e. over time, but for a unique observation. This type of data is likely to reveal a correlation between the error terms over time and thus be in violation of the third base assumption on the behavior of error terms. We then talk of temporal autocorrelation. In this case, the parameters obtained can be biased and the conclusions that we draw from the model can be wrong. The problems of temporal correlation between the error terms have been known for several decades.

The most commonly used test to detect such a phenomenon is the Durbin and Watson statistic [DUR 50]. This test is inspired by a measurement of the correlation between the value of the residuals taken at a period, t, and the value taken at the previous period, t – 1. It aims to verify whether this correlation is statistically significant, in which case we are in the presence of temporal (or serial) autocorrelation. Another simple test consists of regressing the residuals of the model at period t on their value at the previous period, t – 1, and determining whether the parameter associated with the time-lagged residuals is significant. The correction methods are also well documented and available in most software.
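Both checks can be sketched in a few lines. The example below is only illustrative: the two series are simulated stand-ins for the residuals of a model, the function names are hypothetical, and the lagged regression is run without a constant on the assumption that the residuals have a mean close to zero. It computes the statistic and the slope only, not their significance levels.

```python
import random

def durbin_watson(e):
    """Sum of squared first differences of e divided by the sum of squares of e.

    Values near 2 suggest no first-order autocorrelation; values near 0
    suggest positive autocorrelation, values near 4 negative autocorrelation.
    """
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den

def lag1_slope(e):
    """Slope of the regression of e_t on e_{t-1}, without a constant."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den

# Simulated "residuals" (hypothetical values).
random.seed(1)
T = 500
rho = 0.7  # assumed autocorrelation parameter for the illustration

white_noise = [random.gauss(0.0, 1.0) for _ in range(T)]

ar1 = []
prev = 0.0
for _ in range(T):
    prev = rho * prev + random.gauss(0.0, 1.0)  # e_t = rho * e_{t-1} + u_t
    ar1.append(prev)

dw_white = durbin_watson(white_noise)
dw_ar1 = durbin_watson(ar1)
rho_hat = lag1_slope(ar1)
print("DW, independent errors:   ", round(dw_white, 3))
print("DW, autocorrelated errors:", round(dw_ar1, 3))
print("Estimated lag-1 slope:    ", round(rho_hat, 3))
```

The two measures are closely related: for a long series, the Durbin and Watson statistic is approximately 2(1 – r), where r is the first-order correlation of the residuals.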

Time series can also bring additional complications, such as a variance that changes (increases or decreases) over time. Non-homogeneous variance over time violates the second assumption on the behavior of error terms, and the modeling methods therefore become more complex.

This type of data is used especially in macroeconomics and related domains: the indicators of a spatial entity are followed over a certain number of periods. Data on the market indices of a company or of a bond are also good examples of time series.

As we will see later on, the approach to modeling spatial data is largely inspired by time series models. In fact, there is an important parallel between the problems encountered in the analysis of time series (or temporal data) and those encountered in the analysis of spatial data. We will come back to this in Chapter 4.

1.2.3. Spatio-temporal data

There are also data that possess both characteristics: individuals that are observed over time. We then talk of spatio-temporal data. Broadly speaking, there are two types of spatio-temporal data: panel data (or longitudinal data) and cross-sections pooled over time. In the first case, the same individuals are observed at each (or nearly each) time period, while in the second case, different individuals are observed in each of the periods. The distinction is small, but real and important (we will come back to it in Chapter 5). In both cases, the notation relies on the introduction of two subindices: an index identifying the individual observation, i, and an index identifying the time period at which each observation is collected, t. In the case of the panel, the subindices i are the same in each of the periods, while in the case of cross-sectional data pooled over time, these indices are different in each of the periods.
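The distinction between the two structures can be made concrete with pairs of subindices (i, t). The sketch below is a simplified illustration with hypothetical identifiers and years: it treats as a panel only the strict case in which exactly the same individuals appear in every period, which is stricter than the "nearly each period" case mentioned above.

```python
def is_balanced_panel(obs):
    """obs is a list of (i, t) pairs; True when every period holds the same individuals."""
    by_period = {}
    for i, t in obs:
        by_period.setdefault(t, set()).add(i)
    groups = list(by_period.values())
    return len(groups) > 1 and all(g == groups[0] for g in groups)

# Panel: individuals A, B and C are each observed in 2010, 2011 and 2012.
panel = [(i, t) for t in (2010, 2011, 2012) for i in ("A", "B", "C")]

# Cross-sections pooled over time: different individuals in each period.
pooled = [("A", 2010), ("B", 2010), ("C", 2011), ("D", 2011), ("E", 2012)]

print("panel is a balanced panel: ", is_balanced_panel(panel))
print("pooled is a balanced panel:", is_balanced_panel(pooled))
```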

For this type of data, several problems are likely to arise: persistence of behaviors over time, a non-homogeneous variance between individuals and a correlation of the responses in space and time. The problems are potentially substantial precisely because the information contained in these data is much richer. This type of data is increasingly popular, notably because it enables the study not only of variations between individuals and across time, but also of the evolution of given individuals over time. This is certainly the data that provides the most information. Nevertheless, the introduction of the spatial dimension in this type of data is relatively new.

These types of data have recently attracted particular attention and are currently the object of numerous theoretical advances. Several pieces of software now allow the accurate modeling of this type of data. Spatio-temporal data are also likely to reveal several problems that invalidate the basic assumptions on the behavior of error terms. There can exist not only a spatial correlation between the error terms but also a serial correlation. Moreover, the variance can depend on the relative situation in space, just like the behavior of the independent variable. Therefore, the richness of the sources of variation can create several problems with the assumptions on the behavior of the residuals of the model, and the use of appropriate models that take these phenomena into account is essential.