This book is aimed at students in communications and signal processing who want to extend their skills into the energy area. It describes power systems and explains why these backgrounds are so useful to the smart grid; wireless communications, for instance, is very different from traditional wireline communications.
Page count: 935
Year of publication: 2017
Cover
Title Page
Preface
Acknowledgments
Some Notation
1 Introduction
1.1 Big Data: Basic Concepts
1.2 Data Mining with Big Data
1.3 A Mathematical Introduction to Big Data
1.4 A Mathematical Theory of Big Data
1.5 Smart Grid
1.6 Big Data and Smart Grid
1.7 Reading Guide
Bibliographical Remarks
Part I: Fundamentals of Big Data
2 The Mathematical Foundations of Big Data Systems
2.1 Big Data Analytics
2.2 Big Data: Sense, Collect, Store, and Analyze
2.3 Intelligent Algorithms
2.4 Signal Processing for Smart Grid
2.5 Monitoring and Optimization for Power Grids
2.6 Distributed Sensing and Measurement for Power Grids
2.7 Real‐time Analysis of Streaming Data
2.8 Salient Features of Big Data
2.9 Big Data for Quantum Systems
2.10 Big Data for Financial Systems
2.11 Big Data for Atmospheric Systems
2.12 Big Data for Sensing Networks
2.13 Big Data for Wireless Networks
2.14 Big Data for Transportation
Bibliographical Remarks
3 Large Random Matrices
3.1 Modeling of Large Dimensional Data as Random Matrices
3.2 A Brief Overview of Random Matrix Theory
3.3 A Change of Viewpoint: From Vectors to Measures
3.4 The Stieltjes Transform of Measures
3.5 A Fundamental Result: The Marchenko–Pastur Equation
3.6 Linear Eigenvalue Statistics and Limit Laws
3.7 Central Limit Theorem for Linear Eigenvalue Statistics
3.8 Central Limit Theorem for Random Matrix
3.9 Independence for Random Matrices
3.10 Matrix‐Valued Gaussian Distribution
3.11 Matrix‐Valued Wishart Distribution
3.12 Moment Method
3.13 Stieltjes Transform Method
3.14 Concentration of the Spectral Measure for Large Random Matrices
3.15 Future Directions
Bibliographical Remarks
4 Linear Spectral Statistics of the Sample Covariance Matrix
4.1 Linear Spectral Statistics
4.2 Generalized Marchenko–Pastur Distributions
4.3 Estimation of Spectral Density Functions
4.4 Limiting Spectral Distribution of Time Series
Bibliographical Remarks
5 Large Hermitian Random Matrices and Free Random Variables
5.1 Large Economic/Financial Systems
5.2 Matrix‐Valued Probability
5.3 Wishart‐Levy Free Stable Random Matrices
5.4 Basic Concepts for Free Random Variables
5.5 The Analytical Spectrum of the Wishart–Levy Random Matrix
5.6 Basic Properties of the Stieltjes Transform
5.7 Basic Theorems for the Stieltjes Transform
5.8 Free Probability for Hermitian Random Matrices
5.9 Random Vandermonde Matrix
5.10 Non‐Asymptotic Analysis of State Estimation
Bibliographical Remarks
6 Large Non‐Hermitian Random Matrices and Quaternionic Free Probability Theory
6.1 Quaternionic Free Probability Theory
6.2 R‐diagonal Matrices
6.3 The Sum of Non‐Hermitian Random Matrices
6.4 The Product of Non‐Hermitian Random Matrices
6.5 Singular Value Equivalent Models
6.6 The Power of the Non‐Hermitian Random Matrix
6.7 Power Series of Large Non‐Hermitian Random Matrices
6.8 Products of Random Ginibre Matrices
6.9 Products of Rectangular Gaussian Random Matrices
6.10 Product of Complex Wishart Matrices
6.11 Spectral Relations between Products and Powers
6.12 Products of Finite‐Size I.I.D. Gaussian Random Matrices
6.13 Lyapunov Exponents for Products of Complex Gaussian Random Matrices
6.14 Euclidean Random Matrices
6.15 Random Matrices with Independent Entries and the Circular Law
6.16 The Circular Law and Outliers
6.17 Random SVD, Single Ring Law, and Outliers
6.18 The Elliptic Law and Outliers
Bibliographical Remarks
7 The Mathematical Foundations of Data Collection
7.1 Architectures and Applications for Big Data
7.2 Covariance Matrix Estimation
7.3 Spectral Estimators for Large Random Matrices
7.4 Asymptotic Framework for Matrix Reconstruction
7.5 Optimum Shrinkage
7.6 A Shrinkage Approach to Large‐Scale Covariance Matrix Estimation
7.7 Eigenvectors of Large Sample Covariance Matrix Ensembles
7.8 A General Class of Random Matrices
Bibliographical Remarks
8 Matrix Hypothesis Testing using Large Random Matrices
8.1 Motivating Examples
8.2 Hypothesis Test of Two Alternative Random Matrices
8.3 Eigenvalue Bounds for Expectation and Variance
8.4 Concentration of Empirical Distribution Functions
8.5 Random Quadratic Forms
8.6 Log‐Determinant of Random Matrices
8.7 General MANOVA Matrices
8.8 Finite Rank Perturbations of Large Random Matrices
8.9 Hypothesis Tests for High‐Dimensional Datasets
8.10 Roy’s Largest Root Test
8.11 Optimal Tests of Hypotheses for Large Random Matrices
8.13 Hypothesis Testing for Matrix Elliptically Contoured Distributions
Bibliographical Remarks
Part II: Smart Grid
9 Applications and Requirements of Smart Grid
9.1 History
9.2 Concepts and Vision
9.3 Today’s Electric Grid
9.4 Future Smart Electrical Energy System
10 Technical Challenges for Smart Grid
10.1 The Conceptual Foundation of a Self‐Healing Power System
10.3 The Electric Power System as a Complex Adaptive System
10.4 Making the Power System a Self‐Healing Network Using Distributed Computer Agents
10.5 Distribution Grid
10.6 Cyber Security
10.7 Smart Metering Network
10.8 Communication Infrastructure for Smart Grid
10.9 Wireless Sensor Networks
Bibliographical Remarks
11 Big Data for Smart Grid
11.1 Power in Numbers: Big Data and Grid Infrastructure
11.2 Energy’s Internet: The Convergence of Big Data and the Cloud
11.3 Edge Analytics: Consumers, Electric Vehicles, and Distributed Generation
11.4 Crosscutting Themes: Big Data
11.5 Cloud Computing for Smart Grid
11.6 Data Storage, Data Access and Data Analysis
11.7 The State‐of‐the‐Art Processing Techniques of Big Data
11.8 Big Data Meets the Smart Electrical Grid
11.9 4Vs of Big Data: Volume, Variety, Value and Velocity
11.10 Cloud Computing for Big Data
11.11 Big Data for Smart Grid
11.12 Information Platforms for Smart Grid
Bibliographical Remarks
12 Grid Monitoring and State Estimation
12.1 Phasor Measurement Unit
12.2 Optimal PMU Placement
12.3 State Estimation
12.4 Basics of State Estimation
12.5 Evolution of State Estimation
12.6 Static State Estimation
12.7 Forecasting‐Aided State Estimation
12.8 Phasor Measurement Units
12.9 Distributed System State Estimation
12.10 Event‐Triggered Approaches to State Estimation
12.11 Bad Data Detection
12.12 Improved Bad Data Detection
12.13 Cyber‐Attacks
12.14 Line Outage Detection
Bibliographical Remarks
13 False Data Injection Attacks against State Estimation
13.1 State Estimation
13.2 False Data Injection Attacks
13.3 MMSE State Estimation and Generalized Likelihood Ratio Test
13.4 Sparse Recovery from Nonlinear Measurements
13.5 Real‐Time Intrusion Detection
Bibliographical Remarks
14 Demand Response
14.1 Why Engage Demand?
14.2 Optimal Real‐time Pricing Algorithms
14.3 Transportation Electrification and Vehicle‐to‐Grid Applications
14.4 Grid Storage
Bibliographical Remarks
Part III: Communications and Sensing
15 Big Data for Communications
15.1 5G and Big Data
15.2 5G Wireless Communication Networks
15.3 Massive Multiple Input, Multiple Output
15.4 Free Probability for the Capacity of the Massive MIMO Channel
15.5 Spectral Sensing for Cognitive Radio
Bibliographical Remarks
16 Big Data for Sensing
16.1 Distributed Detection and Estimation
16.2 Euclidean Random Matrix
16.3 Decentralized Computing
Appendix A: Some Basic Results on Free Probability
A.1 Non‐Commutative Probability Spaces
A.2 Distributions
A.3 Asymptotic Freeness of Large Random Matrices
A.4 Limit Theorems
A.5 R‐diagonal Random Variables
A.6 Brown Measure of R‐diagonal Random Variables
Appendix B: Matrix‐Valued Random Variables
B.1 Random Vectors and Random Matrices
B.2 Multivariate Normal Distribution
B.3 Wishart Distribution
B.4 Multivariate Linear Model
B.5 General Linear Hypothesis Testing
Bibliographical Remarks
References
Index
End User License Agreement
Chapter 01
Table 1.1 Comparison between classical, free, and quaternionic free probability theories.
Table 1.2 Comparison of different entropy definitions.
Chapter 03
Table 3.1 An analogy between the quantum system and the big data measurement system.
Table 3.2 Generating the Gaussian random matrix Gβ(m, n).
Table 3.3 Hermite and Laguerre ensembles.
Chapter 05
Table 5.1 Common random matrices and their moments (the entries of W are i.i.d. with zero mean and a given variance; W is square, unless otherwise specified).
Table 5.2 Definition of commonly encountered random matrices for convergence laws (the entries of W are i.i.d. with zero mean and a given variance; W is square, unless otherwise specified).
Table 5.3 Table of Stieltjes, R‐ and S‐ transforms (Table 5.2 lists the definitions of the matrix notations used in this table).
Chapter 06
Table 6.1 Comparison between classical, free, and quaternionic free probability theories.
Table 6.2 A normalized Wishart‐like matrix. Random matrix Z defined in (6.89) is constructed out of random unitary matrices distributed according to the Haar measure and/or (independent) random Ginibre matrices of a given size N. The asymptotic distribution P(x) of the density of a rescaled eigenvalue of ρ is characterized by the singularity at 0, its support [a, b], the second moment M2 determining the average purity, and the mean entropy, according to which the table is ordered. Taken from [309].
Chapter 09
Table 9.1 Domains in the smart grid conceptual model.
Table 9.2 The smart grid compared with the existing grid.
Table 9.3 Evolution of the power system from a static to a dynamic infrastructure.
Chapter 10
Table 10.1 A comparison of the protection systems, smart grid, and central control system.
Table 10.2 Smart grid communication technologies.
Chapter 01
Figure 1.1 Big data, big impact: new possibilities for international development.
Figure 1.2 A big data processing framework. The research challenges form a three‐tier structure and center around the “big data mining platform” (Tier I), which focuses on low‐level data accessing and computing. Challenges on information sharing and privacy, and Big Data application domains and knowledge form Tier II, which concentrates on high‐level semantics, application‐domain knowledge, and user privacy issues. The outmost circle shows Tier III challenges on actual mining algorithms.
Figure 1.3 The square kilometer array.
Figure 1.4 The eigenvalues of a single matrix drawn from the complex Ginibre ensemble of random matrices. The dashed line is the unit circle. This numerical experiment was performed using the Julia language, http://julialang.org/ (accessed August 17, 2016).
Figure 1.5 Vision of a smart transmission grid.
Figure 1.6 Big data vision.
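The experiment behind Figure 1.4, the eigenvalues of a complex Ginibre matrix plotted against the unit circle, is easy to reproduce. The original used Julia; the following is a minimal NumPy sketch of the same idea (the matrix size 500 is an arbitrary illustrative choice, not taken from the figure):

```python
import numpy as np

def ginibre_eigenvalues(n, seed=0):
    """Eigenvalues of an n-by-n complex Ginibre matrix, scaled by
    1/sqrt(n) so that the circular law fills the unit disk."""
    rng = np.random.default_rng(seed)
    g = (rng.standard_normal((n, n))
         + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return np.linalg.eigvals(g / np.sqrt(n))

eigs = ginibre_eigenvalues(500)
# Circular law: the empirical spectral distribution converges to the
# uniform law on the unit disk, so virtually all moduli stay below 1.
frac_inside = float(np.mean(np.abs(eigs) <= 1.1))
```

Plotting `eigs.real` against `eigs.imag` with a dashed unit circle reproduces the picture in the figure.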
Chapter 02
Figure 2.1 Smoothed density of the eigenvalues of C, where the correlation matrix C is extracted from assets of the S&P 500 during the years 1991–1996. For comparison, the density (2.10) is also plotted: the theoretical value obtained assuming that the matrix is purely random except for its highest eigenvalue (dotted line). A better fit is obtained with a smaller value, corresponding to 74% of the total variance (solid line). Inset: the same plot, but including the highest eigenvalue, corresponding to the “market,” which is found to be 30 times greater than b.
Figure 2.2 Plots of ρ(λ) for different parameter values.
Figure 2.3 Combined plot of ρ(λ) versus λ, obtained analytically as well as numerically for independent, identically distributed random data sets.
Figure 2.4 Spectra of the covariance matrix C for the Student distribution (2.29), for parameter values including 2, 5, 20, and 100 (thin lines from solid to dotted), calculated using formula (2.30) and compared to the uncorrelated Wishart (thick line). One sees that the spectra tend to the Wishart distribution.
Figure 2.5 Spectra of the empirical covariance matrix S calculated from (2.30), compared to experimental data (stair lines) obtained by Monte Carlo generation of finite matrices.
Figure 2.6 Eigenvalue spacing distribution for the monthly mean sea‐level pressure (SLP) correlation matrix. The solid curve is the GOE prediction.
Figure 2.7 Eigenvalue spacing distribution for the monthly mean wind‐stress correlation matrix. The solid curve is the GUE prediction.
Figure 2.8 The ring law for the product of non‐Hermitian random matrices with white noise only. The radii of the inner circle and the outer circle agree with (2.71).
Figure 2.9 The ring law for the product of non‐Hermitian random matrices with signal plus white noise. The radius of the inner circle is less than that of the white‐noise‐only scenario.
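Figures 2.8–2.9 concern eigenvalues of products of non‐Hermitian random matrices; the inner‐circle radius comes from (2.71), which is not reproduced here. As a hedged sketch of the product setting only, the snippet below forms a product of L scaled square Ginibre matrices (the sizes and L are illustrative choices) and checks the outer edge: under this normalization the limiting eigenvalue density is supported on the unit disk, with density proportional to |z|^(2/L − 2).

```python
import numpy as np
from functools import reduce

def ginibre(n, rng):
    """n-by-n complex Ginibre matrix, scaled by 1/sqrt(n)."""
    return (rng.standard_normal((n, n))
            + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)

rng = np.random.default_rng(1)
n, L = 400, 3
product = reduce(np.matmul, [ginibre(n, rng) for _ in range(L)])
eigs = np.linalg.eigvals(product)
# Nearly all eigenvalue moduli of the product stay below 1 (unit disk).
frac_inside = float(np.mean(np.abs(eigs) <= 1.1))
```

A scatter plot of these eigenvalues shows the characteristic concentration toward the origin that grows with L.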
Chapter 03
Figure 3.1 Plotted above is the distribution of the eigenvalues, where X is a random Gaussian matrix. The blue curve is the Marchenko–Pastur law with density function fMP(x).
Figure 3.2 Plotted above is the distribution of the eigenvalues, where X is a random Gaussian matrix. The blue curve is the quarter circle law with its density function.
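The comparison in Figure 3.1, empirical eigenvalues of a sample covariance matrix against the Marchenko–Pastur law, can be sketched as follows. The dimensions p = 200, n = 800 are illustrative assumptions (the figure's actual sizes are not given); the check below only verifies that the eigenvalues fall inside the Marchenko–Pastur support edges (1 ± √c)², where c = p/n:

```python
import numpy as np

def sample_covariance_eigs(p, n, seed=0):
    """Eigenvalues of S = X X^T / n for a p-by-n standard Gaussian X."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((p, n))
    return np.linalg.eigvalsh(x @ x.T / n)

p, n = 200, 800                  # aspect ratio c = p/n = 0.25
c = p / n
a = (1 - np.sqrt(c)) ** 2        # lower Marchenko-Pastur edge
b = (1 + np.sqrt(c)) ** 2        # upper Marchenko-Pastur edge
eigs = sample_covariance_eigs(p, n)
# Edge fluctuations are small at this size, so allow a modest margin.
frac_in_support = float(np.mean((eigs > a - 0.1) & (eigs < b + 0.1)))
```

A histogram of `eigs` overlaid with the density fMP(x) reproduces the picture in the figure.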
Chapter 04
Figure 4.1 The curve (solid thin) and the sets B and supp c(Fn(x)) (solid thick). In the figure, the plotted variable is the m in our notation.
Figure 4.2 Spectral density curves for sample covariance matrices. The entries Xij are i.i.d. standard Gaussian with zero mean and variance 1. In MATLAB: X=randn(N,n);
Figure 4.3 The same as Figure 4.2 except for different parameter values.
Figure 4.4 The kernel density estimation compared with the Marchenko–Pastur law.
Figure 4.5 The kernel density estimation fn(x) deviates from the limiting Marchenko–Pastur law; the optimal bandwidth is used. The setting is the same as Figure 1 unless otherwise specified.
Figure 4.6 The same as Figure 4 except for one parameter value.
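Figures 4.4–4.6 compare a kernel density estimate of the eigenvalue distribution with the Marchenko–Pastur law. A minimal sketch of that pipeline, using a Gaussian kernel with Silverman's rule-of-thumb bandwidth (the figures' "optimal bandwidth" formula is not reproduced here, and the sizes p = 200, n = 800 are illustrative assumptions):

```python
import numpy as np

def kde(samples, grid, h):
    """Gaussian kernel density estimate f_n(x) = (1/nh) sum K((x - s_i)/h)."""
    s = np.asarray(samples)
    z = (grid[:, None] - s[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(s) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
p, n = 200, 800
x = rng.standard_normal((p, n))            # the MATLAB X=randn(N,n) step
eigs = np.linalg.eigvalsh(x @ x.T / n)     # sample covariance eigenvalues
h = 1.06 * eigs.std() * len(eigs) ** (-1 / 5)   # Silverman's rule of thumb
grid = np.linspace(-1.0, 4.0, 500)
density = kde(eigs, grid, h)
mass = float(density.sum() * (grid[1] - grid[0]))  # should be roughly 1
```

Plotting `density` over `grid` against the Marchenko–Pastur density shows the kind of deviation discussed in Figure 4.5.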
Chapter 05
Figure 5.1 The figures represent spectra of the eigenvalue distributions ρ(λ) of the experimental correlation matrix measured in a series of measurements. The underlying correlation matrix has two eigenvalues with given weights. At the critical value (5.43), the spectrum splits. The spectral densities are calculated analytically.
Figure 5.2 Simulated limit distribution for a uniform distribution, averaged over 700 sample matrices.
Figure 5.3 Simulated limit distribution, where f(x) is an unbounded pdf. The distributions are averaged over 700 sample matrices.
Chapter 06
Figure 6.1 Left: Complex‐valued operation for a real function in the upper complex plane. Right: Quaternion‐valued operation for a complex function in the hypercomplex plane.
Figure 6.2 The sum of L non‐Hermitian random matrices, for one matrix, i.e., L = 1.
Figure 6.3 The same as Figure 6.2 except for a larger L.
Figure 6.4 Eigenvalues for a product of L non‐Hermitian random matrices, for one matrix, i.e., L = 1.
Figure 6.5 The same as Figure 6.4 except for a larger L.
Figure 6.6 The empirical eigenvalue density function for a product of L non‐Hermitian random matrices, for one matrix, i.e., L = 1.
Figure 6.7 The same as Figure 6.6 except for a larger L.
Figure 6.8 The empirical eigenvalue density function for one non‐Hermitian random matrix.
Figure 6.9 The same as Figure 6.8 except for one parameter.
Figure 6.10 The eigenvalues for one non‐Hermitian random matrix.
Figure 6.11 The same as Figure 6.10 except for one parameter.
Figure 6.12 The eigenvalues for one non‐Hermitian random matrix.
Figure 6.13 The same as Figure 6.12.
Figure 6.14 The eigenvalues of (X^L)^{1/M} for one non‐Hermitian random matrix X of a given size and ratio α.
Figure 6.15 The same as Figure 6.14. Four outliers.
Figure 6.16 The eigenvalues of (X^{1/M})^L for one non‐Hermitian random matrix X. All other parameters are the same as Figure 6.14.
Figure 6.17 The same as Figure 6.16. Four outliers.
Figure 6.18 The eigenvalues of a geometric series of K terms; each term is (X^L/M) for one non‐Hermitian random matrix X of a given size.
Figure 6.19 The same as Figure 6.18. Four outliers.
Figure 6.20 The eigenvalues of a geometric series of K terms; each term is (X^L/M) for one non‐Hermitian random matrix X of a given size.
Figure 6.21 The same as Figure 6.20 except for one parameter.
Figure 6.22 Product of square i.i.d. matrices.
Figure 6.23 The same as Figure 6.22.
Figure 6.24 Density plots of the logarithm of the eigenvalue density of the N × N random Green's matrix (6.19), obtained by numerical diagonalization of 10 realizations of the matrix. The solid lines represent the borderlines of the support of the eigenvalue density following from the theory. The dashed lines show the diffusion approximation (6.22). From (a) to (d), Y keeps increasing.
Figure 6.25 This figure shows the eigenvalues of a single i.i.d. random matrix with atom distribution X defined by a white Gaussian random variable with zero mean and variance one; the eigenvalues were perturbed by adding a diagonal matrix with ten diagonal entries, corresponding to ten locations on the complex z plane. The small circles are centered at these ten locations on the complex plane, each with a given radius. Five hundred Monte Carlo trials are performed to see how stable these eigenvalue locations are. We can clearly identify every corresponding eigenvalue location on the complex plane.
Figure 6.26 Parameters are the same as in Figure 6.25, except for one value.
Figure 6.27 Parameters are the same as in Figure 6.25, except for one value.
Figure 6.28 Parameters are the same as in Figure 6.25, except for one value. Only 50 Monte Carlo trials are performed here, rather than 500.
Figure 6.29 This figure shows the eigenvalues of a single i.i.d. random matrix with atom distribution X defined by a white Gaussian random variable with zero mean and variance one; the eigenvalues were perturbed by adding a deterministic matrix with four eigenvalues clustered around 2 + 2j and separated by δ, the minimum distance between two eigenvalues (their corresponding eigenvectors are random Gaussian vectors). The small circles are centered at these four eigenvalue locations on the complex plane, each with a given radius. Five hundred Monte Carlo trials are performed to see how stable these eigenvalue locations are. We can clearly identify every corresponding eigenvalue location on the complex plane.
Figure 6.30 This figure shows the eigenvalues of a single i.i.d. random matrix with atom distribution X defined by a white Gaussian random variable with zero mean and variance one; the eigenvalues were perturbed by adding a deterministic matrix with four eigenvalues clustered around a + jb and separated by δ, the minimum distance between two eigenvalues (their corresponding eigenvectors are random Gaussian vectors). The small circles are centered at these four eigenvalue locations on the complex plane, each with a given radius. 200 Monte Carlo trials are performed to see how stable these eigenvalue locations are. We can clearly identify every corresponding eigenvalue location on the complex plane.
Figure 6.31 This figure shows the eigenvalues of a single i.i.d. random matrix with atom distribution X defined by a white Gaussian random variable with zero mean and variance one; the eigenvalues were perturbed by adding a deterministic matrix with four eigenvalues: 2, 2 + 2j, 2j, and −2 − 2j (their corresponding eigenvectors are random Gaussian vectors). The small circles are centered at these four eigenvalue locations on the complex plane, each with a given radius. Twenty Monte Carlo trials are performed to see how stable these eigenvalue locations are. We can clearly identify every corresponding eigenvalue location on the complex plane.
Figure 6.32 Eigenvalues of the perturbed matrix. The small circles are centered at the perturbation eigenvalues, each with a given radius. Twenty Monte Carlo trials are performed.
Figure 6.33 Eigenvalues of the matrix; each entry is an i.i.d. Gaussian random variable.
Figure 6.34 Eigenvalues of the matrix; each entry is an i.i.d. Bernoulli random variable, taking the values +1 and −1 each with probability 1/2.
Figure 6.35 Eigenvalues of the matrix; each entry is an i.i.d. Gaussian random variable.
Figure 6.36 Eigenvalues of the matrix; each entry is an i.i.d. Bernoulli random variable, taking the values +1 and −1 each with probability 1/2.
Figure 6.37 Eigenvalues of the matrix; each entry is an i.i.d. Gaussian random variable.
Figure 6.38 Eigenvalues of the matrix; each entry is an i.i.d. Bernoulli random variable, taking the values +1 and −1 each with probability 1/2.
Figure 6.39 Plotted above is the distribution of the eigenvalues, where Xn is a random matrix each of whose entries is an i.i.d. Gaussian random variable. The three circles with radius 1/n^{1/4} are located at the perturbation eigenvalues. Twenty Monte Carlo trials are performed.
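Figures 6.25–6.39 all illustrate the same phenomenon from Section 6.16: a finite‐rank perturbation of an i.i.d. matrix creates outlier eigenvalues near the perturbation eigenvalues, while the bulk stays inside the unit disk. A minimal sketch under assumed parameters (n = 500 and a single perturbation eigenvalue at 3, chosen for illustration rather than taken from any one figure):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.standard_normal((n, n)) / np.sqrt(n)   # bulk obeys the circular law
d = np.zeros((n, n))
d[0, 0] = 3.0                                   # rank-one perturbation
eigs = np.linalg.eigvals(x + d)
# A perturbation eigenvalue with modulus > 1 produces an outlier of
# x + d close to it; all other eigenvalues remain near the unit disk.
dist_to_3 = float(np.min(np.abs(eigs - 3.0)))
frac_bulk = float(np.mean(np.abs(eigs) <= 1.1))
```

Rerunning with fresh seeds, as the figures' Monte Carlo trials do, shows how stable the outlier location is.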
Chapter 07
Figure 7.1 Comparison of the risk estimate using Monte Carlo (solid line) and SURE (cross). SNR = 0.5.
Figure 7.2 The same as Figure 7.1 except SNR = 1.
Figure 7.3 Theorem 7.6.1 interpreted as a projection in Hilbert space.
Figure 7.4 Theorem 7.6.1 interpreted as a tradeoff between bias and variance: shrinkage intensity 0 corresponds to the sample covariance matrix S; shrinkage intensity 1 corresponds to the shrinkage target μI. The optimal shrinkage intensity (represented by •) corresponds to the minimum expected loss combination.
Figure 7.5 Bayesian interpretation. The left sphere has center μI and radius α and represents prior information. The right sphere has center S and radius β and represents sample information; the distance between the sphere centers is δ. If all we knew was that the true covariance matrix lies on the left sphere, our best guess would be its center: the shrinkage target μI. If all we knew was that the true covariance matrix lies on the right sphere, our best guess would be its center: the sample covariance matrix S. Putting together both pieces of information, the true covariance matrix must lie on the circle where the two spheres intersect; therefore, our best guess is its center: the optimal linear shrinkage.
Figure 7.6 Sample versus true eigenvalues. The solid line represents the distribution of the eigenvalues of the sample covariance matrix. Eigenvalues are sorted from largest to smallest, then plotted against their rank. In this case, the true covariance matrix is the identity, that is, the true eigenvalues are all equal to one; the distribution of true eigenvalues is plotted as a dashed horizontal line at one. Distributions are obtained in the limit as the number of observations n and the number of variables p both go to infinity with the ratio p/n converging to a finite positive limit. The four plots correspond to different values of this limit.
Figure 7.7 Projection of the first sample eigenvector onto the population eigenvectors (indexed by their associated eigenvalues).
Figure 7.8 Limiting density of sample eigenvalues, in the particular case where all the eigenvalues of the true covariance matrix are equal to one. The graph shows excess dispersion of the sample eigenvalues. The formula for this plot comes from solving the Marchenko–Pastur equation.
Figure 7.9 Comparison of the optimal linear versus nonlinear bias correction formulas. The distribution of true eigenvalues H places 20% mass at 1, 40% mass at 3, and 40% mass at 10.
Figure 7.10 Percentage relative improvement in average loss (PRIAL) from applying the optimal nonlinear shrinkage formula to the sample eigenvalues. The solid line shows the PRIAL obtained by dividing the i‐th sample eigenvalue by the correction factor, as a function of sample size. The dotted line shows the PRIAL of the linear shrinkage estimator first proposed in [402]. For each sample size we ran 10 000 Monte Carlo simulations. As in Figure 8, the distribution of true eigenvalues H places 20% mass at 1, 40% mass at 3, and 40% mass at 10.
Figure 7.11 A distributed system with a large number of sensors.
Figure 7.12 System with interfering users.
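Figures 7.4–7.5 describe linear shrinkage: the estimator is a convex combination (1 − w)S + wμI of the sample covariance matrix S and the scaled identity target μI. A minimal sketch, in which the chapter's optimal shrinkage intensity is replaced by a fixed illustrative w = 0.5 and the dimensions are assumptions:

```python
import numpy as np

def linear_shrinkage(sample_cov, w):
    """Shrink S toward mu*I, where mu is the average sample eigenvalue.
    w = 0 returns S itself; w = 1 returns the target mu*I."""
    p = sample_cov.shape[0]
    mu = np.trace(sample_cov) / p
    return (1.0 - w) * sample_cov + w * mu * np.eye(p)

rng = np.random.default_rng(0)
p, n = 50, 100
x = rng.standard_normal((p, n))
s = x @ x.T / n                      # sample covariance matrix
shrunk = linear_shrinkage(s, 0.5)
# Shrinkage pulls the dispersed sample eigenvalues toward their mean,
# which reduces the condition number of the estimate.
cond_s, cond_shrunk = np.linalg.cond(s), np.linalg.cond(shrunk)
```

This is exactly the bias–variance tradeoff of Figure 7.4: moving w from 0 toward 1 trades the sample information in S for the stability of the target μI.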
Chapter 08
Figure 8.1 Log‐likelihood functions defined in (8.230) under hypothesis ℋ0 and hypothesis ℋ1 for different Monte Carlo realizations. SNR (in dB) as given; N = 100.
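Figure 8.1 plots log‐likelihood values under ℋ0 and ℋ1 across Monte Carlo realizations; equation (8.230) itself is not reproduced here. As a generic stand‐in with the same structure, the sketch below computes the log‐likelihood ratio for a Gaussian variance test (noise only versus signal plus noise), with all parameters chosen for illustration:

```python
import numpy as np

def llr_variance(y, sigma0=1.0, sigma1=2.0):
    """Log-likelihood ratio log p1(y)/p0(y) for i.i.d. zero-mean Gaussian
    samples y, with standard deviation sigma1 under H1 and sigma0 under H0."""
    n = len(y)
    ss = np.sum(y ** 2)
    return (n * np.log(sigma0 / sigma1)
            + 0.5 * ss * (1.0 / sigma0 ** 2 - 1.0 / sigma1 ** 2))

rng = np.random.default_rng(3)
n = 1000
y0 = rng.standard_normal(n)          # H0: variance 1
y1 = 2.0 * rng.standard_normal(n)    # H1: variance 4
# The LLR is strongly negative under H0 and strongly positive under H1,
# which is what separates the two families of curves in such plots.
llr_h0, llr_h1 = llr_variance(y0), llr_variance(y1)
```

Repeating this over many realizations and plotting the two LLR populations gives a picture in the spirit of Figure 8.1.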
Chapter 09
Figure 9.1 The future electric grid. RE: renewable energy.
Figure 9.2 Visual history of industrial revolutions: from energy to services and communication and back again to energy.
Figure 9.3 Conceptual model of the smart grid.
Figure 9.4 Projected power generation additions: 2020.
Figure 9.5 Next‐generation cost comparison.
Figure 9.6 NIST smart grid reference model.
Figure 9.7 Smart grid domains.
Figure 9.8 Smart grid pyramid.
Figure 9.9 A view of the utility information system impacted by smart grid strategies.
Figure 9.10 Using the cloud for smart‐grid applications.
Figure 9.11 Systems required to support the high penetration of distributed resources.
Chapter 10
Figure 10.1 A damage‐adaptive intelligent flight‐control system (IFCS).
Figure 10.2 How energy management systems can help to avoid blackouts.
Figure 10.3 A sample system with processors connected by communication links.
Figure 10.4 Role of demand response in electric system planning and operations.
Figure 10.5 Model of domestic energy streams.
Figure 10.6 Three step control methodology.
Figure 10.7 Smart grid detailed logical model.
Figure 10.8 Household electricity demand profile.
Figure 10.9 Distribution of network smart metering data.
Figure 10.10 Communication Infrastructure for Smart Grid.
Figure 10.11 Smart grid architecture increases the capacity and flexibility of the network and provides advanced sensing and control through modern communications technologies.
Chapter 11
Figure 11.1 Smart grid framework.
Chapter 12
Figure 12.1 Compensating for signal delay introduced by the antialiasing filter.
Figure 12.2 Electricity ecosystem of the future grid, featuring various players and levels of interaction.
Figure 12.3 Relationship between different elements that collectively constitute the EMS/SCADA.
Chapter 14
Figure 14.1 Demand response connectivity and information flow.
Figure 14.2 Interaction of demand response, variable generation, and storage.
Figure 14.3 A simplified illustration of the wholesale electricity market formed by multiple generators and several regional retail companies. Each retailer provides electricity for a number of users. Retailers are connected to the users via local area networks which are used to announce real‐time prices to the users.
Figure 14.4 Daily residential load curve.
Figure 14.5 Subscription options of charging time zones for PEV owners and variable short‐term market energy pricing.
Chapter 15
Figure 15.1 A proposed 5G heterogeneous wireless cellular architecture.
Robert C. Qiu and Paul Antonik
This edition first published 2017. © 2017 John Wiley & Sons Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Robert C. Qiu and Paul Antonik to be identified as the authors of this work has been asserted in accordance with law.
Registered Offices: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA; John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office: The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty: The publisher and the authors make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or website is referred to in this work as a citation and/or potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or the recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.
Library of Congress Cataloging‐in‐Publication Data
Names: Qiu, Robert C., 1966– author. | Antonik, Paul, author.
Title: Smart grid using big data analytics / Robert C. Qiu, Paul Antonik.
Description: Chichester, West Sussex, United Kingdom : John Wiley & Sons, Inc., 2017. | Includes bibliographical references and index.
Identifiers: LCCN 2016042795 | ISBN 9781118494059 (cloth) | ISBN 9781118716793 (epub) | ISBN 9781118716809 (Adobe PDF)
Subjects: LCSH: Smart power grids. | Big data.
Classification: LCC TK3105 .Q25 2017 | DDC 621.310285/57–dc23
LC record available at https://lccn.loc.gov/2016042795
Cover design by Wiley
Cover image: johnason/loops7/ziggymaj/gettyimages
To Lily L. Li
When, in the fall of 2010, the first author wrote the initial draft of this book in the form of lecture notes for a smart grid course, the preface began by justifying the need for such a course. He explained at length why it was important that electrical engineers understand Smart Grid. Now, such a justification seems unnecessary. Rather, he repeatedly had to justify his decision to cover aspects of big data in a smart grid course, in order to convince the audience, and much of the time himself. Although we feel completely comfortable with this “big” decision at this point in the writing, we still want to outline some of the considerations that led to it. The decision was motivated by our passion to pursue research in this direction. The excitement of the problems that lie at the intersection of the two topics convinced us that the time had come to study big data for smart grid, the integration of communications and sensing.
For big data, we have two major tasks: (i) big data modeling and (ii) big data analytics. After the book was finished, we realized that more than 90% of the contents were dedicated to these two aspects. The applications of this material are treated very lightly. We emphasize the mathematical foundation of big data, in a similar way to Qiu and Wicks’ Cognitive Networked Sensing and Big Data (Springer, 2014). Qiu, Hu, Li and Wicks’ Cognitive Radio Communication and Networking (John Wiley & Sons Ltd, 2012) complements both books. All three books are unified by matrix‐valued random variables (random matrix theory).
In choosing topics we heeded the warning of the former NYU professor K. O. Friedrichs: “It is easy to write a book if you are willing to put into it everything you know about the subject” (P. Lax, Functional Analysis, Wiley‐Interscience, 2002, p. xvii). The services provided by Google Scholar and online digital libraries completely relieved us of the burden of physically going to the library. Using the “cited by” function provided by Google Scholar, even working remotely from the office, we could put things together without difficulty. We were able to use this function to track the latest results on the subject. This book deals with the fundamentals of big data, addressing principles and applications. We view big data as a new science: a combination of information science and data science. Smart grid, communications and sensing are three applications of special interest to the authors.
This book studies the intersection of big data (Part I) with smart grid (Part II) and communications and sensing (Part III). Random matrix theory is treated as the unifying theme. Random matrix models provide a powerful framework for modeling numerous physical phenomena in quantum systems, financial systems, sensor networks, wireless networks, smart grid, and so forth. One goal is to outline how an audience with a signals‐and‐systems background can contribute to big data research and development (R&D). As most mathematical results are synthesized from the literature of mathematics and physics, we have tried to present them in rather different ways, usually motivated by the big data systems above. Roughly speaking, a big data system means a large statistical system, or “large models.” Although no claim of novel mathematical results is made, the combination of these mathematical models with these particular big data systems seems worth mentioning. Initially, we intended to write a textbook in a traditional way; however, as the project evolved, we could not resist the temptation to include many beautiful mathematical results. These results are relatively new in the statistical literature and completely novel to the engineering community. We aim to bridge the gap between big data modeling/analytics and large random matrices in a systematic manner. The references are reasonably comprehensive and current (sometimes exhaustive, for example for non‐Hermitian random matrices).
Random matrices are ubiquitous [1]. The reason for this is twofold. First, they have a great degree of universality; that is, the eigenvalue properties of large matrices do not depend on the underlying statistical matrix ensemble. Second, random matrices can be viewed as noncommuting probability theory where the whole matrix is treated as an element of the probability space. Nowadays, data sets are usually organized as large matrices whose first dimension is equal to the number of degrees of freedom and the second to the number of measurements. Typical examples include financial systems, sensing systems and wireless communications systems.
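This organization of data can be made concrete with a small numerical sketch (illustrative only; NumPy and the variable names are our own): rows of the matrix index the degrees of freedom, columns index the measurements, and for a pure-noise matrix the eigenvalues of the sample covariance concentrate on the Marchenko–Pastur support treated in Chapter 3.

```python
import numpy as np

# Illustrative sketch: model a data set as a large random matrix and
# inspect the spectrum of its sample covariance matrix.
# p = degrees of freedom (rows), n = number of measurements (columns).
rng = np.random.default_rng(0)
p, n = 200, 1000
c = p / n  # aspect ratio

X = rng.standard_normal((p, n))   # pure-noise data matrix
S = X @ X.T / n                   # sample covariance matrix
eigs = np.linalg.eigvalsh(S)      # its eigenvalues (the "spectrum")

# Marchenko-Pastur law: for pure noise the eigenvalues concentrate on
# the interval [(1 - sqrt(c))^2, (1 + sqrt(c))^2] as p and n grow.
lower, upper = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
print(f"spectrum in [{eigs.min():.3f}, {eigs.max():.3f}], "
      f"MP support [{lower:.3f}, {upper:.3f}]")
```

Universality means that this bulk behavior would look much the same if the Gaussian entries were replaced by another entry distribution with the same variance.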
As pointed out above, random matrix theory is the foundation for many problems in smart grid and big data. We hold the belief that big data is more basic than smart grid; the latter is the applied science of the former. On the other hand, smart grid motivates big data. As a result, their close interaction is a natural topic for study. During the first offering of the first author’s course on smart grid (Fall 2010), he relied primarily on journal papers on power systems. During the second offering (Fall 2013), the material mainly covered big data aspects, especially the latest results in random matrix theory. The audience was graduate students from EE and CS. He realized that without a solid background in big data, the introduction of smart grid—large power systems that lead to high‐dimensional data—could be very superficial. For example, the challenges of state estimation and bad data detection are due to the high dimensionality of the resultant datasets. This issue belongs to the larger class of standard big data problems. Although, in Fall 2013, he pushed the course to the frontiers of statistics, theoretical physics, and finance, he knew that his class had difficulty following him. He lost most students when he addressed random matrices; to cover them, he had to go back to random vectors first. It was a painful experience for all, because the students were not comfortable with random vectors, which are the most important prerequisite for reading Part I of this book.
Big data is a new science with numerous applications. After we combine smart grid and big data, we are able to crystallize many standard problems and focus our efforts on the marriage of two subjects. We feel very comfortable that this combination will be extremely fruitful in the near future. In the infancy of this connection, our aim is to spell out our goals and methodologies; at the same time, we outline the mathematical foundations by introducing random matrix theory, in the hope that this mathematical theory is sufficiently general and flexible to provide a definitive machinery for the analysis of big data and smart grid. It is common to hear that big data lacks a theoretical foundation. Maybe there is no theory at all. It is the sense of the mission (to search for such a theory) that has sustained us in this long journey.
This book is the result of many years of teaching and research in the field of smart grid. This work is in part funded by the National Science Foundation through three grants (ECCS‐0901420, ECCS‐0821658, and CNS‐1247778), and the Office of Naval Research through two grants (N00010‐10‐1‐0810 and N00014‐11‐1‐0006). We want to thank Dr. Santanu K. Das (ONR) for his support for the work.
We want to thank Dr. Zhen Hu for reading through the whole manuscript. The first author wants to thank the ECE students in his smart grid courses (Fall 2010, Fall 2013) for their patience and useful feedback. The first author was working with China Power Research Institute (CPRI), Beijing, China, when the book was nearing completion. He wants to thank his host Dr. Dongxia Zhang (CPRI) and Dr. Chaoyang Zhu for their hospitality. The first author also worked for two months at Shanghai Jiaotong University. He wants to thank Professors Wenxian Yu, Xiuchen Jiang and Zhijian Jin for their hospitality and useful discussions. He also wants to thank Professors Shaoqian Li and Guangrong Yue at the University of Electronic Science and Technology of China (UESTC) for their hospitality and useful discussions.
⊗ : Kronecker product of two matrices
p × q : matrix with p rows and q columns
⊞, ⊠ : free additive and free multiplicative convolution; Voiculescu’s operations
E[·] : expectation of ·
‖·‖ : operator norm of the matrix
‖·‖_F : Frobenius norm of the matrix
→d : convergence in distribution
ℂ : the set of complex numbers
E[X] : expectation of random variable X
E[x] : expectation of random vector x
E[X] : expectation of random matrix X
1{·} : indicator function
Data is “unreasonably effective” [2]. Nobel laureate Eugene Wigner referred to the unreasonable effectiveness of mathematics in the natural sciences [3]. What is big data? According to [4], its size is on the order of terabytes or petabytes; it is often online, and it is not available from a central source. It is diverse, may be loosely structured, with a large percentage of the data missing. It is heterogeneous.
The promise of data‐driven decision‐making is now broadly recognized [5–16]. There is no clear consensus about what big data is. In fact, there have been many controversial statements about big data, such as “Size is the only thing that matters.”
Big data is a big deal [17]. The Big Data Research and Development Initiative has been launched by the US Federal government. “By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning” [17]. Universities are beginning to create new courses to prepare the next generation of “data scientists.”
The age of big data has already arrived, with global data doubling every two years. The utility industry is not the only one facing this issue (Wal‐Mart has a million customer transactions a day), but utilities have been slower to respond to the data deluge. Scaling algorithms up to massive datasets is a big challenge.
According to [18]:
A key tenet of big data is that the world and the data that describe it are constantly changing and organizations that can recognize the changes and react quickly and intelligently will have the upper hand … As the volume of data explodes, organizations will need analytic tools that are reliable, robust and capable of being automated. At the same time, the analytics, algorithms, and user interfaces they employ will need to facilitate interactions with the people who work with the tools.
Data is a strategic resource, together with natural resources and human resources. Data is king! “Big data” refers to a technology phenomenon that has arisen since the late 1980s [19]. As computers have improved, their growing storage and processing capacities have provided new and powerful ways to gain insight into the world by sifting through enormous quantities of data available. But this insight, discoverable in previously unseen patterns and trends within these phenomenally large data sets, can be hard to detect without new analytic tools that can comb through the information and highlight points of interest.
Sources such as online or mobile financial transactions, social media traffic, and GPS coordinates now generate over 2.5 quintillion bytes of so‐called “big data” every day. The growth of mobile data traffic from subscribers in emerging markets exceeded 100% annually through 2015. There are new possibilities for international development (see Figure 1.1).
Figure 1.1 Big data, big impact: new possibilities for international development.
Source: Reproduced from [6] with permission from the World Economic Forum.
Big data at the societal level provides a powerful microscope, together with social mining—the ability to discover knowledge from these data. Scientific research is being revolutionized by this, and policy making is next in line, because big data and social mining provide novel means for measuring and monitoring wellbeing in our society more realistically than GDP: more precisely, continuously, and everywhere [20].
Most scientific disciplines are finding the data deluge to be extremely challenging, and tremendous opportunities can be realized if we can better organize and access the data [16].
Chris Anderson believed that the data deluge makes the scientific method obsolete [21]. With petabytes of data, the argument goes, correlation is enough: there is no need to find models, and correlation replaces causality. It remains to be seen whether the growth of data will lead to a fundamental change in scientific methods.
In the computing industry, attention is now focused on how to process big data [22].
A fundamental question is “What is the unifying theory for big data?” This book adopts the viewpoint that big data is a new science combining data science and information science. Specialists in different fields deal with big data on their own, while information experts play a secondary role as assistants. In other words, most scientific problems are in the hands of specialists, whereas only a few problems—common to all fields—are refined by computing experts. As more and more problems remain open, unifying challenges common to all fields will arise. Big data from the Internet may receive attention first, but big data from physical systems will become more and more important.
Big data will form a unique discipline that requires expertise from mathematics, statistics and computing algorithms.
Following the excellent review in [22], we highlight some challenges for big data:
Processing unstructured and semistructured data.
Presently 85% of the data are unstructured or semistructured. Traditional relational databases cannot handle these massive datasets. High scalability is the most important requirement for big‐data analysis. MapReduce and Hadoop are two nonrelational data analysis technologies.
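As a sketch of the programming model behind Hadoop, the MapReduce pattern can be written in a few lines of plain Python (a toy word count over unstructured text, not the Hadoop API itself; the function names are our own):

```python
from collections import defaultdict

# Minimal sketch of the MapReduce model (word count), the pattern
# Hadoop implements at cluster scale for unstructured text.

def map_phase(doc):
    # Mapper: emit (key, value) pairs from one unstructured record.
    return [(word.lower(), 1) for word in doc.split()]

def shuffle(pairs):
    # Shuffle: group all values by key (done by the framework).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: aggregate the values for each key.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big grid", "smart grid data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # word frequencies across all documents
```

Scalability comes from the fact that the map and reduce steps are embarrassingly parallel: each mapper and each reducer can run on a different machine, with only the shuffle requiring data movement.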
Novel approaches for data representation.
Current data representation cannot visually express the true essence of the data. If the raw data are labeled, the problem is much easier, but customers do not approve of the labeling.
Data fusion.
The true value of big data cannot exhibit itself without data fusion. The data deluge on the Internet has something to do with data formats. One critical challenge is whether we can conveniently fuse the data from individuals, industry and government. It is preferable that data formats be platform free.
Redundancy reduction and high‐efficiency, low‐cost data storage.
Redundancy reduction is important for cost reduction.
Analytical tools and development environments that are suitable for a variety of fields.
Computing algorithm researchers and people from different disciplines are encouraged to work together closely as a team. There are enormous barriers for people from different disciplines to share data. Data collection, especially simultaneous collection for relational data, is still very challenging.
Novel approaches to save energy for data processing, data storage, and communication.
The Defense Advanced Research Projects Agency’s (DARPA’s) XDATA program seeks to develop computational techniques and software tools for analyzing large volumes of data, both semistructured (e.g., tabular, relational, categorical, metadata) and unstructured (e.g., text documents, message traffic). Central challenges to be addressed include (i) developing scalable algorithms for processing imperfect data in distributed data stores, and (ii) creating effective human–computer interaction tools to facilitate rapidly customizable visual reasoning for diverse missions.
Data continues to be generated and digitally archived at increasing rates, resulting in vast databases available for search and analysis. Access to these databases has generated new insights through data‐driven methods in the commercial, science, and computing sectors [23]. The defense sector is “swimming in sensors and drowning in data.” Big data arises from the Internet and from the monitoring of industrial equipment; sensor networks and the Internet of Things (IoT) are two further drivers.
Increasingly, data may be seen only once, for milliseconds, or may be stored only briefly before being deleted, especially in some defense applications. This trend is accelerated by the proliferation of digital devices and the Internet. It is important to develop fast, scalable, and efficient methods for processing and visualizing such data.
The XDATA program’s technology development is approached through four technical areas (TAs):
TA1: Scalable analytics and data‐processing technology;
TA2: Visual user interface technology;
TA3: Research software integration;
TA4: Evaluation.
It is useful to consider distributed computing via architectures like MapReduce, and its open source implementation, Hadoop. Data collected by the Department of Defense (DoD) are particularly difficult to deal with, including missing data, missing connections between data, incomplete data, corrupted data, data of variable size and type, and so forth [23]. We need to develop analytical principles and implementations scalable to data volume and distributed computer architectures. The challenge for Technical Area 1 is how to enable systematic use of big data in the following list of topic areas:
Methods for leveraging the problem structure to create new algorithms to achieve optimal tradeoffs among time complexity, space complexity, and stream complexity (i.e., how many passes over the data are needed).
Methods for the propagation of uncertainty (i.e., every query should have an answer and an error bar), with performance guarantees for loss of precision due to approximations.
Methods for measuring nonlinear relationships among data.
Sampling and estimation techniques for distributed platforms, including compensating for missing information, corrupted information, and incomplete information.
Methods for distributed dimensionality reduction, matrix factorization, matrix completion (within a distributed data store where data are not all in one place).
Methods for operating on streaming data feeds.
Methods for determining optimal cloud configurations and resource allocation with asymmetric components (e.g., many standard machines, a small number of large‐memory machines, machines with graphical processing units).
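Two of the items above (one-pass stream processing, and answers that carry an error bar) can be illustrated together by a minimal sketch using Welford's online algorithm; the function name and the 95% normal-approximation error bar are our own choices:

```python
import math
import random

# One-pass (streaming) estimator whose answer comes with an error bar.
# Welford's online algorithm updates the mean and variance in O(1)
# space per element, so only a single pass over the stream is needed.

def streaming_mean_with_error(stream):
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # running sum of squared deviations
    variance = m2 / (n - 1) if n > 1 else 0.0
    stderr = math.sqrt(variance / n) if n else float("inf")
    return mean, 1.96 * stderr     # estimate and ~95% error bar

random.seed(0)
data = (random.gauss(5.0, 2.0) for _ in range(100_000))  # consumed once
estimate, error_bar = streaming_mean_with_error(data)
print(f"mean = {estimate:.3f} +/- {error_bar:.3f}")
```

Because the generator is consumed as it is read, the stream complexity is exactly one pass, and the returned error bar shrinks as the square root of the number of elements seen.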
The challenge for Technical Area 2 is how to hook up big data analytics to interfaces, including but not limited to the following topics:
Visualization of data for scientific discovery, activity patterns, and summaries.
Expressive visualization and/or query languages and processing that support domain‐specific interaction, successive query refinement, repeated viewing of data, faceted search, multidimensional queries, and collaborative/interactive search.
Principled design, including menus, query boxes, hover tips, invalid action notifications, layout logic, as well as processes of overview, zoom and filter, and details‐on‐demand.
Support for the study and characterization of users, including extraction of relations and history, usage, hover time, click rate, dwell, etc.
Functions of timeliness, online versus batch processing, metainformation, etc.
Analytical workflows including data cleaning and intermediate processing.
Tools for rapid domain‐specific end‐user customization.
The phrase “big data” in the National Science Foundation (NSF) refers to large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, e‐mail, video, click streams, and/or all other digital sources available today and in the future [5].
Today, US government agencies recognize that the scientific, biomedical and engineering research communities are undergoing a profound transformation with the use of large‐scale, diverse, and high‐resolution data sets that allow for data‐intensive decision making, including clinical decision making, at a level never before imagined. New statistical and mathematical algorithms, prediction techniques, and modeling methods, as well as multidisciplinary approaches to data collection, data analysis and new technologies for sharing data and information are enabling a paradigm shift in scientific and biomedical investigation. Advances in machine learning, data mining, and visualization are enabling new ways of extracting useful information in a timely fashion from massive data sets, which complement and extend existing methods of hypothesis testing and statistical inference. As a result, a number of agencies are developing big‐data strategies to align with their missions. The NSF’s solicitation focuses on common interests in big data research across the National Institutes of Health (NIH) and the NSF.
There are challenges with Big Data. The first step is data acquisition. Some data sources, such as sensor networks, can produce staggering amounts of raw data. A lot of this data is not of interest. It can be filtered out and compressed by orders of magnitude. One challenge is to define these filters in such a way that they do not discard useful information.
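One simple instance of such a filter, sketched below under our own assumptions, is a dead-band filter: a sensor reading is kept only if it differs from the last kept value by more than a threshold. For slowly varying signals this compresses the stream by orders of magnitude while retaining every significant change.

```python
# Sketch of the filtering step described above: a dead-band filter that
# keeps a reading only when it moves more than `threshold` away from
# the last kept value, discarding near-duplicate samples.

def deadband_filter(readings, threshold):
    kept = []
    last = None
    for t, value in readings:
        if last is None or abs(value - last) > threshold:
            kept.append((t, value))
            last = value
    return kept

# A slowly drifting signal sampled far more often than it changes.
raw = [(t, 100.0 + (t // 50)) for t in range(1000)]
kept = deadband_filter(raw, threshold=0.5)
print(f"{len(raw)} raw samples -> {len(kept)} kept "
      f"({100 * len(kept) / len(raw):.1f}%)")
```

Choosing the threshold is exactly the challenge the text raises: too small and little is filtered, too large and useful transients are discarded.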
The second big challenge is to generate the right metadata automatically, and to describe what data is recorded and how it is recorded and measured. This metadata is likely to be crucial to downstream analysis. Frequently, the information collected will not be in a format ready for analysis. We have to deal with erroneous data: some news reports are inaccurate.
Data analysis is considerably more challenging than simply locating, identifying, understanding, and citing data. For effective large‐scale analysis all of this has to happen in a completely automated manner.
Mining requires integrated, cleaned, trustworthy, and efficiently accessible data, declarative query and mining interfaces, scalable mining algorithms, and big data computing environments. Today’s analysts are impeded by a tedious process of exporting data from the database, performing a non‐SQL process, and bringing the data back.
