Scientific computing has become an indispensable tool in numerous fields, such as physics, mechanics, biology, finance and industry. For example, thanks to efficient algorithms adapted to current computers, it enables us to simulate, without resorting to physical models or experiments, the deflection of beams in bending, the sound level in a theater or a fluid flowing around an aircraft wing.
This book presents scientific computing techniques applied to parallel computing for the numerical simulation of large-scale problems arising from systems modeled by partial differential equations. Computing concepts are tackled via examples.
Implementation and programming techniques resulting from the finite element method will be presented for direct solvers, iterative solvers and domain decomposition methods, along with an introduction to MPI and OpenMP.
Page count: 429
Year of publication: 2015
Cover
Title
Copyright
Preface
Introduction
1 Computer Architectures
1.1. Different types of parallelism
1.2. Memory architecture
1.3. Hybrid architecture
2 Parallelization and Programming Models
2.1. Parallelization
2.2. Performance criteria
2.3. Data parallelism
2.4. Vectorization: a case study
2.5. Message-passing
2.6. Performance analysis
3 Parallel Algorithm Concepts
3.1. Parallel algorithms for recurrences
3.2. Data locality and distribution: product of matrices
4 Basics of Numerical Matrix Analysis
4.1. Review of basic notions of linear algebra
4.2. Properties of matrices
5 Sparse Matrices
5.1. Origins of sparse matrices
5.2. Parallel formation of sparse matrices: shared memory
5.3. Parallel formation by block of sparse matrices: distributed memory
6 Solving Linear Systems
6.1. Direct methods
6.2. Iterative methods
7 LU Methods for Solving Linear Systems
7.1. Principle of LU decomposition
7.2. Gauss factorization
7.3. Gauss-Jordan factorization
7.4. Crout and Cholesky factorizations for symmetric matrices
8 Parallelization of LU Methods for Dense Matrices
8.1. Block factorization
8.2. Implementation of block factorization in a message-passing environment
8.3. Parallelization of forward and backward substitutions
9 LU Methods for Sparse Matrices
9.1. Structure of factorized matrices
9.2. Symbolic factorization and renumbering
9.3. Elimination trees
9.4. Elimination trees and dependencies
9.5. Nested dissections
9.6. Forward and backward substitutions
10 Basics of Krylov Subspaces
10.1. Krylov subspaces
10.2. Construction of the Arnoldi basis
11 Methods with Complete Orthogonalization for Symmetric Positive Definite Matrices
11.1. Construction of the Lanczos basis for symmetric matrices
11.2. The Lanczos method
11.3. The conjugate gradient method
11.4. Comparison with the gradient method
11.5. Principle of preconditioning for symmetric positive definite matrices
12 Exact Orthogonalization Methods for Arbitrary Matrices
12.1. The GMRES method
12.2. The case of symmetric matrices: the MINRES method
12.3. The ORTHODIR method
12.4. Principle of preconditioning for non-symmetric matrices
13 Biorthogonalization Methods for Non-symmetric Matrices
13.1. Lanczos biorthogonal basis for non-symmetric matrices
13.2. The non-symmetric Lanczos method
13.3. The biconjugate gradient method: BiCG
13.4. The quasi-minimal residual method: QMR
13.5. The BiCGSTAB
14 Parallelization of Krylov Methods
14.1. Parallelization of dense matrix-vector product
14.2. Parallelization of sparse matrix-vector product based on node sets
14.3. Parallelization of sparse matrix-vector product based on element sets
14.4. Parallelization of the scalar product
14.5. Summary of the parallelization of Krylov methods
15 Parallel Preconditioning Methods
15.1. Diagonal
15.2. Incomplete factorization methods
15.3. Schur complement method
15.4. Algebraic multigrid
15.5. The Schwarz additive method of preconditioning
15.6. Preconditioners based on the physics
Appendices
Appendix 1: Exercises
A1.1. Parallelization techniques
A1.2. Matrix analysis
A1.3. Direct methods
A1.4. Iterative methods
A1.5. Domain decomposition methods
Appendix 2: Solutions
A2.1. Parallelization techniques
A2.2. Matrix analysis
A2.3. Direct methods
A2.4. Iterative methods
A2.5. Domain decomposition methods
Appendix 3: Bibliography and Comments
A3.1. Parallel algorithms
A3.2. OpenMP
A3.3. MPI
A3.4. Performance tools
A3.5. Numerical analysis and methods
A3.6. Finite volume method
A3.7. Finite element method
A3.8. Matrix analysis
A3.9. Direct methods
A3.10. Iterative methods
A3.11. Mesh and graph partitioning
A3.12. Domain decomposition methods
Bibliography
Index
End User License Agreement
1 Computer Architectures
Figure 1.1.
Spatial parallelism: multiplication of units
Figure 1.2.
Temporal parallelism: pipeline of additions
Figure 1.3.
Interleaved multi-bank memory
Figure 1.4.
Cache memory
Figure 1.5.
Management of the cache memory
Figure 1.6.
Parallel architecture with cache memories
Figure 1.7.
Management of multiple caches
Figure 1.8.
Hierarchy of the caches
Figure 1.9.
Distributed memory computer
Figure 1.10.
GPU architecture
Figure 1.11.
Hybrid computer
2 Parallelization and Programming Models
Figure 2.1.
Message passing
Figure 2.2.
Parallel execution in dynamic mode
Figure 2.3.
Strong and weak scalabilities. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip
Figure 2.4.
Order of execution (j, i) and dependencies
Figure 2.5.
Order of execution (i, j) and dependencies
Figure 2.6.
Pipelining of the linear combination of vectors
Figure 2.7.
One-to-all communication
Figure 2.8.
All-to-all communication
Figure 2.9.
Visualization of the trace of a CFD code. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip
Figure 2.10.
Brain hemodynamics: (top) geometry and zoom on the mesh (from Raul Cebral, George Mason University, USA); (bottom) subdomain connectivity and number of neighbors. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip
Figure 2.11.
Trace of a CFD code. MPI calls inside an iterative solver. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip
3 Parallel Algorithm Concepts
Figure 3.1.
The effect of cache memory on the speed of calculation
Figure 3.2.
Row-wise communication
Figure 3.3.
Column-wise communication
Figure 3.4.
Row-wise communicators
Figure 3.5.
Column-wise communicators
4 Basics of Numerical Matrix Analysis
Figure 4.1.
Basis change and linear application
5 Sparse Matrices
Figure 5.1.
Supports of the basis functions associated with two vertices
Figure 5.2.
Mesh
Figure 5.3.
Sparse matrix
Figure 5.4.
Coloring the elements of a mesh
Figure 5.5.
Mesh partitioning by sets of vertices
Figure 5.6.
Mesh partitioning by sets of elements
Figure 5.7.
One-dimensional case. Typical decomposition of FD, FE and FV
Figure 5.8.
One-dimensional case. Submatrices coming from the FD (left), FE (center) and FV (right) methods
8 Parallelization of LU Methods for Dense Matrices
Figure 8.1.
Random distribution
Figure 8.2.
Diagonal block transfers at iteration 1
Figure 8.3.
Block transfers of the first row and the first column at iteration 1
Figure 8.4.
Cyclic distribution
9 LU Methods for Sparse Matrices
Figure 9.1.
Structure of the factorized matrix before filling
Figure 9.2.
Structure of the factorized matrix after filling
Figure 9.3.
The initial graph (left) and the emergence of a clique (right)
Figure 9.4.
Upper and lower profiles
Figure 9.5.
Frontal numbering
Figure 9.6.
Filling of monotonic profile
Figure 9.7.
Coefficients created or modified in step 1
Figure 9.8.
Coefficients created or modified in step 5
Figure 9.9.
Elimination tree and dependence
Figure 9.10.
Level 1 binary tree
Figure 9.11.
Decomposition into two subdomains
Figure 9.12.
Local meshes of the two subdomains
Figure 9.13.
Mesh-partitioning by nested dissections
Figure 9.14.
The matrix structure obtained by nested dissections
Figure 9.15.
Structure of the matrix after the first partial factorization
14 Parallelization of Krylov Methods
Figure 14.1.
Product by using a block of dense rows
Figure 14.2.
The product by a block of sparse rows
Figure 14.3.
Subgraph associated with a block of rows
Figure 14.4.
Distinct subdomain meshes
Figure 14.5.
Effect of round-off errors on the convergence of the parallel conjugate gradient method. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip
Figure 14.6.
Description of the interfaces
Figure 14.7.
Communication strategies. Top: inefficient strategy. Bottom: optimal strategy
Figure 14.8.
Matrix-vector product. Comparison of FD, FE and FV
Figure 14.9.
Multi-domain partition
15 Parallel Preconditioning Methods
Figure 15.1.
Structure of the renumbered tridiagonal matrix with filling
Figure 15.2.
Comparison of diagonal and block LU preconditioners. Number of iterations and CPU time
Figure 15.3.
Coarsening of a bidimensional Cartesian grid
Figure 15.4.
V-cycle and W-cycle for three grid levels
Figure 15.5.
Solutions of the global and local problems, without overlap
Figure 15.6.
Successive iterations of the Schwarz method
Figure 15.7.
Successive iterations of the additive Schwarz method
Figure 15.8.
Overlapping meshes with different overlaps s
Figure 15.9.
Comparison of the convergences of the multiplicative and additive Schwarz method
Figure 15.10.
Overlapping (light gray) from a subset defined by a partition by vertices (dark gray)
Figure 15.11.
Distribution of the interface between two subdomains
Figure 15.12.
Interface nodes not present in the matrix graph of Ω2
Figure 15.13.
Linelet. The circles are the neighbors of the black node in the matrix graph
Figure 15.14.
Linelet preconditioner applied to the DCG. Comparison with the CG and DCG with diagonal preconditioner. Left: Mesh. Right: Convergence. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip
Frédéric Magoulès
François-Xavier Roux
Guillaume Houzeaux
First published 2016 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk
John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
Cover photo generated by Guillermo Marin, Barcelona Supercomputing Center (Spain).
© ISTE Ltd 2016
The rights of Frédéric Magoulès, François-Xavier Roux and Guillaume Houzeaux to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2015955799
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-84821-581-8
Scientific computing has become an invaluable tool in many diverse fields, such as physics, engineering, mechanics, biology, finance and manufacturing. For example, through the use of efficient algorithms adapted to today’s computers, we can simulate, without physically testing costly mock-ups, the deformation of a roof under the weight of snow, the acoustics of a concert hall or the airflow around the wings of an aircraft.
Continue reading in the full edition!
