Parallel Scientific Computing

Frédéric Magoulès

Description

Scientific computing has become an indispensable tool in numerous fields, such as physics, mechanics, biology, finance and industry. For example, thanks to efficient algorithms adapted to current computers, it enables us to simulate, without the help of physical models or experiments, the deflection of beams in bending, the sound level in a theater room or a fluid flowing around an aircraft wing.
This book presents scientific computing techniques applied to parallel computing for the numerical simulation of large-scale problems, which arise from systems modeled by partial differential equations. Computing concepts are introduced through examples.
Implementation and programming techniques stemming from the finite element method are presented for direct solvers, iterative solvers and domain decomposition methods, together with an introduction to MPI and OpenMP.
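
As a concrete taste of the two programming models named above, here are two minimal sketches in C; they are standard textbook examples, not code taken from this book, and the file names and array size are arbitrary. The first is an MPI program in which every process reports its rank; the second uses OpenMP to parallelize an AXPY loop (a linear combination of vectors), a recurring example in parallel numerical computing.

    /* hello_mpi.c -- illustrative sketch, not from the book */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);               /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* id of this process */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total process count */
        printf("Hello from process %d of %d\n", rank, size);
        MPI_Finalize();                       /* shut the runtime down */
        return 0;
    }

    /* omp_axpy.c -- illustrative sketch, not from the book */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double x[N], y[N];   /* arbitrary problem size */
        double alpha = 2.0;
        /* the loop iterations are split among threads; y := alpha*x + y */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            y[i] = alpha * x[i] + y[i];
        printf("ran with up to %d threads\n", omp_get_max_threads());
        return 0;
    }

Assuming an MPI library and an OpenMP-capable compiler are installed, the first program would typically be built and launched with mpicc hello_mpi.c -o hello_mpi followed by mpirun -np 4 ./hello_mpi, and the second built with gcc -fopenmp omp_axpy.c -o omp_axpy.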


Page count: 429

Publication year: 2015




Table of Contents

Cover

Title

Copyright

Preface

Introduction

1 Computer Architectures

1.1. Different types of parallelism

1.2. Memory architecture

1.3. Hybrid architecture

2 Parallelization and Programming Models

2.1. Parallelization

2.2. Performance criteria

2.3. Data parallelism

2.4. Vectorization: a case study

2.5. Message-passing

2.6. Performance analysis

3 Parallel Algorithm Concepts

3.1. Parallel algorithms for recurrences

3.2. Data locality and distribution: product of matrices

4 Basics of Numerical Matrix Analysis

4.1. Review of basic notions of linear algebra

4.2. Properties of matrices

5 Sparse Matrices

5.1. Origins of sparse matrices

5.2. Parallel formation of sparse matrices: shared memory

5.3. Parallel formation by block of sparse matrices: distributed memory

6 Solving Linear Systems

6.1. Direct methods

6.2. Iterative methods

7 LU Methods for Solving Linear Systems

7.1. Principle of LU decomposition

7.2. Gauss factorization

7.3. Gauss-Jordan factorization

7.4. Crout and Cholesky factorizations for symmetric matrices

8 Parallelization of LU Methods for Dense Matrices

8.1. Block factorization

8.2. Implementation of block factorization in a message-passing environment

8.3. Parallelization of forward and backward substitutions

9 LU Methods for Sparse Matrices

9.1. Structure of factorized matrices

9.2. Symbolic factorization and renumbering

9.3. Elimination trees

9.4. Elimination trees and dependencies

9.5. Nested dissections

9.6. Forward and backward substitutions

10 Basics of Krylov Subspaces

10.1. Krylov subspaces

10.2. Construction of the Arnoldi basis

11 Methods with Complete Orthogonalization for Symmetric Positive Definite Matrices

11.1. Construction of the Lanczos basis for symmetric matrices

11.2. The Lanczos method

11.3. The conjugate gradient method

11.4. Comparison with the gradient method

11.5. Principle of preconditioning for symmetric positive definite matrices

12 Exact Orthogonalization Methods for Arbitrary Matrices

12.1. The GMRES method

12.2. The case of symmetric matrices: the MINRES method

12.3. The ORTHODIR method

12.4. Principle of preconditioning for non-symmetric matrices

13 Biorthogonalization Methods for Non-symmetric Matrices

13.1. Lanczos biorthogonal basis for non-symmetric matrices

13.2. The non-symmetric Lanczos method

13.3. The biconjugate gradient method: BiCG

13.4. The quasi-minimal residual method: QMR

13.5. The BiCGSTAB

14 Parallelization of Krylov Methods

14.1. Parallelization of dense matrix-vector product

14.2. Parallelization of sparse matrix-vector product based on node sets

14.3. Parallelization of sparse matrix-vector product based on element sets

14.4. Parallelization of the scalar product

14.5. Summary of the parallelization of Krylov methods

15 Parallel Preconditioning Methods

15.1. Diagonal

15.2. Incomplete factorization methods

15.3. Schur complement method

15.4. Algebraic multigrid

15.5. The Schwarz additive method of preconditioning

15.6. Preconditioners based on the physics

Appendices

Appendix 1: Exercises

A1.1. Parallelization techniques

A1.2. Matrix analysis

A1.3. Direct methods

A1.4. Iterative methods

A1.5. Domain decomposition methods

Appendix 2: Solutions

A2.1. Parallelization techniques

A2.2. Matrix analysis

A2.3. Direct methods

A2.4. Iterative methods

A2.5. Domain decomposition methods

Appendix 3: Bibliography and Comments

A3.1. Parallel algorithms

A3.2. OpenMP

A3.3. MPI

A3.4. Performance tools

A3.5. Numerical analysis and methods

A3.6. Finite volume method

A3.7. Finite element method

A3.8. Matrix analysis

A3.9. Direct methods

A3.10. Iterative methods

A3.11. Mesh and graph partitioning

A3.12. Domain decomposition methods

Bibliography

Index

End User License Agreement

List of Illustrations

1 Computer Architectures

Figure 1.1.

Spatial parallelism: multiplication of units

Figure 1.2.

Temporal parallelism: pipeline of additions

Figure 1.3.

Interleaved multi-bank memory

Figure 1.4.

Cache memory

Figure 1.5.

Management of the cache memory

Figure 1.6.

Parallel architecture with cache memories

Figure 1.7.

Management of multiple caches

Figure 1.8.

Hierarchy of the caches

Figure 1.9.

Distributed memory computer

Figure 1.10.

GPU architecture

Figure 1.11.

Hybrid computer

2 Parallelization and Programming Models

Figure 2.1.

Message passing

Figure 2.2.

Parallel execution in dynamic mode

Figure 2.3.

Strong and weak scalability. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip

Figure 2.4.

Order of execution (j, i) and dependencies

Figure 2.5.

Order of execution (i, j) and dependencies

Figure 2.6.

Pipelining of the linear combination of vectors

Figure 2.7.

One-to-all communication

Figure 2.8.

All-to-all communication

Figure 2.9.

Visualization of the trace of a CFD code. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip

Figure 2.10.

Brain hemodynamics: (top) geometry and zoom on the mesh (from Raul Cebral, George Mason University, USA); (bottom) subdomain connectivity and number of neighbors. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip

Figure 2.11.

Trace of a CFD code. MPI calls inside an iterative solver. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip

3 Parallel Algorithm Concepts

Figure 3.1.

The effect of cache memory on the speed of calculation

Figure 3.2.

Row-wise communication

Figure 3.3.

Column-wise communication

Figure 3.4.

Row-wise communicators

Figure 3.5.

Column-wise communicators

4 Basics of Numerical Matrix Analysis

Figure 4.1.

Basis change and linear application

5 Sparse Matrices

Figure 5.1.

Supports of the basis functions associated with two vertices

Figure 5.2.

Mesh

Figure 5.3.

Sparse matrix

Figure 5.4.

Coloring the elements of a mesh

Figure 5.5.

Mesh partitioning by sets of vertices

Figure 5.6.

Mesh partitioning by sets of elements

Figure 5.7.

One-dimensional case. Typical decomposition of FD, FE and FV

Figure 5.8.

One-dimensional case. Submatrices coming from the FD (left), FE (center) and FV (right) methods

8 Parallelization of LU Methods for Dense Matrices

Figure 8.1.

Random distribution

Figure 8.2.

Diagonal block transfers at iteration 1

Figure 8.3.

Block transfers of the first row and the first column at iteration 1

Figure 8.4.

Cyclic distribution

9 LU Methods for Sparse Matrices

Figure 9.1.

Structure of the factorized matrix before filling

Figure 9.2.

Structure of the factorized matrix after filling

Figure 9.3.

The initial graph (left) and the emergence of a clique (right)

Figure 9.4.

Upper and lower profiles

Figure 9.5.

Frontal numbering

Figure 9.6.

Filling of monotonic profile

Figure 9.7.

Coefficients created or modified in step 1

Figure 9.8.

Coefficients created or modified in step 5

Figure 9.9.

Elimination tree and dependence

Figure 9.10.

Level 1 binary tree

Figure 9.11.

Decomposition into two subdomains

Figure 9.12.

Local meshes of the two subdomains

Figure 9.13.

Mesh-partitioning by nested dissections

Figure 9.14.

The matrix structure obtained by nested dissections

Figure 9.15.

Structure of the matrix after the first partial factorization

14 Parallelization of Krylov Methods

Figure 14.1.

Product by using a block of dense rows

Figure 14.2.

The product by a block of sparse rows

Figure 14.3.

Subgraph associated with a block of rows

Figure 14.4.

Distinct subdomain meshes

Figure 14.5.

Effect of round-off errors on the convergence of the parallel conjugate gradient method. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip

Figure 14.6.

Description of the interfaces

Figure 14.7.

Communication strategies. Top: inefficient strategy. Bottom: optimal strategy

Figure 14.8.

Matrix-vector product. Comparison of FD, FE and FV

Figure 14.9.

Multi-domain partition

15 Parallel Preconditioning Methods

Figure 15.1.

Structure of the renumbered tridiagonal matrix with filling

Figure 15.2.

Comparison of diagonal and block LU preconditioners. Number of iterations and CPU time

Figure 15.3.

Coarsening of a bidimensional Cartesian grid

Figure 15.4.

V-cycle and W-cycle for three grid levels

Figure 15.5.

Solutions of the global and local problems, without overlap

Figure 15.6.

Successive iterations of the Schwarz method

Figure 15.7.

Successive iterations of the additive Schwarz method

Figure 15.8.

Overlapping meshes with different overlaps s

Figure 15.9.

Comparison of the convergences of the multiplicative and additive Schwarz method

Figure 15.10.

Overlapping (light gray) from a subset defined by a partition by vertices (dark gray)

Figure 15.11.

Distribution of the interface between two subdomains

Figure 15.12.

Interface nodes not present in the matrix graph of Ω₂

Figure 15.13.

Linelet. The circles are the neighbors of the black node in the matrix graph

Figure 15.14.

Linelet preconditioner applied to the DCG. Comparison with the CG and DCG with diagonal preconditioner. Left: Mesh. Right: Convergence. For a color version of the figure, see www.iste.co.uk/magoules/computing.zip


Parallel Scientific Computing

Frédéric Magoulès

François-Xavier Roux

Guillaume Houzeaux

First published 2016 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd

27-37 St George’s Road

London SW19 4EU

UK

www.iste.co.uk

John Wiley & Sons, Inc.

111 River Street

Hoboken, NJ 07030

USA

www.wiley.com

Cover photo generated by Guillermo Marin, Barcelona Supercomputing Center (Spain).

© ISTE Ltd 2016

The rights of Frédéric Magoulès, François-Xavier Roux and Guillaume Houzeaux to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2015955799

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library

ISBN 978-1-84821-581-8

Preface

Scientific computing has become an invaluable tool in many diverse fields, such as physics, engineering, mechanics, biology, finance and manufacturing. For example, through the use of efficient algorithms adapted to today’s computers, we can simulate, without physically testing costly mock-ups, the deformation of a roof under the weight of snow, the acoustics of a concert hall or the airflow around the wings of an aircraft.

Read on in the full edition!