Energy-Efficient Computing and Data Centers

Luigi Brochard

Description

Data centers consume roughly 1% of total electricity demand, while ICT as a whole consumes around 10%. Demand is growing exponentially and, left unchecked, ICT's share is projected to reach an estimated 20% or more by 2030. This book covers the energy consumption of the different data center components, and how to minimize it when running real workloads, taking into account the types of instructions executed by the servers. It presents the different air- and liquid-cooling technologies for servers and data centers with real examples, including waste heat reuse through adsorption chillers, as well as the hardware and software used to measure, model and control energy. It computes and compares the Power Usage Effectiveness (PUE) and the Total Cost of Ownership (TCO) of new and existing data centers with different cooling designs, including free cooling and waste heat reuse, which leads to the Energy Reuse Effectiveness (ERE). The book concludes by demonstrating how a well-designed data center that reuses waste heat to produce chilled water can reduce energy consumption by roughly 50%, and how renewable energy can be used to create net-zero energy data centers.


Number of pages: 297

Year of publication: 2019




Table of Contents

Cover

Introduction

1 Systems in Data Centers

1.1. Servers

1.2. Storage arrays

1.3. Data center networking

1.4. Components

2 Cooling Servers

2.1. Evolution of cooling for mainframe, midrange and distributed computers from the 1960s to 1990s

2.2. Emergence of cooling for scale out computers from 1990s to 2010s

2.3. Chassis and rack cooling methods

2.4. Metrics considered for cooling

2.5. Material used for cooling

2.6. System layout and cooling air flow optimization

3 Cooling the Data Center

3.1. System cooling technologies used

3.2. Air-cooled data center

3.3. ASHRAE data center cooling standards

3.4. Liquid-cooled racks

3.5. Liquid-cooled servers

3.6. Free cooling

3.7. Waste heat reuse

4 Power Consumption of Servers and Workloads

4.1. Trends in power consumption for processors

4.2. Trends in power consumption for GPUs

4.3. ACPI states

4.4. The power equation

5 Power and Performance of Workloads

5.1. Power and performance of workloads

5.2. Power, thermal and performance on air-cooled servers with Intel Xeon

5.3. Power, thermal and performance on water-cooled servers with Intel Xeon

5.4. Conclusions on the impact of cooling on power and performance

6 Monitoring and Controlling Power and Performance of Servers and Data Centers

6.1. Monitoring power and performance of servers

6.2. Modeling power and performance of servers

6.3. Software to optimize power and energy of servers

6.4. Monitoring, controlling and optimizing the data center

7 PUE, ERE and TCO of Various Cooling Solutions

7.1. Power usage effectiveness, energy reuse effectiveness and total cost of ownership

7.2. Examples of data centers PUE and EREs

7.3. Impact of cooling on TCO with no waste heat reuse

7.4. Emerging technologies and their impact on TCO

Conclusion

References

Index

End User License Agreement

List of Tables

Introduction

Table I.1. Impact of PUE on price of electricity versus price of server

Chapter 1

Table 1.1. Power summary of different Intel Xeon Scalable Skylake processors

Table 1.2. Power summary of different GPUs

Table 1.3. Power summary of different DIMMs

Table 1.4. Power summary of different SSD-NVMes

Table 1.5. Power summary of different HDDs

Table 1.6. Power summary of the different components on a board

Table 1.7. Power summary of different PCIe network cards

Table 1.8. Power summary of different PCIe accelerator cards

Table 1.9. Efficiency levels required for each 80 Plus certification

Table 1.10. Power summary of different fans

Chapter 2

Table 2.1. Intel Xeon power and thermal characteristics

Table 2.2. Comparison of SD530 fan speed, power and acoustics with 205 W TDP SKU

Table 2.3. Comparison of SD530 fan speed, power and acoustics with 150 W TDP SKU

Chapter 3

Table 3.1. List and function of cooling equipment

Table 3.2. ASHRAE liquid cooling classes

Chapter 4

Table 4.1. Intel Xeon processors characteristics from 2006 to 2017

Table 4.2. Intel processors characteristics from 2006 to 2017 with SPEC_fp

Table 4.3. Intel Xeon microarchitecture characteristics

Table 4.4. Base and Turbo frequency for non-AVX, AVX 2.0 and AVX-512 instruction...

Table 4.5. NVIDIA GPUs characteristics from 2009 to 2017

Table 4.6. CPU and GPU performance and performance per watt

Chapter 5

Table 5.1. List and type of instructions executed by the SIMD test

Table 5.2. DP GFlops at 2.4 GHz and 2.401 GHz

Table 5.3. DP GFlops per watt at 2.4 GHz and 2.401 GHz

Table 5.4. CPU temperature running HPL

Table 5.5. CPU temperature running BQCD1K

Table 5.6. CPU temperature running BQCD128

Table 5.7. CPU temperature, voltage and frequency running HPL

Table 5.8. CPU temperature, voltage and frequency running BQCD1K

Table 5.9. CPU temperature, voltage and frequency running BQCD128

Table 5.10. CPU temperature, voltage, frequency, node power and performance runn...

Table 5.11. CPU temperature, voltage, frequency, node power and performance runn...

Table 5.12. CPU temperature, voltage, frequency, node power and performance runn...

Table 5.13. Inlet water temperature impact on processor temperature and HPL

Chapter 7

Table 7.1. Xeon 8168 and 8174 TDP and frequencies

Table 7.2. Acquisition Costs of the different cooling components

Table 7.3. Individual PUEs of the different cooling solutions

Table 7.4. Greenfield costs and PUE of the different designs and electricity pri...

Table 7.5. Brownfield costs and PUE of the different designs and electricity pri...

Table 7.6. Greenfield costs and PUE of the different designs and SKU TDP

Table 7.7. Greenfield costs and PUEs of different designs and free air-cooling r...

Table 7.8. Greenfield costs and PUE of the different designs

Table 7.9. Cold water balance and PUE of the different power ratio configuration...

Table 7.10. Greenfield costs of the different designs

Table 7.11. RSF data center power consumption and PV power production

Table 7.12. RSF data center power consumption and all PV power production

List of Illustrations

Introduction

Figure I.1. Price of electricity around the world in 2018, from Statista.com

Chapter 1

Figure 1.1. 1U, 2U, 4U full width servers, 2U chassis with four ½-width nodes an...

Figure 1.2. Illustrative power efficiency of a 450 W PSU

Chapter 2

Figure 2.1. Heat flux on an air-cooled chip. For a color version of this figure,...

Figure 2.2. Lenovo 1U full width SR630 server. For a color version of this figur...

Figure 2.3. Front view of an SR630 server. For a color version of this figure, s...

Figure 2.4. Rear view of an SR630 server. For a color version of this figure, se...

Figure 2.5. Typical efficiency curve for an air moving device. For a color versi...

Figure 2.6. Thermal resistance and air flow pressure drop. For a color version o...

Figure 2.7. Front view of the 2U Lenovo SD530 server with four ½-width nodes. For a ...

Figure 2.8. Inside view of the ½ width node of a Lenovo SD530. For a color versi...

Figure 2.9. Example of a spread core configuration with 12 DIMMs and a shadow co...

Figure 2.10. Thermal transfer module for SD530. For a color version of this figu...

Chapter 3

Figure 3.1. Classic air-cooled machine room. For a color version of this figure,...

Figure 3.2. Hot and cold air flows in the machine room. For a color version of t...

Figure 3.3. ASHRAE classes of data center operation. For a color version of this...

Figure 3.4. ASHRAE 2015–2020 server power and rack heat load trends

Figure 3.5. Data center rack cooling limits

Figure 3.6. Principle of passive RDHX. For a color version of this figure, see w...

Figure 3.7. RDHX, CDU and chillers. For a color version of this figure, see www....

Figure 3.8. RDHX heat removal capacity for a 32 kW and 10 kW rack load. For a co...

Figure 3.9. Water versus air heat capacity and thermal resistance. For a color v...

Figure 3.10. Evolution of module heat flux from 1950 to 2020

Figure 3.11. Water loop assembly of the IBM iDataPlex dx360 m4. For a color vers...

Figure 3.12. Water flow path in the IBM iDataPlex dx360 m4. For a color version ...

Figure 3.13. Lenovo ThinkSystem SD650 water-cooled node. For a color version of ...

Figure 3.14. Lenovo ThinkSystem SD650 memory water channels. For a color version...

Figure 3.15. Adsorption chiller principle and phases (two chambers design). For ...

Figure 3.16. Adsorption chiller principle and phases (four containers design). F...

Figure 3.17. Sankey diagram of the energy flows in an adsorption chiller. For a ...

Figure 3.18. Measured COP_el values of different cooling technologies. For a colo...

Figure 3.19. Water uptake of new adsorbent material. For a color version of this...

Chapter 4

Figure 4.1. Lithography and peak performance (SP GFlops) for Xeon architectures

Figure 4.2. Lithography and TDP for Xeon architectures

Figure 4.3. Number of transistors and TDP for Xeon architectures

Figure 4.4. SP and SPEC_fp rate performance for Xeon architectures

Figure 4.5. Performance per watt for Xeon architectures

Figure 4.6. Lithography and peak performance (SP GFlops) for NVIDIA GPUs

Figure 4.7. Lithography and TDP (Watt) for NVIDIA GPUs

Figure 4.8. TDP and transistors over NVIDIA architectures

Figure 4.9. SP and DP peak GFlops per watt for NVIDIA GPUs. For a color version ...

Figure 4.10. ACPI states. For a color version of this figure, see www.iste.co.uk...

Figure 4.11. ACPI G-states

Figure 4.12. ACPI S-states

Figure 4.13. Power saving and latency of S-states

Figure 4.14. Example of C-states on Intel Nehalem

Figure 4.15. Example of P-states

Figure 4.16. P-states, voltage and frequency

Chapter 5

Figure 5.1. AVX2 frequency across different 2697v3 processor SKU

Figure 5.2. Frequency of each instruction type with Turbo OFF and ON

Figure 5.3. Node, CPU and DIMM DC power of SIMD instructions Turbo OFF. For a co...

Figure 5.4. Node, CPU and DIMM DC power of SIMD instructions with Turbo ON

Figure 5.5. Node, CPU and DIMM DC power running HPL Turbo OFF. For a color versi...

Figure 5.6. Node, CPU and DIMM DC power running HPL Turbo ON. For a color versio...

Figure 5.7. CPU frequency and temperature running HPL with Turbo OFF. For a colo...

Figure 5.8. CPU frequency and temperature running HPL with Turbo ON. For a color...

Figure 5.9. CPU0 and CPU1 CPI and node bandwidth running HPL with Turbo OFF. For...

Figure 5.10. CPU0&1 CPI and node bandwidth running HPL with Turbo ON. For a colo...

Figure 5.11. Node, CPU and DIMM power running STREAM with Turbo OFF. For a color...

Figure 5.12. Node, CPU and DIMM power running STREAM with Turbo ON. For a color ...

Figure 5.13. CPU temperatures and frequencies running STREAM with Turbo OFF. For...

Figure 5.14. CPU temperatures and frequencies running STREAM with Turbo ON. For ...

Figure 5.15. CPU CPIs and node bandwidth running STREAM with Turbo OFF. For a co...

Figure 5.16. CPU CPIs and node bandwidth running STREAM with Turbo ON. For a col...

Figure 5.17. Node, CPU and DIMM power running BQCD1K with Turbo OFF. For a color...

Figure 5.18. Node, CPU and DIMM power running BQCD128 with Turbo OFF. For a colo...

Figure 5.19. Node, CPU and DIMM power running BQCD1K with Turbo ON. For a color ...

Figure 5.20. Node, CPU and DIMM power running BQCD128 with Turbo ON. For a color...

Figure 5.21. CPU temperatures and frequencies running BQCD1K with Turbo OFF. For...

Figure 5.22. CPU temperatures and frequencies running BQCD128 with Turbo OFF. Fo...

Figure 5.23. CPU temperatures and frequencies running BQCD1K with Turbo ON. For ...

Figure 5.24. CPU temperatures and frequencies running BQCD128 with Turbo ON. For...

Figure 5.25. CPU CPIs and node bandwidth running BQCD1K with Turbo OFF. For a co...

Figure 5.26. CPU CPIs and node bandwidth running BQCD128 with Turbo OFF. For a c...

Figure 5.27. CPU CPIs and node bandwidth running BQCD1K with Turbo ON. For a col...

Figure 5.28. CPU CPIs and node bandwidth running BQCD128 with Turbo ON. For a co...

Figure 5.29. Comparison of BQCD1K with Turbo OFF on three servers. For a color v...

Figure 5.30. Comparison of BQCD1K with Turbo ON on three servers. For a color ve...

Figure 5.31. Comparison of HPL with Turbo ON on three servers. For a color versi...

Figure 5.32. Cooling impact on 2697v3 temperature and performance. For a color v...

Chapter 6

Figure 6.1. Node power management on Lenovo ThinkSystem. For a color version of ...

Figure 6.2. Node manager and RAPL reporting frequencies

Figure 6.3. New circuit to get higher resolution DC node power on SD650

Figure 6.4. New circuit to get higher resolution DC node power on SD650. For a c...

Figure 6.5. Power accuracy of different circuits. For a color version of this fi...

Figure 6.6. Neural network example

Figure 6.7. EAR software stack

Chapter 7

Figure 7.1. RSF load profile for the first 11 months of operations. For a color ...

Figure 7.2. RSF hourly PUE over the first 11 months. For a color version of this...

Figure 7.3. RSF ERE as a function of outdoor temperature. For a color version of...

Figure 7.4. Sectional view of LRZ data center. For a color version of this figur...

Figure 7.5. LRZ cooling infrastructure overview. For a color version of this fig...

Figure 7.6. SuperMUC PUE for 2015

Figure 7.7. CoolMUC-2 power consumption in 2016

Figure 7.8. CoolMUC-2 heat transfer to the absorption chiller in 2016

Figure 7.9. CoolMUC-2 cold water generated by the absorption chiller in 2016

Figure 7.10. CoolMUC-2 operations 05/2017–09/2017. For a color version of this f...

Figure 7.11. Impact of electricity on project payback. For a color version of th...

Figure 7.12. Impact of SKU TDP on project payback at $ 0.15/kWh electricity pric...

Figure 7.13. Impact of free cooling ratio on project payback at $0.15 electricit...

Figure 7.14. Impact of hot water reuse on project payback and different electric...

Figure 7.15. Impact of hot water energy reused on project payback with various a...

Figure 7.16. PEMFC diagram. For a color version of this figure, see www.iste.co....

Figure 7.17. Storing and reusing excess energy with PEMFC. For a color version o...

Figure 7.18. Toward a net-zero energy data center. For a color version of this f...

Conclusion

Figure C.1. Power and heat flow of legacy air-cooled data center

Figure C.2. A net-zero energy and smart data center


Series Editor

Serge Petiton

Energy-Efficient Computing and Data Centers

Luigi Brochard

Vinod Kamath

Julita Corbalán

Scott Holland

Walter Mittelbach

Michael Ott

First published 2019 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK

www.iste.co.uk

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

www.wiley.com

© ISTE Ltd 2019

The rights of Luigi Brochard, Vinod Kamath, Julita Corbalán, Scott Holland, Walter Mittelbach and Michael Ott to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2019940668

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library

ISBN 978-1-78630-185-7

1 Systems in Data Centers

There are different types of IT equipment that serve different functions depending on customer application. This chapter provides an overview of servers, storage arrays, switches and their components.

1.1. Servers

A server is a broad term describing a specific piece of IT equipment that provides computing capability and runs software applications in an environment networked with other IT equipment, including other servers. Most servers contain the following major hardware building blocks: processors, memory, chipset, input/output (I/O) devices, storage, peripherals, voltage regulators (VRs), power supplies and cooling systems. Additional application-specific integrated circuits (ASICs) may be necessary, such as an onboard redundant array of independent disks (RAID) controller and a server management controller.

Volume rack-mounted servers are designed to fit within commonly available rack sizes, such as the 19 in. (0.5 m) rack form factor defined by the EIA/ECA Standard 310-E specification. The vertical dimension is expressed in terms of rack units, or just units (U). One U, or 1U, represents 1.75 in. (44.45 mm) of vertical height within a rack. Servers used for computing are available in standard rack-mount and custom configurations. Typical dimensions and sizes for standard rack-mount compute servers are full-width 1U, 2U or 4U. A single-server chassis may contain multiple server nodes. Each node is defined as containing all key components, except power supplies, needed to make up a complete server. These nodes simply share the larger chassis infrastructure to conserve data center space. For denser servers, there are 1U and 2U server enclosures that house several 1U ½-width servers.
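As a back-of-the-envelope illustration of this rack arithmetic, the short Python sketch below converts rack units to millimetres and estimates how many nodes fit in a rack. The 1U = 1.75 in. (44.45 mm) conversion comes from the text; the 42U rack height and the chassis form factors are illustrative assumptions, not values from the book.

```python
# Rack-unit arithmetic: 1U = 1.75 in = 44.45 mm (from the text above).
# The 42U rack height and the chassis form factors are illustrative
# assumptions, not values taken from the book.

RACK_UNIT_MM = 44.45  # 1U in millimetres


def rack_height_mm(rack_units=42):
    """Vertical space offered by a rack of the given size, in millimetres."""
    return rack_units * RACK_UNIT_MM


def nodes_per_rack(rack_units, chassis_u, nodes_per_chassis):
    """Server nodes per rack when every chassis occupies `chassis_u` units."""
    return (rack_units // chassis_u) * nodes_per_chassis


if __name__ == "__main__":
    print(f"42U rack height: {rack_height_mm(42):.0f} mm")
    # Full-width 1U servers: one node per 1U of rack space.
    print("1U full-width servers per 42U rack:", nodes_per_rack(42, 1, 1))
    # Denser option: a 2U chassis holding four 1/2-width nodes.
    print("1/2-width nodes per 42U rack (2U, 4-node chassis):", nodes_per_rack(42, 2, 4))
```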

Microservers are an emerging technology. They are based on a system on a chip (SOC) design in which all the functions that sit on the motherboard of a classic server are integrated on a single chip, with the exception of memory, boot flash and power circuits. SOC designs usually consume less power than conventional microprocessors, so microservers can be packed more densely than classic servers. Although microservers and SOCs are not analyzed in the following chapters, they are worth mentioning. These servers generally provide sufficient, targeted performance with optimized performance-per-watt capability, while being easily scalable with shared power and cooling infrastructure for individual servers.

To achieve even higher compute density than the 1U form factor, blade servers are another option. Each manufacturer designs their blades based on their own packaging and design goals. These blade chassis range from 3U to 10U tall and can house many blades. Blade servers are the result of technology compaction, which allows for a greater processing density in the same equipment volume. The greater processing density also results in greater power and heat density, further complicating data center cooling. Server components that had previously been packaged inside the tower/pedestal or rack-mounted server (e.g. fans and power supplies) are still required, but these components are now located within a chassis (or enclosure) that is designed to house multiple blade servers in a side-by-side or stacked configuration. Most of the time, the blade chassis includes networking, management and even storage functions for the blade servers, while the blade server integrates at least one controller (Ethernet, fiber channel, etc.) on the motherboard. Extra interfaces can be added using mezzanine cards.
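To give a feel for the heat-density point, here is a minimal sketch comparing the IT heat load of a rack of 1U servers against a rack of blade chassis. Every node power, chassis count and overhead figure in it is an assumed, illustrative number, not data from the book.

```python
# Rough rack heat-load estimate for the form factors discussed above.
# Node powers, chassis counts and the overhead term are illustrative
# assumptions, not figures from the book.

def rack_heat_load_kw(chassis_per_rack, nodes_per_chassis,
                      watts_per_node, chassis_overhead_w=0.0):
    """Total IT heat load of one rack in kW (shared fans/PSUs folded into the overhead term)."""
    per_chassis_w = nodes_per_chassis * watts_per_node + chassis_overhead_w
    return chassis_per_rack * per_chassis_w / 1000.0


# 42 x 1U full-width servers at ~700 W each.
print(f"1U servers:    {rack_heat_load_kw(42, 1, 700):.1f} kW per rack")

# 4 x 10U blade chassis, 14 blades each at ~600 W, plus shared chassis overhead.
print(f"Blade chassis: {rack_heat_load_kw(4, 14, 600, chassis_overhead_w=1500):.1f} kW per rack")
```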

Figure 1.1 illustrates 1U, 2U, 4U full width servers, a 2U chassis hosting four ½-width nodes and a 5U blade chassis hosting eight blades.

Examples of such servers will be given in Chapter 2.

Figure 1.1. 1U, 2U, 4U full width servers, 2U chassis with four ½-width nodes and a 5U blade chassis with eight blades including a 1U base for power supplies

1.2. Storage arrays

Disk arrays are enclosures that provide ample non-volatile storage space for use by servers. Like servers, and depending on the scale of deployment required, the storage configuration may be a standard rack-mounted system with varying unit height or possibly a custom stand-alone piece of equipment. Disk storage arrays are typically designed for use in EIA/ECA Standard-310-E-compliant racks. The enclosure contains the storage in either small or large form factor drives in addition to the controllers, midplane and batteries for cache backup. The storage array enclosure typically uses redundant power supplies and cooling in the event of component failures. One of the more challenging aspects of storage arrays is the design of a battery backup system to prevent data loss in case of a power interruption or loss. For an in-depth discussion of storage array thermal guidelines, please consult the ASHRAE storage equipment white paper (ASHRAE TC 9.9 2015).

While disk storage arrays are typically used for online storage, backup and disaster recovery, tape storage is known for its low cost and longevity for archiving purposes. Tape systems come in a variety of different formats based on the type of tape media.

1.3. Data center networking

A computer network, or simply a network, is a collection of computers and other hardware interconnected by communication channels that allow sharing of resources and information. Networking equipment facilitates the interconnection of devices and the sharing of data both within the data center and beyond. Networks tend to be designed in a hierarchical manner, with many devices (such as servers and storage devices in the case of a data center) connected to a switch that is connected to another switch at the next level of the hierarchy, and so on. Another common topology is a mesh configuration in which many peer network switches are connected to one another to form a single level of the hierarchy. We will consider three different levels of a network hierarchy: core, distribution and top of rack (TOR). More elaborate configurations will not be covered. The core network function can be thought of as the gateway through which all data entering and exiting the data center must pass. As such, the core network equipment is connected either directly or indirectly to every device in the data center. The core switch is also connected to a service provider, which is the “pipe” through which all data passes from the data center to the Internet. The distribution level of the network acts as an intermediate level between the core and edge levels of the network, and as such can offload some of the work the core network equipment needs to do.

Specifically, the distribution level is useful for passing data between machines inside the data center or aggregating ports to reduce the number of physical ports required at the core. The TOR network level consists of switches that connect directly to devices that are generating or consuming the data, and then pass the data up to the distribution or core level. A data center network implementation may have all three of these levels, or it may combine or eliminate some of the levels. In large data centers, the load on the networking equipment can be substantial, both in terms of port count and data throughput. The end-of-row (EOR) distribution equipment offloads the core equipment in both throughput and port count. The TOR edge networking equipment offloads the distribution equipment in both throughput and port count, in the same way as the distribution networking equipment offloads the core networking equipment. Switches enable communication between devices connected on a network. In the case of a data center, servers and storage arrays are connected to multiple switches in a hierarchical manner.
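To make the aggregation between levels concrete, the sketch below computes the oversubscription ratio (downstream bandwidth offered to devices divided by upstream bandwidth toward the next level) at the TOR and distribution levels of a hypothetical hierarchy. Every port count and link speed used is an assumption chosen for illustration.

```python
# Hypothetical three-tier hierarchy (TOR / distribution / core).
# Oversubscription = downstream bandwidth offered to devices divided by
# upstream bandwidth toward the next level. All port counts and link
# speeds below are illustrative assumptions.

def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    return (down_ports * down_gbps) / (up_ports * up_gbps)


# TOR: 48 x 10 GbE server-facing ports, 4 x 40 GbE uplinks to distribution.
tor = oversubscription(48, 10, 4, 40)

# Distribution (EOR): aggregates the uplinks of 16 TOR switches, 8 x 100 GbE to core.
dist = oversubscription(16 * 4, 40, 8, 100)

print(f"TOR oversubscription:          {tor:.1f}:1")
print(f"Distribution oversubscription: {dist:.1f}:1")
```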

Inside a switch, an incoming frame is received by a physical-layer transceiver (PHY) and handed to a switching ASIC. The ASIC decides where the data needs to go and sends it back out through the correct port. The central processing unit (CPU) controls both the PHY and the ASIC. The CPU can also take data from the network, process it and then send it back out onto the network.
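The toy model below mimics, in software, the forwarding decision just described: learn the source address on the ingress port, then forward by destination address, flooding when the destination is unknown. It is a didactic sketch only; a real switching ASIC performs this lookup in hardware at line rate, and the addresses used here are invented.

```python
# Toy model of the forwarding decision made by a switching ASIC: learn the
# source MAC address on the ingress port, then forward by destination MAC,
# flooding all other ports when the destination is unknown. Purely
# illustrative; a real ASIC does this in hardware at line rate.

class ToySwitch:
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}  # maps MAC address -> port it was last seen on

    def handle_frame(self, src_mac, dst_mac, in_port):
        """Return the list of ports the frame should be sent out of."""
        self.mac_table[src_mac] = in_port            # learn the source
        out_port = self.mac_table.get(dst_mac)
        if out_port is None or out_port == in_port:  # unknown destination: flood
            return [p for p in range(self.num_ports) if p != in_port]
        return [out_port]


sw = ToySwitch(num_ports=4)
print(sw.handle_frame("aa:aa", "bb:bb", in_port=0))  # unknown destination -> flood: [1, 2, 3]
print(sw.handle_frame("bb:bb", "aa:aa", in_port=2))  # source learned earlier -> [0]
```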

1.4. Components

1.4.1. Central processing unit

The processor, also referred to as the CPU, is one of the greatest sources of heat generation within a server. Aside from the basic processing of data and instructions to produce an output result, processors may also have many more features for managing data and power throughout a system. The processor die is generally housed in a package that includes a substrate (i.e. a small printed circuit board, or PCB, for bringing out signals) and a lid, as shown in Figure 2.1. The lid, or case, distributes heat more evenly to an attached cooling component such as a heat sink (air-cooled) or cold plate (liquid-cooled).
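As a rough, assumed-numbers sketch of why this package-to-coolant path matters, the snippet below estimates case temperature as coolant inlet temperature plus processor power times a combined thermal resistance. The resistance values and the 205 W processor power are illustrative assumptions, not measurements from the book.

```python
# Simple case-temperature estimate for a processor package:
#   T_case = T_coolant_inlet + P_cpu * R_thermal
# where R_thermal is the combined case-to-coolant thermal resistance.
# All values below are illustrative assumptions, not measurements.

def case_temperature_c(inlet_c, power_w, r_thermal_c_per_w):
    return inlet_c + power_w * r_thermal_c_per_w


cpu_power_w = 205.0  # e.g. a high-TDP processor SKU

# Air-cooled heat sink: ~25 C inlet air, relatively high thermal resistance.
print(f"Air-cooled:   {case_temperature_c(25.0, cpu_power_w, 0.25):.0f} C")

# Water-cooled cold plate: warmer ~45 C inlet water, but much lower resistance.
print(f"Water-cooled: {case_temperature_c(45.0, cpu_power_w, 0.08):.0f} C")
```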