Big-Data Analytics for Cloud, IoT and Cognitive Computing

Kai Hwang
Description

The definitive guide to successfully integrating social, mobile, Big-Data analytics, cloud and IoT principles and technologies

The main goal of this book is to spur the development of effective big-data computing operations on smart clouds that are fully supported by IoT sensing, machine learning and analytics systems. To that end, the authors draw upon their original research and proven track record in the field to describe a practical approach integrating big-data theories, cloud design principles, Internet of Things (IoT) sensing, machine learning, data analytics and Hadoop and Spark programming.

Part 1 focuses on data science, the roles of clouds and IoT devices and frameworks for big-data computing. Big data analytics and cognitive machine learning, as well as cloud architecture, IoT and cognitive systems are explored, and mobile cloud-IoT-interaction frameworks are illustrated with concrete system design examples. Part 2 is devoted to the principles of and algorithms for machine learning, data analytics and deep learning in big data applications. Part 3 concentrates on cloud programming software libraries from MapReduce to Hadoop, Spark and TensorFlow and describes business, educational, healthcare and social media applications for those tools.

  • The first book describing a practical approach to integrating social, mobile, analytics, cloud and IoT (SMACT) principles and technologies
  • Covers theory and computing techniques and technologies, making it suitable for use in both computer science and electrical engineering programs
  • Offers an extremely well-informed vision of future intelligent and cognitive computing environments integrating SMACT technologies
  • Fully illustrated throughout with examples, figures and approximately 150 problems to support and reinforce learning
  • Features a companion website with an instructor manual and PowerPoint slides www.wiley.com/go/hwangIOT

Big-Data Analytics for Cloud, IoT and Cognitive Computing satisfies the demand among university faculty and students for cutting-edge information on emerging intelligent and cognitive computing systems and technologies. Professionals working in data science, cloud computing and IoT applications will also find this book to be an extremely useful working resource. 




Big-Data Analytics for Cloud, IoT and Cognitive Computing

Kai Hwang, University of Southern California, Los Angeles, USA

Min Chen, Huazhong University of Science and Technology, Hubei, China

This edition first published 2017
© 2017 John Wiley & Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Kai Hwang and Min Chen to be identified as the authors of this work has been asserted in accordance with law.

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data

Names: Hwang, Kai, author. | Chen, Min, author.
Title: Big-Data Analytics for Cloud, IoT and Cognitive Computing / Kai Hwang, Min Chen.
Description: Chichester, UK ; Hoboken, NJ : John Wiley & Sons, 2017. | Includes bibliographical references and index.
Identifiers: LCCN 2016054027 (print) | LCCN 2017001217 (ebook) | ISBN 9781119247029 (cloth : alk. paper) | ISBN 9781119247043 (Adobe PDF) | ISBN 9781119247296 (ePub)
Subjects: LCSH: Cloud computing--Data processing. | Big data.
Classification: LCC QA76.585 .H829 2017 (print) | LCC QA76.585 (ebook) | DDC 004.67/82--dc23
LC record available at https://lccn.loc.gov/2016054027

Cover Design: Wiley
Cover Images: (Top Inset Image) © violetkaipa/Shutterstock; (Bottom Inset Image) © 3alexd/Gettyimages; (Background Image) © adventtr/Gettyimages

CONTENTS

About the Authors

Preface

Motivations and Objectives

A Quick Glance of the Book

Our Unique Approach

Building Cloud/IoT Platforms with AI Capabilities

Intended Audience and Readers Guide

Instructor Guide

About the Companion Website

Part 1 Big Data, Clouds and Internet of Things

1 Big Data Science and Machine Intelligence

1.1 Enabling Technologies for Big Data Computing

1.2 Social-Media, Mobile Networks and Cloud Computing

1.3 Big Data Acquisition and Analytics Evolution

1.4 Machine Intelligence and Big Data Applications

1.5 Conclusions

Homework Problems

References

2 Smart Clouds, Virtualization and Mashup Services

2.1 Cloud Computing Models and Services

2.2 Creation of Virtual Machines and Docker Containers

2.3 Cloud Architectures and Resources Management

2.4 Case Studies of IaaS, PaaS and SaaS Clouds

2.5 Mobile Clouds and Inter-Cloud Mashup Services

2.6 Conclusions

Homework Problems

References

3 IoT Sensing, Mobile and Cognitive Systems

3.1 Sensing Technologies for Internet of Things

3.2 IoT Interactions with GPS, Clouds and Smart Machines

3.3 Radio Frequency Identification (RFID)

3.4 Sensors, Wireless Sensor Networks and GPS Systems

3.5 Cognitive Computing Technologies and Prototype Systems

3.6 Conclusions

Homework Problems

References

Part 2 Machine Learning and Deep Learning Algorithms

4 Supervised Machine Learning Algorithms

4.1 Taxonomy of Machine Learning Algorithms

4.2 Regression Methods for Machine Learning

4.3 Supervised Classification Methods

4.4 Bayesian Network and Ensemble Methods

4.5 Conclusions

Homework Problems

References

5 Unsupervised Machine Learning Algorithms

5.1 Introduction and Association Analysis

5.2 Clustering Methods without Labels

5.3 Dimensionality Reduction and Other Algorithms

5.4 How to Choose Machine Learning Algorithms?

5.5 Conclusions

Homework Problems

References

6 Deep Learning with Artificial Neural Networks

6.1 Introduction

6.2 Artificial Neural Networks (ANN)

6.3 Stacked AutoEncoder and Deep Belief Network

6.4 Convolutional Neural Networks (CNN) and Extensions

6.5 Conclusions

Homework Problems

References

Part 3 Big Data Analytics for Health-Care and Cognitive Learning

7 Machine Learning for Big Data in Healthcare Applications

7.1 Healthcare Problems and Machine Learning Tools

7.2 IoT-based Healthcare Systems and Applications

7.3 Big Data Analytics for Healthcare Applications

7.4 Emotion-Control Healthcare Applications

7.5 Conclusions

Homework Problems

References

8 Deep Reinforcement Learning and Social Media Analytics

8.1 Deep Learning Systems and Social Media Industry

8.2 Text and Image Recognition using ANN and CNN

8.3 DeepMind with Deep Reinforcement Learning

8.4 Data Analytics for Social-Media Applications

8.5 Conclusions

Homework Problems

References

Index

EULA

List of Tables

Chapter 1

Table 1.1

Table 1.2

Table 1.3

Table 1.4

Table 1.5

Table 1.6

Table 1.7

Table 1.8

Table 1.9

Table 1.10

Table 1.11

Chapter 2

Table 2.1

Table 2.2

Table 2.3

Table 2.4

Table 2.5

Table 2.6

Table 2.7

Table 2.8

Table 2.9

Table 2.10

Table 2.11

Table 2.12

Table 2.13

Table 2.14

Table 2.15

Table 2.16

Chapter 3

Table 3.1

Table 3.2

Table 3.3

Table 3.4

Table 3.5

Table 3.6

Table 3.7

Table 3.8

Chapter 4

Table 4.1

Table 4.2

Table 4.3

Table 4.4

Table 4.5

Table 4.6

Table 4.7

Table 4.8

Table 4.9

Table 4.10

Table 4.11

Table 4.12

Table 4.13

Table 4.14

Table 4.15

Table 4.16

Table 4.17

Table 4.18

Table 4.19

Table 4.20

Table 4.21

Table 4.22

Chapter 5

Table 5.1

Table 5.2

Table 5.3

Table 5.4

Table 5.5

Table 5.6

Table 5.7

Table 5.8

Table 5.9

Table 5.10

Table 5.11

Table 5.12

Table 5.13

Table 5.14

Table 5.15

Table 5.16

Table 5.17

Chapter 6

Table 6.1

Table 6.2

Table 6.3

Table 6.4

Table 6.5

Table 6.6

Table 6.7

Chapter 7

Table 7.1

Table 7.2

Table 7.3

Table 7.4

Table 7.5

Table 7.6

Table 7.7

Table 7.8

Table 7.9

Table 7.10

Table 7.11

Table 7.12

Chapter 8

Table 8.1

Table 8.2

Table 8.3

Table 8.4

List of Illustrations

Chapter 1

Figure 1.1

Big data characteristics: Five V's and corresponding challenges.

Figure 1.2

The evolution of data science up to the big data era.

Figure 1.3

Functional components of data science supported by some software libraries on the cloud in 2016.

Figure 1.4

Hype cycle for emerging high technologies to reach maturity and industrial productivity within the next decade. (Source: Gartner Research, July 2016, reprinted with permission.) [19]

Figure 1.5

Evolutional trend towards parallel, distributed and cloud computing using clusters, MPPs, P2P networks, computing grids, Internet clouds, web services and the Internet of things. (HPC: high-performance computing; HTC: high-throughput computing; P2P: peer-to-peer; MPP: massively parallel processors; RFID: Radio Frequency Identification [2].)

Figure 1.6

Technological convergence enabling cloud computing over the Internet. (Courtesy of Buyya, Broberg and Goscinski, reprinted with permission [3])

Figure 1.7

Interactions among social networks, mobile systems, big data analytics and cloud platforms over various Internet of Things (IoT) domains.

Figure 1.8

The Facebook platform offering over 2.4 million user applications [6].

Figure 1.9

Mobile core networks for wide-range communications have gone through five generations, while short-range wireless networks have been upgraded in data rate, QoS and applications.

Figure 1.10

The interactions of various radio-access networks (RANs) with the unified all-IP based mobile core network, Intranets and the Internet.

Figure 1.11

The architecture of a mobile cloud computing environment.

Figure 1.12

The evolution from basic analysis of small data (MB to GB) in the past to sophisticated cloud analytics over today's big datasets (TB∼PB).

Figure 1.13

Layered development of cloud platform for big data processing and analytics applications.

Figure 1.14

The relationship of data mining and machine learning.

Figure 1.15

Machine and deep learning applications classified in 16 categories.

Chapter 2

Figure 2.1

A generic architecture of a cloud computing system, where physical servers are virtualized as VM instances under the control of a resources management system.

Figure 2.2

Layered architectural development of the cloud platform for IaaS, PaaS and SaaS applications over the Internet.

Figure 2.3

Three cloud service models deployed by major providers. (Reprinted with permission from Dennis Gannon, Keynote address in IEEE CloudCom 2010.)

Figure 2.4

Conceptual architecture of a modern cloud system for big data computing applications.

Figure 2.5

Two VM architectures compared with conventional physical machines.

Figure 2.6

The XEN architecture: Domain 0 handles resource control and I/O, and several guest domains (VMs) are created for housing user applications.

Figure 2.7

The Docker engine accessing the Linux kernel features for isolated virtualization of different application containers.

Figure 2.8

Hypervisor versus Docker engine for creating virtual machines and application containers, respectively.

Figure 2.9

The AWS public cloud consisting of the top management layer, PaaS and IaaS platforms, and the global infrastructure built over datacenters in availability zones located at various regions globally.

Figure 2.10

Recovery overhead on a physical cluster compared with that for a virtual cluster.

Figure 2.11

Live migration of VM from the Dom0 domain to that of an XEN-enabled target host.

Figure 2.12

Eucalyptus for building a private cloud by establishing a virtual network over the VMs linked through Ethernet and the Internet.

Figure 2.13

VMware cloud platform built with vSphere, NSX and vSAN, working as a hybrid cloud with AWS.

Figure 2.14

The EC2 execution environment where Amazon Machine Image (AMI) can be created from public, private or paid pools with security protection.

Figure 2.15

The Amazon S3 storage service for holding unlimited data objects.

Figure 2.16

Google AppEngine platform for PaaS operations with load balancing.

Figure 2.17

Seven Salesforce cloud service offerings: all are for SaaS applications except the custom cloud, which offers PaaS applications.

Figure 2.18

The capabilities of mobile devices are enhanced by mobile clouds in a heterogeneous mobile computing environment.

Figure 2.19

Virtual-machine based cloudlets for mobile cloud computing applications.

Figure 2.20

Cloudlet mesh architecture for securing mobile cloud computing.

Figure 2.21

Workflow in a mashup of five cloud services for solving a patient healthcare problem.

Figure 2.22

MapReduce model for selecting skyline services to optimize the QoS.

Figure 2.23

Three data partitioning methods for MapReduce skyline query processing. (Reprinted with permission from F. Zhang, K. Hwang, et al. 2016 [23]).

Figure 2.24

Relative performance of three MapReduce methods for cloud mashups. (Reprinted with permission from F. Zhang, K. Hwang, et al., 2016. [23])

Chapter 3

Figure 3.1

IoT enabling and synergistic technologies.

Figure 3.2

Projected IoT upgrade in five IoT application domains from 2010 to 2015. (Courtesy of Gubbi et al. 2013. [10]) Reproduced with permission of Elsevier.

Figure 3.3

The architecture of an Internet of Things (IoT) and its underlying technologies.

Figure 3.4

The 24-satellite GPS architecture: the satellites circle the Earth twice per day in multiple layers of fixed orbits without interference to each other.

Figure 3.5

Cloud-centric IoT system for smart home environment.

Figure 3.6

Conceptual architecture of a cloud-based radio access network (C-RAN). (Courtesy of China Mobile Research Institute, 2009.)

Figure 3.7

Interactions among IoT sensing, mobile monitoring and cloud analytics.

Figure 3.8

RFID readers retrieve product data on e-labels (RFID tags) placed on package boxes.

Figure 3.9

A typical example of an RFID system applied to automobile speeding check.

Figure 3.10

Supply chain management in a multi-partner business pipeline.

Figure 3.11

Power management in typical sensor operations.

Figure 3.12

Sensor devices built inside a typical smart phone in 2016.

Figure 3.13

A three-tier architecture based on a BAN communications system.

Figure 3.14

The ground GPS receiver calculates the 3-D location from four or more satellites with the help from a few ground reference stations and a master station.

Figure 3.15

Triangulation method to calculate delayed location signals from four satellites.

Figure 3.16

The promise of deep learning at Google's Brain Project (Reprinted with permission from a public slide presentation by Jeff Dean, 2016).

Figure 3.17

Growing use of deep learning at Google teams (Reprinted with permission from public presentation by Jeff Dean, 2016).

Figure 3.18

The concept of a Google speech recognition system built with deep recurrent neural networks.

Figure 3.19

Using a deep convolutional neural network to understand particular images out of millions of photos belonging to different or similar classes.

Figure 3.20

Spectrum from real environment to AR, AV and VR.

Chapter 4

Figure 4.1

Machine learning algorithms grouped by different learning styles.

Figure 4.2

Machine learning algorithms grouped by similarity testing.

Figure 4.3

Major steps in linear regression.

Figure 4.4

Unitary linear regression analysis.

Figure 4.5

The relation between body weight and blood pressure.

Figure 4.6

The curve of the sigmoid function applied in the regression method.

Figure 4.7

Principle of using logistic regression for classification purposes.

Figure 4.8

Four steps for the logistic regression process.

Figure 4.9

Three steps in building a classification model through sample data training.

Figure 4.10

Decision tree for making the decision to play tennis or not with probability given at the leaf node.

Figure 4.11

Decision tree for approving loan applications to bank customers.

Figure 4.12

Decision tree partitions for three attributes in Example 4.4.

Figure 4.13

Sequential coverage and data flow for rule extraction.

Figure 4.14

Rules generation strategies between general and specific properties.

Figure 4.15

Rule set generated from using decision tree.

Figure 4.16

Instance of three kinds of nearest neighbors.

Figure 4.17

Flow chart for the nearest neighbor classification algorithm.

Figure 4.18

The concept of using SVM to classify between two classes of sample data.

Figure 4.19

Sample space and a hyperplane solution for Example 4.7.

Figure 4.20

Linearly separating hyperplane with maximized margin from each class.

Figure 4.21

Nonlinear support vector machine.

Figure 4.22

Computational steps in a Naive Bayesian Classification process.

Figure 4.23

Two Bayesian belief networks with two different numbers of variables.

Figure 4.24

Conditional independence assumption of Naive Bayesian classifier.

Figure 4.25

Bayesian belief network for diabetics in Example 4.9.

Figure 4.26

Decision tree for the tennis tournament.

Figure 4.27

Random forest decision for playing tennis under various weather conditions.

Figure 4.28

The process of using random forest for ensemble decision making.

Figure 4.29

Diabetes random forest representation.

Chapter 5

Figure 5.1

Algorithm 5.1 illustrated by a flow chart with more details.

Figure 5.2

The flow chart for the generation of Apriori rules.

Figure 5.3

Clustering of items in a hospital's physical examination reports.

Figure 5.4

K-means clustering of patients into three treating groups.

Figure 5.5

Proximity representation of clusters.

Figure 5.6

Tree map for an agglomerative hierarchical clustering.

Figure 5.7

The core point, frontier point and noise point in two clusters.

Figure 5.8

The result figure of density-based clustering analysis.

Figure 5.9

Principal Component Analysis (PCA) steps.

Figure 5.10

Contribution rate of principal component.

Figure 5.11

Explanation of principal component to physical examination indicator.

Figure 5.12

An example to illustrate semi-supervised machine learning.

Figure 5.13

The training score and cross-validation score match nicely in a well-fitting machine learning model created.

Figure 5.14

The over-fitting case when creating a learning model using the linear-SVC algorithm with a small dataset of up to 160 samples.

Figure 5.15

Reducing the model over-fitting effects by enlarging the training set to 800 samples.

Figure 5.16

Effects of using fewer features in the Linear-SVC algorithm.

Figure 5.17

Effects of applying the L1 data regularization on the linear-SVC performance.

Figure 5.18

Under-Fitting results in the Linear-SVC algorithm.

Figure 5.19

Effects of using different loss functions in machine learning model selection.

Chapter 6

Figure 6.1

Rationale versus perceptron modes to recognize a square object.

Figure 6.2

Hierarchical signal flows of the human visual cortex in the brain, retina and fingers.

Figure 6.3

Schematic diagrams of biological neuron versus artificial neuron.

Figure 6.4

Concept of deep learning with an ANN having two hidden layers.

Figure 6.5

Process for classification or prediction by feature representation.

Figure 6.6

Conceptual diagram of a perceptron machine.

Figure 6.7

Common activation functions for the perceptron.

Figure 6.8

Structure of a two-layer artificial neural network.

Figure 6.9

General steps for modeling an artificial neural network

Figure 6.10

Forward Propagation based output prediction in a simple ANN with two hidden layers and four neurons at each layer.

Figure 6.11

Backward propagation based output prediction in a simple ANN with two hidden layers and four neurons per layer.

Figure 6.12

Training error and results on the ANN in Example 6.3.

Figure 6.13

Composition of supervised learning cycle.

Figure 6.14

Supervised learning in an ANN versus self-supervised learning in AutoEncoder.

Figure 6.15

Composition of self-supervised learning cycle.

Figure 6.16

Classification process by AutoEncoder.

Figure 6.17

Structure of a stacked AutoEncoder.

Figure 6.18

Sketch map of training stacked AutoEncoder.

Figure 6.19

The structure of a single stage of restricted Boltzmann machine (RBM).

Figure 6.20

Schematic diagram for learning image composition by RBM.

Figure 6.21

The structure of a deep belief network (DBN).

Figure 6.22

Convolutional Neural Network utilized in LeNet-5.

Figure 6.23

Schematic diagram for a convolutional ANN (CNN) (source: http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial).

Figure 6.24

The concept of pooling from the 6 x 6 grid to a 2 x 2 grid (source: http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial).

Figure 6.25

Schematic diagrams of convolutional neural network.

Figure 6.26

Successive convolutional and pooling steps in a CNN.

Figure 6.27

The architectural classification of various deep learning models. (Those marked with * are covered in this book.)

Figure 6.28

The structure of recurrent neural networks (RNN).

Figure 6.29

Contrast between RNN and ANN architectures.

Figure 6.30

Different forms of input or output applied in different deep learning applications.

Figure 6.31

Structure diagram of convolution neural network for image classification.

Figure 6.32

Stacked AutoEncoder diagram.

Chapter 7

Figure 7.1

Factors that affect the detection accuracy in detecting chronic diseases.

Figure 7.2

Determinants of health (statistics from the Centers for Disease Control, 2003).

Figure 7.3

The layout of common human body sensors.

Figure 7.4

Common health monitoring system based on community services.

Figure 7.5

Communication architecture of exercise promotion devices.

Figure 7.6

Exercise promotion products available in 2016

Figure 7.7

Smart clothing application software and testbed settings.

Figure 7.8

Robotics and cloud-assisted healthcare system.

Figure 7.9

A typical health monitoring system built with smart clothing and backend cloud.

Figure 7.10

Classification results using logistic regression in Example 7.3.

Figure 7.11

Five machine learning models for disease prediction based on medical big data.

Figure 7.12

Relative performance of 5 machine learning methods for disease prediction.

Figure 7.13

ROC curve of the disease prediction results using hospital data.

Figure 7.14

Methods for the high-risk patient prediction process encompassing exploration, preprocessing and evaluation stages.

Figure 7.15

Mental healthcare for special groups of populations.

Figure 7.16

Instance space and feature space for transfer machine learning.

Figure 7.17

The concept of transfer learning for emotion labeling.

Figure 7.18

Layered architecture of the AIWAC emotion monitoring system (reprinted with permission from Zhang et al., 2015 [17]).

Figure 7.19

Humanoid robotics for affective interactions between AIWAC and clients.

Figure 7.20

Robot affective interaction based on cloud computing.

Figure 7.21

Architecture of a smart cloud/IoT/5G based cognitive system.

Figure 7.22

Two applications of a smart cognitive system.

Chapter 8

Figure 8.1

A deep learning system for recognizing handwritten numerals.

Figure 8.2

Results of TensorFlow-based programming of an artificial neural network.

Figure 8.3

Structure for handwritten numeral recognition with the CNN network.

Figure 8.4

Structure of the Deep ID human face recognition system with a CNN with 10 layers including the I/O layers.

Figure 8.5

The process of text classification with deep learning.

Figure 8.6

One-hot representation for word embedding.

Figure 8.7

Distributed Representation for word embedding.

Figure 8.8

Model for risk disease assessment through medical text learning using CNN network.

Figure 8.9

Convolution for word xw_n (word No. n).

Figure 8.10

The Gorila architecture for implementing the Google reinforcement learning system (reprinted with permission from David Silver, Google DeepMind, http://www0.cs.ucl.ac.uk/staff/d.silver/web/Resources_files/).

Figure 8.11

Interaction of an agent and environment in deep learning (reprinted with permission from David Silver's presentation at the International Conference on Machine Learning, ICML 2016) [20].

Figure 8.12

Convolutional neural network construction over the Go playing board.

Figure 8.13

Self-play training pipeline between policy network and value networks (reprinted with permission from David Silver, Google DeepMind, http://icml.cc/2016/tutorials/AlphaGo-tutorial-slides.pdf) [20].

Figure 8.14

The off-line learning process of the AlphaGo program (courtesy of artwork by Lu Wang and Yiming Miao, Huazhong University of Science and Technology, China).

Figure 8.15

The on-line playing process of the AlphaGo game (courtesy of Artwork by Xiaobo Shi and Ping Zhou, Huazhong University of Science and Technology, China).

Figure 8.16

Reinforcement learning applied in Atari game play.

Figure 8.17

The flowchart of DQN algorithm for playing the Flappybird game.

Figure 8.18

The construction of the convolutional neural network used in the FlappyBird Game.

Figure 8.19

Micro-level, meso-level and macro-level construction of social networks.

Figure 8.20

The graph representation of a social network.

Figure 8.21

Graph representation of a social network.

Figure 8.22

Detection of spam from 1 TB of Twitter blogs on an EC2 cluster using a Bayesian classifier (reprinted with permission from Y. Shi, S. Ablilash and K. Hwang, IEEE Mobile Cloud, 2015) [23].

Figure 8.23

Forming a community graph by join, leave, growth, merging, splitting and contraction.

Figure 8.24

High school class formation based on the grade membership of students [22].


About the Authors

Kai Hwang is Professor of Electrical Engineering and Computer Science at the University of Southern California (USC). He has also served as a visiting Chair Professor at Tsinghua University, Hong Kong University, University of Minnesota and Taiwan University. With a PhD from the University of California, Berkeley, he specializes in computer architecture, parallel processing, wireless Internet, cloud computing, distributed systems and network security. He has published eight books, including Computer Architecture and Parallel Processing (McGraw-Hill 1983) and Advanced Computer Architecture (McGraw-Hill 2010). The American Library Association named his book Distributed and Cloud Computing (with Fox and Dongarra) a 2012 outstanding title published by Morgan Kaufmann. His new book, Cloud Computing for Machine Learning and Cognitive Applications (MIT Press 2017), is a good companion to this book. Dr Hwang has published 260 scientific papers. Google Scholar has cited his published work 16,476 times, with an h-index of 54 as of early 2017. An IEEE Life Fellow, he served as the founding Editor-in-Chief of the Journal of Parallel and Distributed Computing (JPDC) for 28 years.

Dr Hwang has served on the editorial boards of IEEE Transactions on Cloud Computing (TCC), Parallel and Distributed Systems (TPDS), Service Computing (TSC) and the Journal of Big Data Intelligence. He received the Lifetime Achievement Award from IEEE CloudCom 2012, the Founder's Award from IEEE IPDPS 2011 and the 2004 Outstanding Achievement Award from the China Computer Federation (CCF). Over the years, he has graduated 21 PhD students at USC and Purdue University, four of whom were elevated to IEEE Fellows and one to IBM Fellow. He has chaired numerous international conferences and delivered over 50 keynote speeches and distinguished lectures at IEEE/ACM/CCF conferences or at major universities worldwide. He has served as a consultant or visiting scientist for IBM, Intel, Fujitsu Reach Lab, MIT Lincoln Lab, JPL at Caltech, the French INRIA, ITRI in Taiwan, GMD in Germany, and the Chinese Academy of Sciences.

Min Chen is a Professor of Computer Science and Technology at Huazhong University of Science and Technology (HUST), where he serves as the Director of the Embedded and Pervasive Computing (EPIC) Laboratory. He has chaired the IEEE Computer Society Special Technical Communities on Big Data. He was on the faculty of the School of Computer Science and Engineering at Seoul National University from 2009 to 2012. Prior to that, he worked as a postdoctoral fellow in the Department of Electrical and Computer Engineering at the University of British Columbia for three years.

Dr Chen received the Best Paper Award at IEEE ICC 2012. He is a Guest Editor for IEEE Network, IEEE Wireless Communications Magazine, etc. He has published 260 papers, including 150+ SCI-indexed papers, and has 20 ESI highly cited or hot papers. He has published the books OPNET IoT Simulation (2015) and Software Defined 5G Networks (2016) with HUST Press, and another book, Big Data Related Technologies (2014), in the Springer Series in Computer Science. As of early 2017, Google Scholar has cited his published work over 8,350 times, with an h-index of 45. His top paper has been cited more than 900 times. He has been an IEEE Senior Member since 2009. His research focuses on the Internet of Things, Mobile Cloud, Body Area Networks, Emotion-aware Computing, Healthcare Big Data, Cyber Physical Systems, and Robotics.

Preface

Motivations and Objectives

In the past decade, the computer and information industry has experienced rapid changes in both platform scale and scope of applications. Computers, smart phones, clouds and social networks demand not only high performance but also a high degree of machine intelligence. In fact, we are entering an era of big data analysis and cognitive computing. This trend is evidenced by the pervasive use of mobile phones, storage and computing clouds, the revival of artificial intelligence in practice, extended supercomputer applications, and the widespread deployment of Internet of Things (IoT) platforms. To face these new computing and communication paradigms, we must upgrade the cloud and IoT ecosystems with new capabilities such as machine learning, IoT sensing, data analytics, and cognitive power that can mimic or augment human intelligence.

In the big data era, successful cloud systems, web services and data centers must be designed to store, process, learn from and analyze big data to discover new knowledge or make critical decisions. The purpose is to build up a big data industry that provides cognitive services to offset human shortcomings in handling labor-intensive tasks with high efficiency. These goals are achieved through hardware virtualization, machine learning, deep learning, IoT sensing, data analytics, and cognitive computing. For example, new cloud services appear as Learning as a Service (LaaS), Analytics as a Service (AaaS), or Security as a Service (SaaS), along with the growing practice of machine learning and data analytics.

Today, IT companies, big enterprises, universities and governments are mostly converting their data centers into cloud facilities to support mobile and networked applications. Supercomputers, which have a cluster architecture similar to that of clouds, are also being transformed to deal with large data sets or streams. Smart clouds are in great demand to support social, media, mobile, business and government operations. Supercomputers and cloud platforms have different ecosystems and programming environments, and the gap between them must be closed as we move towards big data computing in the future. This book attempts to help achieve this goal.

A Quick Glance of the Book

The book consists of eight chapters, presented in a logical flow of three technical parts. The three parts should be read or taught in sequence, entirely or selectively.

Part I

has three chapters on data science, the roles of clouds, and IoT devices or frameworks for big data computing. These chapters cover enabling technologies to explore smart cloud computing with big data analytics and cognitive machine learning capabilities. We cover cloud architecture, IoT and cognitive systems, and software support. Mobile clouds and IoT interaction frameworks are illustrated with concrete system design and application examples.

Part II

has three chapters devoted to the principles of and algorithms for machine learning, data analytics and deep learning in big data applications. We present both supervised and unsupervised machine learning methods as well as deep learning with artificial neural networks. Brain-inspired computer architectures, such as the TrueNorth processors from IBM's SyNAPSE program, the Google tensor processing unit used in its Brain programs, and China's Cambricon chips, are also covered here. These chapters lay the necessary foundations for design methodologies and algorithm implementations.

Part III

presents two chapters on big data analytics: machine learning for healthcare and deep learning for cognitive and social-media applications. Readers should familiarize themselves with the systems, algorithms and software tools, such as Google's DeepMind projects, that promote big data AI applications on clouds, mobile devices and other computer systems. We integrate SMACT technologies (Social, Mobile, Analytics, Clouds and IoT) towards building intelligent and cognitive computing environments for the future.

Part I: Big Data, Clouds and Internet of Things

Chapter 1: Big Data Science and Machine Intelligence

Chapter 2: Smart Clouds, Virtualization and Mashup Services

Chapter 3: IoT Sensing, Mobile and Cognitive Systems

Part II: Machine Learning and Deep Learning Algorithms

Chapter 4: Supervised Machine Learning Algorithms

Chapter 5: Unsupervised Machine Learning Algorithms

Chapter 6: Deep Learning with Artificial Neural Networks

Part III: Big Data Analytics for Health-Care and Cognitive Learning

Chapter 7: Machine Learning for Big Data in Healthcare Applications

Chapter 8: Deep Reinforcement Learning and Social Media Analytics

Our Unique Approach

To promote effective big data computing on smart clouds or supercomputers, we take a technological fusion approach by integrating big data theories with cloud design principles and supercomputing standards. IoT sensing enables large-scale data collection. Machine learning and data analytics help decision-making. Augmenting clouds and supercomputers with artificial intelligence (AI) features is our fundamental goal. These AI and machine learning tasks are supported by the Hadoop, Spark and TensorFlow programming libraries in real-life applications.

The book material is based on the authors' research and teaching experiences over the years. It will benefit those who leverage their computing, analytical and application skills to push for career development, business transformation and scientific discovery in the big data world. This book blends big data theories with emerging technologies on smart clouds and explores distributed datacenters with new applications. Today, we see cyber physical systems appearing in smart cities, autonomous cars driving on the roads, emotion-detecting robotics, virtual reality, augmented reality and cognitive services in everyday life.

Building Cloud/IoT Platforms with AI Capabilities

The data analysts, cognitive scientists and computer professionals must work together to solve practical problems. This collaborative learning must involve clouds, mobile devices, datacenters and IoT resources. The ultimate goal is to discover new knowledge, or make important decisions, intelligently. For many years, we have wanted to build brain-like computers that can mimic or augment human functions in sensing, memory, recognition and comprehension. Today, Google, IBM, Microsoft, the Chinese Academy of Science, and Facebook are all exploring AI in cloud and IoT applications.

Some new neuromorphic chips and software platforms are now built by leading research centers to enable cognitive computing. We will examine these advances in hardware, software and ecosystems. The book emphasizes not only machine learning in pattern recognition, speech/image understanding, language translation and comprehension, with low cost and power requirements, but also the emerging new approaches in building future computers.

One example is to build a small rescue robotic system that can automatically distinguish between voices in a meeting and create accurate transcripts for each speaker. Smart computers or cloud systems should be able to recognize faces, detect emotions, and may even be able to issue tsunami alerts or predict earthquakes and severe weather conditions more accurately and in a more timely manner. We will cover these and related topics in the three logical parts of the book: systems, algorithms and applications. To close the application gaps between clouds and big data user groups, over 100 illustrative examples are given to emphasize the strong collaboration among professionals working in different areas.

Intended Audience and Readers Guide

To serve the best interest of our readers, we wrote this book to meet the growing demand of the updated curriculum in Computer Science and Electrical Engineering education. By teaching various subsets of the eight chapters, instructors can use the book at both senior and graduate levels. Four university courses may adopt this book in the subject areas of Big Data Analytics (BD), Cloud Computing (CC), Machine Learning (ML) and Cognitive Systems (CS). Readers could also use the book as a major reference. The suggested course offerings are growing rapidly at major universities throughout the world. Logically, the reading of the book should follow the order of the three parts.

The book will also benefit computer professionals who wish to transform their skills to meet new IT challenges. For example, interested readers may include Intel engineers working on the Cloud of Things; Google Brain and DeepMind teams developing machine learning services, including autonomous vehicle driving; Facebook teams exploring new AI features and social and entertainment services based on AR/VR (augmented and virtual reality) technology; IBM clients expecting to push cognitive computing services in the business and social-media world; and buyers and sellers on the Amazon and Alibaba clouds who may want to expand their online transaction experiences with many other forms of e-commerce and social services.

Instructor Guide

Instructors can teach only selected chapters that match their own expertise and serve the best interest of students at appropriate levels. To teach in each individual subject area (BD, CC, ML and CS), each course covers 6 to 7 chapters as suggested below:

Big Data Science (BD): {1, 2, 4, 5, 6, 7, 8};

Cloud Computing (CC): {1, 2, 4, 5, 6, 7, 8};

Machine Learning (ML): {1, 4, 5, 6, 7, 8};

Cognitive Systems (CS): {1, 2, 3, 4, 6, 7, 8}.

Instructors can also choose to offer a course covering the union of two subject areas, such as one of the following three combinations: {BD, CC}, {CC, CS}, or {BD, ML}, each covering 7 to 8 chapters. All eight chapters must be taught in any course covering three or more of the above subject areas. For example, a course for {BD, CC, ML} or {CC, ML, CS} must teach all eight chapters. In total, there are nine possible ways to use the book to teach various courses at senior or graduate levels.

Solutions Manual and PowerPoint slides will be made available to instructors who wish to use the material for classroom use. The website materials will be available in late 2017.

About the Companion Website

Big-Data Analytics for Cloud, IoT and Cognitive Computing is accompanied by a website:

www.wiley.com/go/hwangIOT

The website includes:

PowerPoint slides

Solutions Manual

Part 1 Big Data, Clouds and Internet of Things

1 Big Data Science and Machine Intelligence

CHAPTER OUTLINE

1.1 Enabling Technologies for Big Data Computing

1.1.1 Data Science and Related Disciplines

1.1.2 Emerging Technologies in the Next Decade

1.1.3 Interactive SMACT Technologies

1.2 Social-Media, Mobile Networks and Cloud Computing

1.2.1 Social Networks and Web Service Sites

1.2.2 Mobile Cellular Core Networks

1.2.3 Mobile Devices and Internet Edge Networks

1.2.4 Mobile Cloud Computing Infrastructure

1.3 Big Data Acquisition and Analytics Evolution

1.3.1 Big Data Value Chain Extracted from Massive Data

1.3.2 Data Quality Control, Representation and Database Models

1.3.3 Big Data Acquisition and Preprocessing

1.3.4 Evolving Data Analytics over the Clouds

1.4 Machine Intelligence and Big Data Applications

1.4.1 Data Mining and Machine Learning

1.4.2 Big Data Applications – An Overview

1.4.3 Cognitive Computing – An Introduction

1.5 Conclusions

1.1 Enabling Technologies for Big Data Computing

Over the past three decades, the state of high technology has gone through major changes in computing and communication platforms. In particular, we benefit greatly from the upgraded performance of the Internet and World Wide Web (WWW). We examine here the evolutional changes in platform architecture, deployed infrastructures, network connectivity and application variations. Instead of using desktop or personal computers to solve computational problems, we now turn to clouds as cost-efficient platforms to perform large-scale database search, storage and computing over the Internet.

This chapter introduces the basic concepts of data science and its enabling technologies. The ultimate goal is to blend together the sensor networks, RFID (radio frequency identification) tagging, GPS services, social networks, smart phones, tablets, clouds and Mashups, WiFi, Bluetooth, wireless Internet+, and 4G/5G core networks with the emerging Internet of Things (IoT) to build a productive big data industry in the years to come. In particular, we will examine the idea of technology fusion among the SMACT technologies.

1.1.1 Data Science and Related Disciplines

The concept of data science has a long history, but it has only recently become very popular due to the increasing use of clouds and IoT for building a smart world. As illustrated in Figure 1.1, today's big data possesses three important characteristics: data in large volume, high velocity demanded to process it, and many varieties of data types. Some people add two more V's: veracity, which refers to the difficulty of tracing or predicting the data, and value, which can vary drastically if the data are handled differently. Together, these are often known as the five V's of big data.

Figure 1.1 Big data characteristics: Five V's and corresponding challenges.

By today's standards, one terabyte or more is considered big data. IDC has predicted that 40 ZB of data will be processed by 2030, meaning each person may have about 5.2 TB of data to be processed. The high volume demands large storage capacity and analytical capabilities to handle such massive volumes of data. The high variety implies that data comes in many different formats, which can be very difficult and expensive to manage accurately. The high velocity refers to the difficulty of processing big data in real time to extract meaningful information or knowledge from it. The veracity implies that it is rather difficult to verify the data. The value of big data varies with its application domains. All five V's make it difficult to capture, manage and process big data using the existing hardware/software infrastructure, and together they justify the call for smarter clouds and IoT support.
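As a rough check on the per-person figure, the short sketch below divides the predicted global data volume by an assumed world population of about 7.7 billion; the population figure and the snippet itself are illustrative assumptions, not part of the original text.

```python
# Back-of-the-envelope check of the per-person data estimate.
# Assumption: a world population of roughly 7.7 billion people.
total_data_bytes = 40e21        # 40 ZB, where 1 ZB = 10**21 bytes
population = 7.7e9              # assumed world population

per_person_tb = total_data_bytes / population / 1e12   # bytes -> TB

print(f"Data per person: {per_person_tb:.1f} TB")      # prints ~5.2 TB
```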

Forbes, Wikipedia and NIST have provided some historical reviews of this field. To illustrate its evolution to a big data era, we divide the timeline into four stages, as shown in Figure 1.2. In the 1970s, some considered data science equivalent to datalogy, as noted by Peter Naur: “The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.” At one time, data science was regarded as part of statistics in a wide range of applications. Since the 2000s, the scope of data science has broadened. It became a continuation of the field of data mining and predictive analytics, also known as the field of knowledge discovery and data mining (KDD).

Figure 1.2 The evolution of data science up to the big data era.

In this context, programming is viewed as part of data science. Over the past two decades, data has increased on an escalating scale in various fields. The data science evolution enables the extraction of knowledge from massive volumes of data that are structured or unstructured. Unstructured data include emails, videos, photos, social media, and other user-generated contents. The management of big data requires scalability across large amounts of storage, computing and communication resources.

Formally, we define data science as the process of extracting actionable knowledge directly from data through data discovery, hypothesis formulation and hypothesis testing. A data scientist is a practitioner who has sufficient knowledge of the overlapping regimes of business needs, domain knowledge, analytical skills and programming expertise to manage the end-to-end scientific process through each stage in the big data life cycle.

Today's data science requires aggregating and sorting through a great amount of information, and writing algorithms to extract insights from data at such a large scale. Data science has a wide range of applications, especially in clinical trials, biological science, agriculture, medical care and social networks [1]. We divide the value chain of big data into four phases, namely data generation, acquisition, storage and analysis. If we take data as a raw material, data generation and data acquisition are an exploitation process, while data storage and data analysis form a production process that adds value to the raw material.

In Figure 1.3, data science is considered as the intersection of three interdisciplinary areas: computer science or programming skills, mathematics and statistics, and application domain expertise. Most data scientists started as domain experts who have mastered mathematical modeling, data mining techniques and data analytics. Through the combination of domain knowledge and mathematical skills, specific models are developed and algorithms are designed. Data science runs across the entire data life cycle. It incorporates principles, techniques and methods from many disciplines and domains, including data mining and analytics, especially when machine learning and pattern recognition are applied.

Figure 1.3 Functional components of data science supported by some software libraries on the cloud in 2016.

Statistics, operations research, visualization and domain knowledge are also indispensable. Data science teams solve very complex data problems. As shown in Figure 1.3, wherever two of these areas overlap, they generate a specialized field of interest, so the three pairwise intersections yield three important fields. The modeling field is formed by intersecting domain expertise with mathematical statistics; the knowledge to be discovered is often described in abstract mathematical language. Another field is data analytics, which results from the intersection of domain expertise and programming skills; domain experts apply special programming tools to discover knowledge by solving practical problems in their domain. Finally, the field of algorithms is the intersection of programming skills and mathematical statistics. Summarized below are some open challenges in big data research, development and applications:

Structured versus unstructured data with effective indexing;

Identification, de-identification and re-identification;

Ontologies and semantics of big data;

Data introspection and reduction techniques;

Design, construction, operation and description;

Data integration and software interoperability;

Immutability and immortality;

Data measurement methods;

Data range, denominators, trending and estimation.

1.1.2 Emerging Technologies in the Next Decade

Gartner Research is an authoritative source on new technologies. They identify the hottest emerging technologies in hype cycles every year. In Figure 1.4 we examine Gartner's hype cycle for emerging technologies across many fields in 2016. An emerging technology may take 2 to 10 years to mature and reach its plateau of productivity. In 2016, the most anticipated technologies were identified at the peak of the hype cycle. The top 12 include cognitive expert advisors, machine learning, software-defined security, connected home, autonomous vehicles, blockchain, nanotube electronics, smart robots, micro datacenters, gesture control devices, IoT platforms, and drones (commercial UAVs).

Figure 1.4 Hype cycle for emerging high technologies to reach maturity and industrial productivity within the next decade. (Source: Gartner Research, July 2016, reprinted with permission.) [19]

As identified by the dark solid circles, most technologies take 5 to 10 years to mature. The light solid circles, such as machine learning, software-defined anything (SDx) and natural language answering, mark those that may become mature in 2 to 5 years' time. Readers should check hype cycles released in previous years to find more hot technologies. The triangles identify those that may take more than 10 years of further development, such as 4-D printing, general-purpose machine intelligence, neuromorphic hardware, quantum computing and autonomous vehicles. Self-driving cars were a hot topic in 2016, but may need more time to be accepted, either technically or legally. Enterprise taxonomy and ontology management are entering the disillusionment stage, but they may still become a reality in the long run.

Other hot technologies, like augmented reality and virtual reality, have passed through the trough of disillusionment, but they are now heading towards industrial productivity. At the early innovation-trigger stage, we observe that Wifi 11.ac and context brokering are rising on the horizon, together with Data Broker PaaS (dbrPaaS), personal analytics, smart workplaces, conversational user interfaces, smart data discovery, affective computing, virtual personal assistants, digital security and people-literate technology. Many other technologies on the rising edge of the expectation curve include 3-D bio-printing, connected homes, biochips, software-defined security, etc. This hype cycle does include more mature technologies such as hybrid cloud computing, cryptocurrency exchange and enterprise 3-D printing identified in previous years.

Some of the more mature technologies such as cloud computing, social networks, near-field communication (NFC), 3-D scanners, consumer telematics and speech recognition, that have appeared in hype cycles released from 2010 to 2015, do not appear in Figure 1.4. The depth of disillusionment may not be bad, because as interest wanes after extensive experiments, useful lessons are learned to deliver products more successfully. Those long-shot technologies marked by triangles in the hype cycle cannot be ignored either. Most industrial developers are near-sighted or very conservative in the sense that they only adopt mature technologies that can generate a profitable product quickly. Traditionally, the long-shot or high-risk technologies such as quantum computing, smart dust, bio-acoustic sensing, volumetric displays, brain–human interface and neurocomputers are only heavily pursued in academia.

It has been well accepted that technology will continue to become more human-centric, to the point where it will introduce transparency between people, businesses and things. This relationship will surface more as the evolution of technology becomes more adaptive, contextual and fluid within the workplace, at home, and interacting with the business world. As hinted above, we see the emergence of 4-D printing, brain-like computing, human augmentation, volumetric displays, affective computing, connected homes, nanotube electronics, augmented reality, virtual reality and gesture control devices. Some of these will be covered in subsequent chapters.

There are predictable trends in technology that drive computing applications. Designers and programmers want to predict the technological capabilities of future systems. Jim Gray's paper “Rules of Thumb in Data Engineering” is an excellent example of how technology affects applications and vice versa. Moore's Law indicates that processor speed doubles every 18 months. This was indeed true for the past 30 years, but it is hard to say whether Moore's Law will hold much longer. Gilder's Law indicates that network bandwidth doubled yearly in the past. The tremendous price/performance ratio of commodity hardware has been driven by the smart phone, tablet and notebook markets. This has also enriched commodity technologies for large-scale computing.
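To make these two rules of thumb concrete, the sketch below projects growth factors under idealized doubling periods. The 18-month and 12-month doubling periods come from the text; the 6-year horizon and the snippet itself are illustrative assumptions.

```python
# Idealized growth projections under Moore's Law (doubling every 18 months)
# and Gilder's Law (bandwidth doubling every 12 months).
# The 6-year horizon is an illustrative assumption.

def growth_factor(years: float, doubling_period_months: float) -> float:
    """Return the growth multiple after `years`, doubling once per period."""
    return 2 ** (years * 12 / doubling_period_months)

years = 6
cpu_factor = growth_factor(years, 18)   # processor speed growth
bw_factor = growth_factor(years, 12)    # network bandwidth growth

print(f"After {years} years: CPU speed x{cpu_factor:.0f}, bandwidth x{bw_factor:.0f}")
# After 6 years: CPU speed x16, bandwidth x64
```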

It is interesting to see the high expectations of IoT in recent years. Cloud computing, in mashup or other applications, demands economical computing, web-scale data collection, system reliability and scalable performance. For example, distributed transaction processing is widely practised in the banking and finance industries. Transactions represent 90% of the existing market for reliable banking systems. In a distributed transaction, users must deal with multiple database servers, and maintaining the consistency of replicated transaction records is crucial for real-time banking services. Other complications in these business applications include a shortage of software support, network saturation and security threats.
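To illustrate one common way of keeping replicated transaction records consistent, the sketch below shows a highly simplified two-phase-commit style coordinator in Python; all class and function names are hypothetical, and real banking systems add durable logging, locking, timeouts and recovery on top of this idea.

```python
# Minimal two-phase-commit sketch for keeping replicated transaction
# records consistent. All names are hypothetical and greatly simplified.

class ReplicaServer:
    def __init__(self, name):
        self.name = name
        self.committed = []      # durable records (simplified as a list)
        self.pending = None

    def prepare(self, record) -> bool:
        """Phase 1: vote on whether this replica can commit the record."""
        self.pending = record
        return True              # a real server would check locks, disk space, etc.

    def commit(self):
        """Phase 2: make the pending record durable."""
        self.committed.append(self.pending)
        self.pending = None

    def abort(self):
        self.pending = None

def run_transaction(record, replicas):
    """Commit `record` on every replica only if all of them voted yes."""
    if all(r.prepare(record) for r in replicas):
        for r in replicas:
            r.commit()
        return True
    for r in replicas:
        r.abort()
    return False

replicas = [ReplicaServer("db1"), ReplicaServer("db2"), ReplicaServer("db3")]
print(run_transaction({"acct": "A-17", "amount": -250.0}, replicas))
```

The key design choice is that no replica makes the record durable until every replica has voted yes, so the copies either all change or none do.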

A number of more mature technologies that may take 2 to 5 years to reach the plateau are highlighted by light gray dots in Figure 1.4. These include biochips, advanced analytics, speech-to-speech translation, machine learning, hybrid cloud computing, cryptocurrency exchange, autonomous field vehicles, gesture control and enterprise 3-D printing. Some mature technologies that are now pursued heavily by industry are not shown in the 2016 hype cycle as emerging technologies. These include cloud computing, social networks, near-field communication (NFC), 3-D scanners, consumer telematics and speech recognition, which appeared in hype cycles over the last several years.

As noted above, IoT has attracted high expectations in recent years, and cloud computing in mashup or hybrid clouds has already been adopted by the mainstream. As time goes by, most technologies will advance to better stages of expectation. Again, the depth of disillusionment may not be too bad, as interest wanes after extensive experimentation and useful lessons are learned that help deliver products successfully. It should be noted that the long-shot technologies marked by triangles in the hype cycle may take more than 10 years to become an industrial reality. These include the rising areas of quantum computing, smart dust, bio-acoustic sensing, volumetric displays, human augmentation, brain–human interfaces and neuro-business, which remain popular in the academic and research communities.

The general computing trend is to leverage more and more shared web resources over the Internet. As illustrated in Figure 1.5, we see evolution along two tracks of system development: HPC and HTC systems. On the HPC side, supercomputers (massively parallel processors, MPP) are gradually being replaced by clusters of cooperative computers out of a desire to share computing resources. A cluster is often a collection of homogeneous compute nodes that are physically connected in close proximity to one another.

Figure 1.5 Evolutional trend towards parallel, distributed and cloud computing using clusters, MPPs, P2P networks, computing grids, Internet clouds, web services and the Internet of things. (HPC: high-performance computing; HTC: high-throughput computing; P2P: peer-to-peer; MPP: massively parallel processors; RFID: Radio Frequency Identification [2].)

On the HTC side, Peer-to-Peer (P2P) networks are formed for distributed file sharing and content delivery applications. P2P, cloud computing and web service platforms all place more emphasis on HTC than on HPC applications. For many years, HPC systems emphasized raw speed performance; we are therefore facing a strategic shift from the HPC to the HTC paradigm. The HTC paradigm pays more attention to high-flux multi-computing, where Internet searches and web services are requested by millions or more users simultaneously. The performance goal thus shifts to high throughput, measured as the number of tasks completed per unit of time.
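As a toy illustration of this HTC performance metric, the following Python sketch dispatches a batch of simulated requests to a thread pool and reports throughput as tasks completed per second; the simulated request, worker count and task count are assumptions chosen only for illustration.

```python
# Toy measurement of the HTC metric: throughput as tasks completed per
# second. The simulated "request" and the task counts are illustrative.

import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i: int) -> int:
    time.sleep(0.01)            # pretend each request needs ~10 ms of I/O
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(handle_request, range(1000)))
elapsed = time.perf_counter() - start

print(f"Completed {len(results)} tasks in {elapsed:.2f} s "
      f"-> throughput ~ {len(results) / elapsed:.0f} tasks/s")
```

Note that no single task finishes faster here; the gain comes entirely from serving many independent requests concurrently, which is exactly the HTC design goal.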

In the big data era, we are facing a data deluge problem. Data comes from IoT sensors, lab experiments, simulations, societal archives and the web, in all scales and formats. Preservation, movement and access of massive datasets require generic tools supporting high-performance, scalable file systems, databases, algorithms, workflows and visualization. With science becoming data-centric, a new paradigm of scientific discovery is based on data-intensive computing. We need to foster tools for data capture, data creation and data analysis. The cloud and IoT technologies are driven by the surge of interest in this deluge of data.

The Internet and WWW are used by billions of people every day. As a result, large datacenters or clouds must be designed to provide not only big storage but also distributed computing power to satisfy the requests of a large number of users simultaneously. The emergence of public and hybrid clouds demands the upgrading of many datacenters with larger server clusters, distributed file systems and high-bandwidth networks. With massive numbers of smartphones and tablets requesting services, the cloud engines, distributed storage and mobile networks must interact closely with the Internet to deliver mashup services in web-scale mobile computing over social and media networks.

P2P, cloud computing and web service platforms all emphasize high throughput over a large number of user tasks, rather than the raw performance typically targeted by supercomputers. This high-throughput paradigm focuses on handling a high flux of user tasks concurrently or simultaneously. The main applications of such high-flux cloud systems are Internet searches and web services. The performance goal again shifts to high throughput, that is, the number of tasks completed per unit of time. This demands not only improvements in batch processing speed, but also solutions to the acute problems of cost, energy saving, security and reliability in the clouds.

Advances in virtualization make it possible to use Internet clouds for massive user services. In fact, the differences among clusters, P2P systems and clouds may become blurred. Some view clouds as computing clusters with modest changes through virtualization. Others anticipate the effective processing of huge datasets generated by web services, social networks and IoT. In this sense, many users consider cloud platforms a form of utility computing or service computing.

1.1.2.1 Convergence of Technologies

Cloud computing is enabled by the convergence of the four technologies illustrated in Figure 1.6. Hardware virtualization and multi-core chips make dynamic configuration of clouds possible. Utility and grid computing technologies lay the necessary foundation for computing clouds. Recent advances in service-oriented architecture (SOA), Web 2.0 and platform mashups are pushing cloud computing another step forward. Finally, autonomic computing and automated datacenter operations have also enabled cloud computing.

Figure 1.6 Technological convergence enabling cloud computing over the Internet. (Courtesy of Buyya, Broberg and Goscinski, reprinted with permission [3])

Cloud computing exploits multi-core and parallel computing technologies. To realize this vision of data-intensive systems, we need convergence in four areas, namely hardware, Internet technology, distributed computing and system management, as illustrated in Figure 1.6. Today's Internet technology places the emphasis on SOA and Web 2.0 services. Utility and grid computing lay the distributed computing foundation needed for cloud computing. Finally, we cannot ignore the widespread use of datacenters with virtualization techniques applied to automate the resource provisioning process in clouds.
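To make the last point concrete, the minimal Python sketch below shows one way an automated provisioning loop might decide how many virtual machines to keep running; the function name, thresholds and VM limits are illustrative assumptions rather than any particular cloud provider's API.

```python
# Hypothetical sketch of automated resource provisioning: a simple
# threshold rule chooses a new VM count from the observed average load.
# Thresholds, limits and names are assumptions for illustration only.

def provision_vms(current_vms: int, cpu_utilization: float,
                  scale_out_at: float = 0.80, scale_in_at: float = 0.30,
                  min_vms: int = 2, max_vms: int = 100) -> int:
    """Return the new VM count for a cluster given average CPU utilization."""
    if cpu_utilization > scale_out_at and current_vms < max_vms:
        return current_vms + 1          # add capacity under heavy load
    if cpu_utilization < scale_in_at and current_vms > min_vms:
        return current_vms - 1          # release capacity when idle
    return current_vms

print(provision_vms(current_vms=4, cpu_utilization=0.91))  # -> 5
print(provision_vms(current_vms=4, cpu_utilization=0.12))  # -> 3
```

In practice such rules run inside cloud orchestration services, which monitor datacenter telemetry and call the provider's virtualization APIs to start or stop virtual machines automatically.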

1.1.2.2 Utility Computing