Machine Learning and Big Data (E-Book)

Description

This book is intended for academic and industrial developers exploring and developing applications in the areas of big data and machine learning, including those addressing technology requirements, evaluating methodological advances and demonstrating algorithms. The intent of this book is to provide awareness of the algorithms used for machine learning and big data in the academic and professional community. The 17 chapters are divided into 5 sections: Theoretical Fundamentals; Big Data and Pattern Recognition; Machine Learning: Algorithms & Applications; Machine Learning's Next Frontier; and Hands-On and Case Study. While it dwells on the foundations of machine learning and big data as a part of analytics, it also focuses on contemporary topics for research and development. In this regard, the book covers machine learning algorithms and their modern applications in developing automated systems. Subjects covered in detail include:

* Mathematical foundations of machine learning, with various examples.
* An empirical study of supervised learning algorithms such as Naïve Bayes and KNN, and semi-supervised learning algorithms viz. S3VM, Graph-Based and Multiview.
* A precise study of unsupervised learning algorithms such as GMM, K-means clustering, the Dirichlet process mixture model and X-means, and of reinforcement learning algorithms including Q-learning, R-learning, TD learning, SARSA learning, and so forth.
* Hands-on machine learning open source tools viz. Apache Mahout and H2O.
* Case studies for readers to analyze the prescribed cases and present their solutions or interpretations, including intrusion detection in MANETs using machine learning.
* A showcase of novel use cases: implications of electronic governance, as well as a pragmatic study of BD/ML technologies for agriculture, healthcare, social media, industry, banking, insurance and so on.


Page count: 632

Year of publication: 2020




Table of Contents

Cover

Title Page

Copyright Page

Preface

Section 1: THEORETICAL FUNDAMENTALS

1 Mathematical Foundation

1.1 Concept of Linear Algebra

1.2 Eigenvalues, Eigenvectors, and Eigendecomposition of a Matrix

1.3 Introduction to Calculus

References

2 Theory of Probability

2.1 Introduction

2.2 Independence in Probability

2.3 Conditional Probability

2.4 Cumulative Distribution Function

2.5 Bayes’ Theorem

2.6 Multivariate Gaussian Function

References

3 Correlation and Regression

3.1 Introduction

3.2 Correlation

3.3 Regression

3.4 Conclusion

References

Section 2: BIG DATA AND PATTERN RECOGNITION

4 Data Preprocess

4.1 Introduction

4.2 Data Cleaning

4.3 Data Integration

4.4 Data Transformation

4.5 Data Reduction

4.6 Conclusion

Acknowledgements

References

5 Big Data

5.1 Introduction

5.2 Big Data Evaluation With Its Tools

5.3 Architecture of Big Data

5.4 Issues and Challenges

5.5 Big Data Analytics Tools

5.6 Big Data Use Cases

5.7 Where IoT Meets Big Data

5.8 Role of Machine Learning For Big Data and IoT

5.9 Conclusion

References

6 Pattern Recognition Concepts

6.1 Classifier

6.2 Feature Processing

6.3 Clustering

6.4 Conclusion

References

Section 3: MACHINE LEARNING: ALGORITHMS & APPLICATIONS

7 Machine Learning

7.1 History and Purpose of Machine Learning

7.2 Concept of Well-Defined Learning Problem

7.3 General-to-Specific Ordering Over Hypotheses

7.4 Version Spaces and Candidate Elimination Algorithm

7.5 Concepts of Machine Learning Algorithm

Conclusion

References

8 Performance of Supervised Learning Algorithms on Multi-Variate Datasets

8.1 Introduction

8.2 Supervised Learning Algorithms

8.3 Classification

8.4 Neural Network

8.5 Comparisons and Discussions

8.6 Summary and Conclusion

References

9 Unsupervised Learning

9.1 Introduction

9.2 Related Work

9.3 Unsupervised Learning Algorithms

9.4 Classification of Unsupervised Learning Algorithms

9.5 Unsupervised Learning Algorithms in ML

9.6 Summary and Conclusions

References

10 Semi-Supervised Learning

10.1 Introduction

10.2 Training Models

10.3 Generative Models—Introduction

10.4 S3VMs

10.5 Graph-Based Algorithms

10.6 Multiview Learning

10.7 Conclusion

References

11 Reinforcement Learning

11.1 Introduction: Reinforcement Learning

11.2 Model-Free RL

11.3 Model-Based RL

11.4 Conclusion

References

12 Application of Big Data and Machine Learning

12.1 Introduction

12.2 Motivation

12.3 Related Work

12.4 Application of Big Data and ML

12.5 Issues and Challenges

12.6 Conclusion

References

Section 4: MACHINE LEARNING’S NEXT FRONTIER

13 Transfer Learning

13.1 Introduction

13.2 Traditional Learning vs. Transfer Learning

13.3 Key Takeaways: Functionality

13.4 Transfer Learning Methodologies

13.5 Inductive Transfer Learning

13.6 Unsupervised Transfer Learning

13.7 Transductive Transfer Learning

13.8 Categories in Transfer Learning

13.9 Instance Transfer

13.10 Feature Representation Transfer

13.11 Parameter Transfer

13.12 Relational Knowledge Transfer

13.13 Relationship With Deep Learning

13.14 Applications: Allied Classical Problems

13.15 Further Advancements and Conclusion

References

Section 5: HANDS-ON AND CASE STUDY

14 Hands-On MAHOUT—Machine Learning Tool

14.1 Introduction to Mahout

14.2 Installation Steps of Apache Mahout Using Cloudera

14.3 Installation Steps of Apache Mahout Using Windows 10

14.4 Installation Steps of Apache Mahout Using Eclipse

14.5 Mahout Algorithms

14.6 Conclusion

References

15 Hands-On H2O Machine Learning Tool

15.1 Introduction

15.2 Installation

15.3 Interfaces

15.4 Programming Fundamentals

15.5 Machine Learning in H2O

15.6 Applications of H2O

15.7 Conclusion

References

16 Case Study: Intrusion Detection System Using Machine Learning

16.1 Introduction

16.2 System Design

16.3 Existing Proposals

16.4 Approaches Used in Designing the Scenario

16.5 Result Analysis

16.6 Conclusion

References

17 Inclusion of Security Features for Implications of Electronic Governance Activities

17.1 Introduction

17.2 Objective of E-Governance

17.3 Role of Identity in E-Governance

17.4 Status of E-Governance in Other Countries

17.5 Pros and Cons of E-Governance

17.6 Challenges of E-Governance in Machine Learning

17.7 Conclusion

References

Index

End User License Agreement

List of Tables

Chapter 4

Table 4.1 Dataset for house values in district.

Chapter 5

Table 5.1 Memory symbols and sizes handled by big data.

Table 5.2 Big data analytics evolution.

Table 5.3 Big data analytics tools evolution.

Table 5.4 10 V’s of big data.

Table 5.5 Big data tools.

Chapter 7

Table 7.1 Positive and negative training examples for the target concept Enjo...

Table 7.2 The EnjoySport concept learning task.

Table 7.3 The EnjoySport concept learning as a search.

Table 7.4 Find-S algorithm.

Table 7.5 The List-Eliminate method.

Table 7.6 The Candidate-Elimination method.

Table 7.7 Confusion matrix for multiple classes.

Table 7.8 Confusion matrix for binary class.

Chapter 8

Table 8.1 Details of the used datasets.

Table 8.2 Training Time for the datasets for different Algorithms.

Chapter 9

Table 9.1 Sample dataset.

Table 9.2 Dissimilarity computation for sample data.

Table 9.3 Dissimilarity computation for sample data second time.

Chapter 11

Table 11.1 Steps of R-learning algorithm.

Table 11.2 The pseudocode of SARSA-learning algorithm.

Table 11.3 Steps of Dyna-Q learning algorithm.

Table 11.4 Steps of first visit Monte Carlo algorithm.

Table 11.5 Computation for two samples of episodes.

Chapter 13

Table 13.1 Approaches to transfer learning.

Table 13.2 Transfer learning strategies and types of transferable components.

Chapter 16

Table 16.1 Comparisons of classification and clustering approaches.

Table 16.2 Description of the classification techniques.

Table 16.4 Commands to implement HITL.

Chapter 17

Table 17.1 Advanced methods for establishment of IDENTITY in e-governance.

List of Illustrations

Chapter 1

Figure 1.1 Point of intersection.

Figure 1.2 Linearly dependent.

Figure 1.3 Linearly Independent.

Chapter 4

Figure 4.1 Steps of knowledge discovery process.

Figure 4.2 Data preprocessing tasks.

Figure 4.3 Linear regression.

Figure 4.4 Outlier analysis using clustering.

Figure 4.5 Unified view of data.

Figure 4.6 Models of data integration: (a) data warehousing, (b) federated d...

Figure 4.7 Example of concept hierarchy.

Figure 4.8 Example of automatic concept hierarchy.

Figure 4.9 Sales data quarterly for years 2008 to 2010 are aggregated.

Figure 4.10 Data cube for sales.

Figure 4.11 Example of attribute subset selection.

Figure 4.12 Example of histogram using singleton with equal frequency.

Figure 4.13 Example of histogram using multiton with equal width.

Figure 4.14 Two principal components with sample values.

Figure 4.15 Example of factor analysis.

Figure 4.16 Class separation by linear discriminant analysis.

Chapter 5

Figure 5.1 Big data architectural framework.

Figure 5.2 10 V’s of big data.

Figure 5.3 Fraud detection using big data.

Figure 5.4 Customer division using big data.

Figure 5.5 Risk analytics and management using big data.

Figure 5.6 Insurance industry handling using big data.

Figure 5.7 Health care handling using big data.

Figure 5.8 Internet of Things applications using big data.

Figure 5.9 Weather forecasting applications using big data.

Figure 5.10 IoT components and topology.

Chapter 6

Figure 6.1 Explanation of EBL. (a) Standard approach to explanation-based le...

Figure 6.2 EBL architecture.

Figure 6.3 Node u that belongs to G is locally compatible with node v that b...

Figure 6.4 Three phases of isomorphism algorithm.

Figure 6.5 (a) Showing single coin and (b) showing two coins.

Figure 6.6 Ball moving in five consecutive frames.

Figure 6.7 Hierarchical clustering methods.

Figure 6.8 Dynamic-based clustering.

Chapter 7

Figure 7.1 Concept generality example.

Figure 7.2 Instances space, hypotheses space, and the more general relation ...

Figure 7.3 Most specific generalized and most general specialized relation....

Figure 7.4 The hypothesis space search performed by Find-S algorithm.

Figure 7.5 Consistent hypothesis in a set of training examples.

Figure 7.6 Version space based on general boundary and specific boundary.

Figure 7.7 An example for Candidate-Elimination method.

Figure 7.9 Categorization of machine learning algorithm.

Figure 7.10 The supervised algorithms.

Figure 7.11 The unsupervised algorithms.

Figure 7.12 Deep learning.

Chapter 8

Figure 8.1 Classification accuracy - SVM.

Figure 8.2 Classification accuracy - NB.

Figure 8.3 Classification accuracy – BN.

Figure 8.4 Classification accuracy - HMM.

Figure 8.5 Classification accuracy – KNN.

Figure 8.6 Neural cell.

Figure 8.7 ANN architecture and data flow.

Figure 8.8 ANN structure [17].

Figure 8.9 ANN Application areas.

Figure 8.10 Classification accuracy - comparison.

Figure 8.11 RNN efficiency analysis.

Figure 8.12 BPNN efficiency analysis.

Figure 8.13 GRNN efficiency analysis.

Figure 8.14 Efficiency comparison of BPNN and GRNN.

Chapter 9

Figure 9.1 Clustering analysis.

Figure 9.2 Agglomerative and divisive hierarchical clustering.

Figure 9.3 Data points in the graph.

Figure 9.4 Final Clusters for the sample data.

Figure 9.5 Dense-based clustering.

Figure 9.6 Clustering process using DBSCAN algorithm.

Chapter 10

Figure 10.1 Self-training in progress. Circle representing the classified da...

Figure 10.2 Co-training in progress. Green and red are two classes and blue ...

Figure 10.3 Depicting generative models predicting a distribution based on t...

Figure 10.4 Discriminative (left) vs generative (right) approach.

Figure 10.5 Image classification using generative models.

Figure 10.6 Workflow of text categorization using naïve Bayes.

Figure 10.7 SVM.

Figure 10.8 SVM has hinge loss (left) and S3VM has hat loss (right).

Chapter 11

Figure 11.1 Elements of reinforcement learning.

Figure 11.2 Model-based and model-free RL.

Figure 11.3 Steps of Q-learning algorithm.

Figure 11.4 The status of initial Q-table and the puzzle.

Figure 11.5 8*8 chess board.

Figure 11.6 Illustration of SARSA method.

Figure 11.7 Generate a policy using Dyna-Q model.

Figure 11.8 Illustration of Dyna-Q model.

Figure 11.9 A square of unit length consisting quarter circle of unit radius...

Chapter 12

Figure 12.1 Overview of big data and machine learning application in healthc...

Figure 12.2 Overview of the applications of big data and machine learning in...

Figure 12.3 Well-known brands using big data and machine learning.

Figure 12.4 Big data and machine learning in education sector.

Figure 12.5 Ecosystem monitoring with big data and machine learning.

Figure 12.6 Overview of the sectors benefited by big data and machine learni...

Figure 12.7 Big data and machine learning in agriculture.

Figure 12.8 Roadblocks for big data and machine learning.

Chapter 13

Figure 13.1 Traditional learning and transfer learning.

Figure 13.2 Traditional learning vs transfer learning.

Figure 13.3 Summarization to the transfer learning methodologies.

Figure 13.4 Source domain and target domain have a lot in common.

Figure 13.5 Parameter transfer in transfer learning.

Chapter 14

Figure 14.1 Architecture of Mahout.

Figure 14.2 Downloading VMware player.

Figure 14.3 Path setting to install VMware.

Figure 14.4 Opening the Cloudera using VMWare Player.

Figure 14.5 Select the user and password.

Figure 14.6 Updating the software.

Figure 14.7 Installing the default Java.

Figure 14.8 Checking the installed Java version.

Figure 14.9 Creating a Hadoop system user.

Figure 14.10 Adding a directory Hadoop user into Hadoop system.

Figure 14.11 Adding username, password, and other user details.

Figure 14.12 Adding user “hdgouse” as super user.

Figure 14.13 Logging to the new hadoop user.

Figure 14.14 Configuration of SSH switching the user.

Figure 14.15 Creating a new SSH key.

Figure 14.16 Enabling SSH with key access to authorized_keys.

Figure 14.17 Checking SSH to connect to Hadoop user as hdgouse.

Figure 14.18 If error to SSH Localhost, then purge SSH.

Figure 14.19 Updating the SSH.

Figure 14.20 Checking the files after downloading Hadoop, Mahout, and Maven.

Figure 14.21 Checking the file after extracting hadoop-2.7.3.

Figure 14.22 Moving the extracted hadoop-2.7.3 file to Hadoop.

Figure 14.23 Changing the owner permission of hadoop.

Figure 14.24 Modifying the source.bashrc file.

Figure 14.25 Editing the JAVA_HOME and HADOOP_HOME.

Figure 14.26 Listing of files to be configured.

Figure 14.27 Command for modifying the hadoop-env.sh.

Figure 14.28 Adding the JAVA_HOME path.

Figure 14.29 Command for modifying the core-site.xml.

Figure 14.30 Adding the configuration property of core-site.xml file.

Figure 14.31 Command for copying mapred site.

Figure 14.32 Command for modifying the mapred-site.xml.

Figure 14.33 Adding the configuration properties of mapred-site.xml.

Figure 14.34 Command for modifying the hdfs-site.xml.

Figure 14.35 Adding the properties to hdf-site.xml.

Figure 14.36 Command for modifying the hadoop.sh.

Figure 14.37 Adding the HADOOP_HOME path.

Figure 14.38 Adding the configuration properties of yarn-site.xml.

Figure 14.39 Adding the datanode and namenode in hdfs.

Figure 14.40 Changing the owner permission of hdfs.

Figure 14.41 Changing the modes of the hdfs file.

Figure 14.42 Formatting the namenode.

Figure 14.43 Downloading the Mahout.

Figure 14.44 Extracting the Mahout file.

Figure 14.45 Creating a Mahout Directory.

Figure 14.46 Moving the extracted file into Mahout directory.

Figure 14.47 Change the bin permission.

Figure 14.48 Extracting the maven tar file.

Figure 14.49 Creating a maven directory under usr/lib.

Figure 14.50 Setting the Maven path.

Figure 14.51 Adding environmental variables in bashrc.

Figure 14.52 Checking Mahout working or not.

Figure 14.53 (a) Copying the data command.

Figure 14.53 (b) Performing the k-means analysis command.

Figure 14.54 Hadoop tar file.

Figure 14.55 Downloaded Mahout distribution tar file.

Figure 14.56 Downloaded Maven file.

Figure 14.57 Copying of Hadoop, Mahout, and Maven in C drive.

Figure 14.58 Creating New variable name and value for Hadoop home.

Figure 14.59 Creating New variable name and value for Mahout home.

Figure 14.60 Creating New variable name and value for Maven home.

Figure 14.61 Creating New variable name and value for M2 home.

Figure 14.62 Editing the Path of Java, Hadoop, Mahout and Maven.

Figure 14.63 Creating two new folders datanode and namenode under data of ha...

Figure 14.64 Listing of files to be edited in the Hadoop folder.

Figure 14.65 Adding property fields.

Figure 14.66 Adding property fields.

Figure 14.67 Adding property fields.

Figure 14.68 Adding property fields.

Figure 14.69 Namenode formatting success.

Figure 14.70 Command for starting of namenode, datanode, resource manager an...

Figure 14.71 Starting of namenode, datanode, resource manager and node manag...

Figure 14.72 Name node information overview (a) and summary (b).

Figure 14.73 Name node status (a) and Datanode information (b).

Figure 14.74 Creating a directory Test1, Copying input file to cluster and d...

Figure 14.75 Checking Test1 directory from browser and Checking input file d...

Figure 14.76 Select the Install New Software from Help tab.

Figure 14.77 Work with url paste the maven link https://download.eclipse.org...

Figure 14.78 Select the Maven Integration for Eclipse.

Figure 14.79 Install the Remediation page.

Figure 14.80 Installing the Maven.

Figure 14.81 Maven installed.

Figure 14.82 Select File tab → New → Maven Project.

Figure 14.83 Creating New Maven Project. Select Next tab.

Figure 14.84 Select the show the last version of Archetype only.

Figure 14.85 Enter a group id for the artifact details.

Figure 14.86 Create GroupId, ArtifactId, Version, and Package.

Figure 14.87 Artifact created.

Figure 14.88 Select Properties for Recommender Application.

Figure 14.89 Select JavaBuildPath → Order and Exports.

Figure 14.90 Select the Build class path order. It creates main and test fol...

Figure 14.91 Select JavaBuildPath → Libraries → Add Library.

Figure 14.92 Select Workspace default JRE.

Figure 14.93 Remove the JRE System Library [J2SE-1.5].

Figure 14.94 Apply and Close.

Figure 14.95 pom.xml file.

Figure 14.96 pom.xml file create dependency.

Figure 14.97 pom.xml file change the version tag.

Figure 14.98 Creating the new data file:

Figure 14.99 Creating the new data file: Select the ProjectFile →Rightclick ...

Figure 14.100 Creating the new data file: Select the ProjectFile → Rightclic...

Figure 14.101 Creating the new data file: Select the Project File →Right cli...

Figure 14.102 Select→ Libraries →Add External JARs.

Figure 14.103 Select→ All jar files which downloaded → Open → Apply and Clos...

Figure 14.104 Create a new package. Right Click → src/main/java → New → Pack...

Figure 14.105 Create the new package name as Application.

Figure 14.106 Create new class Evaluation Recommender under src/main/java.

Figure 14.107 Run and the Result of the EvaluationRecommender class.

Figure 14.108 Copying the 20news data into MahoutTest.

Figure 14.109 Running the classify-20newsgroups.sh from bin folder of mahout...

Figure 14.110 Output of 20 dataset.

Figure 14.111 Clustering synthetic data copied into cluster.

Figure 14.112 (a) commands for creating directory for Synthetic data.

Figure 14.112 (b) Commands for running clustering algorithms.

Figure 14.113 (a) Eclipse path for Recommender.

Figure 14.113 (b) pom.xml.

Figure 14.113 (c) Recommender Dataset.

Figure 14.113 (d) App.java.

Figure 14.113 (e) EvaluatorRecommender.java file.

Figure 14.113 (f) Result of Recommender.

Chapter 15

Figure 15.1 Output screen for the command Pip install h2o.

Figure 15.2 Output screen of commands pip install requests and pip install t...

Figure 15.3 Output screen for the command pip install scikit-learn.

Figure 15.4 Output screen for the command pip install colorama.

Figure 15.5 Output screen of Pip install future.

Figure 15.6 Output screen of pip install -f http://h2o-release.s3.amazonaws....

Figure 15.7 Output screen of pip install -f http://h2o-release.s3.amazonaws....

Figure 15.8 Output screen of H2o.demo (“glm”) and h2o.init ().

Figure 15.9 Output screen of H2o.demo (“glm”) and h2o.init ().

Figure 15.10 Output screen of H2o.demo (“glm”) and h2o.init ().

Figure 15.11 Output screen of H2o.demo (“glm”) and h2o.init ().

Figure 15.12 Output screen of deep learning algorithm applied on diabetes da...

Figure 15.13 Output screen of deep learning algorithm applied on diabetes da...

Figure 15.14 Output Screen of parsing applied on diabetes dataset. In the Fi...

Figure 15.15 Output screen of classification applied on diabetes dataset.

Figure 15.16 Output screen of classification applied on diabetes dataset.

Figure 15.17 Output screen of classification applied on diabetes dataset.

Figure 15.18 Output Screen of five-fold cross-validation on diabetes dataset...

Figure 15.19 Output Screen of five-fold cross-validation on diabetes dataset...

Figure 15.20 Output screen of Stacked Ensemble and Random Forest Estimator i...

Figure 15.21 Output screen of Stacked Ensemble and Random Forest Estimator i...

Chapter 16

Figure 16.1 Architecture of a black hole node.

Figure 16.2 Types of data sets.

Figure 16.3 Types of classification techniques.

Figure 16.4 Categories of supervised learning.

Figure 16.5 Confusion matrix.

Figure 16.6 Scenario in QualNet.

Figure 16.7 Algorithm for detection and prevention of black hole node.

Figure 16.8 Network topology.

Figure 16.9 Packet drop versus speed for two black hole nodes.

Figure 16.10 Selection of deactivation time for avoidance.

Figure 16.11 Packet delivery ratio.

Figure 16.12 Dataset generated from QualNet.

Figure 16.13 Dataset imported from QualNet to MATLAB.

Figure 16.14 Dataset imported to MATLAB on classification.

Figure 16.15 Confusion matrix for KNN.

Figure 16.16 ROC for KNN.

Figure 16.17 Confusion matrix for SVM.

Figure 16.18 ROC for SVM.

Figure 16.19 Confusion matrix for decision tree.

Figure 16.20 ROC for decision tree.

Figure 16.21 Confusion matrix for naïve Bayes.

Figure 16.22 ROC for naïve Bayes.

Figure 16.23 Confusion matrix for neural network.

Figure 16.24 ROC for neural network.

Figure 16.25 Performance for KNN using TPR and FNR.

Figure 16.26 Positive predictive and false discovery rates for KNN.

Figure 16.27 Accuracy rates for different classifiers.

Chapter 17

Figure 17.1 Cycle of management of big data.

Figure 17.2 Analytics solution.

Figure 17.3 Digital certificate to prove identity for e-governance.

Figure 17.4 Basic model for storage of identity: fingerprint.

Figure 17.5 Aadhar as a digital identity.


Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106

 

Publishers at Scrivener
Martin Scrivener ([email protected])
Phillip Carmical ([email protected])

Machine Learning and Big Data

Concepts, Algorithms, Tools and Applications

Edited by

Uma N. Dulhare, Khaleel Ahmad and Khairol Amali Bin Ahmad

This edition first published 2020 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA

© 2020 Scrivener Publishing LLC

For more information about Scrivener publications please visit www.scrivenerpublishing.com.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

Wiley Global Headquarters111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.

Library of Congress Cataloging-in-Publication Data

ISBN 9781119654742

Cover image: Pixabay.com
Cover design by Russell Richardson

Preface

Nowadays, the increasing use of social sites, search engines, multimedia sharing, stock exchange, online gaming, online survey and news sites, among others, has caused the amount and variety of data to grow very rapidly, to terabytes or even zettabytes. As a consequence, extracting useful information from this big data has become a major challenge.

Machine Learning is a subset of Artificial Intelligence that gives systems the ability to learn and improve automatically from experience without being explicitly programmed. By using machine learning, computers are taught to perform complex tasks that humans cannot easily accomplish. In this latest approach to digital transformation, computing processes are used to make intelligent decisions that are more efficient, cost-effective and reliable. Machine learning algorithms therefore have a huge range of applications: management of species, crops, field conditions and livestock in the agriculture domain; medical imaging and diagnostics, drug discovery and development, and treatment and prediction of disease in the healthcare domain; social media monitoring, chatbots, sentiment analysis and image recognition in the social media domain; and fraud detection, customer data management, financial risk modeling, personalized marketing, lifetime value prediction, recommendation engines and customer segmentation in the banking and insurance services domain.

This field is so vast and popular these days that machine learning has become an integral part of our daily routines through applications like Siri, Cortana, Facebook, Twitter, Google Search, Gmail, Skype, LinkedIn, Viber, WhatsApp, Pinterest, PayPal, Netflix, Uber, Lyst, Spotify, Instagram and so forth.

The intent of this book is to provide awareness of algorithms used for machine learning and big data in the academic and professional community. While it dwells on the foundations of machine learning and big data as a part of analytics, it also focuses on contemporary topics for research and development. In this regard, the book covers machine learning algorithms and their modern applications in developing automated systems.

The topics in this book are categorized into five sections comprising a total of seventeen chapters. The first section provides an insight into mathematical foundations, probability theory, and correlation and regression techniques. The second section covers data preprocessing and the concepts of big data and pattern recognition. The third section discusses machine learning algorithms, including supervised learning algorithms (Naïve Bayes, KNN, HMM, Bayesian), semi-supervised learning algorithms (S3VM, Graph-Based, Multiview), unsupervised learning algorithms (GMM, K-means clustering, Dirichlet process mixture model, X-means), and reinforcement learning algorithms (Q-learning, R-learning, TD learning, SARSA learning). The section also dwells on applications of machine learning for video surveillance, social media services, email spam and malware filtering, online fraud detection, financial services, healthcare, industry, manufacturing, transportation, etc.
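To give a flavor of the reinforcement learning algorithms listed above, the short sketch below implements tabular Q-learning on a toy chain-walk task. It is purely illustrative and not taken from the book; the five-state environment, the reward of 1 at the terminal state, and all hyperparameters are assumptions made for this example.

import random

# Minimal tabular Q-learning sketch (illustrative only: the 5-state chain
# environment, rewards, and hyperparameters are invented for this example).
N_STATES, ACTIONS = 5, (0, 1)              # actions: 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON, EPISODES = 0.1, 0.9, 0.1, 500
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    # Move along the chain; reaching the rightmost state ends the episode with reward 1.
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

for _ in range(EPISODES):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best action value in the next state.
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})  # greedy policy per state

SARSA, also covered in the reinforcement learning chapter, differs only in the update step: it bootstraps from the action actually taken in the next state rather than from the greedy maximum.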

Section four presents the theoretical principles, functionality, methodologies and applications of transfer learning, as well as its relationship with deep learning paradigms, while the final section explores hands-on machine learning open source tools and works through a detailed case study. At the end of this section, various open challenges are discussed, such as the implications of electronic governance activities, which can be addressed with machine learning techniques to help leadership make well-informed decisions and to support the sound economic planning and policy formulation needed to tackle the major issues facing developing countries, such as a weak economy, unemployment, corruption and many more.
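As a rough preview of the H2O workflow covered in the hands-on chapters (and assuming the h2o Python package has been installed, e.g. with pip install h2o), training a generalized linear model looks roughly like the sketch below. The CSV path and the "outcome" column name are placeholders, not data from the book.

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

h2o.init()                                        # start or connect to a local H2O cluster
frame = h2o.import_file("diabetes.csv")           # placeholder dataset path
frame["outcome"] = frame["outcome"].asfactor()    # treat the label column as categorical
train, test = frame.split_frame(ratios=[0.8], seed=42)
model = H2OGeneralizedLinearEstimator(family="binomial")
model.train(x=[c for c in frame.columns if c != "outcome"],
            y="outcome", training_frame=train)
print(model.model_performance(test).auc())        # hold-out AUC
h2o.cluster().shutdown()

Mahout, by contrast, is driven mostly from the command line on top of Hadoop, as the step-by-step installation and configuration figures in Chapter 14 illustrate.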

It is a great pleasure for us to acknowledge the contributions and assistance of many individuals. We would like to thank all the authors who submitted chapters for their contributions and fruitful discussions that made this book a great success. We are also thankful to the team from Scrivener Publishing for providing the meticulous service for timely publication of this book. Also, we would like to express our gratitude for the encouragement offered by our college/university. Last but not least, we gratefully acknowledge the support, encouragement and patience of our families.

Uma N. Dulhare
Khaleel Ahmad
Khairol Amali Bin Ahmad

June 2020

Section 1: THEORETICAL FUNDAMENTALS