This book is intended for academic and industrial developers exploring and developing applications in the areas of big data and machine learning, including those solving technology requirements, evaluating methodological advances, and demonstrating algorithms. The intent of this book is to provide awareness of the algorithms used for machine learning and big data in the academic and professional community. The 17 chapters are divided into 5 sections: Theoretical Fundamentals; Big Data and Pattern Recognition; Machine Learning: Algorithms & Applications; Machine Learning's Next Frontier; and Hands-On and Case Study. While it dwells on the foundations of machine learning and big data as a part of analytics, it also focuses on contemporary topics for research and development. In this regard, the book covers machine learning algorithms and their modern applications in developing automated systems. Subjects covered in detail include: * Mathematical foundations of machine learning, with various examples. * An empirical study of supervised learning algorithms such as Naïve Bayes and KNN, and of semi-supervised learning algorithms viz. S3VM, graph-based methods, and multiview learning. * A concise study of unsupervised learning algorithms such as GMM, K-means clustering, the Dirichlet process mixture model, and X-means, as well as reinforcement learning algorithms including Q-learning, R-learning, TD learning, SARSA learning, and so forth. * Hands-on machine learning open source tools viz. Apache Mahout and H2O. * Case studies in which readers analyze the prescribed cases and present their solutions or interpretations, including intrusion detection in MANETs using machine learning. * A showcase of novel use cases: implications of electronic governance, as well as a pragmatic study of big data and machine learning technologies for agriculture, healthcare, social media, industry, banking, insurance and so on.
Page count: 632
Publication year: 2020
Cover
Title Page
Copyright Page
Preface
Section 1: THEORETICAL FUNDAMENTALS
1 Mathematical Foundation
1.1 Concept of Linear Algebra
1.2 Eigenvalues, Eigenvectors, and Eigendecomposition of a Matrix
1.3 Introduction to Calculus
References
2 Theory of Probability
2.1 Introduction
2.2 Independence in Probability
2.3 Conditional Probability
2.4 Cumulative Distribution Function
2.5 Bayes’ Theorem
2.6 Multivariate Gaussian Function
References
3 Correlation and Regression
3.1 Introduction
3.2 Correlation
3.3 Regression
3.4 Conclusion
References
Section 2: BIG DATA AND PATTERN RECOGNITION
4 Data Preprocess
4.1 Introduction
4.2 Data Cleaning
4.3 Data Integration
4.4 Data Transformation
4.5 Data Reduction
4.6 Conclusion
Acknowledgements
References
5 Big Data
5.1 Introduction
5.2 Big Data Evolution With Its Tools
5.3 Architecture of Big Data
5.4 Issues and Challenges
5.5 Big Data Analytics Tools
5.6 Big Data Use Cases
5.7 Where IoT Meets Big Data
5.8 Role of Machine Learning For Big Data and IoT
5.9 Conclusion
References
6 Pattern Recognition Concepts
6.1 Classifier
6.2 Feature Processing
6.3 Clustering
6.4 Conclusion
References
Section 3: MACHINE LEARNING: ALGORITHMS & APPLICATIONS
7 Machine Learning
7.1 History and Purpose of Machine Learning
7.2 Concept of Well-Defined Learning Problem
7.3 General-to-Specific Ordering Over Hypotheses
7.4 Version Spaces and Candidate Elimination Algorithm
7.5 Concepts of Machine Learning Algorithm
Conclusion
References
8 Performance of Supervised Learning Algorithms on Multi-Variate Datasets
8.1 Introduction
8.2 Supervised Learning Algorithms
8.3 Classification
8.4 Neural Network
8.5 Comparisons and Discussions
8.6 Summary and Conclusion
References
9 Unsupervised Learning
9.1 Introduction
9.2 Related Work
9.3 Unsupervised Learning Algorithms
9.4 Classification of Unsupervised Learning Algorithms
9.5 Unsupervised Learning Algorithms in ML
9.6 Summary and Conclusions
References
10 Semi-Supervised Learning
10.1 Introduction
10.2 Training Models
10.3 Generative Models—Introduction
10.4 S3VMs
10.5 Graph-Based Algorithms
10.6 Multiview Learning
10.7 Conclusion
References
11 Reinforcement Learning
11.1 Introduction: Reinforcement Learning
11.2 Model-Free RL
11.3 Model-Based RL
11.4 Conclusion
References
12 Application of Big Data and Machine Learning
12.1 Introduction
12.2 Motivation
12.3 Related Work
12.4 Application of Big Data and ML
12.5 Issues and Challenges
12.6 Conclusion
References
Section 4: MACHINE LEARNING’S NEXT FRONTIER
13 Transfer Learning
13.1 Introduction
13.2 Traditional Learning vs. Transfer Learning
13.3 Key Takeaways: Functionality
13.4 Transfer Learning Methodologies
13.5 Inductive Transfer Learning
13.6 Unsupervised Transfer Learning
13.7 Transductive Transfer Learning
13.8 Categories in Transfer Learning
13.9 Instance Transfer
13.10 Feature Representation Transfer
13.11 Parameter Transfer
13.12 Relational Knowledge Transfer
13.13 Relationship With Deep Learning
13.14 Applications: Allied Classical Problems
13.15 Further Advancements and Conclusion
References
Section 5: HANDS-ON AND CASE STUDY
14 Hands on MAHOUT—Machine Learning Tool
14.1 Introduction to Mahout
14.2 Installation Steps of Apache Mahout Using Cloudera
14.3 Installation Steps of Apache Mahout Using Windows 10
14.4 Installation Steps of Apache Mahout Using Eclipse
14.5 Mahout Algorithms
14.6 Conclusion
References
15 Hands-On H2O Machine Learning Tool
15.1 Introduction
15.2 Installation
15.3 Interfaces
15.4 Programming Fundamentals
15.5 Machine Learning in H2O
15.6 Applications of H2O
15.7 Conclusion
References
16 Case Study: Intrusion Detection System Using Machine Learning
16.1 Introduction
16.2 System Design
16.3 Existing Proposals
16.4 Approaches Used in Designing the Scenario
16.5 Result Analysis
16.6 Conclusion
References
17 Inclusion of Security Features for Implications of Electronic Governance Activities
17.1 Introduction
17.2 Objective of E-Governance
17.3 Role of Identity in E-Governance
17.4 Status of E-Governance in Other Countries
17.5 Pros and Cons of E-Governance
17.6 Challenges of E-Governance in Machine Learning
17.7 Conclusion
References
Index
End User License Agreement
Chapter 4
Table 4.1 Dataset for house values in district.
Chapter 5
Table 5.1 Memory symbols and sizes handled by big data.
Table 5.2 Big data analytics evolution.
Table 5.3 Big data analytics tools evolution.
Table 5.4 10 V’s of big data.
Table 5.5 Big data tools.
Chapter 7
Table 7.1 Positive and negative training examples for the target concept Enjo...
Table 7.2 The EnjoySport concept learning task.
Table 7.3 The EnjoySport concept learning as a search.
Table 7.4 Find-S algorithm.
Table 7.5 The List-Eliminate method.
Table 7.6 The Candidate-Elimination method.
Table 7.7 Confusion matrix for multiple classes.
Table 7.8 Confusion matrix for binary class.
Chapter 8
Table 8.1 Details of the used datasets.
Table 8.2 Training time for the datasets for different algorithms.
Chapter 9
Table 9.1 Sample dataset.
Table 9.2 Dissimilarity computation for sample data.
Table 9.3 Dissimilarity computation for sample data second time.
Chapter 11
Table 11.1 Steps of R-learning algorithm.
Table 11.2 The pseudocode of SARSA-learning algorithm.
Table 11.3 Steps of Dyna-Q learning algorithm.
Table 11.4 Steps of first visit Monte Carlo algorithm.
Table 11.5 Computation for two samples of episodes.
Chapter 13
Table 13.1 Approaches to transfer learning.
Table 13.2 Transfer learning strategies and types of transferable components.
Chapter 16
Table 16.1 Comparisons of classification and clustering approaches.
Table 16.2 Description of the classification techniques.
Table 16.4 Commands to implement HITL.
Chapter 17
Table 17.1 Advance methods for establishment of IDENTITY in e-governance.
Chapter 1
Figure 1.1 Point of intersection.
Figure 1.2 Linearly dependent.
Figure 1.3 Linearly Independent.
Chapter 4
Figure 4.1 Steps of knowledge discovery process.
Figure 4.2 Data preprocessing tasks.
Figure 4.3 Linear regression.
Figure 4.4 Outlier analysis using clustering.
Figure 4.5 Unified view of data.
Figure 4.6 Models of data integration: (a) data warehousing, (b) federated d...
Figure 4.7 Example of concept hierarchy.
Figure 4.8 Example of automatic concept hierarchy.
Figure 4.9 Sales data quarterly for years 2008 to 2010 are aggregated.
Figure 4.10 Data cube for sales.
Figure 4.11 Example of attribute subset selection.
Figure 4.12 Example of histogram using singleton with equal frequency.
Figure 4.13 Example of histogram using multiton with equal width.
Figure 4.14 Two principal components with sample values.
Figure 4.15 Example of factor analysis.
Figure 4.16 Class separation by linear discriminant analysis.
Chapter 5
Figure 5.1 Big data architectural framework.
Figure 5.2 10 V’s of big data.
Figure 5.3 Fraud detection using big data.
Figure 5.4 Customer division using big data.
Figure 5.5 Risk analytics and management using big data.
Figure 5.6 Insurance industry handling using big data.
Figure 5.7 Health care handling using big data.
Figure 5.8 Internet of Things applications using big data.
Figure 5.9 Weather forecasting applications using big data.
Figure 5.10 IoT components and topology.
Chapter 6
Figure 6.1 Explanation of EBL. (a) Standard approach to explanation-based le...
Figure 6.2 EBL architecture.
Figure 6.3 Node u that belongs to G is locally compatible with node v that b...
Figure 6.4 Three phases of isomorphism algorithm.
Figure 6.5 (a) Showing single coin and (b) showing two coins.
Figure 6.6 Ball moving in five consecutive frames.
Figure 6.7 Hierarchical clustering methods.
Figure 6.8 Dynamic-based clustering.
Chapter 7
Figure 7.1 Concept generality example.
Figure 7.2 Instances space, hypotheses space, and the more general relation ...
Figure 7.3 Most specific generalized and most general specialized relation....
Figure 7.4 The hypothesis space search performed by Find-S algorithm.
Figure 7.5 Consistent hypothesis in a set of training examples.
Figure 7.6 Version space based on general boundary and specific boundary.
Figure 7.7 An example for Candidate-Elimination method.
Figure 7.9 Categorization of machine learning algorithm.
Figure 7.10 The supervised algorithms.
Figure 7.11 The unsupervised algorithms.
Figure 7.12 Deep learning.
Chapter 8
Figure 8.1 Classification accuracy - SVM.
Figure 8.2 Classification accuracy - NB.
Figure 8.3 Classification accuracy – BN.
Figure 8.4 Classification accuracy - HMM.
Figure 8.5 Classification accuracy – KNN.
Figure 8.6 Neural cell.
Figure 8.7 ANN architecture and data flow.
Figure 8.8 ANN structure [17].
Figure 8.9 ANN Application areas.
Figure 8.10 Classification accuracy - comparison.
Figure 8.11 RNN efficiency analysis.
Figure 8.12 BPNN efficiency analysis.
Figure 8.13 GRNN efficiency analysis.
Figure 8.14 Efficiency comparison of BPNN and GRNN.
Chapter 9
Figure 9.1 Clustering analysis.
Figure 9.2 Agglomerative and divisive hierarchical clustering.
Figure 9.3 Data points in the graph.
Figure 9.4 Final Clusters for the sample data.
Figure 9.5 Dense-based clustering.
Figure 9.6 Clustering process using DBSCAN algorithm.
Chapter 10
Figure 10.1 Self-training in progress. Circle representing the classified da...
Figure 10.2 Co-training in progress. Green and red are two classes and blue ...
Figure 10.3 Depicting generative models predicting a distribution based on t...
Figure 10.4 Discriminative (left) vs generative (right) approach.
Figure 10.5 Image classification using generative models.
Figure 10.6 Workflow of text categorization using naïve Bayes.
Figure 10.7 SVM.
Figure 10.8 SVM has hinge loss (left) and S3VM has hat loss (right).
Chapter 11
Figure 11.1 Elements of reinforcement learning.
Figure 11.2 Model-based and model-free RL.
Figure 11.3 Steps of Q-learning algorithm.
Figure 11.4 The status of initial Q-table and the puzzle.
Figure 11.5 8*8 chess board.
Figure 11.6 Illustration of SARSA method.
Figure 11.7 Generate a policy using Dyna-Q model.
Figure 11.8 Illustration of Dyna-Q model.
Figure 11.9 A square of unit length consisting quarter circle of unit radius...
Chapter 12
Figure 12.1 Overview of big data and machine learning application in healthc...
Figure 12.2 Overview of the applications of big data and machine learning in...
Figure 12.3 Well-known brands using big data and machine learning.
Figure 12.4 Big data and machine learning in education sector.
Figure 12.5 Ecosystem monitoring with big data and machine learning.
Figure 12.6 Overview of the sectors benefited by big data and machine learni...
Figure 12.7 Big data and machine learning in agriculture.
Figure 12.8 Roadblocks for big data and machine learning.
Chapter 13
Figure 13.1 Traditional learning and transfer learning.
Figure 13.2 Traditional learning vs transfer learning.
Figure 13.3 Summarization to the transfer learning methodologies.
Figure 13.4 Source domain and target domain have a lot in common.
Figure 13.5 Parameter transfer in transfer learning.
Chapter 14
Figure 14.1 Architecture of Mahout.
Figure 14.2 Downloading VMware player.
Figure 14.3 Path setting to install VMware.
Figure 14.4 Opening the Cloudera using VMWare Player.
Figure 14.5 Select the user and password.
Figure 14.6 Updating the software.
Figure 14.7 Installing the default Java.
Figure 14.8 Checking the installed Java version.
Figure 14.9 Creating a Hadoop system user.
Figure 14.10 Adding a directory Hadoop user into Hadoop system.
Figure 14.11 Adding username, password, and other user details.
Figure 14.12 Adding user “hdgouse” as super user.
Figure 14.13 Logging to the new hadoop user.
Figure 14.14 Configuration of SSH switching the user.
Figure 14.15 Creating a new SSH key.
Figure 14.16 Enabling SSH with key access to authorized_keys.
Figure 14.17 Checking SSH to connect to Hadoop user as hdgouse.
Figure 14.18 If error to SSH Localhost, then purge SSH.
Figure 14.19 Updating the SSH.
Figure 14.20 Checking the files after downloading Hadoop, Mahout, and Maven.
Figure 14.21 Checking the file after extracting hadoop-2.7.3.
Figure 14.22 Moving the extracted hadoop-2.7.3 file to Hadoop.
Figure 14.23 Changing the owner permission of hadoop.
Figure 14.24 Modifying the source.bashrc file.
Figure 14.25 Editing the JAVA_HOME and HADOOP_HOME.
Figure 14.26 Listing of files to be configured.
Figure 14.27 Command for modifying the hadoop-env.sh.
Figure 14.28 Adding the JAVA_HOME path.
Figure 14.29 Command for modifying the core-site.xml.
Figure 14.30 Adding the configuration property of core-site.xml file.
Figure 14.31 Command for copying mapred site.
Figure 14.32 Command for modifying the mapred-site.xml.
Figure 14.33 Adding the configuration properties of mapred-site.xml.
Figure 14.34 Command for modifying the hdfs-site.xml.
Figure 14.35 Adding the properties to hdf-site.xml.
Figure 14.36 Command for modifying the hadoop.sh.
Figure 14.37 Adding the HADOOP_HOME path.
Figure 14.38 Adding the configuration properties of yarn-site.xml.
Figure 14.39 Adding the datanode and namenode in hdfs.
Figure 14.40 Changing the owner permission of hdfs.
Figure 14.41 Changing the modes of the hdfs file.
Figure 14.42 Formatting the namenode.
Figure 14.43 Downloading the Mahout.
Figure 14.44 Extracting the Mahout file.
Figure 14.45 Creating a Mahout Directory.
Figure 14.46 Moving the extracted file into Mahout directory.
Figure 14.47 Change the bin permission.
Figure 14.48 Extracting the maven tar file.
Figure 14.49 Creating a maven directory under usr/lib.
Figure 14.50 Setting the Maven path.
Figure 14.51 Adding environmental variables in bashrc.
Figure 14.52 Checking Mahout working or not.
Figure 14.53 (a) i. Copying the data command.
Figure 14.53 (b) ii. Performing k-mean analysis command.
Figure 14.54 Hadoop tar file.
Figure 14.55 Downloaded Mahout distribution tar file.
Figure 14.56 Downloaded Maven file.
Figure 14.57 Copying of Hadoop, Mahout, and Maven in C drive.
Figure 14.58 Creating New variable name and value for Hadoop home.
Figure 14.59 Creating New variable name and value for Mahout home.
Figure 14.60 Creating New variable name and value for Maven home.
Figure 14.61 Creating New variable name and value for M2 home.
Figure 14.62 Editing the Path of Java, Hadoop, Mahout and Maven.
Figure 14.63 Creating two new folders datanode and namenode under data of ha...
Figure 14.64 Listing of files to be edited in the hadoop folder.
Figure 14.65 Adding property fields.
Figure 14.66 Adding property fields.
Figure 14.67 Adding property fields.
Figure 14.68 Adding property fields.
Figure 14.69 Namenode formatting success.
Figure 14.70 Command for starting of namenode, datanode, resource manager an...
Figure 14.71 Starting of namenode, datanode, resource manager and node manag...
Figure 14.72 Name node information overview (a) and summary (b).
Figure 14.73 Name node status (a) and Datanode information (b).
Figure 14.74 Creating a directory Test1, Copying input file to cluster and d...
Figure 14.75 Checking Test1 directory from browser and Checking input file d...
Figure 14.76 Select the Install New Software from Help tab.
Figure 14.77 Work with url paste the maven link https://download.eclipse.org...
Figure 14.78 Select the Maven Integration for Eclipse.
Figure 14.79 Install the Remediation page.
Figure 14.80 Installing the Maven.
Figure 14.81 Maven installed.
Figure 14.82 Select File tab → New → Maven Project.
Figure 14.83 Creating New Maven Project. Select Next tab.
Figure 14.84 Select the show the last version of Archetype only.
Figure 14.85 Enter a group id for the artifact details.
Figure 14.86 Create GroupId, ArtifactId, Version, and Package.
Figure 14.87 Artifact created.
Figure 14.88 Select Properties for Recommender Application.
Figure 14.89 Select JavaBuildPath → Order and Exports.
Figure 14.90 Select the Build class path order. It creates main and test fol...
Figure 14.91 Select JavaBuildPath → Libraries → Add Library.
Figure 14.92 Select Workspace default JRE.
Figure 14.93 Remove the JRE System Library [J2SE-1.5].
Figure 14.94 Apply and Close.
Figure 14.95 pom.xml file.
Figure 14.96 pom.xml file create dependency.
Figure 14.97 pom.xml file change the version tag.
Figure 14.98 Creating the new data file:
Figure 14.99 Creating the new data file: Select the ProjectFile →Rightclick ...
Figure 14.100 Creating the new data file: Select the ProjectFile → Rightclic...
Figure 14.101 Creating the new data file: Select the Project File →Right cli...
Figure 14.102 Select→ Libraries →Add External JARs.
Figure 14.103 Select→ All jar files which downloaded → Open → Apply and Clos...
Figure 14.104 Create a new package. Right Click → src/main/java → New → Pack...
Figure 14.105 Create the new package name as Application.
Figure 14.106 Create new class Evaluation Recommender under src/main/java.
Figure 14.107 Run and the Result of the EvaluationRecommender class.
Figure 14.108 Copying the 20news data into MahoutTest.
Figure 14.109 Running the classify-20newsgroups.sh from bin folder of mahout...
Figure 14.110 Output of 20 dataset.
Figure 14.111 Clustering synthetic data copied into cluster.
Figure 14.112 (a) commands for creating directory for Synthetic data.
Figure 14.112 (b) Commands for running clustering algorithms.
Figure 14.113 (a) Eclipse path for Recommender.
Figure 14.113 (b) pom.xml.
Figure 14.113 (c) Recommender Dataset.
Figure 14.113 (d) App.java.
Figure 14.113 (e) EvaluatorRecommender.java file.
Figure 14.113 (f) Result of Recommender.
Chapter 15
Figure 15.1 Output screen for the command Pip install h2o.
Figure 15.2 Output screen of commands pip install requests and pip install t...
Figure 15.3 Output screen for the command pip install scikit-learn.
Figure 15.4 Output screen for the command pip install colorama.
Figure 15.5 Output screen of Pip install future.
Figure 15.6 Output screen of pip install -f http://h2o-release.s3.amazonaws....
Figure 15.7 Output screen of pip install -f http://h2o-release.s3.amazonaws....
Figure 15.8 Output screen of H2o.demo (“glm”) and h2o.init ().
Figure 15.9 Output screen of H2o.demo (“glm”) and h2o.init ().
Figure 15.10 Output screen of H2o.demo (“glm”) and h2o.init ().
Figure 15.11 Output screen of H2o.demo (“glm”) and h2o.init ().
Figure 15.12 Output screen of deep learning algorithm applied on diabetes da...
Figure 15.13 Output screen of deep learning algorithm applied on diabetes da...
Figure 15.14 Output Screen of parsing applied on diabetes dataset. In the Fi...
Figure 15.15 Output screen of classification applied on diabetes dataset.
Figure 15.16 Output screen of classification applied on diabetes dataset.
Figure 15.17 Output screen of classification applied on diabetes dataset.
Figure 15.18 Output Screen of five-fold cross-validation on diabetes dataset...
Figure 15.19 Output Screen of five-fold cross-validation on diabetes dataset...
Figure 15.20 Output screen of Stacked Ensemble and Random Forest Estimator i...
Figure 15.21 Output screen of Stacked Ensemble and Random Forest Estimator i...
Chapter 16
Figure 16.1 Architecture of a black hole node.
Figure 16.2 Types of data sets.
Figure 16.3 Types of classification techniques.
Figure 16.4 Categories of supervised learning.
Figure 16.5 Confusion matrix.
Figure 16.6 Scenario in QualNet.
Figure 16.7 Algorithm for detection and prevention of black hole node.
Figure 16.8 Network topology.
Figure 16.9 Packet drop versus speed for two black hole nodes.
Figure 16.10 Selection of deactivation time for avoidance.
Figure 16.11 Packet delivery ratio.
Figure 16.12 Dataset generated from QualNet.
Figure 16.13 Dataset imported from QualNet to MATLAB.
Figure 16.14 Dataset imported to MATLAB on classification.
Figure 16.15 Confusion matrix for KNN.
Figure 16.16 ROC for KNN.
Figure 16.17 Confusion matrix for SVM.
Figure 16.18 ROC for SVM.
Figure 16.19 Confusion matrix for decision tree.
Figure 16.20 ROC for decision tree.
Figure 16.21 Confusion matrix for naïve Bayes.
Figure 16.22 ROC for naïve Bayes.
Figure 16.23 Confusion matrix for neural network.
Figure 16.24 ROC for neural network.
Figure 16.25 Performance for KNN using TPR and FNR.
Figure 16.26 Positive predictive and false discovery rates for KNN.
Figure 16.27 Accuracy rates for different classifiers.
Chapter 17
Figure 17.1 Cycle of management of big data.
Figure 17.2 Analytics solution.
Figure 17.3 Digital certificate to prove identity for e-governance.
Figure 17.4 Basic model for storage of identity: fingerprint.
Figure 17.5 Aadhar as a digital identity.
Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106
Publishers at Scrivener
Martin Scrivener ([email protected])
Phillip Carmical ([email protected])
Edited by
Uma N. Dulhare, Khaleel Ahmad and Khairol Amali Bin Ahmad
This edition first published 2020 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA.
© 2020 Scrivener Publishing LLC
For more information about Scrivener publications please visit www.scrivenerpublishing.com.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
Wiley Global Headquarters
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
Library of Congress Cataloging-in-Publication Data
ISBN 9781119654742
Cover image: Pixabay.com
Cover design by Russell Richardson
Nowadays, the increasing use of social sites, search engines, multimedia sharing, stock exchange, online gaming, online survey and news sites, among others, has caused the amount and variety of data to grow very rapidly to terabytes or even zettabytes. As a consequence, extracting useful information from such big data has become a major challenge.
Machine Learning is a subset of Artificial Intelligence that provides systems with the ability to automatically learn and improve from experience without being explicitly programmed. By using machine learning, computers can be taught to perform complex tasks that humans cannot readily accomplish. In this latest approach to digital transformation, computing processes are used to make intelligent decisions that are more efficient, cost-effective and reliable. Machine learning algorithms therefore find wide application: management of species, crops, field conditions and livestock in the agriculture domain; medical imaging and diagnostics, drug discovery and development, and treatment and prediction of disease in the healthcare domain; social media monitoring, chatbots, sentiment analysis and image recognition in the social media domain; and fraud detection, customer data management, financial risk modeling, personalized marketing, lifetime value prediction, recommendation engines and customer segmentation in the banking and insurance services domain.
This field is so vast and popular these days that machine learning has become an integral part of our daily routines through applications like Siri, Cortana, Facebook, Twitter, Google Search, Gmail, Skype, LinkedIn, Viber, WhatsApp, Pinterest, PayPal, Netflix, Uber, Lyst, Spotify, Instagram and so forth.
The intent of this book is to provide awareness of algorithms used for machine learning and big data in the academic and professional community. While it dwells on the foundations of machine learning and big data as a part of analytics, it also focuses on contemporary topics for research and development. In this regard, the book covers machine learning algorithms and their modern applications in developing automated systems.
The topics in this book are categorized into five sections including a total of seventeen chapters. The first section provides an insight into mathematical foundation, probability theory, correlation and regression techniques. The second section covers data pre-processing and the concept of big data and pattern recognition. The third section discusses machine learning algorithms, including supervised learning algorithm (Naïve-Bayes, KNN, HMM, Bayesian), semi-supervised learning algorithms (S3VM, Graph-Based, Multiview), unsupervised learning algorithms (GMM, K-mean clustering, Dirichlet process mixture model, X-means), and reinforcement learning algorithm (Q-learning, R learning, TD learning, SARSA Learning). The section also dwells on applications of machine learning for video surveillance, social media services, email spam and malware filtering, online fraud detection, financial services, healthcare, industry, manufacturing, transportation, etc.
While section four presents the theoretical principles, functionalities, methodologies and applications of transfer learning, as well as its relationship with deep learning paradigms, the final section explores hands-on machine learning open source tools. A case study is also discussed in detail. At the end of this section, various open challenges are discussed, such as the implications of electronic governance activities, which can be addressed by machine learning techniques to help leadership make well-informed decisions and to support appropriate economic planning and policy formulation, thereby tackling the major issues of developing countries such as a weak economy, unemployment, corruption and many more.
It is a great pleasure for us to acknowledge the contributions and assistance of many individuals. We would like to thank all the authors who submitted chapters for their contributions and fruitful discussions that made this book a great success. We are also thankful to the team from Scrivener Publishing for providing the meticulous service for timely publication of this book. Also, we would like to express our gratitude for the encouragement offered by our college/university. Last but not least, we gratefully acknowledge the support, encouragement and patience of our families.
Uma N. Dulhare
Khaleel Ahmad
Khairol Amali Bin Ahmad
June 2020
