MySQL 8 for Big Data - Shabbir Challawala - E-Book


Description

Uncover the power of MySQL 8 for Big Data

About This Book

  • Combine the powers of MySQL and Hadoop to build a solid Big Data solution for your organization
  • Integrate MySQL with different NoSQL APIs and Big Data tools such as Apache Sqoop
  • A comprehensive guide with practical examples on building a high performance Big Data pipeline with MySQL

Who This Book Is For

This book is intended for MySQL database administrators and Big Data professionals looking to integrate MySQL 8 and Hadoop to implement a high performance Big Data solution. Some previous experience with MySQL will be helpful, although the book will highlight the newer features introduced in MySQL 8.

What You Will Learn

  • Explore the features of MySQL 8 and how they can be leveraged to handle Big Data
  • Unlock the new features of MySQL 8 for managing structured and unstructured Big Data
  • Integrate MySQL 8 and Hadoop for efficient data processing
  • Perform aggregation using MySQL 8 for optimum data utilization
  • Explore different kinds of join and union in MySQL 8 to process Big Data efficiently
  • Accelerate Big Data processing with Memcached
  • Integrate MySQL with the NoSQL API
  • Implement replication to build highly available solutions for Big Data

In Detail

With organizations handling large amounts of data on a regular basis, MySQL has become a popular solution to handle this structured Big Data. In this book, you will see how DBAs can use MySQL 8 to handle billions of records, and load and retrieve data with performance comparable or superior to commercial DB solutions with higher costs.

Many organizations today depend on MySQL for their websites and a Big Data solution for their data archiving, storage, and analysis needs. However, integrating them can be challenging. This book will show you how to implement a successful Big Data strategy with Apache Hadoop and MySQL 8. It will cover real-time use case scenarios to explain integration and achieve Big Data solutions using technologies such as Apache Hadoop, Apache Sqoop, and MySQL Applier. The book also includes case studies on Apache Sqoop and real-time event processing.

By the end of this book, you will know how to efficiently use MySQL 8 to manage data for your Big Data applications.

Style and approach

A step-by-step guide filled with real-world practical examples.


Pages: 326

Year of publication: 2017




MySQL 8 for Big Data

Effective data processing with MySQL 8, Hadoop, NoSQL APIs, and other Big Data tools

Shabbir Challawala

Jaydip Lakhatariya

Chintan Mehta

Kandarp Patel

BIRMINGHAM - MUMBAI

MySQL 8 for Big Data

Copyright © 2017 Packt Publishing

 

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

 

First published: October 2017

Production reference: 1161017

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-78839-718-6

www.packtpub.com

Credits

Authors

Shabbir Challawala, Jaydip Lakhatariya, Chintan Mehta, Kandarp Patel

Copy Editor

Tasneem Fatehi

Reviewers

Ankit Bhavsar, Chintan Gajjar, Nikunj Ranpura, Subhash Shah

Project Coordinator

Manthan Patel

Commissioning Editor

Amey Varangaonkar

Proofreader

Safis Editing

Acquisition Editor

Aman Singh

Indexer

Rekha Nair

Content Development Editor

Snehal Kolte

Graphics

Tania Dutta

Technical Editor

Sagar Sawant

Production Coordinator

Shantanu Zagade

About the Authors

Shabbir Challawala has over 8 years of rich experience in providing solutions based on MySQL and PHP technologies. He is currently working with KNOWARTH Technologies. He has worked in various PHP-based e-commerce solutions and learning portals for enterprises. He has worked on different PHP-based frameworks, such as Magento E-commerce, Drupal CMS, and Laravel.

Shabbir has been involved in various enterprise solutions at different phases, such as architecture design, database optimization, and performance tuning. He has thorough exposure to the Software Development Life Cycle process. He has worked on integrating Big Data technologies such as MongoDB and Elasticsearch with a PHP-based framework.

I am sincerely thankful to Chintan Mehta for showing confidence in me writing this book. I would like to thank KNOWARTH Technologies for providing the opportunity and support to be part of this book. I also want to thank my co-authors and PacktPub team for providing wonderful support throughout. I would especially like to thank my mom, dad, wife Sakina, lovely son Mohammad, and family members for supporting me throughout the project.

Jaydip Lakhatariya has rich experience in portal and J2EE frameworks. He adapts quickly to any new technology and has a keen desire for constant improvement. Currently, Jaydip is associated with a leading open source enterprise development company, KNOWARTH Technologies (www.knowarth.com), where he is engaged in various enterprise projects.

Jaydip, a full-stack developer, has proven his versatility by adopting technologies such as Liferay, Java, Spring, Struts, Hadoop, MySQL, Elasticsearch, Cassandra, MongoDB, Jenkins, SCM, PostgreSQL, and many more.

He has been recognized with awards such as Merit, Commitment to Service, and also as a Star Performer. He loves mentoring people and has been delivering training for Portals and J2EE frameworks.

I am sincerely thankful to my splendid co-authors, and especially to Mr. Chintan Mehta, for providing such motivation and having faith in me. I would like to thank KNOWARTH for constantly providing new opportunities to help me enhance myself. I would also like to appreciate the entire team at Packt Publishing for providing wonderful support throughout the project. Finally, I am utterly grateful to my parents and my younger brother Keyur, for supporting me throughout the journey while authoring. Thank you my friends and colleagues for being around.

 

Chintan Mehta is the co-founder at KNOWARTH Technologies (www.knowarth.com) and heads Cloud/RIMS/DevOps. He has rich progressive experience in Systems and Server Administration of Linux, AWS Cloud, DevOps, RIMS, and Server Administration on Open Source Technologies. He is also an AWS Certified Solutions Architect-Associate.

Chintan's vital role during his career in Infrastructure and Operations has also included Requirement Analysis, Architecture design, Security design, High-availability and Disaster recovery planning, Automated monitoring, Automated deployment, Build processes to help customers, performance tuning, infrastructure setup and deployment, and application setup and deployment. He has also been responsible for setting up various offices at different locations, taking sole ownership of achieving operational readiness for the organizations he has been associated with.

He headed Managed Cloud Services practices with his previous employer and received multiple awards in recognition of very valuable contributions made to the business of the group. He also led the ISO 27001:2005 implementation team as a joint management representative. Chintan has authored Hadoop Backup and Recovery Solutions and reviewed Liferay Portal Performance Best Practices and Building Serverless Web Applications.

He has a Diploma in Computer Hardware and Network from a reputed institute in India.

I have relied on many people, both directly and indirectly, in writing this book. First, I would like to thank my co-authors and the wonderful team at PacktPub for this effort. I would like to especially thank my wonderful wife, Mittal, and my sweet son, Devam, for putting up with the long days, nights, and weekends when I was camped out in front of my laptop. Many people have inspired and made contributions to this book and provided comments, edits, insights, and ideas, especially Krupal Khatri and Chintan Gajjar. There were several things that could have interfered with my book. I also want to thank all the reviewers of this book. Last, but not least, I want to thank my mom and dad, friends, family, and colleagues for supporting me throughout the writing of this book.

Kandarp Patel leads PHP practices at KNOWARTH Technologies (www.knowarth.com). He has vast experience in providing end-to-end solutions in CMS, LMS, WCM, and e-commerce, along with various integrations for enterprise customers. He has over 9 years of rich experience in providing solutions in MySQL, MongoDB, and PHP-based frameworks. Kandarp is also a certified MongoDB and Magento developer.

Kandarp has experience in various Enterprise Application development phases of the Software Development Life Cycle and has played a prominent role in requirement gathering, architecture design, database design, application development, performance tuning, and CI/CD.

Kandarp has a Bachelor of Engineering in Information Technology from a reputed university in India.

I would like to acknowledge Chintan Mehta for guiding me through the various stages of the roller-coaster ride while writing the book. I would like to thank KNOWARTH Technologies for providing me the opportunity to be a part of this book. I would also like to thank my splendid co-authors and the Packt Publishing team for providing wonderful support throughout the journey. Last, but not least, I want to thank my mom and dad, and my wife Jalpa, for continuously supporting and encouraging me throughout the writing of the book. I dedicate my first book to my lovely princesses, Jayna and Jaisvi.

About the Reviewers

Ankit Bhavsar is a senior consultant at KNOWARTH Technologies (www.knowarth.com) and leads the team working on Enterprise Resource Planning Solutions. He has rich knowledge in Java, JEE, MySQL, PostgreSQL, Apache Spark, and many more open source tools and technologies utilized in building enterprise-grade applications.

Ankit has played dynamic roles during his career in Development and Maintenance of Astrology Portal's Content Management and Enterprise Resource Planning Solutions, which includes Object-Oriented Programming, Technical Architecture analysis, design, and development, as well as Database design, development and enhancement, process, and data and object modeling in a variety of applications and environments to provide technical and business solutions to clients.

Ankit has a Masters of Computer Application from North Gujarat University.

First, I would like to thank my co-reviewers and the wonderful team at Packt Publishing for this effort. I would also like to thank Subhash Shah and Chintan Mehta. I also want to thank all the authors of this book. Last, but not least, I want to thank my mom, friends, family, and colleagues for supporting me throughout the reviewing of this book.

Chintan Gajjar is a consultant at KNOWARTH Technologies (www.knowarth.com). He has rich progressive experience in advanced Javascript, NodeJS, BackboneJS, AngularJS, Java, and MongoDB, and also provides enterprise services such as Enterprise Portal Development, ERP Implementation, and Enterprise Integration services in Open Source Technologies.

Chintan's vital role during his career in enterprise services has also included Requirement Analysis, Architecture design, UI Implementation, Build processes to help customers, following best practices in development and processes, and development-to-deployment processes with great ownership, turning customers' ideas into reality for the organizations he has been associated with.

Chintan has played dynamic roles during his career in developing Enterprise Resource Planning solutions, and has worked on the development of single-page applications (SPAs) and mobile applications using NodeJS, MongoDB, and AngularJS. Chintan has received multiple awards in recognition of his valuable contributions to the team and the business of the company. He contributed to the book Hadoop Backup and Recovery Solutions. He completed a Master in Computer Application (MCA) degree at Ganpat University.

I would like to thank my co-reviewers and the wonderful team at Packt Publishing for this effort. I would also like to thank Subhash Shah, Chintan Mehta, Ankit Bhavsar, and my colleagues for supporting me throughout the reviewing of this book. I also want to thank all the authors of this book.

Nikunj Ranpura has rich progressive experience in Systems and Server Administration of Linux, AWS Cloud, DevOps, RIMS, Networking, Storage, Backup, and Security and Server Administration on Open Source Technologies. He adapts quickly to any technology and has a keen desire for constant improvement. He is also an AWS Certified Solutions Architect-Associate.

Nikunj has played distinct roles in his career, such as Systems Analyst, IT Manager, Managed Cloud Services Practice Lead, Infrastructure Architect, Infrastructure Developer, DevOps Architect, AWS Architect, and support manager for various large implementations. He has been involved in creating solutions and consulting to build SAAS, IAAS, and PAAS services on cloud.

Currently, Nikunj is associated with a leading open source enterprise development company, KNOWARTH Technologies, as a lead consultant, where he takes care of enterprise projects with regards to requirement analysis, architecture design, security design, high availability, and disaster recovery planning to help customers, along with leading the team.

Nikunj graduated from Bhavnagar University and has done CISCO and UTM firewall certifications as well. He has been recognized with two awards for his valuable contribution to the company. He is also a contributor on Stack Overflow. He can be contacted at [email protected].

I would like to thank my family for their immense support and faith in me throughout my learning stage. My friends have developed the confidence in me to a level that makes me bring the best out of myself. I am happy that God has blessed me with such wonderful people around me, without whom my success as it is today would not have been possible.

Subhash Shah is a software architect with over 11 years of experience in developing web-based software solutions based on varying platforms and programming languages. He is an object-oriented programming enthusiast and a strong advocate of free and open source software development, and its use by businesses to reduce risk, reduce costs, and be more flexible. His career interests include designing sustainable software solutions. The best of his technical skills include, but are not limited to, requirement analysis, architecture design, project delivery monitoring, application and infrastructure setup, and execution process setup. He is an admirer of writing quality code and test-driven development.

Subhash works as a principal consultant at KNOWARTH Technologies Pvt Ltd. and heads ERP practices. He holds a degree in Information Technology from Hemchandracharya North Gujarat University.

It is a pleasure to hold the reviewer badge. I would like to thank the Packt Publishing team for offering such an opportunity. I would like to thank my family for supporting me throughout the course of reviewing this book. It would have been difficult without them understanding my priorities and being a source of inspiration. I want to thank my colleagues for their constant support and help. Finally, I want to thank the authors for writing such useful and detailed content.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details. At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.in/dp/1788397185.

If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Table of Contents

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Introduction to Big Data and MySQL 8

The importance of Big Data

Social media

Politics

Science and research

Power and energy

Fraud detection

Healthcare

Business mapping

The life cycle of Big Data

Volume

Variety

Velocity

Veracity

Phases of the Big Data life cycle

Collect

Store

Analyze

Governance

Structured databases

Basics of MySQL

MySQL as a relational database management system

Licensing

Reliability and scalability

Platform compatibility

Releases

New features in MySQL 8

Transactional data dictionary

Roles

InnoDB auto increment

Supporting invisible indexes

Improving descending indexes

SET PERSIST

Expanded GIS support

The default character set

Extended bit-wise operations

InnoDB Memcached

NOWAIT and SKIP LOCKED

Benefits of using MySQL

Security

Scalability

An open source relational database management system

High performance

High availability

Cross-platform capabilities

Installing MySQL 8

Obtaining MySQL 8

MySQL 8 installation

MySQL service commands

Evolution of MySQL for Big Data

Acquiring data in MySQL

Organizing data in Hadoop

Analyzing data

Results of analysis

Summary

Data Query Techniques in MySQL 8

Overview of SQL

Database storage engines and types

InnoDB

Important notes about InnoDB

MyISAM

Important notes about MyISAM tables

Memory

Archive

Blackhole

CSV

Merge

Federated

NDB cluster

Select statement in MySQL 8

WHERE clause

Equal To and Not Equal To

Greater than and Less than

LIKE

IN/NOT IN

BETWEEN

ORDER BY clause

LIMIT clause

SQL JOINS

INNER JOIN

LEFT JOIN

RIGHT JOIN

CROSS JOIN

UNION

Subquery

Optimizing SELECT statements

Insert, replace, and update statements in MySQL 8

Insert

Update

Replace

Transactions in MySQL 8

Aggregating data in MySQL 8

The importance of aggregate functions

GROUP BY clause

HAVING clause

Minimum

Maximum

Average

Count

Sum

JSON

JSON_OBJECTAGG

JSON_ARRAYAGG

Summary

Indexing your data for High-Performing Queries

MySQL indexing

Index structures

Bitmap indexes

Sparse indexes

Dense indexes

B-Tree indexes

Hash indexes

Creating or dropping indexes

UNIQUE | FULLTEXT | SPATIAL

Index_col_name

Index_options

KEY_BLOCK_SIZE

With Parser

COMMENT

VISIBILITY

index_type

algorithm_option

lock_option

When to avoid indexing

MySQL 8 index types

Defining a primary index

Primary indexes

Natural keys versus surrogate keys

Unique keys

Defining a column index

Composite indexes in MySQL 8

Covering index

Invisible indexes

Descending indexes

Defining a foreign key in the MySQL table

RESTRICT

CASCADE

SET NULL

NO ACTION

SET DEFAULT

Dropping foreign keys

Full-text indexing

Natural language fulltext search on InnoDB and MyISAM

Fulltext indexing on InnoDB

Fulltext search in Boolean mode

Differentiating full-text indexing and like queries

Spatial indexes

Indexing JSON data

Generated columns

Virtual generated columns

Stored generated columns

Defining indexes on JSON

Summary

Using Memcached with MySQL 8

Overview of Memcached

Setting up Memcached

Installation

Verification

Using of Memcached

Performance tuner

Caching tool

Easy to use

Analyzing data stored in Memcached

Memcached replication configuration

Memcached APIs for different technologies

Memcached with Java

Memcached with PHP

Memcached with Ruby

Memcached with Python

Summary

Partitioning High Volume Data

Partitioning in MySQL 8

What is partitioning?

Partitioning types

Horizontal partitioning

Vertical partitioning

Horizontal partitioning in MySQL 8

Range partitioning

List partitioning

Hash partitioning

Column partitioning

Range column partitioning

List column partitioning

Key partitioning

Sub partitioning

Vertical partitioning

Splitting data into multiple tables

Data normalization

First normal form

Second normal form

Third normal form

Boyce-Codd normal form

Fourth normal form

Fifth normal form

Pruning partitions in MySQL

Pruning with list partitioning

Pruning with key partitioning

Querying on partitioned data

DELETE query with the partition option

UPDATE query with the partition option

INSERT query with the partition option

Summary

Replication for building highly available solutions

High availability

MySQL replication

MySQL cluster

Oracle MySQL cloud service

MySQL with the Solaris cluster

Replication with MySQL

Benefits of replication in MySQL 8

Scalable applications

Secure architecture

Large data analysis

Geographical data sharing

Methods of replication in MySQL 8

Replication using binary logs

Replication using global transaction identifiers

Replication configuration

Replication with binary log file

Replication master configuration

Replication slave configuration

Replication with GTIDs

Global transaction identifiers

The gtid_executed table

GTID master's side configurations

GTID slave's side configurations

MySQL multi-source replication

Multi-source replication configuration

Statement-based versus row-based replication

Group replication

Requirements for group replication

Group replication configuration

Group replication settings

Choosing a single master or multi-master

Host-specific configuration settings

Configuring a Replication User and enabling the Group Replication Plugin

Starting group replication

Bootstrap node

Summary

MySQL 8 Best Practices

MySQL benchmarks and configurations

Resource utilization

Stretch your timelines of benchmarks

Replicating production settings

Consistency of throughput and latency

Sysbench can do more

Virtualization world

Concurrency

Hidden workloads

Nerves of your query

Benchmarks

Best practices for MySQL queries

Data types

Not null

Indexing

Search fields index

Data types and joins

Compound index

Shorten up primary keys

Index everything

Fetch all data

Application does the job

Existence of data

Limit yourself

Analyze slow queries

Query cost

Best practices for the Memcached configuration

Resource allocation

Operating system architecture

Default configurations

Max object size

Backlog queue limit

Large pages support

Sensitive data

Restrict exposure

Failover

Namespaces

Caching mechanism

Memcached general statistics

Best practices for replication

Throughput in group replication

Infrastructure sizing

Constant throughput

Contradictory workloads

Write scalability

Summary

NoSQL API for Integrating with Big Data Solutions

NoSQL overview

Changing rapidly over time

Scaling

Less management

Best for big data

NoSQL versus SQL

Implementing NoSQL APIs

NoSQL with the Memcached API layer

Prerequisites

NoSQL API with Java

NoSQL API with PHP

NoSQL API with Python

NoSQL API with Perl

NDB Cluster API

NDB API for NodeJS

NDB API for Java

NDB API with C++

Summary

Case study: Part I - Apache Sqoop for exchanging data between MySQL and Hadoop

Case study for log analysis

Using MySQL 8 and Hadoop for analyzing log

Apache Sqoop overview

Integrating Apache Sqoop with MySQL and Hadoop

Hadoop

MapReduce

Hadoop distributed file system

YARN

Setting up Hadoop on Linux

Installing Apache Sqoop

Configuring MySQL connector

Importing unstructured data to Hadoop HDFS from MySQL

Sqoop import for fetching data from MySQL 8

Incremental imports using Sqoop

Loading structured data to MySQL using Apache Sqoop

Sqoop export for storing structured data from MySQL 8

Sqoop saved jobs

Summary

Case study: Part II - Real time event processing using MySQL applier

Case study overview

MySQL Applier

SQL Dump and Import

Sqoop

Tungsten replicator

Apache Kafka

Talend

Dell Shareplex

Comparison of Tools

MySQL Applier overview

MySQL Applier installation

libhdfs

cmake

gcc

FindHDFS.cmake

Hive

Real-time integration with MySQL Applier

Organizing and analyzing data in Hadoop

Summary

Preface

With organizations handling large amounts of data on a regular basis, MySQL has become a popular solution for handling this structured Big Data. In this book, you will see how database administrators (DBAs) can use MySQL to handle billions of records, and load and retrieve data with performance comparable or superior to commercial DB solutions with higher costs.

Many organizations today depend on MySQL for their websites, and Big Data solutions for their data archiving, storage, and analysis needs. However, integrating them can be challenging. This book will show how to implement a successful Big Data strategy with Apache Hadoop and MySQL 8. It will cover real-time use case scenarios to explain integration and achieve Big Data solutions using different technologies such as Apache Hadoop, Apache Sqoop, and MySQL Applier.

The book discusses topics such as the features of MySQL 8, best practices for using MySQL 8, and the NoSQL APIs provided by MySQL 8, and includes a use case on using MySQL 8 for managing Big Data. By the end of this book, you will know how to efficiently use MySQL 8 to manage data for your Big Data applications.

What this book covers

Chapter 1, Introduction to Big Data and MySQL 8, provides an overview of Big Data and MySQL 8, their importance, and the life cycle of Big Data. It covers the basic idea of Big Data and its trends in the current market. Along with that, it also explains the benefits of using MySQL, takes us through the steps to install MySQL 8, and acquaints us with newly introduced features in MySQL 8.

Chapter 2, Data Query Techniques in MySQL 8, covers the basics of querying data in MySQL 8 and how to join or aggregate data sets in it.

Chapter 3, Indexing your data for High-Performing Queries, explains indexing in MySQL 8, introduces the different types of indexing available in MySQL, and shows how to use indexing for faster performance on large quantities of data.

Chapter 4, Using Memcached with MySQL 8, provides an overview of Memcached with MySQL and informs us of the various advantages of using it. It covers the Memcached installation steps, replication configuration, and various Memcached APIs in different programming languages.

Chapter 5, Partitioning High Volume Data, explains how high-volume data can be partitioned in MySQL 8 using different partitioning methods. It covers the various types of partitioning that we can implement in MySQL 8 and their use with Big Data.

Chapter 6, Replication for building highly available solutions, explains implementing group replication in MySQL 8. The chapter discusses how large data sets can be scaled and how data can be replicated faster using different replication techniques.

Chapter 7, MySQL 8 Best Practices, covers the best practices of using MySQL 8 for Big Data. It has all the different kinds of dos and don'ts for using MySQL 8.

Chapter 8, NoSQL API for Integrating with Big Data Solutions, explains the integration of NoSQL APIs for acquiring data. It also explains NoSQL and its various APIs in different programming languages for connecting NoSQL with MySQL.

Chapter 9, Case Study: Part I - Apache Sqoop for Exchanging Data between MySQL and Hadoop, explains how bulk data can be efficiently transferred between Hadoop and MySQL using Apache Sqoop.

Chapter 10, Case Study: Part II - Realtime event processing using MySQL applier, explains real-time integration of MySQL with Hadoop, and reading binary log events as soon as they are committed and writing them into a file in HDFS.

What you need for this book

This book will guide you through the installation of all the tools that you need to follow the examples. You will need to install the following software to effectively run the code samples present in this book:

MySQL 8.0.3

Hadoop 2.8.1

Apache Sqoop 1.4.6

Who this book is for

This book is intended for MySQL database administrators and Big Data professionals looking to integrate MySQL and Hadoop to implement a high performance Big Data solution. Some previous experience with MySQL will be helpful.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and explanations of their meanings.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We can include other contexts through the use of the include directive."

A block of code is set as follows:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

# cp /usr/src/asterisk-addons/configs/cdr_mysql.conf.sample /etc/asterisk/cdr_mysql.conf

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Clicking the Next button moves you to the next screen".

Warnings or important notes appear in a box like this.

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book--what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply email us at [email protected] and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register to our website using your email address and password.

Hover the mouse pointer on the SUPPORT tab at the top.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box.

Select the book for which you're looking to download the code files.

Choose from the drop-down menu where you purchased this book.

Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Note that you need to be logged in to your Packt account.

Once the file is downloaded, make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/MySQL-8-for-Big-Data. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/MySQL8forBigData_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, do provide us with the location address or the website name immediately, so that we can pursue a remedy.

Contact us at [email protected] with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

Introduction to Big Data and MySQL 8

Today we are in the age of digitalization. We produce enormous amounts of data in many ways: social networking, purchases at grocery stores, bank and credit card transactions, emails, storing data in the cloud, and so on. One of the first questions that comes to mind is: are you getting the most out of the collected data? To handle this data tsunami, we need appropriate tools that fetch data in an organized way, so that it can be used in fields such as scientific research, real-time traffic, fighting crime, fraud detection, digital personalization, and so on. All this data needs to be captured, stored, searched, shared, transferred, analyzed, and visualized.

Analysis of structured, unstructured, or semi-structured ubiquitous data helps us discover hidden patterns, market trends, correlations, personal preferences, and so on. With the right tools to process and analyze data, an organization can achieve much better marketing plans, additional revenue opportunities, improved customer service, better operational efficiency, competitive advantages, and much more.

Every company collects data and uses it; however, to flourish, a company needs to use its data more effectively. Every company must carve out direct links to the data it produces, which can improve business either directly or indirectly.

Okay, now you have Big Data, which generally refers to a large quantity of data, and you are analyzing it. Is this all you need? Hold on! The other critical factor is to successfully monetize the data. So, fasten your seatbelts and get ready to understand the importance of Big Data!

In this chapter, we will cover the following topics to understand Big Data's role in today's world, along with the basic installation steps for MySQL 8:

Importance of Big Data

Life cycle of Big Data

What is a structured database

MySQL's basics

New features introduced in MySQL 8

Benefits of using MySQL 8

How to install MySQL 8

Evolution of MySQL for Big Data

The importance of Big Data

The importance of Big Data doesn't depend only on how much data you have; it depends on what you are going to do with that data. Data can be sourced from unpredictable sources, analyzed, and used to address many things. Let's look at some well-known, real-life use cases where Big Data has made a difference.

The following image helps us understand a Big Data solution serving various industries. Though it's not an exhaustive list of industries where Big Data has been playing a prominent role in business decisions, let's discuss a few of them:

Social media

Social media content is information, and so are engagements such as views, likes, demographics, shares, follows, unique visitors, comments, and downloads. Social media and Big Data are therefore interrelated. At the end of the day, what matters is how your social media-related efforts contribute to business.

I came across one wonderful title: There's No Such Thing as Social Media ROI - It's Called Business ROI.

One notable example of Big Data possibilities on Facebook is providing insights about consumers' lifestyles, search patterns, likes, demographics, purchasing habits, and so on. Facebook stores around 100 PB of data and adds about 500 TB of data almost daily. Considering the number of subscribers and data collected, it is expected to be more than 60 zettabytes in the next three years. The more data you have, the more precise and sophisticated your analysis can be, leading to a better Return on Investment (ROI). Information fetched from social media is also leveraged when targeting audiences for attractive and profitable ads.

Facebook has a service called Graph Search, which helps you run advanced searches with multiple criteria. For instance, you can search for men living in Ahmedabad who work at KNOWARTH Technologies. Google also helps you refine searches in a similar way. Such searches and filters are not limited to these criteria; they might also include school, political views, age, and name. In the same way, you can also search for hotels, photos, songs, and more. This is where the business ROI for Facebook comes in: it provides ad services that can be targeted on specific criteria such as regions, interests, or other features of user data. Google provides a similar platform called Google AdWords.

Politics

Big Data has been playing a significant role in politics too; political parties have been using various sources of data to target voters and improve their election campaigns. Big Data analytics also made a significant contribution to the 2012 re-election of Barack Obama by enhancing engagement and addressing the precise issues that mattered to voters.

Narendra Modi is considered one of the most technology and social media-savvy politicians in the world! He has almost 500 million views on Google+, 30 million followers on Twitter, and 35 million likes on Facebook! Narendra Modi belongs to the Bharatiya Janata Party (BJP); Big Data analysis played a major part in the success of the BJP and its associates in the Indian General Election in 2014, using open source tools that helped them get in direct touch with their voters. The BJP reached its wavering voters and negative voters too, as it kept monitoring social media conversations and accordingly sent messages and adjusted tactics to sharpen its election campaign.

Narendra Modi made a statement about prioritizing toilets before temples seven months earlier, after which the digital team closely monitored social media conversations around this. It was noticed that at least 50% of users agreed with the statement. This was when the opportunity to win the hearts of voters was converted into the mission of Swachh Bharat, which means clean (hygienic) India. The results were astonishing; support for the BJP rose to around 30% in merely 50 hours.

Science and research

Did you know that, with the help of Big Data, decoding the human genome, which originally took 10 years, can now be done in barely a day, with an almost 100-fold reduction in cost predicted by Moore's Law? Back in the year 2000, when the Sloan Digital Sky Survey (SDSS) started gathering astronomical data, it did so at a rate of around 200 GB per night, which, at that time, was far more than the data collected in the history of astronomy.

National Aeronautics and Space Administration (NASA