Uncover the power of MySQL 8 for Big Data
This book is intended for MySQL database administrators and Big Data professionals looking to integrate MySQL 8 and Hadoop to implement a high performance Big Data solution. Some previous experience with MySQL will be helpful, although the book will highlight the newer features introduced in MySQL 8.
With organizations handling large amounts of data on a regular basis, MySQL has become a popular solution to handle this structured Big Data. In this book, you will see how DBAs can use MySQL 8 to handle billions of records, and load and retrieve data with performance comparable or superior to commercial DB solutions with higher costs.
Many organizations today depend on MySQL for their websites and a Big Data solution for their data archiving, storage, and analysis needs. However, integrating them can be challenging. This book will show you how to implement a successful Big Data strategy with Apache Hadoop and MySQL 8. It covers real-time use case scenarios to explain integration and to build Big Data solutions using technologies such as Apache Hadoop, Apache Sqoop, and MySQL Applier. The book also includes case studies on Apache Sqoop and real-time event processing.
By the end of this book, you will know how to efficiently use MySQL 8 to manage data for your Big Data applications.
Step-by-step guide filled with real-world practical examples.
Page count: 326
Year of publication: 2017
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: October 2017
Production reference: 1161017
ISBN 978-1-78839-718-6
www.packtpub.com
Authors: Shabbir Challawala, Jaydip Lakhatariya, Chintan Mehta, Kandarp Patel
Reviewers: Ankit Bhavsar, Chintan Gajjar, Nikunj Ranpura, Subhash Shah
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Aman Singh
Content Development Editor: Snehal Kolte
Technical Editor: Sagar Sawant
Copy Editor: Tasneem Fatehi
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Rekha Nair
Graphics: Tania Dutta
Production Coordinator: Shantanu Zagade
Shabbir Challawala has over 8 years of rich experience in providing solutions based on MySQL and PHP technologies. He is currently working with KNOWARTH Technologies. He has worked in various PHP-based e-commerce solutions and learning portals for enterprises. He has worked on different PHP-based frameworks, such as Magento E-commerce, Drupal CMS, and Laravel.
Shabbir has been involved in various enterprise solutions at different phases, such as architecture design, database optimization, and performance tuning. He has thorough exposure to the Software Development Life Cycle process. He has worked on integrating Big Data technologies such as MongoDB and Elasticsearch with a PHP-based framework.
Jaydip Lakhatariya has rich experience in portal and J2EE frameworks. He adapts quickly to any new technology and has a keen desire for constant improvement. Currently, Jaydip is associated with a leading open source enterprise development company, KNOWARTH Technologies (www.knowarth.com), where he is engaged in various enterprise projects.
Jaydip, a full-stack developer, has proven his versatility by adopting technologies such as Liferay, Java, Spring, Struts, Hadoop, MySQL, Elasticsearch, Cassandra, MongoDB, Jenkins, SCM, PostgreSQL, and many more.
He has been recognized with awards such as Merit, Commitment to Service, and also as a Star Performer. He loves mentoring people and has been delivering training for Portals and J2EE frameworks.
Chintan Mehta is the co-founder at KNOWARTH Technologies (www.knowarth.com) and heads Cloud/RIMS/DevOps. He has rich progressive experience in Systems and Server Administration of Linux, AWS Cloud, DevOps, RIMS, and Server Administration on Open Source Technologies. He is also an AWS Certified Solutions Architect-Associate.
Chintan's vital role during his career in Infrastructure and Operations has also included Requirement Analysis, Architecture design, Security design, High-availability and Disaster recovery planning, Automated monitoring, Automated deployment, Build processes to help customers, performance tuning, infrastructure setup and deployment, and application setup and deployment. He has also been responsible for setting up various offices at different locations, with fantastic sole ownership to achieve Operation Readiness for the organizations he had been associated with.
He headed Managed Cloud Services practices with his previous employer and received multiple awards in recognition of very valuable contributions made to the business of the group. He also led the ISO 27001:2005 implementation team as a joint management representative. Chintan has authored Hadoop Backup and Recovery Solutions and reviewed Liferay Portal Performance Best Practices and Building Serverless Web Applications.
He has a Diploma in Computer Hardware and Network from a reputed institute in India.
Kandarp Patel leads PHP practices at KNOWARTH Technologies (www.knowarth.com). He has vast experience in providing end-to-end solutions in CMS, LMS, WCM, and e-commerce, along with various integrations for enterprise customers. He has over 9 years of rich experience in providing solutions in MySQL, MongoDB, and PHP-based frameworks. Kandarp is also a certified MongoDB and Magento developer.
Kandarp has experience in various Enterprise Application development phases of the Software Development Life Cycle and has played a prominent role in requirement gathering, architecture design, database design, application development, performance tuning, and CI/CD.
Kandarp has a Bachelor of Engineering in Information Technology from a reputed university in India.
Ankit Bhavsar is a senior consultant at KNOWARTH Technologies (www.knowarth.com) and leads the team working on Enterprise Resource Planning Solutions. He has rich knowledge in Java, JEE, MySQL, PostgreSQL, Apache Spark, and many more open source tools and technologies utilized in building enterprise-grade applications.
Ankit has played dynamic roles during his career in Development and Maintenance of Astrology Portal's Content Management and Enterprise Resource Planning Solutions, which includes Object-Oriented Programming, Technical Architecture analysis, design, and development, as well as Database design, development and enhancement, process, and data and object modeling in a variety of applications and environments to provide technical and business solutions to clients.
Ankit has a Masters of Computer Application from North Gujarat University.
Chintan Gajjar is a consultant at KNOWARTH Technologies (www.knowarth.com). He has rich progressive experience in advanced Javascript, NodeJS, BackboneJS, AngularJS, Java, and MongoDB, and also provides enterprise services such as Enterprise Portal Development, ERP Implementation, and Enterprise Integration services in Open Source Technologies.
Chintan's vital role during his career in enterprise services has also included requirement analysis, architecture design, UI implementation, build processes to help customers, following best practices in development and processes, and taking strong ownership of development-to-deployment processes to turn customers' ideas into reality for the organizations he has been associated with.
Chintan has played dynamic roles during his career in developing Enterprise Resource Planning solutions, and has worked on single-page applications (SPAs) and mobile applications using NodeJS, MongoDB, and AngularJS. He has received multiple awards in recognition of his valuable contributions to his team and to the business of the company. Chintan contributed to the book Hadoop Backup and Recovery Solutions. He holds a Master in Computer Application (MCA) degree from Ganpat University.
Nikunj Ranpura has rich progressive experience in Systems and Server Administration of Linux, AWS Cloud, Devops, RIMS, Networking, Storage, Backup, and Security and Server Administration on Open Source Technologies. He adapts quickly to any technology and has a keen desire for constant improvement. He is also an AWS Certified Solutions Architect-Associate.
Nikunj has played distinct roles in his career, such as Systems Analyst, IT Manager, Managed Cloud Services Practice Lead, Infrastructure Architect, Infrastructure Developer, DevOps Architect, AWS Architect, and support manager for various large implementations. He has been involved in creating solutions and consulting to build SAAS, IAAS, and PAAS services on cloud.
Currently, Nikunj is associated with a leading open source enterprise development company, KNOWARTH Technologies, as a lead consultant, where he takes care of enterprise projects with regards to requirement analysis, architecture design, security design, high availability, and disaster recovery planning to help customers, along with leading the team.
Nikunj graduated from Bhavnagar University and has done CISCO and UTM firewall certifications as well. He has been recognized with two awards for his valuable contribution to the company. He is also a contributor on Stack Overflow. He can be contacted at [email protected].
Subhash Shah is a software architect with over 11 years of experience in developing web-based software solutions based on varying platforms and programming languages. He is an object-oriented programming enthusiast and a strong advocate of free and open source software development, and its use by businesses to reduce risk, reduce costs, and be more flexible. His career interests include designing sustainable software solutions. The best of his technical skills include, but are not limited to, requirement analysis, architecture design, project delivery monitoring, application and infrastructure setup, and execution process setup. He is an admirer of writing quality code and test-driven development.
Subhash works as a principal consultant at KNOWARTH Technologies Pvt Ltd. and heads ERP practices. He holds a degree in Information Technology from Hemchandracharya North Gujarat University.
For support files and downloads related to your book, please visit www.PacktPub.com. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details. At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.in/dp/1788397185.
If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Introduction to Big Data and MySQL 8
The importance of Big Data
Social media
Politics
Science and research
Power and energy
Fraud detection
Healthcare
Business mapping
The life cycle of Big Data
Volume
Variety
Velocity
Veracity
Phases of the Big Data life cycle
Collect
Store
Analyze
Governance
Structured databases
Basics of MySQL
MySQL as a relational database management system
Licensing
Reliability and scalability
Platform compatibility
Releases
New features in MySQL 8
Transactional data dictionary
Roles
InnoDB auto increment
Supporting invisible indexes
Improving descending indexes
SET PERSIST
Expanded GIS support
The default character set
Extended bit-wise operations
InnoDB Memcached
NOWAIT and SKIP LOCKED
Benefits of using MySQL
Security
Scalability
An open source relational database management system
High performance
High availability
Cross-platform capabilities
Installing MySQL 8
Obtaining MySQL 8
MySQL 8 installation
MySQL service commands
Evolution of MySQL for Big Data
Acquiring data in MySQL
Organizing data in Hadoop
Analyzing data
Results of analysis
Summary
Data Query Techniques in MySQL 8
Overview of SQL
Database storage engines and types
InnoDB
Important notes about InnoDB
MyISAM
Important notes about MyISAM tables
Memory
Archive
Blackhole
CSV
Merge
Federated
NDB cluster
Select statement in MySQL 8
WHERE clause
Equal To and Not Equal To
Greater than and Less than
LIKE
IN/NOT IN
BETWEEN
ORDER BY clause
LIMIT clause
SQL JOINS
INNER JOIN
LEFT JOIN
RIGHT JOIN
CROSS JOIN
UNION
Subquery
Optimizing SELECT statements
Insert, replace, and update statements in MySQL 8
Insert
Update
Replace
Transactions in MySQL 8
Aggregating data in MySQL 8
The importance of aggregate functions
GROUP BY clause
HAVING clause
Minimum
Maximum
Average
Count
Sum
JSON
JSON_OBJECTAGG
JSON_ARRAYAGG
Summary
Indexing your data for High-Performing Queries
MySQL indexing
Index structures
Bitmap indexes
Sparse indexes
Dense indexes
B-Tree indexes
Hash indexes
Creating or dropping indexes
UNIQUE | FULLTEXT | SPATIAL
Index_col_name
Index_options
KEY_BLOCK_SIZE
With Parser
COMMENT
VISIBILITY
index_type
algorithm_option
lock_option
When to avoid indexing
MySQL 8 index types
Defining a primary index
Primary indexes
Natural keys versus surrogate keys
Unique keys
Defining a column index
Composite indexes in MySQL 8
Covering index
Invisible indexes
Descending indexes
Defining a foreign key in the MySQL table
RESTRICT
CASCADE
SET NULL
NO ACTION
SET DEFAULT
Dropping foreign keys
Full-text indexing
Natural language fulltext search on InnoDB and MyISAM
Fulltext indexing on InnoDB
Fulltext search in Boolean mode
Differentiating full-text indexing and like queries
Spatial indexes
Indexing JSON data
Generated columns
Virtual generated columns
Stored generated columns
Defining indexes on JSON
Summary
Using Memcached with MySQL 8
Overview of Memcached
Setting up Memcached
Installation
Verification
Using of Memcached
Performance tuner
Caching tool
Easy to use
Analyzing data stored in Memcached
Memcached replication configuration
Memcached APIs for different technologies
Memcached with Java
Memcached with PHP
Memcached with Ruby
Memcached with Python
Summary
Partitioning High Volume Data
Partitioning in MySQL 8
What is partitioning?
Partitioning types
Horizontal partitioning
Vertical partitioning
Horizontal partitioning in MySQL 8
Range partitioning
List partitioning
Hash partitioning
Column partitioning
Range column partitioning
List column partitioning
Key partitioning
Sub partitioning
Vertical partitioning
Splitting data into multiple tables
Data normalization
First normal form
Second normal form
Third normal form
Boyce-Codd normal form
Fourth normal form
Fifth normal form
Pruning partitions in MySQL
Pruning with list partitioning
Pruning with key partitioning
Querying on partitioned data
DELETE query with the partition option
UPDATE query with the partition option
INSERT query with the partition option
Summary
Replication for building highly available solutions
High availability
MySQL replication
MySQL cluster
Oracle MySQL cloud service
MySQL with the Solaris cluster
Replication with MySQL
Benefits of replication in MySQL 8
Scalable applications
Secure architecture
Large data analysis
Geographical data sharing
Methods of replication in MySQL 8
Replication using binary logs
Replication using global transaction identifiers
Replication configuration
Replication with binary log file
Replication master configuration
Replication slave configuration
Replication with GTIDs
Global transaction identifiers
The gtid_executed table
GTID master's side configurations
GTID slave's side configurations
MySQL multi-source replication
Multi-source replication configuration
Statement-based versus row-based replication
Group replication
Requirements for group replication
Group replication configuration
Group replication settings
Choosing a single master or multi-master
Host-specific configuration settings
Configuring a Replication User and enabling the Group Replication Plugin
Starting group replication
Bootstrap node
Summary
MySQL 8 Best Practices
MySQL benchmarks and configurations
Resource utilization
Stretch your timelines of benchmarks
Replicating production settings
Consistency of throughput and latency
Sysbench can do more
Virtualization world
Concurrency
Hidden workloads
Nerves of your query
Benchmarks
Best practices for MySQL queries
Data types
Not null
Indexing
Search fields index
Data types and joins
Compound index
Shorten up primary keys
Index everything
Fetch all data
Application does the job
Existence of data
Limit yourself
Analyze slow queries
Query cost
Best practices for the Memcached configuration
Resource allocation
Operating system architecture
Default configurations
Max object size
Backlog queue limit
Large pages support
Sensitive data
Restrict exposure
Failover
Namespaces
Caching mechanism
Memcached general statistics
Best practices for replication
Throughput in group replication
Infrastructure sizing
Constant throughput
Contradictory workloads
Write scalability
Summary
NoSQL API for Integrating with Big Data Solutions
NoSQL overview
Changing rapidly over time
Scaling
Less management
Best for big data
NoSQL versus SQL
Implementing NoSQL APIs
NoSQL with the Memcached API layer
Prerequisites
NoSQL API with Java
NoSQL API with PHP
NoSQL API with Python
NoSQL API with Perl
NDB Cluster API
NDB API for NodeJS
NDB API for Java
NDB API with C++
Summary
Case study: Part I - Apache Sqoop for exchanging data between MySQL and Hadoop
Case study for log analysis
Using MySQL 8 and Hadoop for analyzing log
Apache Sqoop overview
Integrating Apache Sqoop with MySQL and Hadoop
Hadoop
MapReduce
Hadoop distributed file system
YARN
Setting up Hadoop on Linux
Installing Apache Sqoop
Configuring MySQL connector
Importing unstructured data to Hadoop HDFS from MySQL
Sqoop import for fetching data from MySQL 8
Incremental imports using Sqoop
Loading structured data to MySQL using Apache Sqoop
Sqoop export for storing structured data from MySQL 8
Sqoop saved jobs
Summary
Case study: Part II - Real time event processing using MySQL applier
Case study overview
MySQL Applier
SQL Dump and Import
Sqoop
Tungsten replicator
Apache Kafka
Talend
Dell Shareplex
Comparison of Tools
MySQL Applier overview
MySQL Applier installation
libhdfs
cmake
gcc
FindHDFS.cmake
Hive
Real-time integration with MySQL Applier
Organizing and analyzing data in Hadoop
Summary
With organizations handling large amounts of data on a regular basis, MySQL has become a popular solution for handling this structured Big Data. In this book, you will see how database administrators (DBAs) can use MySQL to handle billions of records, and load and retrieve data with performance comparable or superior to commercial DB solutions with higher costs.
Many organizations today depend on MySQL for their websites, and on Big Data solutions for their data archiving, storage, and analysis needs. However, integrating them can be challenging. This book will show how to implement a successful Big Data strategy with Apache Hadoop and MySQL 8. It covers real-time use case scenarios to explain integration and to build Big Data solutions using technologies such as Apache Hadoop, Apache Sqoop, and MySQL Applier.
The book discusses topics such as the features of MySQL 8, best practices for using MySQL 8, and the NoSQL APIs provided by MySQL 8, and also includes a use case on using MySQL 8 for managing Big Data. By the end of this book, you will know how to efficiently use MySQL 8 to manage data for your Big Data applications.
Chapter 1, Introduction to Big Data and MySQL 8, provides an overview of Big Data and MySQL 8, their importance, and the life cycle of Big Data. It covers the basic idea of Big Data and its trends in the current market. Along with that, it also explains the benefits of using MySQL, takes us through the steps to install MySQL 8, and acquaints us with the newly introduced features in MySQL 8.
Chapter 2, Data Query Techniques in MySQL 8, covers the basics of querying data in MySQL 8 and how to join or aggregate data sets.
Chapter 3, Indexing your data for High-Performing Queries, explains indexing in MySQL 8, introduces the different types of indexes available in MySQL, and shows how to use indexing for faster performance on large quantities of data.
Chapter 4, Using Memcached with MySQL 8, provides an overview of Memcached with MySQL and informs us of the various advantages of using it. It covers the Memcached installation steps, replication configuration, and various Memcached APIs in different programming languages.
Chapter 5, Partitioning High Volume Data, explains how high-volume data can be partitioned in MySQL 8 using different partitioning methods. It covers the various types of partitioning that we can implement in MySQL 8 and their use with Big Data.
Chapter 6, Replication for building highly available solutions, explains how to implement group replication in MySQL 8. The chapter discusses how large data sets can be scaled and how data can be replicated faster using different replication techniques.
Chapter 7, MySQL 8 Best Practices, covers the best practices of using MySQL 8 for Big Data. It has all the different kinds of dos and don'ts for using MySQL 8.
Chapter 8, NoSQL API for Integrating with Big Data Solutions, explains the integration of NoSQL APIs for acquiring data. It also explains NoSQL and its various APIs in different programming languages for connecting NoSQL with MySQL.
Chapter 9, Case Study: Part I - Apache Sqoop for Exchanging Data between MySQL and Hadoop, explains how bulk data can be efficiently transferred between Hadoop and MySQL using Apache Sqoop.
Chapter 10, Case Study: Part II - Realtime event processing using MySQL applier, explains real-time integration of MySQL with Hadoop, and reading binary log events as soon as they are committed and writing them into a file in HDFS.
This book will guide you through the installation of all the tools that you need to follow the examples. You will need to install the following software to effectively run the code samples present in this book:
MySQL 8.0.3
Hadoop 2.8.1
Apache Sqoop 1.4.6
This book is intended for MySQL database administrators and Big Data professionals looking to integrate MySQL and Hadoop to implement a high performance Big Data solution. Some previous experience with MySQL will be helpful.
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and explanations of their meanings.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We can include other contexts through the use of the include directive."
A block of code is set as follows:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
Any command-line input or output is written as follows:
# cp /usr/src/asterisk-addons/configs/cdr_mysql.conf.sample /etc/asterisk/cdr_mysql.conf
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Clicking the Next button moves you to the next screen".
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome. Let us know what you think about this book--what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply email us at [email protected] and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register to our website using your email address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Note that you need to be logged in to your Packt account.
Once the file is downloaded, make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/MySQL-8-for-Big-Data. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/MySQL8forBigData_ColorImages.pdf.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, do provide us with the location address or the website name immediately, so that we can pursue a remedy.
Contact us at [email protected] with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
Today we are in the age of digitalization. We are producing enormous amounts of data in many ways--social networking, purchasing at grocery stores, bank/credit card transactions, emails, storing data on clouds, and so on. One of the first questions that comes to mind is: are you getting the utmost out of the collected data? For this data tsunami, we need to have appropriate tools to fetch data in an organized way that can be used in various fields such as scientific research, real-time traffic, fighting crime, fraud detection, digital personalization, and so on. All this data needs to be captured, stored, searched, shared, transferred, analyzed, and visualized.
Analysis of structured, unstructured, or semi-structured ubiquitous data helps us discover hidden patterns, market trends, correlations, personal preferences, and so on. With the help of the right tools to process and analyze data, an organization can achieve much better marketing plans, additional revenue opportunities, improved customer service, healthier operational efficiency, competitive benefits, and much more.
Every company collects data and uses it; however, to potentially flourish, a company needs to use data more effectively. Every company must carve out direct links to produced data, which can improve business either directly or indirectly.
Okay, now you have Big Data, which generally refers to a large quantity of data, and you are analyzing it. Is this all you need? Hold on! The other critical factor is to successfully monetize the data. So, fasten your seatbelts and get ready to understand the importance of Big Data!
In this chapter, we will cover the following points to understand the role of Big Data in today's life and the basic installation steps for MySQL 8:
Importance of Big Data
Life cycle of Big Data
What a structured database is
MySQL's basics
New features introduced in MySQL 8
Benefits of using MySQL 8
How to install MySQL 8
Evolution of MySQL for Big Data
The importance of Big Data doesn't depend only on how much data you have; it depends on what you are going to do with the data. Data can be sourced and analyzed from unpredictable sources and can be used to address many things. Let's look at some well-known, real-life use cases made possible with the help of Big Data.
The following image helps us understand a Big Data solution serving various industries. Though it's not an exhaustive list of industries where Big Data has been playing a prominent role in business decisions, let's discuss a few of them:
Social media content is information, and so are engagements such as views, likes, demographics, shares, follows, unique visitors, comments, and downloads. Social media and Big Data are therefore interrelated. At the end of the day, what matters is how your social media-related efforts contribute to business.
One notable example of Big Data possibilities on Facebook is providing insights about consumers' lifestyles, search patterns, likes, demographics, purchasing habits, and so on. Facebook stores around 100 PB of data and piles up 500 TB of data almost daily. Considering the number of subscribers and data collected, it is expected to be more than 60 zettabytes in the next three years. The more data you have, the more precisely you can analyze it for a better Return on Investment (ROI). Information fetched from social media is also leveraged when targeting audiences for attractive and profitable ads.
Facebook has a service called Graph Search, which can help you do advanced searches with multiple criteria. For instance, you can search for people of male gender living in Ahmedabad who work with KNOWARTH Technologies. Google also helps you refine the search. Such searches and filters are not limited to these criteria; they might also include school, political views, age, and name. In the same way, you can also search for hotels, photos, songs, and more. This is where Facebook's business ROI comes from: its ad services can target audiences based on specific criteria such as regions, interests, or other specific features of user data. Google also provides a similar platform called Google AdWords.
The era of Big Data has been playing a significant role in politics too; political parties have been using various sources of data to target voters and better their election campaigns. Big Data analytics also made a significant contribution to the 2012 re-election of Barack Obama by enhancing engagement and speaking about the precise things that were significant for voters.
Narendra Modi is considered one of the most technology and social media-savvy politicians in the world! He has almost 500 million views on Google+, 30 million followers on Twitter, and 35 million likes on Facebook! Narendra Modi belongs to the Bharatiya Janata Party (BJP); Big Data analysis played a major part in the successful 2014 Indian General Election campaign of the BJP and its associates, who used open source tools that helped them get in direct touch with their voters. The BJP reached fluctuating voters and negative voters too, as it kept monitoring social media conversations and accordingly sent messages and used tactics to sharpen its vision for the election campaign.
Narendra Modi had made a statement about prioritizing toilets before temples seven months earlier, after which the digital team closely monitored social media conversations around it. It was noticed that at least 50% of users were in line with the statement. This was when the opportunity to win the hearts of voters was converted into the mission of Swachh Bharat, which means hygienic India. The results were astonishing; support for the BJP rose to around 30% in merely 50 hours.
Did you know that, with the help of Big Data, decoding the human genome, which originally took 10 years, can now be done in hardly a day, with a cost reduction almost 100 times greater than Moore's Law predicted? Back in the year 2000, when the Sloan Digital Sky Survey (SDSS) started gathering astronomical data, it did so at a rate of around 200 GB per night, which, at that time, was far higher than anything previously collected in the history of astronomy.
National Aeronautics and Space Administration (NASA
