In the information era, enormous amounts of data have become available to decision makers. Big data refers to datasets that are not only large, but also high in variety and velocity, which makes them difficult to handle using traditional tools and techniques. This book covers, in a concise manner, the essentials needed to understand the concept of Big Data, its architecture, its challenges and its applications.
Year of publication: 2020
CHAPTER 1
INTRODUCTION
These days the internet is used far more widely than it was a few years ago; it has become a core part of our lives. Billions of people across the globe use social media and social networking every day, and together they generate a flood of data that has become quite complex to manage. The term Big Data was coined to refer to this enormous amount of data, and the concept is fast spreading its arms all over the world.
In this chapter, we’ll discuss the definition, categories and characteristics of big data.
INTRODUCTION TO BIG DATA
In simple terms, Big Data is data that is very large in size and still growing exponentially with time. More broadly, the term also covers the process of storing and analysing such data to extract meaning for an organization.
Fig 1.1: Big Data [1]
Big Data refers to large volumes of data, structured or unstructured, that require new technologies and techniques to handle. An organized form of data is known as structured data, while an unorganized form is known as unstructured data. Big data sets are so voluminous and complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy. Big data is commonly described along five dimensions: Volume, Variety, Velocity, and the more recently added Veracity and Value. A widely cited definition states that "Big Data represents the Information assets characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value". Because traditional application software cannot handle data sets of this size and complexity, frameworks such as Hadoop have been designed specifically for processing big data. These techniques are also used to extract useful insights from data through predictive analytics and the analysis of user behaviour.
For any application with a limited amount of data, we normally use a relational database such as SQL Server, PostgreSQL, Oracle or MySQL. But what about large applications like Facebook, Google or YouTube? Their data is so large and complex that none of the traditional data management systems can store and process it.
Facebook generates 500+ TB of data per day as people upload images, videos, posts and so on. Similarly, sending text/multimedia messages, updating Facebook/WhatsApp statuses, posting comments, etc. generates huge amounts of data. Handling it with traditional data processing applications (SQL/Oracle/MySQL) would lead to a loss of efficiency, so in order to cope with the exponential growth of data, analysing it at scale becomes a required task. To overcome this problem, we use big data techniques. Big data includes both structured and unstructured data.
Traditional data management systems and existing tools struggle to process such big data. R is one of the main computing tools used in statistical education and research, and it is also widely used for data analysis and numerical computing in scientific research.
WHERE DOES BIG DATA COME FROM?
Social data: Data coming from social media services, such as Facebook likes, photo and video uploads, comments, tweets and YouTube views.
Share market: Stock exchanges generate huge amounts of data through their daily transactions.
E-commerce sites: Sites like Flipkart, Amazon and Snapdeal generate huge amounts of data.
Airplanes: A single airplane can generate 10+ TB of data in 30 minutes of flight time.
WHAT IS THE NEED FOR STORING SUCH A HUGE AMOUNT OF DATA?
The main reason for storing data is analysis. Data analysis is a process used to clean, transform and remodel data with a view to reaching a conclusion for a given situation. More accurate analysis leads to better decision making, and better decision making leads to increased efficiency and reduced risk.
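The clean, transform and remodel steps of data analysis can be sketched in a few lines of Python. The records and field names below are invented purely for illustration; real pipelines run the same steps over far larger data with dedicated tools:

```python
# Minimal sketch of a clean -> transform -> remodel pipeline
# over made-up sample records (hypothetical data).
raw_records = [
    {"user": "a", "spend": "120.50"},
    {"user": "b", "spend": None},   # missing value, to be cleaned out
    {"user": "a", "spend": "79.50"},
]

# Clean: drop records with missing values.
clean = [r for r in raw_records if r["spend"] is not None]

# Transform: convert the spend field from text to a number.
for r in clean:
    r["spend"] = float(r["spend"])

# Remodel: aggregate total spend per user to support a decision.
totals = {}
for r in clean:
    totals[r["user"]] = totals.get(r["user"], 0.0) + r["spend"]

print(totals)  # {'a': 200.0}
```

The same three stages appear, at vastly larger scale, in big data frameworks; only the volume and the tooling change.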
Example - When we search for anything on e-commerce websites (Flipkart, Amazon), we get recommendations for products related to our search. These websites analyse the data we enter and then display the related products accordingly. For instance, when we search for a smartphone, we get recommendations to buy back covers, screen guards, etc.
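A toy version of this related-product behaviour can be sketched as a lookup from a searched category to accessories often bought with it. The table and product names here are assumptions for illustration only; real recommender systems are built on much richer behavioural signals:

```python
# Hypothetical mapping from a searched product category to
# accessories frequently bought alongside it (invented data).
RELATED = {
    "smartphone": ["back cover", "screen guard", "earphones"],
    "laptop": ["laptop bag", "mouse", "cooling pad"],
}

def recommend(search_term: str) -> list:
    """Return related products for a search term, or an empty list."""
    return RELATED.get(search_term.lower(), [])

print(recommend("Smartphone"))  # ['back cover', 'screen guard', 'earphones']
```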
Similarly, why does Facebook store our images and videos? The reason is advertisement. There are two types of marketing:
Global marketing - the advertisement is shown to all users.
Target marketing - the advertisement is shown only to particular groups of people. In target marketing, Facebook analyses its data and shows advertisements to selected people. Example - if an advertiser wants to advertise a cricket kit and show that advertisement only to an interested set of people, Facebook keeps a record of all users who are members of cricket groups or who post anything related to cricket, and displays the advertisement to them.
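At its core, the target-marketing idea above amounts to filtering the user base by an interest signal. A minimal sketch, with made-up users and interest tags standing in for the signals a real platform would derive from group memberships and posts:

```python
# Made-up user profiles with interest tags (hypothetical data).
users = [
    {"name": "Asha", "interests": {"cricket", "movies"}},
    {"name": "Ben",  "interests": {"cooking"}},
    {"name": "Chen", "interests": {"cricket"}},
]

def target_audience(users, interest):
    """Select users whose interests include the advertised topic."""
    return [u["name"] for u in users if interest in u["interests"]]

# The cricket-kit advertisement reaches only interested users.
print(target_audience(users, "cricket"))  # ['Asha', 'Chen']
```

Global marketing, by contrast, would simply address every user in the list, with no filter at all.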
CATEGORIES OF BIG DATA