E-Book
45,59 €

Python Social Media Analytics E-Book

Siddhartha Chatterjee


Publisher: Packt Publishing
Category: Science and New Technologies
Language: English

Description

Leverage the power of Python to collect, process, and mine deep insights from social media data

About This Book

Acquire data from various social media platforms such as Facebook, Twitter, YouTube, GitHub, and more
Analyze and extract actionable insights from your social data using various Python tools
A highly practical guide to conducting efficient social media analytics at scale

Who This Book Is For

If you are a programmer or a data analyst familiar with the Python programming language and want to perform analyses of your social data to acquire valuable business insights, this book is for you. The book does not assume any prior knowledge of any data analysis tool or process.

What You Will Learn

Understand the basics of social media mining
Use PyMongo to clean, store, and access data in MongoDB
Understand user reactions and emotion detection on Facebook
Perform Twitter sentiment analysis and entity recognition using Python
Analyze video and campaign performance on YouTube
Mine popular trends on GitHub and predict the next big technology
Extract conversational topics on public internet forums
Analyze user interests on Pinterest
Perform large-scale social media analytics on the cloud

In Detail

Social Media platforms such as Facebook, Twitter, Forums, Pinterest, and YouTube have become part of everyday life in a big way. However, these complex and noisy data streams pose a potent challenge to everyone when it comes to harnessing them properly and benefiting from them. This book will introduce you to the concept of social media analytics, and how you can leverage its capabilities to empower your business.

Right from acquiring data from various social networking sources such as Twitter, Facebook, YouTube, Pinterest, and social forums, you will see how to clean data and make it ready for analytical operations using various Python APIs. This book explains how to structure the clean data obtained and store it in MongoDB using PyMongo. You will also perform web scraping and visualize data using Scrapy and BeautifulSoup.

Finally, you will be introduced to different techniques to perform analytics at scale for your social data on the cloud, using Python and Spark. By the end of this book, you will be able to utilize the power of Python to gain valuable insights from social media data and use them to enhance your business processes.

Style and approach

This book follows a step-by-step approach to teach readers the concepts of social media analytics using the Python programming language. To explain various data analysis processes, real-world datasets are used wherever required.

Details

Pages: 312

Publication year: 2017

Sample excerpt

Python Social Media Analytics

Analyze and visualize data from Twitter, YouTube, GitHub, and more

Siddhartha Chatterjee

Michal Krystyanczuk

BIRMINGHAM - MUMBAI

Python Social Media Analytics

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2017

Production reference: 1260717

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-78712-148-5

www.packtpub.com

Credits

Authors

Siddhartha Chatterjee

Michal Krystyanczuk

Copy Editor

Safis Editing

Reviewer

Ruben Oliva Ramos

Project Coordinator

Nidhi Joshi

Commissioning Editor

Amey Varangaonkar

Proofreader

Safis Editing

Acquisition Editor

Divya Poojari

Indexer

Tejal Daruwale Soni

Content Development Editor

Cheryl Dsa

Graphics

Tania Dutta

Technical Editor

Vivek Arora

Production Coordinator

Arvindkumar Gupta

About the Authors

Siddhartha Chatterjee is an experienced data scientist with a strong focus in the area of machine learning and big data applied to digital (e-commerce and CRM) and social media analytics.

From 2007 to 2012, he worked with companies such as IBM, Cognizant Technologies, and Technicolor Research and Innovation. He completed a Pan-European Masters in Data Mining and Knowledge Management at Ecole Polytechnique of the University of Nantes and University of Eastern Piedmont, Italy.

Since 2012, he has worked at OgilvyOne Worldwide, a leading global customer engagement agency in Paris, as a lead data scientist and set up the social media analytics and predictive analytics offering. From 2014 to 2016, he was a senior data scientist and head of semantic data of Publicis, France. During his time at Ogilvy and Publicis, he worked on international projects for brands such as Nestle, AXA, BNP Paribas, McDonald's, Orange, Netflix, and others. Currently, Siddhartha is serving as head of data and analytics of Groupe Aeroport des Paris.

Michal Krystyanczuk is the co-founder of The Data Strategy, a start-up company based in Paris that builds artificial intelligence technologies to provide consumer insights from unstructured data. Previously, he worked as a data scientist in the financial sector using machine learning and big data techniques for tasks such as pattern recognition on financial markets, credit scoring, and hedging strategies optimization.

He specializes in social media analysis for brands using advanced natural language processing and machine learning algorithms. He has managed semantic data projects for global brands, such as Mulberry, BNP Paribas, Groupe SEB, Publicis, Chipotle, and others.

He is an enthusiast of cognitive computing and information retrieval from different types of data, such as text, image, and video.

Acknowledgments

This book is a result of our experience with data science and working with huge amounts of unstructured data from the web. Our intention was to provide a practical book on social media analytics with strong storytelling. In the whole process of analytics, the scripting of a story around the results is as important as the technicalities involved. It's been a long journey, chapter to chapter, and it would not have been possible without our support team that has helped us all through. We would like to deeply thank our mentors, Air commodore TK Chatterjee (retired) and Mr. Wojciech Krystyanczuk, who have motivated and helped us with their feedback, edits, and reviews throughout the journey.

We would also like to thank our co-author, Mr. Arjun Chatterjee, for sharing his brilliant technical knowledge and writing the chapter on Social Media Analytics at Scale. Above all, we would also like to thank the Packt editorial team for their encouragement and patience with us. We sincerely hope that the readers will find this book useful in their efforts to explore social media for creative purposes.

About the Reviewer

Ruben Oliva Ramos is a computer systems engineer with a master's degree in computer and electronic systems engineering, teleinformatics, and networking specialization from University of Salle Bajio in Leon, Guanajuato, Mexico. He has more than five years of experience in developing web applications to control and monitor devices connected with Arduino and Raspberry Pi using web frameworks and cloud services to build Internet of Things applications.

He is a mechatronics teacher at University of Salle Bajio and teaches students studying the master's degree in Design and Engineering of Mechatronics Systems. He also works at Centro de Bachillerato Tecnologico Industrial 225 in Leon, Guanajuato, Mexico, teaching electronics, robotics and control, automation, and microcontrollers at Mechatronics Technician Career. He has worked on consultant and developer projects in areas such as monitoring systems and datalogger data using technologies such as Android, iOS, Windows Phone, Visual Studio .NET, HTML5, PHP, CSS, Ajax, JavaScript, Angular, ASP .NET databases (SQlite, MongoDB, and MySQL), and web servers (Node.js and IIS). Ruben has done hardware programming on Arduino, Raspberry Pi, Ethernet Shield, GPS, and GSM/GPRS, ESP8266, and control and monitor systems for data acquisition and programming.

I would like to thank my savior and lord, Jesus Christ, for giving me strength and courage to pursue this project, to my dearest wife, Mayte, our two lovely sons, Ruben and Dario. To my father, Ruben, my dearest mom, Rosalia, my brother, Juan Tomas, and my sister, Rosalia, whom I love, for all their support while reviewing this book, for allowing me to pursue my dream, and tolerating not being with them after my busy day job.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details. At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787121488.

If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Introduction to the Latest Social Media Landscape and Importance

Introducing social graph

Notion of influence

Social impacts

Platforms on platform

Delving into social data

Understanding semantics

Defining the semantic web

Exploring social data applications

Understanding the process

Working environment

Defining Python

Selecting an IDE

Illustrating Git

Getting the data

Defining API

Scraping and crawling

Analyzing the data

Brief introduction to machine learning

Techniques for social media analysis

Setting up data structure libraries

Visualizing the data

Getting started with the toolset

Summary

Harnessing Social Data - Connecting, Capturing, and Cleaning

APIs in a nutshell

Different types of API

RESTful API

Stream API

Advantages of social media APIs

Limitations of social media APIs

Connecting principles of APIs

Introduction to authentication techniques

What is OAuth?

User authentication

Application authentication

Why do we need to use OAuth?

Connecting to social network platforms without OAuth

OAuth1 and OAuth2

Practical usage of OAuth

Parsing API outputs

Twitter

Creating application

Selecting the endpoint

Using requests to connect

Facebook

Creating an app and getting an access token

Selecting the endpoint

Connect to the API

GitHub

Obtaining OAuth tokens programmatically

Selecting the endpoint

Connecting to the API

YouTube

Creating an application and obtaining an access token programmatically

Selecting the endpoint

Connecting to the API

Creating an application

Selecting the endpoint

Connecting to the API

Basic cleaning techniques

Data type and encoding

Structure of data

Pre-processing and text normalization

Duplicate removal

MongoDB to store and access social data

Installing MongoDB

Setting up the environment

Starting MongoDB

MongoDB using Python

Summary

Uncovering Brand Activity, Popularity, and Emotions on Facebook

Facebook brand page

The Facebook API

Project planning

Scope and process

Data type

Analysis

Step 1 – data extraction

Step 2 – data pull

Step 3 – feature extraction

Step 4 – content analysis

Keywords

Extracting verbatims for keywords

User keywords

Brand posts

User hashtags

Noun phrases

Brand posts

User comments

Detecting trends in time series

Maximum shares

Brand posts

User comments

Maximum likes

Brand posts

Comments

Uncovering emotions

How to extract emotions?

Introducing the Alchemy API

Connecting to the Alchemy API

Setting up an application

Applying Alchemy API

How can brands benefit from it?

Summary

Analyzing Twitter Using Sentiment Analysis and Entity Recognition

Scope and process

Getting the data

Getting Twitter API keys

Data extraction

REST API Search endpoint

Rate Limits

Streaming API

Data pull

Data cleaning

Sentiment analysis

Customized sentiment analysis

Labeling the data

Creating the model

Model performance evaluation and cross-validation

Confusion matrix

K-fold cross-validation

Named entity recognition

Installing NER

Combining NER and sentiment analysis

Summary

Campaigns and Consumer Reaction Analytics on YouTube – Structured and Unstructured

Scope and process

Getting the data

How to get a YouTube API key

Data pull

Data processing

Data analysis

Sentiment analysis in time

Sentiment by weekday

Comments in time

Number of comments by weekday

Summary

The Next Great Technology – Trends Mining on GitHub

Scope and process

Getting the data

Rate Limits

Connection to GitHub

Data pull

Data processing

Textual data

Numerical data

Data analysis

Top technologies

Programming languages

Programming languages used in top technologies

Top repositories by technology

Comparison of technologies in terms of forks, open issues, size, and watchers count

Forks versus open issues

Forks versus size

Forks versus watchers

Open issues versus Size

Open issues versus Watchers

Size versus watchers

Summary

Scraping and Extracting Conversational Topics on Internet Forums

Scope and process

Getting the data

Introduction to scraping

Scrapy framework

How it works

Related tools

Creating a project

Creating spiders

Teamspeed forum spider

Data pull and pre-processing

Data cleaning

Part-of-speech extraction

Data analysis

Introduction to topic models

Latent Dirichlet Allocation

Applying LDA to forum conversations

Topic interpretation

Summary

Demystifying Pinterest through Network Analysis of Users Interests

Scope and process

Getting the data

Pinterest API

Step 1 - creating an application and obtaining app ID and app secret

Step 2 - getting your authorization code (access code)

Step 3 - exchanging the access code for an access token

Step 4 - testing the connection

Getting Pinterest API data

Scraping Pinterest search results

Building a scraper with Selenium

Scraping time constraints

Data pull and pre-processing

Pinterest API data

Bigram extraction

Building a graph

Pinterest search results data

Bigram extraction

Building a graph

Data analysis

Understanding relationships between our own topics

Finding influencers

Conclusions

Community structure

Summary

Social Data Analytics at Scale – Spark and Amazon Web Services

Different scaling methods and platforms

Parallel computing

Distributed computing with Celery

Celery multiple node deployment

Distributed computing with Spark

Text mining With Spark

Topic models at scale

Spark on the Cloud – Amazon Elastic MapReduce

Summary

Preface

Social media in the last decade has taken the world by storm. Billions of interactions take place around the world among the different users of Facebook, Twitter, YouTube, online forums, Pinterest, GitHub, and others. All these interactions, either captured through the data provided by the APIs of these platforms or through custom crawlers, have become a hotbed of information and insights for organizations and scientists around the world. Python Social Media Analytics has been written to show the most practical means of capturing this data, cleaning it, and making it relevant for advanced analytics and insight hunting. The book will cover basic to advanced concepts for dealing with highly unstructured data, followed by extensive analysis and conclusions to give sense to all of the processing.

What this book covers

Chapter 1, Introduction to the Latest Social Media Landscape and Importance, covers the updated social media landscape and key figures. We also cover the technical environment around Python, algorithms, and social networks, which we later explain in detail.

Chapter 2, Harnessing Social Data - Connecting, Capturing, and Cleaning, introduces methods to connect to the most popular social networks. It involves the creation of developer applications on chosen social media and then using Python libraries to make connections to those applications and querying the data. We take you through the advantages and limitations of each social media platform, basic techniques to clean, structure, and normalize the data using text mining and data pre-processing. Finally, you are introduced to MongoDB and essential administration methods.

Chapter 3, Uncovering Brand Activity, Emotions, and Popularity on Facebook, introduces the role of Facebook for brand activity and reputation. We will also introduce you to the Facebook API ecosystem and the methodology to extract data. You will learn the concepts of feature extraction and content analysis using keywords, hashtags, noun phrases, and verbatim extraction to derive insights from a Facebook brand page. Trend analysis on time-series data, and emotion analysis via the AlchemyAPI from IBM, are also introduced.

Chapter 4, Analyzing Twitter Using Sentiment Analysis and Entity Recognition, introduces you to Twitter, its uses, and the methodology to extract data using its REST and Streaming APIs using Python. You will learn to perform text mining techniques, such as stopword removal, stemming using NLTK, and more customized cleaning such as device detection. We will also introduce the concept and application of sentiment analysis using a popular Python library, VADER. This chapter will demonstrate the classification technique of machine learning to build a custom sentiment analysis algorithm.

Chapter 5, Campaigns and Consumer Reaction Analytics on YouTube - Structured and Unstructured, demonstrates the analysis of both structured and unstructured data, combining the concepts we learned earlier with newer ones. We will explain the characteristics of YouTube and how campaigns and channel popularity are measured using a combination of traffic and sentiment data from user comments. This will also serve as an introduction to the Google developer platform needed to access and extract the data.

Chapter 6, The Next Great Technology - Trends Mining on GitHub, introduces you to GitHub, its API, and characteristics. This chapter will demonstrate how to analyze trends on GitHub to discover projects and technologies that gather the most interest from users. We use GitHub data around repositories, such as watchers, forks, and open issues, to infer the fastest-emerging projects and technologies.

Chapter 7, Scraping and Extracting Conversational Topics on Internet Forums, introduces public consumer forums with real-world examples and explains the importance of forum conversations for extracting insights about people and topics. You will learn the methodology to extract forum data using Scrapy and BeautifulSoup in Python. We'll apply the preceding techniques on a popular car forum and use Topic Models to analyze all the conversations around cars.

Chapter 8, Demystifying Pinterest through Network Analysis of Users Interests, introduces an emerging and important social network, Pinterest, along with the advanced social network analysis concept of Graph Mining. Along with the Pinterest API, we will introduce the technique of advanced scraping using Selenium. You will learn to extract data from Pinterest to build a graph of pins and boards. The concepts will help you analyze and visualize the data to find the most influential topics and users on Pinterest. You will also be introduced to the concept of community detection using Python modules.

Chapter 9, Social Data Analytics at Scale - Spark and Amazon Web Services, takes the reader on a tour of distributed and parallel computing. This chapter will be an introduction to implementing Spark, a popular open source cluster-computing framework. You will learn to get Python scripts ready to run at scale and execute Spark jobs on the Amazon Web Services cloud.

What you need for this book

The goal of the book is to explain the concept of social media analytics and demonstrate its applications using Python. We use Python 3 for the different concepts and projects in the book. You will need a Linux, macOS, or Windows machine with Python 3 and an IDE of your choice (Sublime Text 2, Atom, or gedit). All the libraries presented in the chapters can be easily installed with the pip package manager. It is advisable to use Jupyter to work in notebook mode.

The data will be stored in MongoDB, which is compatible with all operating systems. You can follow the installation instructions on the official website (https://www.mongodb.com).

Lastly, a good internet connection is a must to be able to process big volumes of data from social networks.
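As a quick way to verify the setup described above, the following minimal sketch checks the Python version and whether a few libraries can be imported. The module list is an illustrative assumption based on libraries named in this preface (PyMongo, pandas, BeautifulSoup, Scrapy); it is not the book's full list of requirements, and the function name check_environment is hypothetical.

```python
import sys
from importlib.util import find_spec

# Import names of a few libraries mentioned in this preface.
# Assumption: this list is illustrative, not the book's complete requirements.
REQUIRED_MODULES = ["pymongo", "pandas", "bs4", "scrapy"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each top-level module name to True if importable."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    if sys.version_info < (3,):
        print("Python 3 is required.")
    for name, available in check_environment().items():
        print(f"{name}: {'ok' if available else 'missing'}")
```

Any module reported as missing can then be installed with pip before working through the chapters.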

Who this book is for

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Let's first create a project called tutorial."

A block of code is set as follows:

#import packages into the project
from bs4 import BeautifulSoup
from urllib.request import urlopen
import pandas as pd

Any command-line input or output is written as follows:

mkdir tutorial

cd tutorial

scrapy startproject tutorial

New terms and important words are shown in bold.

Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "On a forum, usually the depth of pages is between three and five due to the standard structure such as Topics | Conversations | Threads, which means the spider usually has to travel three to five levels of depth to actually reach the conversational data."

Warnings or important notes appear like this.

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support, and register to have the files e-mailed directly to you. You can download the code files by following these steps:

Hover the mouse pointer on the SUPPORT tab at the top.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box.

Select the book for which you're looking to download the code files.

Choose from the drop-down menu where you purchased this book from.

Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Python-Social-Media-Analytics. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support, and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

Introduction to the Latest Social Media Landscape and Importance

Have you seen the movie The Social Network? If you have not, it could be a good idea to see it before you read this book. If you have, you may have seen the success story around Mark Zuckerberg and his company Facebook. This was possible due to the power of the platform in connecting, enabling, sharing, and impacting the lives of almost two billion people on this planet.

The earliest social networks, such as Yahoo (Geocities), theglobe.com, and tripod.com, existed as far back as 1995. These platforms mainly facilitated interaction among people through chat rooms. It was only at the end of the 90s that user profiles became the in thing on social networking platforms, making information about people discoverable and therefore providing a choice to make friends or not. Those embracing this new methodology included Makeoutclub, Friendster, SixDegrees.com, and so on.

MySpace, LinkedIn, and Orkut were thereafter created, and the social networks were on the verge of becoming mainstream. However, the biggest impact happened with the creation of Facebook in 2004; a total game changer for people's lives, business, and the world. The sophistication and the ease of using the platform made it into mainstream media for individuals and companies to advertise and sell their ideas and products. Hence, we are in the age of social media that has changed the way the world functions.

Over the last few years, new entrants have appeared in social media whose interaction models differ essentially from those of Facebook, LinkedIn, or Twitter. These include Pinterest, Instagram, Tinder, and others. An interesting example is Pinterest, which, unlike Facebook, is centered not around people but around interests and/or topics. It is essentially able to structure people based on their interest in these topics. The CEO of Pinterest describes it as a catalog of ideas.

Forums, which are not considered regular social networks in the way Facebook, Twitter, and others are, are also very important social platforms. Unlike on Twitter or Facebook, forum users are often anonymous, which enables them to have in-depth conversations with communities. Other non-typical social networks are video sharing platforms, such as YouTube and Dailymotion. They are non-typical because they are centered around user-generated content, and their social nature comes from the sharing of this content on various social networks and from the discussion it generates in user comments.

Social media is gradually changing from being platform centric to focusing more on experiences and features. In the future, we'll see more and more traditional content providers and services becoming social in nature through sharing and conversations. The term social media today includes not just social networks but every service that's social in nature with a wide audience.

To understand the importance of social media, it's interesting to look at the statistics of these platforms. It's estimated that of around 3.4 billion internet users, 2.3 billion are active social media users. This is a staggering number, reinforcing the enormous importance of social media. In terms of users of individual social media platforms, Facebook leads the way with almost 1.6 billion active users. You must have heard the adage that if Facebook were a country, it would be the second largest one after China and ahead of India. Other social platforms linked to Facebook are also benefiting from this user base, such as WhatsApp, hosting 1 billion users on its chat application, and Instagram, with 400 million on its image sharing social network.

Among other platforms, Tumblr and Twitter lead the way with 550 million and 320 million active users respectively. LinkedIn, the world's most popular professional social media platform, has 100 million active users. Pinterest, which is the subject of a later chapter, also has 100 million active users. Sina Weibo, the Chinese equivalent of Facebook and Twitter, alone hosts 222 million active users. In terms of growth and engagement, Facebook is still the fastest growing social media platform, way ahead of the rest. If we look at engagement, millennials (age group 18-34) spend close to 100 minutes on average per person per month on Facebook. The number is far lower for other platforms. Among user-generated content and sharing platforms, YouTube is a leader with 300 hours of video uploaded every minute and 3.25 billion hours of video watched every month.

In this chapter, we will cover the following topics:

Social graph

Introduction to the latest social media landscape and importance

What does social data mean in the modern world?

Tools and their specificities to mine the social web (Python, APIs, and machine learning)

Introducing social graph

A social graph is created through this widespread interaction and exchange of information on social media. It is a massive network that illustrates the relations between individuals and information on the internet. Facebook owns the largest social graph, covering the relations of 1.5 billion users. Every social media platform has its own social graph. Social graphs differ in nature and utility according to the types of relations they capture, as described in the following list. We will show a concrete example of a piece of the social graph and how to analyze it.

User graph: This is a network that shows the relationships between users or individuals connected to each other.

Content graph: With billions of pieces of content being uploaded to social media, there are relationships between different types of content (text, images, videos, or multimedia). These relations could be based on the semantic meaning of the content, or on inbound or outbound links between items, as in Google's PageRank.

Interest graph: The interest graph takes the original graphs a step further. Individuals on social media or the internet are related not merely by their links, such as being added as a friend or followed on Twitter, but by their mutual interests. This has a huge advantage over the standard social graph, in that it leads to finding communities of people with similar interests. Even if these people have never interacted or do not know each other personally, there is an inherent link based on their interests and passions.
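
To make the difference between a user graph and an interest graph concrete, here is a minimal sketch in plain Python. The user names, the `follows` and `interests` dictionaries, and the `interest_graph` helper are all hypothetical and invented for illustration; real platforms expose this kind of data through their own APIs and at a vastly larger scale.

```python
from itertools import combinations

# Hypothetical toy data. `follows` is a tiny user graph (explicit links),
# `interests` records what each user cares about.
follows = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": set(),
    "dave": set(),
}
interests = {
    "alice": {"cooking", "travel"},
    "bob": {"python"},
    "carol": {"travel", "photography"},
    "dave": {"cooking", "photography"},
}


def interest_graph(interests):
    """Link every pair of users who share at least one interest.

    Returns a dict mapping (user_a, user_b) to the set of shared interests.
    """
    edges = {}
    for user_a, user_b in combinations(sorted(interests), 2):
        shared = interests[user_a] & interests[user_b]
        if shared:
            edges[(user_a, user_b)] = shared
    return edges


graph = interest_graph(interests)
for pair, shared in graph.items():
    print(pair, sorted(shared))
```

In this toy data, `dave` has no links at all in the user graph, yet the interest graph connects him to `alice` and `carol` through shared interests, which is exactly the kind of inherent link described above.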

Notion of influence

This massive growth and interaction on the social web is opening the way to understanding these individuals. Just as there are influencers in society, the same phenomenon is being replicated on the social web: some people have more influence over other users. The process of finding influencers and calculating influence is becoming an important science. If you have used a service called Klout, you'll know what we are talking about. Klout gives a 'social influence score' based on your social media activities. There are questions about the relevance of such scores, but that's only because the influence of a person is a very relative topic. In fact, in our view, no one is an influencer while everyone's an influencer. This can sound very confusing, but what we are trying to say is that influence is relative. Someone who is an influencer to you may not be an influencer to another. If you need admission of your child to a school, the principal of the school is an influencer, but if you are seeking admission to a university, the same principal is not an influencer to you. This confusion makes the topic super exciting: trying to understand human dynamics and then figuring out who influences whom and how. Merely having thousands of followers on Twitter doesn't make one an influencer; the influence one has on those followers, and the way they are moved to take action, certainly does. Our book will not get into detailed aspects of influence, but it's important to keep this notion in mind while trying to understand social media analytics.
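
The idea that influence is relative can be illustrated with a deliberately simplified sketch. It assumes a hypothetical log of `(author, topic, user_who_acted)` records and counts, per topic, how many distinct users acted on each author's posts. This is not how Klout or any real service computes its score; the data and the `influence_by_topic` helper are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical interaction log: (author, topic, user who took action).
actions = [
    ("principal", "school", "parent_a"),
    ("principal", "school", "parent_b"),
    ("professor", "university", "student_a"),
    ("professor", "university", "parent_a"),
    ("celebrity", "school", "parent_a"),
]

# Hypothetical follower counts, shown only for contrast with the scores below.
followers = {"principal": 120, "professor": 300, "celebrity": 50000}


def influence_by_topic(actions):
    """Count distinct users who acted on each author's posts, per topic."""
    reached = defaultdict(set)
    for author, topic, user in actions:
        reached[(topic, author)].add(user)

    scores = defaultdict(dict)
    for (topic, author), users in reached.items():
        scores[topic][author] = len(users)
    return {topic: dict(by_author) for topic, by_author in scores.items()}


scores = influence_by_topic(actions)
for topic, by_author in scores.items():
    top = max(by_author, key=by_author.get)
    print(topic, "->", top, by_author)
```

In this toy log, the principal moves the most users on the school topic despite having far fewer followers than the celebrity, and does not appear at all on the university topic.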

Social impacts

Social media is already having a profound influence on both society and business. The societal impact has been both psychological and behavioral. Various events, crises, and issues in the world have received a boost because of the use of social media by millions of people. The Arab Spring and the refugee crisis are two examples. In environmental crises, such as earthquakes, social media like Twitter has proved effective at accelerating information and action because of the immediacy with which it disseminates and spreads news.

Platforms on platform

Social media companies like Facebook started presenting their technology as a platform on which programmers could build further tools to give rise to more social experiences, such as games, contests, and quizzes, which in turn gave rise to social interactions and experiences beyond mere conversational interaction. Today, there is a range of tools that allow one to build on top of these platforms. Another application is to gather intelligence from the data collected from these platforms. Twitter shares a lot of the data about the usage of its platform with programmers and companies. Similarly, most of the popular social networks have started sharing their data with developers and data warehousing companies. Sharing their data serves revenue growth, and the data is also a very interesting source for researchers and marketers to learn about people and the world.

Delving into social data

The data acquired from social media is called social data. Social data exists in many forms.

Social media data can include information about the users of social networks, such as name, city, interests, and so on. Data of this kind, which is numeric or quantifiable, is known as structured data.

However, since social media are platforms for expression, a lot of the data is in the form of text, images, videos, and the like. These sources are rich in information, but not as straightforward to analyze as the structured data described earlier. These types of data are known as unstructured data.
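
The contrast between the two kinds of data can be sketched with a few hypothetical posts. In the records below, `user`, `city`, and `likes` play the role of structured fields, while `text` is unstructured; the field names, the values, and the crude `tokenize` helper are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical posts: "user", "city", and "likes" are structured fields,
# "text" is unstructured free text.
posts = [
    {"user": "u1", "city": "Paris", "likes": 12,
     "text": "Loving the new cafe near the Louvre!"},
    {"user": "u2", "city": "Paris", "likes": 3,
     "text": "The cafe was crowded, but the coffee was great"},
    {"user": "u3", "city": "Lyon", "likes": 7,
     "text": "Great weather in Lyon today"},
]

# Structured data can be aggregated directly.
likes_per_city = Counter()
for post in posts:
    likes_per_city[post["city"]] += post["likes"]


def tokenize(text):
    """Very crude tokenizer: lowercase the text and keep runs of letters."""
    return re.findall(r"[a-z']+", text.lower())


# Unstructured text needs a processing step before anything can be counted.
word_counts = Counter(
    word for post in posts for word in tokenize(post["text"])
)

print(likes_per_city)
print(word_counts.most_common(3))
```

The structured fields answer a question such as "how many likes per city?" with a simple sum, whereas the text needs at least tokenization before even a basic word count is possible.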

The process of applying rigorous methods to make sense of social data is called social data analytics. In this book, we will go into great depth on social data analytics to demonstrate how we can extract valuable meaning and information from these really interesting sources of social data. Since there are almost no restrictions on social media, there are a lot of meaningless accounts, content, and interactions. So, the data coming out of these streams is quite noisy and polluted. Hence, a lot of effort is required to separate the information from the noise. Once the data is cleaned and we are focused on the most important and interesting aspects, we then require various statistical and algorithmic methods to make sense of the filtered data and draw meaningful conclusions.
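
As a rough illustration of separating information from noise, the following sketch applies a few simple heuristics to hypothetical raw posts: it drops posts with too many links, strips URLs, and removes empty, very short, or duplicate texts. The thresholds `min_words` and `max_urls`, the sample posts, and the `clean` helper are assumptions chosen for this example, not rules taken from any particular platform.

```python
import re

# Hypothetical raw posts, including typical kinds of noise.
raw_posts = [
    "Great talk on data mining at the meetup today!",
    "Great talk on data mining at the meetup today!",  # exact duplicate
    "BUY FOLLOWERS NOW http://spam.example http://spam.example/2",
    "   ",
    "ok",
    "Does anyone have tips for cleaning tweets? http://example.com/q",
]

URL = re.compile(r"https?://\S+")


def clean(posts, min_words=3, max_urls=1):
    """Keep posts that look informative; thresholds are illustrative only."""
    seen = set()
    kept = []
    for post in posts:
        # Posts stuffed with links are treated as likely spam.
        if len(URL.findall(post)) > max_urls:
            continue
        # Remove URLs and normalize whitespace.
        text = " ".join(URL.sub("", post).split())
        key = text.lower()
        # Drop empty or very short posts and case-insensitive duplicates.
        if len(text.split()) < min_words or key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


cleaned = clean(raw_posts)
for text in cleaned:
    print(text)
```

Only two of the six raw posts survive here. Real cleaning pipelines are considerably more involved, but the overall shape (filter, normalize, deduplicate) is similar.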

Understanding semantics

A concept important to understand when handling unstructured data is semantics. Dictionaries define the term as the branch of linguistics and logic concerned with meaning.

It is a concept that comes from linguistic science and philosophy, and it deals with the study and research of meaning. These meanings are uncovered by understanding the relationships between words, phrases, and all types of symbols. From a social media point of view, a symbol could be one of the popular emoticons, which are not exactly formal language but do signify emotions. These symbols can be extended to images and videos, where patterns in their content can be used to extract meaning. In later chapters, we will show a few techniques that can help you get meaning out of textual data. Extracting meaning from images and videos is out of scope for this book. Semantic technology is central to effectively analyzing unstructured social data.
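
To show how a symbol such as an emoticon can carry meaning, here is a minimal sketch that maps a handful of emoticons to emotion labels. The `EMOTICONS` lexicon and the `emotions_in` helper are hypothetical and far from complete; the sketch also assumes emoticons are separated from words by whitespace.

```python
# Hypothetical, deliberately tiny emoticon lexicon.
EMOTICONS = {
    ":)": "happy",
    ":-)": "happy",
    ":D": "happy",
    ":(": "sad",
    ":-(": "sad",
    ";)": "playful",
}


def emotions_in(text):
    """Return the emotion labels of known emoticons, in order of appearance."""
    return [EMOTICONS[token] for token in text.split() if token in EMOTICONS]


print(emotions_in("Missed my train :( but the coffee was good :)"))
```

Even this simple lookup recovers a signal that the words alone do not state explicitly, which is why such symbols matter when analyzing social text.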

To extract meaning from social data effectively, semantic technologies rely on underlying artificial intelligence or machine learning algorithms. These algorithms allow you to find patterns in the data, which are then interpreted by humans. That's why social data analytics is so exciting: it brings together knowledge from the fields of semantics and machine learning, and then binds it with sociology for business or other objectives.