Learning Social Media Analytics with R - Raghav Bali - E-Book

Description

The Internet has truly become humongous, especially with the rise of various forms of social media in the last decade, which give users a platform to express themselves and to communicate and collaborate with each other. This book will help the reader understand the current social media landscape and learn how analytics can be leveraged to derive insights from it. Social media data can be analyzed to gain valuable insights into the behavior and engagement of users, organizations, businesses, and brands. The book will help readers frame business problems and solve them using social data.
The book also covers several practical, real-world social media use cases using R and its advanced packages, applying data science methodologies such as sentiment analysis, topic modeling, text summarization, recommendation systems, social network analysis, classification, and clustering. Readers will learn different hands-on approaches to obtaining data from diverse social media sources such as Twitter and Facebook, and how to establish detailed workflows to process, visualize, and analyze that data, transforming social data into actionable insights.

Format: EPUB

Year of publication: 2017


Table of Contents

Learning Social Media Analytics with R
Credits
About the Author
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Getting Started with R and Social Media Analytics
Understanding social media
Advantages and significance
Disadvantages and pitfalls
Social media analytics
A typical social media analytics workflow
Data access
Data processing and normalization
Data analysis
Insights
Opportunities
Challenges
Getting started with R
Environment setup
Data types
Data structures
Vectors
Arrays
Matrices
Lists
DataFrames
Functions
Built-in functions
User-defined functions
Controlling code flow
Looping constructs
Conditional constructs
Advanced operations
apply
lapply
sapply
tapply
mapply
Visualizing data
Next steps
Getting help
Managing packages
Data analytics
Analytics workflow
Machine learning
Machine learning techniques
Supervised learning
Unsupervised learning
Text analytics
Summary
2. Twitter – What's Happening with 140 Characters
Understanding Twitter
APIs
Registering an application
Connecting to Twitter using R
Extracting sample Tweets
Revisiting analytics workflow
Trend analysis
Sentiment analysis
Key concepts of sentiment analysis
Subjectivity
Sentiment polarity
Opinion summarization
Features
Sentiment analysis in R
Follower graph analysis
Challenges
Summary
3. Analyzing Social Networks and Brand Engagements with Facebook
Accessing Facebook data
Understanding the Graph API
Understanding Rfacebook
Understanding Netvizz
Data access challenges
Analyzing your personal social network
Basic descriptive statistics
Analyzing mutual interests
Build your friend network graph
Visualizing your friend network graph
Analyzing node properties
Degree
Closeness
Betweenness
Analyzing network communities
Cliques
Communities
Analyzing an English football social network
Basic descriptive statistics
Visualizing the network
Analyzing network properties
Diameter
Page distances
Density
Transitivity
Coreness
Analyzing node properties
Degree
Closeness
Betweenness
Visualizing correlation among centrality measures
Eigenvector centrality
PageRank
HITS authority score
Page neighbours
Analyzing network communities
Cliques
Communities
Analyzing English Football Club's brand page engagements
Getting the data
Curating the data
Visualizing post counts per page
Visualizing post counts by post type per page
Visualizing average likes by post type per page
Visualizing average shares by post type per page
Visualizing page engagement over time
Visualizing user engagement with page over time
Trending posts by user likes per page
Trending posts by user shares per page
Top influential users on popular page posts
Summary
4. Foursquare – Are You Checked in Yet?
Foursquare – the app and data
Foursquare APIs – show me the data
Creating an application – let me in
Data access – the twist in the story
Handling JSON in R – the hidden art
Getting category data – introduction to JSON parsing and data extraction
Revisiting the analytics workflow
Category trend analysis
Getting the data – the usual hurdle
The required end point
Getting data for a city – geometry to the rescue
Analysis – the fun part
Basic descriptive statistics – the usual
Recommendation engine – let's open a restaurant
Recommendation engine – the clichés
Framing the recommendation problem
Building our restaurant recommender
The sentimental rankings
Extracting tips data – the go-to step
The actual data
Analysis of tips
Basic descriptive statistics
The final rankings
Venue graph – where do people go next?
Challenges for Foursquare data analysis
Summary
5. Analyzing Software Collaboration Trends I – Social Coding with GitHub
Environment setup
Understanding GitHub
Accessing GitHub data
Using the rgithub package for data access
Registering an application on GitHub
Accessing data using the GitHub API
Analyzing repository activity
Analyzing weekly commit frequency
Analyzing commit frequency distribution versus day of the week
Analyzing daily commit frequency
Analyzing weekly commit frequency comparison
Analyzing weekly code modification history
Retrieving trending repositories
Analyzing repository trends
Analyzing trending repositories created over time
Analyzing trending repositories updated over time
Analyzing repository metrics
Visualizing repository metric distributions
Analyzing repository metric correlations
Analyzing relationship between stargazer and repository counts
Analyzing relationship between stargazer and fork counts
Analyzing relationship between total forks, repository count, and health
Analyzing language trends
Visualizing top trending languages
Visualizing top trending languages over time
Analyzing languages with the most open issues
Analyzing languages with the most open issues over time
Analyzing languages with the most helpful repositories
Analyzing languages with the highest popularity score
Analyzing language correlations
Analyzing user trends
Visualizing top contributing users
Analyzing user activity metrics
Summary
6. Analyzing Software Collaboration Trends II – Answering Your Questions with StackExchange
Understanding StackExchange
Data access
The StackExchange data dump
Accessing data dumps
Contents of data dumps
Quick overview of the data in data dumps
Posts
Users
Getting started with data dumps
Data Science and StackExchange
Demographics and data science
Challenges
Summary
7. Believe What You See – Flickr Data Analysis
A Flickr-ing world
Accessing Flickr's data
Creating the Flickr app
Connecting to R
Getting started with Flickr data
Understanding Flickr data
Understanding more about EXIF
Understanding interestingness – similarities
Finding K
Elbow method
Silhouette method
Are your photos interesting?
Preparing the data
Building the classifier
Challenges
Summary
8. News – The Collective Social Media!
News data – news is everywhere
Accessing news data
Creating applications for data access
Data extraction – not just an API call
The API call and JSON monster
HTML scraping from the links – the bigger monster
Sentiment trend analysis
Getting the data – not again
Basic descriptive statistics – the usual
Numerical sentiment trends
Emotion-based sentiment trends
Topic modeling
Getting to the data
Basic descriptive analysis
Topic modeling for Mr. Trump's phases
Cleaning the data
Pre-processing the data
The modeling part
Analysis of topics
Summarizing news articles
Document summarization
Understanding LexRank
Summarizing articles with lexRankr
Challenges to news data analysis
Summary
Index

Learning Social Media Analytics with R

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: May 2017

Production reference: 1220517

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78712-752-4

www.packtpub.com

Credits

Authors

Raghav Bali

Dipanjan Sarkar

Tushar Sharma

Reviewer

Karthik Ganapathy

Commissioning Editor

Amey Varangaonkar

Acquisition Editor

Tushar Gupta

Content Development Editor

Amrita Noronha

Technical Editor

Akash Patel

Copy Editors

Vikrant Phadkay

Safis Editing

Project Coordinator

Shweta H Birwatkar

Proofreader

Safis Editing

Indexer

Pratik Shirodkar

Graphics

Tania Dutta

Production Coordinator

Shantanu Zagade

Cover Work

Shantanu Zagade

About the Author

Raghav Bali has a master's degree (gold medalist) in information technology from the International Institute of Information Technology, Bangalore. He is a data scientist at Intel, the world's largest silicon company, where he works on analytics, business intelligence, and application development to develop scalable machine learning-based solutions. He has worked as an analyst and developer in domains such as ERP, finance, and BI with some of the top companies in the world.

Raghav is a technology enthusiast who loves reading and playing around with new gadgets and technologies. He recently co-authored a book on machine learning titled R Machine Learning by Example, Packt Publishing. He is a shutterbug, capturing moments when he isn't busy solving problems.

I would like to express my gratitude to my family, teachers, friends, colleagues and mentors who have encouraged, supported and taught me over the years. I would also like to take this opportunity to thank my co-authors and good friends Dipanjan Sarkar and Tushar Sharma, who made this project a memorable and enjoyable one.

I would like to thank Tushar Gupta, Amrita Noronha, Akash Patel, and Packt for the opportunity and their support throughout this journey. Last but not least, thanks to the R community for the amazing stuff that they do!

Dipanjan Sarkar is a data scientist at Intel, the world's largest silicon company, on a mission to make the world more connected and productive. He primarily works on data science, analytics, business intelligence, application development, and building large-scale intelligent systems. He holds a master of technology degree in information technology with specializations in data science and software engineering from the International Institute of Information Technology, Bangalore.

Dipanjan has been an analytics practitioner for over 5 years now, specializing in statistical, predictive, and text analytics. He has also authored several books on machine learning and analytics including R Machine Learning by Example and What you need to know about R, Packt. Besides this, he occasionally spends time reviewing technical books and courses. Dipanjan's interests include learning about new technology, financial markets, disruptive start-ups and data science. In his spare time he loves reading, gaming, watching popular sitcoms and football.

I am indebted to my parents, partner, friends, and well-wishers for always standing by my side and supporting me in all my endeavors. Your support keeps me going day in and day out to take on new challenges! I would also like to thank my good friends and fellow colleagues, Raghav Bali and Tushar Sharma, for co-authoring and making the experience more enjoyable. Last but never the least, I would like to thank Tushar Gupta, Amrita Noronha, Akash Patel, and Packt for giving me this wonderful opportunity to share my knowledge and experiences with analytics and R enthusiasts out there who are doing truly amazing things every day. And a big thumbs up to the R community for building an excellent analytics ecosystem.

Tushar Sharma has a master's degree specializing in data science from the International Institute of Information Technology, Bangalore. He works as a data scientist with Intel. In his previous job, he worked as a research engineer for a financial consultancy firm. His work involves handling big data at scale generated by the massive infrastructure at Intel. He engineers and delivers end-to-end solutions on this data using the latest machine learning tools and frameworks. He is proficient in R, Python, Spark, and the mathematical aspects of machine learning, among other things.

Tushar has a keen interest in everything related to technology. He likes to read a wide array of books ranging from history to philosophy and beyond. He is a running enthusiast and likes to play badminton and tennis.

I would like to express my gratitude to my family, teachers and friends who have encouraged, supported and taught me over the years. Special thanks to my classmates, friends, and colleagues, Dipanjan Sarkar and Raghav Bali for co-authoring and making this journey wonderful through their input and eye for detail.

I would like to thank Tushar Gupta, Amrita Noronha, and Packt for the opportunity and their support throughout the journey.

About the Reviewer

Karthik Ganapathy is an analytics professional with over 12 years of professional experience in analytics, predictive modeling, and project management. He has worked with several Fortune 500 clients and helped them derive business value using data.

I would like to thank my wife Sudharsana and my daughter Amrita for being a great support during the period I was reviewing the content.

www.PacktPub.com

eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787127524. If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Preface

The Internet has truly grown to be humongous, especially in the last decade, with the rise of various forms of social media that give users a platform to express themselves and also communicate and collaborate with each other. The current social media landscape is a complex mesh of social network platforms and applications, catering to specific audiences with unique as well as overlapping features. Each of these social networks is a potential gold mine of data that is being (and can be) used to study, leverage, and improve our understanding of demographics, behaviors, collaboration, user engagement, branding, and so on across different domains and spheres of our lives.

This book will help the reader understand the current social media landscape and how analytics and machine learning can be leveraged to derive insights from social media data. It will enable readers to utilize R and its ecosystem to visualize and analyze data from different social networks. This book will also leverage machine learning, data science, and other advanced concepts and techniques to solve real-world use cases spread across diverse social network domains including Twitter, Facebook, GitHub, Foursquare, StackExchange, Flickr, and more.

What this book covers

Chapter 1, Getting Started with R and Social Media Analytics, lays the foundations for understanding social media platforms and analyzing data relevant to social media. A concise introduction to R is given, including coverage of R syntax, data constructs, and functions. Basic concepts from machine learning, data analytics, and text analytics are also covered, setting the tone for the content in subsequent chapters.

Chapter 2, Twitter – What's Happening with 140 Characters, sets the theme for social media analytics with a focus on Twitter. It leverages R packages to extract and analyze Twitter data to uncover interesting insights through multiple use cases, involving machine learning techniques such as trend analysis, sentiment analysis, clustering, and social graph analysis.

Chapter 3, Analyzing Social Networks and Brand Engagements with Facebook, focuses on analyzing data from perhaps the most popular social network in the world—Facebook! Readers will learn how to use the Graph API to retrieve data as well as use frameworks such as Netvizz to extract brand page data. Techniques to analyze personal social networks will be covered in detail. Besides this, readers will gain conceptual knowledge about social network analysis and graph theory. This knowledge will be used in action by analyzing a huge network of football brand pages to understand relationships, page engagement, and popularity.

Chapter 4, Foursquare – Are You Checked in Yet?, targets the popular social media channel Foursquare. Readers will learn how to collect its data using the Foursquare APIs. Steps for visualizing and analyzing this data will be shown to uncover insights into user behavior. This data will be used to define and solve some analytics use cases, which include sentiment analysis, graph analytics, and much more.

Chapter 5, Analyzing Software Collaboration Trends I – Social Coding with GitHub, introduces the popular social coding and collaboration platform GitHub for analyzing software collaboration trends. Readers will gain insights into using the GitHub API from R to extract useful data pertaining to users and repositories. Detailed analyses of repository activity, repository trends, language trends, and user trends will be presented with real-world examples.

Chapter 6, Analyzing Software Collaboration Trends II – Answering Your Questions with StackExchange, introduces the StackExchange platform through its data organization and access methods. Readers learn and uncover interesting collaboration, demographic, and other patterns through use cases which leverage visualizations and different analysis techniques learned in previous chapters.

Chapter 7, Believe What You See – Flickr Data Analysis, presents Flickr through its APIs and uses some amazing packages such as piper, dplyr, and so on to extract data and insights from some complex data formats. The chapter also leverages machine learning concepts like clustering and classification to better understand Flickr.

Chapter 8, News – The Collective Social Media!, deals with analysis of free and unstructured text. Readers will learn how to collect news data from web sources using methodologies like scraping. The basic analysis of the textual data will consist of various statistical measures. Readers will also gain hands-on knowledge of advanced analysis like sentiment analysis, topic modeling, and text summarization on news data, based on some interesting use cases.

What you need for this book

Chapter number: 1-8

Software required (with version): R 3.3.x (or higher); RStudio Desktop 1.0.x

Hardware specifications: At least 1 GB of RAM, a mouse, and enough disk space for recovered files, image files, and so on; a network connection for installing packages, connecting to social networks, and downloading datasets

OS required: An Intel/AMD-compatible platform running Windows 2000/XP/2003/Vista/7/8/2012 Server/8.1/10 or any Unix-based OS

Who this book is for

This book is for IT professionals, data scientists, analysts, developers, machine learning enthusiasts, social media marketers, and anyone with a keen interest in data, analytics, and generating insights from social data. Some background experience in R would be helpful but is not necessary. The book has been written keeping in mind the varying levels of expertise of its readers. It also includes links, pointers, and exercises for intermediate to advanced readers to explore further.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to [email protected], and mention the book title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-Social-Media-Analytics-with-R. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/LearningSocialMediaAnalyticswithR_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report it to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Getting Started with R and Social Media Analytics

The invention of computers, digital electronics, social media, and the Internet has truly ushered us from the industrial age into the information age. The Internet, and more specifically the invention of the World Wide Web in the early 1990s, helped people build an interconnected, universal platform where information can be stored, shared, and consumed by anyone with an electronic device capable of connecting to the Web. This has led to the creation of vast amounts of information, ideas, and opinions that people, brands, organizations, and businesses want to share with everyone around the world. Thus social media was born, providing interactive platforms to post content and share ideas, messages, and opinions about everything under the sun.

This book will take you on a journey to understand various popular social media, analyze the rich data generated by these media, and gain valuable insights. We will focus on social media that cater to audiences in different forms, like micro-blogging, social networking, software collaboration, news, and media sharing platforms. The main objective is to use standardized data access and retrieval techniques based on social media application programming interfaces (APIs) to gather data from these websites, and to apply different data mining, statistical, machine learning, and natural language processing techniques to the data by leveraging the R programming language. This book will provide you with the tools, techniques, and approaches to achieve this. This introductory chapter will cover several important concepts that will help you get a jumpstart on social media analytics. They are as follows:

Social media – significance and pitfalls
Social media analytics – opportunities and challenges
Getting started with R
Data analytics
Machine learning
Text analytics

We will look at social media, the various forms of social media that exist today, and how they have impacted our society. This will help us understand the entire scope of social media analytics and the opportunity it presents, which is valuable for consumers as well as businesses and brands. Concepts related to analytics, machine learning, and text analytics, coupled with hands-on examples depicting the various features of the R programming language, will help you get a grip on the essentials needed for the rest of this book. Without further delay, let's get started!

Understanding social media

The Internet and the information age have been responsible for revolutionizing the way we humans interact with each other in the 21st century. Almost everyone communicates electronically using some device, be it a laptop, tablet, smartphone, or personal computer. Social media is built upon the concept of platforms where people use computer-mediated communication (CMC) methods to communicate with others. This can range from instant messaging, emails, and chat rooms to social forums and social networking. To understand social media, you need to understand the origins of legacy or traditional media, which gradually evolved into social media. Media such as television, newspapers, radio, movies, books, and magazines are various ways of sharing and consuming information, ideas, and opinions. It's important to remember that social media has not replaced the older legacy media; the two co-exist peacefully as we use and consume both in our day-to-day lives.

Legacy media typically follow a one-way communication system. For instance, I can read a magazine, watch a show on television, or get updated about the news from newspapers, but I cannot instantly voice my opinions or share my ideas using the same media. The communication mechanism in the various forms of social media is a two-way street: audiences can share information and ideas, and others can consume them, voice their own ideas, opinions, and feedback on them, and even share their own content based on what they see. Legacy media, like radio or television, now use social media to provide a two-way communication mechanism to support their communications, but the exchange is much more seamless on social media, where anyone and everyone can share content, communicate with others, and freely voice their ideas and opinions on a huge scale.

We can now formally define social media as interactive applications or platforms based on the principles of Web 2.0 and computer-mediated communication, which enable users to be publishers as well as consumers, creating and sharing ideas, opinions, information, emotions, and expressions in various forms. While diverse forms of social media exist, they have several key features in common, which are briefly listed as follows:

Web 2.0 Internet-based applications or platforms
Content is created as well as consumed by users
Profiles give users their own distinct and unique identity
Social networks help connect different users, similar to communities

Indeed, social media give users their own unique identity and the freedom to express themselves in their own user profiles. These profiles are maintained as accounts by social media companies. Features like what you see is what you get (WYSIWYG) editors, emoticons, photos, and videos help users create and share rich content. Social networking capabilities enable users to add other users to their friend or contact lists and to create groups and forums where they can share and talk about common interests. The following figure shows us some of the popular social media used today across the globe:

I am sure you recognize several of these popular social media from their logos, which you must have seen on your own smartphone or on the web. Social media is used in various ways, and the platforms can be grouped into distinct buckets by the nature of their usage and features. We mention several popular social media in the following points, some of which we will be analyzing in later chapters:

Micro-blogging platforms, like Twitter and Tumblr
Blogging platforms, like WordPress, Blogger, and Medium
Instant messaging applications, like WhatsApp and Hangouts
Networking platforms, like Facebook and LinkedIn
Software collaboration platforms, like GitHub and StackOverflow
Audio collaboration platforms, like SoundCloud
Photo sharing platforms, like Instagram and Flickr
Video sharing platforms, like YouTube and Vimeo

This is not an exhaustive list of social media, because there are so many applications and platforms out there. We apologize in advance if we have missed your favorite! The list should clarify the different forms of communication and content sharing mechanisms available to users, any of which they can leverage to share content and connect with other users. We will now discuss some of the key advantages that social media has to offer and why it matters.

Advantages and significance

Social media has gained such immense popularity and importance that today hardly anyone can stay away from it. Social media is not only a medium for people to express their views, but also a very powerful tool that businesses can use to target new and existing customers and increase revenue. We will discuss some of the main advantages of social media as follows:

Cost savings: One of the main challenges for businesses is reaching their customers or clients through advertising on traditional, legacy media, which can be expensive. Social media, however, allows businesses to have branded pages and to post sponsored content and advertisements for a fraction of the cost, helping them save money while increasing their visibility.

Networking: With social media, you can build your social as well as professional network with people across the globe. This has opened up a myriad of possibilities: people from different continents and countries work together on cutting-edge innovations, share news, talk about their personal experiences, offer advice, and share interesting opportunities, all of which can help develop personalities, careers, and skills.

Ease of use: It is quite easy to get started with social media. All you need to do is create an account by registering your details in the application or website, and within minutes you are ready to go! Besides this, it is quite easy to navigate any social media website or application without sophisticated technical skills. You just need an Internet connection and an electronic device, like a smartphone or a computer. This may be why many parents and grandparents are now taking to social media to share their moments and reconnect with long-lost friends.

Global audience: With social media, your content can reach a global audience. The reason is quite simple: because social media applications are openly available on the web, users all across the world use them. Businesses that engage with customers in different parts of the world gain a key advantage in pushing their promotions and new products and services.

Prompt feedback: Businesses and organizations can get prompt feedback on their new product launches and services directly from the users. There is much less need to call people and ask them about their satisfaction levels. Tweets, posts, videos, comments, and many other features let users give organizations instant feedback, either by posting publicly or by conversing with them directly on their official social media channels.

Grievance redressal: One of the great advantages of social media is that users can now express any sort of grievance or inconvenience, such as electricity, water supply, or security issues. Most governments and organizations, including law enforcement, have public social media channels which can be used for instant notification of grievances.

Entertainment: This is perhaps the most popular advantage, used to the fullest by most users. Social media provides an unlimited source of entertainment where you can spend your time playing interactive games, watching videos, and participating in competitions with users across the world. Indeed, the entertainment possibilities of social media are endless.

Visibility: Anyone can leverage their social media profile to gain visibility in the world. Professional networking platforms like LinkedIn are an excellent way for people to get noticed by recruiters and for companies to recruit great talent. Even small startups or individuals can develop inventions, build products, or announce discoveries and leverage social media to go viral and gain the visibility that can propel them to the next level.

The significance and importance of social media are quite evident from the preceding points. In today's interconnected world, social media has become almost indispensable, and although it has its share of disadvantages, including distraction, if we use it for the right reasons it can be a very important tool to help us achieve great things.

Disadvantages and pitfalls

Even though we have been blowing the trumpet about social media and its significance, I'm sure you are already thinking about the pitfalls and disadvantages caused, directly or indirectly, by social media. We want to cover all aspects of social media, the good and the bad, so let's look at some negative aspects:

Privacy concerns: Perhaps one of the biggest concerns with using social media is the lack of privacy. All our personal data, even though its security is often guaranteed by the social media organizations that host it, is at risk of being illegally accessed. Further, many social media platforms have been accused, time and again, of selling or using users' personal data without their consent.

Security issues: Users often enter personal information in their social media profiles, which hackers and other malicious entities can use to gain insights into their personal lives and exploit for their own gain. You have probably heard, several times, of social media websites being hacked and personal information from user accounts being leaked. There are other issues too, such as users' bank accounts being compromised, and even theft and other harmful actions resulting from sensitive information obtained through social media.

Addiction: This is relevant to a large percentage of people using social media. Social media addiction is real and a serious concern, especially among millennials. There are so many forms of social media that you can easily get engrossed in playing games, trying to keep up with what everyone is doing, or sharing moments from your life every other minute. Many of us check social media websites every now and then, which can be a distraction, especially when trying to meet deadlines. There are even stories of people accessing social media while driving, with fatal results.

Negativity: Social media allows you to express yourself freely, and this freedom is often misused by individuals, terrorists, and other extremist groups to spread hateful propaganda and negativity. People often post sarcastic and negative reactions based on their opinions and feelings, which can lead to trolling and racism. Even though there are ways to report such behavior, they are often not enough, because it is impossible to monitor a vast social network all the time.

Risks: There are several potential risks in leveraging social media for personal use or for business promotions and campaigns. One wrong post can prove very costly. Besides this, there is the constant risk of hackers, fraud, security attacks, and unwanted spam. Continuous use of social media and addiction to it also pose potential health risks. Organizations should have proper social media use policies to ensure that their employees do not become unproductive by wasting too much time on social media, and do not leak trade secrets or confidential information there.

We have discussed several pitfalls attached to using social media and some of them are very serious concerns. Proper social media usage guidelines and policies should be borne in mind by everyone because social media is like a magnifying glass: anything you post can be used against you or can potentially prove harmful later. Be it extremely sensitive personal information, or confidential information, like design plans for your next product launch, always think carefully before sharing anything with the rest of the world.

However, if you know what you are doing, social media can definitely be used as a proper tool for your personal as well as professional gain.

Social media analytics

We now have a detailed overview of social media, its significance, pitfalls, and various facets. We will now discuss social media analytics and the benefits it offers for data analysts, scientists and businesses in general looking to gather useful insights from social media. Social media analytics, also known as social media mining or social media intelligence, can be defined as the process of gathering data (usually unstructured) from social media platforms and analyzing the data using diverse analytical techniques to extract vital insights, which can be used to make data-driven business decisions. There are lots of opportunities and challenges involved in social media analytics, which we will be discussing in further detail in later sections. An important thing to remember is that the processes involved in social media analytics are usually domain-agnostic and you can apply them on data belonging to any organization or business in any domain.

The most important step in any social media analytics workflow or process is to determine the business goals or objectives and the insights that we want to gather from our analyses. These goals are usually expressed as key performance indicators (KPIs). For instance, the total number of followers and the number of likes and shares can be KPIs for measuring brand engagement with customers on social media. Sometimes the data is not structured and the end objectives are not very concrete. In such cases, techniques like natural language processing and text analytics can be leveraged to extract insights from noisy, unstructured text data, such as understanding the sentiment or mood of customers towards a particular product or service, or identifying the key trends and themes in customer tweets or posts at any point in time.
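KPIs such as follower counts, likes, and shares are simple aggregates, so they can be computed in a few lines of base R. The following is a minimal sketch on hypothetical post-level data; the column names, the follower count, and the engagement-rate formula ((likes + shares) / followers) are assumptions made for illustration, not a standard definition.

```r
# Hypothetical post-level data for a brand page (illustrative values only)
posts <- data.frame(
  post_id = c("p1", "p2", "p3"),
  likes   = c(120, 45, 300),
  shares  = c(30, 5, 60),
  stringsAsFactors = FALSE
)
followers <- 2452  # assumed follower count of the brand account

# Simple KPIs: totals across all posts
total_likes  <- sum(posts$likes)
total_shares <- sum(posts$shares)

# Engagement rate per post as a percentage of followers.
# This is one of several possible definitions and is an assumption here.
posts$engagement_rate <- (posts$likes + posts$shares) / followers * 100

# Average engagement rate across posts
mean_engagement <- mean(posts$engagement_rate)
print(posts)
```

Tracking such numbers over time (per week or per campaign) is what turns them into KPIs rather than one-off statistics.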

A typical social media analytics workflow

We will be analyzing data from diverse social media applications and platforms throughout this book. However, it is essential to have a good grasp of the core concepts behind a typical analytics process or workflow. While we will expand on data analytics and mining processes later, let us look at a typical social media analytics workflow in the following figure:

From the preceding diagram, we can broadly classify the main steps involved in the analytics workflow as follows:

Data access
Data processing and normalization
Data analysis
Insights

We will now briefly expand upon each of these four processes since we will be using them extensively in future chapters.

Data access

Social media data can usually be accessed with standard data retrieval methods in one of two ways.

The first technique is to use official APIs provided by the social media platform or organization itself.

The second technique is to use unofficial mechanisms, such as web crawling and scraping. An important point to remember is that crawling and scraping social media websites, and using that data for commercial purposes such as selling it to other organizations, is usually against their terms of service. We will therefore not be using such methods in this book. Besides this, we will follow the necessary politeness policies when accessing social media data through their APIs, so that we do not overload them with too many requests. The data obtained this way is raw data, which can then be processed and normalized as needed.
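A politeness policy can be as simple as enforcing a minimum interval between consecutive API calls. The sketch below is an assumption-laden illustration, not a client for any real platform: `fetch_fn` is a hypothetical placeholder for whatever function actually calls an API, and `min_interval` (in seconds) is an arbitrary value; real platforms publish their own rate limits, which should take precedence.

```r
# polite_fetch: call fetch_fn for each query, pausing between calls so that
# requests are never issued faster than one per min_interval seconds.
# fetch_fn is a hypothetical stand-in for a real API client call.
polite_fetch <- function(queries, fetch_fn, min_interval = 1) {
  results <- vector("list", length(queries))
  for (i in seq_along(queries)) {
    started <- Sys.time()
    results[[i]] <- fetch_fn(queries[[i]])
    # Sleep for whatever remains of the minimum interval (not after the last call)
    elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
    if (i < length(queries) && elapsed < min_interval) {
      Sys.sleep(min_interval - elapsed)
    }
  }
  names(results) <- queries
  results
}

# Usage with a mock fetcher, so no network access is needed
mock_fetch <- function(q) list(query = q, n_results = nchar(q))
res <- polite_fetch(c("rstats", "datascience"), mock_fetch, min_interval = 0.2)
str(res)
```

Keeping the throttling logic separate from the fetch function means the same wrapper can be reused with any API client.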

Data processing and normalization

The raw data obtained from social media APIs may not be structured and clean. In fact, most of the data obtained from social media is noisy and unstructured, and it often contains unnecessary tokens such as HyperText Markup Language (HTML) tags and other metadata. Data streams from social media APIs usually consist of JavaScript Object Notation (JSON) response objects, which are made up of key-value pairs, as in the following example:

{
  "user": {
    "profile_sidebar_fill_color": "DDFFCC",
    "profile_sidebar_border_color": "BDDCAD",
    "profile_background_tile": true,
    "name": "J'onn J'onzz",
    "profile_image_url": "http://test.com/img/prof.jpg",
    "created_at": "Tue Apr 07 19:05:07 +0000 2009",
    "location": "Ox City, UK",
    "follow_request_sent": null,
    "profile_link_color": "0084B4",
    "is_translator": false,
    "id_str": "2921138"
  },
  "followers_count": 2452,
  "statuses_count": 7311,
  "friends_count": 427
}

The preceding JSON object consists of a typical response from the Twitter API showing details of a user profile. Some APIs might return data in other formats, such as Extensible Markup Language (XML) or Comma Separated Values (CSV), and each format needs to be handled properly.
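In R, a JSON response like this can be parsed into nested lists and then flattened into a data frame. The following is a minimal sketch using the `jsonlite` package's `fromJSON()`; it assumes `jsonlite` is installed and works on a trimmed copy of the profile shown earlier. Which fields to keep is an illustrative choice.

```r
library(jsonlite)

# A trimmed copy of the user-profile response (only a few fields kept)
raw_json <- '{
  "user": {
    "name": "J\'onn J\'onzz",
    "location": "Ox City, UK",
    "created_at": "Tue Apr 07 19:05:07 +0000 2009",
    "id_str": "2921138"
  },
  "followers_count": 2452,
  "statuses_count": 7311,
  "friends_count": 427
}'

# Nested JSON objects become nested named lists
profile <- fromJSON(raw_json)

# Flatten the fields of interest into a one-row data frame
profile_df <- data.frame(
  id        = profile$user$id_str,
  name      = profile$user$name,
  location  = profile$user$location,
  followers = profile$followers_count,
  statuses  = profile$statuses_count,
  friends   = profile$friends_count,
  stringsAsFactors = FALSE
)
print(profile_df)
```

For CSV responses, base R's `read.csv()` gives a data frame directly; XML responses need a dedicated parser such as the `xml2` package.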

Often social media data contains unstructured textual data which needs additional text pre-processing and normalization before it can be fed into any standard data mining or machine learning algorithm. Text normalization is usually done using several techniques to clean and standardize the text. Some of them are:

Text tokenization
Removing special characters and symbols
Spelling corrections
Contraction expansions
Stemming
Lemmatization
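Several of these steps can be chained in base R. The sketch below covers lowercasing, contraction expansion, URL and special-character removal, and tokenization. The contraction map is a small hypothetical sample, not a complete dictionary (note that mapping "'s" to " is" will also turn possessives into "is"). Spelling correction, stemming, and lemmatization need extra tooling; for stemming, the `SnowballC` package provides `wordStem()`.

```r
# Small, hypothetical contraction map; order matters, so the specific forms
# ("can't", "won't") are replaced before the generic suffix "n't"
contractions <- c("can't" = "cannot", "won't" = "will not",
                  "n't" = " not", "'re" = " are", "'s" = " is")

normalize_text <- function(text) {
  text <- tolower(text)
  # Expand contractions using literal (non-regex) matching
  for (pattern in names(contractions)) {
    text <- gsub(pattern, contractions[[pattern]], text, fixed = TRUE)
  }
  # Remove URLs, then anything that is not a letter, digit or whitespace
  text <- gsub("http[s]?://\\S+", " ", text)
  text <- gsub("[^a-z0-9[:space:]]", " ", text)
  # Tokenize on runs of whitespace and drop any empty tokens
  tokens <- strsplit(trimws(text), "[[:space:]]+")
  lapply(tokens, function(tk) tk[nzchar(tk)])
}

tweets <- c("I can't WAIT for the new phone!!! http://t.co/xyz",
            "It's great, they're shipping today :)")
normalize_text(tweets)
```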

More advanced processing can insert additional metadata to describe the text better, such as part-of-speech (POS) tags, phrase tags, named entity tags, and so on.

Data analysis

This is the core of the whole workflow, where we apply various techniques to analyze the data: this could be the raw native data itself, or the processed and curated data. Usually the techniques used in analysis can be broadly classified into three areas:

Data mining or analytics
Machine learning
Natural language processing and text analytics

Data mining and machine learning share several concepts: both use statistical techniques and try to find patterns in the underlying data. Data mining is more about finding key patterns or insights in data, while machine learning is more about using mathematics, statistics, and even some of these data mining algorithms to build models that predict or forecast outcomes. Both of these techniques need structured, numeric data to work with, so more complex analyses of unstructured textual data are usually handled in the separate realm of text analytics, which leverages natural language processing to provide tools, techniques, and algorithms for analyzing free-flowing unstructured text. We will be using techniques from these three areas to analyze data from various social media platforms throughout this book. We will briefly cover important concepts from data analytics and text analytics towards the end of this chapter.
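The difference between finding patterns and predicting outcomes can be illustrated with two base R functions. In the sketch below, `kmeans()` groups accounts without any labels (a data mining task), while `lm()` fits a model that predicts average likes from follower counts (a simple machine learning task). The data is a toy, hypothetical sample chosen only so the groups are obvious.

```r
# Toy, hypothetical data: followers (in thousands) and average likes per post
accounts <- data.frame(
  followers = c(1, 2, 3, 50, 52, 55),
  avg_likes = c(10, 22, 29, 480, 515, 560)
)

# Data mining: discover groups of similar accounts with k-means clustering
set.seed(42)
clusters <- kmeans(accounts, centers = 2, nstart = 10)
accounts$cluster <- clusters$cluster

# Machine learning: fit a linear model that predicts likes from followers
model <- lm(avg_likes ~ followers, data = accounts)
pred <- predict(model, newdata = data.frame(followers = 20))

print(accounts)
print(pred)
```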

Insights

The end results of our workflow are the actual insights, which act as facts or concrete data points for achieving the objective of the analysis. These can be anything from a business intelligence report to visualizations such as bar graphs, histograms, or even word or phrase clouds. Insights should be crisp, clear, and actionable so that businesses can easily use them to make valuable decisions in time.
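Word clouds and bar graphs of top terms are both built from a term-frequency table. A minimal base R sketch, assuming the text has already been tokenized; the tokens and the tiny stop-word list are hypothetical. The resulting `freq_df` could be passed to `barplot()` or to a word-cloud package such as `wordcloud`.

```r
# Hypothetical, already-tokenized text from customer posts
tokens <- c("the", "battery", "is", "great", "camera", "battery",
            "price", "the", "battery", "camera", "great", "battery")
stop_words <- c("the", "a", "is")  # tiny illustrative stop-word list

# Count each remaining term and sort from most to least frequent
word_freq <- sort(table(tokens[!tokens %in% stop_words]), decreasing = TRUE)
freq_df <- data.frame(word = names(word_freq),
                      freq = as.integer(word_freq),
                      stringsAsFactors = FALSE)
head(freq_df, 3)
```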

Opportunities

Based on the advantages of social media, we can derive plentiful opportunities within the scope of social media analytics. By analyzing your social media traffic patterns, you can save much of the cost involved in targeted advertising and promotions. You can see how users engage with your brand or business on social media, for instance, to find the best time to share something interesting, such as a new service, a product, or even an interesting anecdote about your company. Based on traffic from different geographies, you can analyze and understand the preferences of users in different parts of the world. Users love it when you publish promotions in their local language, and businesses are already leveraging such capabilities on platforms such as Facebook to target users in specific countries with localized content.
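Finding the best time to post can start with a simple group-by over past posts. The sketch below assumes a hypothetical post log that already records the hour each post went out and a single engagement number per post; with real timestamps, the hour could be extracted with `as.POSIXlt(timestamp)$hour`.

```r
# Hypothetical post log: posting hour (0-23) and engagement received
post_log <- data.frame(
  hour       = c(9, 9, 13, 13, 13, 18, 18, 21),
  engagement = c(40, 55, 70, 65, 80, 120, 110, 60)
)

# Average engagement per posting hour, sorted from best to worst
by_hour <- aggregate(engagement ~ hour, data = post_log, FUN = mean)
by_hour <- by_hour[order(-by_hour$engagement), ]
best_hour <- by_hour$hour[1]
by_hour
```

Averages over a handful of posts are noisy, so in practice one would also look at how many posts fall into each hour before acting on the result.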

The social media analytics landscape is still young and emerging and has a lot of untapped potential.

Let us understand the potential of social media analytics better by taking a real-world example.

Suppose you are running a profitable business with active engagement on various social media channels. How can you use the data generated on social media to find out how you are doing and how your competitors are doing? Live data streams from Twitter could be continuously analyzed to gauge people's real-time mood, sentiment, emotions, and reactions to your products and services. You could even run the same analysis for your competitors to see when they launch their products and how users react to them. With Facebook, you can do the same, and even push localized promotions and advertisements to see whether they help generate more revenue. News portals give you live feeds of trending articles and insights into the current state of the economy and current events, helping you decide whether conditions are favorable for your business or whether you should prepare for hard times. Sentiment analysis, concept mining, topic models, clustering, and inference are just a few examples of analytics applied to social media. The opportunities are huge; you just need a clear objective in mind so that you can use analytics effectively to achieve it.
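One of the simplest forms of sentiment analysis is lexicon-based scoring: add up the polarity of each known word in a post. The sketch below uses a tiny, hypothetical lexicon purely for illustration; real analyses would use a curated lexicon (packages such as `syuzhet` or `tidytext` ship several) and would need to handle negation, sarcasm, and emoticons.

```r
# Tiny, hypothetical sentiment lexicon: +1 positive, -1 negative
lexicon <- c(love = 1, great = 1, fast = 1, amazing = 1,
             hate = -1, slow = -1, broken = -1, worst = -1)

# Score each text by summing the polarity of the words found in the lexicon
score_sentiment <- function(texts, lexicon) {
  vapply(texts, function(t) {
    words <- strsplit(gsub("[^a-z ]", " ", tolower(t)), "\\s+")[[1]]
    sum(lexicon[words], na.rm = TRUE)  # unknown words index to NA and are ignored
  }, numeric(1), USE.NAMES = FALSE)
}

posts <- c("I love the new phone, the camera is amazing!",
           "Delivery was slow and the box arrived broken.",
           "It is a phone.")
scores <- score_sentiment(posts, lexicon)
labels <- ifelse(scores > 0, "positive", ifelse(scores < 0, "negative", "neutral"))
data.frame(post = posts, score = scores, label = labels)
```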

Challenges

Before we delve into the challenges associated with social media analytics, let us look at the following interesting facts:

There are over 300 million active Twitter users
Facebook has over 1.8 billion active users
Facebook generates 600-700+ terabytes of data daily (and it could be more now)
Twitter generates 8-10+ terabytes of data daily
Facebook generates over 4 to 5 million posts per minute
Instagram generates over 2 million likes per minute
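Converting the daily volumes quoted above into sustained per-second rates makes the scale more tangible. This is plain arithmetic on those figures, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and 86,400 seconds per day.

```r
seconds_per_day <- 24 * 60 * 60  # 86,400 seconds

# Daily data volumes quoted above, in terabytes
daily_tb <- c(facebook_low = 600, facebook_high = 700,
              twitter_low = 8, twitter_high = 10)

# Sustained rate in megabytes per second (decimal units assumed)
mb_per_sec <- daily_tb * 1e12 / seconds_per_day / 1e6
round(mb_per_sec, 1)
```

By this estimate, Facebook's figure corresponds to roughly 6.9 to 8.1 GB every second, and Twitter's to roughly 93 to 116 MB every second, around the clock.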

These statistics give you a rough idea about the massive scale of data being generated and consumed in these social media platforms. This leads to some challenges:

Big data: Because social media platforms produce such massive amounts of data, it is often impossible to analyze the complete dataset with traditional analytical methods, since the data would not fit in memory. Other approaches and tools, such as Hadoop and Spark, need to be leveraged.

Accessibility issues: Social media platforms generate a lot of data, but getting direct access to it is not always easy. Their official APIs impose rate limits, and it is rare to be able to access and store complete datasets. Besides this, each platform has its own terms and conditions, which must be adhered to when accessing its data.

Unstructured and noisy data: Most of the data from social media APIs is unstructured, noisy, and full of junk. Data cleaning and processing become really cumbersome, and analysts and data scientists often end up spending 70% of their time and effort cleaning and curating the data for analysis.
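When a dataset is too large to load at once, one common workaround short of a full Hadoop or Spark setup (R users can reach Spark through packages such as `sparklyr`) is to stream the data in fixed-size chunks and keep only running aggregates in memory. The sketch below counts posts per user in a CSV file this way; the file layout (a header row with a `user` column) and the chunk size are assumptions for illustration.

```r
# Count posts per user by reading a CSV file in chunks of lines,
# so that only one chunk plus the running counts is held in memory.
count_by_user_chunked <- function(path, chunk_size = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1)
  col_names <- strsplit(header, ",")[[1]]
  counts <- integer(0)
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break  # end of file
    chunk <- read.csv(text = lines, header = FALSE, col.names = col_names,
                      stringsAsFactors = FALSE)
    chunk_counts <- table(chunk$user)
    # Merge this chunk's counts into the running totals
    for (u in names(chunk_counts)) {
      prev <- if (u %in% names(counts)) counts[[u]] else 0L
      counts[u] <- prev + chunk_counts[[u]]
    }
  }
  counts
}

# Usage on a small temporary file (hypothetical data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("user,likes", "alice,3", "bob,5", "alice,2", "carol,1", "bob,4"), tmp)
res <- count_by_user_chunked(tmp, chunk_size = 2)
print(res)
```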

These are perhaps the most prevalent challenges you will face when analyzing social media data, though you may encounter many others on your social media analytics journey. Let's now get acquainted with the R programming language, which will be useful to us when performing our analyses.

Data types

There are several basic data types in R for handling different types of data and values:

numeric: The numeric data type is used to store real (decimal) vectors and is identical to the double data type.

double: This data type stores and represents double-precision vectors.

integer: This data type is used to represent 32-bit integer vectors.

character: This data type is used to represent character vectors, where each element is a string.

logical: The reserved words TRUE and FALSE are the logical constants in the R language, while T and F are global variables set to TRUE and FALSE by default. All four of these are logical type vectors.

complex: This data type is used to store and represent complex numbers.

factor: This type is used to represent nominal or categorical variables. The values are stored as a vector of integer codes ranging from 1 to n, where n is the number of distinct values of the variable, and a vector of character strings holding the actual values (the levels) is mapped to these integer codes.

Miscellaneous: There are several other special values and types, including NA, which denotes missing values in data; NaN, which denotes "not a number"; and ordered, which is used to create factors for ordinal variables.
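The miscellaneous entries are easiest to understand in action. The following short sketch, using arbitrary illustrative values, shows how NA and NaN behave and how an ordered factor respects the order of its levels.

```r
# NA (missing) and NaN (not a number, e.g. 0/0)
x <- c(2.5, NA, 0/0, 4)
is.na(x)                # TRUE for both NA and NaN
is.nan(x)               # TRUE only for the NaN element
mean(x)                 # missing values propagate into the result
mean(x, na.rm = TRUE)   # drops NA and NaN before averaging

# Ordered factor for an ordinal variable, with an explicit level order
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size < "large"          # comparisons follow the level order
max(size)               # the highest level present
```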

Each data type has a pair of common functions prefixed with as. and is., which are used for converting data types (typecasting) and checking the data type, respectively.

For example, as.numeric(…) would typecast the data or vector indicated by the ellipses into numeric type and is.numeric(…) would check if the data is of numeric type.

Let us look at a few more examples for the various data types in the following code snippet to understand them better:

# typecasting and checking data types
> n <- c(3.5, 0.0, 1.7, 0.0)
> typeof(n)
[1] "double"
> is.numeric(n)
[1] TRUE
> is.double(n)
[1] TRUE
> is.integer(n)
[1] FALSE
> as.integer(n)
[1] 3 0 1 0
> as.logical(n)
[1]  TRUE FALSE  TRUE FALSE

# complex numbers
> comp <- 3 + 4i
> typeof(comp)
[1] "complex"

# factoring nominal variables
> size <- c(rep('large', 5), rep('small', 5), rep('medium', 3))
> size
 [1] "large"  "large"  "large"  "large"  "large"  "small"  "small"  "small"  "small"  "small"
[11] "medium" "medium" "medium"
> size <- factor(size)
> size
 [1] large  large  large  large  large  small  small  small  small  small  medium medium medium
Levels: large medium small
> summary(size)
 large medium  small
     5      3      5

The preceding examples should make these concepts clearer. Notice that when a numeric vector is typecast to logical, non-zero values always become TRUE and zero values become FALSE. We will now dive into the various data structures in R.

Data structures