Leverage Splunk's operational intelligence capabilities to unlock hidden business insights and drive success
Splunk makes it easy for you to take control of your data, and with Splunk Operational Intelligence Cookbook, you can be confident that you are taking advantage of the Big Data revolution and driving your business with the cutting edge of operational intelligence and business analytics.
With more than 80 recipes that demonstrate all of Splunk’s features, not only will you find quick solutions to common problems, but you’ll also learn a wide range of strategies and uncover new ideas that will make you rethink what operational intelligence means to you and your organization.
You’ll discover recipes on data processing, searching and reporting, dashboards, and visualizations to make data shareable, communicable, and most importantly meaningful. You’ll also find step-by-step demonstrations that walk you through building an operational intelligence application containing vital features essential to understanding data and to help you successfully integrate a data-driven way of thinking in your organization.
Throughout the book, you’ll dive deeper into Splunk, explore data models and pivots to extend your intelligence capabilities, and perform advanced searching with machine learning to explore your data in even more sophisticated ways. Splunk is changing the business landscape, so make sure you’re taking advantage of it.
This book is intended for data professionals who are looking to leverage the Splunk Enterprise platform as a valuable operational intelligence tool. The recipes provided in this book will appeal to individuals from all facets of business, IT, security, product, marketing, and many more! Even the existing users of Splunk who want to upgrade and get up and running with Splunk 7.x will find this book to be of great value.
Josh Diakun is an IT operations and security specialist with a focus on creating data-driven operational processes. He has over 10 years of experience managing and architecting enterprise-grade IT environments. For the past 7 years, he has been architecting, deploying, and developing on Splunk as the core platform for organizations to gain security and operational intelligence. Josh is a founding partner at Discovered Intelligence, a company specializing in data intelligence services and solutions. He is also a co-founder of the Splunk Toronto User Group.
Paul R Johnson has over 10 years of data intelligence experience in the areas of information security, operations, and compliance. He is a partner at Discovered Intelligence, a company specializing in data intelligence services and solutions. Paul previously worked for a Fortune 10 company, leading IT risk intelligence initiatives and managing a global Splunk deployment. Paul co-founded the Splunk Toronto User Group and lives and works in Toronto, Canada.
Derek Mock is a software developer and big data architect who specializes in IT operations, information security, and cloud technologies. He has 15 years' experience developing and operating large enterprise-grade deployments and SaaS applications. He is a founding partner at Discovered Intelligence, a company specializing in data intelligence services and solutions. For the past 6 years, he has been leveraging Splunk as the core tool to deliver key operational intelligence. Derek is based in Toronto, Canada, and is a co-founder of the Splunk Toronto User Group.
Page count: 484
Year of publication: 2018
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Veena Pagare
Acquisition Editor: Vinay Argekar
Content Development Editor: Aaryaman Singh
Technical Editor: Danish Shaikh
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Graphics: Tania Dutta
Production Coordinator: Shraddha Falebhai
First published: October 2014
Second edition: June 2016
Third edition: May 2018
Production reference: 1220518
Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.
ISBN 978-1-78883-523-7
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Josh Diakun is an IT operations and information security specialist with over 15 years of experience managing and architecting enterprise-grade IT environments and security programs. He has spent the last 10 years specializing in the Splunk platform and has been recognized locally and globally for his expertise. Josh is a founding partner of Discovered Intelligence, a multi-award-winning Splunk services company. Through Discovered Intelligence, Josh works with some of the most recognizable businesses worldwide, helping them to achieve their IT operations, security, and compliance goals.
Paul R Johnson has over 15 years' data intelligence experience in the areas of information security, operations, and compliance. He is passionate about helping businesses gain intelligence and insight from their data at scale. Paul has award-winning Splunk expertise and is a founding partner of Discovered Intelligence, a company known for the quality of its Splunk service delivery. He previously worked for a Fortune 10 company, leading global IT risk intelligence initiatives.
Derek Mock is a software developer and architect with expertise in IT operations and cloud technologies. He has over 20 years' experience developing, integrating, and operating large enterprise-grade deployments and SaaS applications. Derek is a founding partner of Discovered Intelligence and previously leveraged Splunk as a core tool for delivering key operational intelligence at a managed services company. Derek is a co-founder of the Splunk Toronto User Group and lives and works in Toronto, Canada.
Yogesh Raheja is a certified DevOps and cloud expert with a decade of IT experience. He has expertise in technologies such as operating systems, source code management, build and release tools, continuous integration/deployment/delivery tools, containers, configuration management tools, monitoring and logging tools, and public and private clouds. He loves to share his technical expertise with audiences worldwide at various forums, conferences, webinars, blogs, and LinkedIn (https://in.linkedin.com/in/yogesh-raheja-b7503714). He also reviewed Implementing Splunk 7, Third Edition, written by James D. Miller. He has published online courses on Udemy and has written Automation with Puppet 5 and Automation with Ansible.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Splunk Operational Intelligence Cookbook Third Edition
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Sections
Getting ready
How to do it...
How it works...
There's more...
See also
Get in touch
Reviews
Play Time – Getting Data In
Introduction
Indexing files and directories
Getting ready
How to do it...
How it works...
There's more...
Adding a file or directory data input using the CLI
Adding a file or directory input using inputs.conf
One-time indexing of data files using the Splunk CLI
Indexing the Windows event logs
See also
Getting data through network ports
Getting ready
How to do it...
How it works...
There's more...
Adding a network input using the CLI
Adding a network input using inputs.conf
See also
Using scripted inputs
Getting ready
How to do it...
How it works...
See also
Using modular inputs
Getting ready
How to do it...
How it works...
There's more...
See also
Using the Universal Forwarder to gather data
Getting ready
How to do it...
How it works...
There's more...
Adding the receiving indexer via outputs.conf
Receiving data using the HTTP Event Collector
Getting ready
How to do it...
How it works...
Getting data from databases using DB Connect
Getting ready
How to do it...
How it works...
Loading the sample data for this book
Getting ready
How to do it...
How it works...
See also
Data onboarding – defining field extractions
Getting ready
How to do it...
How it works...
See also
Data onboarding - defining event types and tags
Getting ready
How to do it...
How it works...
There's more...
Adding event types and tags using eventtypes.conf and tags.conf
See also
Installing the Machine Learning Toolkit
Getting ready
How to do it...
How it works...
Diving into Data – Search and Report
Introduction
The Search Processing Language 
Searching in Splunk
Boolean operators
Common commands
Time modifiers
Working with fields
Saving searches in Splunk
Making raw event data readable
Getting ready
How to do it...
How it works...
There's more...
Tabulating every field
Removing fields, then tabulating everything else
Finding the most accessed web pages
Getting ready
How to do it...
How it works...
There's more...
Searching for the top 10 accessed web pages
Searching for the most accessed pages by user
See also
Finding the most used web browsers
Getting ready
How to do it...
How it works...
There's more...
Searching for the web browser data for the most used OS types
See also
Identifying the top-referring websites
Getting ready
How to do it...
How it works...
There's more...
Searching for the top 10 using stats instead of top
See also
Charting web page response codes
Getting ready
How to do it...
How it works...
There's more...
Totaling success and error web page response codes
See also
Displaying web page response time statistics
Getting ready
How to do it...
How it works...
There's more...
Displaying web page response time by action
See also
Listing the top-viewed products
Getting ready
How to do it...
How it works...
There's more...
Searching for the percentage of cart additions from product views
See also
Charting the application's functional performance
Getting ready
How to do it...
How it works...
There's more...
See also
Charting the application's memory usage
Getting ready
How to do it...
How it works...
See also
Counting the total number of database connections
Getting ready
How to do it...
How it works...
See also
Dashboards and Visualizations - Make Data Shine
Introduction
About Splunk dashboards
Using dashboards for Operational Intelligence
Enriching data with visualizations
Available visualizations
Trellis layout
Best practices for visualizations
Creating an Operational Intelligence dashboard
Getting ready
How to do it...
How it works...
There's more...
Changing dashboard permissions
Using a pie chart to show the most accessed web pages
Getting ready
How to do it...
How it works...
There's more...
Searching for the top ten accessed web pages
See also
Displaying the unique number of visitors
Getting ready
How to do it...
How it works...
There's more...
Adding labels to a single value panel
Coloring the value based on ranges
Adding trends and sparklines to the values
See also
Using a gauge to display the number of errors
Getting ready
How to do it...
How it works...
There's more...
See also
Charting the number of method requests by type and host
Getting ready
How to do it...
How it works...
See also
Creating a timechart of method requests, views, and response times
Getting ready
How to do it...
How it works...
There's more...
Method requests, views, and response times by host
See also
Using a scatter chart to identify discrete requests by size and response time
Getting ready
How to do it...
How it works...
There's more...
Using time series data points with a scatter chart
See also
Creating an area chart of the application's functional statistics
Getting ready
How to do it...
How it works...
See also
Using metrics data and a trellis layout to monitor physical environment operating conditions
Getting ready
How to do it...
How it works...
See also
Using a bar chart to show the average amount spent by category
Getting ready
How to do it...
How it works...
See also
Creating a line chart of item views and purchases over time
Getting ready
How to do it...
How it works...
See also
Building an Operational Intelligence Application
Introduction
Creating an Operational Intelligence application
Getting ready
How to do it...
How it works...
There's more...
Creating an application from another application
Downloading and installing a Splunk app
See also
Adding dashboards and reports
Getting ready
How to do it...
How it works...
There's more...
Changing permissions of saved reports
See also
Organizing the dashboards more efficiently
Getting ready
How to do it...
How it works...
There's more...
Modifying the Simple XML directly
See also
Dynamically drilling down on activity reports
Getting ready
How to do it...
How it works...
There's more...
Disabling the drilldown feature in tables and charts
See also
Creating a form for searching web activity
Getting ready
How to do it...
How it works...
There's more...
Adding a Submit button to your form
See also
Linking web page activity reports to the form
Getting ready
How to do it...
How it works...
There's more...
Adding an overlay to the Sessions Over Time chart
See also
Displaying a geographical map of visitors
Getting ready
How to do it...
How it works...
There's more...
Adding a map panel using Simple XML
Mapping different distributions by area
See also
Highlighting average product price
Getting ready
How to do it...
How it works...
See also
Scheduling the PDF delivery of a dashboard
Getting ready
How to do it...
How it works...
See also
Extending Intelligence – Datasets, Modeling and Pivoting
Introduction
Creating a data model for web access logs
Getting ready
How to do it...
How it works...
There's more...
Viewing datasets using the dataset listing page
Searching datasets using the search interface
See also
Creating a data model for application logs
Getting ready
How to do it...
How it works...
See also
Accelerating data models
Getting ready
How to do it...
How it works...
There's more...
Viewing data model and acceleration summary information
Advanced configuration of data model acceleration
See also
Pivoting total sales transactions
Getting ready
How to do it...
How it works...
There's more...
Searching datasets using the pivot command
Searching accelerated datasets using the tstats command
See also
Pivoting purchases by geographic location
Getting ready
How to do it...
How it works...
See also
Pivoting slowest responding web pages
Getting ready
How to do it...
How it works...
See also
Pivot charting top error codes
Getting ready
How to do it...
How it works...
See also
Diving Deeper – Advanced Searching, Machine Learning and Predictive Analytics
Introduction
Identifying and grouping transactions
Converging data sources
Identifying relationships between fields
Predicting future values
Discovering anomalous values
Leveraging machine learning
Calculating the average session time on a website
Getting ready
How to do it...
How it works...
There's more...
Starts with a website visit, ends with a checkout
Defining maximum pause, span, and events in a transaction
See also
Calculating the average execution time for multi-tier web requests
Getting ready
How to do it...
How it works...
There's more...
Calculating the average execution time without using a join
See also
Displaying the maximum concurrent checkouts
Getting ready
How to do it...
How it works...
See also
Analyzing the relationship of web requests
Getting ready
How to do it...
How it works...
There's more...
Analyzing relationships of DB actions to memory utilization
See also
Predicting website traffic volumes
Getting ready
How to do it...
How it works...
There's more...
Create and apply a machine learning model of traffic over time
Predicting the total number of items purchased
Predicting the average response time of function calls
See also
Finding abnormally-sized web requests
Getting ready
How to do it...
How it works...
There's more...
The anomalies command
The anomalousvalue command
The anomalydetection command
The cluster command
See also
Identifying potential session spoofing
Getting ready
How to do it...
How it works...
There's more...
Creating logic for urgency
See also
Detecting outliers in server response times
Getting ready
How to do it...
How it works...
Forecasting weekly sales
Getting ready
How to do it...
How it works...
Summary
Enriching Data – Lookups and Workflows
Introduction
Lookups
Workflows
DB Connect
Looking up product code descriptions
Getting ready
How to do it...
How it works...
There's more...
Manually adding the lookup to Splunk
See also
Flagging suspect IP addresses
Getting ready
How to do it...
How it works...
There's more...
Modifying an existing saved search to populate a lookup table
See also
Creating a session state table
Getting ready
How to do it...
How it works...
There's more...
Use the Splunk KV store to maintain the session state table
See also
Adding hostnames to IP addresses
Getting ready
How to do it...
How it works...
There's more...
Enabling automatic external field lookups
See also
Searching ARIN for a given IP address
Getting ready
How to do it...
How it works...
There's more...
Limiting workflow actions by event types
See also
Triggering a Google search for a given error
Getting ready
How to do it...
How it works...
There's more...
Triggering a Google search from the chart drilldown options
See also
Generating a chat notification for application errors
Getting ready
How to do it...
How it works...
There's more...
Adding a workflow action manually in Splunk
See also
Looking up inventory from an external database
Getting ready
How to do it...
How it works...
There's more...
Using DB Connect for direct external DB lookups
See also
Being Proactive – Creating Alerts
Introduction
About Splunk alerts
Types of alert
Alert Trigger Conditions
Alert Trigger Actions
Alerting on abnormal web page response times
Getting ready
How to do it...
How it works...
There's more...
Viewing alerts in Splunk's Triggered Alert view
See also
Alerting on errors during checkout in real time
Getting ready
How to do it...
How it works...
There's more...
Building alerts via a configuration file
Editing alert configuration attributes using Advanced edit
Identifying the real-time searches that are running
See also
Alerting on abnormal user behavior
Getting ready
How to do it...
How it works...
There's more...
Alerting on abnormal user purchases without checkouts
See also
Alerting on failure and triggering a chat notification
Getting ready
How to do it...
How it works...
There's more...
See also
Alerting when predicted sales exceed inventory
Getting ready
How to do it...
How it works...
See also
Generating alert events for high sensor readings
Getting ready
How to do it...
How it works...
There's more...
Speeding Up Intelligence – Data Summarization
Introduction
Data summarization
Data summarization methods
About summary indexing
How summary indexing helps
About report acceleration
The simplicity of report acceleration
Calculating an hourly count of sessions versus completed transactions
Getting ready
How to do it...
How it works...
There's more...
Generating the summary more frequently
Avoiding summary index overlaps and gaps
See also
Backfilling the number of purchases by city
Getting ready
How to do it...
How it works...
There's more...
Backfilling a summary index from within a search directly
See also
Displaying the maximum number of concurrent sessions over time
Getting ready
How to do it...
How it works...
There's more...
Viewing the status of an accelerated report and how 
See also
Above and Beyond – Customization, Web Framework, HTTP Event Collector, REST API, and SDKs
Introduction
Web framework
REST API
Software development kits (SDKs)
HTTP Event Collector (HEC)
Customizing the application navigation
Getting ready
How to do it...
How it works...
There's more...
Adding a Sankey diagram of web hits
Getting ready
How to do it...
How it works...
There's more...
Changing the Sankey diagram options
See also
Developing a tag cloud of purchases by country
Getting ready
How to do it...
How it works...
There's more...
See also
Adding cell icons to highlight average product price
Getting ready
How to do it...
How it works...
See also
Remotely querying Splunk's REST API for unique page views
Getting ready
How to do it...
How it works...
There's more...
Authenticating with a session token
See also
Creating a Python application to return unique IP addresses
Getting ready
How to do it...
How it works...
There's more...
Paginating the results of your search
See also
Creating a custom search command to format product names
Getting ready
How to do it...
How it works...
See also
Collecting data from remote scanning devices
Getting ready
How to do it...
How it works...
See also
Other Books You May Enjoy
Leave a review - let other readers know what you think
Splunk makes it easy for you to take control of your data, and with Splunk Operational Intelligence Cookbook, you can be confident that you are taking advantage of the Big Data revolution and driving your business with the cutting edge of operational intelligence and business analytics.
With more than 80 recipes that demonstrate all of Splunk's features, not only will you find quick solutions to common problems, but you'll also learn a wide range of strategies and uncover new ideas that will make you rethink what operational intelligence means to you and your organization.
This book is intended for users of all levels who are looking to leverage the Splunk Enterprise platform as a valuable operational intelligence tool. The recipes provided in this book will appeal to individuals from all facets of business, IT, security, product, marketing, and many more!
Also, existing users of Splunk who want to upgrade and get up and running with the latest release of Splunk will find this book invaluable.
Chapter 1, Play Time – Getting Data In, introduces you to the many ways in which you can get data into Splunk, whether that is collecting data locally from files and directories, receiving it through TCP/UDP port inputs or directly from a Universal Forwarder, or simply utilizing scripted and modular inputs. Regardless of how Operational Intelligence is approached, the right data at the right time is pivotal to success; this chapter will play a key role in highlighting what data to consider and how to efficiently and effectively get that data into Splunk. It will also introduce the datasets that will be used throughout this book and where to obtain samples that can be used to follow each of the recipes as they are written.
Chapter 2, Diving into Data – Search and Report, introduces you to the first set of recipes in the book. Leveraging the data now available as a result of the previous chapter, the information and recipes will guide you through searching event data using Splunk's SPL (Search Processing Language); applying field extractions; grouping common events based on field values; and then building basic reports using the table, top, chart, and stats commands.
Chapter 3, Dashboards and Visualizations – Make Data Shine, guides you through building visualizations based on the reports that can now be created as a result of the previous chapter. The information and recipes provided in this chapter will empower you to take your data and reports and bring them to life through the powerful visualizations provided by Splunk. The visualizations introduced will include single values, charts (bar, pie, line, and area), scatter charts, and gauges.
Chapter 4, Building an Operational Intelligence Application, builds on the understanding of visualizations that you gained as a result of the previous chapter to now introduce the concept of dashboards. Dashboards provide a powerful way to bring visualizations together and provide the holistic visibility required to fully capture the operational intelligence that is most important. The information and recipes provided in this chapter will outline the purpose of dashboards, how to properly utilize dashboards, using the dashboard editor to build a dashboard, building a form for searching event data and much more.
Chapter 5, Extending Intelligence – Datasets, Modeling and Pivoting, covers two powerful features of Splunk Enterprise: the ability to create datasets and the pivot tool. The information and recipes provided in this chapter will introduce you to the concept of Splunk datasets. You will build data models, use the pivot tool, and write accelerated searches to quickly create intelligence-driven reports and visualizations.
Chapter 6, Diving Deeper – Advanced Searching, Machine Learning and Predictive Analytics, helps you harness the ability to converge data from different sources and understand or build relationships between the events. By now you will have an understanding of how to derive operational intelligence from data by using some of Splunk's most common features. The information and recipes provided in this chapter will take you deeper into the data by introducing the Machine Learning Toolkit, transactions, subsearching, concurrency, associations, and more advanced search commands.
Chapter 7, Enriching Data – Lookups and Workflows, enables you to apply this functionality to further enhance your understanding of the data being analyzed. As illustrated in the preceding chapters, event data, whether from a single-tier or multi-tier web application stack, can provide a wealth of operational intelligence and awareness. That intelligence can be further enriched through the use of lookups and workflow actions. The information and recipes provided in this chapter will introduce the concepts of lookups and workflow actions for the purpose of augmenting the data being analyzed.
Chapter 8, Being Proactive – Creating Alerts, guides you through creating alerts based on the knowledge gained from previous chapters. A key asset to complete operational intelligence and awareness is the ability to be proactive through scheduled or real-time alerts. The information and recipes provided in this chapter will introduce you to this concept and the benefits of proactive alerts, and provide context on when alerts are best applied.
Chapter 9, Speeding Up Intelligence – Data Summarization, provides you with a short introduction to common situations where summary indexing can be leveraged to speed up reports or preserve focused statistics over long periods of time. With big data being just that, big, it can be very time-consuming to search massive sets of data and costly to store that data for long periods. The information and recipes provided in this chapter will introduce you to the concept of summary indexing for the purposes of accelerating reports and speeding up the time it takes to unlock business insight.
Chapter 10, Above and Beyond – Customization, Web Framework, HTTP Event Collector, REST API, and SDKs, introduces you to four very powerful features of Splunk. These features provide the ability to create a very rich and powerful interactive experience with Splunk. This will open you up to the possibilities beyond core Splunk Enterprise and show you a method to create your own operational intelligence application including powerful visualizations. It will also provide a recipe for querying Splunk's REST API and a basic Python application leveraging Splunk's SDK to execute a search.
You'll need the Splunk Enterprise 7.1 (or greater) software.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at
www.packtpub.com
.
Select the
SUPPORT
tab.
Click on
Code Downloads & Errata
.
Enter the name of the book in the
Search
box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Splunk-Operational-Intelligence-Cookbook-Third-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "However, in addition to using the GUI, you can also specify time ranges directly in your search string using the earliest and latest time modifiers. When a time modifier is used in this way, it will automatically override any time range that might be set in the GUI time range picker."
A block of code is set as follows:
index=main sourcetype=access_combined | eval browser=useragent | replace *Firefox* with Firefox, *Chrome* with Chrome, *MSIE* with "Internet Explorer", *Version*Safari* with Safari, *Opera* with Opera in browser | top limit=5 useother=t browser
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select the Search & Reporting application."
In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).
To give clear instructions on how to complete a recipe, these sections are used as follows:
This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.
This section contains the steps required to follow the recipe.
This section usually consists of a detailed explanation of what happened in the previous section.
This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.
This section provides helpful links to other useful information for the recipe.
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
In this chapter, we will cover the basic ways to get data into Splunk, in addition to some other recipes that will help prepare you for later chapters. You will learn about the following recipes:
Indexing files and directories
Getting data through network ports
Using scripted inputs
Using modular inputs
Using the Universal Forwarder to gather data
Receiving data using the HTTP Event Collector
Getting data from databases using DB Connect
Loading the sample data for this book
Data onboarding: Defining field extractions
Data onboarding: Defining event types and tags
Installing the Machine Learning Toolkit
The machine data that facilitates operational intelligence comes in many different forms and from many different sources. Splunk can collect and index data from several sources, including log files written by web servers or business applications, syslog data streaming in from network devices, or the output of custom developed scripts. Even data that looks complex at first can be easily collected, indexed, transformed, and presented back to you in real time.
This chapter will walk you through the basic recipes that will act as the building blocks to get the data you want into Splunk. The chapter will further serve as an introduction to the sample data sets that we will use to build our own operational intelligence Splunk app. The datasets will be coming from a hypothetical three-tier e-commerce web application and will contain web server logs, application logs, and database logs.
Splunk Enterprise can index any type of data; however, it works best with time-series data (data with timestamps). When Splunk Enterprise indexes data, it breaks it into events, based on timestamps and/or event size, and puts them into indexes. Indexes are data stores that Splunk has engineered to be very fast, searchable, and scalable across a distributed server environment.
All data indexed into Splunk is assigned a source type. The source type helps identify the data format type of the event and where it has come from. Splunk has several preconfigured source types, but you can also specify your own; examples include access_combined, cisco_syslog, and linux_secure. The source type is added to the data when the indexer indexes it into Splunk. It is a key field that is used when performing field extractions and when conducting many searches to filter the data being searched.
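Because the source type is stored as a field on every event, it can be used directly in a search to narrow down the data being examined. As a simple illustration (assuming web access data indexed with the access_combined source type, whose automatic field extractions include a status field), the following search counts events by HTTP status code:
index=main sourcetype=access_combined | stats count by status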
The Splunk community plays a big part in making it easy to get data into Splunk. The ability to extend Splunk has provided the opportunity for the development of inputs, commands, and applications that can be easily shared. If there is a particular system or application you are looking to index data from, there is most likely someone who has developed and published relevant configurations and tools that can be easily leveraged by your own Splunk Enterprise deployment.
Splunk Enterprise is designed to make the collection of data very easy, and it will not take long before you are asked, or decide yourself, to get as much data into Splunk as possible—at least as much as your license will allow for!
File- and directory-based inputs are the most commonly used ways of getting data into Splunk. The primary use for these types of input is to index logfiles. Almost every application or system produces a logfile, and it is generally full of data that you want to be able to search and report on.
Splunk can continuously monitor for new data being written to existing files or new files being added to a directory, and it is able to index this data in real time. Depending on the type of application that creates the logfiles, you would set up Splunk to either monitor an individual file based on its location, or scan an entire directory and monitor all the files that exist within it. The latter configuration is more commonly used when the logfiles being produced have unique filenames, such as filenames containing a timestamp.
This recipe will show you how to configure Splunk to continuously monitor and index the contents of a rolling logfile located on the Splunk server. The recipe specifically shows how to monitor and index a Red Hat Linux system's messages logfile (/var/log/messages). However, the same principle can be applied to a logfile on a Windows system, and a sample file is provided. Do not attempt to index the Windows event logs this way, as Splunk has specific Windows event inputs for this.
To step through this recipe, you will need a running Splunk Enterprise server and access to read the /var/log/messages file on Linux. No other prerequisites are required. If you are not using Linux and/or do not have access to the /var/log/messages location on your Splunk server, use the cp01_messages.log file that is provided and upload it to an accessible directory on your Splunk server.
Follow these steps to monitor and index the contents of a file:
Log in to your Splunk server.
From the menu in the top right-hand corner, click on the
Settings
menu and then click on the
Add Data
link:
If you are prompted to take a quick tour, click on
Skip
.
In the
How do you want to add data
section, click on
monitor
:
Click on the
Files & Directories
section:
In the
File or Directory
section, enter the path to the logfile (
/var/log/messages
or the location of the
cp01_messages.log
file), ensure
Continuously Monitor
is selected, and click on
Next
:
If you are using the provided file or the native
/var/log/messages
file, the data preview will show the correct line breaking of events and timestamp recognition. Click on the
Next
button.
A
Save Source Type
box will pop up. Enter
linux_messages
as the
Name
and then click on
Save
:
On the
Input Settings
page, leave all the default settings and click
Review
.
Review the settings and if everything is correct, click
Submit
.
If everything was successful, you should see a
File input has been created successfully
message:
Click on the
Start
searching button. The
Search & Reporting
app will open with the search already populated based on the settings supplied earlier in the recipe.
When you add a new file or directory data input, you are basically adding a new configuration stanza into an inputs.conf file behind the scenes. The Splunk server can contain one or more inputs.conf files, and these files are either located in $SPLUNK_HOME/etc/system/local or in the local directory of a Splunk app.
Splunk uses the monitor input type and is set to point to either a file or a directory. If you set the monitor to a directory, all the files within that directory will be monitored. When Splunk monitors files, it initially starts by indexing all the data that it can read from the beginning. Once complete, Splunk maintains a record of where it last read the data from, and if any new data comes into the file, it reads this data and advances the record. The process is nearly identical to using the tail command in Unix-based operating systems. If you are monitoring a directory, Splunk also provides many additional configuration options, such as blacklisting files you don't want Splunk to index.
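As a rough sketch, the stanza created by this recipe would look something like the following in inputs.conf (the attribute names are standard inputs.conf settings; the values shown are examples only):
[monitor:///var/log/messages]
sourcetype = linux_messages
index = main
disabled = false
Monitoring a directory uses the same stanza form, for example [monitor:///var/log/myapp], and a setting such as blacklist = \.gz$ can be added to skip files you do not want indexed.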
While adding inputs to monitor files and directories can be done through the web interface of Splunk, as outlined in this recipe, there are other approaches to add multiple inputs quickly. These allow for customization of the many configuration options that Splunk provides.
Instead of using the GUI, you can add a file or directory input through the Splunk command-line interface (CLI). Navigate to your $SPLUNK_HOME/bin directory and execute the following command (replacing the file or directory to be monitored with your own):
For Unix, we will be using the following code to add a file or directory input:
./splunk add monitor /var/log/messages -sourcetype linux_messages
For Windows, we will be using the following code to add a file or directory input:
splunk add monitor c:/filelocation/cp01_messages.log -sourcetype linux_messages
There are a number of different parameters that can be passed along with the file location to monitor.
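For example, to set the destination index in addition to the source type, the -index parameter can be supplied along with the monitor path (the index name here is only an example):
./splunk add monitor /var/log/messages -sourcetype linux_messages -index main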
Although you can select Upload and Index a file from the Splunk GUI to upload and index a file, there are a couple of CLI functions that can be used to perform one-time bulk loads of data.
Use the oneshot command to tell Splunk where the file is located and which parameters to use, such as the source type:
./splunk add oneshot XXXXXXX
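As a purely illustrative example, using the sample file provided with this chapter (the path shown is hypothetical; adjust it to wherever you placed the file), the command might look like this:
./splunk add oneshot /tmp/cp01_messages.log -sourcetype linux_messages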
Another way is to place the file you wish to index into the Splunk spool directory, $SPLUNK_HOME/var/spool/splunk, and then add the file using the spool command, as shown in the following code:
./splunk spool XXXXXXX
The
Getting data through network ports
recipe
The
Using scripted inputs
recipe
The
Using modular inputs
recipe
Not every machine has the luxury of being able to write logfiles. Sending data over network ports and protocols is still very common. For instance, sending logs through syslog is still the primary method of capturing data from network devices such as firewalls, routers, and switches.
Sending data to Splunk over network ports doesn't need to be limited to network devices. Applications and scripts can use socket communication to the network ports that Splunk is listening on. This can be a very useful tool in your back pocket, as there can be scenarios where you need to get data into Splunk but don't necessarily have the ability to write to a file.
This recipe will show you how to configure Splunk to receive syslog data on a UDP network port, but it is also applicable to the TCP port configuration.
To step through this recipe, you will need a running Splunk Enterprise server. No other prerequisites are required.
Follow these steps to configure Splunk to receive network UDP data:
Log in to your Splunk server.
From the menu in the top right-hand corner, click on the
Settings
menu and then click on the
Add Data
link.
If you are prompted to take a quick tour, click on
Skip
.
In the
How do you want to add data
section, click on
Monitor.
Click on the
TCP / UDP
section:
Ensure the
UDP
option is selected and in the
Port
section, enter
514
. On Unix/Linux, Splunk must be running as root to access privileged ports such as
514
. An alternative would be to specify a higher port, such as port 1514, or route data from 514 to another port using routing rules in
iptables
. Then, click on
Next
:
In the
Source type
section, select
Select
and then select
syslog
from the
Select Source Type
drop-down list and click
Review
:
Review the settings and if everything is correct, click
Submit
.
If everything was successful, you should see a
UDP input has been created successfully
message:
Click on the
Start Searching
button. The
Search & Reporting
app will open with the search already populated based on the settings supplied earlier in the recipe. Splunk is now configured to listen on UDP port
514
. Any data sent to this port now will be assigned the syslog source type. To search for the syslog source type, you can run the following search:
source="udp:514" sourcetype="syslog"
Understandably, you will not see any data unless you happen to be sending data to your Splunk server IP on UDP port 514.
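If you want to generate some test data yourself, one quick option (assuming the netcat utility is installed; flags vary slightly between netcat versions) is to send a syslog-style message to the port from the command line of the Splunk server:
echo "<13>Jan 01 12:00:00 testhost testapp: hello splunk" | nc -u -w1 127.0.0.1 514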
When you add a new network port input, you basically add a new configuration stanza into an inputs.conf file behind the scenes. The Splunk server can contain one or more inputs.conf files, and these files are either located in the $SPLUNK_HOME/etc/system/local or the local directory of a Splunk app.
To collect data on a network port, Splunk will set up a socket to listen on the specified TCP or UDP port and will index any data it receives on that port. For example, in this recipe, you configured Splunk to listen on port 514 for UDP data. If data was received on that port, then Splunk would index it and assign a syslog source type to it.
Splunk also provides many configuration options that can be used with network inputs, such as how to resolve the host value to be used on the collected data.
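For reference, the UDP input created in this recipe corresponds to a stanza along the following lines in inputs.conf (connection_host is one such option and controls how the host value is resolved; the values shown are illustrative):
[udp://514]
sourcetype = syslog
connection_host = ip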
While adding inputs to receive data from network ports can be done through the web interface of Splunk, as outlined in this recipe, there are other approaches to add multiple inputs quickly; these approaches allow for customization of the many configuration options that Splunk provides.
You can also add a network input via the Splunk CLI. Navigate to your $SPLUNK_HOME/bin directory and execute the following command (just replace the protocol, port, and source type you wish to use):
We will use the following code for Unix:
./splunk add udp 514 -sourcetype syslog
We will use the following code for Windows:
splunk add udp 514 -sourcetype syslog
There are a number of different parameters that can be passed along with the port. See the Splunk documentation for more on data inputs using the CLI (https://docs.splunk.com/Documentation/Splunk/latest/Data/MonitorfilesanddirectoriesusingtheCLI).
The
Indexing files and directories
recipe
The
Using scripted inputs
recipe
The
Using modular inputs
recipe
Not all data that is useful for operational intelligence comes from logfiles or network ports. Splunk will happily take the output of a command or script and index it along with all your other data.
Scripted inputs are a very helpful way to get that hard-to-reach data. For example, if you have third-party-supplied command-line programs that can output data you would like to collect, Splunk can run the command periodically and index the results. Scripted inputs are typically used to pull data from a source, whereas network inputs await a push of data from a source.
This recipe will show you how to configure Splunk to execute a command on a set interval and index the resulting output.
To step through this recipe, you will need a running Splunk server and the provided scripted input script suited to the environment you are using. For example, if you are using Windows, use the cp01_scripted_input.bat file. This script should be placed in the $SPLUNK_HOME/bin/scripts directory. No other prerequisites are required.
Follow these steps to configure a scripted input:
Log in to your Splunk server.
From the menu in the top right-hand corner, click on the
Settings
menu and then click on the
Add Data
link.
If you are prompted to take a quick tour, click on
Skip
.
In the
How do you want to add data
section, click on
Monitor
.
Click on the
Scripts
section:
A form will be displayed with a number of input fields. In the
Script Path
drop-down, select the location of the script. All scripts must be located in a Splunk
bin
directory, either in
$SPLUNK_HOME/bin/scripts
or an appropriate bin directory within a Splunk app, such as
$SPLUNK_HOME/etc/apps/search/bin
.
In the
Script Name
dropdown, select the name of the script. In the
Commands
field, add any command-line arguments to the auto-populated script name.
Enter the value in the
Interval
field (in seconds) in which the script is to be run (the default value is
60.0
seconds) and then click
Next
:
In the
Source Type
section, you have the option to either select a predefined source type or to select
New
and enter your desired value. For the purpose of this recipe, select
New
as the source type and enter
cp01_scripted_input
as the value for the source type. Then click
Review
:
By default, data will be indexed into Splunk's main index. To change this destination index, select your desired index from the drop-down list in the Index section of the form.
Review the settings. If everything is correct, click Submit.
If everything was successful, you should see a Script input has been created successfully message:
Click on the
Start searching
button. The
Search & Reporting
app will open with the search already populated based on the settings supplied earlier in the recipe. Splunk is now configured to execute the scripted input you provided every 60 seconds, in accordance with the specified interval. You can search for the data returned by the scripted input using the following search:
sourcetype=cp01_scripted_input
When adding a new scripted input, you are directing Splunk to add a new configuration stanza into an inputs.conf file behind the scenes. The Splunk server can contain one or more inputs.conf files, located either in $SPLUNK_HOME/etc/system/local or the local directory of a Splunk app.
After creating a scripted input, Splunk sets up an internal timer and executes the command that you have specified, in accordance with the defined interval. It is important to note that Splunk will only run one instance of the script at a time, so if the script becomes blocked for any reason, it will not be executed again until it is unblocked.
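As a sketch, the resulting stanza in inputs.conf might look similar to the following (the script filename is assumed here as a shell-script equivalent of the provided cp01_scripted_input.bat, and the values are examples only):
[script://$SPLUNK_HOME/bin/scripts/cp01_scripted_input.sh]
interval = 60
sourcetype = cp01_scripted_input
index = main
disabled = false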
Since Splunk 4.2, any output that a scripted input writes to stderr (indicating an error) is captured in the splunkd.log file, which can be useful when attempting to debug the execution of a script. As Splunk indexes its own data by default, you can search for that data and set up an alert on it if necessary.
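For example, stderr output from scripted inputs is logged by the ExecProcessor component, so a search along the following lines against Splunk's internal index can help spot failing scripts and could be saved as an alert (the component and log_level field names are extracted from splunkd events):
index=_internal sourcetype=splunkd component=ExecProcessor log_level=ERROR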
For security reasons, Splunk does not execute scripts located outside of the bin directories mentioned earlier. To overcome this limitation, you can use a wrapper script (such as a shell script in Linux or batch file in Windows) to call any other script located on your machine.
The
Indexing files and directories
recipe
The
Getting data through network ports
recipe
The
Using modular inputs
recipe
Since Splunk 5.0, data input functionality can be extended so that custom input types can be created and shared, while still allowing users to customize them to meet their needs.
Modular inputs build further upon the scripted input model. Originally, any additional functionality required by the user had to be contained within a script. However, this presented a challenge, as no customization of this script could occur from within Splunk itself. For example, pulling data from a source for two different usernames required either two copies of the script or fiddling with command-line arguments within your scripted input configuration.
By leveraging the modular input capabilities, developers are now able to encapsulate their code into a reusable app that exposes parameters in Splunk and allows for configuration through processes familiar to Splunk administrators.
This recipe will walk you through how to install the Command Modular Input, which allows for periodic execution of commands and subsequent indexing of the command output. You will configure the input to collect the data output by the vmstat command in Linux and the systeminfo command in Windows.
