Nagios Core Administration Cookbook - Tom Ryder - E-Book

Tom Ryder

Description

Network monitoring requires significantly more than just pinging hosts. This cookbook will help you to comprehensively test your networks' major functions on a regular basis. "Nagios Core Administration Cookbook" will show you how to use Nagios Core as a monitoring framework that understands the layers and subtleties of the network for intelligent monitoring and notification behaviour. It introduces the reader to methods of extending Nagios Core into a network monitoring solution. The book begins by covering the basic structure of hosts, services, and contacts, and then goes on to discuss advanced usage of checks and notifications, and configuring intelligent behaviour with network paths and dependencies. The cookbook emphasizes using Nagios Core as an extensible monitoring framework. By the end of the book, you will have learned that Nagios Core is capable of doing much more than pinging a host or checking whether websites respond.

Number of pages: 457

Year of publication: 2013

Table of Contents

Nagios Core Administration Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Understanding Hosts, Services, and Contacts
Introduction
Creating a new network host
Getting ready
How to do it...
How it works...
There's more...
See also
Creating a new HTTP service
Getting ready
How to do it...
How it works...
There's more...
See also
Creating a new e-mail contact
Getting ready
How to do it...
How it works...
There's more...
See also
Verifying configuration
Getting ready
How to do it...
How it works...
There's more...
See also
Creating a new hostgroup
Getting ready
How to do it...
How it works...
There's more...
See also
Creating a new servicegroup
Getting ready
How to do it...
How it works...
There's more...
See also
Creating a new contactgroup
Getting ready
How to do it...
How it works...
There's more...
See also
Creating a new time period
Getting ready
How to do it...
How it works...
There's more...
See also
Running a service on all hosts in a group
Getting ready
How to do it...
How it works...
There's more...
See also
2. Working with Commands and Plugins
Introduction
Finding a plugin
Getting ready
How to do it...
How it works...
There's more...
See also
Installing a plugin
Getting ready
How to do it...
How it works...
There's more...
See also
Removing a plugin
Getting ready
How to do it...
How it works...
There's more...
See also
Creating a new command
Getting ready
How to do it...
How it works...
There's more...
See also
Customizing an existing command
Getting ready
How to do it...
How it works...
There's more...
See also
Using an alternative check command for hosts
Getting ready
How to do it...
How it works...
There's more...
See also
Writing a new plugin from scratch
Getting ready
How to do it...
How it works...
There's more...
See also
3. Working with Checks and States
Introduction
Specifying how frequently to check a host or service
Getting ready
How to do it...
How it works...
There's more...
See also
Changing thresholds for PING RTT and packet loss
Getting ready
How to do it...
How it works...
There's more...
See also
Changing thresholds for disk usage
Getting ready
How to do it...
How it works...
There's more...
See also
Scheduling downtime for a host or service
Getting ready
How to do it...
How it works...
There's more...
See also
Managing brief outages with flapping
Getting ready
How to do it...
How it works...
There's more...
See also
Adjusting flapping percentage thresholds for a service
Getting ready
How to do it...
How it works...
There's more...
See also
4. Configuring Notifications
Introduction
Configuring notification periods
Getting ready
How to do it...
How it works...
There's more...
See also
Configuring notification for groups
Getting ready
How to do it...
How it works...
There's more...
See also
Specifying which states to be notified about
Getting ready
How to do it...
How it works...
There's more...
See also
Tolerating a certain number of failed checks
Getting ready
How to do it...
How it works...
There's more...
See also
Automating contact rotation
Getting ready
How to do it...
How it works...
There's more...
See also
Defining an escalation for repeated notifications
Getting ready
How to do it...
How it works...
There's more...
See also
Defining a custom notification method
Getting ready
How to do it...
How it works...
There's more...
See also
5. Monitoring Methods
Introduction
Monitoring PING for any host
Getting ready
How to do it...
How it works...
There's more...
See also
Monitoring SSH for any host
Getting ready
How to do it...
How it works...
There's more...
See also
Checking an alternative SSH port
Getting ready
How to do it...
How it works...
There's more...
See also
Monitoring mail services
Getting ready
How to do it...
How it works...
There's more...
See also
Monitoring web services
Getting ready
How to do it...
How it works...
There's more...
See also
Checking that a website returns a given string
Getting ready
How to do it...
How it works...
There's more...
See also
Monitoring database services
Getting ready
How to do it...
How it works...
There's more...
See also
Monitoring the output of an SNMP query
Getting ready
How to do it...
How it works...
There's more...
See also
Monitoring a RAID or other hardware device
Getting ready
How to do it...
How it works...
See also
Creating an SNMP OID to monitor
Getting ready
How to do it...
How it works...
There's more...
See also
6. Enabling Remote Execution
Introduction
Monitoring local services on a remote machine with NRPE
Getting ready
How to do it...
How it works...
There's more...
See also
Setting the listening address for NRPE
Getting ready
How to do it...
How it works...
See also
Setting allowed client hosts for NRPE
Getting ready
How to do it...
How it works...
There's more...
See also
Creating new NRPE command definitions securely
Getting ready
How to do it...
How it works...
There's more...
See also
Giving limited sudo privileges to NRPE
Getting ready
How to do it...
How it works...
There's more...
See also
Using check_by_ssh with key authentication instead of NRPE
Getting ready
How to do it...
How it works...
There's more...
See also
7. Using the Web Interface
Introduction
Using the Tactical Overview
Getting started
How to do it...
How it works...
There's more...
See also
Viewing and interpreting availability reports
Getting started
How to do it...
How it works...
There's more...
See also
Viewing and interpreting trends
Getting started
How to do it...
How it works...
There's more...
See also
Viewing and interpreting notification history
Getting started
How to do it...
How it works...
There's more...
See also
Adding comments on hosts or services in the web interface
Getting started
How to do it...
How it works...
There's more...
See also
Viewing configuration in the web interface
Getting started
How to do it...
How it works...
There's more...
See also
Scheduling checks from the web interface
Getting started
How to do it...
How it works...
There's more...
See also
Acknowledging a problem via the web interface
Getting started
How to do it...
How it works...
There's more...
See also
8. Managing Network Layout
Introduction
Creating a network host hierarchy
Getting ready
How to do it...
How it works...
There's more...
See also
Using the network map
Getting ready
How to do it...
How it works...
There's more...
See also
Choosing icons for hosts
Getting ready
How to do it...
How it works...
There's more...
See also
Establishing a host dependency
Getting ready
How to do it...
How it works...
There's more...
See also
Establishing a service dependency
Getting ready
How to do it...
How it works...
There's more...
See also
Monitoring individual nodes in a cluster
Getting ready
How to do it...
How it works...
There's more...
See also
Using the network map as an overlay
Getting ready
How to do it...
How it works...
There's more...
See also
9. Managing Configuration
Introduction
Grouping configuration files in directories
Getting ready
How to do it...
How it works...
There's more...
See also
Keeping configuration under version control
Getting ready
How to do it...
How it works...
There's more...
See also
Configuring host roles using groups
Getting ready
How to do it...
How it works...
There's more...
See also
Building groups using regular expressions
Getting ready
How to do it...
How it works...
There's more...
See also
Using inheritance to simplify configuration
Getting ready
How to do it...
How it works...
There's more...
See also
Defining macros in a resource file
Getting ready
How to do it...
How it works...
There's more...
See also
Dynamically building host definitions
Getting ready
How to do it...
How it works...
There's more...
See also
10. Security and Performance
Introduction
Requiring authentication for the web interface
Getting ready
How to do it...
How it works...
There's more...
See also
Using authenticated contacts
Getting ready
How to do it...
How it works...
There's more...
See also
Writing debugging information to a Nagios log file
Getting ready
How to do it...
How it works...
There's more...
See also
Monitoring Nagios performance with Nagiostats
Getting ready
How to do it...
How it works...
There's more...
See also
Improving startup times with pre-cached object files
Getting ready
How to do it...
How it works...
There's more...
See also
Setting up a redundant monitoring host
Getting ready
How to do it...
How it works...
There's more...
See also
11. Automating and Extending Nagios Core
Introduction
Allowing and submitting passive checks
Getting ready
How to do it...
How it works...
There's more...
See also
Submitting passive checks from a remote host with NSCA
Getting ready
How to do it...
How it works...
There's more...
See also
Submitting passive checks in response to SNMP traps
Getting ready
How to do it...
How it works...
There's more...
See also
Setting up an event handler script
Getting ready
How to do it...
How it works...
There's more...
See also
Tracking host and service states with Nagiosgraph
Getting ready
How to do it...
How it works...
There's more...
See also
Reading status into a MySQL database with NDOUtils
Getting ready
How to do it...
How it works...
There's more...
See also
Writing customized Nagios Core reports
Getting ready
How to do it...
How it works...
See also
Getting extra visualizations with NagVis
Getting ready
How to do it...
How it works...
There's more...
See also
Index

Nagios Core Administration Cookbook

Copyright © 2013 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: January 2013

Production Reference: 1180113

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-84951-556-6

www.packtpub.com

Cover Image by Gavin Doremus

Credits

Author

Tom Ryder

Reviewers

Emmanuel Dyan

John C. Kennedy

Pierguido Lambri

Acquisition Editor

Jonathan Titmus

Commissioning Editor

Shreerang Deshpande

Lead Technical Editor

Kedar Bhat

Technical Editor

Lubna Shaikh

Project Coordinator

Abhishek Kori

Proofreaders

Bernadette Watkins

Maria Gould

Indexer

Monica Ajmera

Graphics

Aditi Gajjar

Production Coordinator

Manu Joseph

Cover Work

Manu Joseph

About the Author

Tom Ryder is a systems administrator and former web developer from New Zealand. He uses Nagios Core as part of his "day job" as a systems administrator, monitoring the network for a regional Internet Service Provider. Tom works a great deal with UNIX-like systems, being a particular fan of GNU/Linux, and writes about usage of open source command line development tools on his blog Arabesque: http://www.blog.sanctum.geek.nz.

Thanks are of course due to Ethan Galstad and the Nagios Core development team for writing and maintaining Nagios Core, along with the reference manual against which the book's material was checked.

Thanks are also due to my loving partner Chantelle Potroz for her patience and support as this book was being written, and to my employer James Watts of Inspire Net Limited for permission to write it.

Thanks also to Shreerang Deshpande and Kedar Bhat from Packt for their patience and technical guidance during the book's development.

The map of Australia used for the background to the network map in Chapter 8, Managing Network Layout, is sampled from the public domain images available on the superb Natural Earth website, viewable at http://www.naturalearthdata.com/.

About the Reviewers

Emmanuel Dyan is an expert in web development and in all the technologies gravitating around the Web: servers, network infrastructures, languages, and software.

He has been managing his own company, iNet Process, since 2004. He opened a branch in India in 2006 for its development needs, and recruited the staff. He then had to define the working procedures as well as the tools and set up the work environment. iNet Process implements and hosts CRM solutions based on SugarCRM. Its clients are mainly from France and are big, medium, as well as small companies.

Emmanuel teaches development languages, IT strategies, and CMS (Drupal) in a French University (Paris-Est Marne-la-Vallée) for students preparing for a Master's degree.

John C. Kennedy has been administering UNIX and Linux servers and workstations since 1997. He has experience with Red Hat, SUSE, Ubuntu, Debian, Solaris, and HP-UX. John is also experienced in BASH shell scripting and is currently teaching himself Python and Ruby. John has also been a technical editor for various publishers for over 10 years specializing in open source related books.

When John is not geeking out in front of either a home or work computer, he helps out with a German Shepherd rescue centre in Virginia by fostering some great dogs or helping the centre with their IT needs.

I would like to thank my family (my wonderful wife, Michele, my intelligent and caring daughter Denise, and my terrific and smart son, Kieran) for supporting the (sometimes) silly things and not so silly things I do. I'd also like to thank my current foster dogs for their occasional need to keep their legs crossed a little longer while I test things out from the book and forget they are there.

Pierguido Lambri has more than 10 years of experience with GNU/Linux and with the system administration side. He has worked with many operating systems (proprietary and open source), but he's a fan of the open source movement. Interested in everything that has to do with IT, he always likes to learn about new technologies.

Thanks to Abhishek Kori and Kedar Bhat for the opportunity of the book review.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt
Copy and paste, print and bookmark content
On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Preface

Nagios Core, the open source version of the Nagios monitoring framework, is an industry standard for network monitoring hosted on Unix-like systems, such as GNU/Linux or BSD. It is very often used by network and system administrators for checking connectivity between hosts and ensuring that network services are running as expected.

Where home-grown scripts performing network checks can rapidly become unmaintainable and difficult for newer administrators to customize safely, Nagios Core provides a rigorous and configurable monitoring framework to make checks in a consistent manner and to alert appropriate people and systems of any problem it detects.

This makes Nagios Core a very general monitoring framework rather than an out-of-the-box monitoring solution, which is known to make it a little unfriendly to beginners and something of a "black box", even to otherwise experienced administrators. Busy administrators charged with setting up a Nagios Core system will often set it up to send PING requests to a set of hosts every few minutes and send them an e-mail about any problem, and otherwise never touch it. More adventurous administrators new to the system might instate a few HTTP checks to make sure that company websites respond.

Nagios Core is capable of a great deal more than that, and this book's recipes are intended to highlight all of the different means of refining and controlling checks, notifications, and reporting for Nagios Core, rather than being a list of instructions for using specific plugins, of which there are many hundreds available online at the Nagios Exchange at http://exchange.nagios.org/. The book's fundamental aim is to get administrators excited about the possibilities of Nagios Core beyond elementary default checking behavior, so that they can use much more of the framework's power, and make it into the centerpiece of their network monitoring.

This also includes installing and even writing custom plugins beyond the standard Nagios Plugins set, writing and refining one's own checks, working with the very powerful Simple Network Management Protocol (SNMP), recording and reporting of performance data, refining notification behavior to only send appropriate notifications at appropriate times to appropriate people or systems, basic visualization options, identifying breakages in network paths, clever uses of the default web interface, and even extending Nagios Core with other open source programs, all in order to check virtually any kind of host property or network service on any network.

Where possible, this book focuses on add-ons written by the Nagios team themselves, particularly NRPE and NSCA. It omits discussion of the popular NRPE replacement check_mk, and of popular forks of Nagios Core such as Icinga. In the interest of conferring an in-depth understanding of advanced Nagios Core configuration, it also does not discuss any configuration frontends or wizards such as NConf. Finally, as a Packt open source series book focusing on the use of the freely available Nagios Core, it also does not directly discuss the use of Nagios XI, the commercial version of the software supported by the Nagios team. This is done to instill a thorough understanding of Nagios Core itself, rather than to reflect personal opinions of the author; curious administrators should definitely investigate all of these projects (particularly check_mk).

What this book covers

Chapter 1, Understanding Hosts, Services, and Contacts, explains the basic building blocks of Nagios Core configurations and how they interrelate, and discusses using groups with each one.

Chapter 2, Working with Commands and Plugins, explains the architecture of plugins and commands, including installing new plugins and defining custom uses of existing ones, and walks through writing a new plugin in Perl.

Chapter 3, Working with Checks and States, explains how Nagios Core performs its checks and how to customize that behavior, including scheduling downtime for hosts and services, and managing "flapping" for hosts or services that keep going up and down.

Chapter 4, Configuring Notifications, explains the logic of how Nagios Core decides on what basis to notify, and when and to whom, including examples of implementing a custom notification method, escalating notifications for problems that aren't fixed after a certain period of time, and scheduling contact rotation.

Chapter 5, Monitoring Methods, gives examples of usage of some of the standard Nagios Plugins set, moving from basic network connectivity checks with PING and HTTP to more complex and powerful checks involving SNMP usage.

Chapter 6, Enabling Remote Execution, shows how to use NRPE as a means of working around the problem of not being able to check system properties directly over the network, including a demonstration of the more advanced method of check_by_ssh.

Chapter 7, Using the Web Interface, shows some less-used features of the web interface to actually control how Nagios Core is behaving and to see advanced reports, rather than simply viewing current state information. Use of the network map is not discussed here but in the next chapter.

Chapter 8, Managing Network Layout, explains how to make Nagios Core aware of the structure and function of your network, with a focus on hosts and services depending on one another to function correctly, including monitoring clusters, and using that layout information to build a network status map, optionally with icons and a background.

Chapter 9, Managing Configuration, shows how to streamline, refine, and control Nagios Core configuration at a low level without the use of frontends. It focuses on the clever use of groups, templates, and macros, and gives an example of generating configuration programmatically with the templating language m4.

Chapter 10, Security and Performance, shows how to manage simple access control, debug runtime problems, and keep tabs on how Nagios Core is performing, and demonstrates basic monitoring redundancy.

Chapter 11, Automating and Extending Nagios Core, explains how to submit check results from other programs (including NSCA) to provide information about external processes via the commands file, and introduces a few popular add-ons (NDOUtils, NagVis, and Nagiosgraph).

What you need for this book

In an attempt to work with a "standard" installation of Nagios Core, this book's recipes assume that Nagios Core 3.0 or later and the Nagios plugins set have been installed in /usr/local/nagios, by following the Nagios Quickstart Guides available at http://nagios.sourceforge.net/docs/3_0/quickstart.html.

If your system's package repositories include a package for Nagios Core 3.0 or later that you would prefer to use, this should still be possible, but the paths of all the files are likely to be very different. This is known to be a particular issue with the nagios3 package on Debian or Ubuntu systems. If you are familiar with the differences in the installation layout that your packaging system imposes, then you should still be able to follow the recipes with only some path changes.

For the screenshots of the web interface, the familiar Nagios Classic UI is used (with white-on-black menu), which was the default from Version 3.0 for several years before the newer "Exfoliation" style (with black-on-white menu) became the default more recently. Some of the graphical elements and styles are different, but everything has the same text and is in the same place, so it's not necessary to install the Classic UI to follow along with the recipes if you already have the Exfoliation style installed.

If you really want the Classic UI so that what you're seeing matches the screenshots exactly, you can install it by adding this as the final step of your installation process:

# make install-classicui

This book was written while the alpha of Nagios Core 4.0 was being tested. I have reviewed the change logs of the alpha release so far, and am reasonably confident that all of the recipes should work in this newer version's final release. If after Nagios Core 4.0 is released you do find some issues with using some of the recipes for it, please see the "Errata" section to let the publisher and author know.

Who this book is for

This book is aimed at system and network administrators comfortable with basic Unix-like system administration via the command line. It is best suited for GNU/Linux administrators, but should work fine for BSD administrators too. It has particular focus on the kind of administrator identified in the preface: one who is comfortable working with their Unix-like system, may well have a basic Nagios Core installation ready with some PING checks, and now wants to learn how to use much more of the framework's power and understand its configuration in more depth.

Administrators should be comfortable with installing library dependencies for the extensions, plugins, and add-ons discussed in the book. An effort is made to mention any dependencies; however, how these are best installed will depend on the system and its package repository. In almost all cases this should amount to installing some common libraries and their headers from a packaging system. Debian and Ubuntu package names are given for some more complex cases.

The easier recipes in the first five chapters involve some recap of the basics of configuring Nagios Core objects. Users completely new to Nagios Core who have just installed it will almost certainly want to start with Chapter 1, Understanding Hosts, Services, and Contacts, after completing the Nagios Quickstart Guide, as later chapters assume a fair amount of knowledge.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "Nagios Core will only need whatever information the PING tool would need for its own check_ping command".

A block of code is set as follows:

define service {
    use                  generic-service
    host_name            sparta.naginet
    service_description  HTTP
    check_command        check_http
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

define host {
    host_name              sparta.naginet
    alias                  sparta
    address                10.128.0.21
    max_check_attempts     3
    check_period           24x7
    check_command          check-host-alive
    contacts               nagiosadmin
    notification_interval  60
    notification_period    24x7
}

Any command-line input or output is written as follows:

# cd /usr/local/nagios/etc/objects
# vi sparta.naginet.cfg

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "If the server restarted successfully, the web interface should show a brand new host in the Hosts list, in PENDING state as it waits to run a check that the host is alive".

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send us an e-mail and mention the book title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website, or added to any list of existing errata, under the Errata section of that title.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Understanding Hosts, Services, and Contacts

In this chapter we will cover the following recipes:

Creating a new network host
Creating a new HTTP service
Creating a new e-mail contact
Verifying configuration
Creating a new hostgroup
Creating a new servicegroup
Creating a new contactgroup
Creating a new time period
Running a service on all hosts in a group

Introduction

Nagios Core is appropriate for monitoring services and states on all sorts of hosts, and one of its primary advantages is that the configuration can be as simple or as complex as required. Many Nagios Core users will only ever use the software as a way to send PING requests to a few hosts on their local network or possibly the Internet, and to send e-mail or pager messages to the administrator if they don't get any replies. Nagios Core is capable of monitoring vastly more complex systems than this, scaling from simple LAN configurations to being the cornerstone for monitoring an entire network.

However, for both simple and complex configurations of Nagios Core, the most basic building blocks of configuration are hosts, services, and contacts. These are the three things that administrators of even very simple networking setups will end up editing and probably creating. If you're a beginner to Nagios Core, then you might have changed a hostname here and there or copied a stanza in a configuration to get it to do what you want. In this chapter, we're going to look at what these configurations do in a bit more depth than that.

In a Nagios Core configuration:

- Hosts usually correspond to some sort of computer. This could be a physical or virtual machine accessible over the network, or the monitoring server itself. Conceptually, however, a host can represent any kind of network entity, such as the endpoint of a VPN.
- Services usually correspond to an arrangement for Nagios Core to check something about a host, whether that's something as simple as getting PING replies from it, or something more complicated such as checking that the value of an SNMP OID is within acceptable bounds.
- Contacts define a means to notify someone when events happen to our services on our hosts, such as not being able to get a PING response, or being unable to send a test e-mail message.

In this chapter, we'll add all three of these, and we'll learn how to group their definitions together to make the configuration more readable, and to work with hosts in groups rather than having to edit each one individually. We'll also set up a custom time period for notifications, so that hardworking system administrators like us don't end up getting paged at midnight unnecessarily!

Creating a new network host

In this recipe, we'll start with the default Nagios Core configuration, and set up a host definition for a server that responds to PING on our local network. The end result will be that Nagios Core will add our new host to its internal tables when it starts up, and will automatically check it (probably using PING) on a regular basis. In this example, I'll use my Nagios Core monitoring server with a Domain Name System (DNS) name of olympus.naginet, and add a host definition for a webserver with a DNS name of sparta.naginet. This is all on my local network – 10.128.0.0/24.

Getting ready

You'll need a working Nagios Core 3.0 or greater installation with a web interface, with all the Nagios Core Plugins installed. If you have not yet installed Nagios Core, then you should start with the QuickStart guide: http://nagios.sourceforge.net/docs/3_0/quickstart.html.

We'll assume that the configuration file that Nagios Core reads on startup is located at /usr/local/nagios/etc/nagios.cfg, as is the case with the default install. It shouldn't matter where you include this new host definition in the configuration, as long as Nagios Core is going to read the file at some point, but it might be a good idea to give each host its own file in a separate objects directory, which we'll do here. You should have access to a shell on the server, and be able to write text files using an editor of your choice; I'll use vi. You will need root privileges on the server via su or sudo.

You should know how to restart Nagios Core on the server, so that the configuration you're going to add gets applied. It shouldn't be necessary to restart the whole server to do this! A common location for the startup/shutdown script on Unix-like hosts is /etc/init.d/nagios, which I'll use here.

You should also have the hostname or IP address of the server you'd like to monitor ready. It's good practice to use the IP address if you can, which will mean your checks keep working even if DNS is unavailable. You shouldn't need the subnet mask or anything like that; Nagios Core will only need whatever information the PING tool would need for its own check_ping command.

Finally, you should test things first; confirm that you're able to reach the host from the Nagios Core server via PING by checking directly from the shell, to make sure your network stack, routes, firewalls, and netmasks are all correct:

    tom@olympus:~$ ping 10.128.0.21
    PING sparta.naginet (10.128.0.21) 56(84) bytes of data.
    64 bytes from sparta.naginet (10.128.0.21): icmp_req=1 ttl=64 time=0.149 ms

How to do it...

We can create the new host definition for sparta.naginet as follows:

Change directory to /usr/local/nagios/etc/objects, and create a new file called sparta.naginet.cfg:
    # cd /usr/local/nagios/etc/objects
    # vi sparta.naginet.cfg
Write the following into the file, changing the values in bold as appropriate for your own setup:
    define host {
        host_name              sparta.naginet
        alias                  sparta
        address                10.128.0.21
        max_check_attempts     3
        check_period           24x7
        check_command          check-host-alive
        contacts               nagiosadmin
        notification_interval  60
        notification_period    24x7
    }
Change directory to /usr/local/nagios/etc, and edit the nagios.cfg file:
    # cd ..
    # vi nagios.cfg
At the end of the file add the following line:
    cfg_file=/usr/local/nagios/etc/objects/sparta.naginet.cfg
Restart the Nagios Core server:
    # /etc/init.d/nagios restart

If the server restarted successfully, the web interface should show a brand new host in the Hosts list, in PENDING state as it waits to run a check that the host is alive.

In the next few minutes, it should change to green to show that the check passed and the host is UP, assuming that the check succeeded.

If the test failed and Nagios Core was not able to get a PING response from the target machine after three tries, for whatever reason, then the host would probably be shown in a DOWN state instead.

How it works...

The configuration we included in this section adds a host to Nagios Core's list of hosts. It will periodically check the host by sending a PING request, checking to see if it receives a reply, and updating the host's status as shown in the Nagios Core web interface accordingly. We haven't defined any other services to check for this host yet, nor have we specified what action it should take if the host is down. However, the host itself will be automatically checked at regular intervals by Nagios Core, and we can view its state in the web interface at any time.

The directives we defined in the preceding configuration are explained as follows:

- host_name: This defines the hostname of the machine, used internally by Nagios Core to refer to the host. It will end up being used in other parts of the configuration.
- alias: This defines a more recognizable human-readable name for the host; it appears in the web interface. It could also be used for a full-text description of the host.
- address: This defines the IP address of the machine. This is the actual value that Nagios Core will use for contacting the server; using an IP address rather than a DNS name is generally best practice, so that the checks continue to work even if DNS is not functioning.
- max_check_attempts: This defines the number of times Nagios Core should try to repeat the check if checks fail. Here, we've defined a value of 3, meaning that Nagios Core will try two more times to PING the target host after first finding it down.
- check_period: This references the time period during which the host should be checked. 24x7 is a time period defined in the default configuration for Nagios Core. This is a sensible value for hosts, as it means the host will always be checked. This defines when Nagios Core will check the host, and not when it will notify anyone.
- check_command: This references the command that will be used to check whether the host is UP, DOWN, or UNREACHABLE. In this case, a QuickStart Nagios Core configuration defines check-host-alive as a PING check, which is a good test of basic network connectivity, and a sensible default for most hosts. This directive is actually not required to make a valid host, but you will want to include it under most circumstances; without it, no checks will be run.
- contacts: This references the contact or contacts that will be notified about state changes in the host. In this instance, we've used nagiosadmin, which is defined in the QuickStart Nagios Core configuration.
- notification_interval: This defines how often Nagios Core should repeat notifications for the host while it is having problems. Here, we've used a value of 60, which corresponds to 60 minutes or one hour.
- notification_period: This references the time period during which Nagios Core should send out notifications, if there are problems. Here, we're again using the 24x7 time period; for other hosts, another time period such as workhours might be more appropriate.

Note that we added the definition in its own file called sparta.naginet.cfg, and then referred to it in the main nagios.cfg configuration file. This is simply a conventional way of laying out hosts, and it happens to be quite a tidy way to manage things to keep definitions in their own files.
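If you end up managing many hosts, the one-file-per-host layout lends itself to being generated rather than typed by hand. The following Python sketch is only an illustration of that idea, assuming the default install paths and the directive values used in this recipe; the function names render_host and cfg_file_line are made up for this example and are not part of Nagios Core.

```python
# Hypothetical helper: builds the text of a host definition and the matching
# cfg_file line for nagios.cfg. It only produces strings; writing the files
# and restarting Nagios Core is left to the administrator.

OBJECTS_DIR = "/usr/local/nagios/etc/objects"  # default install path (assumed)


def render_host(directives):
    """Render a dict of directives as a 'define host' block."""
    width = max(len(name) for name in directives)
    lines = ["define host {"]
    for name, value in directives.items():
        lines.append("    {0}  {1}".format(name.ljust(width), value))
    lines.append("}")
    return "\n".join(lines) + "\n"


def cfg_file_line(host_name, objects_dir=OBJECTS_DIR):
    """Return the line that makes nagios.cfg read this host's own file."""
    return "cfg_file={0}/{1}.cfg".format(objects_dir, host_name)


sparta = {
    "host_name": "sparta.naginet",
    "alias": "sparta",
    "address": "10.128.0.21",
    "max_check_attempts": "3",
    "check_period": "24x7",
    "check_command": "check-host-alive",
    "contacts": "nagiosadmin",
    "notification_interval": "60",
    "notification_period": "24x7",
}

if __name__ == "__main__":
    print(render_host(sparta))
    print(cfg_file_line(sparta["host_name"]))
```

Keeping the directives in a dictionary means a second host only needs a second dictionary; the layout of the generated file stays the same as the one we wrote by hand.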

There's more...

There are a lot of other useful parameters for hosts, but the ones we've used include everything that's required.

While this is a perfectly valid way of specifying a host, it's more typical to define a host based on some template, with definitions of how often the host should be checked, who should be contacted when its state changes and on what basis, and similar properties. Nagios Core's QuickStart sample configuration defines a simple template host called generic-host, which could be used by extending the host definition with the use directive:

    define host {
        use                 generic-host
        name                sparta
        host_name           sparta.naginet
        address             10.128.0.21
        max_check_attempts  3
        contacts            nagiosadmin
    }

This uses all the parameters defined for generic-host, and then adds on the details of the specific host that needs to be checked. Note that if you use generic-host, then you will need to define check_command in your host definition. If you're curious to see what's defined in generic-host, then you can find its definition in /usr/local/nagios/etc/objects/templates.cfg.
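To make the effect of the use directive more concrete, here is a small Python model of how a definition picks up values from a template. It is a sketch of the idea rather than Nagios Core's actual implementation: it handles only a single template per definition, the resolve function is invented for this example, and the values given for generic-host are illustrative stand-ins, so check templates.cfg for the real ones.

```python
# A simplified model of template inheritance via "use".
# Assumption: one template per definition; Nagios Core itself is more flexible.

NOT_INHERITED = ("use", "name", "register")  # bookkeeping directives


def resolve(definition, templates):
    """Merge a definition with the template named in its 'use' directive.

    Values set directly on the definition win over inherited ones.
    """
    merged = {}
    parent = definition.get("use")
    if parent is not None:
        merged.update(resolve(templates[parent], templates))
    for name, value in definition.items():
        if name not in NOT_INHERITED:
            merged[name] = value
    return merged


# Illustrative template contents only; not copied from templates.cfg.
templates = {
    "generic-host": {
        "name": "generic-host",
        "notifications_enabled": "1",
        "notification_period": "24x7",
        "register": "0",
    }
}

sparta = {
    "use": "generic-host",
    "host_name": "sparta.naginet",
    "address": "10.128.0.21",
    "max_check_attempts": "3",
    "check_command": "check-host-alive",  # supplied by us, as noted above
    "contacts": "nagiosadmin",
}

if __name__ == "__main__":
    for name, value in sorted(resolve(sparta, templates).items()):
        print("{0:<22} {1}".format(name, value))
```

The point to take away is the order of precedence: the template supplies defaults, and anything written in the host's own definition overrides them.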

See also

- The Using an alternative check command for hosts recipe in Chapter 2, Working with Commands and Plugins
- The Specifying how frequently to check a host recipe in Chapter 3, Working with Checks and States
- The Grouping configuration files in directories and Using inheritance to simplify configuration recipes in Chapter 9, Managing Configuration

Creating a new HTTP service

In this recipe, we'll create a new service to check on an existing host. Specifically, we'll check our sparta.naginet server to see if it's responding to HTTP requests on the usual HTTP TCP port 80. To do this, we'll be using a predefined command called check_http, which in turn uses one of the standard set of Nagios Core plugins, also called check_http. If you don't yet have a web server defined as a host in Nagios Core, then you may like to try the Creating a new network host recipe in this chapter.

After we've done this, not only will our host be checked for a PING response by its check_command, but Nagios Core will also run a periodic check to ensure that the HTTP service on that machine is responding to requests.

Getting ready

You'll need a working Nagios Core 3.0 or greater installation with a web interface, all the Nagios Plugins installed, and at least one host defined. If you need to set up a host definition for your web server first, then you might like to read the Creating a new network host recipe in this chapter, for which the requirements are the same.

It would be a good idea to test that the Nagios Core server is actually able to contact the web server first, to ensure that the check we're about to set up should succeed. The standard telnet tool is a fine way to test that a response comes back from TCP port 80 as we would expect from a web server:

    tom@olympus:~$ telnet sparta.naginet 80
    Trying 10.128.0.21...
    Connected to sparta.naginet.
    Escape character is '^]'.

How to do it...

We can create the service definition for sparta.naginet as follows:

Change to the directory containing the file in which the sparta.naginet host is defined, and edit it as follows:
    # cd /usr/local/nagios/etc/objects
    # vi sparta.naginet.cfg
Add the following code snippet to the end of the file, substituting in the value of the host's host_name directive:
    define service {
        host_name              sparta.naginet
        service_description    HTTP
        check_command          check_http
        max_check_attempts     3
        check_interval         5
        retry_interval         1
        check_period           24x7
        notification_interval  60
        notification_period    24x7
        contacts               nagiosadmin
    }
Restart the Nagios Core server:
    # /etc/init.d/nagios restart

If the server restarted successfully, the web interface should show a new service under the Services section, in PENDING state as the service awaits its first check.

Within a few minutes, the service's state should change to OK once the check has run and succeeded with an HTTP/1.1 200 OK response, or similar.

If the check had problems, perhaps because the HTTP daemon isn't running on the target server, then the check may show CRITICAL instead. This probably doesn't mean the configuration is broken; it more likely means the network or web server isn't working.

How it works...

The configuration we've added adds a simple service check definition for an existing host, to check up to three times whether the HTTP daemon on that host is responding to a simple HTTP/1.1 request. If Nagios Core can't get a response to its check, then it will flag the state of the service as CRITICAL, and will try again up to two more times before sending a notification. The service will be visible in the Nagios Core web interface and we can check its state at any time. Nagios Core will continue testing the server on a regular basis and flagging whether the checks were successful or not.

It's important to note that the service is like a property of a particular host; we define a service to check for a specific host, in this case, the sparta.naginet web server. That's why it's important to get the definition for host_name right.

The directives we defined in the preceding configuration are as follows:

- host_name: This references the host definition for which this service should apply. This will be the same as the host_name directive for the appropriate host.
- service_description: This is a name for the service itself, something human-recognizable that will appear in alerts and in the web interface for the service. In this case, we've used HTTP.
- check_command: This references the command that should be used to check the service's state. Here, we're referring to a command defined in Nagios Core's default configuration called check_http, which refers to a plugin of the same name in the Nagios Core Plugins set.
- max_check_attempts: This defines the number of times Nagios Core should attempt to re-check the service after finding it in a state other than OK.
- check_interval: This defines how long Nagios Core should wait between checks when the service is OK, or after the number of checks given in max_check_attempts has been reached.
- retry_interval: This defines how long Nagios Core should wait between retrying checks after first finding them in a state other than OK.
- check_period: This references the time period during which Nagios Core should run checks of the service. Here we've used the sensible 24x7 time period, as defined in Nagios Core's default configuration. Note that this can be different from notification_period; we can check the service's status without necessarily notifying a contact.
- notification_interval: This defines how long Nagios Core should wait between re-sending notifications when a service is in a state other than OK.
- notification_period: This references the time period during which Nagios Core should send notifications if it finds the service in a problem state. Here we've again used 24x7, but for some less critical services it might be appropriate to use a time period such as workhours.
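The interplay between max_check_attempts, check_interval, and retry_interval is easier to see as a timeline. The following Python sketch walks through a series of check results using the values from this recipe. It is a deliberately simplified model and not Nagios Core's scheduler: every failed check is treated as CRITICAL, repeat notifications from notification_interval are left out, and the function name simulate_service_checks is made up for this example. SOFT and HARD are the terms Nagios Core uses for an unconfirmed and a confirmed problem respectively.

```python
def simulate_service_checks(results, max_check_attempts=3,
                            check_interval=5, retry_interval=1):
    """Walk through a sequence of check results (True means the check passed).

    Returns a list of (minute, state, notify) tuples, where state is one of
    "OK", "SOFT CRITICAL" or "HARD CRITICAL".
    """
    timeline = []
    minute = 0
    attempt = 1            # which attempt the next failing check would be
    hard_problem = False   # True once the problem has been confirmed

    for passed in results:
        if passed:
            # A recovery is only announced if a problem was announced first.
            timeline.append((minute, "OK", hard_problem))
            hard_problem = False
            attempt = 1
            minute += check_interval
        elif hard_problem:
            # Already notified; keep checking at the normal interval.
            timeline.append((minute, "HARD CRITICAL", False))
            minute += check_interval
        elif attempt >= max_check_attempts:
            # The last retry failed too: the problem is confirmed, so notify.
            hard_problem = True
            timeline.append((minute, "HARD CRITICAL", True))
            minute += check_interval
        else:
            # Not confirmed yet: retry sooner, without notifying anyone.
            timeline.append((minute, "SOFT CRITICAL", False))
            attempt += 1
            minute += retry_interval
    return timeline


if __name__ == "__main__":
    for minute, state, notify in simulate_service_checks(
            [True, False, False, False, False, True]):
        print("t+{0:>2} min  {1:<13}  notify={2}".format(minute, state, notify))
```

With the values from our service definition, a service that fails at the five-minute mark is retried at six and seven minutes, and only the third consecutive failure triggers a notification. After that, checks go back to the five-minute interval until the service recovers.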

Note that we added the service definition in the same file as defining the host, and directly after it. We can actually place the definition anywhere we like, but this happens to be a good way of keeping things organized.

There's more...

The service we've set up to monitor on sparta.naginet is an HTTP service, but that's just one of many possible services we could monitor on our network. Nagios Core defines many different commands for its core plugin set, such as check_smtp, check_dns, and so on. These commands, in turn, all point to programs that actually perform a check and return the results to the Nagios Core server to be dealt with. The important thing to take away from this is that a service can monitor pretty much anything, and there are hundreds of plugins for common network monitoring checks available on the Nagios Exchange website: http://exchange.nagios.org/.
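To give a feel for what such a program looks like, here is a rough Python imitation of an HTTP check. It is not the check_http plugin, and it shouldn't replace it; it only illustrates the convention plugins follow of printing one line of status text and exiting with 0, 1, 2, or 3 for OK, WARNING, CRITICAL, or UNKNOWN. The way status codes are mapped to states in classify is an assumption made for this sketch, and the function names are invented.

```python
import http.client
import sys

# Exit statuses used by Nagios plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(status_code):
    """Map an HTTP status code to a plugin state (a simplification)."""
    if 200 <= status_code < 400:
        return OK
    if 400 <= status_code < 500:
        return WARNING
    return CRITICAL


def check_http(host, port=80, timeout=10):
    """Make one HTTP request and return (exit_status, one_line_message)."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("GET", "/")
        response = conn.getresponse()
        conn.close()
    except (OSError, http.client.HTTPException) as error:
        # No usable answer at all is treated as a CRITICAL problem.
        return CRITICAL, "HTTP CRITICAL - {0}".format(error)
    state = classify(response.status)
    return state, "HTTP {0} - {1} {2}".format(
        LABELS[state], response.status, response.reason)


def main(argv):
    """Print the status line and return the exit status for the shell."""
    if len(argv) != 2:
        print("Usage: {0} HOST".format(argv[0]))
        return UNKNOWN
    status, message = check_http(argv[1])
    print(message)
    return status

# To run this as a script, finish the file with: sys.exit(main(sys.argv))
```

Nagios Core doesn't care what language a plugin is written in. It runs the program, reads the line of output, and uses the exit status to decide which state the service is in.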

There are a great many more possible directives for services, and in practice it's more likely for even simple setups that we'll want to extend a service template for our service. This allows us to define values that we might want for a number of services, such as how long they should be in a CRITICAL state before a notification event takes place and someone gets contacted to deal with the problem.

One such template that Nagios Core's default configuration defines is called generic-service, and we can use it as a basis for our new service by referring to it with the use keyword:

    define service {
        use                  generic-service
        host_name            sparta.naginet
        service_description  HTTP
        check_command        check_http
    }

This may work well for you, as there are a lot of very sensible default values set by the generic-service template, which makes things a lot easier. We can inspect these values by looking at the template's definition in /usr/local/nagios/etc/objects/templates.cfg. This is the same file that includes the generic-host definition that we may have used earlier.

See also

- The Creating a new servicegroup recipe in this chapter
- The Specifying how frequently to check a service and Scheduling downtime for a host or service recipes in Chapter 3, Working with Checks and States
- The Monitoring web services recipe in Chapter 5, Monitoring Methods

Creating a new e-mail contact

In this recipe, we'll create a new contact that hosts and services can refer to, chiefly so that the contact is informed when those hosts or services change state. We'll use the simplest example of setting up an e-mail contact, and configuring an existing host so that this person receives an e-mail message when Nagios Core's host checks fail and the host is apparently unreachable. In this instance, I'll make it e-mail me at [email protected] whenever my host, sparta.naginet, goes from DOWN to UP state, or vice-versa.

Getting ready

You should have a working Nagios Core 3.0 or greater server running, with a web interface and at least one host to check. If you need to do this first, see the Creating a new network host recipe in this chapter.

For this particular kind of contact, you'll also need to have a working SMTP daemon running on the monitoring server, such as Exim or Postfix. You should verify that you're able to send messages to the target address, and that they're successfully delivered to the mailserver you expect.

How to do it...

We can add a simple new contact to the Nagios Core configuration as follows:

Change to Nagios Core's object configuration directory; ideally it should contain a file that's devoted to contacts, such as contacts.cfg here, and edit that file:
    # cd /usr/local/nagios/etc/objects
    # vi contacts.cfg
Add the following contact definition to the end of the file, substituting your own values for the properties in bold as you need them:
    define contact {
        contact_name                   spartaadmin
        alias                          Administrator of sparta.naginet
        email                          [email protected]
        host_notification_commands     notify-host-by-email
        host_notification_options      d,u,r
        host_notification_period       24x7
        service_notification_commands  notify-service-by-email
        service_notification_options   w,u,c,r
        service_notification_period    24x7
    }
Edit the definition for the sparta.naginet host, and add or replace its contacts directive so that it refers to our new spartaadmin contact:
    define host {
        host_name              sparta.naginet
        alias                  sparta
        address                10.128.0.21
        max_check_attempts     3
        check_period           24x7
        check_command          check-host-alive
        contacts               spartaadmin
        notification_interval  60
        notification_period    24x7
    }
Restart the Nagios Core server:
    # /etc/init.d/nagios restart

With this done, the next time our host changes its state, we should receive an e-mail notification describing the change.

When the host becomes available again, we should receive a corresponding recovery message.

If possible, it's worth testing this setup with a test host that we can safely bring down and then up again, to check that we receive the appropriate notifications.

How it works...

This configuration adds a new contact to the Nagios Core configuration, and references it in one of the hosts as the appropriate contact to use when the host has problems.

We've defined the required directives for the contact, and a couple of others as follows:

- contact_name: This defines a unique name for the contact, so that we can refer to it in host and service definitions, or anywhere else in the Nagios Core configuration.
- alias: This defines a human-friendly name for the contact, perhaps a brief explanation of who the person or group is and/or for what they're responsible.
- email: This defines the e-mail address of the contact, since we're going to be sending messages by e-mail.
- host_notification_commands: This defines the command or commands to be run when a state change on a host prompts a notification for the contact. In this case, we're going to e-mail the contact the results with a predefined command called notify-host-by-email.
- host_notification_options: This specifies the different kinds of host events for which this contact should be notified. Here, we're using d,u,r, which means that this contact will receive notifications for a host going DOWN, becoming UNREACHABLE, or coming back UP.
- host_notification_period: This defines the time period during which this contact can be notified of any host events. If a host notification is generated and defined to be sent to this contact, but it falls outside this time period, then the notification will not be sent.
- service_notification_commands: This defines the command or commands to be run when a state change on a service prompts a notification for this contact. In this case, we're going to e-mail the contact the results with a predefined command called notify-service-by-email.
- service_notification_options: This specifies the different kinds of service events for which this contact should be notified. Here, we're using w,u,c,r, which means we want to receive notifications about the services entering the WARNING, UNKNOWN, or CRITICAL states, and also when they recover and go back to being in the OK state.
- service_notification_period: This is the same as host_notification_period, except that this directive refers to notifications about services, and not hosts.
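The option letters are terse, so it may help to see them treated as a simple filter. This Python sketch shows how the d,u,r and w,u,c,r strings from our contact could be matched against a new state. It is a model for illustration only: it ignores the notification periods and the flapping and downtime options, it treats any return to UP or OK as a recovery, and the function name wants_notification is made up for this example.

```python
# Which option letter covers which state. Note that "u" means UNREACHABLE
# for hosts but UNKNOWN for services.
HOST_OPTION_FOR_STATE = {"DOWN": "d", "UNREACHABLE": "u", "UP": "r"}
SERVICE_OPTION_FOR_STATE = {
    "WARNING": "w",
    "UNKNOWN": "u",
    "CRITICAL": "c",
    "OK": "r",
}


def wants_notification(options, new_state, option_for_state):
    """Return True if the comma-separated options cover the new state."""
    enabled = {letter.strip() for letter in options.split(",")}
    if "n" in enabled:
        # "n" stands for none: the contact receives nothing of this kind.
        return False
    return option_for_state[new_state] in enabled


if __name__ == "__main__":
    for state in ("DOWN", "UNREACHABLE", "UP"):
        print("host", state,
              wants_notification("d,u,r", state, HOST_OPTION_FOR_STATE))
    for state in ("WARNING", "UNKNOWN", "CRITICAL", "OK"):
        print("service", state,
              wants_notification("w,u,c,r", state, SERVICE_OPTION_FOR_STATE))
```

Removing a letter is how we quieten a contact down. For example, a contact with service options of c,r would hear about CRITICAL problems and recoveries, but not about WARNING or UNKNOWN states.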

Note that we placed the definition for the contact in contacts.cfg, which is a reasonably sensible place. However, we can place the contact definition in any file that Nagios Core will read as part of its configuration; we can organize our hosts, services, and contacts any way we like. It helps to choose some sort of system, so we can easily identify where definitions are likely to be when we need to add, change, or remove them.

There's more...

If we define a lot of contacts with similar options, it may be appropriate to have individual contacts extend contact templates, so that they can inherit those common settings. The QuickStart Nagios Core configuration includes such a template, called generic-contact. We can define our new contact as an extension of this template, as follows:

    define contact {
        use           generic-contact
        contact_name  spartaadmin
        alias         Administrator of sparta.naginet
        email         [email protected]
    }

To see the directives defined for generic-contact, you can inspect its definition in the /usr/local/nagios/etc/objects/templates.cfg file.

See also

- The Creating a new contact group recipe in this chapter
- The Automating contact rotation and Defining an escalation for repeated notifications recipes in Chapter 5, Monitoring Methods

Verifying configuration

In this recipe, we'll learn about the most basic step in debugging a Nagios Core configuration, which is to verify it. This is a very useful step to take before restarting the Nagios Core server to load an altered configuration, because it will warn us about possible problems. This is a good recipe to follow if you're not able to start the Nagios Core server at any point because of configuration problems, and instead get output similar to the following:

# /etc/init.d/nagios restart