Writing a new servicemon plugin
This guide walks through an example of how to implement a new service monitor plugin in NAV.
A quick intro to the service monitor
The NAV service monitor is part of the legacy statemon system. NAV development has not put much effort into service monitoring, though, focusing instead on the core functionality of network hardware monitoring. As a result, the service monitoring system has some limitations.
The basic limitations are as follows:
- A service must be associated with an IP device (e.g. a server).
- A service is assumed to be hosted on the server's single management IP address. Multi-homed servers are not supported (although you can potentially hack around this by adding an IP address as an argument to your plugin).
The goal
Write a service monitor plugin that verifies the presence of the phrase “Don’t Panic!” in a web page.
Writing the plugin
The basics
A servicemon plugin is called a checker.
Requirements:
- The checker must be a Python class inheriting from nav.statemon.abstractchecker.AbstractChecker.
- The class name must be suffixed with the string Checker.
- The class must be contained within a module that has an identical name (e.g. the class PanicChecker lives in PanicChecker.py).
- The module must reside in the nav.statemon.checker package.
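The naming rules above are easy to get subtly wrong. As a rough illustration, the hypothetical helper follows_conventions() below encodes them as a predicate. The second helper, handler_name(), additionally assumes that the service name shown in SeedDB and in the logs (e.g. panic) is the class name with the Checker suffix removed and lowercased; that mapping is inferred from the log output later in this guide, not taken from NAV's source.

```python
PACKAGE = "nav.statemon.checker"
SUFFIX = "Checker"


def follows_conventions(module_path: str, class_name: str) -> bool:
    """Hypothetical check of the checker naming rules (not part of NAV)."""
    package, _, module_name = module_path.rpartition(".")
    return (
        package == PACKAGE                 # module lives in nav.statemon.checker
        and class_name.endswith(SUFFIX)    # class name ends in "Checker"
        and len(class_name) > len(SUFFIX)  # ...and is not just "Checker"
        and module_name == class_name      # module is named after the class
    )


def handler_name(class_name: str) -> str:
    """Assumed mapping from class name to service name, e.g. 'panic'."""
    return class_name[: -len(SUFFIX)].lower()


print(follows_conventions("nav.statemon.checker.PanicChecker", "PanicChecker"))  # True
print(follows_conventions("nav.statemon.checker.panic", "PanicChecker"))         # False
print(handler_name("PanicChecker"))                                              # panic
```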
Writing a simple checker skeleton
We begin with something like this, in the file nav/statemon/checker/PanicChecker.py:
```python
from nav.statemon.event import Event
from nav.statemon.abstractchecker import AbstractChecker


class PanicChecker(AbstractChecker):
    IPV6_SUPPORT = True
    DESCRIPTION = "Checks for panic"

    def execute(self):
        # Always report the service as up
        return Event.UP, "Everything is fine"
```
You can now open SeedDB in your browser and add this new service checker to some server. Shortly thereafter, you should see this in the logs (servicemon.log):
[2015-09-02 10:22:14] abstractchecker.py:run:107 [Info] server.example.org:panic -> Everything is fine
The service's status in the service matrix should also be green.
Note
Another limitation of the service monitor system is that a service's status will be shown as green even if its checker never runs. An alert is only raised if the checker actually runs and reports a failure.
Congratulations, you have just recreated nav.statemon.checker.DummyChecker, a checker that accepts no arguments and always reports a positive result.
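Because servicemon only reports on checkers it actually runs, it can be handy to call execute() directly while developing. The sketch below does that outside NAV by defining minimal stand-ins for Event and AbstractChecker. These stubs are assumptions for illustration only: the real classes have more behaviour and a different constructor, and the "UP"/"DOWN" values are inferred from the log output. In a NAV environment you would import the real classes instead.

```python
class Event:
    """Stand-in for nav.statemon.event.Event (values assumed)."""
    UP = "UP"
    DOWN = "DOWN"


class AbstractChecker:
    """Stand-in for nav.statemon.abstractchecker.AbstractChecker."""

    def __init__(self, address=("127.0.0.1", None), args=None):
        self._address = address
        self.args = args or {}

    def get_address(self):
        # Mimics the (ip, port) pair returned by the real get_address()
        return self._address


class PanicChecker(AbstractChecker):
    IPV6_SUPPORT = True
    DESCRIPTION = "Checks for panic"

    def execute(self):
        return Event.UP, "Everything is fine"


status, message = PanicChecker().execute()
print(status, message)  # UP Everything is fine
```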
Fleshing out the plugin
Now let’s make the plugin do something useful. Instead of replicating the convoluted nav.statemon.checker.HttpChecker, we use the Python Requests library to fetch the web page and examine its contents:
```python
import requests
from nav.statemon.event import Event
from nav.statemon.abstractchecker import AbstractChecker


class PanicChecker(AbstractChecker):
    IPV6_SUPPORT = True
    DESCRIPTION = "Checks for panic"

    def execute(self):
        ip, port = self.get_address()  # the port is not used by this checker
        # IPv6 addresses must be enclosed in square brackets inside a URL
        host = '[%s]' % ip if ':' in str(ip) else ip
        url = 'http://%s/' % host
        # An explicit timeout keeps an unresponsive server from stalling the
        # check; 10 seconds is an arbitrary choice
        req = requests.get(url, timeout=10)
        if req.status_code == 200:
            if "don't panic!" in req.text.lower():
                return Event.UP, "No panic is needed"
            else:
                return Event.DOWN, "Panicking!"
        else:
            return Event.DOWN, "%s status code = %s" % (url, req.status_code)
```
As you can see at the start of execute(), we get the server’s IP address using the inherited get_address() method and build an HTTP URL from it (the port it also returns is ignored here).
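The decision logic in execute() can be tested without a network or a running servicemon if it is pulled out into a pure function. The classify() helper below is a hypothetical refactoring, not part of NAV. It takes the URL, HTTP status code and body text, and returns the same status/description pairs as the checker above, with plain strings standing in for Event.UP and Event.DOWN.

```python
UP, DOWN = "UP", "DOWN"  # stand-ins for Event.UP / Event.DOWN


def classify(url, status_code, text):
    """Map an HTTP response to a (status, description) pair."""
    if status_code != 200:
        return DOWN, "%s status code = %s" % (url, status_code)
    # Case-insensitive substring match, as in execute() above
    if "don't panic!" in text.lower():
        return UP, "No panic is needed"
    return DOWN, "Panicking!"


print(classify("http://10.0.1.42/", 200, "<p>DON'T PANIC!</p>"))
# ('UP', 'No panic is needed')
print(classify("http://10.0.1.42/", 200, "<p>Don\u2019t Panic!</p>"))
# ('DOWN', 'Panicking!')
print(classify("http://10.0.1.42/foo.html", 404, ""))
# ('DOWN', 'http://10.0.1.42/foo.html status code = 404')
```

The second call shows that a page using a typographic apostrophe (’) does not match the straight apostrophe (') the checker looks for.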
Note
Servicemon does not reload changed plugins, so you will need to restart the daemon for this change to take effect.
Given that server.example.org is not currently running a web server, you should now get logs that look like this:
[2015-09-02 10:52:58] abstractchecker.py:run:107 [Info] server.example.org:panic -> ('Connection aborted.', error(111, 'Connection refused'))
[2015-09-02 10:52:58] abstractchecker.py:run:124 [Notice] server.example.org:panic -> State changed. New check in 5 sec. (DOWN, ('Connection aborted.', error(111, 'Connection refused')))
[2015-09-02 10:53:03] abstractchecker.py:run:107 [Info] server.example.org:panic -> ('Connection aborted.', error(111, 'Connection refused'))
[2015-09-02 10:53:03] abstractchecker.py:run:124 [Notice] server.example.org:panic -> State changed. New check in 5 sec. (DOWN, ('Connection aborted.', error(111, 'Connection refused')))
[2015-09-02 10:53:08] abstractchecker.py:run:107 [Info] server.example.org:panic -> ('Connection aborted.', error(111, 'Connection refused'))
[2015-09-02 10:53:08] abstractchecker.py:run:124 [Notice] server.example.org:panic -> State changed. New check in 5 sec. (DOWN, ('Connection aborted.', error(111, 'Connection refused')))
[2015-09-02 10:53:13] abstractchecker.py:run:107 [Info] server.example.org:panic -> ('Connection aborted.', error(111, 'Connection refused'))
[2015-09-02 10:53:13] abstractchecker.py:run:133 [Alert ] server.example.org:panic -> DOWN, ('Connection aborted.', error(111, 'Connection refused'))
This tells you a couple of things:
- If an exception is raised inside the execute() method, servicemon automatically considers this a service failure.
- When your service checker reports a changed state, servicemon will retry it three times before posting an event to NAV’s event queue.
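The behaviour seen in these logs can be modelled roughly as follows. This is a simplified sketch inferred from the log output, not servicemon's actual implementation. run_once() turns any exception from execute() into a DOWN result, and the hypothetical StateConfirmer only reports a new state once the initial observation plus three consecutive re-checks agree, which matches the four checks logged before each [Alert ] line.

```python
from typing import Callable, Optional, Tuple

UP, DOWN = "UP", "DOWN"  # stand-ins for Event.UP / Event.DOWN


def run_once(execute: Callable[[], Tuple[str, str]]) -> Tuple[str, str]:
    """Run a check; any exception it raises counts as a failure."""
    try:
        return execute()
    except Exception as error:
        return DOWN, str(error)


class StateConfirmer:
    """Report a state change only after the first observation plus
    `retries` consecutive re-checks agree on the new state."""

    def __init__(self, initial: str = UP, retries: int = 3):
        self.state = initial
        self.retries = retries
        self.pending = 0  # consecutive observations differing from self.state

    def observe(self, status: str) -> Optional[str]:
        if status == self.state:
            self.pending = 0  # flapping back to the known state resets the count
            return None
        self.pending += 1
        if self.pending > self.retries:
            self.state = status
            self.pending = 0
            return status  # this is the point where an event would be posted
        return None


def refused():
    raise ConnectionError("Connection refused")


confirmer = StateConfirmer()
print(run_once(refused))                            # ('DOWN', 'Connection refused')
print([confirmer.observe(DOWN) for _ in range(4)])  # [None, None, None, 'DOWN']
```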
With a web server active at server.example.org, you might get this instead:
[2015-09-02 11:08:18] abstractchecker.py:run:107 [Info] server.example.org:panic -> Panicking!
[2015-09-02 11:08:18] abstractchecker.py:run:124 [Notice] server.example.org:panic -> State changed. New check in 5 sec. (DOWN, Panicking!)
[2015-09-02 11:08:23] abstractchecker.py:run:107 [Info] server.example.org:panic -> Panicking!
[2015-09-02 11:08:23] abstractchecker.py:run:124 [Notice] server.example.org:panic -> State changed. New check in 5 sec. (DOWN, Panicking!)
[2015-09-02 11:08:28] abstractchecker.py:run:107 [Info] server.example.org:panic -> Panicking!
[2015-09-02 11:08:28] abstractchecker.py:run:124 [Notice] server.example.org:panic -> State changed. New check in 5 sec. (DOWN, Panicking!)
[2015-09-02 11:08:34] abstractchecker.py:run:107 [Info] server.example.org:panic -> Panicking!
[2015-09-02 11:08:34] abstractchecker.py:run:133 [Alert ] server.example.org:panic -> DOWN, Panicking!
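At this point, add the string "Don't Panic!" to the document served at / on your web server (the match is case-insensitive, but it expects a straight apostrophe, so a typographic ’ will not match). You should soon see this in the logs: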
[2015-09-02 11:12:23] abstractchecker.py:run:107 [Info] server.example.org:panic -> No panic is needed
[2015-09-02 11:12:23] abstractchecker.py:run:124 [Notice] server.example.org:panic -> State changed. New check in 5 sec. (UP, No panic is needed)
[2015-09-02 11:12:28] abstractchecker.py:run:107 [Info] server.example.org:panic -> No panic is needed
[2015-09-02 11:12:28] abstractchecker.py:run:124 [Notice] server.example.org:panic -> State changed. New check in 5 sec. (UP, No panic is needed)
[2015-09-02 11:12:33] abstractchecker.py:run:107 [Info] server.example.org:panic -> No panic is needed
[2015-09-02 11:12:33] abstractchecker.py:run:124 [Notice] server.example.org:panic -> State changed. New check in 5 sec. (UP, No panic is needed)
[2015-09-02 11:12:38] abstractchecker.py:run:107 [Info] server.example.org:panic -> No panic is needed
[2015-09-02 11:12:38] abstractchecker.py:run:133 [Alert ] server.example.org:panic -> UP, No panic is needed
Summary so far
What have we learned so far?
- A plugin must inherit from nav.statemon.abstractchecker.AbstractChecker.
- The plugin must implement the execute() method, where all the work is done.
- The execute() method must return its status as a two-element tuple. The first element must be either Event.UP or Event.DOWN, while the second element must be a human-readable description of the status (which will be attached to any events posted).
- The plugin can describe itself to the outside world by setting the class variables IPV6_SUPPORT and DESCRIPTION.
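The rules in this summary can be turned into a quick self-check. The validate_checker() helper below is hypothetical and uses plain strings in place of Event.UP and Event.DOWN. It only verifies the types of the optional descriptive class variables and the shape of one value returned by execute(); it does not check inheritance from AbstractChecker or whether the check itself is correct.

```python
UP, DOWN = "UP", "DOWN"  # stand-ins for Event.UP / Event.DOWN


def validate_checker(cls, result):
    """List problems with a checker class and one value returned by execute()."""
    problems = []
    # The descriptive class variables are optional, but should have sane types
    if not isinstance(getattr(cls, "IPV6_SUPPORT", False), bool):
        problems.append("IPV6_SUPPORT should be True or False")
    if not isinstance(getattr(cls, "DESCRIPTION", ""), str):
        problems.append("DESCRIPTION should be a string")
    if not (isinstance(result, tuple) and len(result) == 2):
        problems.append("execute() must return a two-element tuple")
        return problems
    status, description = result
    if status not in (UP, DOWN):
        problems.append("first element must be Event.UP or Event.DOWN")
    if not isinstance(description, str):
        problems.append("second element should be a human-readable string")
    return problems


class PanicChecker:
    IPV6_SUPPORT = True
    DESCRIPTION = "Checks for panic"

    def execute(self):
        return UP, "No panic is needed"


print(validate_checker(PanicChecker, PanicChecker().execute()))  # []
print(validate_checker(PanicChecker, UP))
# ['execute() must return a two-element tuple']
```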
Adding arguments
To make your plugin more configurable, it should accept arguments. Your plugin class can specify mandatory arguments in the ARGS class variable, while optional arguments can be specified in the OPTARGS class variable. Each of these is a sequence of two-tuples (the example below uses a tuple), each two-tuple consisting of an argument name and an argument description.
Any arguments provided when configuring the service checker in SeedDB will be placed in a Python dictionary in the plugin’s args instance variable.
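To illustrate how ARGS, OPTARGS and the args dictionary relate, here is a hypothetical helper that compares a configured args dictionary with a checker's declared arguments. It is not part of NAV, and this guide does not say whether SeedDB or servicemon performs this kind of validation; it simply shows the data shapes involved. The mandatory url argument in the last call is made up for the example.

```python
def check_args(args, mandatory=(), optional=()):
    """Compare a configured args dict with ARGS/OPTARGS-style declarations.

    `mandatory` and `optional` are sequences of (name, description) pairs,
    in the same shape as a checker's ARGS and OPTARGS class variables.
    Returns (missing mandatory names, unknown names), both sorted.
    """
    required = {name for name, _description in mandatory}
    known = required | {name for name, _description in optional}
    missing = sorted(required - set(args))
    unknown = sorted(set(args) - known)
    return missing, unknown


OPTARGS = (
    ('path', 'Document path to fetch'),
)

print(check_args({'path': '/foo.html'}, optional=OPTARGS))   # ([], [])
print(check_args({'pth': '/foo.html'}, optional=OPTARGS))    # ([], ['pth'])
print(check_args({}, mandatory=(('url', 'URL to fetch'),)))  # (['url'], [])
```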
Let’s make the document path configurable in our plugin:
```python
import requests
from nav.statemon.event import Event
from nav.statemon.abstractchecker import AbstractChecker


class PanicChecker(AbstractChecker):
    IPV6_SUPPORT = True
    DESCRIPTION = "Checks for panic"
    OPTARGS = (
        ('path', 'Document path to fetch'),
    )

    def execute(self):
        ip, port = self.get_address()  # the port is not used by this checker
        # IPv6 addresses must be enclosed in square brackets inside a URL
        host = '[%s]' % ip if ':' in str(ip) else ip
        # Fall back to the root document if no path argument was configured
        path = self.args.get('path', '/')
        url = 'http://%s%s' % (host, path)
        # 10 seconds is an arbitrary timeout choice
        req = requests.get(url, timeout=10)
        if req.status_code == 200:
            if "don't panic!" in req.text.lower():
                return Event.UP, "No panic is needed"
            else:
                return Event.DOWN, "Panicking!"
        else:
            return Event.DOWN, "%s status code = %s" % (url, req.status_code)
```
Now restart servicemon and use SeedDB to add a path argument of /foo.html to the service instance.
Assuming there is no /foo.html document on your web server, you should see this in the logs:
[2015-09-02 11:13:49] abstractchecker.py:run:107 [Info] server.example.org:panic -> http://10.0.1.42/foo.html status code = 404
[2015-09-02 11:13:49] abstractchecker.py:run:124 [Notice] server.example.org:panic -> State changed. New check in 5 sec. (DOWN, http://10.0.1.42/foo.html status code = 404)
If you then add a foo.html document containing the phrase Don’t Panic!, you should be good to go again.
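The checker builds its URL by plain string formatting, so a path entered without a leading slash (for example foo.html) would produce a malformed URL such as http://10.0.1.42foo.html, and an IPv6 address needs square brackets inside a URL. A slightly more defensive, hypothetical build_url() helper could look like this; it could replace the URL construction in execute(), with ip coming from get_address() and path from self.args.

```python
def build_url(ip, path=None):
    """Build an http:// URL from an IP address and an optional document path."""
    ip = str(ip)
    # IPv6 literals must be enclosed in square brackets in URLs (RFC 3986)
    host = "[%s]" % ip if ":" in ip else ip
    path = path or "/"              # default to the root document
    if not path.startswith("/"):    # tolerate "foo.html" as well as "/foo.html"
        path = "/" + path
    return "http://%s%s" % (host, path)


print(build_url("10.0.1.42"))                  # http://10.0.1.42/
print(build_url("10.0.1.42", "foo.html"))      # http://10.0.1.42/foo.html
print(build_url("2001:db8::42", "/foo.html"))  # http://[2001:db8::42]/foo.html
```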
In conclusion
Now you should know what you need to write new plugins for the service monitor. The pre-existing checkers in the nav.statemon.checker package also offer a wealth of code you can use for inspiration. Good luck!