Event Engine
Introduction
The Event Engine is the backend process used by NAV to process the event queue. Whenever a NAV subsystem posts an event to the queue, the Event Engine will pick it up and decide what to do with it.
Typically, the Event Engine will generate an alert from the event, or it may ignore the event entirely, depending on the circumstances. In some cases, it will delay the alert for a grace period, while waiting for another corresponding event to resolve the pending problem.
Plugins
Most of the work of the Event Engine is done by event handler plugins from the nav.eventengine.plugins namespace. Each event picked from the queue is offered to each of the plugins in turn, until one of them decides to handle it. If no plugin wants to handle the event, the Event Engine performs a very simple default routine to translate the event directly into an alert (possibly using alert hints given in the event itself).
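The dispatch order described above can be illustrated with a minimal sketch. This is not NAV's actual plugin API: the `Event` class, the `can_handle`/`handle` methods, the `alert_type` hint key, and the `dispatch` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for an event taken from the queue."""
    event_type: str
    alert_hints: dict = field(default_factory=dict)


class BoxStatePlugin:
    """Hypothetical plugin that only claims boxState events."""

    def can_handle(self, event):
        return event.event_type == "boxState"

    def handle(self, event):
        return f"handled {event.event_type} in BoxStatePlugin"


def default_routine(event):
    """Fallback: turn the event directly into an alert, using hints if present."""
    alert_type = event.alert_hints.get("alert_type", event.event_type)
    return f"default alert: {alert_type}"


def dispatch(event, plugins):
    """Offer the event to each plugin in turn; the first taker handles it."""
    for plugin in plugins:
        if plugin.can_handle(event):
            return plugin.handle(event)
    # No plugin wanted the event, so fall back to the default routine.
    return default_routine(event)
```

Calling `dispatch(Event("boxState"), [BoxStatePlugin()])` goes to the plugin, while an event type no plugin claims falls through to `default_routine`.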
Configuration
The operation of the Event Engine can be customized using configuration options in eventengine.conf. Most of the configuration concerns the grace periods (timeouts) for various types of alerts. The default configuration looks somewhat like this:
# NAV eventengine configuration
[export]
# If set, the script option will point to a program that will receive a
# continuous stream of JSON serialized alert objects on its STDIN.
;script = /path/to/event/receiver/script
[timeouts]
#
# This section configures timeout values for alert quarantines. A quarantine
# is when one or multiple alerts are held back for a period of time, while
# waiting for the problem to resolve itself. It is a protection against a
# torrent of alerts for things that are rapidly flapping.
#
# All options are commented out with default values. Uncomment to change the
# defaults. Valid units are s=seconds, m=minutes, h=hours, d=days.
#
# When a boxDown event is received, how long to wait for resolve before
# sending out a boxDownWarning.
;boxDown.warning = 1m
# When a boxDown event is received, how long to wait for resolve before
# finally declaring the IP device as down.
;boxDown.alert = 4m
# When a moduleDown event is received, how long to wait for resolve before
# sending out a moduleDownWarning.
;moduleDown.warning = 1m
# When a moduleDown event is received, how long to wait for resolve before
# finally declaring the module as down.
;moduleDown.alert = 4m
# When a linkDown event is received, how long to wait for resolve before
# finally declaring the link as down.
;linkDown.alert = 4m
# When an snmpAgentDown event is received, how long to wait for resolve before
# finally declaring the SNMP agent as down.
;snmpAgentDown.alert = 4m
# When a bgpDown event is received, how long to wait for resolve before
# finally declaring the BGP session to be down.
;bgpDown.alert = 1m
[linkdown]
# This section contains options to control which link down events to
# send alerts about. Also see settings in ipdevpoll.conf about which links to
# generate events for in the first place.
# When enabled, only link loss on redundant links cause alerts to be sent.
# The rationale is that on a non-redundant link, you will get boxDown alerts
# for the devices behind that link, which are now unreachable.
;only_redundant = yes
# If a linkDown event is posted for a switch port that doesn't carry any of
# these vlans (tagged or untagged), no alert is sent. This is a
# space-separated list of VLAN tag numbers. An empty value means no filtering
# based on VLAN.
;limit_to_vlans =
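To make the timeout syntax concrete, here is a rough sketch of how the [timeouts] values could be read and converted to seconds with Python's standard configparser module. It is not the Event Engine's own parser: `parse_timeout`, `load_timeouts`, `SAMPLE`, and the `DEFAULTS` table (which copies only two of the commented-out defaults shown above) are hypothetical. How NAV treats a value without a unit is not stated here, so this sketch simply rejects it.

```python
import configparser
import re

# Units listed in the configuration comments: s, m, h, d.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# Two of the commented-out defaults from the sample configuration.
DEFAULTS = {"boxDown.warning": "1m", "boxDown.alert": "4m"}

SAMPLE = """
[timeouts]
;boxDown.warning = 1m
boxDown.alert = 10m
"""


def parse_timeout(value):
    """Convert a value such as '4m' into a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhd])\s*", value)
    if not match:
        raise ValueError(f"invalid timeout value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]


def load_timeouts(text):
    """Merge uncommented options over the defaults and convert them to seconds."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as boxDown.alert case-sensitive
    parser.read_string(text)  # lines starting with ';' or '#' are treated as comments
    merged = dict(DEFAULTS)
    if parser.has_section("timeouts"):
        merged.update(parser["timeouts"])
    return {name: parse_timeout(value) for name, value in merged.items()}
```

With `SAMPLE`, the commented-out `boxDown.warning` keeps its default of one minute, while the uncommented `boxDown.alert` overrides the default with ten minutes.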
Alert severity
All NAV alerts (as generated by Event Engine) are assigned a severity value, in the interval 1 through 5. These values can be used as part of your users’ Alert Profile filters, and should be interpreted roughly like this:
5 = Information
4 = Low
3 = Moderate
2 = High
1 = Critical
Severity values are normally chosen by the NAV program that generates the event that an alert is based on. However, NAV cannot know how severe any given alert is for your NOC. Therefore, the Event Engine lets you configure your own severity rules, using YAML syntax, in the configuration file severity.yml. Any rules present in this file will be processed to set or modify the existing severity of any matching alert that is generated.
Configuring severity.yml
Here is an example severity configuration:
---
default-severity: 3
rules:
  - alert_type: boxDown
    severity: 2
    rules:
      - netbox.category.id: GSW
        severity: 1
      - netbox.category.id: GW
        severity: 1

  - alert_type: boxDownWarning
    severity: 5

  - netbox.organization.id: foobar
    severity: '+2'
This configuration starts off by assigning a default severity level of 3 to every alert that Event Engine generates, regardless of what the original severity value of the event was.
Then follows a list of rules that will be processed in the order they appear in the file. Each rule consists of:
One or more alert attribute match expressions.
One severity value modification expression to be applied to an alert that matches the attribute expressions.
Optionally, a sub-list of more rules to further apply to any alert that matched the expressions of this rule.
The first example rule will match any alert whose alert_type value equals boxDown (NAV’s alert type for a lasting “box is unreachable” incident). Any such alert will be assigned a severity level of 2. Furthermore, the rule lists two additional sub-rules to ensure that if the boxDown alert was issued for any netbox (IP Device) whose category is a router (a category id of either GSW or GW), the severity is set to the most critical level of 1.
The second top-level example rule will match any alert whose type is boxDownWarning, and set its severity to the least critical level of 5. This is the stateless early warning the Event Engine issues a few minutes before declaring a stateful boxDown. It is safe to consider this type of alert as only informational.
The final top-level example rule will match any alert whose associated netbox (IP Device) is owned by the organization with the id foobar. This rule uses a severity modifier expression of +2, which will add 2 to the alert’s current severity value.
In summary, if a boxDown alert is dispatched for a router in your network, this rule set will ensure its severity is set to 1. However, if the router belongs to your less important foobar department, the final rule adds 2 to that value, lowering the alert's importance by two levels, so it comes out with a severity of 3.
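The walkthrough above can be traced with a small sketch of the rule evaluation. This is an illustration under stated assumptions, not the Event Engine's implementation: the rule set is written as the Python structure the YAML example would load into, the alert is simplified to a flat dict keyed by the dotted attribute names, and `modify`, `apply_rules`, and `evaluate` are hypothetical helpers. The sketch also assumes that every matching rule is applied in file order, with sub-rules evaluated only after their parent rule has matched.

```python
RULESET = {
    "default-severity": 3,
    "rules": [
        {"alert_type": "boxDown", "severity": 2, "rules": [
            {"netbox.category.id": "GSW", "severity": 1},
            {"netbox.category.id": "GW", "severity": 1},
        ]},
        {"alert_type": "boxDownWarning", "severity": 5},
        {"netbox.organization.id": "foobar", "severity": "+2"},
    ],
}


def modify(current, modifier):
    """Apply an absolute or relative ('+n' / '-n') modifier, clamped to 1-5."""
    if isinstance(modifier, str) and modifier.startswith(("+", "-")):
        value = current + int(modifier)
    else:
        value = int(modifier)
    return max(1, min(5, value))


def apply_rules(rules, alert, severity):
    """Process rules in order; descend into sub-rules only when the parent matches."""
    for rule in rules:
        # Every key except 'severity' and 'rules' is an attribute match expression.
        matchers = {k: v for k, v in rule.items() if k not in ("severity", "rules")}
        if all(alert.get(name) == wanted for name, wanted in matchers.items()):
            if "severity" in rule:
                severity = modify(severity, rule["severity"])
            severity = apply_rules(rule.get("rules", []), alert, severity)
    return severity


def evaluate(ruleset, alert):
    """Start from the default severity, then apply all top-level rules."""
    return apply_rules(ruleset["rules"], alert, ruleset["default-severity"])
```

For a boxDown alert on a GW router, `evaluate` walks 3, then 2, then 1. If the router also belongs to foobar, the final rule raises the value to 3, matching the summary above.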
Modifier expressions
There are two types of supported severity modifier expressions for use in rules:
Absolute values: An absolute integer will replace a matching alert’s current severity level.
Relative values: Prefixing an integer with + (or -) will increase (or decrease) the existing severity value by the given amount.
Event Engine will silently ensure that no assigned or calculated severity value ever falls outside the valid range of 1-5.
Important
Please note that relative values must be enclosed in quotes, to avoid confusion with absolute values. YAML interprets +2 as the absolute value of 2, while '+2' is a relative value.
A good practice would be to always quote your values, as that will work as intended in all cases.
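The difference between quoted and unquoted values becomes clearer when you look at what arrives after YAML parsing: an unquoted +2 is already the integer 2, while '+2' is still a string carrying its sign. The following sketch shows one way a modifier could then be applied and clamped. `apply_modifier` is a hypothetical helper, not the Event Engine's own code.

```python
def apply_modifier(current, modifier):
    """Return the new severity for a modifier value as loaded from YAML.

    A string starting with '+' or '-' is treated as relative; anything else
    (an integer, or a quoted plain number such as '2') is treated as absolute.
    """
    if isinstance(modifier, str) and modifier.startswith(("+", "-")):
        new_value = current + int(modifier)  # int('+2') == 2, int('-1') == -1
    else:
        new_value = int(modifier)
    # Silently keep the result inside the valid range of 1-5.
    return max(1, min(5, new_value))
```

Starting from a severity of 1, `apply_modifier(1, 2)` (unquoted +2, parsed as the integer 2) sets the severity to 2, whereas `apply_modifier(1, '+2')` raises it to 3. A quoted `'2'` behaves like the plain integer, which is why quoting every value is the safe habit.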
Available event_type and alert_type values
Two of the available alert attributes that can be matched against in severity rules are event_type and alert_type. However, event_type is a Python object: to match against an event type id/name, you must match against the object’s id attribute, i.e. event_type.id, as the example configuration file shows. See the event- and alert-type reference documentation for a detailed list of available type names to match.
Other matchable attributes
Most alerts generated by the Event Engine are associated with a specific IP device registered in NAV (known as netbox internally). Severity rules can be used to match against attributes of IP devices, or even sub-attributes thereof. As with the examples above, the ID (or name) of the organizational unit that is responsible for an IP device can be read from netbox.organization.id. The ID of the wiring closet this device is located in (as organized by you, the admin, in SeedDB) can be read from netbox.room.id.
See the reference documentation for the Netbox model to see all the available attributes of an IP device.
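As an illustration of how a dotted name such as netbox.room.id could be followed through an alert's attributes and sub-attributes, here is a minimal sketch. The `resolve` helper and the stand-in objects are hypothetical, and returning None for a missing attribute is an assumption of this sketch rather than documented Event Engine behavior.

```python
from functools import reduce
from types import SimpleNamespace


def resolve(obj, dotted_name):
    """Follow a dotted attribute path, returning None if any step is missing."""
    return reduce(
        lambda current, name: getattr(current, name, None),
        dotted_name.split("."),
        obj,
    )


# Hypothetical stand-in for an alert with an associated IP device (netbox).
alert = SimpleNamespace(
    netbox=SimpleNamespace(
        organization=SimpleNamespace(id="foobar"),
        room=SimpleNamespace(id="closet-1"),
    )
)
```

Here `resolve(alert, "netbox.room.id")` yields the made-up room id `"closet-1"`, and a path that does not exist on the stand-in object, such as `"netbox.category.id"`, yields `None` instead of raising an error.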