Commit bb707df

Initial commit
0 parents  commit bb707df

1 file changed: +115 -0 lines changed

1 file changed

+115
-0
lines changed

README.md

# Abstract
Since the emergence of NetSaint/Nagios at the latest, these systems and their successors and clones
have relied on a loose group of programs called "Monitoring Plugins" to do the lower-level
task of actually determining the state of a particular entity or measuring certain
values.

This document is intended to give users, and especially developers, of those programs a basis
for how they should be implemented, how they should work and how they should behave.
It encourages the standardization of libraries, Monitoring Plugins and Monitoring Systems
to reduce the cognitive load on users, administrators and developers who work with
different implementations.

These guidelines aim to be as general as possible and not to assume or anticipate any particular
implementation detail, e.g. the programming language, the install mechanism or the monitoring
system which executes the Monitoring Plugin.

# Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.

# Terminology

## Monitoring Plugin
An executable on a _normal_ computer system, meaning a commonly occurring system with an operating system
based on Linux, FreeBSD, Windows or something similar.

## Monitoring System
Software which, for the scope of this document, executes a *Monitoring Plugin*.

# The Monitoring Plugin Interface

## The basic Monitoring Plugin usage
A Monitoring System executes a Monitoring Plugin. The Monitoring Plugin MAY accept parameters in
the form of command line arguments, environment variables or a configuration file (the location of which
MAY in turn be given on the command line or via an environment variable).
The Monitoring Plugin then proceeds to execute its duty and returns the result to the Monitoring System.
Part of the process of returning the result is the termination of the execution of the Monitoring Plugin itself.
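
The three parameter sources named above could be combined as in the following minimal sketch. Everything specific in it is an assumption made for illustration and is not prescribed by this document: the option names `--config` and `--warning`, the environment variables `EXAMPLE_PLUGIN_CONFIG` and `EXAMPLE_PLUGIN_WARNING`, the `[plugin]` section of the configuration file, the default of 80 and, in particular, the precedence order (command line over environment over configuration file).

```python
import argparse
import configparser


def resolve_parameters(argv, environ):
    """Collect settings from a config file, environment variables and the command line."""
    parser = argparse.ArgumentParser(description="hypothetical example plugin")
    # The config file location may itself come from the command line or the environment.
    parser.add_argument("--config", default=environ.get("EXAMPLE_PLUGIN_CONFIG"))
    parser.add_argument("--warning", type=float)
    args = parser.parse_args(argv)

    settings = {"warning": 80.0}  # built-in default, chosen arbitrarily

    # Lowest precedence (assumed): the configuration file, if a location was given.
    if args.config:
        config = configparser.ConfigParser()
        config.read(args.config)  # a missing file is silently ignored
        if config.has_option("plugin", "warning"):
            settings["warning"] = config.getfloat("plugin", "warning")

    # Next (assumed): the environment variable.
    if "EXAMPLE_PLUGIN_WARNING" in environ:
        settings["warning"] = float(environ["EXAMPLE_PLUGIN_WARNING"])

    # Highest precedence (assumed): the command line argument.
    if args.warning is not None:
        settings["warning"] = args.warning

    return settings
```

A real plugin would typically call this as `resolve_parameters(sys.argv[1:], os.environ)`; passing both in explicitly keeps the function easy to test.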

## Input Parameters for a Monitoring Plugin

## Output of a Monitoring Plugin
At the top level, the output of a Monitoring Plugin consists of two parts: the *Exit Code* and
output in textual form on _stdout_.
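
From the point of view of a Monitoring System, collecting both parts might look like the following sketch. The function name `run_plugin`, the 10 second timeout and the decision to discard _stderr_ are assumptions for illustration; the stand-in "plugin" is just a Python one-liner.

```python
import subprocess
import sys


def run_plugin(command, timeout=10):
    """Execute a plugin and return its exit code and the text it wrote to stdout."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # assumption: stderr is not part of the result
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stdout


# Stand-in for a real plugin: prints one line and terminates with exit code 1.
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('Available Memory is too low'); sys.exit(1)",
]
exit_code, text = run_plugin(fake_plugin)
```

The result only becomes available once the plugin process has terminated, which matches the description of termination being part of returning the result.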

### Exit Code
The *Monitoring Plugin* MUST make use of the *Exit Code* as a method to communicate a result to
the *Monitoring System*. Since the *Exit Code* is more or less standardized over different systems
as a number with a range of at least 256 values (8 bit), the following mapping is used:

| *Exit Code* (numerical) | Meaning (short) | Meaning (extended) |
| --- | --- | --- |
| 0 | OK | The execution of the *Monitoring Plugin* proceeded as planned, whatever it tests appeared to function properly and the measured values are within their respective thresholds |
| 1 | WARNING | The execution of the *Monitoring Plugin* proceeded as planned and whatever it tests appeared to *not* function properly or the measured values are *not* within their respective thresholds. The problem(s) do(es) *not* seem exceptionally grave though and do(es) *not* require immediate attention |
| 2 | CRITICAL | The execution of the *Monitoring Plugin* proceeded as planned and whatever it tests appeared to *not* function properly or the measured values are *not* within their respective thresholds. The problem(s) *do(es)* seem exceptionally grave and *do(es)* require immediate attention |
| 3 | UNKNOWN | The execution of the *Monitoring Plugin* *did not* proceed as planned. The reasons might be manifold, e.g. missing permissions, missing libraries, no available network connection to the destination, etc. In summary: The *Monitoring Plugin* could *not* determine the state of whatever it should have been checking and can therefore make no reliable statement about it. |
| 4-31 | reserved for future use | |
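
The mapping in the table can be expressed in code roughly as follows. This is a sketch under assumptions, not a reference implementation: the names `State`, `state_for` and `run_check` are hypothetical, and the comparison assumes that higher measured values are worse.

```python
import enum


class State(enum.IntEnum):
    """Exit codes from the table above."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def state_for(value, warn, crit):
    """Map a measured value to a State, assuming higher values are worse."""
    if value >= crit:
        return State.CRITICAL
    if value >= warn:
        return State.WARNING
    return State.OK


def run_check(measure, warn, crit):
    """Call `measure` and return (State, message); a failed measurement maps to UNKNOWN."""
    try:
        value = measure()
    except Exception as error:
        # The plugin could not determine the state, so it makes no statement about it.
        return State.UNKNOWN, f"could not determine state: {error}"
    return state_for(value, warn, crit), f"value is {value}"
```

A plugin built this way would print the message and finish with `sys.exit(int(state))`, so that the Monitoring System receives the numerical value from the table.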

### Textual Output
The original purpose of the output on _stdout_ was to provide human readable information for the user of the *Monitoring System*,
a way for the *Monitoring Plugin* to communicate further details on what happened.
This purpose still exists, but was expanded with the so-called *perfdata* (performance data) to allow the machine readable
communication of measured values for further processing in the *Monitoring System*, e.g. for the creation of diagrams.

The further explanation is therefore split into *human readable output* and *perfdata*.

#### Human readable output
This part of the output should give a user information about the state of the test and, in the case of problems, ideally hint at what
the origin of the problem might be or what the symptoms are. If the test relies on numeric values, these might be displayed to
give a user more information about the specific problem.
It might consist of one or more lines of printable symbols.

Examples:
```
Remaining space on filesystem "/" is OK

Sensor temperature is within thresholds

Available Memory is too low

Sensor temperature exceeds thresholds
```
are OK, but
```
Remaining space on filesystem "/" is OK ( 62GiB / 128GiB )

Sensor temperature is within thresholds ( 42°C )

Available Memory is too low ( 126MiB / 32GiB )

Sensor temperature exceeds thresholds ( 78°C > 70°C )
```
are better.
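
As a small illustration of the "better" variant, a plugin could build its message from the measured value and the threshold it was compared against. The function name and the single upper threshold are assumptions for this sketch.

```python
def describe_temperature(celsius, threshold):
    """Return a human readable line that includes the numbers behind the verdict."""
    if celsius > threshold:
        return f"Sensor temperature exceeds thresholds ( {celsius}°C > {threshold}°C )"
    return f"Sensor temperature is within thresholds ( {celsius}°C )"
```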

Although no strict guidelines for creating this part of the output can really be given, a developer should
keep a potential user in mind. It might, for example, be OK to put the output in a single line if
only one or two items of a similar type (think: multiple file systems, multiple sensors, etc.) are present,
but not if there are 10 or 100, although this might be a totally valid use case.
If several different items exist in the output of the *Monitoring Plugin*, hereafter called *partial results*,
they probably SHOULD each be given their own line in the output.
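
One possible way to give each partial result its own line is sketched below. The layout (a summary line first, then `name: detail` per partial result) is an assumption; this document does not prescribe one.

```python
def render_partial_results(summary, partial_results):
    """Return a summary line followed by one line per partial result."""
    lines = [summary]
    lines.extend(f"{name}: {detail}" for name, detail in partial_results)
    return "\n".join(lines)


report = render_partial_results(
    "2 filesystems checked",
    [("/", "OK ( 62GiB / 128GiB )"), ("/home", "OK ( 10GiB / 64GiB )")],
)
```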

#### Performance data
In addition to the human readable part, the output can contain machine readable measurement values. These data points
are separated from the human readable part by the "|" symbol, which is in effect until the end of the line.
The performance data then MUST consist of space separated single values, each of which MUST have the following format:

`'label'=value[UOM][;warn[;crit[;min[;max]]]]`

with the following definitions:
1. _label_ must consist of at least one non-space character, but can otherwise contain any printable characters except for the equals sign ("=") or single quotes ("'").
If it contains spaces, it must be surrounded by single quotes.
2. _value_ is a numerical value
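
The format and the label rule above can be sketched as follows. The function names are hypothetical. Since this document does not yet define _UOM_, _warn_, _crit_, _min_ and _max_, the sketch treats them as opaque values, and it assumes that an unset field followed by a set one is written as an empty string. It does not check that label characters are printable.

```python
def format_label(label):
    """Quote a label if it contains spaces; reject what the label rule forbids."""
    if not label.strip() or "=" in label or "'" in label:
        raise ValueError(f"invalid perfdata label: {label!r}")
    return f"'{label}'" if " " in label else label


def format_perfdata(label, value, uom="", warn=None, crit=None, minimum=None, maximum=None):
    """Render one item as label=value[UOM][;warn[;crit[;min[;max]]]]."""
    fields = [warn, crit, minimum, maximum]
    # Drop trailing unset fields; unset fields in the middle stay as empty strings (assumption).
    while fields and fields[-1] is None:
        fields.pop()
    item = f"{format_label(label)}={value}{uom}"
    for field in fields:
        item += ";" + ("" if field is None else str(field))
    return item


def split_output_line(line):
    """Split one output line at the first "|" into (human readable text, perfdata text)."""
    text, _, perfdata = line.partition("|")
    return text.strip(), perfdata.strip()
```

For example, `format_perfdata("temp", 42, "C", 70, 90)` yields `temp=42C;70;90`, which a plugin would append to its human readable line after a "|".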