Skip to content

Commit f39b469

Browse files
committed
[#3297] Updated ARM, added a ChangeLog
1 parent 555d933 commit f39b469

File tree

3 files changed

+128
-16
lines changed

3 files changed

+128
-16
lines changed

ChangeLog

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,10 @@
1+
2216. [func] tmark
2+
PerfMon hook library is now functional. It accumulates
3+
and reports performance data, and supports alarms. Still
4+
lacking are API commands.
5+
(Gitlab #3297)
6+
7+
18
2215. [bug] tmark
29
Corrected an issue that can cause kea-dhcp-ddns to
310
stop processing queued requests. Thanks to Shawn

doc/sphinx/arm/hooks-perfmon.rst

Lines changed: 120 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,8 @@ to extend them with the ability to track and report performance related data.
99

1010
.. note::
1111

12-
This library is currently under development and not fully functional.
12+
This library is experimental and not recommended for use in production
13+
environments.
1314

1415
Overview
1516
~~~~~~~~
@@ -93,23 +94,127 @@ uniquely identified by a "duration key" which consists of the following values:
9394
Once the server has finished constructing a response to a query, the query's event stack
9495
is processed into a series of updates to monitored durations. If upon updating, a
9596
duration's sample interval is found to have been completed, it is sent to reporting
96-
and a new sample interval is begun.
97-
98-
.. Note:
99-
Monitoring is not yet functional.
97+
and a new sample interval is begun. The interval width is dictacted by configuration
98+
parameter ``interval-width-secs``.
99+
100+
The event stack for the multi-threaded mode DHCPDISCOVER/DHCPOFFER cycle shown above
101+
contains the following events:
102+
103+
+-----------------------------+--------------------+
104+
| Event Timestamp | Event Label |
105+
+=============================+====================+
106+
| 2024-Mar-20 14:52:20.067563 | socket_received |
107+
+-----------------------------+--------------------+
108+
| 2024-Mar-20 14:52:20.067810 | buffer_read |
109+
+-----------------------------+--------------------+
110+
| 2024-Mar-20 14:52:20.067897 | mt_queued |
111+
+-----------------------------+--------------------+
112+
| 2024-Mar-20 14:52:20.067952 | process_started |
113+
+-----------------------------+--------------------+
114+
| 2024-Mar-20 14:52:20.069614 | process_completed |
115+
+-----------------------------+--------------------+
116+
117+
Assuming the selected subnet's ID was 100, the duration updates formed by PerfMon
118+
from these events are shown below:
119+
120+
+--------------------------------------------------------------+--------------+
121+
| Duration Keys for SubnetID 100 | Update in |
122+
| | microseconds |
123+
+==============================================================+==============+
124+
| DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.100 | 247 |
125+
+--------------------------------------------------------------+--------------+
126+
| DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.100 | 87 |
127+
+--------------------------------------------------------------+--------------+
128+
| DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.100 | 55 |
129+
+--------------------------------------------------------------+--------------+
130+
| DHCPDISCOVER.DHCPOFFER.process_started-process_completed.100 | 1662 |
131+
+--------------------------------------------------------------+--------------+
132+
| DHCPDISCOVER.DHCPOFFER.composite-total_response.100 | 2051 |
133+
+--------------------------------------------------------------+--------------+
134+
135+
Notice that in addition to the adjacent event updates, there is an additional duration
136+
update for the total duration of the entire stack whose key contains the event-pair
137+
``composite-total_response``. This tracks the total time to responds from query
138+
received until the response is ready to send. Finally, there would also be global
139+
duration updates for each of the above:
140+
141+
+--------------------------------------------------------------+--------------+
142+
| Global Duration Keys | Update in |
143+
| | milliseconds |
144+
+==============================================================+==============+
145+
| DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.0 | 247 |
146+
+--------------------------------------------------------------+--------------+
147+
| DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.0 | 87 |
148+
+--------------------------------------------------------------+--------------+
149+
| DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.0 | 55 |
150+
+--------------------------------------------------------------+--------------+
151+
| DHCPDISCOVER.DHCPOFFER.process_started-process_completed.0 | 1662 |
152+
+--------------------------------------------------------------+--------------+
153+
| DHCPDISCOVER.DHCPOFFER.composite-total_response.0 | 2051 |
154+
+--------------------------------------------------------------+--------------+
155+
156+
Statistics Reporting
157+
~~~~~~~~~~~~~~~~~~~~
158+
159+
When enabled (see ``stats-mgr-reporting``), PerfMon will report a duration's data
160+
data each time the duration completes a sampling interval. Each statistic employs
161+
the following naming convention:
162+
163+
::
164+
165+
{subnet-id[x]}.perfmon.<query type>-<response type>.<start event>-<end event>.<value-name>
166+
167+
And there will be both a global and a subnet-specific value for each. Currently the only
168+
value reported for a given duration key is "average-ms". This statistic is the average time
169+
between the duration's event pair over the most recently completed interval. In other
170+
words if during a given interval there were 7 occurrences (i.e. updates) totaling
171+
350ms, the the average-ms reported would be 50ms. Continuing with example above, the
172+
statistics reported would be named as follows for subnet level values:
173+
174+
::
175+
176+
subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.average-ms
177+
subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.average-ms
178+
subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.average-ms
179+
subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.process_started-process_completed.average-ms
180+
subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.composite-total_response.average-ms
181+
182+
and as shown for global values:
183+
184+
::
185+
186+
perfmon.DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.average-ms
187+
perfmon.DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.average-ms
188+
perfmon.DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.average-ms
189+
perfmon.DHCPDISCOVER.DHCPOFFER.process_started-process_completed.average-ms
190+
perfmon.DHCPDISCOVER.DHCPOFFER.composite-total_response.average-ms
191+
192+
Since they are reported to StatsMgr they may be fetched using the commands :isccmd:`statistic-get-all`
193+
or :isccmd:`statistic-get`.
100194

101195
Alarms
102196
~~~~~~
103197

104-
Alarms may be defined to watch specific durations. Each alarm defines a high-water mark and a
105-
low-water mark. When the reported average value for duration exceeds the high-water mark, a
106-
WARN level alarm log will be emitted at which point the alarm is considered "triggered". Once
107-
triggered the WARN level log will be repeated at a specified alarm report interval as long the
108-
reported average for the duration remains above the low-water mark. Once the average falls
109-
below the low-water mark the alarm is "cleared" and an INFO level log will be emitted.
198+
Alarms may be defined to watch specific durations. Each alarm defines a high-water mark,
199+
``high-water-ms`` and a low-water mark, ``low-water-ms``. When the reported average value
200+
for duration exceeds the high-water mark, a WARN level alarm log will be emitted at which
201+
point the alarm is considered "triggered". Once triggered the WARN level log will be
202+
repeated at a specified alarm report interval, ``alarm-report-secs`` as long the reported
203+
average for the duration remains above the low-water mark. Once the average falls below the
204+
low-water mark the alarm is "cleared" and an INFO level log will be emitted.
205+
206+
The alarm triggered WARN log will look similar to the following:
207+
208+
::
209+
210+
2024-03-20 10:22:14.030 WARN [kea-dhcp6.leases/47195.139913679886272] PERFMON_ALARM_TRIGGERED Alarm for DHCPDISCOVER.DHCPOFFER.composite-total_response.0 has been triggered since 2024-03-20 10:18:20.070000, reported average duration 00:00:00.700000 exceeds high-water-ms: 500
211+
212+
213+
and he alarm cleared INFO log will look similar to the following:
214+
215+
::
110216

111-
.. Note:
112-
Alarms are not yet functional.
217+
2024-03-20 10:30:14.030 INFO [kea-dhcp6.leases/47195.139913679886272] PERFMON_ALARM_CLEARED Alarm for DHCPDISCOVER.DHCPOFFER.composite-total_response.0 has been cleared, reported average duration 00:00:00.010000 is now below low-water-ms: 25
113218

114219
API Commands
115220
~~~~~~~~~~~~
@@ -153,7 +258,7 @@ An example of the anticipated configuration is shown below:
153258
Where:
154259

155260
* enable-monitoring
156-
Enables event data aggregation for reporting, statisitcs, and alarms. Defaults to false.
261+
Enables event data aggregation for reporting, statistics, and alarms. Defaults to false.
157262
* interval-width-secs
158263
The amount of time, in seconds, that individual task durations are accumulated into an
159264
aggregate before it is reported. Default is 60 seconds.
@@ -167,7 +272,7 @@ Where:
167272
defined by the following:
168273

169274
* duration-key
170-
Idenitifies the monitored duration to watch
275+
Identifies the monitored duration to watch
171276

172277
* query-type - Message type of the client query (e.g.DHCPDISCOVER, DHCPV6_REQUEST)
173278
* response-type - Message type of the server response (e.g. DHCPOFFER, DHCPV6_REPLY)

src/hooks/dhcp/perfmon/perfmon_messages.cc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ extern const isc::log::MessageID PERFMON_INIT_OK = "PERFMON_INIT_OK";
2020
namespace {
2121

2222
const char* values[] = {
23-
"PERFMON_ALARM_CLEARED", "Alarm for %1 has been cleared, reported average duration %2 is now beloe low-water-ms: %3",
23+
"PERFMON_ALARM_CLEARED", "Alarm for %1 has been cleared, reported average duration %2 is now below low-water-ms: %3",
2424
"PERFMON_ALARM_TRIGGERED", "Alarm for %1 has been triggered since %2, reported average duration %3 exceeds high-water-ms: %4",
2525
"PERFMON_DEINIT_FAILED", "unloading PerfMon hooks library failed: %1",
2626
"PERFMON_DEINIT_OK", "unloading PerfMon hooks library successful",

0 commit comments

Comments
 (0)