Describe the bug
GET /api/clusters/{clusterName}/brokers/{id}/metrics returns HTTP 404 whenever the broker's Prometheus exposition contains even a single NaN (or Infinity) value, instead of returning the metrics that are available.
The frontend then shows the broker's Metrics tab as empty / errored.
This is the same outward symptom as #1630 — but with a deterministic, easily reproducible root cause that #1630 did not pin down. (#1630 also reports the silent variant where the response body is empty with HTTP 200; the underlying mapping path is the same.)
Root cause
io.kafbat.ui.mapper.ClusterMapper#convert(Stream<MetricSnapshot>) does:
.value(BigDecimal.valueOf(readPointValue(p)))
BigDecimal.valueOf(double) throws NumberFormatException("Infinite or NaN") on Double.NaN / Double.POSITIVE_INFINITY / Double.NEGATIVE_INFINITY.
BrokersController#getBrokersMetrics catches all errors with:
.onErrorReturn(ResponseEntity.notFound().build())
So a single NaN data point anywhere in the broker's metric stream collapses the whole response to 404. Nothing is logged.
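The throw is easy to confirm in isolation (e.g. in jshell, nothing kafbat-ui-specific):
import java.math.BigDecimal;
BigDecimal.valueOf(Double.NaN);   // throws java.lang.NumberFormatException
Double.isFinite(Double.NaN);      // false: the check a guard/filter can use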
How NaN ends up in the exposition
JMX-Prometheus exporter (Strimzi's metricsConfig.type: jmxPrometheusExporter, MSK Open Monitoring, anything using io.prometheus.jmx) legitimately emits NaN for *_avg / *_max Kafka sensors when the underlying meter has never been hit. Concretely, on a fresh Strimzi broker we see ~24 NaN data points like:
kafka_server_socket_server_metrics_reauthentication_latency_avg{listener="PLAIN-9092",networkProcessor="3"} NaN
kafka_server_socket_server_metrics_reauthentication_latency_max{listener="REPLICATION-9091",networkProcessor="0"} NaN
kafka_server_socket_server_metrics_request_size_avg{listener="TLS-9093",networkProcessor="6"} NaN
...
Operators have no realistic way to make these non-NaN — the broker has simply never observed a reauthentication or a request on that listener. Today, that means the broker's Metrics tab never works.
Steps to reproduce
- Run a Kafka cluster whose Prometheus endpoint produces at least one
NaN data point. The simplest reproduction is Strimzi with jmxPrometheusExporter and the default JMX ruleset, but any cluster with unused listeners/sensors works.
- Configure kafbat-ui:
KAFKA_CLUSTERS_0_METRICS_TYPE: PROMETHEUS
KAFKA_CLUSTERS_0_METRICS_PORT: 9404 # 11001 for MSK Open Monitoring
- Verify the endpoint serves data:
wget -qO- http://broker-0:9404/metrics | grep -c ' NaN$'
# > 0
- Open the broker detail → Metrics tab, or curl /api/clusters/<name>/brokers/0/metrics directly (example call below).
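A minimal check of the failing call (the kafbat-ui host/port and <name> are placeholders for your deployment):
curl -s -o /dev/null -w '%{http_code}\n' \
  http://localhost:8080/api/clusters/<name>/brokers/0/metrics
# 404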
Expected behavior
Finite data points are returned; NaN/Infinity points are dropped (or otherwise represented in a JSON-safe way). A single bad point should not nuke the whole response.
Actual behavior
HTTP 404 Not Found (or empty body in some Spring/Reactor configurations — see #1630). No log line.
Environment
- kafbat-ui: ghcr.io/kafbat/kafka-ui:v1.5.0 (also reproducible against main; same ClusterMapper code path)
- Kafka cluster: jmxPrometheusExporter, KRaft
Fix
PR will follow this issue. The minimal fix is to filter non-finite data points in ClusterMapper#convert before they reach BigDecimal.valueOf. A separate concern — the silent onErrorReturn(notFound()) in BrokersController swallowing all errors — is intentionally left out of scope for this issue/PR.
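For reference, the shape of that filter as a rough sketch only: the per-point visitation and the builder variable are assumptions about the surrounding mapper code; readPointValue and the .value(...) call are taken from the snippet quoted under "Root cause".
double v = readPointValue(p);             // p: one Prometheus data point
if (Double.isFinite(v)) {
    builder.value(BigDecimal.valueOf(v)); // only finite values reach BigDecimal
}
// NaN / +Infinity / -Infinity points are skipped instead of failing the whole request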