[improvement](fe) Add virtual compute group switch metric#63036
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
57b917c to d0cef2d
run buildall
protected static AutoMappedMetric<LongCounterMetric> CLUSTER_CLOUD_GLOBAL_BALANCE_NUM;
protected static AutoMappedMetric<LongCounterMetric> CLUSTER_CLOUD_SMOOTH_UPGRADE_BALANCE_NUM;
protected static AutoMappedMetric<LongCounterMetric> CLUSTER_CLOUD_WARM_UP_CACHE_BALANCE_NUM;
protected static AutoMappedMetric<LongCounterMetric> VIRTUAL_CLUSTER_SWITCH_COUNTER;
Done. Renamed the new metric/API/labels from virtual cluster terminology to virtual compute group terminology. The exposed metric is now doris_fe_virtual_compute_group_switch_total with *_compute_group_* labels.
List<MetricLabel> labels = new ArrayList<>();
counter.increase(1L);
labels.add(new MetricLabel("virtual_cluster_id", virtualClusterId));
labels.add(new MetricLabel("virtual_cluster_name", virtualClusterName));
What happens when a compute group is renamed? The existing metric series carrying the old name seem to never disappear.
Fixed. The internal AutoMappedMetric key now uses virtual/src/dst compute group ids instead of names. When the same ids are reported with updated names, FE removes the old registered label series before setting the new labels, so renamed compute groups do not leave stale old-name metrics. Added MetricsTest.testVirtualComputeGroupSwitchMetricRename to cover this case.
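
For readers following the rename discussion, here is a minimal sketch of the id-keyed idea in plain Java. It is not the code in this PR: `SwitchMetricSketch`, `Series`, `recordSwitch`, and `render` are hypothetical names, and it does not use Doris's `AutoMappedMetric` or `MetricLabel` classes. It also assumes the accumulated count carries over a rename, which the comment above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an id-keyed counter registry (not the Doris API).
final class SwitchMetricSketch {
    // One counter plus the label set currently exposed for it.
    static final class Series {
        final AtomicLong value = new AtomicLong();
        volatile Map<String, String> labels;

        Series(Map<String, String> labels) {
            this.labels = labels;
        }
    }

    // Keyed by ids only, so a rename maps to the same entry.
    private final Map<String, Series> seriesByIds = new ConcurrentHashMap<>();

    void recordSwitch(String virtualId, String virtualName,
                      String srcId, String srcName,
                      String dstId, String dstName) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("virtual_compute_group_id", virtualId);
        labels.put("virtual_compute_group_name", virtualName);
        labels.put("src_compute_group_id", srcId);
        labels.put("src_compute_group_name", srcName);
        labels.put("dst_compute_group_id", dstId);
        labels.put("dst_compute_group_name", dstName);

        String key = virtualId + "|" + srcId + "|" + dstId;
        seriesByIds.compute(key, (k, existing) -> {
            Series series = existing;
            if (series == null) {
                series = new Series(labels);
            } else if (!series.labels.equals(labels)) {
                // Same ids, different names: drop the old label set so that
                // no old-name series stays exposed (assumption: count is kept).
                series.labels = labels;
            }
            series.value.incrementAndGet();
            return series;
        });
    }

    // Renders one sample line per series; label values are not escaped here.
    List<String> render() {
        List<String> lines = new ArrayList<>();
        for (Series series : seriesByIds.values()) {
            StringBuilder sb =
                    new StringBuilder("doris_fe_virtual_compute_group_switch_total{");
            boolean first = true;
            for (Map.Entry<String, String> e : series.labels.entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                sb.append(e.getKey()).append("=\"").append(e.getValue()).append('"');
                first = false;
            }
            sb.append("} ").append(series.value.get());
            lines.add(sb.toString());
        }
        return lines;
    }
}
```

In this sketch, reporting the same three ids twice with a changed virtual compute group name leaves a single rendered series that carries the new name. Keying by name instead would create a second entry and leave the old-name series exposed indefinitely, which is the problem raised in the review.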
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Add an FE cloud metric that records virtual compute group active-standby switch events. The metric key uses virtual/src/dst compute group ids so a compute group rename updates the exposed labels without leaving stale old-name series.
### Metric example
Prometheus output example:
```text
# HELP doris_fe_virtual_compute_group_switch_total virtual compute group active standby switch count
# TYPE doris_fe_virtual_compute_group_switch_total counter
doris_fe_virtual_compute_group_switch_total{virtual_compute_group_id="id1",virtual_compute_group_name="v_group_1",src_compute_group_id="id2",src_compute_group_name="p_group_1",dst_compute_group_id="id3",dst_compute_group_name="p_group_2"} 1
```
The metric value is the accumulated switch count for the labeled virtual compute group switch path.
### Release note
Add FE metric doris_fe_virtual_compute_group_switch_total for virtual compute group active-standby switches.
### Check List (For Author)
- Test:
- Unit Test: bash run-fe-ut.sh --run org.apache.doris.cloud.system.CloudSystemInfoServiceTest
- Unit Test: bash run-fe-ut.sh --run org.apache.doris.metric.MetricsTest
- Manual test: git diff --check
- FE checkstyle: bash -lc "export DORIS_HOME=$PWD && source env.sh && cd fe && ${MVN_CMD} -pl fe-core -DskipTests checkstyle:check"
- Behavior changed: Yes. Add a new FE metric for virtual compute group active-standby switches.
- Does this need documentation: No
d0cef2d to 034d322
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run buildall