Maybe can set flag for breaking point pod? #9283

@lcgash

Feature request: Add annotation to pod when hitting debug breakpoint
When a breakpoint is set on a pod through the debug feature, the pod stops at the breakpoint and cannot continue its original business task. From the product layer, there is no direct way to tell whether a task failed because of a business logic error or because the pod is paused at a debug breakpoint.
We therefore propose automatically adding a specific annotation to the pod when it enters the breakpoint state, and automatically removing it when the breakpoint is released or the debug session ends. The annotation would let the product layer identify why a task is suspended, avoid misreporting the pause as a business failure, and help operations staff determine the pod's running state.
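As a rough illustration of the add/remove lifecycle described above, the sketch below builds the patch bodies a controller might send. It assumes the placeholder key `debug.xxx.io/breakpoint-active` from this issue and a JSON merge patch; the helper name `breakpoint_patch` is hypothetical and not part of any existing API.

```python
import json

# Placeholder key taken from the example in this issue; the real key is undecided.
BREAKPOINT_ANNOTATION = "debug.xxx.io/breakpoint-active"


def breakpoint_patch(active: bool) -> dict:
    """Build a JSON merge patch that sets or clears the breakpoint annotation."""
    # In a JSON merge patch (RFC 7386), a null value deletes the key,
    # so the same shape covers both entering and leaving the breakpoint.
    value = "true" if active else None
    return {"metadata": {"annotations": {BREAKPOINT_ANNOTATION: value}}}


if __name__ == "__main__":
    print(json.dumps(breakpoint_patch(True)))   # body to send when the breakpoint is hit
    print(json.dumps(breakpoint_patch(False)))  # body to send when it is released
```

A body like this could be applied by hand with `kubectl patch pod <name> --type merge -p '<json>'`; where the patch would actually be issued from is left open here.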
Use case
Product layer failure monitoring scenario: Our platform has a task health check module that periodically checks whether each pod's business task is running normally. When a pod is stuck at a debug breakpoint, the health check module treats the task as "execution failed" and raises an alarm. With a breakpoint annotation (e.g. debug.xxx.io/breakpoint-active: "true"), the health check module could filter out annotated pods, mark them as "debug paused" instead of "business failure", and suppress the unnecessary alarms, sparing the operations team pointless troubleshooting.
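The filtering step in this scenario might look roughly like the following. This is a sketch only: the pod is assumed to be a plain dict in the shape of a Kubernetes Pod object, the annotation key is the placeholder from this issue, and `classify_task` and `should_alert` are hypothetical names for the health check module's logic.

```python
# Placeholder key taken from the example in this issue.
BREAKPOINT_ANNOTATION = "debug.xxx.io/breakpoint-active"


def classify_task(pod: dict, task_healthy: bool) -> str:
    """Classify a pod's task, treating an annotated pod as paused rather than failed."""
    # "annotations" may be missing or null on a pod without annotations.
    annotations = pod.get("metadata", {}).get("annotations") or {}
    if annotations.get(BREAKPOINT_ANNOTATION) == "true":
        return "debug paused"
    return "running" if task_healthy else "business failure"


def should_alert(pod: dict, task_healthy: bool) -> bool:
    """Only genuine business failures raise an alarm."""
    return classify_task(pod, task_healthy) == "business failure"
```

Checking the annotation before the health result means a paused pod is never reported as failed, which is the behavior this scenario asks for.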
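Debug session management scenario: Operations staff often need to check which pods are currently paused at a debug breakpoint. With a marker on the pod, they could list the paused pods with a single query such as kubectl get pods -l debug.xxx.io/breakpoint-active=true, without logging in to each node to inspect the debug process, which makes debug sessions easier to manage. Note that -l selects on labels, not annotations, so this exact command would require the marker to be set as a label (in addition to, or instead of, the annotation); with an annotation alone, the pod list would have to be filtered on the client side.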
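If the marker is an annotation only, a client-side filter is one way to get the same list. The sketch below assumes the input is the parsed output of `kubectl get pods -o json` (a PodList with an `items` array) and uses the placeholder key from this issue; `paused_pods` is a hypothetical helper name.

```python
# Placeholder key taken from the example in this issue.
BREAKPOINT_ANNOTATION = "debug.xxx.io/breakpoint-active"


def paused_pods(pod_list: dict) -> list[str]:
    """Return "namespace/name" for every pod carrying the breakpoint annotation."""
    names = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # "annotations" may be missing or null on a pod without annotations.
        annotations = meta.get("annotations") or {}
        if annotations.get(BREAKPOINT_ANNOTATION) == "true":
            names.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
    return names
```

For example, the function could be fed `json.load(sys.stdin)` in a small script that reads from `kubectl get pods -A -o json`.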
Task recovery guarantee scenario: Once debugging is complete, the automatic removal of the annotation would signal that the pod has returned to its normal running state. The product layer could use the disappearance of the annotation to trigger its task recovery process, so that the business task resumes promptly after debugging.
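One way the product layer might detect that disappearance is by comparing two snapshots of pod annotations. This is a sketch under assumptions: each snapshot is a dict mapping pod name to its annotations, the key is the placeholder from this issue, and `resumed_pods` is a hypothetical helper, not an existing API.

```python
# Placeholder key taken from the example in this issue.
BREAKPOINT_ANNOTATION = "debug.xxx.io/breakpoint-active"


def _is_paused(annotations) -> bool:
    """True if the annotations mark the pod as stopped at a breakpoint."""
    return (annotations or {}).get(BREAKPOINT_ANNOTATION) == "true"


def resumed_pods(previous: dict, current: dict) -> list[str]:
    """Return pods that were paused in `previous` and are no longer paused in `current`.

    Pods missing from `current` are skipped: a deleted pod has not resumed.
    """
    return sorted(
        name
        for name, annotations in previous.items()
        if _is_paused(annotations)
        and name in current
        and not _is_paused(current[name])
    )
```

Each returned name would be a candidate for the recovery process; a watch-based implementation could apply the same comparison per update event instead of per snapshot.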

Metadata

Labels

kind/feature: Categorizes issue or PR as related to a new feature.

Projects

Status

Todo
