Skip to content

Test fails at the end of execution when api server is temporary down #6783

@thradec

Description

@thradec

Describe the bug

The test workflow fails, when during its execution, the testkube-api-server is temporary un-available, for example when the node where it is scheduled is evicted. The error message in log is following:

failed to run post process function: failed to create internal storage for metrics: failed to connect with Cloud: connection error: desc = "transport: error while dialing: dial tcp 172.20.144.180:8089: connect: connection refused": failed to connect with Cloud: connection error: desc = "transport: error while dialing: dial tcp 172.20.144.180:8089: connect: connection refused"�

Expected behavior

The best would be if the api server can run in high availability configuration, but I understand that this can be hard to achive. It would help, if the testworflow-init be more resilient and implement some retry mechanism idealy with backoff.

Version / Cluster

  • testkube version: 2.1.166 (standalone agent)
  • kubernetes version: eks 1.33

Metadata

Metadata

Assignees

No one assigned

    Labels

    bug 🐛Something is not working as should be

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions