Test dashboard API metrics endpoint error handling of BYO CNI clusters - Test Release 2.31 #8063

@mgoltzsche

Summary

The Dashboard API had a bug: the user cluster metrics endpoints returned a 503 response when a user cluster was created with CNI set to none (BYO CNI mode) and no CNI had been set up for the cluster manually. While the details view of such a cluster was open, the Dashboard UI showed a corresponding error notification periodically (every ~10 seconds).
As a consequence, the API error rate increased, eventually triggering the KubermaticAPITooManyErrors alert.
(The description of the PR provides screenshots illustrating all of that.)

The unavailability of the metrics endpoints is actually expected in that case, as long as the user has not set up a CNI provider themselves.
Therefore the fix was to let the metrics endpoints return a 200 status with empty metrics in that case.
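
To make the intended behavior concrete for testers, here is a minimal sketch of the response mapping described above. It is not the actual Dashboard API code (which is not shown in this ticket); `MetricsUnavailableError`, `handle_metrics_request`, `fetch_metrics`, and `log` are hypothetical names used for illustration only.

```python
from typing import Callable, List, Tuple


class MetricsUnavailableError(Exception):
    """Hypothetical stand-in for the 'service unavailable' error that the
    user cluster's metrics API returns while no CNI is installed."""


def handle_metrics_request(
    fetch_metrics: Callable[[], List[dict]],
    log: Callable[[str], None],
) -> Tuple[int, List[dict]]:
    """Return (status_code, metrics) for a metrics endpoint request.

    Assumption: an unavailable metrics API is an expected state for a
    BYO CNI cluster without a CNI, so it is reported as 200 with an
    empty list and logged as a warning instead of surfacing as a 503.
    """
    try:
        return 200, fetch_metrics()
    except MetricsUnavailableError as err:
        log(f"warning: {err}")
        return 200, []
```

With this mapping, the UI receives an empty metrics list rather than an error, so no notification is triggered and no 5xx response is counted.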

The goal of this ticket is to verify that the fix works as expected.

Testing scope:

Related:

Type of Testing

  • New Feature
  • Bug Fix / Regression
  • UI/UX
  • Performance
  • Security / Permissions (RBAC)
  • Upgrade / Migration
  • Other:

Prerequisites

Environment:

  • Provider Specific Setup (If Applicable):

Test Scenarios

  • Scenario 1: Unavailable BYO CNI user cluster
    • Steps:
      1. Create a user cluster with BYO CNI (select "none" as the CNI provider).
      2. Do not install a CNI, so that the user cluster worker nodes remain in the "NotReady" state.
      3. Browse the detail view of that user cluster within the KKP Dashboard UI.
    • Expected:
      • The notification "the server is currently unable to handle the request" should no longer show up within the KKP Dashboard UI when browsing the detail view of that user cluster.
      • The KKP Dashboard API server periodically logs the message "the server is currently unable to handle the request" as a warning (not as an error).
      • The kubermatic-api request error rate should not increase, even if more of those BYO CNI clusters are created and their dashboard views are opened in several browser tabs in parallel. The error rate can be observed within Prometheus using the query sum(rate(http_requests_total{app_kubernetes_io_name="kubermatic-api",code=~"5.."}[5m])) (for comparison, see the screenshot within the PR description).
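
The error rate check in the last expectation can also be scripted instead of read off the Prometheus UI. The sketch below builds the URL for an instant query against the Prometheus HTTP API (`/api/v1/query`) and extracts the value from the JSON response. The base URL is an assumption about the test environment, and `build_query_url` and `parse_error_rate` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

# Query taken from the expected results above.
ERROR_RATE_QUERY = (
    'sum(rate(http_requests_total{app_kubernetes_io_name="kubermatic-api",'
    'code=~"5.."}[5m]))'
)


def build_query_url(base_url: str, query: str = ERROR_RATE_QUERY) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def parse_error_rate(body: str) -> float:
    """Extract the 5xx request rate from an instant-query JSON response.

    An empty result vector means no 5xx series exist in the window,
    which is treated here as a rate of 0.
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:
        return 0.0
    # Each sample is [timestamp, "value"]; the value is encoded as a string.
    return float(result[0]["value"][1])
```

A tester could fetch `build_query_url("http://localhost:9090")` (for example via a port-forward to Prometheus, which is an assumption about the setup) before and after opening the cluster detail views, and compare the two values returned by `parse_error_rate`.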

Edge Cases & Boundary Conditions

  • [ ]

Screenshots / Attachments

Acceptance Criteria

  • [ ]

Notes

Test Environment

  • UI Version:
  • API Version:
  • K8s Version:
  • Provider:
  • Browser & OS:
  • Domain:

Metadata

Assignees

No one assigned

Labels

sig/ui
