Gaps in Cortex Metrics in Grafana Dashboards for Time Intervals longer than 6 Hours #7045

@alexinfoblox

We are running Cortex in a dedicated EKS cluster.
More than 70 other clusters send their metrics to this Cortex instance.
Each cluster’s Grafana is configured to query Cortex for data visualization.

For the past couple of months, we have been observing gaps in Grafana panels for time ranges longer than 6 hours. This has only been observed for our biggest tenant (around 27.7 million series).

There are no missing metrics: all data is successfully received by Cortex.
The issue appears to be related to query results caching.
We've noticed that restarting the Memcached frontend resolves the problem temporarily: after the restart, the gaps disappear.

Memcached frontend config:

    query_range:
      cache_results: true
      results_cache:
        cache:
          memcached_client:
            host: cortex-infra-memcached-frontend.cortex-infra.svc.cluster.local
            timeout: 3s
            max_idle_conns: 200
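
To check whether the gaps are present in the range-query response itself rather than introduced by Grafana rendering, a small helper along these lines could be run against the JSON returned for an affected panel query. This is only a sketch: `find_gaps`, its parameters, and the 50% tolerance are hypothetical names and values chosen for illustration, and it assumes the standard Prometheus-style matrix result where each series carries a `values` list of `[unix_timestamp, "value"]` pairs.

```python
from typing import List, Sequence, Tuple


def find_gaps(
    values: Sequence[Sequence],
    step_seconds: float,
    tolerance: float = 0.5,
) -> List[Tuple[float, float]]:
    """Return (last_seen_ts, next_seen_ts) pairs where samples are missing.

    `values` is the `values` list of one series in a Prometheus-style
    matrix result: [[unix_ts, "value"], ...]. `step_seconds` is the query
    step. `tolerance` (hypothetical default) is the slack allowed on top
    of one step before an interval is reported as a gap.
    """
    timestamps = sorted(float(ts) for ts, _ in values)
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        # An interval wider than one step (plus slack) means at least one
        # expected point was not returned.
        if curr - prev > step_seconds * (1 + tolerance):
            gaps.append((prev, curr))
    return gaps


if __name__ == "__main__":
    # Synthetic series with a 60s step and a hole between t=120 and t=360.
    series = [[0, "1"], [60, "1"], [120, "1"], [360, "1"], [420, "1"]]
    print(find_gaps(series, step_seconds=60))  # [(120.0, 360.0)]
```

Running the same query for the same time range before and after a Memcached frontend restart and comparing the reported gaps would show whether the holes are in the cached responses.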
