Replace EnableNativeHistograms from TSDB config with PerTenant Limit #6718

Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,6 +1,7 @@
# Changelog

## master / unreleased
* [CHANGE] Ingester: Remove the EnableNativeHistograms TSDB config option and gate native histogram ingestion through a new per-tenant limit instead. #6718
* [CHANGE] StoreGateway/Alertmanager: Add default 5s connection timeout on client. #6603
* [FEATURE] Query Frontend: Add dynamic interval size for query splitting. This is enabled by configuring experimental flags `querier.max-shards-per-query` and/or `querier.max-fetched-data-duration-per-query`. The split interval size is dynamically increased to maintain a number of shards and total duration fetched below the configured values. #6458
* [FEATURE] Querier/Ruler: Add `query_partial_data` and `rules_partial_data` limits to allow queries/rules to be evaluated with data from a single zone, if other zones are not available. #6526
4 changes: 0 additions & 4 deletions docs/blocks-storage/querier.md
@@ -1561,10 +1561,6 @@ blocks_storage:
# CLI flag: -blocks-storage.tsdb.out-of-order-cap-max
[out_of_order_cap_max: <int> | default = 32]

# [EXPERIMENTAL] True to enable native histogram.
# CLI flag: -blocks-storage.tsdb.enable-native-histograms
[enable_native_histograms: <boolean> | default = false]

# [EXPERIMENTAL] If enabled, ingesters will cache expanded postings when
# querying blocks. Caching can be configured separately for the head and
# compacted blocks.
4 changes: 0 additions & 4 deletions docs/blocks-storage/store-gateway.md
@@ -1679,10 +1679,6 @@ blocks_storage:
# CLI flag: -blocks-storage.tsdb.out-of-order-cap-max
[out_of_order_cap_max: <int> | default = 32]

# [EXPERIMENTAL] True to enable native histogram.
# CLI flag: -blocks-storage.tsdb.enable-native-histograms
[enable_native_histograms: <boolean> | default = false]

# [EXPERIMENTAL] If enabled, ingesters will cache expanded postings when
# querying blocks. Caching can be configured separately for the head and
# compacted blocks.
8 changes: 4 additions & 4 deletions docs/configuration/config-file-reference.md
@@ -2130,10 +2130,6 @@ tsdb:
# CLI flag: -blocks-storage.tsdb.out-of-order-cap-max
[out_of_order_cap_max: <int> | default = 32]

# [EXPERIMENTAL] True to enable native histogram.
# CLI flag: -blocks-storage.tsdb.enable-native-histograms
[enable_native_histograms: <boolean> | default = false]

# [EXPERIMENTAL] If enabled, ingesters will cache expanded postings when
# querying blocks. Caching can be configured separately for the head and
# compacted blocks.
@@ -3516,6 +3512,10 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s
# [max_series]
[limits_per_label_set: <list of LimitsPerLabelSet> | default = []]

# [EXPERIMENTAL] True to enable native histogram.
# CLI flag: -blocks-storage.tsdb.enable-native-histograms
[enable_native_histograms: <boolean> | default = false]

# The maximum number of active metrics with metadata per user, per ingester. 0
# to disable.
# CLI flag: -ingester.max-metadata-per-user
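Because the option now lives in `limits_config`, it can be set globally as a default or overridden per tenant through the runtime overrides file. A minimal sketch of the Go side of such a limit, mirroring the `i.limits.EnableNativeHistograms(userID)` call in the ingester diff below; the types and lookup logic here are illustrative, not the actual Cortex implementation:

```go
package validation

// Sketch only: a per-tenant limit with the same YAML key as the new
// limits_config entry above.
type Limits struct {
	// Gates native histogram ingestion for a tenant; replaces the removed
	// blocks-storage.tsdb.enable-native-histograms TSDB-wide option.
	EnableNativeHistograms bool `yaml:"enable_native_histograms" json:"enable_native_histograms"`
}

// Overrides resolves a tenant's effective limits: a per-tenant override when
// one is configured, otherwise the global defaults.
type Overrides struct {
	defaults     Limits
	tenantLimits map[string]*Limits // illustrative stand-in for runtime overrides
}

// EnableNativeHistograms is what the ingester checks on every push.
func (o *Overrides) EnableNativeHistograms(userID string) bool {
	if l, ok := o.tenantLimits[userID]; ok {
		return l.EnableNativeHistograms
	}
	return o.defaults.EnableNativeHistograms
}
```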
10 changes: 5 additions & 5 deletions pkg/ingester/ingester.go
@@ -1347,7 +1347,7 @@ func (i *Ingester) Push(ctx context.Context, req *cortexpb.WriteRequest) (*corte
return nil, wrapWithUser(err, userID)
}

if i.cfg.BlocksStorageConfig.TSDB.EnableNativeHistograms {
if i.limits.EnableNativeHistograms(userID) {
for _, hp := range ts.Histograms {
var (
err error
@@ -1494,7 +1494,7 @@ func (i *Ingester) Push(ctx context.Context, req *cortexpb.WriteRequest) (*corte
i.validateMetrics.DiscardedSamples.WithLabelValues(perLabelsetSeriesLimit, userID).Add(float64(perLabelSetSeriesLimitCount))
}

if !i.cfg.BlocksStorageConfig.TSDB.EnableNativeHistograms && discardedNativeHistogramCount > 0 {
if !i.limits.EnableNativeHistograms(userID) && discardedNativeHistogramCount > 0 {
i.validateMetrics.DiscardedSamples.WithLabelValues(nativeHistogramSample, userID).Add(float64(discardedNativeHistogramCount))
}
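Read together, the two hunks above replace the process-wide TSDB switch with a per-tenant check at ingestion time: histograms are appended only when the tenant's limit is on, otherwise they are counted and reported as discarded samples. A condensed, non-runnable sketch of that flow in Push, using the identifiers visible in the diff (conversion and error handling omitted):

```go
// Condensed sketch of the Push code path; not the complete Cortex logic.
enableNativeHistograms := i.limits.EnableNativeHistograms(userID)

discardedNativeHistogramCount := 0
for _, ts := range req.Timeseries {
	if enableNativeHistograms {
		// Limit on: convert and append each histogram sample for this series.
		for _, hp := range ts.Histograms {
			_ = hp // appended via the TSDB appender in the real code
		}
	} else {
		// Limit off: drop the histograms but remember how many were dropped.
		discardedNativeHistogramCount += len(ts.Histograms)
	}
}

// After the loop, surface the drops through the discarded-samples metric.
if !enableNativeHistograms && discardedNativeHistogramCount > 0 {
	i.validateMetrics.DiscardedSamples.WithLabelValues(nativeHistogramSample, userID).Add(float64(discardedNativeHistogramCount))
}
```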

@@ -2451,9 +2451,9 @@ func (i *Ingester) createTSDB(userID string) (*userTSDB, error) {
EnableMemorySnapshotOnShutdown: i.cfg.BlocksStorageConfig.TSDB.MemorySnapshotOnShutdown,
OutOfOrderTimeWindow: time.Duration(oooTimeWindow).Milliseconds(),
OutOfOrderCapMax: i.cfg.BlocksStorageConfig.TSDB.OutOfOrderCapMax,
EnableOOONativeHistograms: i.cfg.BlocksStorageConfig.TSDB.EnableNativeHistograms, // Automatically enabled when EnableNativeHistograms is true.
EnableOverlappingCompaction: false, // Always let compactors handle overlapped blocks, e.g. OOO blocks.
EnableNativeHistograms: i.cfg.BlocksStorageConfig.TSDB.EnableNativeHistograms,
EnableOOONativeHistograms: true,
EnableOverlappingCompaction: false, // Always let compactors handle overlapped blocks, e.g. OOO blocks.
EnableNativeHistograms: true, // Always enable native histograms. Gating is done through a per-tenant limit at ingestion.
BlockChunkQuerierFunc: i.blockChunkQuerierFunc(userID),
}, nil)
if err != nil {
43 changes: 27 additions & 16 deletions pkg/ingester/ingester_test.go
@@ -124,6 +124,7 @@ func seriesSetFromResponseStream(s *mockQueryStreamServer) (storage.SeriesSet, e

func TestMatcherCache(t *testing.T) {
limits := defaultLimitsTestConfig()
limits.EnableNativeHistograms = true
userID := "1"
tenantLimits := newMockTenantLimits(map[string]*validation.Limits{userID: &limits})
registry := prometheus.NewRegistry()
@@ -135,7 +136,7 @@ func TestMatcherCache(t *testing.T) {
require.NoError(t, os.Mkdir(blocksDir, os.ModePerm))
cfg := defaultIngesterTestConfig(t)
cfg.MatchersCacheMaxItems = 50
ing, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, tenantLimits, blocksDir, registry, true)
ing, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, tenantLimits, blocksDir, registry)
require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), ing))

@@ -204,7 +205,7 @@ func TestIngesterDeletionRace(t *testing.T) {
require.NoError(t, os.Mkdir(chunksDir, os.ModePerm))
require.NoError(t, os.Mkdir(blocksDir, os.ModePerm))

ing, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, tenantLimits, blocksDir, registry, false)
ing, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, tenantLimits, blocksDir, registry)
require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), ing))
defer services.StopAndAwaitTerminated(context.Background(), ing) //nolint:errcheck
@@ -254,6 +255,7 @@ func TestIngesterDeletionRace(t *testing.T) {

func TestIngesterPerLabelsetLimitExceeded(t *testing.T) {
limits := defaultLimitsTestConfig()
limits.EnableNativeHistograms = true
userID := "1"
registry := prometheus.NewRegistry()

Expand Down Expand Up @@ -287,7 +289,7 @@ func TestIngesterPerLabelsetLimitExceeded(t *testing.T) {
require.NoError(t, os.Mkdir(chunksDir, os.ModePerm))
require.NoError(t, os.Mkdir(blocksDir, os.ModePerm))

ing, err := prepareIngesterWithBlocksStorageAndLimits(t, defaultIngesterTestConfig(t), limits, tenantLimits, blocksDir, registry, true)
ing, err := prepareIngesterWithBlocksStorageAndLimits(t, defaultIngesterTestConfig(t), limits, tenantLimits, blocksDir, registry)
require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), ing))
// Wait until it's ACTIVE
Expand Down Expand Up @@ -630,7 +632,7 @@ func TestIngesterPerLabelsetLimitExceeded(t *testing.T) {
// Should persist between restarts
services.StopAndAwaitTerminated(context.Background(), ing) //nolint:errcheck
registry = prometheus.NewRegistry()
ing, err = prepareIngesterWithBlocksStorageAndLimits(t, defaultIngesterTestConfig(t), limits, tenantLimits, blocksDir, registry, true)
ing, err = prepareIngesterWithBlocksStorageAndLimits(t, defaultIngesterTestConfig(t), limits, tenantLimits, blocksDir, registry)
require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), ing))
ing.updateActiveSeries(ctx)
@@ -661,6 +663,7 @@ func TestIngesterPerLabelsetLimitExceeded(t *testing.T) {
func TestPushRace(t *testing.T) {
cfg := defaultIngesterTestConfig(t)
l := defaultLimitsTestConfig()
l.EnableNativeHistograms = true
cfg.LabelsStringInterningEnabled = true
cfg.LifecyclerConfig.JoinAfter = 0

@@ -686,7 +689,7 @@ func TestPushRace(t *testing.T) {
blocksDir := filepath.Join(dir, "blocks")
require.NoError(t, os.Mkdir(blocksDir, os.ModePerm))

ing, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, l, nil, blocksDir, prometheus.NewRegistry(), true)
ing, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, l, nil, blocksDir, prometheus.NewRegistry())
require.NoError(t, err)
defer services.StopAndAwaitTerminated(context.Background(), ing) //nolint:errcheck
require.NoError(t, services.StartAndAwaitRunning(context.Background(), ing))
@@ -747,6 +750,7 @@ func TestPushRace(t *testing.T) {

func TestIngesterUserLimitExceeded(t *testing.T) {
limits := defaultLimitsTestConfig()
limits.EnableNativeHistograms = true
limits.MaxLocalSeriesPerUser = 1
limits.MaxLocalMetricsWithMetadataPerUser = 1

@@ -778,7 +782,7 @@ func TestIngesterUserLimitExceeded(t *testing.T) {
require.NoError(t, os.Mkdir(blocksDir, os.ModePerm))

blocksIngesterGenerator := func(reg prometheus.Registerer) *Ingester {
ing, err := prepareIngesterWithBlocksStorageAndLimits(t, defaultIngesterTestConfig(t), limits, nil, blocksDir, reg, true)
ing, err := prepareIngesterWithBlocksStorageAndLimits(t, defaultIngesterTestConfig(t), limits, nil, blocksDir, reg)
require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), ing))
// Wait until it's ACTIVE
@@ -878,6 +882,7 @@ func benchmarkData(nSeries int) (allLabels []labels.Labels, allSamples []cortexp

func TestIngesterMetricLimitExceeded(t *testing.T) {
limits := defaultLimitsTestConfig()
limits.EnableNativeHistograms = true
limits.MaxLocalSeriesPerMetric = 1
limits.MaxLocalMetadataPerMetric = 1

@@ -909,7 +914,7 @@ func TestIngesterMetricLimitExceeded(t *testing.T) {
require.NoError(t, os.Mkdir(blocksDir, os.ModePerm))

blocksIngesterGenerator := func(reg prometheus.Registerer) *Ingester {
ing, err := prepareIngesterWithBlocksStorageAndLimits(t, defaultIngesterTestConfig(t), limits, nil, blocksDir, reg, true)
ing, err := prepareIngesterWithBlocksStorageAndLimits(t, defaultIngesterTestConfig(t), limits, nil, blocksDir, reg)
require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), ing))
// Wait until it's ACTIVE
@@ -1933,6 +1938,7 @@ func TestIngester_Push(t *testing.T) {
cfg.ActiveSeriesMetricsEnabled = !testData.disableActiveSeries

limits := defaultLimitsTestConfig()
limits.EnableNativeHistograms = !testData.disableNativeHistogram
limits.MaxExemplars = testData.maxExemplars
limits.OutOfOrderTimeWindow = model.Duration(testData.oooTimeWindow)
limits.LimitsPerLabelSet = []validation.LimitsPerLabelSet{
@@ -1945,7 +1951,7 @@ func TestIngester_Push(t *testing.T) {
Hash: 1,
},
}
i, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, nil, "", registry, !testData.disableNativeHistogram)
i, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, nil, "", registry)
require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), i))
defer services.StopAndAwaitTerminated(context.Background(), i) //nolint:errcheck
@@ -2174,7 +2180,8 @@ func TestIngester_PushNativeHistogramErrors(t *testing.T) {
cfg.LifecyclerConfig.JoinAfter = 0

limits := defaultLimitsTestConfig()
i, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, nil, "", registry, true)
limits.EnableNativeHistograms = true
i, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, nil, "", registry)
require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), i))
defer services.StopAndAwaitTerminated(context.Background(), i) //nolint:errcheck
@@ -2662,6 +2669,7 @@ func Benchmark_Ingester_PushOnError(b *testing.B) {
cfg.LifecyclerConfig.JoinAfter = 0

limits := defaultLimitsTestConfig()
limits.EnableNativeHistograms = true
if !testData.prepareConfig(&limits, instanceLimits) {
b.SkipNow()
}
@@ -2670,7 +2678,7 @@
return instanceLimits
}

ingester, err := prepareIngesterWithBlocksStorageAndLimits(b, cfg, limits, nil, "", registry, true)
ingester, err := prepareIngesterWithBlocksStorageAndLimits(b, cfg, limits, nil, "", registry)
require.NoError(b, err)
require.NoError(b, services.StartAndAwaitRunning(context.Background(), ingester))
defer services.StopAndAwaitTerminated(context.Background(), ingester) //nolint:errcheck
@@ -3947,10 +3955,12 @@ func mockHistogramWriteRequest(t *testing.T, lbls labels.Labels, value int64, ti
}

func prepareIngesterWithBlocksStorage(t testing.TB, ingesterCfg Config, registerer prometheus.Registerer) (*Ingester, error) {
return prepareIngesterWithBlocksStorageAndLimits(t, ingesterCfg, defaultLimitsTestConfig(), nil, "", registerer, true)
limits := defaultLimitsTestConfig()
limits.EnableNativeHistograms = true
return prepareIngesterWithBlocksStorageAndLimits(t, ingesterCfg, limits, nil, "", registerer)
}

func prepareIngesterWithBlocksStorageAndLimits(t testing.TB, ingesterCfg Config, limits validation.Limits, tenantLimits validation.TenantLimits, dataDir string, registerer prometheus.Registerer, nativeHistograms bool) (*Ingester, error) {
func prepareIngesterWithBlocksStorageAndLimits(t testing.TB, ingesterCfg Config, limits validation.Limits, tenantLimits validation.TenantLimits, dataDir string, registerer prometheus.Registerer) (*Ingester, error) {
// Create a data dir if none has been provided.
if dataDir == "" {
dataDir = t.TempDir()
@@ -3966,7 +3976,6 @@ func prepareIngesterWithBlocksStorageAndLimits(t testing.TB, ingesterCfg Config,
ingesterCfg.BlocksStorageConfig.TSDB.Dir = dataDir
ingesterCfg.BlocksStorageConfig.Bucket.Backend = "filesystem"
ingesterCfg.BlocksStorageConfig.Bucket.Filesystem.Directory = bucketDir
ingesterCfg.BlocksStorageConfig.TSDB.EnableNativeHistograms = nativeHistograms

ingester, err := New(ingesterCfg, overrides, registerer, log.NewNopLogger(), nil)
if err != nil {
@@ -6432,15 +6441,16 @@ func TestIngester_MaxExemplarsFallBack(t *testing.T) {
dir := t.TempDir()
blocksDir := filepath.Join(dir, "blocks")
limits := defaultLimitsTestConfig()
i, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, nil, blocksDir, prometheus.NewRegistry(), true)
limits.EnableNativeHistograms = true
i, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, nil, blocksDir, prometheus.NewRegistry())
require.NoError(t, err)

maxExemplars := i.getMaxExemplars("someTenant")
require.Equal(t, maxExemplars, int64(2))

// set max exemplars value in limits, and re-initialize the ingester
limits.MaxExemplars = 5
i, err = prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, nil, blocksDir, prometheus.NewRegistry(), true)
i, err = prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, nil, blocksDir, prometheus.NewRegistry())
require.NoError(t, err)

// validate this value is picked up now
@@ -6815,6 +6825,7 @@ func TestIngester_UpdateLabelSetMetrics(t *testing.T) {
cfg.BlocksStorageConfig.TSDB.BlockRanges = []time.Duration{2 * time.Hour}
reg := prometheus.NewRegistry()
limits := defaultLimitsTestConfig()
limits.EnableNativeHistograms = true
userID := "1"
ctx := user.InjectOrgID(context.Background(), userID)

@@ -6839,7 +6850,7 @@ func TestIngester_UpdateLabelSetMetrics(t *testing.T) {
require.NoError(t, os.Mkdir(chunksDir, os.ModePerm))
require.NoError(t, os.Mkdir(blocksDir, os.ModePerm))

i, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, tenantLimits, blocksDir, reg, false)
i, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, tenantLimits, blocksDir, reg)
Contributor:

Do we have a test case where native histograms are disabled and Push increments the discarded samples metric?
If not, can you add one to TestIngester_Push?
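A hedged sketch of what such a case could look like inside TestIngester_Push, reusing helpers from this file. The mockHistogramWriteRequest arguments, the metric help text, and the reason label emitted for nativeHistogramSample are assumptions and would need to be checked against the real constants:

```go
// Sketch only: tenant has native histograms disabled, so a pushed histogram
// should be counted in cortex_discarded_samples_total instead of appended.
registry := prometheus.NewRegistry()
limits := defaultLimitsTestConfig()
limits.EnableNativeHistograms = false

i, err := prepareIngesterWithBlocksStorageAndLimits(t, defaultIngesterTestConfig(t), limits, nil, "", registry)
require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), i))
defer services.StopAndAwaitTerminated(context.Background(), i) //nolint:errcheck
// Wait until the ingester is ACTIVE, as the other tests in this file do.

ctx := user.InjectOrgID(context.Background(), "1")
req := mockHistogramWriteRequest(t, labels.FromStrings(labels.MetricName, "test"), 0, 0, false) // arguments assumed
_, err = i.Push(ctx, req)
require.NoError(t, err)

// Reason label value assumed to match the nativeHistogramSample constant.
require.NoError(t, testutil.GatherAndCompare(registry, strings.NewReader(`
	# HELP cortex_discarded_samples_total The total number of samples that were discarded.
	# TYPE cortex_discarded_samples_total counter
	cortex_discarded_samples_total{reason="native-histogram-sample",user="1"} 1
`), "cortex_discarded_samples_total"))
```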

require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), i))
defer services.StopAndAwaitTerminated(context.Background(), i) //nolint:errcheck
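One thing the updated helper signature makes straightforward (and the reason the extra nativeHistograms bool parameter could be dropped) is giving each tenant its own setting in a single ingester. A small sketch using the helpers from this file; the tenant IDs are illustrative:

```go
// Sketch: two tenants sharing one ingester, only one of them allowed to
// ingest native histograms.
limitsA := defaultLimitsTestConfig()
limitsA.EnableNativeHistograms = true

limitsB := defaultLimitsTestConfig()
limitsB.EnableNativeHistograms = false

tenantLimits := newMockTenantLimits(map[string]*validation.Limits{
	"tenant-a": &limitsA, // histograms are appended
	"tenant-b": &limitsB, // histograms are dropped and counted as discarded
})

ing, err := prepareIngesterWithBlocksStorageAndLimits(t, defaultIngesterTestConfig(t), limitsA, tenantLimits, "", prometheus.NewRegistry())
require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), ing))
defer services.StopAndAwaitTerminated(context.Background(), ing) //nolint:errcheck
```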
4 changes: 0 additions & 4 deletions pkg/storage/tsdb/config.go
@@ -172,9 +172,6 @@ type TSDBConfig struct {
// OutOfOrderCapMax is maximum capacity for OOO chunks (in samples).
OutOfOrderCapMax int64 `yaml:"out_of_order_cap_max"`

// Enable native histogram ingestion.
EnableNativeHistograms bool `yaml:"enable_native_histograms"`

// Posting Cache Configuration for TSDB
PostingsCache TSDBPostingsCacheConfig `yaml:"expanded_postings_cache" doc:"description=[EXPERIMENTAL] If enabled, ingesters will cache expanded postings when querying blocks. Caching can be configured separately for the head and compacted blocks."`
}
@@ -204,7 +201,6 @@ func (cfg *TSDBConfig) RegisterFlags(f *flag.FlagSet) {
f.IntVar(&cfg.MaxExemplars, "blocks-storage.tsdb.max-exemplars", 0, "Deprecated, use maxExemplars in limits instead. If the MaxExemplars value in limits is set to zero, cortex will fallback on this value. This setting enables support for exemplars in TSDB and sets the maximum number that will be stored. 0 or less means disabled.")
f.BoolVar(&cfg.MemorySnapshotOnShutdown, "blocks-storage.tsdb.memory-snapshot-on-shutdown", false, "True to enable snapshotting of in-memory TSDB data on disk when shutting down.")
f.Int64Var(&cfg.OutOfOrderCapMax, "blocks-storage.tsdb.out-of-order-cap-max", tsdb.DefaultOutOfOrderCapMax, "[EXPERIMENTAL] Configures the maximum number of samples per chunk that can be out-of-order.")
f.BoolVar(&cfg.EnableNativeHistograms, "blocks-storage.tsdb.enable-native-histograms", false, "[EXPERIMENTAL] True to enable native histogram.")
Contributor:

This will be a breaking change. But since this feature is experimental, can we remove it and simplify the configuration?

@yeya24 @alanprot what do you guys think?

Contributor:

Is it still a breaking change if we keep the configuration name the same but move it to the per-tenant runtime config? I guess we don't have to rename it just to add a _per_user suffix.

That way existing users won't be impacted, as it can still be set globally.

Contributor:

Did you mean keep the same CLI flag?

Contributor:

Yeah. It is still kind of breaking, as it moves from the tsdb section to the limits section in the config file. But I guess it is fine for an experimental feature, as you said.

Contributor Author:

Thanks. Updated to keep the same config name.


flagext.DeprecatedFlag(f, "blocks-storage.tsdb.wal-compression-enabled", "Deprecated (use blocks-storage.tsdb.wal-compression-type instead): True to enable TSDB WAL compression.", util_log.Logger)
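Per the thread above, the per-tenant limit keeps the flag name that is removed from TSDBConfig here, so deployments that only set it globally are unaffected. A minimal sketch of that registration on the limits side, assuming the usual RegisterFlags hook on validation.Limits; the exact wiring in the PR may differ:

```go
// Sketch only: same flag name and help text, now registered on the per-tenant
// Limits instead of TSDBConfig.
func (l *Limits) RegisterFlags(f *flag.FlagSet) {
	f.BoolVar(&l.EnableNativeHistograms, "blocks-storage.tsdb.enable-native-histograms", false,
		"[EXPERIMENTAL] True to enable native histogram.")
	// ... other per-tenant limit flags ...
}
```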
