Skip to content

Replace EnableNativeHistograms from TSDB config with PerTenant Limit #6718

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

PaurushGarg
Copy link
Contributor

@PaurushGarg PaurushGarg commented Apr 23, 2025

Replace EnableNativeHistograms from TSDB config with EnableNativeHistogramPerUser Limit

What this PR does:

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

…nableNativeHistogramPerUser limit

Signed-off-by: Paurush Garg <[email protected]>
@@ -204,7 +201,6 @@ func (cfg *TSDBConfig) RegisterFlags(f *flag.FlagSet) {
f.IntVar(&cfg.MaxExemplars, "blocks-storage.tsdb.max-exemplars", 0, "Deprecated, use maxExemplars in limits instead. If the MaxExemplars value in limits is set to zero, cortex will fallback on this value. This setting enables support for exemplars in TSDB and sets the maximum number that will be stored. 0 or less means disabled.")
f.BoolVar(&cfg.MemorySnapshotOnShutdown, "blocks-storage.tsdb.memory-snapshot-on-shutdown", false, "True to enable snapshotting of in-memory TSDB data on disk when shutting down.")
f.Int64Var(&cfg.OutOfOrderCapMax, "blocks-storage.tsdb.out-of-order-cap-max", tsdb.DefaultOutOfOrderCapMax, "[EXPERIMENTAL] Configures the maximum number of samples per chunk that can be out-of-order.")
f.BoolVar(&cfg.EnableNativeHistograms, "blocks-storage.tsdb.enable-native-histograms", false, "[EXPERIMENTAL] True to enable native histogram.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a breaking change. But since this feature is experimental can we remove it and simplify the configurations?

@yeya24 @alanprot what do you guys think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still a breaking change if we keep the configuration name the same but move it to per tenant runtime config? I guess we don't have to rename it just to add _per_user suffix

That way existing users won't be impacted as it is allowed to be set globally

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean keep the same CLI flag?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. It is still kind of breaking as it moves from tsdb to limit section in the config file. But I guess it is fine for an experimental feature as you said.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Updated to keep the same config name.

@@ -257,6 +258,7 @@ func (l *Limits) RegisterFlags(f *flag.FlagSet) {
f.IntVar(&l.MaxLocalSeriesPerMetric, "ingester.max-series-per-metric", 50000, "The maximum number of active series per metric name, per ingester. 0 to disable.")
f.IntVar(&l.MaxGlobalSeriesPerUser, "ingester.max-global-series-per-user", 0, "The maximum number of active series per user, across the cluster before replication. 0 to disable. Supported only if -distributor.shard-by-all-labels is true.")
f.IntVar(&l.MaxGlobalSeriesPerMetric, "ingester.max-global-series-per-metric", 0, "The maximum number of active series per metric name, across the cluster before replication. 0 to disable.")
f.BoolVar(&l.EnableNativeHistogramPerUser, "ingester.enable_native_histogram_per_user", false, "Flag to enable NativeHistograms per user.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the Experimental tag?

Copy link
Contributor Author

@PaurushGarg PaurushGarg Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Added experimental tag.

EnableNativeHistograms: i.cfg.BlocksStorageConfig.TSDB.EnableNativeHistograms,
EnableOOONativeHistograms: true,
EnableOverlappingCompaction: false, // Always let compactors handle overlapped blocks, e.g. OOO blocks.
EnableNativeHistograms: true, // Always enable Native Histograms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain here that the gate keeping is done though a per tenant config at ingestion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Added.

@@ -1347,7 +1347,7 @@ func (i *Ingester) Push(ctx context.Context, req *cortexpb.WriteRequest) (*corte
return nil, wrapWithUser(err, userID)
}

if i.cfg.BlocksStorageConfig.TSDB.EnableNativeHistograms {
if i.limits.EnableNativeHistogramPerUser(userID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to throw an exception here similar to Prometheus? Wdyt?

https://github.com/prometheus/prometheus/blob/main/tsdb/head_append.go#L647

Try ingesting some native histogram sample into current version of cortex with the NH feature disabled and see what is the behaviour.

Copy link
Contributor Author

@PaurushGarg PaurushGarg Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think current Cortex behaviour is that it count it as discarded sample here and not throw error here. But I m still checking by running Cortex.

Signed-off-by: Paurush Garg <[email protected]>
@PaurushGarg PaurushGarg requested a review from harry671003 April 24, 2025 12:04
@PaurushGarg PaurushGarg requested a review from yeya24 April 25, 2025 14:01
@PaurushGarg PaurushGarg changed the title Replace EnableNativeHistograms from TSDB config with EnableNativeHistogramPerUser Limit Replace EnableNativeHistograms from TSDB config with PerTenant Limit Apr 25, 2025
@PaurushGarg PaurushGarg marked this pull request as ready for review April 25, 2025 14:30
@dosubot dosubot bot added the storage/blocks Blocks storage engine label Apr 25, 2025
Copy link
Member

@friedrichg friedrichg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Apr 28, 2025
Copy link
Contributor

@harry671003 harry671003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -6839,7 +6850,7 @@ func TestIngester_UpdateLabelSetMetrics(t *testing.T) {
require.NoError(t, os.Mkdir(chunksDir, os.ModePerm))
require.NoError(t, os.Mkdir(blocksDir, os.ModePerm))

i, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, tenantLimits, blocksDir, reg, false)
i, err := prepareIngesterWithBlocksStorageAndLimits(t, cfg, limits, tenantLimits, blocksDir, reg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test case for when native histogram is disabled and push should increment the samples discarded metrics?
If not, can you add one to TestIngester_Push?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This is awesome!

@yeya24 yeya24 merged commit 39761ee into cortexproject:master Apr 30, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm This PR has been approved by a maintainer size/M storage/blocks Blocks storage engine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants