Implement Request and Response Policy Based Routing in Cluster Mode #3422

ofekshenawa · 2025-06-30T12:13:18Z

This PR adds routing based on the request_policy and response_policy tips that Redis reports through COMMAND, applied to commands issued through the OSS Cluster client.

Key Additions:

- Command Policy Loader: Parses and caches COMMAND metadata with routing/aggregation tips on first use.
- Routing Engine Enhancements:
  - Implements support for all request policies: default(keyless), default(hashslot), all_shards, all_nodes, multi_shard, and special.
- Response Aggregator: Combines multi-shard replies based on response_policy:
  - all_succeeded, one_succeeded, agg_sum, special, etc.
  - Includes custom handling for special commands like FT.CURSOR.
- Raw Command Support: Policies are enforced on Client.Do(ctx, args...).
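To make the response_policy semantics concrete, here is a minimal sketch of how three of the listed policies could fold per-shard replies into one result. It is not the PR's aggregator: `ResponsePolicy`, the `Resp*` constants, `shardReply`, `aggregate`, and `errShardDown` are hypothetical names chosen for illustration, and replies are simplified to integers.

```go
package main

import (
	"errors"
	"fmt"
)

// ResponsePolicy is a hypothetical stand-in for the PR's internal enum.
type ResponsePolicy int

const (
	RespAllSucceeded ResponsePolicy = iota
	RespOneSucceeded
	RespAggSum
)

// errShardDown is an illustrative error used in the example below.
var errShardDown = errors.New("shard down")

// shardReply is one shard's integer reply, or the error it returned.
type shardReply struct {
	val int64
	err error
}

// aggregate folds per-shard replies into a single result.
func aggregate(policy ResponsePolicy, replies []shardReply) (int64, error) {
	if len(replies) == 0 {
		return 0, errors.New("no replies to aggregate")
	}
	switch policy {
	case RespAllSucceeded:
		// Fail if any shard failed; otherwise return the first reply.
		for _, r := range replies {
			if r.err != nil {
				return 0, r.err
			}
		}
		return replies[0].val, nil
	case RespOneSucceeded:
		// Return the first successful reply; if none, report the last error.
		var lastErr error
		for _, r := range replies {
			if r.err == nil {
				return r.val, nil
			}
			lastErr = r.err
		}
		return 0, lastErr
	case RespAggSum:
		// Sum all replies; any shard error fails the whole command.
		var sum int64
		for _, r := range replies {
			if r.err != nil {
				return 0, r.err
			}
			sum += r.val
		}
		return sum, nil
	default:
		return 0, fmt.Errorf("unsupported response policy %d", policy)
	}
}

func main() {
	ok := []shardReply{{val: 3}, {val: 4}, {val: 5}}
	fmt.Println(aggregate(RespAggSum, ok)) // 12 <nil>

	partial := []shardReply{{err: errShardDown}, {val: 7}}
	fmt.Println(aggregate(RespOneSucceeded, partial)) // 7 <nil>
}
```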


…or osscluster.go (#6)

* centralize cluster command routing in osscluster_router.go and refactor osscluster.go
* enable ci on all branches
* Add debug prints
* Add debug prints
* FIX: deal with nil policy
* FIX: fixing clusterClient process
* chore(osscluster): simplify switch case
* wip(command): ai generated clone method for commands
* feat: implement response aggregator for Redis cluster commands
* feat: implement response aggregator for Redis cluster commands
* fix: solve concurrency errors
* fix: solve concurrency errors
* return MaxRedirects settings
* remove locks from getCommandPolicy
* Handle MOVED errors more robustly, remove cluster reloading at executions, ensure better routing
* Fix: supports Process hook test
* Fix: remove response aggregation for single shard commands
* Add more performant type conversion for Cmd type
* Add router logic into processPipeline

Co-authored-by: Nedyalko Dyakov <[email protected]>


htemelski-redis · 2025-09-25T07:23:19Z

osscluster_router.go

+		}
+		if result.cmd != nil && result.err == nil {
+			// For MGET, extract individual values from the array result
+			if strings.ToLower(cmd.Name()) == "mget" {


Do we actually need this special case?

htemelski-redis · 2025-09-25T07:24:42Z

osscluster_router.go

+}
+
+// getCommandPolicy retrieves the routing policy for a command
+func (c *ClusterClient) getCommandPolicy(ctx context.Context, cmd Cmder) *routing.CommandPolicy {


~~It seems like this will introduce a big overhead for each command execution.~~
We should fetch all policies during the connection handshake

Note: for the first stage we should use a hard-coded policy manager that can later be extended to take the output of the COMMAND command into account.

@htemelski-redis 💡 Consider implementing a PolicyResolverConfig type that users can override via the client options. This config should map module__command_name to metadata (policies, key requirements, etc.).

Set hardcoded defaults in the client options, but allow users to override policies per command as needed.
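One possible shape for this suggestion, as a rough sketch only: a hard-coded default table plus a user-supplied override map consulted first. `PolicyResolverConfig` is the name proposed above, but its fields, the `Resolve` method, the simplified `CommandPolicy` struct, and the use of a plain lower-cased command name as the key are all assumptions made for illustration. The two default entries are examples, not the PR's table.

```go
package main

import (
	"fmt"
	"strings"
)

// CommandPolicy is a simplified, hypothetical stand-in for routing.CommandPolicy.
type CommandPolicy struct {
	Request  string // e.g. "all_shards"
	Response string // e.g. "agg_sum"
}

// defaultPolicies is an illustrative hard-coded table keyed by lower-cased
// command name. A real table would cover many more commands.
var defaultPolicies = map[string]CommandPolicy{
	"dbsize":   {Request: "all_shards", Response: "agg_sum"},
	"flushall": {Request: "all_shards", Response: "all_succeeded"},
}

// PolicyResolverConfig lets callers override the defaults per command.
type PolicyResolverConfig struct {
	Overrides map[string]CommandPolicy
}

// Resolve checks user overrides first, then the hard-coded defaults.
// The boolean is false when no policy is known for the command.
func (c *PolicyResolverConfig) Resolve(cmdName string) (CommandPolicy, bool) {
	key := strings.ToLower(cmdName)
	if c != nil {
		if p, ok := c.Overrides[key]; ok {
			return p, true
		}
	}
	p, ok := defaultPolicies[key]
	return p, ok
}

func main() {
	cfg := &PolicyResolverConfig{
		Overrides: map[string]CommandPolicy{
			"dbsize": {Request: "special", Response: "special"},
		},
	}
	fmt.Println(cfg.Resolve("DBSIZE"))   // override wins
	fmt.Println(cfg.Resolve("flushall")) // falls back to the default
	fmt.Println(cfg.Resolve("unknown"))  // no policy known
}
```

Resolving from an in-memory table keeps per-command lookups free of network round trips, which is the overhead concern raised in this thread.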

.github/workflows/build.yml

ndyakov

Submitting partial review for the aggregators.

internal/routing/aggregator.go

ndyakov · 2025-10-09T11:30:38Z

osscluster_router.go

+	// For MGET without policy, use keyed aggregator
+	if cmdName == "mget" {
+		return routing.NewDefaultAggregator(true)
+	}


Since we are already passing cmd.Name() to routing.NewResponseAggregator, this can be handled by it. If the policy is nil for mget, maybe NewResponseAggregator can accept the policy and check for nil as well.
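A minimal sketch of that idea, with the nil-policy check and the MGET fallback living in one place. `pickAggregator`, `aggregatorKind`, and the simplified `CommandPolicy` are hypothetical; the function returns a label rather than a real aggregator so the decision logic stays visible.

```go
package main

import (
	"fmt"
	"strings"
)

// CommandPolicy is a simplified, hypothetical stand-in for routing.CommandPolicy.
type CommandPolicy struct {
	Response string // e.g. "agg_sum"; empty means no response policy
}

// aggregatorKind names the aggregator that would be built. A real
// constructor would return an aggregator value instead of a label.
type aggregatorKind string

const (
	kindPolicy  aggregatorKind = "policy"
	kindKeyed   aggregatorKind = "default-keyed"
	kindDefault aggregatorKind = "default"
)

// pickAggregator accepts a possibly-nil policy and the command name, so
// callers no longer need their own MGET special case.
func pickAggregator(policy *CommandPolicy, cmdName string) aggregatorKind {
	if policy != nil && policy.Response != "" {
		return kindPolicy
	}
	// No usable policy: MGET still needs key-ordered aggregation.
	if strings.EqualFold(cmdName, "mget") {
		return kindKeyed
	}
	return kindDefault
}

func main() {
	fmt.Println(pickAggregator(nil, "MGET"))                                // default-keyed
	fmt.Println(pickAggregator(nil, "ping"))                                // default
	fmt.Println(pickAggregator(&CommandPolicy{Response: "agg_sum"}, "del")) // policy
}
```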

internal/routing/aggregator.go

ndyakov · 2025-10-09T11:49:57Z

internal/routing/aggregator.go

+// SetAggregatorFunc allows setting custom aggregation logic for special commands.
+func (a *SpecialAggregator) SetAggregatorFunc(fn func([]interface{}, []error) (interface{}, error)) {
+	a.mu.Lock()
+	defer a.mu.Unlock()
+	a.aggregatorFunc = fn
+}
+
+// SpecialAggregatorRegistry holds custom aggregation functions for specific commands.
+var SpecialAggregatorRegistry = make(map[string]func([]interface{}, []error) (interface{}, error))
+
+// RegisterSpecialAggregator registers a custom aggregation function for a command.
+func RegisterSpecialAggregator(cmdName string, fn func([]interface{}, []error) (interface{}, error)) {
+	SpecialAggregatorRegistry[cmdName] = fn
+}
+
+// NewSpecialAggregator creates a special aggregator with command-specific logic if available.
+func NewSpecialAggregator(cmdName string) *SpecialAggregator {
+	agg := &SpecialAggregator{}
+	if fn, exists := SpecialAggregatorRegistry[cmdName]; exists {
+		agg.SetAggregatorFunc(fn)
+	}
+	return agg


SetAggregatorFunc is only used internally in this package, so I assume it can be private, if it is needed at all (see next comment).

internal/routing/aggregator.go

ndyakov

Submitting another partial review.

ndyakov · 2025-10-09T12:00:22Z

internal/routing/policy.go

+}
+
+func (p *CommandPolicy) CanBeUsedInPipeline() bool {
+	return p.Request != ReqAllNodes && p.Request != ReqAllShards && p.Request != ReqMultiShard


What about special? Can it be used in a pipeline?

My understanding is that special should be handled on a case-by-case basis
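One way to express "case by case" is a per-command allowlist that is consulted only for the special policy. This is a sketch under assumptions: `ReqSpecial`, the `pipelineSafeSpecial` map, the extra `cmdName` parameter, and treating a nil policy as pipeline-safe are all hypothetical choices, not the PR's code. The allowlist is deliberately empty by default.

```go
package main

import "fmt"

// RequestPolicy mirrors the request policy constants referenced in the diff.
type RequestPolicy int

const (
	ReqDefault RequestPolicy = iota
	ReqAllNodes
	ReqAllShards
	ReqMultiShard
	ReqSpecial // hypothetical name for the "special" request policy
)

// CommandPolicy is a simplified stand-in for routing.CommandPolicy.
type CommandPolicy struct {
	Request RequestPolicy
}

// pipelineSafeSpecial is a hypothetical allowlist of special commands that
// have been reviewed and found safe to pipeline.
var pipelineSafeSpecial = map[string]bool{}

// CanBeUsedInPipeline rejects fan-out policies outright and decides
// special commands one by one via the allowlist.
func (p *CommandPolicy) CanBeUsedInPipeline(cmdName string) bool {
	if p == nil {
		return true // assumption: no policy means default single-node routing
	}
	switch p.Request {
	case ReqAllNodes, ReqAllShards, ReqMultiShard:
		return false
	case ReqSpecial:
		return pipelineSafeSpecial[cmdName]
	default:
		return true
	}
}

func main() {
	special := &CommandPolicy{Request: ReqSpecial}
	fmt.Println(special.CanBeUsedInPipeline("example.cmd")) // false

	pipelineSafeSpecial["example.cmd"] = true // placeholder command name
	fmt.Println(special.CanBeUsedInPipeline("example.cmd")) // true
}
```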

ndyakov · 2025-10-09T12:04:21Z

internal/routing/shard_picker.go

+// ShardPicker chooses “one arbitrary shard” when the request_policy is
+// ReqDefault and the command has no keys.
+type ShardPicker interface {
+	Next(total int) int // returns an index in [0,total)
+}


Those are great. Can we also implement a StaticShardPicker or StickyShardPicker that always returns the same shard? I think it would be helpful for testing. This is not a blocker by any means.
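A minimal sketch of such a picker against the `ShardPicker` interface quoted above. The `StaticShardPicker` type, its `Index` field, and the choice to wrap out-of-range indexes (and to return 0 when there are no shards) are assumptions for illustration.

```go
package main

import "fmt"

// ShardPicker mirrors the interface from the diff above.
type ShardPicker interface {
	Next(total int) int // returns an index in [0,total)
}

// StaticShardPicker always returns the same shard, which makes routing of
// keyless commands deterministic in tests.
type StaticShardPicker struct {
	Index int
}

// Next wraps Index into [0,total) so the result stays valid even when the
// cluster has fewer shards than expected.
func (p StaticShardPicker) Next(total int) int {
	if total <= 0 {
		return 0 // assumption: callers never index with this value
	}
	idx := p.Index % total
	if idx < 0 {
		idx += total
	}
	return idx
}

func main() {
	var picker ShardPicker = StaticShardPicker{Index: 2}
	fmt.Println(picker.Next(5), picker.Next(5)) // 2 2
	fmt.Println(picker.Next(2))                 // 0
}
```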

command.go

ndyakov · 2025-10-09T12:10:27Z

command.go

-	return strconv.ParseBool(cmd.val)
+	return strconv.ParseBool(cmd.Val())


why was this change needed?

Not sure, for consistency maybe?

command.go

ndyakov · 2025-10-09T12:32:07Z

command.go

+	if commandInfoTips != nil {
+		if v, ok := commandInfoTips[requestPolicy]; ok {
+			if p, err := routing.ParseRequestPolicy(v); err == nil {
+				req = p
+			}
+		}
+		if v, ok := commandInfoTips[responsePolicy]; ok {
+			if p, err := routing.ParseResponsePolicy(v); err == nil {
+				resp = p
+			}
+		}
+	}
+	tips := make(map[string]string, len(commandInfoTips))
+	for k, v := range commandInfoTips {
+		if k == requestPolicy || k == responsePolicy {
+			continue
+		}
+		tips[k] = v
+	}


can't we do both of those in a single range over commandInfoTips?

Not sure that I completely understand the question
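For reference, the two passes in the diff can be merged into a single range, roughly as sketched below. The `splitTips` helper and the string values of the `requestPolicy` and `responsePolicy` constants are assumptions, and the policies are kept as raw strings here where the PR would call routing.ParseRequestPolicy and routing.ParseResponsePolicy. Ranging over a nil map is a no-op in Go, so the explicit nil check is not needed either.

```go
package main

import "fmt"

// Assumed values for the tip keys used in the diff.
const (
	requestPolicy  = "request_policy"
	responsePolicy = "response_policy"
)

// splitTips extracts the two policy tips and copies every other tip into a
// new map, all in one pass over commandInfoTips.
func splitTips(commandInfoTips map[string]string) (req, resp string, tips map[string]string) {
	tips = make(map[string]string, len(commandInfoTips))
	for k, v := range commandInfoTips {
		switch k {
		case requestPolicy:
			req = v // the PR would parse this into a request policy enum
		case responsePolicy:
			resp = v // the PR would parse this into a response policy enum
		default:
			tips[k] = v
		}
	}
	return req, resp, tips
}

func main() {
	req, resp, tips := splitTips(map[string]string{
		"request_policy":  "all_shards",
		"response_policy": "agg_sum",
		"nondeterministic_output": "",
	})
	fmt.Println(req, resp, len(tips)) // all_shards agg_sum 1
}
```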

command.go

ndyakov · 2025-10-09T12:42:55Z

json.go

 	return nil
 }

+func (cmd *IntPointerSliceCmd) Clone() Cmder {


it's tricky here. do we need to return the same pointer or do we only want the value when cloning?
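The difference between the two options can be shown on a plain slice, assuming the command stores its value as a `[]*int64`. `shallowClone` and `deepClone` are hypothetical helpers, not methods from the PR: the shallow copy shares the pointed-to integers with the original, while the deep copy gives the clone its own.

```go
package main

import "fmt"

// shallowClone copies the pointers, so both slices alias the same int64s.
func shallowClone(src []*int64) []*int64 {
	if src == nil {
		return nil
	}
	dst := make([]*int64, len(src))
	copy(dst, src)
	return dst
}

// deepClone copies the pointed-to values, preserving nil entries.
func deepClone(src []*int64) []*int64 {
	if src == nil {
		return nil
	}
	dst := make([]*int64, len(src))
	for i, p := range src {
		if p != nil {
			v := *p
			dst[i] = &v
		}
	}
	return dst
}

func main() {
	n := int64(1)
	orig := []*int64{&n, nil}
	shallow, deep := shallowClone(orig), deepClone(orig)

	*orig[0] = 42 // mutate the original after cloning

	fmt.Println(*shallow[0], *deep[0]) // 42 1
	fmt.Println(deep[1] == nil)        // true
}
```

If clones are handed to concurrent shard executions, the deep copy avoids one execution observing writes made through another's pointers.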

osscluster.go

ndyakov

Final part of initial review

Overview:

Let's use atomics when possible.
Left questions related to the node selection and setting of values.
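As an illustration of the atomics suggestion, here is a sketch of a sum aggregator that needs no mutex: the running total is an `atomic.Int64` and the first error is recorded with a compare-and-swap. `sumAggregator` and its methods are hypothetical and assume Go 1.19 or later for the typed atomics.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sumAggregator accumulates integer replies from many goroutines.
type sumAggregator struct {
	sum      atomic.Int64
	firstErr atomic.Pointer[error]
}

// Add records one shard's reply. Only the first error is kept.
func (a *sumAggregator) Add(val int64, err error) {
	if err != nil {
		a.firstErr.CompareAndSwap(nil, &err)
		return
	}
	a.sum.Add(val)
}

// Result returns the total, or the first error seen by any shard.
func (a *sumAggregator) Result() (int64, error) {
	if p := a.firstErr.Load(); p != nil {
		return 0, *p
	}
	return a.sum.Load(), nil
}

func main() {
	var agg sumAggregator
	var wg sync.WaitGroup
	for i := int64(1); i <= 8; i++ {
		wg.Add(1)
		go func(v int64) {
			defer wg.Done()
			agg.Add(v, nil) // one goroutine per simulated shard reply
		}(i)
	}
	wg.Wait()
	fmt.Println(agg.Result()) // 36 <nil>
}
```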

Overall the design of the solution looks good, would have to do an additional pass over the test files once this review is addressed.

Thank you both @ofekshenawa and @htemelski-redis!

osscluster_router.go

ndyakov · 2025-10-09T13:03:40Z

osscluster_router.go

+	if c.hasKeys(cmd) {
+		// execute on key based shard
+		return node.Client.Process(ctx, cmd)
+	}


Do we know that this node serves the slot for the key?

Yes, the node should've been selected based on the slot osscluster.go:L1906

func (c *ClusterClient) cmdNode(

ndyakov · 2025-10-09T13:04:38Z

osscluster_router.go

+		// execute on key based shard
+		return node.Client.Process(ctx, cmd)
+	}
+	return c.executeOnArbitraryShard(ctx, cmd)


since it doesn't matter and there is already some node selected, why not use it?

We have two different ways of picking an arbitrary shard, either round robin or a random one

Yes, I understand that, but a node has already been selected here, either because of a MOVED redirect or through normal key-based selection. Why do we have to reselect the node? Shouldn't the arbitrary-node selection happen outside, so that node selection is done only once and the node on line #52 is the one used for this command?
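The "select once" shape being asked for could look roughly like this. Everything here is a toy model: `router`, `node`, `selectNode`, the length-based `slotOwner`, and the round-robin cursor are hypothetical stand-ins for the cluster client's real slot lookup and shard pickers, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

type node struct{ addr string }

// router is a toy model of the cluster client's routing state.
type router struct {
	shards []*node
	cursor int // round-robin position for keyless commands
}

// slotOwner stands in for the real hash-slot lookup.
func (r *router) slotOwner(key string) *node {
	return r.shards[len(key)%len(r.shards)]
}

// arbitraryShard picks the next shard in round-robin order.
func (r *router) arbitraryShard() *node {
	n := r.shards[r.cursor%len(r.shards)]
	r.cursor++
	return n
}

// selectNode is the single place where a target is chosen: key-based when
// the command has keys, arbitrary otherwise.
func (r *router) selectNode(keys []string) *node {
	if len(keys) > 0 {
		return r.slotOwner(keys[0])
	}
	return r.arbitraryShard()
}

// execute uses the node it is given by selectNode and never reselects.
func (r *router) execute(cmd string, keys []string) string {
	n := r.selectNode(keys)
	return cmd + "@" + n.addr
}

func main() {
	r := &router{shards: []*node{{addr: "a"}, {addr: "b"}}}
	fmt.Println(r.execute("PING", nil))           // PING@a
	fmt.Println(r.execute("PING", nil))           // PING@b
	fmt.Println(r.execute("GET", []string{"k"})) // GET@b
}
```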

osscluster_router.go

ndyakov · 2025-10-09T13:17:53Z

osscluster_router.go

+			// Command executed successfully but value extraction failed
+			// This is common for complex commands like CLUSTER SLOTS
+			// The command already has its result set correctly, so just return


I do not understand this comment. Why did the value extraction return nil? Can we at least make sure the cmd has a value set? If it doesn't, we may return a cmd with a nil value and a nil error, which doesn't make sense.

ndyakov · 2025-10-09T13:19:53Z

osscluster_router.go

+		if c, ok := cmd.(*KeyValuesCmd); ok {
+			// KeyValuesCmd needs a key string and values slice
+			if key, ok := value.(string); ok {
+				c.SetVal(key, []string{}) // Default empty values
+			}
+		}
+	case CmdTypeZSliceWithKey:
+		if c, ok := cmd.(*ZSliceWithKeyCmd); ok {
+			// ZSliceWithKeyCmd needs a key string and Z slice
+			if key, ok := value.(string); ok {
+				c.SetVal(key, []Z{}) // Default empty Z slice
+			}


why are we setting empty values here?

No idea tbh, will look into it


ofekshenawa and others added 7 commits May 14, 2025 21:35

feat(routing): add internal request/response policy enums

82a3433

Merge pull request #3 from ofekshenawa/define-policy-type

9e4369a

feat(routing): add internal request/response policy enums

feat: load the policy table in cluster client (#4)

74407a0

* feat: load the policy table in cluster client * Remove comments

modify Tips and command policy in commandInfo (#5)

f99c63b

Merge branch 'load-balance-search-commands-to-shards' into load-balan…

ed528f8

…ce-search-commands-to-shards

remove thread debugging code

43fcc67

ofekshenawa changed the title ~~Load balance search commands to shards~~ Implement Request and Response Policy Based Routing in Cluster Mode Jun 30, 2025

ofekshenawa added 4 commits July 4, 2025 16:05

remove thread debugging code && reject commands with policy that cann…

7eb3818

…ot be used in pipeline

refactor processPipeline and cmdType enum

de344fd

remove FDescribe from cluster tests

57cdd32

Add tests

04a110a

ofekshenawa requested review from bobymicroby, htemelski-redis and ndyakov July 6, 2025 10:28

ofekshenawa added 4 commits July 6, 2025 14:44

fix aggregation test

f1c7f62

fix mget test

e0b122a

fix mget test

a2ffd62

remove aggregateKeyedResponses

c00bd81

ofekshenawa marked this pull request as ready for review July 6, 2025 12:54

htemelski-redis requested changes Sep 25, 2025


htemelski-redis added 2 commits October 8, 2025 09:24

added scaffolding for the req-resp manager

de1b16c

added default policies for the search commands

1b2eaa6

htemelski-redis force-pushed the load-balance-search-commands-to-shards branch from 6e3b627 to 1b2eaa6 Compare October 8, 2025 08:05

htemelski-redis added 2 commits October 8, 2025 14:50

split command map into module->command

64245f8

cleanup, added logic to refresh the cache

3397b6f

ndyakov reviewed Oct 9, 2025


htemelski-redis added 3 commits October 9, 2025 12:24

added reactive cache refresh

4fb4c68

revert cluster refresh

bd526a8

fixed lint

5b01de5

htemelski-redis added 2 commits October 9, 2025 13:43

updated build workflow

4d1d775

update build action

2a06726

ndyakov reviewed Oct 9, 2025


htemelski-redis added 3 commits October 10, 2025 14:14

addressed first batch of comments

17201a1

rewrote aggregator implementations with atomic for native or nearnati…

d7f7ad3

…ve primitives

addressed more comments, fixed lint

cfb290a
