ofekshenawa
Collaborator

@ofekshenawa ofekshenawa commented Jun 30, 2025

This PR adds routing based on the request_policy and response_policy tips reported by the Redis COMMAND command, for commands issued through the OSS Cluster client.

Key Additions:

  • Command Policy Loader: Parses and caches COMMAND metadata with routing/aggregation tips on first use.
  • Routing Engine Enhancements: Implements support for all request policies: default (keyless), default (hashslot), all_shards, all_nodes, multi_shard, and special.
  • Response Aggregator: Combines multi-shard replies based on response_policy (all_succeeded, one_succeeded, agg_sum, special, etc.). Includes custom handling for special commands like FT.CURSOR.
  • Raw Command Support: Policies are enforced on Client.Do(ctx, args...).
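As an illustration of the response aggregation described above, here is a minimal sketch of three of the response policies (all_succeeded, one_succeeded, agg_sum) applied to integer replies. The `shardReply` type and the `aggregate` function are hypothetical names chosen for this example and are not taken from the PR; the real aggregators operate on richer reply types.

```go
package main

import "errors"

// shardReply is a hypothetical container for one shard's integer reply.
type shardReply struct {
	val int64
	err error
}

// aggregate combines per-shard replies according to a response policy name.
// Only three policies are sketched here; anything else is rejected.
func aggregate(policy string, replies []shardReply) (int64, error) {
	if len(replies) == 0 {
		return 0, errors.New("no replies to aggregate")
	}
	switch policy {
	case "all_succeeded":
		// A single error disqualifies the whole result.
		for _, r := range replies {
			if r.err != nil {
				return 0, r.err
			}
		}
		return replies[0].val, nil
	case "one_succeeded":
		// Return the first successful reply; fail only if every shard failed.
		var lastErr error
		for _, r := range replies {
			if r.err == nil {
				return r.val, nil
			}
			lastErr = r.err
		}
		return 0, lastErr
	case "agg_sum":
		// Sum the replies; any error aborts the aggregation.
		var sum int64
		for _, r := range replies {
			if r.err != nil {
				return 0, r.err
			}
			sum += r.val
		}
		return sum, nil
	default:
		return 0, errors.New("unsupported response policy: " + policy)
	}
}
```

A command such as DBSIZE, which is summed across shards, would go through the `agg_sum` branch in this sketch.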

ofekshenawa and others added 7 commits May 14, 2025 21:35
feat(routing): add internal request/response policy enums
* feat: load the policy table in cluster client

* Remove comments
…or osscluster.go (#6)

* centralize cluster command routing in osscluster_router.go and refactor osscluster.go

* enable ci on all branches

* Add debug prints

* Add debug prints

* FIX: deal with nil policy

* FIX: fixing clusterClient process

* chore(osscluster): simplify switch case

* wip(command): ai generated clone method for commands

* feat: implement response aggregator for Redis cluster commands

* feat: implement response aggregator for Redis cluster commands

* fix: solve concurrency errors

* fix: solve concurrency errors

* return MaxRedirects settings

* remove locks from getCommandPolicy

* Handle MOVED errors more robustly, remove cluster reloading at executions, ensure better routing

* Fix: supports Process hook test

* Fix: remove response aggregation for single shard commands

* Add more performant type conversion for Cmd type

* Add router logic into processPipeline

---------

Co-authored-by: Nedyalko Dyakov <[email protected]>
@ofekshenawa ofekshenawa changed the title from "Load balance search commands to shards" to "Implement Request and Response Policy Based Routing in Cluster Mode" Jun 30, 2025
@ofekshenawa ofekshenawa marked this pull request as ready for review July 6, 2025 12:54
}
if result.cmd != nil && result.err == nil {
	// For MGET, extract individual values from the array result
	if strings.ToLower(cmd.Name()) == "mget" {
Contributor

Do we actually need this special case?

}

// getCommandPolicy retrieves the routing policy for a command
func (c *ClusterClient) getCommandPolicy(ctx context.Context, cmd Cmder) *routing.CommandPolicy {
Contributor

@htemelski-redis htemelski-redis Sep 25, 2025

It seems like this will add significant overhead to every command execution.
We should fetch all policies during the connection handshake instead.

Contributor

Note: for the first stage we should use a hard-coded policy manager that can be extended in the future to take the output of the COMMAND command into account.

Member

@htemelski-redis 💡 Consider implementing a PolicyResolverConfig type that users can override via the client options. This config should map module__command_name to metadata (policies, key requirements, etc.).

Set hardcoded defaults in the client options, but allow users to override policies per command as needed.
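One possible shape for the suggestion above, as a minimal sketch: a hard-coded default table plus user overrides that take precedence. The names `PolicyResolverConfig`, `PolicyResolver`, `NewPolicyResolver`, and `Resolve`, the string-based policy fields, and the two default entries are all assumptions made for illustration; they are not the API that was merged.

```go
package main

import "strings"

// CommandPolicy is a hypothetical, simplified policy record.
type CommandPolicy struct {
	Request  string // e.g. "all_shards"; empty means default routing
	Response string // e.g. "agg_sum"; empty means default aggregation
}

// PolicyResolverConfig maps a lower-case command name to its policy.
type PolicyResolverConfig map[string]CommandPolicy

// defaultPolicies is an illustrative hard-coded table, not the real one.
var defaultPolicies = PolicyResolverConfig{
	"dbsize": {Request: "all_shards", Response: "agg_sum"},
	"ping":   {Request: "all_shards", Response: "all_succeeded"},
}

// PolicyResolver consults user overrides first, then the defaults.
type PolicyResolver struct {
	overrides PolicyResolverConfig
}

// NewPolicyResolver accepts a nil or empty override table.
func NewPolicyResolver(overrides PolicyResolverConfig) *PolicyResolver {
	return &PolicyResolver{overrides: overrides}
}

// Resolve returns the policy for cmdName and whether one was found.
// No network round trip is needed, so the lookup cost per command is a
// map access.
func (r *PolicyResolver) Resolve(cmdName string) (CommandPolicy, bool) {
	name := strings.ToLower(cmdName)
	if p, ok := r.overrides[name]; ok {
		return p, true
	}
	p, ok := defaultPolicies[name]
	return p, ok
}
```

Because the table is static, this approach also addresses the per-command overhead concern raised earlier in the thread; loading COMMAND output could later populate the same table.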

@htemelski-redis htemelski-redis force-pushed the load-balance-search-commands-to-shards branch from 6e3b627 to 1b2eaa6 Compare October 8, 2025 08:05
Member

@ndyakov ndyakov left a comment

Submitting partial review for the aggregators.

Comment on lines +446 to +449
// For MGET without policy, use keyed aggregator
if cmdName == "mget" {
	return routing.NewDefaultAggregator(true)
}
Member

Since we are already passing cmd.Name() to routing.NewResponseAggregator, this case can be handled there. If the policy is nil for mget, NewResponseAggregator could accept the policy and perform the nil check as well.
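A rough sketch of that idea, assuming the factory receives both the (possibly nil) policy and the command name. To stay self-contained, the example returns an `AggregatorKind` enum instead of a real aggregator; `aggregatorKindFor`, `AggregatorKind`, `keyedWithoutPolicy`, and the simplified `ResponsePolicy` and `CommandPolicy` types are hypothetical stand-ins for the routing package types.

```go
package main

import "strings"

// ResponsePolicy and CommandPolicy are simplified stand-ins for the
// routing package types discussed in this review.
type ResponsePolicy int

const (
	RespDefault ResponsePolicy = iota
	RespAllSucceeded
	RespAggSum
)

type CommandPolicy struct {
	Response ResponsePolicy
}

// AggregatorKind identifies which aggregator the factory would build.
type AggregatorKind int

const (
	KindDefault AggregatorKind = iota
	KindDefaultKeyed
	KindAllSucceeded
	KindAggSum
)

// keyedWithoutPolicy lists commands that need keyed aggregation even when
// no policy is known. Only MGET comes from the review; the map itself is
// a hypothetical device for this sketch.
var keyedWithoutPolicy = map[string]bool{"mget": true}

// aggregatorKindFor owns the nil-policy check so that callers do not need
// a special case for MGET.
func aggregatorKindFor(policy *CommandPolicy, cmdName string) AggregatorKind {
	if policy == nil {
		if keyedWithoutPolicy[strings.ToLower(cmdName)] {
			return KindDefaultKeyed
		}
		return KindDefault
	}
	switch policy.Response {
	case RespAllSucceeded:
		return KindAllSucceeded
	case RespAggSum:
		return KindAggSum
	default:
		return KindDefault
	}
}
```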

Comment on lines 567 to 588
// SetAggregatorFunc allows setting custom aggregation logic for special commands.
func (a *SpecialAggregator) SetAggregatorFunc(fn func([]interface{}, []error) (interface{}, error)) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.aggregatorFunc = fn
}

// SpecialAggregatorRegistry holds custom aggregation functions for specific commands.
var SpecialAggregatorRegistry = make(map[string]func([]interface{}, []error) (interface{}, error))

// RegisterSpecialAggregator registers a custom aggregation function for a command.
func RegisterSpecialAggregator(cmdName string, fn func([]interface{}, []error) (interface{}, error)) {
	SpecialAggregatorRegistry[cmdName] = fn
}

// NewSpecialAggregator creates a special aggregator with command-specific logic if available.
func NewSpecialAggregator(cmdName string) *SpecialAggregator {
	agg := &SpecialAggregator{}
	if fn, exists := SpecialAggregatorRegistry[cmdName]; exists {
		agg.SetAggregatorFunc(fn)
	}
	return agg
Member

SetAggregatorFunc is only used internally in this package, I assume it can be private if needed at all, see next comment.

Member

@ndyakov ndyakov left a comment

Submitting another partial review.

}

func (p *CommandPolicy) CanBeUsedInPipeline() bool {
	return p.Request != ReqAllNodes && p.Request != ReqAllShards && p.Request != ReqMultiShard
Member

What about special? Can it be used in a pipeline?

Contributor

My understanding is that special should be handled on a case-by-case basis
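One way to express the case-by-case handling, sketched under assumptions: treat special as not pipeline-safe by default and consult an explicit allowlist for exceptions. The free function `canBeUsedInPipeline`, the `safeSpecial` allowlist parameter, the `ReqSpecial` constant, and the choice to treat a nil policy as default routing are all hypothetical and not part of the PR.

```go
package main

import "strings"

// RequestType mirrors the request policy enum discussed above.
// The exact constant set is an assumption for this sketch.
type RequestType int

const (
	ReqDefault RequestType = iota
	ReqAllNodes
	ReqAllShards
	ReqMultiShard
	ReqSpecial
)

type CommandPolicy struct {
	Request RequestType
}

// canBeUsedInPipeline reports whether a command may be queued in a pipeline.
// Fan-out policies are never pipelined. Special commands are pipelined only
// when they appear in safeSpecial, a hypothetical allowlist keyed by
// lower-case command name. A nil policy is treated as default routing.
func canBeUsedInPipeline(p *CommandPolicy, cmdName string, safeSpecial map[string]bool) bool {
	if p == nil {
		return true
	}
	switch p.Request {
	case ReqAllNodes, ReqAllShards, ReqMultiShard:
		return false
	case ReqSpecial:
		return safeSpecial[strings.ToLower(cmdName)]
	default:
		return true
	}
}
```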

Comment on lines +8 to +12
// ShardPicker chooses “one arbitrary shard” when the request_policy is
// ReqDefault and the command has no keys.
type ShardPicker interface {
	Next(total int) int // returns an index in [0,total)
}
Member

Those are great. Can we also implement a StaticShardPicker or StickyShardPicker that always returns the same shard? I think it would be helpful for testing. This is not a blocker by any means.
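A minimal sketch of what such a picker could look like against the `ShardPicker` interface quoted above. `StaticShardPicker` follows the name proposed in the comment; its `Index` field, the clamping behavior, and the atomic-based `RoundRobinShardPicker` shown for contrast are assumptions for this example rather than the PR's implementation.

```go
package main

import "sync/atomic"

// ShardPicker matches the interface quoted in the review.
type ShardPicker interface {
	Next(total int) int // returns an index in [0,total)
}

// StaticShardPicker always returns the same shard index, which makes
// routing deterministic in tests.
type StaticShardPicker struct {
	Index int
}

// Next returns Index wrapped into [0,total). It returns 0 when total is
// not positive or Index is negative.
func (p *StaticShardPicker) Next(total int) int {
	if total <= 0 || p.Index < 0 {
		return 0
	}
	return p.Index % total
}

// RoundRobinShardPicker cycles through shards using an atomic counter,
// so no mutex is required for concurrent callers.
type RoundRobinShardPicker struct {
	counter atomic.Uint32
}

// Next returns successive indexes 0, 1, ..., total-1, 0, ...
func (p *RoundRobinShardPicker) Next(total int) int {
	if total <= 0 {
		return 0
	}
	n := p.counter.Add(1) - 1
	return int(n % uint32(total))
}
```

In a test, a static picker pins every keyless command to one shard, so assertions about which node received a command become stable.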

Comment on lines -879 to +1073
- return strconv.ParseBool(cmd.val)
+ return strconv.ParseBool(cmd.Val())
Member

why was this change needed?

Contributor

Not sure, for consistency maybe?

Comment on lines +4396 to +4414
if commandInfoTips != nil {
	if v, ok := commandInfoTips[requestPolicy]; ok {
		if p, err := routing.ParseRequestPolicy(v); err == nil {
			req = p
		}
	}
	if v, ok := commandInfoTips[responsePolicy]; ok {
		if p, err := routing.ParseResponsePolicy(v); err == nil {
			resp = p
		}
	}
}
tips := make(map[string]string, len(commandInfoTips))
for k, v := range commandInfoTips {
	if k == requestPolicy || k == responsePolicy {
		continue
	}
	tips[k] = v
}
Member

can't we do both of those in a single range over commandInfoTips?

Contributor

Not sure that I completely understand the question
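To make the question concrete, here is a sketch of the single-loop variant: one range over the tips map extracts both policies and copies every remaining tip. The `splitTips` and `parsePolicy` functions and the allowed-value tables are stand-ins written for this example (the PR uses routing.ParseRequestPolicy and routing.ParseResponsePolicy with typed results), so the string-typed return values are an assumption.

```go
package main

import "fmt"

const (
	requestPolicy  = "request_policy"
	responsePolicy = "response_policy"
)

// Allowed values follow the policy names listed in the Redis command tips
// documentation.
var requestPolicies = map[string]bool{
	"all_nodes": true, "all_shards": true, "multi_shard": true, "special": true,
}

var responsePolicies = map[string]bool{
	"one_succeeded": true, "all_succeeded": true, "agg_logical_and": true,
	"agg_logical_or": true, "agg_min": true, "agg_max": true,
	"agg_sum": true, "special": true,
}

// parsePolicy is a stand-in for the routing package parsers.
func parsePolicy(v string, allowed map[string]bool) (string, error) {
	if !allowed[v] {
		return "", fmt.Errorf("unknown policy %q", v)
	}
	return v, nil
}

// splitTips walks the map once: policy tips are parsed, all other tips are
// copied. Ranging over a nil map is a no-op, so no nil check is needed.
func splitTips(commandInfoTips map[string]string) (req, resp string, tips map[string]string) {
	tips = make(map[string]string, len(commandInfoTips))
	for k, v := range commandInfoTips {
		switch k {
		case requestPolicy:
			if p, err := parsePolicy(v, requestPolicies); err == nil {
				req = p
			}
		case responsePolicy:
			if p, err := parsePolicy(v, responsePolicies); err == nil {
				resp = p
			}
		default:
			tips[k] = v
		}
	}
	return req, resp, tips
}
```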

	return nil
}

func (cmd *IntPointerSliceCmd) Clone() Cmder {
Member

it's tricky here. do we need to return the same pointer or do we only want the value when cloning?
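For reference, this is roughly what a value-copying clone would look like, assuming the command stores its result as a `[]*int64`. The helper name `cloneInt64Pointers` is hypothetical. With this approach the clone gets fresh pointers, so writes through the clone never reach the original; sharing the pointers instead would be cheaper but would couple the two commands.

```go
package main

// cloneInt64Pointers returns a new slice whose non-nil elements point at
// fresh copies of the source values. Nil elements stay nil, and a nil
// source slice yields a nil result.
func cloneInt64Pointers(src []*int64) []*int64 {
	if src == nil {
		return nil
	}
	dst := make([]*int64, len(src))
	for i, p := range src {
		if p != nil {
			v := *p // copy the value, not the pointer
			dst[i] = &v
		}
	}
	return dst
}
```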

Member

@ndyakov ndyakov left a comment

Final part of initial review

Overview:

  • Let's use atomics when possible.
  • Left questions related to the node selection and setting of values.

Overall the design of the solution looks good, would have to do an additional pass over the test files once this review is addressed.

Thank you both @ofekshenawa and @htemelski-redis!

Comment on lines +50 to +53
if c.hasKeys(cmd) {
	// execute on key based shard
	return node.Client.Process(ctx, cmd)
}
Member

Do we know that this node serves the slot for the key?

Contributor

Yes, the node should've been selected based on the slot; see osscluster.go:L1906

func (c *ClusterClient) cmdNode(

	// execute on key based shard
	return node.Client.Process(ctx, cmd)
}
return c.executeOnArbitraryShard(ctx, cmd)
Member

since it doesn't matter and there is already some node selected, why not use it?

Contributor

We have two different ways of picking an arbitrary shard, either round robin or a random one

Member

@ndyakov ndyakov Oct 10, 2025

Yes, I understand that, but a node has already been selected at this point, either because of a MOVED redirect or through normal key-based selection. Why do we have to reselect the node? Shouldn't the arbitrary-node selection happen outside, so that node selection is done only once and the node on line #52 is the one used for this command?
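A sketch of the "select once" structure being asked for, under assumptions: the caller resolves the target node a single time, using the slot owner for keyed commands and the picker for keyless ones, and then executes on whatever is returned. The `node` type, `selectNode`, `fixedPicker`, and `errNoNode` are hypothetical names; MOVED handling would replace the resolved node before the retry rather than trigger a second arbitrary pick.

```go
package main

import "errors"

// node is a hypothetical stand-in for a cluster node handle.
type node struct {
	addr string
}

// ShardPicker matches the interface quoted earlier in the review.
type ShardPicker interface {
	Next(total int) int
}

// fixedPicker is a trivial picker used to keep this sketch deterministic.
type fixedPicker int

func (p fixedPicker) Next(total int) int {
	if total <= 0 || p < 0 {
		return 0
	}
	return int(p) % total
}

var errNoNode = errors.New("no node available")

// selectNode resolves the target exactly once. Keyed commands use the node
// that owns the slot; keyless commands use the picker. The caller then
// executes on the returned node without selecting again.
func selectNode(hasKeys bool, slotNode *node, shards []*node, picker ShardPicker) (*node, error) {
	if hasKeys {
		if slotNode == nil {
			return nil, errNoNode
		}
		return slotNode, nil
	}
	if len(shards) == 0 {
		return nil, errNoNode
	}
	return shards[picker.Next(len(shards))], nil
}
```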

Comment on lines +498 to +500
// Command executed successfully but value extraction failed
// This is common for complex commands like CLUSTER SLOTS
// The command already has its result set correctly, so just return
Member

I do not understand this comment. Why did the value extraction return nil? Can we at least make sure the cmd has a value set? If it doesn't, we may return a cmd with a nil value and a nil error, which doesn't make sense.

Comment on lines +748 to +759
	if c, ok := cmd.(*KeyValuesCmd); ok {
		// KeyValuesCmd needs a key string and values slice
		if key, ok := value.(string); ok {
			c.SetVal(key, []string{}) // Default empty values
		}
	}
case CmdTypeZSliceWithKey:
	if c, ok := cmd.(*ZSliceWithKeyCmd); ok {
		// ZSliceWithKeyCmd needs a key string and Z slice
		if key, ok := value.(string); ok {
			c.SetVal(key, []Z{}) // Default empty Z slice
		}
Member

why are we setting empty values here?

Contributor

No idea tbh, will look into it
