Skip to content

bin/resque: add support for connecting with a cluster client #59

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jacobbednarz
Copy link

@jacobbednarz jacobbednarz commented Feb 2, 2022

As bin/resque currently works, it pulls in the hostname to use from
the environment using getenv. The problem with this is that you cannot
pass in an array which is what the underlying Redis library uses to
determine whether it initialises a Credis_Client or a Credit_Cluster
for the connection.

This solves that issue by introducing support for passing a comma
separated list of hostnames to REDIS_BACKEND which will be expanded to
an array before passing to Credis_Cluster.

@jacobbednarz jacobbednarz changed the base branch from master to develop February 2, 2022 02:54
@jacobbednarz jacobbednarz force-pushed the add-cluster-support-to-worker branch 6 times, most recently from e8bf589 to b7bb014 Compare February 2, 2022 04:54
@danhunsaker
Copy link
Member

It seems to me that simply having a comma in the host variable would indicate that cluster mode is desired. We can drop some configuration complexity by checking for a comma (stripos($host, ',') !== false), rather than introducing another environment variable.

@jacobbednarz
Copy link
Author

We could however, that would mean for single hostname clusters, we'd need to have a wonky looking hostname like redis.local, for it to trigger the clustering client. With this proposal, we keep a single hostname but we flag we explicitly want to use the clustering client. Thoughts?

@danhunsaker
Copy link
Member

What is the benefit of using the cluster driver with a single host?

@jacobbednarz
Copy link
Author

In my specific use case, it's not a single host but a single hostname that is used for the entire cluster which is load balanced by a Kubernetes service.

@danhunsaker
Copy link
Member

I must be missing something. The Cluster driver will still only create one connection per hostname, regardless of how many servers are actually behind that name (neither the Redis extension nor PHP stream connections have access to more than the first IP that a hostname resolves to, as is the norm for any software that doesn't explicitly work around that default behavior of OS-level hostname-lookup functions). So your Cluster still only has a single Client inside, giving you exactly zero benefit to using the driver in the first place, plus a bit of extra function call overhead for your trouble.

@jacobbednarz
Copy link
Author

You’re correct about the one connection per hostname but using the cluster client is also about knowing how to find data on the (potentially) sharded cluster. If the regular client attempts a lookup and doesn’t hit the correct node on the first go, it will throw an exception as it (by design) doesn’t follow the redirection from Redis; the cluster client will and successfully fetches the key regardless of which mode has the data.

@danhunsaker
Copy link
Member

Interesting. I saw exactly zero code that would handle such a difference in the Cluster driver itself. It doesn't even handle connecting - it makes the regular driver do it. All the Credis Cluster class does is wrap one or more (regular) Credis Client objects, forwarding requests to the specific client object and returning the results without further processing, based on what i just read earlier. So I wonder what mechanism it uses to do that?

In any case, I'm not against supporting the extra variable if we have to for things to function correctly, but i also think it should be an override for the single hostname edge case rather than an on/off switch for the entire feature. Commas in the host string should automatically select the Cluster driver accordingly whether the additional value is set or not.

@jacobbednarz
Copy link
Author

How do you feel about supporting either? Single hostnames needing cluster support can add the new environment flag and multiple hostnames separated by a comma also automatically get cluster support? I’m just thinking if I came across a single hostname with a trailing comma in a config file, I’d be confused as to why it was like that. Having an extra comma may also limit the reuse of REDIS_BACKEND as an environment variable as everything that used it would then need to take that into account.

@danhunsaker
Copy link
Member

That hybrid approach is what I was trying to describe above, yeah, so I'm on board with that.

@jacobbednarz jacobbednarz force-pushed the add-cluster-support-to-worker branch from b7bb014 to c851597 Compare February 6, 2022 19:52
@jacobbednarz
Copy link
Author

Thanks for clarifying; that wasn't apparent to me. I've added support for comma delimitered hosts in REDIS_BACKEND as well.

@jacobbednarz
Copy link
Author

Having a dig further into this, there is something not quite right either with Credis or our cluster. For the test, I'm doing the following:

  • Grab the Credis library files
$ curl -o Client.php https://raw.githubusercontent.com/colinmollenhour/credis/master/Client.php
$ curl -o Cluster.php https://raw.githubusercontent.com/colinmollenhour/credis/master/Cluster.php
  • Add a simple get from Redis
$ cat repro.php

<?php
require 'Client.php';
require 'Cluster.php';

$cluster = new Credis_Cluster(array(
  array('host' => 'svc.hostname.local', 'port' => 6379),
));

var_dump($cluster->get("example-key");
  • Run the PHP file
$ php repro.php

PHP Fatal error:  Uncaught RedisException: MOVED 4701 10.36.230.134:6379 in /tmp/Client.php:1191
Stack trace:
#0 /tmp/Client.php(1191): Redis->get('example-key')
#1 /tmp/Cluster.php(253): Credis_Client->__call('get', Array)
#2 /tmp/repro.php(9): Credis_Cluster->__call('get', Array)
#3 {main}

Next CredisException: MOVED 4701 10.36.230.134:6379 in /tmp/Client.php:1209
Stack trace:
#0 /tmp/Cluster.php(253): Credis_Client->__call('get', Array)
#1 /tmp/repro.php(9): Credis_Cluster->__call('get', Array)
#2 {main}
  thrown in /tmp/Client.php on line 1209

I would expect that this works as the library mentions it support clustering, that this works out of the box. Probably hold on this PR until this is resolved in case I'm holding this wrong.

@jacobbednarz
Copy link
Author

jacobbednarz commented Feb 7, 2022

Looks like there is something funky with single hostnames as if I explicitly add all the IPs in the cluster instead, this works as expected with the bin/resque modifications. We might be able to ditch the REDIS_CLUSTER_ENABLED environment variable after all.

@jacobbednarz
Copy link
Author

jacobbednarz commented Feb 7, 2022

@danhunsaker do you happen to have a working example of multiple hostnames with Resque::setBackend? Passing in an array to Resque::setBackend doesn't seem to be working as I would expect, and it looks like only a single host ends up used in Credis.

Resque::setBackend([
  ["host" => "10.0.0.1", "port" => 6379],
  ["host" => "10.0.0.2", "port" => 6379],
]);

@jacobbednarz
Copy link
Author

jacobbednarz commented Feb 7, 2022

It's worth noting as well, Credis_Cluster doesn't support automatic discovery of other nodes. The only way clustering works is with explicit hostnames so comma separated lists are the only support approach now.

As `bin/resque` currently works, it pulls in the hostname to use from
the environment using `getenv`. The problem with this is that you cannot
pass in an array which is what the underlying Redis library uses[1] to
determine whether it initialises a `Credis_Client` or a `Credit_Cluster`
for the connection.

This solves that issue by introducing support for passing a comma
separated list of hostnames to `REDIS_BACKEND` which will be expanded to
an array before passing to `Credis_Cluster`.

[1]: master/lib/Resque/Redis.php#L128
@jacobbednarz jacobbednarz force-pushed the add-cluster-support-to-worker branch from 06f7629 to 51ea37f Compare February 8, 2022 00:24
Comment on lines -43 to -47
if(!empty($REDIS_BACKEND)) {
if (empty($REDIS_BACKEND_DB))
Resque::setBackend($REDIS_BACKEND);
else
Resque::setBackend($REDIS_BACKEND, $REDIS_BACKEND_DB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resque-scheduler has the same block, so support should probably also be added there

@jacobbednarz
Copy link
Author

I've finally got this working for our use case however, not as I originally intended.

The kicker is that Credis_Cluster != redis cluster compatible. Credis_Cluster is a poor man's version where it handles all the clients and hashes the keys to find where the data is stored despite none of the hosts knowing about the others.

This was an oversight on my behalf and confused the heck out of everything because I was expecting it to work in the same way where the client would follow the MOVE responses from redis cluster but it did not. We ended up solving our use case using an intermediate proxy that is redis compatible so I'm not going to pursue this PR any further. Someone else is welcome to pick this up if they see value in it but as we won't be using this, I can't in good faith land it without proper testing which I cannot perform in our setup now.

Thanks for your help though!

@jacobbednarz jacobbednarz deleted the add-cluster-support-to-worker branch February 9, 2022 03:35
@mfn
Copy link
Contributor

mfn commented Feb 9, 2022

We ended up solving our use case using an intermediate proxy that is redis compatible

Can you share what proxy? Or is it internal? Thanks

@jacobbednarz
Copy link
Author

envoy proxy with the redis cluster extension

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants