fix: For cluster internal scopes also add variant without trailing dot #547
base: main
We should fix the places that depend on the old value here, not just blindly add both.
let mut cluster_domains = vec![cluster_domain.to_string()];
if let Some(cluster_domain_without_trailing_dot) = cluster_domain.strip_suffix('.') {
    cluster_domains.push(cluster_domain_without_trailing_dot.to_owned());
}
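For readers following the thread, here is a self-contained sketch of what the hunk above does. The function name `cluster_domain_variants` is hypothetical and is not taken from the PR. It only illustrates the "keep the configured value, and add the variant without the trailing dot if there is one" behaviour under discussion.

```rust
/// Hypothetical helper (not the PR's actual code): returns the cluster domain
/// as configured, followed by the variant without a trailing dot if the
/// configured value ended in one.
fn cluster_domain_variants(cluster_domain: &str) -> Vec<String> {
    // Always keep the value exactly as it was configured.
    let mut cluster_domains = vec![cluster_domain.to_string()];
    // `strip_suffix` returns `Some(..)` only when the trailing dot is present,
    // so a domain without a trailing dot yields a single entry.
    if let Some(without_trailing_dot) = cluster_domain.strip_suffix('.') {
        cluster_domains.push(without_trailing_dot.to_owned());
    }
    cluster_domains
}
```

With `cluster.local.` as input this yields both `cluster.local.` and `cluster.local`. With `cluster.local` it yields only that one value, so the reverse case (adding a dotted variant) is not covered by this logic.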
In my testing, Kerberos wants consistency and TLS doesn't really care. Either should be helped by doing both.
Tried to respond in #547 (comment)
[domain_realm]
cluster.local = {realm_name}
cluster.local. = {realm_name}
.cluster.local = {realm_name}
.cluster.local. = {realm_name}
IME, this shouldn't be necessary at all (probably since we set the default realm before). But if we do keep it then we should read the actual cluster domain, not hard-code cluster.local specifically.
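A rough sketch of what "read the actual cluster domain" could look like if the `[domain_realm]` section is kept. The function name `domain_realm_section` and its parameters are assumptions for illustration, not the operator's real API. It emits the same four patterns as the hunk above, but derived from whatever cluster domain is passed in.

```rust
/// Hypothetical helper: renders a krb5.conf `[domain_realm]` section for the
/// given cluster domain instead of hard-coding `cluster.local`.
fn domain_realm_section(cluster_domain: &str, realm_name: &str) -> String {
    // Normalise to the form without trailing dots, then derive every variant.
    let bare = cluster_domain.trim_end_matches('.');
    let mut section = String::from("[domain_realm]\n");
    for domain in [
        bare.to_string(),    // exact host match, no trailing dot
        format!("{bare}."),  // exact host match, trailing dot
        format!(".{bare}"),  // all hosts under the domain, no trailing dot
        format!(".{bare}."), // all hosts under the domain, trailing dot
    ] {
        section.push_str(&format!("{domain} = {realm_name}\n"));
    }
    section
}
```

Whether all four entries are actually needed is exactly the open question in this thread. The sketch only shows that nothing forces the domain to be hard-coded.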
Good catch, this PR was mostly about trying out if we can fix the TLS cert problems we have.
Fixed it in 5781e66
General comment on why we are thinking of adding both (with and without the trailing dot) to scopes - both for TLS and Kerberos:
That being said, this is a WIP. I would leave it entirely up to @dervoeti and @maltesander to decide how to proceed, as they looked at the issue in the first place. I just happened to bump op-rs and ran into a failing test.
Wouldn't secret-operator know just as well as any other operator? Are you planning on supporting mixed environments?
curl was also happy without the dot IME, so I guess this is one indication that TLS SANs should never have it.
Migration is a fair concern, that's true. We should explicitly document those migration paths in the comments.
We should know what credentials we're asking for, and why. Whether they need to be included when provisioning manually, and so on. It's fine to add things to that list, it just shouldn't be something we do blindly.
I agree that, in general, we should prefer not adding the hostname without the dot if it's not really necessary / we can work around it. I'm not sure what exactly the scenario was (@sbernauer and/or @maltesander did the research on this) but I think one reason was that Zookeeper does a reverse DNS lookup on the client IP and complains that the client cert is not valid for the returned hostname (without the trailing dot). That would be a reason to add the alternative hostname to the SANs. Other ways to solve this are trying to fix this in Zookeeper or maybe explicitly not supporting Zookeeper mTLS if you use a cluster domain with a trailing dot. I'm fine with either solution; adding the alternative hostname to the SAN was probably just the easiest way to make it work.
So, we have to make a decision. I can't really comment on the Kerberos related changes, but as far as I understand it, they are not strictly necessary but would make migration easier. In that case I would be fine with not merging these changes if they are controversial. An easy migration path is nice to have, but I think it's okay if we don't have it. But I'm also fine with not merging this at all and explicitly listing Zookeeper mTLS as "known not to work with FQDN cluster domains yet". In that case we probably still support many setups with FQDN cluster domains with 25.3, so it's better than before. Opinions @nightkr @sbernauer @maltesander ?
For TLS only the non-FQDN variant seems to matter at all, at least in my testing. We should only keep the non-FQDN variant there. For Kerberos I'm not sure. I think the argument for having both makes sense, at least during the transitional period (though we should probably make sure we have both variants in both cases). Maybe an exception here would be if we can centralize this logic in listener-op, and have that be what decides the Flag Day™️. |
I'll also do some tests with this later. If it works, I'm fine with that solution as well.
Yeah, we definitely need the non-FQDN in there. That fixed most of the problems I had. IIRC Zookeeper required the FQDN in the certificate. I would punt on Kerberos as well. Main thing is to fix the certs?
I did some tests with Zookeeper yesterday, including mTLS tests with and without FQDN cluster domains. Adding just the non-FQDN hostname to the SANs worked fine. Will do some more testing today with other products.
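A minimal sketch of the "only the non-FQDN variant in the SANs" approach described above, assuming the usual Kubernetes service DNS layout `<service>.<namespace>.svc.<cluster domain>`. The function name `san_dns_name` and its parameters are hypothetical and do not reflect how secret-operator actually builds its SANs.

```rust
/// Hypothetical helper: builds the DNS name for a certificate SAN from a
/// service, namespace and cluster domain, always without a trailing dot.
fn san_dns_name(service: &str, namespace: &str, cluster_domain: &str) -> String {
    // Drop a single trailing dot if the cluster domain is configured as an
    // FQDN; leave the value untouched otherwise.
    let cluster_domain = cluster_domain
        .strip_suffix('.')
        .unwrap_or(cluster_domain);
    format!("{service}.{namespace}.svc.{cluster_domain}")
}
```

Both `cluster.local` and `cluster.local.` produce the same SAN here, which matches the observation that the non-FQDN hostname alone was sufficient in the Zookeeper mTLS tests.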
@maltesander @nightkr @sbernauer I created a PR that only adds the non-FQDN variant to the SANs, works fine for me: