Kubernetes: How to populate resource attributes based on attributes, labels and transformation #1756
Conversation
Love the overall direction. This makes a lot of sense.
We briefly discussed this during the k8s SemConv WG meeting. There is no consensus on the document yet, but we decided to recommend moving it from the
docs/resource/k8s.md (outdated):
- `pod.annotation[resource.opentelemetry.io/service.version]`
- `pod.label[app.kubernetes.io/version]` (well-known label
  [app.kubernetes.io/version](https://kubernetes.io/docs/reference/labels-annotations-taints/#app-kubernetes-io-version))
- `if (contains(container docker image tag, '/') == false) container docker image tag`
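The "first value found" precedence quoted above can be sketched as follows. This is a minimal illustration, not the operator's or collector's actual implementation; the function name and dict-based inputs are hypothetical.

```python
def service_version(annotations, labels, image_tag):
    # 1. An explicit resource annotation wins.
    v = annotations.get("resource.opentelemetry.io/service.version")
    if v:
        return v
    # 2. Fall back to the well-known Kubernetes label.
    v = labels.get("app.kubernetes.io/version")
    if v:
        return v
    # 3. Last resort: the container image tag, unless it contains '/',
    #    which signals the ':' split hit a registry port, not a real tag.
    if image_tag and "/" not in image_tag:
        return image_tag
    return None
```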
Is there a reason we bail if the tag has `/`? The service attribute conventions don't define a format for version. If none of the other options work, wouldn't using the tag be better (not sure what the SDK defaults to)?
I'm unsure about this part of the spec.
I took this from the way the operator does it - the collector does it a bit differently.
I've found the reason - it was a guard for a port number separator as can be seen here: https://github.com/distribution/reference/blob/284a39eaf3368476e0a4c6114a0eec61220acdd9/reference_test.go#L250
I've updated the spec to refer to that library - which is already used by the collector for container.image.id
... with the notable difference that I think we should include the digest if available, it's the best source of truth for the service version
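The port-separator guard mentioned above can be illustrated with a naive tag-extraction sketch (this is not the `distribution/reference` library, just a hypothetical demonstration of why a `/` in the extracted tag means the split hit a registry port, and of preferring the digest when present):

```python
def naive_tag(image_ref):
    # Prefer the digest: it is the strongest version signal when available.
    if "@" in image_ref:
        return image_ref.split("@", 1)[1]  # e.g. "sha256:..."
    # Otherwise split on the last ':' and guard against registry ports:
    # in "localhost:5000/app" the part after ':' is "5000/app", not a tag.
    if ":" in image_ref:
        candidate = image_ref.rsplit(":", 1)[1]
        if "/" not in candidate:
            return candidate
    return None
```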
docs/resource/k8s.md (outdated):
Choose the first value found:
- `pod.annotation[resource.opentelemetry.io/service.instance.id]`
- `concat([k8s.namespace.name, k8s.pod.name, k8s.container.name], '.')`
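The `concat` fallback in the quoted diff is straightforward; a minimal sketch (function name is illustrative):

```python
def fallback_instance_id(namespace, pod, container):
    # Joins k8s.namespace.name, k8s.pod.name and k8s.container.name with '.'
    # to form a service.instance.id that is unique at any given time.
    return ".".join([namespace, pod, container])
```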
We discussed `service.instance.id` in the semconv meeting and had some doubts about the scenario where a container is restarted without a pod recreate (e.g. the kubelet restarting a container after an OOM kill). The confusion was whether this id should change to reflect that a new instance of the container was started, which is what I think would happen if we left this to an SDK.
The convention for `service.instance.id` states that it MUST be unique for each instance of the same `service.namespace`,`service.name` pair (in other words, the `service.namespace`,`service.name`,`service.instance.id` triplet MUST be globally unique).

For a pod owned by a StatefulSet this is not the case. I guess the question is whether this id should be used to differentiate between different instances of a service at a given time, or across all instances ever run. In the case of a StatefulSet, the id is not unique enough to differentiate between instances of a pod with the same ordinal, but Kubernetes will only run one instance of a pod with a given ordinal at a time (the id will remain the same after an sts pod is recreated). If we want a different id across pod recreates, switching to the pod uid instead of the name would work.
I've generally interpreted "globally unique" here to mean unique at a given time, rather than unique across all time. IMO this is somewhat moot since SDKs are going to start generating unique IDs for service.instance.id anyways.
> IMO this is somewhat moot since SDKs are going to start generating unique IDs for service.instance.id anyways.

Are SDKs going to ignore any `service.instance.id` values passed in with the env var `OTEL_RESOURCE_ATTRIBUTES`? My understanding is that the documented values are currently being calculated and set in this var by the operator when auto-instrumenting.
> Are SDKs going to ignore any `service.instance.id` values passed in with the env var `OTEL_RESOURCE_ATTRIBUTES`?
No, SDKs must honor whatever attributes are set via `OTEL_RESOURCE_ATTRIBUTES`.
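For illustration, this is roughly how an SDK reads that variable: `OTEL_RESOURCE_ATTRIBUTES` is a comma-separated list of `key=value` pairs per the OTel SDK environment variable specification, so an id injected there by the operator takes effect. A minimal parsing sketch (function name is hypothetical, not a real SDK API):

```python
import os

def parse_resource_attributes(raw=None):
    # Fall back to the real environment variable when no string is given.
    if raw is None:
        raw = os.environ.get("OTEL_RESOURCE_ATTRIBUTES", "")
    attrs = {}
    for pair in raw.split(","):
        if "=" in pair:
            key, _, value = pair.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs
```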
> I've generally interpreted "globally unique" here to mean unique at a given time, rather than unique across all time.
that is also my understanding
thanks @jsuereth for the feedback on this question, which I have now included in the PR itself:
> Note that the container restart count is not included in the `service.instance.id` calculation, because it makes troubleshooting harder when the ID changes on every restart, e.g. in a crash loop.
@open-telemetry/specs-semconv-approvers the service page is automatically generated - how can I move the content of this PR to the service page?
maybe as subsection(s) under https://github.com/open-telemetry/semantic-conventions/blob/main/docs/resource/README.md#service
force-pushed from 604cdc9 to d850c85
@jsuereth @tigrannajaryan thanks for the positive feedback in the entities SIG call. I added the notes about the object hierarchy for
@jinja2 thanks for the detailed review - everything should be addressed now
@open-telemetry/specs-semconv-approvers please take a look. Approval from the k8s semconv and entities SIGs has been granted now.
@@ -439,3 +439,17 @@ A ResourceQuota provides constraints that limit aggregate resource consumption p
<!-- endsemconv -->

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status

## Specify resource attributes using Kubernetes annotations
I'm not sure this is the best place for these docs.
I agree with what's written, just not the location. It's OK here for now.
@trask I think this was your suggestion - IIUC
it was from @dmitryax #1756 (comment), I have no objection to moving it
I tried adding it to the yaml - along with modifying the jinja template.
It looked strange to have markdown in yaml - so I'm wondering where we could put the text.
The goal was to have the k8s guidelines for service attributes part of the service spec.
Here's how it looks embedded in yaml:
````yaml
groups:
  - id: registry.service
    type: attribute_group
    display_name: Service Attributes
    brief: >
      A service instance.
    note: >
      Note: `service.namespace` and `service.name` are not intended to be
      concatenated for the purpose of forming a single globally unique name
      for the service. For example the following 2 sets of attributes actually
      describe 2 different services (despite the fact that the concatenation
      would result in the same string):

      ```
      # Resource attributes that describes a service.
      namespace = Company.Shop
      service.name = shoppingcart
      ```

      ```
      # Another set of resource attributes that describe a different service.
      namespace = Company
      service.name = Shop.shoppingcart
      ```

      ### Service in Kubernetes
````
@joaopgrassi or @lmolkova do you have a suggestion?
Since this is all non-normative and "suggestion-like", I think it would be nicer if it were its own document, inside https://github.com/open-telemetry/semantic-conventions/tree/main/docs/non-normative. So then everything in this PR goes to a new document there. WDYT?
…d prod (deployment.envirnoment.name)
Co-authored-by: Joao Grassi <[email protected]>
force-pushed from 2a21f8c to a5ba77b
@open-telemetry/specs-semconv-maintainers this seems like a good PR to include in https://changelog.opentelemetry.io. Should we add the label to it? 🤔
sure! I don't think it strictly falls under "New semantic convention areas or stability changes, as well as releases", but I think it's always OK for @open-telemetry/specs-semconv-maintainers to call out particularly interesting or broadly relevant changes on https://changelog.opentelemetry.io
Fixes #236
This is a new take on the issue. #349 was created before but not completed.
So what's the difference now? `service.instance.id` has been completed, which was effectively a blocker before.

Merge requirement checklist
[chore]