-
Notifications
You must be signed in to change notification settings - Fork 1.4k
refactor(ssi): introduce libraryinjection provider for APM injection #44972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
cc0cdcc to
3f01216
Compare
Go Package Import DifferencesBaseline: 340919a
|
Regression DetectorRegression Detector ResultsMetrics dashboard Baseline: 340919a Optimization Goals: ✅ No significant changes detected
|
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +1.90 | [-1.15, +4.95] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +1.90 | [-1.15, +4.95] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | +0.78 | [+0.73, +0.84] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.39 | [+0.36, +0.43] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | +0.37 | [+0.14, +0.60] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | +0.24 | [+0.20, +0.29] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_metrics_logs | memory utilization | +0.17 | [-0.04, +0.38] | 1 | Logs bounds checks dashboard |
| ➖ | docker_containers_memory | memory utilization | +0.13 | [+0.05, +0.20] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | +0.11 | [-0.12, +0.34] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | +0.08 | [-0.02, +0.17] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | +0.07 | [-0.30, +0.45] | 1 | Logs |
| ➖ | file_tree | memory utilization | +0.05 | [+0.00, +0.10] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | +0.04 | [-0.00, +0.09] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.03 | [-0.35, +0.42] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | +0.02 | [-0.11, +0.14] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.01 | [-0.10, +0.13] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.08, +0.07] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | -0.03 | [-0.43, +0.37] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | -0.14 | [-0.30, +0.02] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | -0.14 | [-1.61, +1.32] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.28 | [-0.33, -0.22] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | -0.30 | [-0.50, -0.10] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | -0.49 | [-0.65, -0.34] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | -0.78 | [-0.84, -0.72] | 1 | Logs |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | links |
|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
ba41a0c to
007ddd1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@betterengineering I think we should move mutate/autoinstrumentation/annotation.go to mutate/common/annotation.go so we can reuse it in library_injection.
But I didn't want to do it in this PR to keep the review simpler.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts:
- Our module should be self contained
- My PR to refactor annotations moved everything into the
autoinstrumentationmodule. The issue with shared usage we've had so far is that business logic ends up being implicit across webhooks and really hard to change.LabelSelectorswas the other big one.
- My PR to refactor annotations moved everything into the
- We should have submodules
- I like that you've added one for library injection! If we want to reuse the annotation code, we should make another submodule or potentially leverage
pkg/ssiand add a module there. I'm ok with small modules for now until we get it right. For example, make aautoinstrumentation/annotationmodule until we figure out where it belongs.
- I like that you've added one for library injection! If we want to reuse the annotation code, we should make another submodule or potentially leverage
- We shouldn't duplicate our code, even temporarily
- I would much rather we have a somewhat useless module that only does annotations that can be imported everywhere then have two files for annotations, even temporarily.
Static quality checks✅ Please find below the results from static quality gates Successful checksInfo
29 successful checks with minimal change (< 2 KiB)
On-wire sizes (compressed)
|
betterengineering
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good stab at isolating the injection mutation code. I think we have more work to do here, which can either happen in follow up PRs or in this change.
| containerNames: defaultContainerNames, | ||
| }, | ||
| }, | ||
| "custom library image via annotation is used": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is a duplicate of:
local sdk injection with custom library image gets custom image
| }, | ||
| }, | ||
| // All supported languages tests | ||
| "all supported languages can be injected simultaneously": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "all supported languages can be injected simultaneously": { | |
| "all supported languages can be injected simultaneously through local SDK injection": { |
| }, | ||
| containerNames: defaultContainerNames, | ||
| expectedAnnotations: map[string]string{ | ||
| "cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes": "datadog-auto-instrumentation,datadog-auto-instrumentation-etc", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently test for this in pkg/ssi/testutils during RequireInjection, but it doesn't hurt to add here:
datadog-agent/pkg/ssi/testutils/pod.go
Line 106 in 5964cf0
| K8sAutoscalerSafeToEvictVolumesAnnotation: "datadog-auto-instrumentation,datadog-auto-instrumentation-etc", |
| --- | ||
| other: | ||
| - | | ||
| APM: Refactor APM auto-instrumentation library injection to use a provider-based architecture. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't think this change needs a release note. I think it makes sense to reserve release notes for when there is a customer facing behavior change. You can use changelog/no-changelog to allow CI to be ok with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts:
- Our module should be self contained
- My PR to refactor annotations moved everything into the
autoinstrumentationmodule. The issue with shared usage we've had so far is that business logic ends up being implicit across webhooks and really hard to change.LabelSelectorswas the other big one.
- My PR to refactor annotations moved everything into the
- We should have submodules
- I like that you've added one for library injection! If we want to reuse the annotation code, we should make another submodule or potentially leverage
pkg/ssiand add a module there. I'm ok with small modules for now until we get it right. For example, make aautoinstrumentation/annotationmodule until we figure out where it belongs.
- I like that you've added one for library injection! If we want to reuse the annotation code, we should make another submodule or potentially leverage
- We shouldn't duplicate our code, even temporarily
- I would much rather we have a somewhat useless module that only does annotations that can be imported everywhere then have two files for annotations, even temporarily.
| workloadmeta "github.com/DataDog/datadog-agent/comp/core/workloadmeta/def" | ||
| "github.com/DataDog/datadog-agent/pkg/clusteragent/admission/common" | ||
| "github.com/DataDog/datadog-agent/pkg/clusteragent/admission/metrics" | ||
| libraryinjection "github.com/DataDog/datadog-agent/pkg/clusteragent/admission/mutate/autoinstrumentation/library_injection" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this module should be tweaked. The simplest change is to remove the underscore from the module name. That way, we don't need to libraryinjection prefix on the import.
But maybe you and I can workshop a more specific name? Maybe, provider? The best name for the module will be the same as the core interface it provides
| } | ||
|
|
||
| // LibraryConfig contains the configuration needed to inject a language-specific tracing library. | ||
| type LibraryConfig struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good for your change, but we should standardize this piece. I propose the following structs: main...mark.spicer/refactor-lib-info
Let me create the new structs as a module before the EOD. That way we can start to use them and are not stepping on each others toes.
| // Different implementations can use different mechanisms: | ||
| // - initContainerProvider: Uses init containers with EmptyDir volumes | ||
| // - (future) CSI provider: Uses a CSI driver to mount library files | ||
| type libraryInjectionProvider interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this interface! I think we could make it even simpler. Here would by my dream interface for this module:
type Provider interface {
Inject(pod *corev1.Pod, libs []Library) Result
}To support this, the injector would need to be a library as well, which may get awkward. I also notice you have some other metadata that needs passed in. Maybe an expanded version:
type Provider interface {
Inject(ctx InjectContext, pod *corev1.Pod, injector Injector, libs []Library) Result
}But I think I want this module to have one method that does everything I need. I will always need the injector.
|
|
||
| //go:build kubeapiserver | ||
|
|
||
| package libraryinjection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the _test pattern. It's not just pedantic. I think it forces us to write good tests and think about the public interfaces we are creating. One issue we had before was because the tests had access to everything, we didn't create good interfaces and we tested it as is.
What does this PR do?
Introduces a new
library_injectionpackage with a provider-based architecture for APM library injection.This refactoring extracts the injection logic into:
InjectAPMLibrarieshigh-level function orchestrating the injection flowlibraryInjectionProviderinterface withInjectInjectorandInjectLibrarymethodsinitContainerProviderimplementation using init containers and EmptyDir volumes (current implementation)The
apmInjectionMutatorfunction innamespace_mutator.gois simplified to delegate tolibraryinjection.InjectAPMLibraries.Review Strategy
We recommend reviewing this PR commit by commit for easier understanding of the changes.
Commits Overview
refactor(ssi): introduce libraryinjection provider for APM injectionCore refactoring that introduces the new
library_injectionpackage with a provider-based architecture. This abstracts the injection mechanism behind an interface (libraryInjectionProvider) with an initial implementation using init containers (initContainerProvider).Add testsAdds unit tests for the
library_injectionpackage (pod_patcher_test.go,init_container_test.go) and additional integration tests inauto_instrumentation_test.goto ensure compatibility with the existing behavior.Cleanup dead codeRemoves legacy code that is no longer used after the refactoring:
injector.go,lib_requirement.go, unused methods inlanguage_versions.go, and various helper functions.Move functions to their right locationsImproves code organization by moving functions to the files where they are actually used (e.g.,
getNamespaceLabels,extractLibrariesFromAnnotations→target_mutator.go).Add release noteAdds the release note for the Cluster Agent changelog.
Motivation
This refactoring prepares the codebase for upcoming alternative injection modes, such as CSI driver-based injection. By abstracting the injection mechanism behind a provider interface, new injection strategies can be added without modifying the core mutation logic.
Describe how you validated your changes
go test -tags "kubeapiserver test" ./pkg/clusteragent/admission/mutate/autoinstrumentation/...Additional Notes