[SQG] Introduce a lightweight SQG for MQ #43592
base: main
Gitlab CI Configuration Changes
Modified Jobs
agent_deb-x64-a7
agent_deb-x64-a7:
after_script:
- "# Measure package size and generate in-place report\n# This runs after the main\
\ script and won't fail the job if there are issues.\n# Only run if the job status\
\ is success.\n# This is common for both rpm and deb packages\nif [[ \"$CI_JOB_STATUS\"\
\ != \"success\" ]]; then\n echo \"\u2139\uFE0F Skipping package measurement\
\ (job status is not success)\"\n exit 0\nfi\n\nif [[ -n \"$STATIC_QUALITY_GATE_NAME\"\
\ ]]; then\n echo \"\U0001F4CA Starting package measurement...\"\n\n # If the\
\ gate name contains \"suse\", use the SUSE package directory, \n # otherwise\
\ use the default package directory\n [[ \"$STATIC_QUALITY_GATE_NAME\" == *\"\
suse\"* ]] && \\\n BASE_PACKAGE_DIR=\"$OMNIBUS_PACKAGE_DIR_SUSE\" || \\\n BASE_PACKAGE_DIR=\"\
$OMNIBUS_PACKAGE_DIR\"\n\n # Determine package type and set appropriate variables\n\
\ # Handle FIPS packages by adjusting project name\n PROJECT_NAME=\"$DD_PROJECT\"\
\n if [[ \"$STATIC_QUALITY_GATE_NAME\" == *\"fips\"* ]]; then\n PROJECT_NAME=\"\
fips-${DD_PROJECT}\"\n fi\n\n # RPM uses x86_64 and aarch64 for architecture.\n\
\ # If STATIC_QUALITY_GATE_ARCH is set (which is the case for RPM packages),\
\ use it, \n # otherwise use PACKAGE_ARCH (which is the architecture of the package)\n\
\ ARCH_VAR=\"${STATIC_QUALITY_GATE_ARCH:-${PACKAGE_ARCH}}\"\n\n # Determine\
\ format based on package type\n case \"$STATIC_QUALITY_GATE_NAME\" in\n *deb*)\
\ SEP=\"_7*\"; EXT=\"deb\"; ARCH=\"${PACKAGE_ARCH}\" ;;\n *rpm*|*suse*) SEP=\"\
-7*\"; EXT=\"rpm\" ;;\n *) echo \"\u26A0\uFE0F Unknown package type for gate:\
\ $STATIC_QUALITY_GATE_NAME\"; exit 1 ;;\n esac\n\n PACKAGE_PATTERN=\"${BASE_PACKAGE_DIR}/datadog-${PROJECT_NAME}${SEP}${ARCH_VAR}.${EXT}\"\
\n\n echo \"\U0001F50D Looking for package with pattern: $PACKAGE_PATTERN\"\n\
\n # Extract report prefix from gate name (e.g. static_quality_gate_agent_rpm_amd64\
\ -> agent_rpm_amd64)\n REPORT_PREFIX=\"${STATIC_QUALITY_GATE_NAME#static_quality_gate_}\"\
\n\n for package_file in $PACKAGE_PATTERN; do\n if [[ -f \"$package_file\"\
\ ]]; then\n echo \"\U0001F4CF Measuring package: $package_file\"\n\n \
\ # Generate measurement report using STATIC_QUALITY_GATE_NAME variable\n \
\ dda inv quality-gates.measure-package-local \\\n --package-path \"\
$package_file\" \\\n --gate-name \"$STATIC_QUALITY_GATE_NAME\" \\\n \
\ --build-job-name \"$CI_JOB_NAME\" \\\n --output-path \"${REPORT_PREFIX}_size_report_${CI_PIPELINE_ID}_${CI_COMMIT_SHA:0:8}.yml\"\
\ \\\n --debug || { echo \"\u26A0\uFE0F Package measurement failed for\
\ $package_file\"; exit 1; }\n\n echo \"\u2705 Package measurement completed\"\
\n\n # Upload the report to S3\n BUCKET_BASE_PATH=\"s3://dd-ci-artefacts-build-stable/datadog-agent/static_quality_gates/GATE_REPORTS/${CI_COMMIT_SHA}\"\
\n echo \"Uploading report to ${BUCKET_BASE_PATH}\"\n aws s3 cp --only-show-errors\
\ --region us-east-1 --sse AES256 \\\n \"${REPORT_PREFIX}_size_report_${CI_PIPELINE_ID}_${CI_COMMIT_SHA:0:8}.yml\"\
\ \\\n \"${BUCKET_BASE_PATH}/${REPORT_PREFIX}_size_report_${CI_PIPELINE_ID}_${CI_COMMIT_SHA:0:8}.yml\"\
\n else\n echo \"\u26A0\uFE0F No package found matching pattern: $PACKAGE_PATTERN\"\
; exit 1;\n fi\n done\nelse\n echo \"\u2139\uFE0F Skipping package measurement\
\ (no STATIC_QUALITY_GATE_NAME defined)\"\nfi\n"
artifacts:
expire_in: 2 weeks
paths:
- $OMNIBUS_PACKAGE_DIR
- '**/*_size_report_*.yml'
cache:
- key:
files:
- omnibus/Gemfile
- release.json
prefix: omnibus-deps-$CI_JOB_IMAGE-$CI_JOB_NAME-$OMNIBUS_RUBY_VERSION
paths:
- omnibus/vendor/bundle
image: registry.ddbuild.io/ci/datadog-agent-buildimages/linux$CI_IMAGE_LINUX_SUFFIX:$CI_IMAGE_LINUX
needs:
- datadog-agent-7-x64
rules:
- - if: $CI_COMMIT_BRANCH =~ /^mq-working-branch-/
- when: never
- when: on_success
script:
- pushd omnibus && bundle config set --local path 'vendor/bundle' && popd
- dda inv -- -e omnibus.build --base-dir $OMNIBUS_BASE_DIR --skip-deps --target-project
${DD_PROJECT} ${OMNIBUS_EXTRA_ARGS}
- curl --retry 5 -sSL "https://dd-package-tools.s3.amazonaws.com/dd-pkg/${DD_PKG_VERSION}/dd-pkg_Linux_${DD_PKG_ARCH}.tar.gz"
| tar -xz -C /usr/local/bin dd-pkg
- dd-pkg version
- find $OMNIBUS_PACKAGE_DIR -iregex '.*\.\(deb\|rpm\)' | xargs dd-pkg lint
- "if [ -n \"$PACKAGE_REQUIRED_FILES_LIST\" ]; then\n find $OMNIBUS_PACKAGE_DIR\
\ \\( -name '*.deb' -or -name '*.rpm' \\) -a -not -name '*-dbg[_-]*' | xargs dd-pkg\
\ check-files --required-files ${PACKAGE_REQUIRED_FILES_LIST}\nfi\n"
- dd-pkg sign --key-id "${PIPELINE_KEY_ALIAS}" "${OMNIBUS_PACKAGE_DIR}"
stage: packaging
tags:
- arch:amd64
- specific:true
variables:
DD_PKG_ARCH: x86_64
DD_PROJECT: agent
KUBERNETES_CPU_REQUEST: 16
KUBERNETES_MEMORY_LIMIT: 32Gi
KUBERNETES_MEMORY_REQUEST: 32Gi
OMNIBUS_PACKAGE_ARTIFACT_DIR: $OMNIBUS_PACKAGE_DIR
PACKAGE_ARCH: amd64
PACKAGE_REQUIRED_FILES_LIST: test/required_files/agent-deb.txt
STATIC_QUALITY_GATE_NAME: static_quality_gate_agent_deb_amd64
Added Jobs
static_quality_gate_mq
static_quality_gate_mq:
artifacts:
expire_in: 1 week
paths:
- extract_rpm_package_report
- static_gate_report.json
when: always
image: registry.ddbuild.io/ci/datadog-agent-buildimages/docker_x64$CI_IMAGE_DOCKER_X64_SUFFIX:$CI_IMAGE_DOCKER_X64
needs:
- agent_deb-x64-a7
retry:
exit_codes:
- 42
max: 2
when:
- runner_system_failure
- stuck_or_timeout_failure
- unknown_failure
- api_failure
- scheduler_failure
- stale_schedule
- data_integrity_failure
rules:
- if: $CI_COMMIT_BRANCH =~ /^mq-working-branch-/
script:
- DOCKER_LOGIN=$($CI_PROJECT_DIR/tools/ci/fetch_secret.sh $DOCKER_REGISTRY_RO user)
|| exit $?
- $CI_PROJECT_DIR/tools/ci/fetch_secret.sh $DOCKER_REGISTRY_RO token | crane auth
login --username "$DOCKER_LOGIN" --password-stdin "$DOCKER_REGISTRY_URL"
- EXIT="${PIPESTATUS[0]}"; if [ $EXIT -ne 0 ]; then echo "Unable to locate credentials
needs gitlab runner restart"; exit $EXIT; fi
- DATADOG_API_KEY="$("$CI_PROJECT_DIR"/tools/ci/fetch_secret.sh "$AGENT_API_KEY_ORG2"
token)" || exit $?; export DATADOG_API_KEY
- export DD_API_KEY="$DATADOG_API_KEY"
- GITHUB_KEY_B64=$($CI_PROJECT_DIR/tools/ci/fetch_secret.sh $AGENT_GITHUB_APP key_b64)
|| exit $?; export GITHUB_KEY_B64
- GITHUB_APP_ID=$($CI_PROJECT_DIR/tools/ci/fetch_secret.sh $AGENT_GITHUB_APP app_id)
|| exit $?; export GITHUB_APP_ID
- GITHUB_INSTALLATION_ID=$($CI_PROJECT_DIR/tools/ci/fetch_secret.sh $AGENT_GITHUB_APP
installation_id) || exit $?; export GITHUB_INSTALLATION_ID
- echo "Using agent GitHub App"
- SLACK_DATADOG_AGENT_BOT_TOKEN=$($CI_PROJECT_DIR/tools/ci/fetch_secret.sh $SLACK_AGENT
token) || exit $?; export SLACK_DATADOG_AGENT_BOT_TOKEN
- dda inv -- quality-gates.parse-and-trigger-gates || exit $?
stage: functional_test
tags:
- arch:amd64
- specific:true
Changes Summary
ℹ️ Diff available in the job log.
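The package lookup in the `after_script` of `agent_deb-x64-a7` is hard to follow in its escaped YAML form. The sketch below restates that logic in Python as a reading aid only; it is not part of the PR. The function names `package_pattern` and `report_name` and the directory values in the usage example are hypothetical, while the substring checks, separators, and file-name layout are taken from the script above.

```python
def package_pattern(gate_name, project, package_arch,
                    package_dir, package_dir_suse, gate_arch=None):
    """Rebuild the glob the after_script uses to locate the built package."""
    # SUSE gates look in the SUSE package directory, all others in the default one.
    base_dir = package_dir_suse if "suse" in gate_name else package_dir
    # FIPS gates prefix the project name with "fips-".
    project_name = f"fips-{project}" if "fips" in gate_name else project
    # Mirrors ${STATIC_QUALITY_GATE_ARCH:-${PACKAGE_ARCH}}: fall back when unset or empty.
    arch = gate_arch or package_arch
    # Same order as the shell `case`: deb is checked first, then rpm/suse.
    if "deb" in gate_name:
        sep, ext = "_7*", "deb"
    elif "rpm" in gate_name or "suse" in gate_name:
        sep, ext = "-7*", "rpm"
    else:
        raise ValueError(f"Unknown package type for gate: {gate_name}")
    return f"{base_dir}/datadog-{project_name}{sep}{arch}.{ext}"


def report_name(gate_name, pipeline_id, commit_sha):
    """Rebuild the size report file name (prefix, pipeline ID, 8-character SHA)."""
    prefix = "static_quality_gate_"
    # Mirrors ${STATIC_QUALITY_GATE_NAME#static_quality_gate_}.
    report_prefix = gate_name[len(prefix):] if gate_name.startswith(prefix) else gate_name
    return f"{report_prefix}_size_report_{pipeline_id}_{commit_sha[:8]}.yml"


if __name__ == "__main__":
    # Directory values and IDs here are placeholders, not the real CI values.
    print(package_pattern("static_quality_gate_agent_deb_amd64", "agent", "amd64",
                          "/pkg", "/pkg-suse"))
    print(report_name("static_quality_gate_agent_deb_amd64", "123", "abcdef0123456789"))
```

One detail worth checking when reviewing the script: the deb branch of the `case` assigns `ARCH`, but the pattern is built from `ARCH_VAR`, so that assignment does not appear to affect the result.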
Static quality checks
✅ Please find below the results from static quality gates
Successful checks
Info
Regression Detector
Regression Detector Results
Metrics dashboard
Baseline: cbfd043
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +4.63 | [+1.60, +7.66] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +4.63 | [+1.60, +7.66] | 1 | Logs |
| ➖ | quality_gate_metrics_logs | memory utilization | +1.65 | [+1.44, +1.85] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics | memory utilization | +1.10 | [+0.90, +1.31] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | +0.63 | [+0.55, +0.70] | 1 | Logs |
| ➖ | file_tree | memory utilization | +0.29 | [+0.24, +0.34] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | +0.20 | [+0.05, +0.34] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | +0.12 | [+0.07, +0.17] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.03 | [-0.37, +0.43] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.02 | [-0.03, +0.07] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | +0.01 | [-0.04, +0.06] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.00 | [-0.41, +0.42] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | +0.00 | [-0.13, +0.13] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.08, +0.07] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | -0.03 | [-0.40, +0.34] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.03 | [-0.16, +0.11] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.05 | [-0.29, +0.18] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | -0.07 | [-0.28, +0.14] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | -0.08 | [-0.14, -0.01] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | -0.10 | [-0.24, +0.04] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | -0.13 | [-0.22, -0.04] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.29 | [-0.38, -0.20] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | -0.36 | [-0.41, -0.31] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_logs | % cpu utilization | -1.13 | [-2.59, +0.33] | 1 | Logs bounds checks dashboard |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | links |
|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we classify a change in performance as a "regression" (a change worth investigating further) if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
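The three criteria can be read as a single boolean test. The following is a minimal sketch of that test, not the Regression Detector's actual implementation. The `Experiment` class, its field names, and `is_regression` are hypothetical names introduced for illustration; the 5.00% tolerance and the meaning of each criterion come from the explanation above.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    # Hypothetical container for one row of the results table.
    name: str
    delta_mean_pct: float   # "Δ mean %"
    ci_low: float           # lower bound of "Δ mean % CI"
    ci_high: float          # upper bound of "Δ mean % CI"
    erratic: bool = False   # whether the experiment's configuration marks it erratic


def is_regression(exp: Experiment, tolerance_pct: float = 5.0) -> bool:
    """Apply the three stated criteria; direction (better or worse) is not considered."""
    big_enough = abs(exp.delta_mean_pct) >= tolerance_pct
    ci_excludes_zero = exp.ci_low > 0 or exp.ci_high < 0
    return big_enough and ci_excludes_zero and not exp.erratic


if __name__ == "__main__":
    # Values copied from the docker_containers_cpu row of the table above.
    row = Experiment("docker_containers_cpu", 4.63, 1.60, 7.66)
    print(row.name, is_regression(row))
```

With the `docker_containers_cpu` row, the confidence interval excludes zero but the estimated change of +4.63% is below the 5.00% tolerance, so the first criterion alone is enough to leave it unflagged.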
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
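The decision lines above each report a replica count such as `10/10`. As a rough sketch of how such counts could roll up into the overall result, the code below assumes that a bounds check passes only when every replica passes, and that the overall decision requires every check to pass. Neither rule is stated in this report, which shows only fully passing checks, so treat both as assumptions. The function names are hypothetical.

```python
def parse_replicas(field: str) -> tuple[int, int]:
    """Split a 'passed/total' field such as '10/10' into two integers."""
    passed, total = field.split("/")
    return int(passed), int(total)


def check_passes(field: str) -> bool:
    # Assumption: a check passes only if all replicas passed.
    passed, total = parse_replicas(field)
    return total > 0 and passed == total


def all_gates_pass(results: dict[tuple[str, str], str]) -> bool:
    # Assumption: the overall decision is the conjunction of every check.
    return all(check_passes(field) for field in results.values())


if __name__ == "__main__":
    # Two entries copied from the decision list above.
    results = {
        ("quality_gate_idle", "memory_usage"): "10/10",
        ("quality_gate_idle", "intake_connections"): "10/10",
    }
    print(all_gates_pass(results))
```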
What does this PR do?
Motivation
Describe how you validated your changes
Additional Notes