-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New API Security sampling algorithm #8178
Conversation
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 66 metrics, 5 unstable metrics. Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.48.0-SNAPSHOT~98d957d662, baseline=1.48.0-SNAPSHOT~3e2867a84a
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.053 s) : 0, 1053293
Total [baseline] (8.711 s) : 0, 8710540
Agent [candidate] (1.058 s) : 0, 1057583
Total [candidate] (8.697 s) : 0, 8696586
section iast
Agent [baseline] (1.178 s) : 0, 1178181
Total [baseline] (9.264 s) : 0, 9263516
Agent [candidate] (1.182 s) : 0, 1181631
Total [candidate] (9.258 s) : 0, 9257892
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.176 s) : 0, 1176261
Total [baseline] (9.221 s) : 0, 9221195
Agent [candidate] (1.18 s) : 0, 1179956
Total [candidate] (9.232 s) : 0, 9232400
section iast_TELEMETRY_OFF
Agent [baseline] (1.173 s) : 0, 1172929
Total [baseline] (9.243 s) : 0, 9242701
Agent [candidate] (1.175 s) : 0, 1174748
Total [candidate] (9.242 s) : 0, 9241831
gantt
title insecure-bank - break down per module: candidate=1.48.0-SNAPSHOT~98d957d662, baseline=1.48.0-SNAPSHOT~3e2867a84a
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (720.67 ms) : 0, 720670
BytebuddyAgent [candidate] (726.694 ms) : 0, 726694
GlobalTracer [baseline] (240.493 ms) : 0, 240493
GlobalTracer [candidate] (242.118 ms) : 0, 242118
AppSec [baseline] (54.873 ms) : 0, 54873
AppSec [candidate] (55.139 ms) : 0, 55139
Debugger [baseline] (4.397 ms) : 0, 4397
Debugger [candidate] (4.44 ms) : 0, 4440
Remote Config [baseline] (716.178 µs) : 0, 716
Remote Config [candidate] (726.401 µs) : 0, 726
Telemetry [baseline] (16.089 ms) : 0, 16089
Telemetry [candidate] (12.336 ms) : 0, 12336
section iast
BytebuddyAgent [baseline] (838.616 ms) : 0, 838616
BytebuddyAgent [candidate] (841.825 ms) : 0, 841825
GlobalTracer [baseline] (230.587 ms) : 0, 230587
GlobalTracer [candidate] (230.982 ms) : 0, 230982
AppSec [baseline] (56.387 ms) : 0, 56387
AppSec [candidate] (56.272 ms) : 0, 56272
Debugger [baseline] (4.174 ms) : 0, 4174
Debugger [candidate] (4.135 ms) : 0, 4135
Remote Config [baseline] (608.574 µs) : 0, 609
Remote Config [candidate] (595.239 µs) : 0, 595
Telemetry [baseline] (8.886 ms) : 0, 8886
Telemetry [candidate] (8.795 ms) : 0, 8795
IAST [baseline] (22.907 ms) : 0, 22907
IAST [candidate] (23.005 ms) : 0, 23005
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (837.163 ms) : 0, 837163
BytebuddyAgent [candidate] (840.945 ms) : 0, 840945
GlobalTracer [baseline] (230.598 ms) : 0, 230598
GlobalTracer [candidate] (230.771 ms) : 0, 230771
AppSec [baseline] (56.178 ms) : 0, 56178
AppSec [candidate] (55.841 ms) : 0, 55841
Debugger [baseline] (4.139 ms) : 0, 4139
Debugger [candidate] (4.164 ms) : 0, 4164
Remote Config [baseline] (610.022 µs) : 0, 610
Remote Config [candidate] (592.316 µs) : 0, 592
Telemetry [baseline] (8.825 ms) : 0, 8825
Telemetry [candidate] (8.828 ms) : 0, 8828
IAST [baseline] (22.786 ms) : 0, 22786
IAST [candidate] (22.787 ms) : 0, 22787
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (835.393 ms) : 0, 835393
BytebuddyAgent [candidate] (836.452 ms) : 0, 836452
GlobalTracer [baseline] (230.22 ms) : 0, 230220
GlobalTracer [candidate] (230.407 ms) : 0, 230407
AppSec [baseline] (55.121 ms) : 0, 55121
AppSec [candidate] (56.103 ms) : 0, 56103
Debugger [baseline] (4.101 ms) : 0, 4101
Debugger [candidate] (4.138 ms) : 0, 4138
Remote Config [baseline] (607.127 µs) : 0, 607
Remote Config [candidate] (620.744 µs) : 0, 621
Telemetry [baseline] (8.59 ms) : 0, 8590
Telemetry [candidate] (8.723 ms) : 0, 8723
IAST [baseline] (22.915 ms) : 0, 22915
IAST [candidate] (22.321 ms) : 0, 22321
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.48.0-SNAPSHOT~98d957d662, baseline=1.48.0-SNAPSHOT~3e2867a84a
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.052 s) : 0, 1052034
Total [baseline] (10.523 s) : 0, 10523261
Agent [candidate] (1.049 s) : 0, 1048800
Total [candidate] (10.424 s) : 0, 10423561
section appsec
Agent [baseline] (1.198 s) : 0, 1197508
Total [baseline] (10.759 s) : 0, 10759489
Agent [candidate] (1.194 s) : 0, 1194170
Total [candidate] (10.852 s) : 0, 10852249
section iast
Agent [baseline] (1.178 s) : 0, 1178373
Total [baseline] (11.017 s) : 0, 11017471
Agent [candidate] (1.191 s) : 0, 1191107
Total [candidate] (11.018 s) : 0, 11017723
section profiling
Agent [baseline] (1.271 s) : 0, 1270696
Total [baseline] (10.893 s) : 0, 10892511
Agent [candidate] (1.276 s) : 0, 1276325
Total [candidate] (10.877 s) : 0, 10876883
gantt
title petclinic - break down per module: candidate=1.48.0-SNAPSHOT~98d957d662, baseline=1.48.0-SNAPSHOT~3e2867a84a
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (724.355 ms) : 0, 724355
BytebuddyAgent [candidate] (720.659 ms) : 0, 720659
GlobalTracer [baseline] (241.188 ms) : 0, 241188
GlobalTracer [candidate] (240.177 ms) : 0, 240177
AppSec [baseline] (55.041 ms) : 0, 55041
AppSec [candidate] (55.361 ms) : 0, 55361
Debugger [baseline] (4.44 ms) : 0, 4440
Debugger [candidate] (5.171 ms) : 0, 5171
Remote Config [baseline] (728.987 µs) : 0, 729
Remote Config [candidate] (718.877 µs) : 0, 719
Telemetry [baseline] (10.125 ms) : 0, 10125
Telemetry [candidate] (10.66 ms) : 0, 10660
section appsec
BytebuddyAgent [baseline] (742.677 ms) : 0, 742677
BytebuddyAgent [candidate] (740.508 ms) : 0, 740508
GlobalTracer [baseline] (238.074 ms) : 0, 238074
GlobalTracer [candidate] (236.785 ms) : 0, 236785
AppSec [baseline] (176.505 ms) : 0, 176505
AppSec [candidate] (176.917 ms) : 0, 176917
Debugger [baseline] (4.32 ms) : 0, 4320
Debugger [candidate] (4.289 ms) : 0, 4289
Remote Config [baseline] (647.933 µs) : 0, 648
Remote Config [candidate] (643.847 µs) : 0, 644
Telemetry [baseline] (8.226 ms) : 0, 8226
Telemetry [candidate] (8.2 ms) : 0, 8200
IAST [baseline] (21.842 ms) : 0, 21842
IAST [candidate] (21.544 ms) : 0, 21544
section iast
BytebuddyAgent [baseline] (839.053 ms) : 0, 839053
BytebuddyAgent [candidate] (849.131 ms) : 0, 849131
GlobalTracer [baseline] (230.565 ms) : 0, 230565
GlobalTracer [candidate] (232.15 ms) : 0, 232150
AppSec [baseline] (56.158 ms) : 0, 56158
AppSec [candidate] (56.709 ms) : 0, 56709
Debugger [baseline] (4.16 ms) : 0, 4160
Debugger [candidate] (4.223 ms) : 0, 4223
Remote Config [baseline] (605.247 µs) : 0, 605
Remote Config [candidate] (613.26 µs) : 0, 613
Telemetry [baseline] (8.912 ms) : 0, 8912
Telemetry [candidate] (8.883 ms) : 0, 8883
IAST [baseline] (22.952 ms) : 0, 22952
IAST [candidate] (23.215 ms) : 0, 23215
section profiling
BytebuddyAgent [baseline] (709.273 ms) : 0, 709273
BytebuddyAgent [candidate] (712.486 ms) : 0, 712486
GlobalTracer [baseline] (350.355 ms) : 0, 350355
GlobalTracer [candidate] (352.076 ms) : 0, 352076
AppSec [baseline] (54.415 ms) : 0, 54415
AppSec [candidate] (53.796 ms) : 0, 53796
Debugger [baseline] (4.266 ms) : 0, 4266
Debugger [candidate] (4.289 ms) : 0, 4289
Remote Config [baseline] (694.293 µs) : 0, 694
Remote Config [candidate] (702.982 µs) : 0, 703
Telemetry [baseline] (8.911 ms) : 0, 8911
Telemetry [candidate] (8.947 ms) : 0, 8947
ProfilingAgent [baseline] (101.258 ms) : 0, 101258
ProfilingAgent [candidate] (102.427 ms) : 0, 102427
Profiling [baseline] (101.285 ms) : 0, 101285
Profiling [candidate] (102.453 ms) : 0, 102453
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 14 metrics, 16 unstable metrics. Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.48.0-SNAPSHOT~98d957d662, baseline=1.48.0-SNAPSHOT~3e2867a84a
dateFormat X
axisFormat %s
section baseline
no_agent (386.33 µs) : 366, 406
. : milestone, 386,
iast (512.573 µs) : 491, 535
. : milestone, 513,
iast_FULL (733.905 µs) : 712, 756
. : milestone, 734,
iast_GLOBAL (561.825 µs) : 540, 584
. : milestone, 562,
iast_HARDCODED_SECRET_DISABLED (520.101 µs) : 498, 542
. : milestone, 520,
iast_INACTIVE (467.271 µs) : 446, 489
. : milestone, 467,
iast_TELEMETRY_OFF (504.273 µs) : 482, 526
. : milestone, 504,
tracing (464.794 µs) : 444, 486
. : milestone, 465,
section candidate
no_agent (388.877 µs) : 369, 409
. : milestone, 389,
iast (517.644 µs) : 496, 540
. : milestone, 518,
iast_FULL (734.471 µs) : 712, 756
. : milestone, 734,
iast_GLOBAL (561.651 µs) : 540, 583
. : milestone, 562,
iast_HARDCODED_SECRET_DISABLED (515.678 µs) : 494, 537
. : milestone, 516,
iast_INACTIVE (468.176 µs) : 447, 490
. : milestone, 468,
iast_TELEMETRY_OFF (503.655 µs) : 482, 526
. : milestone, 504,
tracing (466.332 µs) : 444, 488
. : milestone, 466,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.48.0-SNAPSHOT~98d957d662, baseline=1.48.0-SNAPSHOT~3e2867a84a
dateFormat X
axisFormat %s
section baseline
no_agent (1.355 ms) : 1335, 1375
. : milestone, 1355,
appsec (1.741 ms) : 1718, 1764
. : milestone, 1741,
appsec_no_iast (1.759 ms) : 1736, 1783
. : milestone, 1759,
code_origins (1.668 ms) : 1640, 1695
. : milestone, 1668,
iast (1.517 ms) : 1492, 1541
. : milestone, 1517,
profiling (1.516 ms) : 1492, 1540
. : milestone, 1516,
tracing (1.495 ms) : 1469, 1520
. : milestone, 1495,
section candidate
no_agent (1.368 ms) : 1349, 1388
. : milestone, 1368,
appsec (1.724 ms) : 1701, 1748
. : milestone, 1724,
appsec_no_iast (1.743 ms) : 1720, 1767
. : milestone, 1743,
code_origins (1.666 ms) : 1638, 1694
. : milestone, 1666,
iast (1.506 ms) : 1481, 1531
. : milestone, 1506,
profiling (1.527 ms) : 1503, 1551
. : milestone, 1527,
tracing (1.501 ms) : 1477, 1526
. : milestone, 1501,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.48.0-SNAPSHOT~98d957d662, baseline=1.48.0-SNAPSHOT~3e2867a84a
dateFormat X
axisFormat %s
section baseline
no_agent (14.847 s) : 14847000, 14847000
. : milestone, 14847000,
appsec (15.082 s) : 15082000, 15082000
. : milestone, 15082000,
iast (18.317 s) : 18317000, 18317000
. : milestone, 18317000,
iast_GLOBAL (18.096 s) : 18096000, 18096000
. : milestone, 18096000,
profiling (15.226 s) : 15226000, 15226000
. : milestone, 15226000,
tracing (14.896 s) : 14896000, 14896000
. : milestone, 14896000,
section candidate
no_agent (15.285 s) : 15285000, 15285000
. : milestone, 15285000,
appsec (14.988 s) : 14988000, 14988000
. : milestone, 14988000,
iast (19.115 s) : 19115000, 19115000
. : milestone, 19115000,
iast_GLOBAL (18.181 s) : 18181000, 18181000
. : milestone, 18181000,
profiling (15.053 s) : 15053000, 15053000
. : milestone, 15053000,
tracing (14.914 s) : 14914000, 14914000
. : milestone, 14914000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.48.0-SNAPSHOT~98d957d662, baseline=1.48.0-SNAPSHOT~3e2867a84a
dateFormat X
axisFormat %s
section baseline
no_agent (1.475 ms) : 1463, 1487
. : milestone, 1475,
appsec (2.334 ms) : 2290, 2378
. : milestone, 2334,
iast (2.114 ms) : 2058, 2170
. : milestone, 2114,
iast_GLOBAL (2.161 ms) : 2105, 2217
. : milestone, 2161,
profiling (1.978 ms) : 1933, 2023
. : milestone, 1978,
tracing (1.942 ms) : 1900, 1985
. : milestone, 1942,
section candidate
no_agent (1.473 ms) : 1462, 1485
. : milestone, 1473,
appsec (2.343 ms) : 2300, 2387
. : milestone, 2343,
iast (2.123 ms) : 2067, 2178
. : milestone, 2123,
iast_GLOBAL (2.161 ms) : 2105, 2217
. : milestone, 2161,
profiling (1.974 ms) : 1930, 2018
. : milestone, 1974,
tracing (1.943 ms) : 1901, 1986
. : milestone, 1943,
|
Hi! 👋 Thanks for your pull request! 🎉 To help us review it, please make sure to:
If you need help, please check our contributing guidelines. |
Does the motivation make sense for this PR?
|
dd-java-agent/appsec/src/main/java/com/datadog/appsec/api/security/ApiAccessTracker.java
Outdated
Show resolved
Hide resolved
dd-java-agent/appsec/src/main/java/com/datadog/appsec/api/security/ApiAccessTracker.java
Outdated
Show resolved
Hide resolved
dd-java-agent/appsec/src/main/java/com/datadog/appsec/api/security/ApiAccessTracker.java
Outdated
Show resolved
Hide resolved
long currentTime = System.currentTimeMillis(); | ||
long hash = computeApiHash(route, method, statusCode); | ||
|
||
synchronized (apiAccessLog) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit uneasy about introducing a global bottleneck at a place which might be called from many threads and rather frequently, IIUC.
I know that reducing contention here will require a more complex code because you would not be able to use the trick with the LinkedHashMap
but if this is going to be called on a hot-path, the extra complexity might be unavoidable.
I wonder, do you need to keep exact capacity limit or it would be ok if the capacity is mostly obeyed (with some fluctuation around the target value due to concurrent updates/cleanups)? If the exact capacity is not required it should help with a simpler implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworked implementation of ApiAccessTracker
to avoid bottleneck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbachorik If you can have a look: now there's a concurrent hashmap for the access map, which is only written to from a single thread (post-processing, during serialization) with a 1s timeout (but usually should be ms). At request end, we do a preliminary sampling decision based on a lookup to that map, and use an atomic counter to sample at most 4 concurrently.
08d9b7f
to
bc70627
Compare
bc70627
to
6d5c67d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still half-way the review, but flushing my current comments since I have to pause.
dd-java-agent/appsec/src/main/java/com/datadog/appsec/api/security/PostProcessingCounter.java
Outdated
Show resolved
Hide resolved
dd-java-agent/appsec/src/main/java/com/datadog/appsec/gateway/GatewayBridge.java
Outdated
Show resolved
Hide resolved
...va-agent/appsec/src/main/java/com/datadog/appsec/api/security/ApiSecurityRequestSampler.java
Outdated
Show resolved
Hide resolved
...va-agent/appsec/src/main/java/com/datadog/appsec/api/security/ApiSecurityRequestSampler.java
Outdated
Show resolved
Hide resolved
dd-java-agent/appsec/src/main/java/com/datadog/appsec/api/security/ApiAccessTracker.java
Outdated
Show resolved
Hide resolved
dd-java-agent/appsec/src/main/java/com/datadog/appsec/api/security/ApiAccessTracker.java
Outdated
Show resolved
Hide resolved
6d5c67d
to
ae09aa4
Compare
ae09aa4
to
de0e3ef
Compare
4be5bfb
to
92a91b3
Compare
b1824fa
to
0327ec4
Compare
0327ec4
to
77e1a96
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (great job!)
77e1a96
to
b41b321
Compare
c3a3ff5
to
4ea51f8
Compare
4ea51f8
to
98d957d
Compare
What Does This Do
Endpoint-based API Security sampling
API Security schema extraction had a fixed sampling rate. This led to an excessive amount of work spent on redundant schemas, while still not getting schemas for infrequent endpoints. This problem was aggravated by the interaction with APM sampling, which could decide to drop a trace after we inferred the schema.
The new algorithm implemented here works as follows:
(route, http method, status code)
combination.http.route
tag, and if it is absent, schema extraction will not happen at all.system-test upgrade for DataDog/system-tests#4195, is required
Late processing
APM sampling decision might not be present until a point later than the request end handlers. So in our usual request end handler for AppSec we cannot access it. To fit this requirement, we're moving schema extraction to the trace serialization thread (see
TraceProcessingWorker
), where the APM sampling decision is guaranteed to have happened. This is based on the API for trace post-processing introduced at #6800.This also has the advantage of moving schema extraction out of the critical path to handle the request, and should have a positive impact in latency when API security is enabled.
However, it has two additional risks:
dd.trace.post-processing.timeout
) for trace post-processors. If that timeout ellapses, no more schema extraction will happen.Other changes
The old sampling mechanism is fully removed:
dd.api-security.request.sample.rate
(system property) orDD_API_SECURITY_REQUEST_SAMPLE_RATE
(env var) option is gone.dd.api-security.sample.delay
(system property) orDD_API_SECURITY_SAMPLE_DELAY
defaulting to 30 is introduced, to set the interval in seconds for the new sampler. This should generally not be used by customers, but is used in testing scenarios.Contributor Checklist
type:
and (comp:
orinst:
) labels in addition to any usefull labelsclose
,fix
or any linking keywords when referencing an issue.Use
solves
instead, and assign the PR milestone to the issueJira ticket: APPSEC-54874