Commit a3a477b

brettlangdon authored and github-actions[bot] committed
fix(internal): move logging rate limiter to log filter (#12243)
PGB-61 Fixes #4856

Improves and fixes the internal `ddtrace.internal.logger.get_logger` implementation. Today we enforce that all internal `ddtrace` loggers use a custom subclass of `logging.Logger` so that we can apply custom logic, such as forwarding internal error logs to telemetry and rate limiting the volume of `ddtrace` logs. However, the current implementation requires knowledge of internal `logging.getLogger` details and tries to replicate some of its behavior. This caused us to miss a lock that is expected to be held around a shared resource.

This change applies our logic via a log filter instead of a custom logger class. The filter approach provides the same logic with a much simpler implementation that is easier to reason about and maintain, and it avoids the bug/race condition described above.

The downside to this approach is that loggers can have multiple filters, and the first filter that rejects a log record prevents the remaining filters from running. This means that if a user applies their own log filter to one of our `ddtrace.*` loggers, we may miss collecting and forwarding internal errors to telemetry. Conversely, if a user filter prevents a record from being logged, it does not count toward our rate limit (which is acceptable/correct).
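The filter short-circuit behavior described above can be demonstrated with the stdlib `logging` module directly. This is a minimal sketch (the logger name and filter functions are illustrative, not part of `ddtrace`): `Logger.filter` stops at the first filter that returns a falsy value, so later filters never run.

```python
import logging

calls = []


def user_filter(record: logging.LogRecord) -> bool:
    # A user-installed filter that rejects every record
    calls.append("user")
    return False


def rate_limit_filter(record: logging.LogRecord) -> bool:
    # Stand-in for the ddtrace rate-limiting filter
    calls.append("rate_limit")
    return True


logger = logging.getLogger("ddtrace.example")
logger.addFilter(user_filter)        # added first, runs first
logger.addFilter(rate_limit_filter)  # never consulted once user_filter rejects

logger.error("boom")
# calls == ["user"]: the second filter was skipped entirely
```

This is exactly the scenario the commit message flags: a user filter that rejects a record prevents the `ddtrace` filter from seeing it at all.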
## Checklist

- [x] PR author has checked that all the criteria below are met
  - The PR description includes an overview of the change
  - The PR description articulates the motivation for the change
  - The change includes tests OR the PR description describes a testing strategy
  - The PR description notes risks associated with the change, if any
  - Newly-added code is easy to change
  - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
  - The change includes or references documentation updates if necessary
  - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [ ] Reviewer has checked that all the criteria below are met
  - Title is accurate
  - All changes are related to the pull request's stated goal
  - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
  - Testing strategy adequately addresses listed risks
  - Newly-added code is easy to change
  - Release note makes sense to a user of the library
  - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---

Co-authored-by: Yun Kim <[email protected]>
Co-authored-by: Nicole Cybul <[email protected]>
Co-authored-by: Nick Ripley <[email protected]>
Co-authored-by: William Conti <[email protected]>
Co-authored-by: Christophe Papazian <[email protected]>
Co-authored-by: Munir Abdinur <[email protected]>
Co-authored-by: Laplie Anderson <[email protected]>

(cherry picked from commit 122caa6)
1 parent ad8057c commit a3a477b

File tree: 3 files changed, +126 −338 lines changed


ddtrace/internal/logger.py (+58 −158)
```diff
--- a/ddtrace/internal/logger.py
+++ b/ddtrace/internal/logger.py
@@ -1,180 +1,80 @@
 import collections
 import logging
 import os
-import typing
-from typing import Optional  # noqa:F401
-from typing import cast  # noqa:F401
+from typing import DefaultDict
+from typing import Tuple
 
 
-if typing.TYPE_CHECKING:
-    from typing import Any  # noqa:F401
-    from typing import DefaultDict  # noqa:F401
-    from typing import Tuple  # noqa:F401
-
-
-def get_logger(name):
-    # type: (str) -> DDLogger
+def get_logger(name: str) -> logging.Logger:
     """
-    Retrieve or create a ``DDLogger`` instance.
-
-    This function mirrors the behavior of `logging.getLogger`.
+    Retrieve or create a ``Logger`` instance with consistent behavior for internal use.
 
-    If no logger with the provided name has been fetched before then
-    a new one is created.
+    Configure all loggers with a rate limiter filter to prevent excessive logging.
 
-    If a previous logger has been created then it is returned.
-
-    DEV: We do not want to mess with `logging.setLoggerClass()`
-         That will totally mess with the user's loggers, we want
-         just our own, selective loggers to be DDLoggers
-
-    :param name: The name of the logger to fetch or create
-    :type name: str
-    :return: The logger instance
-    :rtype: ``DDLogger``
     """
-    # DEV: `logging.Logger.manager` refers to the single root `logging.Manager` instance
-    # https://github.com/python/cpython/blob/48769a28ad6ef4183508951fa6a378531ace26a4/Lib/logging/__init__.py#L1824-L1826  # noqa:E501
-    manager = logging.Logger.manager
+    logger = logging.getLogger(name)
+    # addFilter will only add the filter if it is not already present
+    logger.addFilter(log_filter)
+    return logger
 
-    # If the logger does not exist yet, create it
-    # DEV: `Manager.loggerDict` is a dict mapping logger name to logger
-    # DEV: This is a simplified version of `logging.Manager.getLogger`
-    # https://github.com/python/cpython/blob/48769a28ad6ef4183508951fa6a378531ace26a4/Lib/logging/__init__.py#L1221-L1253  # noqa:E501
-    # DEV: _fixupParents could be adding a placeholder, we want to replace it if that's the case
-    if name in manager.loggerDict:
-        logger = manager.loggerDict[name]
-        if isinstance(manager.loggerDict[name], logging.PlaceHolder):
-            placeholder = logger
-            logger = DDLogger(name=name)
-            manager.loggerDict[name] = logger
-            # DEV: `_fixupChildren` and `_fixupParents` have been around for awhile,
-            # DEV: but add the `hasattr` guard... just in case.
-            if hasattr(manager, "_fixupChildren"):
-                manager._fixupChildren(placeholder, logger)
-            if hasattr(manager, "_fixupParents"):
-                manager._fixupParents(logger)
-    else:
-        logger = DDLogger(name=name)
-        manager.loggerDict[name] = logger
-        if hasattr(manager, "_fixupParents"):
-            manager._fixupParents(logger)
 
-    # Return our logger
-    return cast(DDLogger, logger)
+# Named tuple used for keeping track of a log lines current time bucket and the number of log lines skipped
+LoggingBucket = collections.namedtuple("LoggingBucket", ("bucket", "skipped"))
+# Dict to keep track of the current time bucket per name/level/pathname/lineno
+_buckets: DefaultDict[Tuple[str, int, str, int], LoggingBucket] = collections.defaultdict(lambda: LoggingBucket(0, 0))
 
+# Allow 1 log record per name/level/pathname/lineno every 60 seconds by default
+# Allow configuring via `DD_TRACE_LOGGING_RATE`
+# DEV: `DD_TRACE_LOGGING_RATE=0` means to disable all rate limiting
+_rate_limit = int(os.getenv("DD_TRACE_LOGGING_RATE", default=60))
 
-def hasHandlers(self):
-    # type: (DDLogger) -> bool
-    """
-    See if this logger has any handlers configured.
-    Loop through all handlers for this logger and its parents in the
-    logger hierarchy. Return True if a handler was found, else False.
-    Stop searching up the hierarchy whenever a logger with the "propagate"
-    attribute set to zero is found - that will be the last logger which
-    is checked for the existence of handlers.
 
-    https://github.com/python/cpython/blob/8f192d12af82c4dc40730bf59814f6a68f68f950/Lib/logging/__init__.py#L1629
+def log_filter(record: logging.LogRecord) -> bool:
     """
-    c = self
-    rv = False
-    while c:
-        if c.handlers:
-            rv = True
-            break
-        if not c.propagate:
-            break
-        else:
-            c = c.parent  # type: ignore
-    return rv
+    Function used to determine if a log record should be outputted or not (True = output, False = skip).
 
-
-class DDLogger(logging.Logger):
-    """
-    Custom rate limited logger used by ``ddtrace``
-
-    This logger class is used to rate limit the output of
-    log messages from within the ``ddtrace`` package.
+    This function will:
+    - Log all records with a level of ERROR or higher with telemetry
+    - Rate limit log records based on the logger name, record level, filename, and line number
     """
+    if record.levelno >= logging.ERROR:
+        # avoid circular import
+        from ddtrace.internal import telemetry
 
-    # Named tuple used for keeping track of a log lines current time bucket and the number of log lines skipped
-    LoggingBucket = collections.namedtuple("LoggingBucket", ("bucket", "skipped"))
-
-    def __init__(self, *args, **kwargs):
-        # type: (*Any, **Any) -> None
-        """Constructor for ``DDLogger``"""
-        super(DDLogger, self).__init__(*args, **kwargs)
-
-        # Dict to keep track of the current time bucket per name/level/pathname/lineno
-        self.buckets = collections.defaultdict(
-            lambda: DDLogger.LoggingBucket(0, 0)
-        )  # type: DefaultDict[Tuple[str, int, str, int], DDLogger.LoggingBucket]
+        # currently we only have one error code
+        full_file_name = os.path.join(record.pathname, record.filename)
+        telemetry.telemetry_writer.add_error(1, record.msg % record.args, full_file_name, record.lineno)
 
-        # Allow 1 log record per name/level/pathname/lineno every 60 seconds by default
-        # Allow configuring via `DD_TRACE_LOGGING_RATE`
-        # DEV: `DD_TRACE_LOGGING_RATE=0` means to disable all rate limiting
-        rate_limit = os.getenv("DD_TRACE_LOGGING_RATE", default=None)
-
-        if rate_limit is not None:
-            self.rate_limit = int(rate_limit)
-        else:
-            self.rate_limit = 60
-
-    def handle(self, record):
-        # type: (logging.LogRecord) -> None
-        """
-        Function used to call the handlers for a log line.
-
-        This implementation will first determine if this log line should
-        be logged or rate limited, and then call the base ``logging.Logger.handle``
-        function if it should be logged
-
-        DEV: This method has all of it's code inlined to reduce on functions calls
-
-        :param record: The log record being logged
-        :type record: ``logging.LogRecord``
-        """
-        if record.levelno >= logging.ERROR:
-            # avoid circular import
-            from ddtrace.internal import telemetry
-
-            # currently we only have one error code
-            full_file_name = os.path.join(record.pathname, record.filename)
-            telemetry.telemetry_writer.add_error(1, record.msg % record.args, full_file_name, record.lineno)
-
-        # If rate limiting has been disabled (`DD_TRACE_LOGGING_RATE=0`) then apply no rate limit
-        # If the logging is in debug, then do not apply any limits to any log
-        if not self.rate_limit or self.getEffectiveLevel() == logging.DEBUG:
-            super(DDLogger, self).handle(record)
-            return
+    logger = logging.getLogger(record.name)
 
+    # If rate limiting has been disabled (`DD_TRACE_LOGGING_RATE=0`) then apply no rate limit
+    # If the logger is set to debug, then do not apply any limits to any log
+    if not _rate_limit or logger.getEffectiveLevel() == logging.DEBUG:
+        return True
     # Allow 1 log record by name/level/pathname/lineno every X seconds
-        # DEV: current unix time / rate (e.g. 300 seconds) = time bucket
-        #      int(1546615098.8404942 / 300) = 515538
-        # DEV: LogRecord `created` is a unix timestamp/float
-        # DEV: LogRecord has `levelname` and `levelno`, we want `levelno` e.g. `logging.DEBUG = 10`
-        current_bucket = int(record.created / self.rate_limit)
-
-        # Limit based on logger name, record level, filename, and line number
-        #   ('ddtrace.writer', 'DEBUG', '../site-packages/ddtrace/writer.py', 137)
-        # This way each unique log message can get logged at least once per time period
-        # DEV: LogRecord has `levelname` and `levelno`, we want `levelno` e.g. `logging.DEBUG = 10`
-        key = (record.name, record.levelno, record.pathname, record.lineno)
-
-        # Only log this message if the time bucket has changed from the previous time we ran
-        logging_bucket = self.buckets[key]
-        if logging_bucket.bucket != current_bucket:
-            # Append count of skipped messages if we have skipped some since our last logging
-            if logging_bucket.skipped:
-                record.msg = "{}, %s additional messages skipped".format(record.msg)
-                record.args = record.args + (logging_bucket.skipped,)  # type: ignore
-
+    # DEV: current unix time / rate (e.g. 300 seconds) = time bucket
+    #      int(1546615098.8404942 / 300) = 515538
+    # DEV: LogRecord `created` is a unix timestamp/float
+    # DEV: LogRecord has `levelname` and `levelno`, we want `levelno` e.g. `logging.DEBUG = 10`
+    current_bucket = int(record.created / _rate_limit)
+    # Limit based on logger name, record level, filename, and line number
+    #   ('ddtrace.writer', 'DEBUG', '../site-packages/ddtrace/writer.py', 137)
+    # This way each unique log message can get logged at least once per time period
+    # DEV: LogRecord has `levelname` and `levelno`, we want `levelno` e.g. `logging.DEBUG = 10`
+    key = (record.name, record.levelno, record.pathname, record.lineno)
+    # Only log this message if the time bucket has changed from the previous time we ran
+    logging_bucket = _buckets[key]
+    if logging_bucket.bucket != current_bucket:
+        # Append count of skipped messages if we have skipped some since our last logging
+        if logging_bucket.skipped:
+            record.msg = "{}, %s additional messages skipped".format(record.msg)
+            record.args = record.args + (logging_bucket.skipped,)  # type: ignore
         # Reset our bucket
-            self.buckets[key] = DDLogger.LoggingBucket(current_bucket, 0)
-
-            # Call the base handle to actually log this record
-            super(DDLogger, self).handle(record)
-        else:
-            # Increment the count of records we have skipped
-            # DEV: `self.buckets[key]` is a tuple which is immutable so recreate instead
-            self.buckets[key] = DDLogger.LoggingBucket(logging_bucket.bucket, logging_bucket.skipped + 1)
+        _buckets[key] = LoggingBucket(current_bucket, 0)
+        # Actually log this record
+        return True
+    # Increment the count of records we have skipped
+    # DEV: `buckets[key]` is a tuple which is immutable so recreate instead
+    _buckets[key] = LoggingBucket(logging_bucket.bucket, logging_bucket.skipped + 1)
+    # Skip this log message
+    return False
```
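The time-bucket arithmetic in `log_filter` can be exercised in isolation. This is a minimal sketch assuming the default 60-second rate; `should_log` is a hypothetical helper that mirrors the bucket logic from the diff above (one record allowed per key per bucket, skips counted in between):

```python
import collections

# Same shape as the module-level state in the diff
LoggingBucket = collections.namedtuple("LoggingBucket", ("bucket", "skipped"))
_buckets = collections.defaultdict(lambda: LoggingBucket(0, 0))
RATE = 60  # seconds per bucket


def should_log(key, created: float) -> bool:
    # current unix time / rate = time bucket (e.g. int(1000 / 60) = 16)
    current_bucket = int(created / RATE)
    b = _buckets[key]
    if b.bucket != current_bucket:
        # New bucket: reset the skip counter and allow the record
        _buckets[key] = LoggingBucket(current_bucket, 0)
        return True
    # Same bucket: count the skip and suppress the record
    _buckets[key] = LoggingBucket(b.bucket, b.skipped + 1)
    return False


key = ("ddtrace.writer", 40, "writer.py", 137)
assert should_log(key, 1000.0)       # bucket 16, first record -> logged
assert not should_log(key, 1010.0)   # still bucket 16 -> skipped
assert should_log(key, 1060.0)       # bucket 17 -> logged again
```

Each unique (name, level, pathname, lineno) key gets at most one record per bucket, which is why the real filter can report "N additional messages skipped" when a bucket rolls over.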
New release note file (+4)

```diff
@@ -0,0 +1,4 @@
+---
+fixes:
+  - |
+    internal: Fix ``ddtrace`` internal logger initialization mutating an unlocked shared resource.
```
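The new `get_logger` is safe to call repeatedly because, as the diff's comment notes, the stdlib `Logger.addFilter` only appends a filter that is not already present. A quick check (the logger name and filter here are illustrative):

```python
import logging


def log_filter(record: logging.LogRecord) -> bool:
    return True


logger = logging.getLogger("ddtrace.demo")
logger.addFilter(log_filter)
logger.addFilter(log_filter)  # second call is a no-op: filter already in logger.filters

# The filter was registered exactly once
assert logger.filters.count(log_filter) == 1
```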
