Skip to content

Commit 474dfb1

Browse files
authored
fix(profiling): get rid of leaky string table (DataDog#11612)
Profiler samples typically contain several strings, for sample labels (task name, endpoint/resource, etc) and frame info. Those strings tend to be repeated. We build samples across several API calls, from Python, Cython, C Python extensions, and C++ code, making it hard to reason about the lifetimes of the strings we get. So, it's easier to just copy the strings we need while building a sample. And it's preferably to only store one copy. So, our libdatadog wrapper uses a string table, which addresses both of these concerns. However, nothing is ever removed from the table. And some things like task names can have new values each time, meaning we have unbounded memory growth. I previously tried to fix this by periodically clearing the string table. However, this comes with its own new problems. We can construct samples in parallel with profile uploading. So if we try to clear the table when we upload the profile, or really any other time, we need to make sure in-progress samples keep their strings around until the sample is flushed to libdatadog. And we need to handle forking so we don't leak a whole string table on fork. It turns out we don't really need that string table at all. Libdatadog already interns strings once the sample makes it into the libdatadog profile. That solves the duplicate string problem. We still want to copy strings while building the sample, but we can just do that on a per-sample basis, and the strings only need to be alive while we build a sample. We can use a small amount of scratch space for each sample to copy strings. We will likely only have a few samples in flight at any time, and we already pool the samples. So we can reuse this scratch memory effectively. We'll only use as memory in proportion to the number of samples being built concurrently. So, this commit implements a simple string arena for each sample object to use. We may want to tweak the default buffer size and how much memory we keep around when resetting the arena. We could even deduplicate, since we might see some duplicates per-sample from frame info. PROF-10824
1 parent 4ca6009 commit 474dfb1

File tree

5 files changed

+85
-32
lines changed

5 files changed

+85
-32
lines changed

ddtrace/internal/datadog/profiling/dd_wrapper/include/profile.hpp

-14
Original file line numberDiff line numberDiff line change
@@ -4,12 +4,10 @@
44
#include "types.hpp"
55

66
#include <atomic>
7-
#include <deque>
87
#include <memory>
98
#include <mutex>
109
#include <string>
1110
#include <string_view>
12-
#include <unordered_set>
1311
#include <vector>
1412

1513
extern "C"
@@ -19,10 +17,6 @@ extern "C"
1917

2018
namespace Datadog {
2119

22-
// Unordered containers don't get heterogeneous lookup until gcc-10, so for now use this
23-
// strategy to dedup + store strings.
24-
using StringTable = std::unordered_set<std::string_view>;
25-
2620
// Serves to collect individual samples, as well as lengthen the scope of string data
2721
class Profile
2822
{
@@ -33,11 +27,6 @@ class Profile
3327
std::atomic<bool> first_time{ true };
3428
std::mutex profile_mtx{};
3529

36-
// Storage for strings
37-
std::deque<std::string> string_storage{};
38-
StringTable strings{};
39-
std::mutex string_table_mtx{};
40-
4130
// Configuration
4231
SampleType type_mask{ 0 };
4332
unsigned int max_nframes{ g_default_max_nframes };
@@ -69,9 +58,6 @@ class Profile
6958
ddog_prof_Profile& profile_borrow();
7059
void profile_release();
7160

72-
// String table manipulation
73-
std::string_view insert_or_get(std::string_view str);
74-
7561
// constref getters
7662
const ValueIndex& val();
7763

ddtrace/internal/datadog/profiling/dd_wrapper/include/sample.hpp

+41
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,44 @@ extern "C"
1515

1616
namespace Datadog {
1717

18+
namespace internal {
19+
20+
// StringArena holds copies of strings we need while building samples.
21+
// StringArena is intended to amortize allocations, so that in the common
22+
// case we can just do a memcpy for each string we want to copy rather than
23+
// a new allocation for each string.
24+
//
25+
// We need to make copies right now because we don't have strong guarantees
26+
// that the strings we get (from Python, Cython, C++, etc) are alive the
27+
// whole time we build samples.
28+
struct StringArena
29+
{
30+
// Default size, in bytes, of each Chunk. The value is a power of 2 (nice to
31+
// allocate) that is bigger than any actual sample string size seen over a
32+
// random selection of a few hundred Python profiles at Datadog. So ideally
33+
// we only need one chunk, which we can reuse between samples
34+
static constexpr size_t DEFAULT_SIZE = 16 * 1024;
35+
// Strings are backed by fixed-size Chunks. The Chunks can't grow, or
36+
// they'll move and invalidate pointers into the arena. At the same time,
37+
// they must be dynamically sized at creation because we get arbitrary
38+
// user-provided strings.
39+
using Chunk = std::vector<char>;
40+
// We keep the Chunks for this arena in a vector so we can track them, and
41+
// free them when the StringArena is deallocated.
42+
std::vector<Chunk> chunks;
43+
44+
StringArena();
45+
// Clear the backing data of the arena, except for a smaller initial segment.
46+
// Views returned by insert are invalid after this call.
47+
void reset();
48+
// Copies the contents of s into the arena and returns a view of the copy in
49+
// the arena. The returned view is valid until the next call to reset, or
50+
// until the arena is destroyed.
51+
std::string_view insert(std::string_view s);
52+
};
53+
54+
} // namespace internal
55+
1856
class SampleManager; // friend
1957

2058
class Sample
@@ -45,6 +83,9 @@ class Sample
4583
// Additional metadata
4684
int64_t endtime_ns = 0; // end of the event
4785

86+
// Backing memory for string copies
87+
internal::StringArena string_storage{};
88+
4889
public:
4990
// Helpers
5091
bool push_label(ExportLabelKey key, std::string_view val);

ddtrace/internal/datadog/profiling/dd_wrapper/src/profile.cpp

-15
Original file line numberDiff line numberDiff line change
@@ -162,21 +162,6 @@ Datadog::Profile::one_time_init(SampleType type, unsigned int _max_nframes)
162162
first_time.store(false);
163163
}
164164

165-
std::string_view
166-
Datadog::Profile::insert_or_get(std::string_view str)
167-
{
168-
const std::lock_guard<std::mutex> lock(string_table_mtx); // Serialize access
169-
170-
auto str_it = strings.find(str);
171-
if (str_it != strings.end()) {
172-
return *str_it;
173-
}
174-
175-
string_storage.emplace_back(str);
176-
strings.insert(string_storage.back());
177-
return string_storage.back();
178-
}
179-
180165
const Datadog::ValueIndex&
181166
Datadog::Profile::val()
182167
{

ddtrace/internal/datadog/profiling/dd_wrapper/src/sample.cpp

+38-3
Original file line numberDiff line numberDiff line change
@@ -4,8 +4,42 @@
44

55
#include <algorithm>
66
#include <chrono>
7+
#include <string_view>
78
#include <thread>
89

10+
Datadog::internal::StringArena::StringArena()
11+
{
12+
chunks.emplace_back();
13+
chunks.back().reserve(Datadog::internal::StringArena::DEFAULT_SIZE);
14+
}
15+
16+
void
17+
Datadog::internal::StringArena::reset()
18+
{
19+
// Free chunks. Keep the first one around so it's easy to reuse this without
20+
// needing new allocations every time. We can completely drop it to get rid
21+
// of everything
22+
// TODO - we could consider keeping more around if it's not too costly.
23+
// The goal is to not retain more than we need _on average_. If we have
24+
// mostly small samples and then a rare huge one, we can end up with
25+
// all samples in our pool using as much memory as the largets ones we've seen
26+
chunks.front().clear();
27+
chunks.erase(++chunks.begin(), chunks.end());
28+
}
29+
30+
std::string_view
31+
Datadog::internal::StringArena::insert(std::string_view s)
32+
{
33+
auto chunk = &chunks.back();
34+
if ((chunk->capacity() - chunk->size()) < s.size()) {
35+
chunk = &chunks.emplace_back();
36+
chunk->reserve(std::max(s.size(), Datadog::internal::StringArena::DEFAULT_SIZE));
37+
}
38+
int base = chunk->size();
39+
chunk->insert(chunk->end(), s.begin(), s.end());
40+
return std::string_view(chunk->data() + base, s.size());
41+
}
42+
943
Datadog::Sample::Sample(SampleType _type_mask, unsigned int _max_nframes)
1044
: max_nframes{ _max_nframes }
1145
, type_mask{ _type_mask }
@@ -28,8 +62,8 @@ void
2862
Datadog::Sample::push_frame_impl(std::string_view name, std::string_view filename, uint64_t address, int64_t line)
2963
{
3064
static const ddog_prof_Mapping null_mapping = { 0, 0, 0, to_slice(""), to_slice("") };
31-
name = profile_state.insert_or_get(name);
32-
filename = profile_state.insert_or_get(filename);
65+
name = string_storage.insert(name);
66+
filename = string_storage.insert(filename);
3367

3468
CodeProvenance::get_instance().add_filename(filename);
3569

@@ -73,7 +107,7 @@ Datadog::Sample::push_label(const ExportLabelKey key, std::string_view val)
73107
}
74108

75109
// Otherwise, persist the val string and add the label
76-
val = profile_state.insert_or_get(val);
110+
val = string_storage.insert(val);
77111
auto& label = labels.emplace_back();
78112
label.key = to_slice(key_sv);
79113
label.str = to_slice(val);
@@ -106,6 +140,7 @@ Datadog::Sample::clear_buffers()
106140
labels.clear();
107141
locations.clear();
108142
dropped_frames = 0;
143+
string_storage.reset();
109144
}
110145

111146
bool
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
---
2+
fixes:
3+
- |
4+
profiling: fix unbounded memory usage growth caused by keeping arbitrary
5+
user-generated strings (e.g. asyncio Task names) in an internal table and
6+
never removing them.

0 commit comments

Comments
 (0)