
@gyuheon0h
Contributor

@gyuheon0h gyuheon0h commented Oct 16, 2025

What does this PR do?
We want to collect runtime stack frames for the crashtracker. This follows the same approach the profiler uses to access the call stack.

Motivation:

Change log entry

Additional Notes:

How to test the change?

"Ruby and C method runtime stack capture" RSpec test

Run a Ruby program with the crashtracker initialized. Runtime stacks should be visible in the experimental section of the crash report. Below is the stack trace emitted by an example script I wrote to test this:

```json
{
  "format": "Datadog Runtime Callback 1.0",
  "frames": [
    {
      "file": "<Fiddle (C extension)>",
      "function": "free"
    },
    {
      "file": "debug_runtime_callback.rb",
      "function": "final_crash_point",
      "line": 244
    },
    {
      "file": "<Integer (C extension)>",
      "function": "times"
    },
    {
      "file": "debug_runtime_callback.rb",
      "function": "final_crash_point",
      "line": 243
    },
    .....
    {
      "file": "debug_runtime_callback.rb",
      "function": "main_crash_test",
      "line": 251
    },
    {
      "file": "debug_runtime_callback.rb",
      "function": "<main>",
      "line": 276
    },
    {
      "file": "<Kernel (C extension)>",
      "function": "fork"
    }
  ]
}
```

@gyuheon0h gyuheon0h requested review from a team as code owners October 16, 2025 09:34
@gyuheon0h gyuheon0h marked this pull request as draft October 16, 2025 09:34
@github-actions

github-actions bot commented Oct 16, 2025

👋 Hey @DataDog/ruby-guild, please fill in the "Change log entry" section in the pull request description.

If changes need to be present in CHANGELOG.md, you can state it this way:

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like this:

**Change log entry**

None.

(possible answers No/Nope/None)

Visited at: 2025-11-18 21:06:17 UTC

@github-actions github-actions bot added the core Involves Datadog core libraries label Oct 16, 2025
@github-actions

github-actions bot commented Oct 16, 2025

Typing analysis

Ignored files

This PR introduces 1 ignored file. It decreases the percentage of typed files from 38.32% to 38.28% (-0.04%).

Ignored files (+1-0)
Introduced:
lib/datadog/core/crashtracking/component.rb

Note: Ignored files are excluded from the next sections.

Untyped other declarations

This PR clears 1 untyped other declaration. It decreases the percentage of typed other declarations from 68.02% to 67.96% (-0.06%).

Untyped other declarations (+0-1)
Cleared:
sig/datadog/core/crashtracking/component.rbs:33
└── attr_reader logger: untyped

@datadog-datadog-prod-us1
Contributor

datadog-datadog-prod-us1 bot commented Oct 20, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 92.59%
Total Coverage: 95.16% (-0.01%)

🔗 Commit SHA: 6e6e165

@gyuheon0h gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch from e628919 to 564c828 Compare October 24, 2025 15:37
Contributor Author

gyuheon0h commented Oct 24, 2025

This stack of pull requests is managed by Graphite.

@gyuheon0h gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch 3 times, most recently from f102a46 to 53afc20 Compare October 31, 2025 15:19
@gyuheon0h
Contributor Author

gyuheon0h commented Nov 8, 2025

Paper trail

[ ] Checking for unreasonable string sizes
[ ] Checking that string pointers point to valid strings
[ ] Checking that control frames are valid
[ ] Checking that iseq is valid, and the instruction size is not unreasonable
[ ] Validating that pointers are readable using mincore
[ ] Checking for recursive frames
[ ] Strings can have different representations; take that into account
[ ] Ruby apps commonly have very deep stacks. The profiler collects 400 frames by default, and we've seen GitHub get close to 600; handle this
[ ] Pay attention to the structure that keeps the bytecode-to-line mapping
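The mincore item in the paper trail can be sketched roughly as follows. This is a hypothetical, Linux-only `is_pointer_readable`, not the helper from this PR. It relies on `mincore()` failing with `ENOMEM` when any page in the queried range is unmapped, so a successful call means the pages are mapped. Being mapped does not guarantee being readable (a `PROT_NONE` page may still pass), so treat this as a heuristic. The 64-page cap (`MAX_CHECKED_PAGES`) is an arbitrary assumption.

```c
#define _DEFAULT_SOURCE // exposes mincore() and MAP_ANONYMOUS in glibc headers
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical cap so the mincore() result buffer can live on the stack.
#define MAX_CHECKED_PAGES 64

// Hypothetical helper (Linux-only sketch): returns true if every page that
// overlaps [ptr, ptr + size) is mapped in this process. Mapped does not
// imply readable, so this is a heuristic rather than a guarantee.
static bool is_pointer_readable(const void *ptr, size_t size) {
  if (ptr == NULL || size == 0) return false;

  long page_size_l = sysconf(_SC_PAGESIZE);
  if (page_size_l <= 0) return false;
  uintptr_t page_size = (uintptr_t)page_size_l;

  uintptr_t begin = (uintptr_t)ptr;
  uintptr_t end = begin + size;
  if (end < begin) return false; // range wraps around the address space

  // mincore() requires a page-aligned start address.
  uintptr_t aligned = begin & ~(page_size - 1);
  size_t length = (size_t)(end - aligned);
  size_t pages = (length + page_size - 1) / page_size;
  if (pages > MAX_CHECKED_PAGES) return false; // refuse unreasonably large ranges

  // One status byte per page; only the return value matters here.
  unsigned char vec[MAX_CHECKED_PAGES];
  return mincore((void *)aligned, length, vec) == 0;
}
```

On macOS, `mincore()` takes a `char *` vector instead of `unsigned char *`, so this sketch would need adjusting there.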

@gyuheon0h gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch 6 times, most recently from 77d1391 to 1ebb325 Compare November 17, 2025 02:56
@gyuheon0h gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch from 1ebb325 to 4e51bf3 Compare November 17, 2025 03:40
@gyuheon0h gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch 2 times, most recently from 3c782d0 to 8d957bf Compare November 17, 2025 03:48
@github-actions github-actions bot added the profiling Involves Datadog profiling label Nov 17, 2025
@gyuheon0h gyuheon0h marked this pull request as ready for review November 17, 2025 03:52
@gyuheon0h gyuheon0h changed the title [WIP][crashtracking] Runtime stack collection callback registration [crashtracking] Runtime stack collection callback registration Nov 17, 2025
@gyuheon0h gyuheon0h changed the title [crashtracking] Runtime stack collection callback registration [PROF-12743] Runtime stack collection callback registration Nov 17, 2025
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@gyuheon0h gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch 10 times, most recently from d056dfc to 95fcf25 Compare November 23, 2025 02:30
@gyuheon0h gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch 3 times, most recently from 7b89539 to 3c6b8ae Compare November 24, 2025 20:28
@gyuheon0h gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch 3 times, most recently from a254d60 to 1401c5f Compare November 24, 2025 21:32
@gyuheon0h gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch 4 times, most recently from 47cf97a to 92758fd Compare November 25, 2025 15:19
@gyuheon0h gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch from 92758fd to 6c25fde Compare November 25, 2025 15:23
Member

@ivoanjo ivoanjo left a comment

Gave it a big pass! I think in general this seems reasonable -- most of my thoughts/worries are around ruby_runtime_stack_callback and making sure we get it into shape.

Comment on lines +135 to +139
```c
// Check if the heap object pointed to by str is readable
if (!is_pointer_readable((const void *)str, sizeof(struct RBasic))) return false;

// For strings, we need to check the full RString structure
if (!is_pointer_readable(RSTRING(str), sizeof(struct RString))) return false;
```
Member

Is this AI...? This seems a redundant check...?

Contributor Author

This might be due to my unfamiliarity with Ruby internals, but the first check is intended to ensure the VALUE actually points to a live struct RBasic. Without the first check, even touching RSTRING(str) would dereference garbage and crash. The second check validates the string-specific payload (struct RString), since a Ruby string's struct extends RBasic with extra fields.

Member

Ah... In fact, RSTRING(...) is just a cast, which is why I was confused:

```c
/**
 * Convenient casting macro.
 *
 * @param   obj  An object, which is in fact an ::RString.
 * @return  The passed object casted to ::RString.
 */
#define RSTRING(obj)            RBIMPL_CAST((struct RString *)(obj))
```

but TBH it's probably best to avoid Ruby macros and other Ruby calls here as much as possible, so we don't rely on too many assumptions about what they do.

That is, this should probably be a single is_pointer_readable((const void *)str, sizeof(struct RString)) check, which avoids relying on Ruby headers for anything other than knowing "what's the size of it".
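Why one check suffices: RSTRING() performs no dereference, and the RBasic header is the first member of Ruby's string struct, so the range [str, str + sizeof(struct RString)) already contains [str, str + sizeof(struct RBasic)). The sketch below shows that containment with hypothetical stand-in structs (`fake_rbasic`, `fake_rstring`), not Ruby's real definitions, whose layout varies between versions; `range_contains` is likewise an illustrative helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for Ruby's object layouts. The only property this
// relies on is that the header is the first member of the string struct.
struct fake_rbasic {
  uintptr_t flags;
  uintptr_t klass;
};

struct fake_rstring {
  struct fake_rbasic basic; // header at offset 0
  long len;
  char *ptr;
};

// The header sits at offset 0 and the string struct is at least as large,
// so checking the string-sized range also covers the header.
static_assert(offsetof(struct fake_rstring, basic) == 0, "header must be first");
static_assert(sizeof(struct fake_rstring) >= sizeof(struct fake_rbasic),
              "string struct extends the header");

// Returns true if [inner, inner + inner_size) lies within
// [outer, outer + outer_size). Compares offsets to avoid overflow.
static bool range_contains(uintptr_t outer, size_t outer_size,
                           uintptr_t inner, size_t inner_size) {
  return inner >= outer && inner_size <= outer_size &&
         inner - outer <= outer_size - inner_size;
}
```

With this layout, a readability check over `sizeof(struct fake_rstring)` bytes starting at the object address subsumes the separate header-sized check.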

Comment on lines +167 to +182
```c
static bool is_valid_control_frame(const rb_control_frame_t *cfp,
                                   const rb_execution_context_t *ec) {
  if (!cfp) return false;

  void *stack_start = ec->vm_stack;
  void *stack_end = (char*)stack_start + ec->vm_stack_size * sizeof(VALUE);
  if ((void*)cfp < stack_start || (void*)cfp >= stack_end) {
    return false;
  }

  if (!is_pointer_readable(cfp, sizeof(rb_control_frame_t))) {
    return false;
  }

  return true;
}
```
Member

We're validating some pointers, but not others... Why?
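One way the bounds check in is_valid_control_frame could be tightened: as quoted, it only confirms that cfp starts inside the VM stack. A stricter variant also requires the whole frame to end before the stack end and cfp to be suitably aligned. This sketch uses a hypothetical stand-in type `fake_control_frame_t` (not Ruby's `rb_control_frame_t`, whose fields vary by version) and a hypothetical helper `frame_in_stack`; it covers only the frame pointer itself, not the pointers stored inside the frame.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for rb_control_frame_t; only its size and
// alignment matter for this check.
typedef struct {
  const void *pc;
  void *sp;
  const void *iseq;
  uintptr_t self;
  const void *ep;
} fake_control_frame_t;

// Returns true if the whole frame [cfp, cfp + 1) lies inside the stack
// region [stack_start, stack_start + stack_bytes) and cfp is aligned.
static bool frame_in_stack(const fake_control_frame_t *cfp,
                           const void *stack_start, size_t stack_bytes) {
  if (cfp == NULL || stack_start == NULL) return false;

  uintptr_t start = (uintptr_t)stack_start;
  uintptr_t frame = (uintptr_t)cfp;
  if (frame % alignof(fake_control_frame_t) != 0) return false;
  if (frame < start) return false;

  // Compare offsets instead of computing end pointers, to avoid overflow.
  size_t offset = (size_t)(frame - start);
  return offset <= stack_bytes && stack_bytes - offset >= sizeof(*cfp);
}
```

The check quoted above would accept a cfp that starts one byte before stack_end; the offset comparison here rejects any frame that would extend past the end.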

Comment on lines +198 to +200
```c
static void ruby_runtime_stack_callback(
  void (*emit_frame)(const ddog_crasht_RuntimeStackFrame*)
) {
```
Member

So this function is very complex, because what it does is very complex. My question is: what happens if we get something wrong here? That is, we already have a bunch of safeguards, but what can actually happen if we miss a spot in practice? Is there a safety net for us, or not?
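A minimal sketch of the emit_frame callback shape with a hard depth cap, which is one possible safeguard against very deep stacks (it does not address invalid pointers). The frame struct is a hypothetical stand-in for `ddog_crasht_RuntimeStackFrame` that mirrors the `file`/`function`/`line` keys in the crash report JSON. The real libdatadog type is not shown here. `emit_runtime_stack`, `count_frame`, and `MAX_FRAMES` are illustrative names; the 400 cap borrows the profiler's default. The sketch walks an already-collected array rather than live Ruby control frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for ddog_crasht_RuntimeStackFrame.
typedef struct {
  const char *function;
  const char *file;
  uint32_t line; // 0 when unknown, e.g. for C methods
} fake_runtime_stack_frame;

// Hypothetical cap, borrowing the profiler's default of 400 frames.
#define MAX_FRAMES 400

// Hands each frame (innermost first) to the caller-supplied emit_frame
// callback, stopping after MAX_FRAMES so a very deep stack cannot make the
// crash handler run unbounded. Returns the number of frames emitted.
static size_t emit_runtime_stack(const fake_runtime_stack_frame *frames, size_t count,
                                 void (*emit_frame)(const fake_runtime_stack_frame *)) {
  if (frames == NULL || emit_frame == NULL) return 0;
  size_t limit = count < MAX_FRAMES ? count : MAX_FRAMES;
  for (size_t i = 0; i < limit; i++) {
    emit_frame(&frames[i]);
  }
  return limit;
}

// Example callback for illustration: counts frames instead of serializing them.
static size_t emitted_count = 0;
static void count_frame(const fake_runtime_stack_frame *frame) {
  (void)frame;
  emitted_count++;
}
```

Capping in the walker rather than in the callback keeps the bound in one place, regardless of which emit_frame implementation is passed in.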

```c
function_name = safe_string_ptr(name);
}

VALUE filename = rb_iseq_path(iseq);
```
Member

So this makes me slightly nervous, and comes back to my question of "do we have a safety net for this or not?". We're asking the VM to get something for us, so we have no control over whether the pointers are valid, or over whether the VM does sane things if something is wrong...

@gyuheon0h gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch from c7a819c to ef62258 Compare November 25, 2025 20:56