Add spark.stage details attribute at the end of the stage (#7608)
Adds the spark.stage details attribute at the end of the stage, rather than at the beginning.

The details attribute contains a large amount of data, including the full stack trace of the code that initiated the stage. A long-running span is flushed multiple times, and the attribute is included in each flush, which can significantly increase the ingestion volume.

By adding the details attribute at the end of the stage, we reduce the ingestion volume while still ensuring the information is available once the stage has completed.
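
To make the volume argument concrete, here is a minimal toy model in Java. It is not the dd-trace-java API: `ToySpan`, `flush()`, `simulate`, and the character counting are hypothetical names introduced only for illustration, and the model assumes that every flush resends all tags currently set on the span. It requires Java 11+ for `String.repeat`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlushVolumeSketch {

  /** Hypothetical stand-in for a long-running span; NOT the real tracer API. */
  static final class ToySpan {
    private final Map<String, String> tags = new LinkedHashMap<>();
    private long charsSent = 0;

    void setTag(String key, String value) {
      tags.put(key, value);
    }

    /** Assumption: each flush resends every tag currently on the span. */
    void flush() {
      for (Map.Entry<String, String> e : tags.entrySet()) {
        charsSent += e.getKey().length() + e.getValue().length();
      }
    }

    long charsSent() {
      return charsSent;
    }
  }

  /**
   * Returns the total number of tag characters sent for one stage.
   * detailsAtStart = true models setting "details" in onStageSubmitted;
   * false models setting it in onStageCompleted.
   */
  static long simulate(boolean detailsAtStart, int intermediateFlushes, String details) {
    ToySpan span = new ToySpan();
    span.setTag("attempt_id", "0");
    if (detailsAtStart) {
      span.setTag("details", details);
    }
    for (int i = 0; i < intermediateFlushes; i++) {
      span.flush(); // periodic flush while the stage is still running
    }
    if (!detailsAtStart) {
      span.setTag("details", details);
    }
    span.flush(); // final flush once the stage has completed
    return span.charsSent();
  }

  public static void main(String[] args) {
    String details = "x".repeat(1000); // placeholder for a large stack trace
    System.out.println(simulate(true, 5, details));  // prints 6108
    System.out.println(simulate(false, 5, details)); // prints 1073
  }
}
```

Under these assumptions, the large attribute is sent once instead of once per flush, while the small tags cost the same in both cases.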
paul-laffon-dd authored Sep 12, 2024
1 parent 435a1d2 commit a472c9d
Showing 1 changed file with 1 addition and 1 deletion.
@@ -492,7 +492,6 @@ public synchronized void onStageSubmitted(SparkListenerStageSubmitted stageSubmi
"parent_stage_ids", Arrays.toString(getStageParentIds(stageSubmitted.stageInfo())))
.withTag("task_count", stageSubmitted.stageInfo().numTasks())
.withTag("attempt_id", stageAttemptId)
-.withTag("details", stageSubmitted.stageInfo().details())
.withTag(DDTags.RESOURCE_NAME, stageSubmitted.stageInfo().name())
.start();

@@ -519,6 +518,7 @@ public synchronized void onStageCompleted(SparkListenerStageCompleted stageCompl
return;
}

+span.setTag("details", stageCompleted.stageInfo().details());
if (stageInfo.failureReason().isDefined()) {
span.setError(true);
span.setErrorMessage(getErrorMessageWithoutStackTrace(stageInfo.failureReason().get()));