Is there a way to achieve correct segment parent-child relationship with batch utility and aws sdk clients captured with the tracer utility? #4185
-
Hi, as the title states, I'm stuck on producing AWS X-Ray segments for each record processed with the Batch utility. As the documentation (Batch processing | Tracing with AWS X-Ray) correctly suggests, when working inside a recordHandler it makes sense to work with the created subsegment directly, since it gives us an object we can work with independently for every invocation of the recordHandler. Other utilities like Logger, on the other hand, would require us to spawn a child logger so that appended keys cannot be overridden by handlers invoked concurrently. This brings us to the issue: what if my recordHandler uses an AWS SDK client patched by Tracer?
import {
  BatchProcessor,
  EventType,
  processPartialResponse,
} from '@aws-lambda-powertools/batch';
import { Tracer } from '@aws-lambda-powertools/tracer';
import { captureLambdaHandler } from '@aws-lambda-powertools/tracer/middleware';
import middy from '@middy/core';
import type { SQSEvent, SQSHandler, SQSRecord } from 'aws-lambda';
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const processor = new BatchProcessor(EventType.SQS);
const tracer = new Tracer({ serviceName: 'serverlessAirline' });
const s3Client = tracer.captureAWSv3Client(
  new S3Client({ region: 'eu-west-1' })
);

const recordHandler = async (record: SQSRecord): Promise<void> => {
  const subsegment = tracer.getSegment()?.addNewSubsegment('### recordHandler');
  const command = new PutObjectCommand({
    Body: record.body,
    Bucket: "examplebucket",
    Key: "objectkey"
  });
  await s3Client.send(command);
  subsegment?.close();
};

export const handler: SQSHandler = middy(async (event: SQSEvent, context) =>
  processPartialResponse(event, recordHandler, processor, {
    context,
  })
).use(captureLambdaHandler(tracer));
Ideally, we would want to produce the following structure; however, there is no way for the tracer to know which segment it should attach a new subsegment to in order to show the connections correctly.
The use case for such an X-Ray trace is that it represents the reality in which recordHandlers run concurrently. In addition, if the recordHandler segment put an annotation with some sort of message ID, it would allow for queries like: which segments failed while processing the message with ID "123"? However, as you might imagine, depending on how long each recordHandler takes to execute, the X-Ray trace may take on an arbitrary structure, which would make it hard to interpret the parent/child relationships of trace segments. So, in summary, do you have an idea of how such tracing may be accomplished, and whether it's even possible?
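The annotation idea above could look roughly like the sketch below. It is only an illustration, not Powertools code: `SegmentLike` is a hypothetical interface that mirrors just the `addAnnotation`, `addError`, and `close` methods of an X-Ray subsegment, and `traceRecord` is a made-up helper name.

```typescript
// Hypothetical minimal shape for this sketch; a real X-Ray subsegment
// exposes methods with these names, but this interface is an assumption.
interface SegmentLike {
  addAnnotation(key: string, value: string | number | boolean): void;
  addError(err: Error | string): void;
  close(): void;
}

// Illustrative helper (not part of Powertools): annotates the subsegment
// with the message ID, records any error, and always closes the subsegment.
const traceRecord = async <T>(
  subsegment: SegmentLike | undefined,
  messageId: string,
  work: () => Promise<T>
): Promise<T> => {
  subsegment?.addAnnotation('messageId', messageId);
  try {
    return await work();
  } catch (error) {
    subsegment?.addError(error instanceof Error ? error : String(error));
    // Rethrow so the caller (for example a batch processor) still sees the failure
    throw error;
  } finally {
    subsegment?.close();
  }
};
```

With an annotation like this in place, a filter expression along the lines of `annotation.messageId = "123"` should be able to find the matching traces, assuming the annotation key is indexed as written.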
Replies: 4 comments
-
Hi, I have come up with an idea for how I might solve it, but the method I need from aws-xray-sdk-core is exposed by the Powertools Tracer provider with only 1 parameter instead of its original 2. I wouldn't mind if that restriction were on the tracer itself, but I thought the provider object would let me use the underlying provider (in this case aws-xray-sdk) the way it was written.
-
Hey, sorry for the delay. I spent some time thinking about your question and unfortunately I didn't come up with anything very useful. The use case you're describing is valid; however, you're hitting a limitation of the X-Ray SDK for Node.js when working with promises. The X-Ray SDK relies on an older version of context tracking based on this module, which in turn uses Node.js […]. Because of this, the whole context tracking for segments breaks down when running parallel async code, which is also why we added the workaround you linked to the docs. When it comes to patched AWS SDK clients, however, the parent segment is resolved automatically based on the currently active one, which breaks the workaround. With this in mind, I am unsure how to get around this beyond suggesting to not use […]
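To make the failure mode concrete, here is a toy model of it. This is not the X-Ray SDK and does not use its context tracking; `ToySegment`, `active`, and `runToyDemo` are hypothetical names, and a single shared variable stands in for "the currently active segment". It only illustrates why resolving the parent implicitly goes wrong once handlers interleave, and why holding an explicit subsegment reference does not.

```typescript
type ToySegment = { name: string; children: ToySegment[] };

const runToyDemo = async () => {
  const root: ToySegment = { name: 'handler', children: [] };
  // Shared "currently active segment", a stand-in for implicit context lookup
  let active: ToySegment = root;

  // Parent is looked up implicitly after an await, like a patched client would
  const implicitParent = async (id: string): Promise<string> => {
    const sub: ToySegment = { name: `record-${id}`, children: [] };
    root.children.push(sub);
    active = sub;
    await Promise.resolve(); // yield, so the other handler can run
    const parent = active; // may now point at another record's subsegment
    parent.children.push({ name: `S3-${id}`, children: [] });
    return parent.name;
  };

  // Parent is the subsegment reference the handler created itself
  const explicitParent = async (id: string): Promise<string> => {
    const sub: ToySegment = { name: `record-${id}`, children: [] };
    root.children.push(sub);
    await Promise.resolve();
    sub.children.push({ name: `S3-${id}`, children: [] });
    return sub.name;
  };

  const implicit = await Promise.all(['a', 'b'].map((id) => implicitParent(id)));
  const explicit = await Promise.all(['a', 'b'].map((id) => explicitParent(id)));
  return { implicit, explicit };
};
```

In this model both implicit lookups resolve to `record-b`, because the second handler overwrites the shared variable before the first one resumes, while the explicit version attaches each call to its own record.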
-
To follow up on my previous comment, I think I found a workaround that requires a bit of custom implementation but achieves what you're looking for.
Before using it though, please be aware that this is a proof of concept I made only to demonstrate the overall logic involved in attaching an AWS SDK operation's metadata to an X-Ray segment. If you decide to adopt it, I'd highly recommend you test it with the whole range of SDK operations you want to trace and make sure it doesn't break your workload. I have only tested this manually with one single operation.
With that out of the way, I went and reverse engineered the way the X-Ray SDK patches AWS SDK clients and I got to a working solution that decorates a subsegment with most or all of the fields and metadata needed for X-Ray to properly represent the segment. This includes proper lineage, metadata, and operation information. Since this logic is applied to a specific subsegment reference, it also works with parallel processing, because the broken async tracking of segments mentioned in my previous comment is never involved.
This is the end result on the CloudWatch UI:
And this is what the code looks like:
import {
  BatchProcessor,
  EventType,
  processPartialResponse,
} from '@aws-lambda-powertools/batch';
import { Tracer } from '@aws-lambda-powertools/tracer';
import { captureLambdaHandler } from '@aws-lambda-powertools/tracer/middleware';
import { GetBucketAclCommand, S3Client } from '@aws-sdk/client-s3';
import middy from '@middy/core';
import type { Context, SQSEvent, SQSHandler, SQSRecord } from 'aws-lambda';
import type { Subsegment } from 'aws-xray-sdk-core';

type MetadataType = {
  requestId?: string;
  extendedRequestId?: string;
  attempts?: number;
  httpStatusCode?: number;
};

type HttpAttributesType = {
  response?: {
    status: number;
  };
};

const processor = new BatchProcessor(EventType.SQS);
const tracer = new Tracer({ serviceName: 'serverlessAirline' });
const s3Client = new S3Client({ region: 'us-east-1' });

/**
 * Adds AWS X-Ray attributes to a subsegment based on command metadata
 * Following the AWS X-Ray SDK structure for proper segment visualization
 */
const addXRayAttributes = (
  segment: Subsegment | undefined,
  metadata: MetadataType | undefined,
  operation: string,
  params: Record<string, unknown>,
  service: string,
  region: string
) => {
  // Process service-specific attributes according to AWS X-Ray whitelist
  const processedParams: Record<string, unknown> = { ...params };
  const awsSpecificAttributes: Record<string, unknown> = {};

  // Handle S3 bucket name according to AWS X-Ray whitelist
  if (service === 'S3' && 'Bucket' in params) {
    // For S3 operations, bucket is captured as bucket_name instead of in params
    awsSpecificAttributes.bucket_name = params.Bucket;
    delete processedParams.Bucket;
  }

  // Add AWS attributes
  const awsAttributes = {
    operation,
    region,
    request_id: metadata?.requestId,
    retries: metadata?.attempts || 0,
    extendedRequestId: metadata?.extendedRequestId,
    request: {
      operation,
      params: processedParams,
    },
    ...awsSpecificAttributes,
  };

  // Add HTTP attributes
  const httpAttributes: HttpAttributesType = {};
  if (metadata?.httpStatusCode) {
    httpAttributes.response = {
      status: metadata.httpStatusCode,
    };
  }

  // Add attributes to segment
  segment?.addAttribute('aws', awsAttributes);
  segment?.addAttribute('http', httpAttributes);

  // Add appropriate flags based on status code
  if (metadata?.httpStatusCode === 429) {
    segment?.addThrottleFlag();
  } else if (metadata?.httpStatusCode && metadata.httpStatusCode >= 500) {
    segment?.addFaultFlag();
  } else if (metadata?.httpStatusCode && metadata.httpStatusCode >= 400) {
    segment?.addErrorFlag();
  }
};

/**
 * Extracts operation name from AWS SDK command constructor
 * Removes the "Command" suffix from the command's name
 */
const getOperationName = (command: object): string => {
  // Get the constructor name (e.g., "GetBucketAclCommand")
  const commandName = command.constructor.name;
  // Remove "Command" suffix
  return commandName.endsWith('Command')
    ? commandName.slice(0, -7)
    : commandName;
};

const recordHandler = async (record: SQSRecord): Promise<void> => {
  const subsegment = tracer.getSegment()?.addNewSubsegment('### recordHandler');

  // Create a new subsegment for the S3 operation
  const serviceName = s3Client.config.serviceId;
  const regionName = await s3Client.config.region();
  const sdkSegment = subsegment?.addNewSubsegment(serviceName);
  sdkSegment?.addAttribute('namespace', 'aws');

  // Create the S3 command
  const command = new GetBucketAclCommand({ Bucket: record.body });

  // Extract operation name and command input
  const operation = getOperationName(command);
  const commandInput = { Bucket: record.body };

  try {
    // Send the command using the AWS SDK client
    const res = await s3Client.send(command);
    // Add X-Ray attributes to the segment
    addXRayAttributes(
      sdkSegment,
      res.$metadata,
      operation,
      commandInput,
      serviceName,
      regionName
    );
  } catch (error) {
    const err = error as Error & { $metadata?: MetadataType };
    // Add X-Ray attributes for error reporting
    addXRayAttributes(
      sdkSegment,
      err.$metadata,
      operation,
      commandInput,
      serviceName,
      regionName
    );
    sdkSegment?.addError(err);
  } finally {
    // Close the S3 operation subsegment
    sdkSegment?.close();
  }

  subsegment?.close();
};

export const handler: SQSHandler = middy(
  async (event: SQSEvent, context: Context) =>
    processPartialResponse(event, recordHandler, processor, {
      context,
    })
).use(captureLambdaHandler(tracer));
At this stage I'm not inclined to add this to the Tracer library, primarily because it'd require a lot of testing to make sure it's generic enough to work with all AWS SDK clients. This in itself is not a big deal; however, we're currently shifting our focus to OpenTelemetry (OTEL) for tracing (#665), so I'd prefer to invest time and effort there when it comes to new features. If OTEL is something you're interested in, please feel free to engage in the RFC discussion I linked. Hope this helps!
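For anyone adapting the proof of concept, the two pure pieces of logic in it (the status code to flag mapping and the operation name extraction) can be pulled out and checked without any AWS dependency. The sketch below restates them under assumed names: `classifyStatus`, `operationNameOf`, and the empty `GetBucketAclCommand` class are hypothetical stand-ins, not SDK or Powertools APIs.

```typescript
type XRayFlag = 'throttle' | 'fault' | 'error' | 'none';

// Same precedence as the proof of concept: 429 first, then 5xx, then 4xx
const classifyStatus = (status: number | undefined): XRayFlag => {
  if (status === undefined) return 'none';
  if (status === 429) return 'throttle';
  if (status >= 500) return 'fault';
  if (status >= 400) return 'error';
  return 'none';
};

// Derives the operation name from the command's constructor name
const operationNameOf = (command: object): string => {
  const name = command.constructor.name;
  return name.endsWith('Command')
    ? name.slice(0, -'Command'.length)
    : name;
};

// Stand-in class for this sketch only; the real one comes from the AWS SDK
class GetBucketAclCommand {}

const exampleOperation = operationNameOf(new GetBucketAclCommand());
```

One thing to watch for: the name extraction depends on `constructor.name`, so a bundler or minifier that renames classes could change the result unless it is configured to keep class names.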
-
Hi, I believe wrapping the record handler in tracer.provider.getNamespace().runPromise() works as well. A manual segment has to be created and set inside that promise though. I can share the snippet later, but it worked out pretty nicely for me. I did, however, abandon instrumenting clients with X-Ray in favour of ADOT, which I saw is something you are interested in as well (just consider that a very supportive +1 from me in that direction).
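The snippet for that approach was not shared in the thread, so the following is only a sketch of the general idea, with a swapped-in technique: it uses Node's built-in `AsyncLocalStorage` as a stand-in for the namespace, not the X-Ray SDK or Powertools API. `ToySegment`, `fakeSdkCall`, `handleRecord`, and `runScopedDemo` are hypothetical names. The point is that each record handler runs inside its own context, so an implicit "current segment" lookup resolves to that record's segment even when handlers interleave.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

type ToySegment = { name: string; children: ToySegment[] };

// Stand-in for the context namespace; each run() call gets its own store
const segmentContext = new AsyncLocalStorage<ToySegment>();

// Simulates a patched client that resolves its parent from the active context
const fakeSdkCall = async (operation: string): Promise<string> => {
  await Promise.resolve(); // yield, so other handlers can interleave
  const parent = segmentContext.getStore();
  if (!parent) throw new Error('no active segment in this context');
  parent.children.push({ name: operation, children: [] });
  return parent.name;
};

// Creates a per-record segment and runs the handler body inside its context
const handleRecord = (id: string): Promise<string> => {
  const sub: ToySegment = { name: `record-${id}`, children: [] };
  return segmentContext.run(sub, () => fakeSdkCall(`S3-${id}`));
};

const runScopedDemo = (): Promise<string[]> =>
  Promise.all(['a', 'b'].map((id) => handleRecord(id)));
```

Whether the same isolation holds with the X-Ray SDK's own namespace depends on how it tracks context, so this should be verified against the real library before relying on it.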