
Genesis LLO Support in ADOT SDK #361


@yiyuan-he (Contributor) commented May 12, 2025

What does this pull request do?

Adds support in the ADOT SDK for handling LLO (Large Language Objects, such as prompt and completion content) emitted by third-party instrumentation SDKs.

The following SDKs are supported:

  • Traceloop/Openllmetry
  • OpenInference
  • OpenLit

Note: The OTel dependencies in the ADOT SDK have been loosened as a short-term workaround for the conflicting dependency requirements of the various third-party instrumentation SDKs.

Test plan

Built this custom ADOT SDK into various sample apps and exported span and log data to the X-Ray and CloudWatch Logs OTLP endpoints, respectively, to validate LLO extraction and transformation into Gen AI Events.

Configurations tested:

  • LangChain + Traceloop/Openllmetry
  • LangChain + OpenInference
  • LangChain + OpenLit
  • CrewAI + Traceloop/Openllmetry
  • CrewAI + OpenInference
  • CrewAI + OpenLit

Environment variable configuration:

```
env OTEL_METRICS_EXPORTER=none \
    OTEL_TRACES_EXPORTER=otlp \
    OTEL_LOGS_EXPORTER=otlp \
    OTEL_PYTHON_DISTRO=aws_distro \
    OTEL_PYTHON_CONFIGURATOR=aws_configurator \
    OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf \
    OTEL_EXPORTER_OTLP_LOGS_HEADERS="x-aws-log-group=test,x-aws-log-stream=default" \
    OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://xray.us-east-1.amazonaws.com/v1/traces \
    OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=https://logs.us-east-1.amazonaws.com/v1/logs \
    OTEL_RESOURCE_ATTRIBUTES="service.name=langchain-app" \
    OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED="true" \
    OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT="true" \
    OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="http,sqlalchemy,psycopg2,pymysql,sqlite3,aiopg,asyncpg,mysql_connector,botocore,boto3,urllib3,requests,starlette" \
    AGENT_OBSERVABILITY_ENABLED="true" \
    python app.py
```
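When the sample app is launched from a Python script rather than a shell, the same configuration can be assembled as an environment mapping. This is a minimal sketch assuming a hypothetical `build_adot_env` helper; parametrizing the region, log group, log stream, and service name is an illustrative assumption, with defaults taken from the values above.

```python
import os
import subprocess
import sys
from typing import Dict


def build_adot_env(
    service_name: str = "langchain-app",
    region: str = "us-east-1",
    log_group: str = "test",
    log_stream: str = "default",
) -> Dict[str, str]:
    """Return the test-plan environment variables (hypothetical helper)."""
    disabled = [
        "http", "sqlalchemy", "psycopg2", "pymysql", "sqlite3", "aiopg",
        "asyncpg", "mysql_connector", "botocore", "boto3", "urllib3",
        "requests", "starlette",
    ]
    return {
        "OTEL_METRICS_EXPORTER": "none",
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_LOGS_EXPORTER": "otlp",
        "OTEL_PYTHON_DISTRO": "aws_distro",
        "OTEL_PYTHON_CONFIGURATOR": "aws_configurator",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_EXPORTER_OTLP_LOGS_HEADERS": f"x-aws-log-group={log_group},x-aws-log-stream={log_stream}",
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT": f"https://xray.{region}.amazonaws.com/v1/traces",
        "OTEL_EXPORTER_OTLP_LOGS_ENDPOINT": f"https://logs.{region}.amazonaws.com/v1/logs",
        "OTEL_RESOURCE_ATTRIBUTES": f"service.name={service_name}",
        "OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED": "true",
        "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT": "true",
        "OTEL_PYTHON_DISABLED_INSTRUMENTATIONS": ",".join(disabled),
        "AGENT_OBSERVABILITY_ENABLED": "true",
    }


def run_app(script: str = "app.py") -> int:
    # Overlay on the current environment so PATH, virtualenv settings, etc. survive.
    env = {**os.environ, **build_adot_env()}
    return subprocess.run([sys.executable, script], env=env, check=False).returncode
```

Merging over `os.environ` (instead of passing only the ADOT variables) keeps the child process able to find its interpreter and installed packages.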

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@yiyuan-he yiyuan-he requested a review from a team as a code owner May 12, 2025 20:54
@yiyuan-he yiyuan-he force-pushed the genesis-llo-extraction-dev-v2 branch from afd727c to 49069e2 May 12, 2025 20:57
@yiyuan-he yiyuan-he force-pushed the genesis-llo-extraction-dev-v2 branch from 49069e2 to f4e93d6 May 12, 2025 21:14
@yiyuan-he yiyuan-he changed the base branch from genesis_dev to main May 12, 2025 21:14
@yiyuan-he yiyuan-he force-pushed the genesis-llo-extraction-dev-v2 branch from dbc0fcf to d04f786 May 15, 2025 20:11
@yiyuan-he yiyuan-he changed the base branch from main to genesis_dev May 16, 2025 03:08
@yiyuan-he yiyuan-he changed the base branch from genesis_dev to main May 16, 2025 03:09
@yiyuan-he yiyuan-he changed the base branch from main to genesis-dev-v2 May 16, 2025 03:10
@yiyuan-he yiyuan-he changed the title from Genesis LLO Handling to [WIP] Genesis LLO Handling May 16, 2025
```
events = []
span_ctx = span.context
gen_ai_system = span.attributes.get("traceloop.entity.name", "unknown")
```
Contributor Author:

gen_ai.system is not always available in the span. This is a best-effort attempt to still retrieve a relevant value.
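The best-effort lookup described above can be sketched as an ordered fallback chain. This is a minimal sketch assuming a hypothetical `resolve_gen_ai_system` helper that prefers `gen_ai.system` when present and only then falls back to `traceloop.entity.name`; the PR snippet itself reads `traceloop.entity.name` directly.

```python
from typing import Any, Mapping, Optional

# Keys tried in order: the OTel GenAI semantic-convention key first, then the
# Traceloop attribute used in the snippet above. Other fallbacks are omitted.
_GEN_AI_SYSTEM_KEYS = ("gen_ai.system", "traceloop.entity.name")


def resolve_gen_ai_system(attributes: Optional[Mapping[str, Any]]) -> str:
    """Best-effort lookup of the GenAI system name for a span (hypothetical helper)."""
    if not attributes:
        return "unknown"
    for key in _GEN_AI_SYSTEM_KEYS:
        value = attributes.get(key)
        if value:  # skip missing or empty values
            return str(value)
    return "unknown"
```

Because span attributes in the OTel Python SDK behave like a read-only mapping, the helper can be called as `resolve_gen_ai_system(span.attributes)`.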

Comment on lines +228 to +233
```
all_events = []
all_events.extend(self._extract_gen_ai_prompt_events(span, attributes, event_timestamp))
all_events.extend(self._extract_gen_ai_completion_events(span, attributes, event_timestamp))
all_events.extend(self._extract_traceloop_events(span, attributes, event_timestamp))
all_events.extend(self._extract_openlit_span_event_attributes(span, attributes, event_timestamp))
all_events.extend(self._extract_openinference_attributes(span, attributes, event_timestamp))
```
@yiyuan-he (Contributor Author) commented May 16, 2025:

Support for more third-party SDKs can be added by following this pattern.

I don't think we can go further with isolating these rules since there is no consistent and generic way to determine which SDK family is being instrumented. Also, many of these third-party SDKs may have overlapping rules for generic OTel attributes such as gen_ai.prompt.{n}.content and gen_ai.completion.{n}.content.
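A minimal sketch of the extension pattern described above, written as a registry of extractor functions rather than explicit `extend` calls. `EXTRACTORS`, `register_extractor`, `extract_indexed_gen_ai_content`, and the event dict shape are hypothetical, not the ADOT API. Because every extractor runs on every span, overlapping rules (two SDKs both emitting `gen_ai.prompt.{n}.content`, for example) would still need to be reconciled separately.

```python
import re
from typing import Any, Callable, Dict, List, Mapping

Event = Dict[str, Any]
Extractor = Callable[[Mapping[str, Any]], List[Event]]

# Matches e.g. "gen_ai.prompt.0.content" or "gen_ai.completion.12.content".
_INDEXED_CONTENT = re.compile(r"^gen_ai\.(prompt|completion)\.(\d+)\.content$")


def extract_indexed_gen_ai_content(attributes: Mapping[str, Any]) -> List[Event]:
    """Turn gen_ai.{prompt,completion}.{n}.content attributes into ordered events."""
    events: List[Event] = []
    for key, value in attributes.items():
        match = _INDEXED_CONTENT.match(key)
        if match:
            kind, index = match.group(1), int(match.group(2))
            events.append({"kind": kind, "index": index, "content": value, "source_key": key})
    # Prompts before completions, each in numeric index order.
    events.sort(key=lambda e: (e["kind"] != "prompt", e["index"]))
    return events


# Supporting another SDK means appending one more extractor to this list.
EXTRACTORS: List[Extractor] = [extract_indexed_gen_ai_content]


def register_extractor(extractor: Extractor) -> Extractor:
    """Add an extractor; usable as a decorator."""
    EXTRACTORS.append(extractor)
    return extractor


def extract_all_events(attributes: Mapping[str, Any]) -> List[Event]:
    """Run every registered extractor and concatenate the results."""
    all_events: List[Event] = []
    for extractor in EXTRACTORS:
        all_events.extend(extractor(attributes))
    return all_events
```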

Comment on lines +166 to +169
```
logging_enabled = os.getenv(_OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED, "false")
if logging_enabled.strip().lower() == "true":
    _init_logging(log_exporters, resource)
```

Contributor Author:

Changed the order of pipeline initialization so that, when the AGENT_OBSERVABILITY_ENABLED flag is enabled, the global logger provider instance can be passed to the span pipeline.
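The ordering constraint can be illustrated with stand-in objects. This is a minimal sketch, not the ADOT configurator: `LoggerProviderStub`, `SpanPipelineStub`, and `init_pipelines` are hypothetical, and the two booleans stand in for the `OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED` and `AGENT_OBSERVABILITY_ENABLED` checks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoggerProviderStub:
    """Stand-in for the global logger provider (hypothetical)."""
    name: str = "global-logger-provider"


@dataclass
class SpanPipelineStub:
    """Stand-in for the span pipeline; may hold a logger provider (hypothetical)."""
    logger_provider: Optional[LoggerProviderStub] = None


@dataclass
class InitResult:
    order: List[str] = field(default_factory=list)
    logger_provider: Optional[LoggerProviderStub] = None
    span_pipeline: Optional[SpanPipelineStub] = None


def init_pipelines(logging_enabled: bool, agent_observability_enabled: bool) -> InitResult:
    result = InitResult()
    # 1. Logs first, so the logger provider exists before spans are wired up.
    if logging_enabled:
        result.logger_provider = LoggerProviderStub()
        result.order.append("logs")
    # 2. Spans second; hand over the logger provider only when the flag is on,
    #    so LLO content extracted from spans can be emitted as log events.
    provider = result.logger_provider if agent_observability_enabled else None
    result.span_pipeline = SpanPipelineStub(logger_provider=provider)
    result.order.append("traces")
    return result
```

If the span pipeline were built first, there would be no logger provider yet to pass in, which is why the order had to change.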

```
import sys
from logging import Logger, getLogger

import pkg_resources

_logger: Logger = getLogger(__name__)

AGENT_OBSERVABILITY_ENABLED = "AGENT_OBSERVABILITY_ENABLED"
```
Contributor Author:

LLO handling is gated behind this configuration. If it is not enabled, the ADOT SDK keeps its default behavior in the span and logs pipelines.
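A minimal sketch of the gate, assuming the flag is parsed the same way as the logging flag in the earlier snippet (`strip().lower() == "true"`). The PR excerpt only shows the constant, so `is_agent_observability_enabled` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

AGENT_OBSERVABILITY_ENABLED = "AGENT_OBSERVABILITY_ENABLED"


def is_agent_observability_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """True only when the flag is "true" (case-insensitive, surrounding whitespace ignored).

    An unset variable or any other value means the default span/logs behavior.
    """
    env = os.environ if environ is None else environ
    return env.get(AGENT_OBSERVABILITY_ENABLED, "false").strip().lower() == "true"
```

Accepting an optional mapping keeps the check testable without mutating `os.environ`.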

@yiyuan-he yiyuan-he changed the title from Genesis LLO Handling to Genesis LLO Support in ADOT SDK May 16, 2025
@yiyuan-he (Contributor Author) commented May 17, 2025

Merging into the genesis-dev-v2 branch to enable end-to-end (E2E) integration testing.

@yiyuan-he yiyuan-he merged commit 409cb6a into aws-observability:genesis-dev-v2 May 17, 2025
yiyuan-he added a commit that referenced this pull request May 18, 2025
## What does this pull request do?
Handles additional CrewAI LLO attributes not originally [in
scope](https://quip-amazon.com/Dni9AztXMB2x/Genesis-observability-trace-data-attributes).
Certain configurations of CrewAI Agents in customer applications can
produce the following LLO attributes:
- `gen_ai.agent.human_input` -> generated by
[OpenLit](https://github.com/openlit/openlit/blob/9f285555330ae7c92f3382c105c44373b2c9a77d/sdk/python/src/openlit/semcov/__init__.py#L269C33-L269C57)
- `gen_ai.agent.actual_output` -> generated by
[OpenLit](https://github.com/openlit/openlit/blob/9f285555330ae7c92f3382c105c44373b2c9a77d/sdk/python/src/openlit/semcov/__init__.py#L268)
- `crewai.crew.tasks_output` -> generated by
[Traceloop/Openllmetry](https://github.com/traceloop/openllmetry/blob/de23561e4e45fc63a8f0020f15e68df525cf29c1/packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/crewai_span_attributes.py#L31)
- `crewai.crew.result` -> generated by
[Traceloop/Openllmetry](https://github.com/traceloop/openllmetry/blob/de23561e4e45fc63a8f0020f15e68df525cf29c1/packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/crewai_span_attributes.py#L31)
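A minimal sketch of collecting the four attributes listed above from a span's attributes. `CREWAI_LLO_ATTRIBUTES` and `extract_crewai_llo` are hypothetical, and the input/output classification is an assumption inferred from the attribute names, not something stated in this PR.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Attribute key -> (producing SDK, assumed direction of the content).
CREWAI_LLO_ATTRIBUTES: Dict[str, Tuple[str, str]] = {
    "gen_ai.agent.human_input": ("openlit", "input"),
    "gen_ai.agent.actual_output": ("openlit", "output"),
    "crewai.crew.tasks_output": ("traceloop", "output"),
    "crewai.crew.result": ("traceloop", "output"),
}


def extract_crewai_llo(attributes: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Collect the CrewAI LLO attributes present on a span (hypothetical helper)."""
    events: List[Dict[str, Any]] = []
    for key, (sdk, direction) in CREWAI_LLO_ATTRIBUTES.items():
        if key in attributes:
            events.append({
                "source_key": key,
                "sdk": sdk,
                "direction": direction,
                "content": attributes[key],
            })
    return events
```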

Example CrewAI Agent Configurations:
```python
from crewai import Agent, Crew, Process

# `llm` and `response_task` are assumed to be defined elsewhere in the app.
assistant_agent = Agent(
    role="Assistant",
    goal="Provide helpful responses to user queries",
    backstory="You are a helpful assistant that provides accurate and useful information.",
    verbose=True,
    llm=llm,
)

crew = Crew(
    agents=[assistant_agent],
    tasks=[response_task],
    verbose=True,
    process=Process.sequential,
)
```

Related PR:
#361

## Test plan
Built this custom ADOT SDK into various sample apps and exported span
and log data to the X-Ray and CloudWatch Logs OTLP endpoints,
respectively, to validate LLO extraction and transformation into Gen AI
Events.

Configurations tested:
- CrewAI + Traceloop/Openllmetry
- CrewAI + OpenInference
- CrewAI + OpenLit

Environment variable configuration:
```
env OTEL_METRICS_EXPORTER=none \
    OTEL_TRACES_EXPORTER=otlp \
    OTEL_LOGS_EXPORTER=otlp \
    OTEL_PYTHON_DISTRO=aws_distro \
    OTEL_PYTHON_CONFIGURATOR=aws_configurator \
    OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf \
    OTEL_EXPORTER_OTLP_LOGS_HEADERS="x-aws-log-group=test,x-aws-log-stream=default" \
    OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://xray.us-east-1.amazonaws.com/v1/traces \
    OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=https://logs.us-east-1.amazonaws.com/v1/logs \
    OTEL_RESOURCE_ATTRIBUTES="service.name=langchain-app" \
    OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED="true" \
    OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT="true" \
    OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="http,sqlalchemy,psycopg2,pymysql,sqlite3,aiopg,asyncpg,mysql_connector,botocore,boto3,urllib3,requests,starlette" \
    AGENT_OBSERVABILITY_ENABLED="true" \
    python app.py
```

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
liustve added a commit that referenced this pull request May 19, 2025
Adds support in ADOT auto-instrumentation for handling LLO log events.

*Description of changes:*

1. Adds `AwsBatchLogRecordProcessor`, a backwards-compatible custom logs
   BatchProcessor that maintains the following invariants:
   - The unserialized, uncompressed data size of each exported batch is
     always <= 1 MB, except in the case below.
   - If the data size of an exported batch is ever > 1 MB, that batch
     contains exactly one log record.
2. `OTLPAwsLogExporter`: Adds retry-delay behavior based on the
   server-side `Retry-After` response header, and injects the LLO header
   flag if the size of the exported data is > 1 MB.
3. Customizes auto-instrumentation to use the new
   `AwsBatchLogRecordProcessor`.
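Both behaviors can be sketched without the OTel SDK. Assumptions: records are represented only by their estimated (unserialized, uncompressed) sizes in bytes, 1 MB is taken as 1,048,576 bytes, and `split_by_size` / `retry_after_seconds` are hypothetical helpers, not the internals of `AwsBatchLogRecordProcessor` or `OTLPAwsLogExporter`. The `Retry-After` parsing follows the HTTP definition (delay in seconds or an HTTP-date).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Iterable, List, Optional

MAX_BATCH_BYTES = 1024 * 1024  # assumption: 1 MB == 2**20 bytes


def split_by_size(sizes: Iterable[int], limit: int = MAX_BATCH_BYTES) -> List[List[int]]:
    """Group record sizes so each batch totals <= limit, except that a record
    larger than limit is always exported alone in a batch of length 1."""
    batches: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in sizes:
        if size > limit:
            # Oversized record: flush the pending batch, then send it by itself.
            if current:
                batches.append(current)
                current, current_bytes = [], 0
            batches.append([size])
            continue
        if current_bytes + size > limit:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


def retry_after_seconds(header: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After value into a delay in seconds, or None if unusable."""
    if not header:
        return None
    value = header.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # TypeError on older Pythons, ValueError on 3.10+
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

Returning `None` for a missing or malformed header lets the caller fall back to its usual backoff schedule.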

*Testing:*

TODO:
1. Add unit tests to validate behavior of `AwsBatchLogRecordProcessor`
and `OTLPAwsLogExporter`
2. E2E testing to validate any performance hits and compatibility with:
#361


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
yiyuan-he added a commit that referenced this pull request May 19, 2025
## What does this pull request do?
Refactoring changes to improve performance, especially for spans with
few or no LLO attributes, while maintaining the same functionality and
behavior.
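One common way to get this kind of speedup is an early exit: scan the attribute keys once and skip all extraction work when no LLO key is present. The sketch below assumes that approach; the refactor's actual changes are not shown on this page, and the key set is a hypothetical, non-exhaustive sample built from attributes named elsewhere in this conversation.

```python
import re
from typing import Any, Dict, Mapping, Tuple

# Hypothetical, non-exhaustive LLO key set for illustration.
_LLO_EXACT_KEYS = frozenset({
    "gen_ai.agent.human_input",
    "gen_ai.agent.actual_output",
    "crewai.crew.tasks_output",
    "crewai.crew.result",
})
_LLO_PATTERN = re.compile(r"^gen_ai\.(prompt|completion)\.\d+\.content$")


def is_llo_key(key: str) -> bool:
    return key in _LLO_EXACT_KEYS or _LLO_PATTERN.match(key) is not None


def split_llo(attributes: Mapping[str, Any]) -> Tuple[Mapping[str, Any], Dict[str, Any]]:
    """Return (non-LLO attributes, LLO attributes).

    Fast path: if no key is an LLO key, the input mapping is returned unchanged
    (no copy) together with an empty dict, so spans without LLO pay only one scan.
    """
    if not any(is_llo_key(k) for k in attributes):
        return attributes, {}
    kept: Dict[str, Any] = {}
    llo: Dict[str, Any] = {}
    for key, value in attributes.items():
        (llo if is_llo_key(key) else kept)[key] = value
    return kept, llo
```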

## Test plan
Same test strategy as
#361
and
#365

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.