
[Bug]: Upgrading to 1.4.0 breaks sending traces #2889

Closed
jc-beyer-tqgg opened this issue Oct 11, 2024 · 13 comments
Labels
🐛 bug Something isn't working

Comments

@jc-beyer-tqgg

jc-beyer-tqgg commented Oct 11, 2024

Bug report

Hey everyone !
After updating the dd_trace extension from 1.3.2 to 1.4.0, traces are no longer being sent.

I can see the following errors in Datadog:

```
Oct 11 09:57:56.392 XXX:production_myservice [ddtrace] [error] Failed signaling lifecycle end: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
Oct 11 09:57:56.392 XXX:production_myservice [ddtrace] [error] Failed flushing service data: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
Oct 11 09:57:56.392 XXX:production_myservice [ddtrace] [error] Failed flushing telemetry buffer: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
Oct 11 09:57:56.388 XXX:production_myservice [ddtrace] [error] Failed sending traces to the sidecar: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
```

My php settings are:

```ini
extension=ddtrace.so
datadog.trace.request_init_hook=/opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php
datadog.trace.cli_enabled=On
datadog.trace.generate_root_span=Off
datadog.trace.auto_flush_enabled=On
```
Edit: My services are all executed as Lambdas in AWS !

PHP version

8.3.12

Tracer or profiler version

1.4.0

Installed extensions

No response

Output of phpinfo()

No response

Upgrading from

1.3.2

@jc-beyer-tqgg jc-beyer-tqgg added the 🐛 bug Something isn't working label Oct 11, 2024
@bwoebi
Collaborator

bwoebi commented Oct 11, 2024

Hey @jc-beyer-tqgg,

Yes, we are rolling out a new trace-sending mechanism with 1.4.0, so it seems that doesn't work for you, sadly.

We would be very interested in reproducing this behaviour. An strace (`strace -fTtts 1000 <php executable invocation here>`, with no dd-ipc-helper process alive when the process is launched), along with trace logs (`DD_TRACE_LOG_LEVEL=trace DD_TRACE_LOG_FILE=/tmp/helper.log`), would help a lot. If you want to provide them, please contact support, mention this ticket, and ask for it to be routed directly to me. Thanks a lot!
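Spelled out as a capture recipe, that might look like the following (the output paths and the PHP invocation are placeholders; adjust to your setup):

```shell
# Trace-level logging from the tracer, written to /tmp/helper.log:
export DD_TRACE_LOG_LEVEL=trace
export DD_TRACE_LOG_FILE=/tmp/helper.log

# Make sure no dd-ipc-helper process is alive before launching, then
# follow forks (-f), record syscall durations (-T) and timestamps (-tt),
# and print strings up to 1000 chars (-s 1000):
strace -fTtts 1000 -o /tmp/strace.out php your-script.php
```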

In case you just want it working, you can set datadog.trace.sidecar_trace_sender=0.
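For anyone who cannot easily edit php.ini (e.g. in a container), the same switch can be applied via the environment; the env variable name is confirmed to work later in this thread:

```shell
# Equivalent of datadog.trace.sidecar_trace_sender=0 in php.ini:
export DD_TRACE_SIDECAR_TRACE_SENDER=0
echo "DD_TRACE_SIDECAR_TRACE_SENDER=${DD_TRACE_SIDECAR_TRACE_SENDER}"
```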

@jc-beyer-tqgg
Author

Hey @bwoebi !
I'm happy to help and provide any useful information, but my services all run as Lambdas, so it's not easy to run strace or save a log file 😓

Is there anything I can do in a Lambda context that would help you?

@bwoebi
Collaborator

bwoebi commented Oct 11, 2024

Having mentioned that you're in a Lambda context is possibly already helpful.
I don't know very much about Lambda; we'll try to reproduce it there soon. And if we don't manage to, we'll come back to you - thanks for offering help!

So yeah, for now please just revert to the old sender with the ini config mentioned before.

@j-fulbright

Seeing the same issue here with our Lambdas (using Docker container images).

@j-fulbright

> Having mentioned that you're in a lambda context is possibly already helpful? I don't know very much about lambda, we'll try to reproduce it soon there. And if we don't manage to, we'll come back to you - thanks for offering help!
>
> So yeah, for now please just revert to the old sender with the ini config mentioned before.

We added the ini setting at the Docker level when installing the extension, the same place we add some other settings, and it didn't seem to help at all. I'm now trying `DD_SIDECAR_TRACE_SENDER_DEFAULT=false` in our env.

@rquinaud

@j-fulbright
Hi,
I'm not in the same context as you are (AWS Lambda), but I'm having similar issues. My context is a cronjob on GCP GKE, PHP 8.3.8, dd library 1.4.0 or 1.4.1 (both failing).

After hours of research, the "sidecar" feature seems to be the stumbling block. By the way, I did not find relevant documentation about it, sadly (maybe my bad).

I tried setting `DD_SIDECAR_TRACE_SENDER_DEFAULT` as well, with no result.

BUT:

`DD_TRACE_SIDECAR_TRACE_SENDER: "0"`

Setting this variable makes traces upload as they did with dd-library 1.3.1.

Regards

@j-fulbright

j-fulbright commented Oct 18, 2024

We ended up rolling back to the older tracer for the time being, as it was filling up server disk space with core dumps.

@bwoebi
Collaborator

bwoebi commented Oct 18, 2024

We're going to release a 1.4.2 on Monday which will detect lambda and disable the sidecar trace sender by default.

@bwoebi
Collaborator

bwoebi commented Oct 21, 2024

1.4.2 has been released; we now look for the AWS_LAMBDA_FUNCTION_NAME env var in Lambda and auto-disable the sidecar in that case for now (until we can properly fix this).
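As a rough illustration (this shell sketch is mine, not the tracer's actual Rust implementation), the 1.4.2 fallback amounts to:

```shell
# Simulate a Lambda environment (hypothetical function name):
AWS_LAMBDA_FUNCTION_NAME="my-function"

# If the Lambda runtime env var is present, disable the sidecar
# trace sender, i.e. fall back to the old sender:
if [ -n "${AWS_LAMBDA_FUNCTION_NAME:-}" ]; then
  export DD_TRACE_SIDECAR_TRACE_SENDER=0
fi
echo "sidecar sender: ${DD_TRACE_SIDECAR_TRACE_SENDER:-default}"
```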

@bwoebi bwoebi closed this as completed Oct 21, 2024
@j-fulbright

This doesn't seem to be reliable, or has other issues, as we're still seeing Broken Pipe errors (Docker image running on Lambda):

```
[ddtrace] [error] Failed flushing telemetry buffer: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[ddtrace] [error] Failed flushing service data: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[ddtrace] [error] Failed signaling lifecycle end: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
```

@bwoebi
Collaborator

bwoebi commented Oct 25, 2024

We're working on fully fixing this, but these errors have been present for a long time (they were just silently discarded before).
However, tracing itself should work again by now. You may ignore these Broken pipe errors for now.

@matthew-mcmullin

With the broken pipe errors now being reported, we are being billed for log usage/indexing. We are still seeing this problem in our PHP images. What is the timeline for getting them fixed so that the errors aren't reported?

@bwoebi
Collaborator

bwoebi commented Nov 14, 2024

@matthew-mcmullin The fix (disabling the faulty telemetry sending on lambda environments) is in #2948. We intend to release early next week.
