Add cats-effect instrumentation #13576

Open
wants to merge 7 commits into
base: main


@iRevive iRevive commented Mar 23, 2025

Closes #10599.

Hey folks.

Cats Effect is a high-performance, asynchronous, composable framework for building real-world applications in a purely functional style within the Typelevel ecosystem.

How the instrumentation works

Cats Effect has its own context propagation mechanism, IOLocal. The 3.6.0 release provides a way to represent an IOLocal as a ThreadLocal, which makes it possible to manipulate the context from outside the runtime.

  • The agent instruments the constructor of IORuntime and stores a ThreadLocal representation of the IOLocal[Context] in the bootstrap class loader, so the agent and the application both access the same instance
  • The instrumentation installs a custom ContextStorage wrapper around the agent's context storage. The wrapper uses FiberLocalContextHelper to retrieve the fiber's current context (if available)
  • The agent instruments IOFiber's constructor and starts the fiber with the currently available context
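
The lookup order described in the steps above can be pictured with a small JDK-only model. This is a sketch under stated assumptions, not the code in this PR: `SharedFiberContext`, `FallbackStorage`, and the String-typed context are hypothetical stand-ins for the bootstrap-loaded singleton, the ContextStorage wrapper, and the OpenTelemetry Context.

```java
// JDK-only model; every name is hypothetical and Strings stand in for Context.
final class WrapperDemo {
  public static void main(String[] args) {
    FallbackStorage storage = new FallbackStorage();
    storage.attach("thread-ctx");
    System.out.println(storage.current()); // no runtime registered: thread-local value

    ThreadLocal<String> fiberView = new ThreadLocal<>();
    SharedFiberContext.initialize(fiberView); // models the IORuntime constructor advice
    fiberView.set("fiber-ctx");               // models a fiber carrying its own context
    System.out.println(storage.current());    // the fiber's value now takes precedence
  }
}

// Stand-in for the singleton that would live in the bootstrap class loader.
final class SharedFiberContext {
  // null means "no Cats Effect runtime has been created yet".
  private static volatile ThreadLocal<String> fiberLocal;

  static void initialize(ThreadLocal<String> threadLocalView) {
    fiberLocal = threadLocalView;
  }

  // Returns the fiber's context, or null when none is available on this thread.
  static String currentOrNull() {
    ThreadLocal<String> local = fiberLocal;
    return local == null ? null : local.get();
  }
}

// Stand-in for the wrapper around the default thread-local storage.
final class FallbackStorage {
  private final ThreadLocal<String> threadContext = ThreadLocal.withInitial(() -> "root");

  String current() {
    String fromFiber = SharedFiberContext.currentOrNull();
    return fromFiber != null ? fromFiber : threadContext.get();
  }

  void attach(String context) {
    threadContext.set(context);
  }
}
```

Note that `SharedFiberContext` holds a single static slot, which is the property the review comments further down are concerned about.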

@iRevive iRevive requested a review from a team as a code owner March 23, 2025 19:23
import cats.effect.IOLocal;
import io.opentelemetry.javaagent.instrumentation.opentelemetryapi.context.AgentContextStorage;

public class IoLocalContextSingleton {
Author

@iRevive iRevive Mar 23, 2025

It must be defined in a common package so we can later reuse it to instrument otel4s.

Author

Here is a prototype: iRevive@b2f6501

Author

iRevive commented Mar 23, 2025

Some tests have failed with the following error:

java.lang.IllegalStateException: Cannot write to this reference for cats.effect.IO arg0 in read-only context

I assume some VMs aren't happy with the body modification of IO?


@Advice.OnMethodEnter(suppress = Throwable.class)
public static void onEnter() {
FiberLocalContextHelper.initialize(
Contributor

What if you have deployed 2 WARs on Tomcat that use this library? Won't this break? Messing with the context storage is unusual, and my hunch is that it is not a good idea. Typically, such instrumentations restore the OTel context when a fiber starts running on a thread and save the context when it stops using the thread.
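
The restore-on-start and save-on-stop pattern mentioned above can be sketched with plain JDK classes. This is an illustrative model only: `ModelFiber`, `ThreadContext`, and the String-typed context are hypothetical, and two single-thread executors stand in for a fiber moving between carrier threads.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class SaveRestoreDemo {
  public static void main(String[] args) {
    System.out.println(run());
  }

  // Runs one fiber in two segments on two different threads and reports what was seen.
  static String run() {
    ExecutorService first = Executors.newSingleThreadExecutor();
    ExecutorService second = Executors.newSingleThreadExecutor();
    try {
      ModelFiber fiber = new ModelFiber("request-42");
      Callable<String> segmentOne = () -> fiber.runSegment(() -> {
        String seen = ThreadContext.CURRENT.get();
        ThreadContext.CURRENT.set(seen + "/child"); // the fiber updates its context
        return seen;
      });
      Callable<String> segmentTwo = () -> fiber.runSegment(() -> ThreadContext.CURRENT.get());
      Callable<String> plainTask = () -> ThreadContext.CURRENT.get();

      String seenOnFirst = first.submit(segmentOne).get();
      String seenOnSecond = second.submit(segmentTwo).get();
      String leftOnFirst = first.submit(plainTask).get();
      return seenOnFirst + " | " + seenOnSecond + " | " + leftOnFirst;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      first.shutdown();
      second.shutdown();
    }
  }
}

// Stand-in for the thread-local OTel context.
final class ThreadContext {
  static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "root");
}

// Stand-in for a fiber that carries its context while it is not running on a thread.
final class ModelFiber {
  private String context;

  ModelFiber(String initial) {
    this.context = initial;
  }

  <T> T runSegment(Supplier<T> body) {
    String previous = ThreadContext.CURRENT.get();
    ThreadContext.CURRENT.set(context);      // restore: the fiber starts running here
    try {
      return body.get();
    } finally {
      context = ThreadContext.CURRENT.get(); // save: the fiber stops using this thread
      ThreadContext.CURRENT.set(previous);   // leave the carrier thread as it was found
    }
  }
}
```

In this model `run()` returns `request-42 | request-42/child | root`: the update made on the first thread follows the fiber to the second thread, and the first thread is left with its original context.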

Author

That's a valid concern indeed.

> deployed 2 WARs on Tomcat that use this library

If I understand correctly, each deployment (app) will have its own classloader, but the bootstrap will still be shared.
If that's the case, my implementation won't work, I'm afraid.

Author

Suppose I don't find a proper way to make the instrumentation work. Can I distribute the current implementation as a third-party extension? Can the extension have access to the bootstrap loader?

Contributor

> Can the extension have access to the bootstrap loader?

Not directly, but you could try using Byte Buddy to define the class you need in the boot loader, or you could experiment with Instrumentation.appendToBootstrapClassLoaderSearch.
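
For reference, a minimal sketch of the second suggestion. `Instrumentation.appendToBootstrapClassLoaderSearch(JarFile)` is a standard JDK method; everything else is an assumption: the `BootstrapAppender` class is hypothetical, the agent argument is assumed to be the path of a jar holding only the classes that must be shared, and how an extension would obtain the Instrumentation instance is not covered in this thread.

```java
import java.io.IOException;
import java.lang.instrument.Instrumentation;
import java.util.jar.JarFile;

// Hypothetical agent class; only the JDK API call is taken as given.
final class BootstrapAppender {
  // Standard agent entry point signature. agentArgs is assumed (for illustration)
  // to carry the path of the jar with the shared classes.
  public static void premain(String agentArgs, Instrumentation instrumentation)
      throws IOException {
    if (agentArgs == null || agentArgs.isEmpty()) {
      return; // nothing to append
    }
    // Classes in this jar become resolvable through the bootstrap class loader,
    // so every class loader in the JVM sees the same definitions.
    instrumentation.appendToBootstrapClassLoaderSearch(new JarFile(agentArgs));
  }
}
```

This only makes the classes visible from the boot loader. It does not by itself address the shared-state concern raised above, since anything static in those classes is still shared by every application in the JVM.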

Author

iRevive commented Apr 15, 2025

Hey @laurit, I've tried a few different, less invasive approaches. Unfortunately, they don't work.

The Fiber's context (IOLocal) is slightly more complex than ThreadLocal because it is pinned to a fiber rather than a thread. The fiber can switch threads, be suspended/resumed, and more.

I've tried attaching the current context to a fiber via a VirtualField, but that means I must reimplement the IOLocal propagation logic at the agent level. For example, when the fiber switches threads (e.g., to execute a blocking task), I must activate the attached context on the new thread. This approach also won't work with otel4s (at least not without drastic changes to the otel4s propagation model).


I also tried installing a custom ContextWrapper (a variation of FiberContextBridge) for the application's context.

It works to some degree. However, the agent's tracer is unaware of the wrapper:

val span = tracer.spanBuilder("my-span").start()
val scope = span.makeCurrent()

IO {
  // returns 'my-span' because the context wrapper is respected, all good
  val current = Span.current()
  // creates a brand new span because it calls the agent's Context.current(),
  // which is a ThreadLocal<Context> in the agent scope
  val span = tracer.spanBuilder("span").start()
}.unsafeRunSync()

scope.close()
span.end()
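
The mismatch in the snippet above can be reduced to a JDK-only model with two independent views of "current". This is a deliberately simplified sketch: `ApplicationSide` and `AgentSide` are hypothetical names, Strings stand in for spans, and the real wiring between the application API and the agent is more involved than shown here.

```java
final class TwoViewsDemo {
  public static void main(String[] args) {
    ApplicationSide.makeCurrent("my-span");
    System.out.println(ApplicationSide.current());   // the wrapper's view
    System.out.println(AgentSide.startSpan("span")); // parent taken from the agent's view
  }
}

// Hypothetical application-side API whose storage has been wrapped.
final class ApplicationSide {
  private static final ThreadLocal<String> WRAPPED = new ThreadLocal<>();

  static void makeCurrent(String span) {
    WRAPPED.set(span);
  }

  static String current() {
    String span = WRAPPED.get();
    return span != null ? span : AgentSide.current();
  }
}

// Hypothetical agent-side API: a separate thread-local that never sees the wrapper.
final class AgentSide {
  private static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "root");

  static String current() {
    return CURRENT.get();
  }

  // Models spanBuilder(name).start(): the parent is whatever the agent considers current.
  static String startSpan(String name) {
    return name + " (parent: " + current() + ")";
  }
}
```

In this model the application side reports `my-span` while the new span is created under `root`, which mirrors the behaviour described in the comments of the snippet.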

Unfortunately, I lack knowledge of agent instrumentation, so there may be other approaches I am unaware of.
Could you suggest some alternatives?

Currently, I have only a few ideas:

  1. Would it be possible to keep the current instrumentation but disable it by default? Users would have to enable it manually, which should keep it out of some of the non-trivial cases. However, I understand that this is a subpar and risky implementation, and I'm fine with a no.
  2. From what I see, I can create a customized distribution of the OTel agent, something similar to https://github.com/elastic/elastic-otel-java. We can test it for a few iterations, and if it works fine, we can upstream it to the OTel agent (point 1, basically).

Contributor

laurit commented May 6, 2025

> Unfortunately, I lack knowledge of agent instrumentation, so there may be other approaches I am unaware of.
> Could you suggest some alternatives?

Actually, I think the main question is whether you need this instrumentation at all. context.makeCurrent() sets the thread-local context to the provided context and returns a Scope that can be closed to restore the previous context. Essentially, this allows accessing the current context with Context.current() without needing to pass the context around. Thread locals don't play nicely when code can be relocated to a different thread.

Instead of using makeCurrent(), it might make more sense to consider the alternatives that the library provides. For example, when using Kotlin coroutines you'd use withContext(context1.with(animalKey, "dog").asContextElement()) {...} to update the context and coroutineContext.getOpenTelemetryContext() to access the current context. See https://github.com/open-telemetry/opentelemetry-java/tree/main/extensions/kotlin for the code.

To interact with libraries that use Context.current(), you could still call makeCurrent() before calling the library code. The important bit is that the execution thread should not change between opening and closing the scope. For the ZIO instrumentation we didn't set this restriction, but in retrospect we probably should have. Allowing execution to transition while there are open scopes just creates problems: we can't reliably close the scope when execution is suspended and have to use Context.root().makeCurrent() to reset the thread context.

What your instrumentation might wish to do is propagate the context from the parent to newly launched fibers. We do this in the agent part of the Kotlin coroutines instrumentation. I don't know whether this would be easier or how helpful it would be.

If you look at the Kotlin coroutine instrumentation at https://github.com/open-telemetry/opentelemetry-java/blob/5bda810da87731e113ecab85287d327ec88f9969/extensions/kotlin/src/main/java/io/opentelemetry/extension/kotlin/KotlinContextElement.java#L42 then you'll see that it provides callbacks for when the coroutine is resumed and suspended, so we can activate the thread-local context. ZIO provides similar callbacks, see https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/zio/zio-2.0/javaagent/src/main/java/io/opentelemetry/javaagent/instrumentation/zio/v2_0/TracingSupervisor.java for the implementation. It could help if Cats Effect provided something similar, but it might not even be that useful if you replace makeCurrent() with something that is more Cats Effect friendly. If Cats Effect does not provide something similar to withContext, you could build a library that provides utilities and documentation for Cats Effect users and steers them away from makeCurrent() towards alternatives better suited to Cats Effect.
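
The makeCurrent() contract described above (set the thread-local context, return a Scope that restores the previous one) and the "pass the context explicitly, open a short scope only around library calls" advice can be modelled with the JDK alone. This is a sketch, not the OpenTelemetry implementation: `ScopeDemo`, its nested `Scope`, and the String-typed context are hypothetical stand-ins.

```java
final class ScopeDemo {
  public static void main(String[] args) {
    System.out.println(callLibrary("request-7"));
    System.out.println(CURRENT.get()); // the previous context is back after the scope closed
  }

  // Stand-in for the thread-local context storage.
  static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "root");

  // Stand-in for Scope: closing it restores whatever was current before.
  interface Scope extends AutoCloseable {
    @Override
    void close();
  }

  static Scope makeCurrent(String context) {
    String previous = CURRENT.get();
    CURRENT.set(context);
    return () -> CURRENT.set(previous);
  }

  // The context is passed explicitly; the scope is opened and closed on the same
  // thread, around nothing but the call that relies on the thread-local value.
  static String callLibrary(String context) {
    try (Scope ignored = makeCurrent(context)) {
      return legacyLibraryCall();
    }
  }

  // Stand-in for library code that reads the thread-local context.
  static String legacyLibraryCall() {
    return "library saw " + CURRENT.get();
  }
}
```

Because the scope never outlives `callLibrary`, there is no point at which execution can move to another thread while the scope is still open.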
@iRevive does this make sense?

Successfully merging this pull request may close these issues.

Support context propagation in Cats Effect library (Scala)