Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using external / imported python extension module DSOs with py_test, py_binary #2593

Open
wade-arista opened this issue Jan 29, 2025 · 1 comment

Comments

@wade-arista
Copy link

wade-arista commented Jan 29, 2025

🚀 feature request

Relevant Rules

This describes a modifcation to the py_executable-based rules and intersects with

  • py_test
  • py_binary
  • py_library
  • cc_import / cc_library

Description

When a py_test depends on a python extension module that was not built within bazel, the DT_NEEDED entries cannot be resolved in a hermetic environment because the imported extension module (DSO) will be missing or have an incorrect runpath / rpath.

I believe this is substantially different from #2562 such that it warrants its own feature request.

I've provided a minimal test case here: https://github.com/wade-arista/bazel-demo/tree/min-py-ext-case#overview where it simulates the external dynamic libs by using chrpath to remove RPATH and then re-imports the built libs. I also brought it up on slack.

Describe the solution you'd like

Given some new py_xxx argument or --@rules_python//python/config_settings flag, we would inspect cc_info.linking_context.linker_inputs.to_list() from inside _create_run_environment_info() and inject / extend LD_LIBRARY_PATH in the RunEnvironmentInfo.environment with the path(s) to the dependent dynamic libraries (see the "Sample Patch" below for details).

Sample Patch

diff --git a/python/private/py_executable.bzl b/python/private/py_executable.bzl
index da7127e0..1247b445 100644
--- a/python/private/py_executable.bzl
+++ b/python/private/py_executable.bzl
@@ -1628,7 +1628,7 @@ def _create_providers(
             ),
         ),
         create_instrumented_files_info(ctx),
-        _create_run_environment_info(ctx, inherited_environment),
+        _create_run_environment_info(ctx, inherited_environment, cc_info),
         PyExecutableInfo(
             main = main_py,
             runfiles_without_exe = runfiles_details.runfiles_without_exe,
@@ -1701,8 +1701,9 @@ def _create_providers(
     providers.extend(extra_providers)
     return providers

-def _create_run_environment_info(ctx, inherited_environment):
+def _create_run_environment_info(ctx, inherited_environment, cc_info):
     expanded_env = {}
+    ld_library_path = {}
     for key, value in ctx.attr.env.items():
         expanded_env[key] = _py_builtins.expand_location_and_make_variables(
             ctx = ctx,
@@ -1710,6 +1711,21 @@ def _create_run_environment_info(ctx, inherited_environment):
             expression = value,
             targets = ctx.attr.data,
         )
+
+    for inp in cc_info.linking_context.linker_inputs.to_list():
+        for lib in inp.libraries:
+            dl = lib.dynamic_library
+            if not dl:
+                continue
+            lib_path = dl.short_path.removesuffix(dl.basename).rstrip("/")
+            ld_library_path[lib_path] = True
+
+    if ld_library_path:
+        parts = []
+        parts += [expanded_env["LD_LIBRARY_PATH"]] if "LD_LIBRARY_PATH" in expanded_env else []
+        parts += sorted(ld_library_path.keys())
+        expanded_env["LD_LIBRARY_PATH"] = ":".join(parts)
+
     return RunEnvironmentInfo(
         environment = expanded_env,
         inherited_environment = inherited_environment,

Describe alternatives you've considered

  1. I think this would also be doable using a wrapper-generating rule, but it's not super clear to me how well it would interact with the existing sh / python wrapper script for py executable rules.
  2. Instead of setting RunEnvironmentInfo.environment, this could be added to the existing wrapper (shell or python depending on bootstrap_impl).
@rickeylev
Copy link
Collaborator

instead of using RunEnvironmentInfo, set it in bootstrap

RunEnvironmentInfo can't be used because those values are only respected for bazel test or bazel run. So yes, setting LD_LIBRARY_PATH has to be done in the bootstrap.

I don't like setting LD_LIBRARY_PATH, but its the only tool we have available when we're given a prebuilt binary that we can't modify. Well, maybe not only tool. The other option is to modify the interpreter at build time to add some rpath entries. LD_LIBRARY_PATH is a simpler, though.

Don't sort the path entries, though -- order matters because the first library to be loaded and define a symbol wins. I think the libs will be in "top down" order (i.e. a target closer to the binary comes earlier in the ordering), which allows a binary to control ordering in case deps further away aren't in an order it needs.

The logic in your diff looks pretty good otherwise.

I'm not sure the logic has to loop over everything though. I think CcToolchainInfo.solib_dir has the directory that dynamically loaded libs get put into, though I don't remember offhand if that's publicly accessible. The py_internal.cc_helper object might provide a way to get at it if its not.

If you look at the paths for the artifacts in cc_info, you should see paths like e.g. _solib_<something>/libfoo.so -- thats the directory i'm talking about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants