
[Bug]: Running same binary in container build with py_image_layer causes failed imports #526


Status: Open. ianb-pomelo opened this issue on Feb 6, 2025 · 1 comment
Labels: bug (Something isn't working)

@ianb-pomelo

What happened?

We recently upgraded from version 0.7.1 to 1.2.1 and switched how we build our Docker images, moving from a modified version of the old template to `py_image_layer`. Overall it has been great, except for one thing. We deploy our containers to K8S with health/status checks, and those health checks run the same binary as the main container process, just in a different mode. Concretely, we run Dagster: the container runs `dagster api grpc`, and the health check runs `dagster api grpc-health-check`. Because the same binary target is launched twice in the same container, the venv backing the Python entry point was re-created on every health-check run. While that happened, the main process temporarily lost the packages in its venv, became unhealthy, and failed.

It seems like #522 would fix this, since the venv would then be stable. In the meantime, is there a temporary workaround?

Version

Development (host) and target OS/architectures: aarch Darwin -> aarch Darwin, Linux x86_64 -> Linux x86_64

Output of bazel --version: 8.0.0

Version of the Aspect rules, or other relevant rules from your WORKSPACE or MODULE.bazel file: 1.2.1

Language(s) and/or frameworks involved: Python 3.11, Docker/rules_oci 1.7.4

How to reproduce

This is hard to reproduce reliably because it is a race condition.

One way is to build a Python binary that repeatedly imports a package and package it into an OCI image with `py_image_layer`. Run the image, then `exec` the same binary again from another terminal. One of the two processes should eventually fail with an import error, although it may take several iterations.
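A minimal sketch of such an entry point follows, assuming it is wrapped in a `py_binary` and packaged with `py_image_layer` as described above. The function names `try_import` and `main` are hypothetical, and the default module `datasets` is simply the package that went missing in the logs below. Each attempt evicts the cached top-level module and clears the import finder caches, so the import re-reads files from the venv's site-packages instead of returning the copy already in memory; submodules that are already loaded stay cached, so this only approximates a cold start.

```python
import importlib
import sys
import time
from typing import Optional


def try_import(module_name: str) -> bool:
    """Attempt a fresh import of module_name; return True on success."""
    # Evict the cached top-level module so its __init__ is re-read from disk.
    sys.modules.pop(module_name, None)
    # Drop finder caches so directory listings of sys.path entries are re-scanned.
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        # Covers ModuleNotFoundError, including a missing transitive dependency.
        return False


def main(module_name: str = "datasets", interval: float = 0.1,
         max_iterations: Optional[int] = None) -> int:
    """Import module_name in a loop; exit non-zero on the first failure."""
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        if not try_import(module_name):
            print(f"iteration {iteration}: failed to import {module_name!r}",
                  file=sys.stderr)
            return 1
        iteration += 1
        time.sleep(interval)
    return 0


if __name__ == "__main__":
    # Optional first CLI argument overrides the module to import.
    sys.exit(main(*sys.argv[1:2]))
```

Running one copy as the container command and a second copy via `exec` gives two processes launching the same binary, which mirrors the gRPC server plus health-check setup above.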

Any other information?

We were getting errors that looked like

ERROR 2025-02-05T17:31:44.082500645Z [resource.labels.containerName: dagster-user-deployments] File "/data/pomelo/dagster.runfiles/.dagster.venv/lib/python3.11/site-packages/dsp/modules/__init__.py", line 22, in <module>
ERROR 2025-02-05T17:31:44.082966540Z [resource.labels.containerName: dagster-user-deployments] from .pyserini import *
ERROR 2025-02-05T17:31:44.082988358Z [resource.labels.containerName: dagster-user-deployments] File "/data/pomelo/dagster.runfiles/.dagster.venv/lib/python3.11/site-packages/dsp/modules/pyserini.py", line 4, in <module>
ERROR 2025-02-05T17:31:44.083429066Z [resource.labels.containerName: dagster-user-deployments] from datasets import Dataset
ERROR 2025-02-05T17:31:44.083460Z [resource.labels.containerName: dagster-user-deployments] ModuleNotFoundError: No module named 'datasets'

even though `datasets` is included in the binary. After we turned off our health checks, the error went away. I also SSH'd into the pod and inspected the packages in the generated venv: it repeatedly contained only a subset of the expected packages, and shortly afterwards contained all of them again.
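The inspection described above can be sketched as a small polling script. This is an illustrative assumption, not a tool from the rules: the helper names `count_entries`, `watch`, and `saw_partial_venv` are hypothetical, and counting directory entries in site-packages is only a rough proxy for "which packages are installed". A missing directory is reported as `-1`, which also counts as a drop.

```python
import os
import time
from typing import List, Optional


def count_entries(site_packages: str) -> int:
    """Return the number of entries in site_packages, or -1 if it is missing."""
    try:
        return len(os.listdir(site_packages))
    except FileNotFoundError:
        return -1


def watch(site_packages: str, interval: float = 0.05, samples: int = 100) -> List[int]:
    """Sample the entry count repeatedly, printing whenever it changes."""
    counts: List[int] = []
    previous: Optional[int] = None
    for _ in range(samples):
        current = count_entries(site_packages)
        if current != previous:
            print(f"{time.strftime('%H:%M:%S')} site-packages entries: {current}")
            previous = current
        counts.append(current)
        time.sleep(interval)
    return counts


def saw_partial_venv(counts: List[int]) -> bool:
    """True if the count ever fell below the maximum observed (a partial venv)."""
    return bool(counts) and min(counts) < max(counts)
```

Inside the pod, `watch("/data/pomelo/dagster.runfiles/.dagster.venv/lib/python3.11/site-packages")` (the path from the traceback above) should show the count dipping and recovering each time the health check runs, if the venv is indeed being re-created.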

ianb-pomelo added the bug (Something isn't working) label on Feb 6, 2025
@arrdem (Contributor) commented May 2, 2025

Will be obviated by #551.

arrdem self-assigned this on May 2, 2025