debugpy listen silently crashing #1749

Open
koenlek opened this issue Nov 28, 2024 · 3 comments
Labels: needs repro (Issue has not been reproduced yet), user responded

Comments

@koenlek

koenlek commented Nov 28, 2024

Environment data

  • debugpy version: 1.8.8
  • OS and version: A k8s pod running an Ubuntu 20.04.6 based container
  • Python version (& distribution if applicable, e.g. Anaconda): 3.9
  • Using VS Code or Visual Studio: VS Code

Actual behavior

I'm using the Ray Distributed Debugger (their code here) with Ray on k8s. It runs debugpy.listen, but when I check the port it is supposed to listen on, nothing is bound to it (sudo lsof -i :$LISTEN_PORT). I set DEBUGPY_LOG_DIR to get more detailed logs, and I noticed that debugpy.pydevd.NNNN.log contains this near the end, indicating that it did indeed crash:

Traceback (most recent call last):
  File "/my_app/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 422, in _on_run
    cmd.send(self.sock)
  File "/my_app/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_net_command.py", line 109, in send
    sock.sendall(as_bytes)
BrokenPipeError: [Errno 32] Broken pipe

I searched the issue trackers of debugpy, pydevd, and Ray, and did some googling, but couldn't find much. The only lead I found is that this may point to a broken connection between the local services that debugpy runs on the application side (there seem to be a client, a server, a "debug server", and some incoming client involved). I found this snippet in debugpy.adapter.NNNN.log:

I+00000.071: Listening for incoming Client connections on 10.40.0.130:51507...

I+00000.071: Listening for incoming Server connections on 127.0.0.1:39415...

I+00000.071: Sending endpoints info to debug server at localhost:60997:
             {
                 "client": {
                     "host": "10.40.0.130",
                     "port": 51507
                 },
                 "server": {
                     "host": "127.0.0.1",
                     "port": 39415
                 }
             }

I+00000.076: Accepted incoming Server connection from 127.0.0.1:43864.

Lastly, I noticed the following in debugpy.{adapter,server}.NNNN.log, but it seems harmless, as it also appears in healthy local runs:

I+00000.049: Error while enumerating installed packages.
             
Traceback (most recent call last):
  File "/my_app/debugpy/adapter/../../debugpy/common/log.py", line 362, in get_environment_description
    report("    {0}=={1}\n", pkg.name, pkg.version)
AttributeError: 'PathDistribution' object has no attribute 'name'

Stack where logged:
  File "/my_app/python3_x86_64/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/my_app/python3_x86_64/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/my_app/debugpy/adapter/__main__.py", line 227, in <module>
    main(_parse_argv(sys.argv))
  File "/my_app/debugpy/adapter/__main__.py", line 50, in main
    log.describe_environment("debugpy.adapter startup environment:")
  File "/my_app/debugpy/adapter/../../debugpy/common/log.py", line 372, in describe_environment
    info("{0}", get_environment_description(header))
  File "/my_app/debugpy/adapter/../../debugpy/common/log.py", line 364, in get_environment_description
    swallow_exception(
  File "/my_app/debugpy/adapter/../../debugpy/common/log.py", line 215, in swallow_exception
    _exception(format_string, *args, **kwargs)

All of this happens before I even try to connect to the debugger.

I was also able to reproduce this without the Ray Distributed Debugger. I connect to the k8s pod and create a small Python script:

import debugpy
debugpy.listen(5678)
print("before wait_for_client")
debugpy.wait_for_client()
print("after wait_for_client")
print("before breakpoint")
debugpy.breakpoint()
print("after breakpoint")

When I run it and check the log files, I see the same crash (BrokenPipeError: [Errno 32] Broken pipe) in the pydevd logs.

When I run all of this locally, everything works fine. The issue only appears when running on Ray on k8s.

These are the full, lightly redacted, logs:

Questions:

  • Is there a way to detect a crashed listen from code? If so, how?
  • Any ideas on what makes this crash?
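On the first question, one possible stopgap is to probe the port from inside the pod right after calling debugpy.listen, which is the in-code equivalent of the lsof check above. This is only a sketch using the standard library, not a debugpy API: `is_port_accepting` is a hypothetical helper name, and how the adapter reacts to a probe connection that closes immediately has not been verified here.

```python
import socket


def is_port_accepting(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout`.

    Hypothetical helper: it only checks that *something* accepts connections
    on the port, not that the listener is a healthy debugpy adapter.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Intended use (not executed here, requires debugpy):
#   host, port = debugpy.listen(5678)   # listen() returns the bound endpoint
#   if not is_port_accepting(host, port):
#       print("debugpy endpoint is not accepting connections")
```

A probe like this only tells you the state at the moment of the call; if the listener dies later, it would have to be repeated.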

Expected behavior

Accepting incoming connections on the debugpy.listen endpoint.

Steps to reproduce:

I'm afraid it will be hard to reproduce this in an environment other than our "ray on k8s" setup. But details are in the "Actual behavior" section.

@rchiodo
Contributor

rchiodo commented Dec 13, 2024

Not sure what a Ray cluster is, but the broken pipe sometimes happens in our test suite. I believe it usually happens for one of two reasons:

  • Debugger processes are shutting down during terminate but not waiting for the debuggee to finish
  • Debugger is trying to make a connection to another process and it times out before that process starts.

This line in your adapter log makes me think it's the latter:

0.00s - PyDB.dispose_and_kill_all_pydevd_threads (called from: File "/my_app/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 324, in _terminate_on_socket_close)

The connection the debuggee has to the adapter process is being killed.

What's bazel_python? That looks to be the python being used?

@koenlek
Author

koenlek commented Feb 13, 2025

Hi. Thanks for your response!

> What's bazel_python? That looks to be the python being used?

Correct, that's the python interpreter. The same interpreter is used in local environments, in which this works fine (it only crashes when we run this in a k8s cluster).

> The connection the debuggee has to the adapter process is being killed.

I see. Hopefully this leads us to a solution. Is there any way we can dig deeper into that? I'm not sure what to try next. I also don't think I understand the debugger's hierarchy of clients and servers very well. It seems the debuggee sets up a debugpy server with debugpy.listen, but that in fact sets up multiple things (a server, a client, and an adapter?), all of which are distinct from the client created by starting an "attach" in VS Code? I.e. is there a debuggee-side client/server/(adapter?) and a separate VS Code-side client?

@rchiodo
Contributor

rchiodo commented Feb 13, 2025

This md file sort of explains the different parts involved (it's for subprocess debugging so there's an extra debuggee, but the other parts are the same):
https://github.com/microsoft/debugpy/blob/main/doc/Subprocess%20debugging.md

This code here is being hit:

pydev_log.debug("ReaderThread: empty contents received (len(line) == 0).")

That code would be reached if:

  • An OS error occurred reading from the socket that the debuggee is listening on (perhaps because the adapter process was killed or was not running soon enough?)
  • The adapter sent empty data (not likely).

Looking at your logs again, the adapter never gets a connection from the client, meaning VS Code never talks to it.
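To illustrate the first case in general terms, here is a small standard-library sketch (not debugpy code) of what a process sees when the peer on a stream socket goes away: a read returns empty bytes, and a later write fails. It uses `socket.socketpair()` as a stand-in for the debuggee/adapter connection; `demo_peer_close` is a hypothetical name, and the exact exception type on the write can vary by platform and socket family.

```python
import socket


def demo_peer_close():
    """Close one end of a connected socket pair and observe the other end."""
    ours, peer = socket.socketpair()
    peer.close()  # stand-in for the other process dying or disconnecting

    # A read on a stream socket whose peer has closed returns b"" (EOF).
    data = ours.recv(1024)

    # A write to the closed peer fails with an OSError subclass. On Linux
    # with a Unix-domain pair this is typically BrokenPipeError (Errno 32).
    try:
        ours.sendall(b"ping")
        send_error = None
    except OSError as exc:
        send_error = type(exc).__name__
    finally:
        ours.close()

    return data, send_error


if __name__ == "__main__":
    print(demo_peer_close())
```

This matches the pattern in the logs: an empty read in the reader thread and a BrokenPipeError in the writer are both consistent with the other end of the connection having gone away.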

On my local machine, there's a message like so:

I+00006.165: Listening for incoming Client connections on 0.0.0.0:5679...

That's the port that debugpy listens on and that VS Code connects to.

In your logs it's showing this:

I+00000.071: Listening for incoming Client connections on 10.40.0.130:51507...

Is that the port you're using to connect with?
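If it helps to check this programmatically, the client endpoint can be pulled out of the adapter log. The sketch below assumes the log line format shown above ("Listening for incoming Client connections on HOST:PORT..."); `find_client_endpoint` is a hypothetical helper, not part of debugpy, and it does not handle IPv6 addresses.

```python
import re

# Pattern based on the adapter log lines quoted in this thread.
_CLIENT_LINE = re.compile(
    r"Listening for incoming Client connections on (?P<host>[^\s:]+):(?P<port>\d+)"
)


def find_client_endpoint(log_text: str):
    """Return (host, port) from the first matching adapter log line, or None."""
    match = _CLIENT_LINE.search(log_text)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


if __name__ == "__main__":
    line = "I+00000.071: Listening for incoming Client connections on 10.40.0.130:51507..."
    print(find_client_endpoint(line))
```

The host and port it returns are what the VS Code attach configuration needs to be able to reach, for example through a port-forward into the pod.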
