awslambdaric prevents graceful lambda shutdown #105
Comments
I just ran into the same problem and found this GitHub issue. Thanks for all the details you provided, @aclemons; they helped me locate the problem. The issue is that runtime_client.next blocks the main thread. In more detail, this line calls into the C extension,
which in turn calls this line, and that call blocks the main thread.
According to the Python signal documentation, even if SIGTERM is received, the registered signal handler will not be called until the current instruction finishes. In our case, the Python interpreter is waiting for the blocking call into the C extension to return.
Fortunately, the solution is simple: we can move the blocking call into a thread to unblock the main thread. Since the C extension releases the GIL (#33) before polling the Lambda API, the main thread is then able to call the signal handler:
    from concurrent.futures import ThreadPoolExecutor

    # Run the blocking poll in a worker thread so the main thread can handle signals.
    with ThreadPoolExecutor() as e:
        fut = e.submit(runtime_client.next)
        response_body, headers = fut.result()
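The effect of this change can be reproduced outside Lambda. The sketch below is an illustration, not awslambdaric code: blocking_next is a hypothetical stand-in for runtime_client.next that uses time.sleep (which also releases the GIL), and a timer delivers SIGTERM to the main thread while the call is still in flight. It assumes a POSIX system, since it relies on signal.pthread_kill.

```python
import signal
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_next(duration=1.0):
    # Hypothetical stand-in for runtime_client.next: blocks while releasing the GIL.
    time.sleep(duration)
    return b"{}", {}


def run_demo():
    """Return the times (in seconds) at which the handler ran and the call returned."""
    start = time.monotonic()
    timings = {}

    def on_sigterm(signum, frame):
        # Runs in the main thread, which is only waiting on the future.
        timings["handler"] = time.monotonic() - start

    previous = signal.signal(signal.SIGTERM, on_sigterm)
    main_ident = threading.main_thread().ident
    # Simulate the platform sending SIGTERM 0.2 s into the blocking call.
    timer = threading.Timer(
        0.2, signal.pthread_kill, args=(main_ident, signal.SIGTERM)
    )
    timer.start()
    try:
        with ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(blocking_next)
            future.result()
        timings["result"] = time.monotonic() - start
    finally:
        timer.cancel()
        signal.signal(signal.SIGTERM, previous)
    return timings
```

Calling run_demo() is expected to report a handler time close to 0.2 s, well before the one-second blocking call returns. If blocking_next were called directly on the main thread from a C extension that does not check for signals, the handler would only run after the call returned.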
@tgsong do you mind creating a PR with that change?
runtime_client.next calls into the C extension, which blocks the main thread. Moving it to a separate thread lets the main thread process signals; see this issue for more details: aws#105
I just ran into this issue and I think it is not solved yet. I see @tgsong implemented the fix in a general way in #115, but it was reworked by @pushkarchawda in #124. The fix is conditioned on
Overview
I've followed the guide described in graceful-shutdown-with-aws-lambda. Unfortunately, my signal handler is never invoked.
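For readers unfamiliar with the guide, the pattern it describes amounts to registering a SIGTERM handler when the handler module is imported. The sketch below shows that pattern only; it is not the handler.py used in this reproduction, and the function names and log message are made up for illustration.

```python
import signal
import sys


def exit_gracefully(signum, frame):
    # Hypothetical cleanup hook: flush buffers, close connections, and so on.
    print("[runtime] SIGTERM received, cleaning up", flush=True)
    sys.exit(0)


# Registered at import time, i.e. during the Lambda init phase.
signal.signal(signal.SIGTERM, exit_gracefully)


def lambda_handler(event, context):
    # Hypothetical handler body; the real function logic is irrelevant here.
    return "SUCCESS"
```

With a working runtime, the log line from exit_gracefully should appear in CloudWatch when the execution environment is shut down. The problem reported here is that it never does.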
I spent time investigating, and it looks like the native code in awslambdaric is not bubbling up the signal, which prevents the registered signal handler from ever running. I don't have experience writing native-code Python extensions, so I don't know what needs to change in runtime_client.cpp so that this works.
The graceful shutdown guide actually has a note at the end of the readme which says:
I think this is exactly because of the issue I am describing here (and mentioned here aws-samples/graceful-shutdown-with-aws-lambda#2).
Steps to reproduce.
I took a simple lambda function handler.py:
and created a lambda in eu-central-1 with this as a zip, adding the layer arn:aws:lambda:eu-central-1:580247275435:layer:LambdaInsightsExtension:38 and invoked it once. I waited until the instance was spun down (10-15 minutes), but there was no log from the shutdown hook.
I also deployed it as a docker image with this Dockerfile:
I pushed this into ECR and then created a container-based lambda with this image, again in eu-central-1, and invoked it once. Again, I waited until the instance was spun down (10-15 minutes), but there was no log from the shutdown hook.
To prove that SIGTERM was being sent to the runtime, I altered the previous image to just be a simple shell script and updated the container-based lambda from the previous step:
startup.sh
with this Dockerfile:
I invoked the lambda and got the "SUCCESS" response. I waited for it to spin down and checked the cloudwatch logs again:
Success, so I knew that SIGTERM was being sent. After this, I suspected that it had to be somewhere in the runtime handling for python. While reading the source code for awslambdaric I realised part of it was written in cpp and I wondered if this might be the issue.
To test whether this was the problem, I replaced the runtime_client written in cpp with one written in python.
runtime_client_py.py
and I put this in an image with this Dockerfile:
I updated the image used by the lambda again and invoked it and waited for it to spin back down. I checked the cloudwatch logs again:
Success! So indeed it looks like the native code needs some adjustment to handle the signal.
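The runtime_client_py.py used in this experiment is not reproduced above. As a rough idea of what a pure-Python replacement involves, the sketch below polls the Lambda Runtime API over HTTP with the standard library. The class and method names are hypothetical; the endpoint paths and the AWS_LAMBDA_RUNTIME_API environment variable are those of the Lambda Runtime API.

```python
import http.client
import os


class PurePythonRuntimeClient:
    """Hypothetical pure-Python stand-in for the C++ runtime client."""

    API_VERSION = "2018-06-01"

    def __init__(self, runtime_api=None):
        # Lambda sets AWS_LAMBDA_RUNTIME_API to "host:port".
        self.runtime_api = runtime_api or os.environ["AWS_LAMBDA_RUNTIME_API"]

    def next(self):
        """Long-poll for the next invocation and return (body, headers)."""
        conn = http.client.HTTPConnection(self.runtime_api)
        try:
            conn.request("GET", f"/{self.API_VERSION}/runtime/invocation/next")
            response = conn.getresponse()
            body = response.read()
            headers = dict(response.getheaders())
        finally:
            conn.close()
        return body, headers

    def post_result(self, request_id, payload):
        """Send the handler result for request_id; return the HTTP status."""
        conn = http.client.HTTPConnection(self.runtime_api)
        try:
            path = f"/{self.API_VERSION}/runtime/invocation/{request_id}/response"
            conn.request("POST", path, body=payload)
            response = conn.getresponse()
            response.read()
            return response.status
        finally:
            conn.close()
```

Because the blocking read happens in Python's own socket code, the interpreter can run pending signal handlers when the read is interrupted by a signal (PEP 475), which is consistent with the shutdown hook firing in this setup.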
Unfortunately, I can't offer a pull request to fix this since I don't know enough about how this works. Some searching suggests that perhaps the code needs to be using PyErr_CheckSignals.
awslambdaric is also using aws-lambda-runtime internally, so there might be some interaction there too that needs even more handling.
Anyway, I hope these reproduction steps can help someone with the skills find the correct solution.
Thanks.