Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid stack traces for known advanced logs errors #5713

Closed

Conversation

agners
Copy link
Member

@agners agners commented Mar 3, 2025

Proposed change

Known advanced errors, like connection reset, should not lead to stack traces in the Supervisor logs. Let those errors bubble up to the API error handling.

Fixes: #5606

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:
  • Link to cli pull request:
  • Link to client library pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Ruff (ruff format supervisor tests)
  • Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced error handling for asynchronous log retrieval operations. Specific errors, such as connectivity issues, are now managed and propagated more transparently. This update improves overall system reliability by ensuring that any errors encountered do not lead to unexpected behavior, offering a more consistent and clear experience when accessing detailed logs.

Known advanced errors, like connection reset, should not lead to stack
traces in the Supervisor logs. Let those errors bubble up to the
API error handling.

Fixes: #5606
@agners agners added the bugfix A bug fix label Mar 3, 2025
@agners agners requested a review from sairon March 3, 2025 09:23
Copy link
Contributor

coderabbitai bot commented Mar 3, 2025

📝 Walkthrough

Walkthrough

The changes update the get_supervisor_logs asynchronous function by importing the APIError exception from the exceptions module and adding explicit handling for it. When an APIError is raised during log retrieval, the function catches and re-raises it, ensuring that the error is propagated to the API layer while retaining the previous broad exception handling for other error types.

Changes

File Change Summary
supervisor/api/init.py Imported APIError from ..exceptions; added explicit exception handling in get_supervisor_logs to catch and re-raise APIError

Sequence Diagram(s)

sequenceDiagram
    participant Client as API Client
    participant Logs as get_supervisor_logs
    participant Handler as Exception Handler

    Client->>Logs: Request supervisor logs
    Logs->>Logs: Retrieve logs
    alt APIError occurs
        Logs->>Handler: Catch APIError
        Handler-->>Logs: Re-raise APIError
        Logs-->>Client: Propagate APIError
    else Other Exception occurs
        Logs->>Logs: Handle general exception (logged internally)
        Logs-->>Client: Return handled error response
    end
Loading
✨ Finishing Touches
  • 📝 Generate Docstrings

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
supervisor/api/__init__.py (1)

406-409: Well-implemented specific exception handling for APIError.

This change correctly implements specific handling for APIError exceptions before the generic exception handler. By re-raising these specific errors, they can properly propagate to the API layer where they can be handled appropriately instead of being caught by the broad exception handler and generating stack traces in logs.

The comment clearly explains the rationale - these are known errors like connection resets that should be handled at the API level rather than generating verbose stack traces.

A minor style suggestion (not critical): In Python, you can use just raise without arguments to re-raise the current exception:

-            except APIError as err:
-                # Advanced logs are generally available, but we did encounter an error
-                # (e.g. connection reset). Those errors should bubble up to the API.
-                raise err
+            except APIError:
+                # Advanced logs are generally available, but we did encounter an error
+                # (e.g. connection reset). Those errors should bubble up to the API.
+                raise
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84b265a and 3a00a6a.

📒 Files selected for processing (1)
  • supervisor/api/__init__.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Build i386 supervisor
  • GitHub Check: Build armv7 supervisor
  • GitHub Check: Build armhf supervisor
  • GitHub Check: Build aarch64 supervisor
  • GitHub Check: Run tests Python 3.13.2
🔇 Additional comments (2)
supervisor/api/__init__.py (2)

12-12: Good import addition for better exception handling.

Adding the APIError import is necessary for the specific exception handling implemented in the get_supervisor_logs function.


401-422:

✅ Verification successful

Verify this change works with all error handling scenarios.

The implementation looks good, but it's worth verifying that all expected error types are properly categorized. This change assumes that any APIError should be propagated while other exceptions should be caught and handled with the fallback mechanism.

Let's check if there are specific subclasses of APIError that might need different handling:


🏁 Script executed:

#!/bin/bash
# Search for all APIError subclasses to ensure we're handling the right error types

# Look for class definitions that inherit from APIError
ast-grep --pattern 'class $name($_APIError, $_): $$$'

# Alternate search using ripgrep
echo "Alternate search:"
rg "class.*\(.*APIError" --type py -A 1

Length of output: 984


Final Verification: Error Handling is Correct

The error handling implementation in supervisor/api/__init__.py has been verified against the defined APIError subclasses. The shell script confirmed that all expected subclasses (e.g., APIForbidden, APINotFound, APIAddonNotInstalled, APIDBMigrationInProgress, as well as HomeAssistantAuthError and HomeAssistantWSError) derive from APIError and will therefore be propagated by the first exception clause. Non-APIError exceptions are correctly caught by the broad exception handler, logged, and then the fallback to Docker logs is applied—with special handling in place to exclude HostNotSupportedError from being captured by Sentry.

  • APIError and subclasses: All errors are properly propagated.
  • Fallback mechanism: Non-API errors are caught, logged, and then fallback logic is executed.
  • Logging & Exception Capture: The conditional check for HostNotSupportedError is functioning as expected.

This implementation meets the intended error handling coverage.

Copy link
Member

@sairon sairon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the proper fix for the linked issue should be #5715 (but note that both PRs fix only the first error mentioned there, not #5606 (comment)).

I am unsure if we should silence the APIErrors though, do you have other examples where it makes sense?

@agners
Copy link
Member Author

agners commented Mar 3, 2025

I think the proper fix for the linked issue should be #5715 (but note that both PRs fix only the first error mentioned there, not #5606 (comment)).

Hm, yeah I think you are right, in that particular case the connection error happened on the response stream. I agree, I think this we can rather safely ignore.

I wonder though, should we maybe try .. catch them and print a debug log? You might want to be aware during development that these connections are being dropped/resetted. 🤔

I am unsure if we should silence the APIErrors though, do you have other examples where it makes sense?

To be clear, it won't be entirely silenced: This will trigger api_process default API error handling. e.g. in the case of a connection reset on systemd-journal-gatewayd side, Connection reset when trying to fetch data from systemd-journald. would be returned with an HTTP 400 error. That should then show up to the user as an error.

I guess this would happen e.g. if systemd-journal-gatewayd were to crash or restart.

@sairon
Copy link
Member

sairon commented Mar 3, 2025

I wonder though, should we maybe try .. catch them and print a debug log? You might want to be aware during development that these connections are being dropped/resetted. 🤔

Yeah, maybe, that wouldn't hurt. I'll update the other PR accordingly.

To be clear, it won't be entirely silenced: This will trigger api_process default API error handling. e.g. in the case of a connection reset on systemd-journal-gatewayd side, Connection reset when trying to fetch data from systemd-journald. would be returned with an HTTP 400 error. That should then show up to the user as an error.

Right, I was a bit unclear by saying "silenced". It's been a while but IIRC the goal was to catch all those unexpected errors and report them to Sentry, because they should not occur in normal operation. And once we learn about the possible failures, add special handling only to those that we really don't care about (or better said - it's sufficient for them to be just logged).

@agners
Copy link
Member Author

agners commented Mar 3, 2025

Right, I was a bit unclear by saying "silenced". It's been a while but IIRC the goal was to catch all those unexpected errors and report them to Sentry, because they should not occur in normal operation.

Agreed, and that is what Exception handling does here.

That said, this actually should no longer be necessary since #5681: With that change, by default all unhandled exception will be reported to Sentry (if optted-in, that is). So if Sentry is the only argument, we could also remove this. That said, this does annotate the error a bit (Failed to get supervisor logs using advanced_logs API). If we let it bubble up to aiohttp, it will just say Error handling request from 172.30.32.1...

And once we learn about the possible failures, add special handling only to those that we really don't care about (or better said - it's sufficient for them to be just logged).

Yes, and I am considering ConnectionResetError as one of this known issues, we handle and re-raise with APIError.

Although, i wonder if that exception even can get raised systemd-journal-gatewayd side (so in journal_logs_reader). The ClientResponse read() supposedly only raises aiohttp.ClientResponseError.

But then again, in actual testing, by sudo systemctl restart systemd-journal-gatewayd while following logs, it's actually ClientPayloadError errors which get raised. Which brings us back to...

(but note that both PRs fix only the first error mentioned there, not #5606 (comment)).

...and I think we should try to handle this more gracefully.

Now my fix is indeed wrong: APIError is not handled in this case, since we dont use the @api_process annotation. In fact, we probably can't use regular error handling here, as we are already streaming the response. It is sort of too late for regular http error handling.

In any case, this PR is the wrong fix. So closing.

We probably could try to reconnect "behind the scenes". But not sure if it's worth the effort. Maybe just logging as a warning that a client error while talking to systemd-joournal-gatewayd happened, and move on.

@agners agners closed this Mar 3, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Mar 5, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bugfix A bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SUPERVISOR - Failed to get logs using advanced API (NEW - NOT THE SAME)
2 participants