Replace non-unicode characters for add-on static files #5712

Merged: agners merged 4 commits into main from replace-non-unicode-characters-in-text-files on Mar 3, 2025

Conversation

agners
Member

@agners agners commented Mar 3, 2025

Proposed change

Add-on documentation and changelog files are read and returned as text. However, if the original author used non-Unicode characters, or the file is corrupted, loading currently fails with a UnicodeDecodeError.

Let's just use Python's built-in "replace" error handling, so that bytes which cannot be decoded are shown to the user as the official Unicode replacement character "�".
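As a minimal illustration of the behavior this change relies on, the sketch below decodes a byte string containing `0xFF`, which is never valid in UTF-8. The sample bytes are made up for this example and are not taken from the PR.

```python
# Hypothetical sample data: 0xFF can never appear in valid UTF-8.
raw = b"Changelog \xff entry"

# Strict decoding (the default) raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for the undecodable byte
# instead of raising.
replaced = raw.decode("utf-8", errors="replace")

print(strict_failed)
print(replaced)
```

The same `errors="replace"` argument is accepted by `open()` and `Path.open()` in text mode, which is how a file read can be made tolerant.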

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:
  • Link to cli pull request:
  • Link to client library pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Ruff (ruff format supervisor tests)
  • Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

Summary by CodeRabbit

  • Refactor
    • Improved the asset loading process by separating the handling of text and binary files.
    • Enhanced the reliability of add-on resources such as imagery and documentation through more precise file processing.
  • Tests
    • Added tests to verify API behavior with corrupted documentation and changelog data containing invalid UTF-8 characters.
  • Chores
    • Updated test fixtures to include additional information about add-on locations for improved testing accuracy.

@agners agners added the refactor (A code change that neither fixes a bug nor adds a feature) label Mar 3, 2025
Contributor

coderabbitai bot commented Mar 3, 2025

📝 Walkthrough

Walkthrough

The changes involve the refactoring of file reading operations within the APIStore class. The original method _read_static_file has been split into two distinct methods: _read_static_text_file for reading text files with error handling and _read_static_binary_file for reading binary files without error handling. Corresponding API methods have been updated to call the appropriate new function based on the type of asset being retrieved. Additionally, new asynchronous tests have been added to verify the handling of corrupted UTF-8 characters in documentation and changelog files.
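The split described in the walkthrough might look roughly like the sketch below. The function names come from this PR, but the bodies are a reconstruction and not the merged code. The explicit `encoding="utf-8"`, the `str`/`bytes` return annotations, and the `DOCS.md` file name are assumptions made for this example.

```python
import tempfile
from pathlib import Path


def _read_static_text_file(path: Path) -> str:
    """Read a text asset, replacing undecodable bytes with U+FFFD."""
    # encoding="utf-8" is an assumption of this sketch.
    with path.open("r", encoding="utf-8", errors="replace") as asset:
        return asset.read()


def _read_static_binary_file(path: Path) -> bytes:
    """Read a binary asset unchanged; no decoding is involved."""
    with path.open("rb") as asset:
        return asset.read()


# Usage: the same corrupted file read both ways.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "DOCS.md"  # hypothetical file name
    doc.write_bytes(b"Text with \xff invalid UTF-8")
    text = _read_static_text_file(doc)
    data = _read_static_binary_file(doc)
```

Keeping the binary reader separate means icons and logos are passed through byte for byte, while only text assets go through lenient decoding.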

Changes

| File | Change Summary |
|---|---|
| supervisor/api/store.py | Replaced _read_static_file with two functions: _read_static_text_file (for text files) and _read_static_binary_file (for binary files). Updated methods (addons_addon_icon, addons_addon_logo, addons_addon_changelog, addons_addon_documentation) to use the new functions accordingly. |
| tests/api/test_store.py | Added two new asynchronous test methods: test_api_store_addons_documentation_corrupted and test_api_store_addons_changelog_corrupted, which test the API's response to corrupted UTF-8 characters in documentation and changelog files. |
| tests/conftest.py | Enhanced store_addon fixture by adding a line to assign tmp_path to the location key of the addons dictionary for the addon_obj.slug. |

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant A as APIStore
    participant T as Text Reader
    participant B as Binary Reader

    C->>A: Request addon changelog/documentation
    A->>T: Call _read_static_text_file(path)
    T-->>A: Return text content
    A-->>C: Respond with text asset

    C->>A: Request addon icon/logo
    A->>B: Call _read_static_binary_file(path)
    B-->>A: Return binary content
    A-->>C: Respond with binary asset

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 096e48c and 4495b85.

📒 Files selected for processing (3)
  • supervisor/api/store.py (5 hunks)
  • tests/api/test_store.py (1 hunks)
  • tests/conftest.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/conftest.py
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Build armv7 supervisor
  • GitHub Check: Build armhf supervisor
  • GitHub Check: Build aarch64 supervisor
  • GitHub Check: Run tests Python 3.13.2
🔇 Additional comments (8)
supervisor/api/store.py (6)

72-78: Good implementation of error handling for text files.

The new _read_static_text_file function correctly implements the error handling with errors="replace", which will replace any non-unicode characters with the Unicode replacement character (�) instead of raising UnicodeDecodeError exceptions.


80-87: Good separation of binary file handling.

Creating a separate function for binary files makes the code more explicit and easier to understand. The function correctly opens files in binary mode ("rb"), which is appropriate since binary files don't need Unicode decoding.


259-259: Proper use of binary file reader for icons.

The icon API endpoint now correctly uses the binary file reader, which is appropriate for image files.


268-268: Proper use of binary file reader for logos.

The logo API endpoint now correctly uses the binary file reader, which is appropriate for image files.


282-284: Good implementation of error handling for changelog files.

The changelog API endpoint now correctly uses the text file reader with error handling, which addresses the issue with non-unicode characters in changelog files.


299-299: Good implementation of error handling for documentation files.

The documentation API endpoint now correctly uses the text file reader with error handling, which addresses the issue with non-unicode characters in documentation files.

tests/api/test_store.py (2)

354-371: Excellent test for corrupted documentation files.

This test properly verifies that the API can handle documentation files with invalid UTF-8 characters by:

  1. Creating a file with an invalid UTF-8 character (\xff)
  2. Verifying that the API returns the content with the replacement character

This is exactly what we need to test the error handling functionality implemented in the _read_static_text_file method.
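The two steps listed above can be sketched as a standalone check. This is not the test from the PR: the real tests go through the REST API with fixtures that are not shown here, so a plain `Path.read_text(..., errors="replace")` stands in for the API call. The file name and sample content are hypothetical.

```python
import tempfile
from pathlib import Path

REPLACEMENT = "\ufffd"  # Unicode replacement character


def check_corrupted_text_is_replaced(tmp_path: Path) -> str:
    """Write an invalid UTF-8 byte, read it back leniently, and verify."""
    # Step 1: create a file containing an invalid UTF-8 byte (\xff).
    doc = tmp_path / "DOCS.md"  # hypothetical file name
    doc.write_bytes(b"Text with \xff invalid UTF-8")

    # Stand-in for the API call: read with errors="replace".
    result = doc.read_text(encoding="utf-8", errors="replace")

    # Step 2: the invalid byte comes back as the replacement character.
    assert result == f"Text with {REPLACEMENT} invalid UTF-8"
    return result


with tempfile.TemporaryDirectory() as tmp:
    result = check_corrupted_text_is_replaced(Path(tmp))
```

Under pytest, the function could take the built-in `tmp_path` fixture directly instead of a manually created temporary directory.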


373-389: Excellent test for corrupted changelog files.

This test properly verifies that the API can handle changelog files with invalid UTF-8 characters, similar to the documentation test. The test ensures that the API correctly replaces invalid characters with the Unicode replacement character (�).

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84b265a and 1064bcc.

📒 Files selected for processing (1)
  • supervisor/api/store.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Build armv7 supervisor
  • GitHub Check: Build armhf supervisor
  • GitHub Check: Build aarch64 supervisor
  • GitHub Check: Run tests Python 3.13.2
  • GitHub Check: Check ruff
🔇 Additional comments (5)
supervisor/api/store.py (5)

72-78: Good implementation of text file handling with error replacement.

This function correctly implements reading text files with unicode error handling by using the errors="replace" parameter. This will replace non-unicode characters with the unicode replacement character (�), allowing files with encoding issues to be properly displayed instead of throwing UnicodeDecodeError exceptions.


81-87: Binary file reader function properly implemented.

The separation of binary file reading into its own function maintains clean code separation. Binary files don't need error replacement since they're not interpreted as text.


281-281: Text file reading with error replacement correctly implemented.

This change properly implements the new _read_static_text_file function for changelog files, which will handle non-unicode characters gracefully by replacing them with the unicode replacement character.


295-297: Text file reading with error replacement correctly implemented for documentation.

The implementation correctly uses the new _read_static_text_file function for documentation files, which will handle non-unicode characters properly.


72-297:

✅ Verification successful

Verify all usages of the replaced function in the codebase.

This PR replaces a generic _read_static_file function with two specialized functions. Let's verify that all usages of the previous function have been updated throughout the codebase.


🏁 Script executed:

#!/bin/bash
# Search for any remaining instances of the old _read_static_file function
echo "Checking for remaining usages of _read_static_file:"
rg "_read_static_file" --type py

# Check if there are any other places where files are read that might benefit from the same error handling approach
echo "Looking for other file opening patterns that might need similar error handling:"
rg "\.open\(" --type py | grep -v "errors=\"replace\""

Length of output: 2363


All usages of the old _read_static_file have been successfully removed

  • We verified that no instances of _read_static_file remain in the codebase.
  • The new specialized functions (_read_static_text_file with errors="replace" for text files and _read_static_binary_file for binary files) are correctly used in supervisor/api/store.py.
  • The other file opening patterns (e.g., in tests) do not interfere with this change and use the appropriate modes for their specific purposes.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
supervisor/api/store.py (2)

81-87: Consider adding a specific return type annotation.

The function signature uses Any as the return type, but it's always returning bytes for binary files. Consider improving type safety by using a more specific return type.

-def _read_static_binary_file(path: Path) -> Any:
+def _read_static_binary_file(path: Path) -> bytes:

72-72: Consider adding a specific return type annotation.

Similar to the binary file function, the text file function would benefit from a more specific return type instead of Any.

-def _read_static_text_file(path: Path) -> Any:
+def _read_static_text_file(path: Path) -> str:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1064bcc and b6d8cf2.

📒 Files selected for processing (1)
  • supervisor/api/store.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Build armv7 supervisor
  • GitHub Check: Build armhf supervisor
  • GitHub Check: Build aarch64 supervisor
  • GitHub Check: Run tests Python 3.13.2
🔇 Additional comments (5)
supervisor/api/store.py (5)

72-78: Good handling of Unicode decoding errors in text files.

The implementation of _read_static_text_file with errors="replace" parameter addresses the issue with non-unicode characters or file corruption. This will replace problematic characters with the Unicode replacement character "�" rather than causing the entire file read to fail.


259-259: LGTM! The function now correctly reads binary data for icons.

This change properly separates the binary file reading logic, which is appropriate for image files like icons.


268-268: LGTM! The function now correctly reads binary data for logos.

This change properly separates the binary file reading logic, which is appropriate for image files like logos.


282-284: Good enhancement to handle Unicode errors in changelog files.

Using _read_static_text_file with error replacement will prevent UnicodeDecodeError and improve user experience by displaying content even with problematic characters.


298-300: Good enhancement to handle Unicode errors in documentation files.

Using _read_static_text_file with error replacement will prevent UnicodeDecodeError and improve user experience by displaying content even with problematic characters.

@agners agners requested a review from mdegat01 March 3, 2025 17:29
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/api/test_store.py (1)

378-378: Fix the docstring to match the test purpose.

The docstring incorrectly states that this is testing the documentation API endpoint, but it's actually testing the changelog endpoint.

-    """Test /store/addons/{addon}/documentation REST API.
+    """Test /store/addons/{addon}/changelog REST API.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6d8cf2 and 85756a6.

📒 Files selected for processing (2)
  • tests/api/test_store.py (1 hunks)
  • tests/conftest.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Build armv7 supervisor
  • GitHub Check: Build armhf supervisor
  • GitHub Check: Build aarch64 supervisor
  • GitHub Check: Run tests Python 3.13.2
🔇 Additional comments (3)
tests/conftest.py (1)

519-519: Ensures test paths are available to addons in store.

This addition properly sets up the addon's location path in the store data, which is necessary for the new tests handling corrupted text files to access physical file paths.

tests/api/test_store.py (2)

354-370: Good test for handling corrupted documentation files.

This test properly verifies that the API can handle documentation files with invalid UTF-8 characters, replacing them with the Unicode replacement character (�) instead of failing with a UnicodeDecodeError. This aligns with the PR objective of improving user experience when encountering non-unicode characters.


373-389: Good test for handling corrupted changelog files.

This test correctly verifies that the API can handle changelog files with invalid UTF-8 characters, ensuring they're properly replaced with the Unicode replacement character (�) rather than failing with an exception.

@agners agners force-pushed the replace-non-unicode-characters-in-text-files branch from 85756a6 to 096e48c Compare March 3, 2025 17:32
agners added 4 commits March 3, 2025 19:22
Add-on documentation and changelog files are read and returned as text.
However, if the original author used non-Unicode characters, or the
file is corrupted, loading currently fails with a UnicodeDecodeError.

Let's just use Python's built-in "replace" error handling, so that
bytes which cannot be decoded are shown to the user as the official
Unicode replacement character "�".
@agners agners force-pushed the replace-non-unicode-characters-in-text-files branch from 096e48c to 4495b85 Compare March 3, 2025 18:22
@agners agners merged commit f8bab20 into main Mar 3, 2025
20 checks passed
@agners agners deleted the replace-non-unicode-characters-in-text-files branch March 3, 2025 19:14
@github-actions github-actions bot locked and limited conversation to collaborators Mar 5, 2025
Labels
  • cla-signed
  • refactor (A code change that neither fixes a bug nor adds a feature)
2 participants