Replace non-unicode characters for add-on static files #5712

agners · 2025-03-03T09:19:26Z

Proposed change

Add-on documentation and changelog files are read and returned as text. However, if the original author used non-Unicode characters, or the file is corrupted, loading currently fails with a UnicodeDecodeError.

Let's just use Python's built-in replace error handling, so that any bytes that cannot be decoded are shown to the user as the official Unicode replacement character "�".
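
As a minimal illustration of the behavior described above (not the code from this PR), the sketch below compares strict decoding with Python's `errors="replace"` handler. The sample byte string and the helper names `decode_strict` and `decode_lenient` are made up for the example.

```python
from typing import Optional

# Hypothetical sample: ASCII text containing 0xFF, a byte that is never valid in UTF-8.
raw = b"Changelog \xff entry"


def decode_strict(data: bytes) -> Optional[str]:
    """Return the decoded text, or None if strict UTF-8 decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_lenient(data: bytes) -> str:
    """Decode as UTF-8, replacing each undecodable byte with U+FFFD."""
    return data.decode("utf-8", errors="replace")


print(decode_strict(raw))   # strict decoding gives up on the whole input
print(decode_lenient(raw))  # lenient decoding keeps the readable parts
```

The trade-off is that the replacement is lossy: the original byte cannot be recovered from the decoded string, which is acceptable for display-only content such as documentation.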

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New feature (which adds functionality to the supervisor)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue:
Link to documentation pull request:
Link to cli pull request:
Link to client library pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
The code has been formatted using Ruff (ruff format supervisor tests)
Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

Documentation added/updated for developers.home-assistant.io
CLI updated (if necessary)
Client library updated (if necessary)

Summary by CodeRabbit

Refactor
- Improved the asset loading process by separating the handling of text and binary files.
- Enhanced the reliability of add-on resources such as imagery and documentation through more precise file processing.
Tests
- Added tests to verify API behavior with corrupted documentation and changelog data containing invalid UTF-8 characters.
Chores
- Updated test fixtures to include additional information about add-on locations for improved testing accuracy.

coderabbitai · 2025-03-03T09:21:49Z

Walkthrough

The changes involve the refactoring of file reading operations within the APIStore class. The original method _read_static_file has been split into two distinct methods: _read_static_text_file for reading text files with error handling and _read_static_binary_file for reading binary files without error handling. Corresponding API methods have been updated to call the appropriate new function based on the type of asset being retrieved. Additionally, new asynchronous tests have been added to verify the handling of corrupted UTF-8 characters in documentation and changelog files.
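
The split described in the walkthrough could look roughly like the sketch below. It is an assumption-laden illustration, not the Supervisor code: the helper names `read_text_lenient` and `read_binary`, the explicit `encoding="utf-8"`, and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def read_text_lenient(path: Path) -> str:
    """Read a text asset, replacing undecodable bytes with U+FFFD."""
    # Assumption: assets are expected to be UTF-8 encoded.
    with path.open("r", encoding="utf-8", errors="replace") as asset:
        return asset.read()


def read_binary(path: Path) -> bytes:
    """Read a binary asset verbatim; no decoding, so no error handler is needed."""
    with path.open("rb") as asset:
        return asset.read()


# Usage with throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "DOCS.md"
    docs.write_bytes(b"# Docs\n\xff\n")
    icon = Path(tmp) / "icon.png"
    icon.write_bytes(b"\x89PNG")

    text = read_text_lenient(docs)
    data = read_binary(icon)

print(repr(text))
print(data)
```

Keeping two small functions makes the mode explicit at each call site, so an image endpoint cannot accidentally go through text decoding.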

Changes

| File | Change Summary |
| --- | --- |
| supervisor/api/store.py | Replaced `_read_static_file` with two functions: `_read_static_text_file` (for text files) and `_read_static_binary_file` (for binary files). Updated methods (`addons_addon_icon`, `addons_addon_logo`, `addons_addon_changelog`, `addons_addon_documentation`) to use the new functions accordingly. |
| tests/api/test_store.py | Added two new asynchronous test methods: `test_api_store_addons_documentation_corrupted` and `test_api_store_addons_changelog_corrupted`, which test the API's response to corrupted UTF-8 characters in documentation and changelog files. |
| tests/conftest.py | Enhanced `store_addon` fixture by adding a line to assign `tmp_path` to the `location` key of the `addons` dictionary for the `addon_obj.slug`. |

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant A as APIStore
    participant T as Text Reader
    participant B as Binary Reader

    C->>A: Request addon changelog/documentation
    A->>T: Call _read_static_text_file(path)
    T-->>A: Return text content
    A-->>C: Respond with text asset

    C->>A: Request addon icon/logo
    A->>B: Call _read_static_binary_file(path)
    B-->>A: Return binary content
    A-->>C: Respond with binary asset

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 096e48c and 4495b85.

📒 Files selected for processing (3)

supervisor/api/store.py (5 hunks)
tests/api/test_store.py (1 hunks)
tests/conftest.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/conftest.py

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: Build armv7 supervisor
GitHub Check: Build armhf supervisor
GitHub Check: Build aarch64 supervisor
GitHub Check: Run tests Python 3.13.2

🔇 Additional comments (8)

supervisor/api/store.py (6)

72-78: Good implementation of error handling for text files.

The new _read_static_text_file function correctly implements the error handling with errors="replace", which will replace any non-unicode characters with the Unicode replacement character (�) instead of raising UnicodeDecodeError exceptions.

80-87: Good separation of binary file handling.

Creating a separate function for binary files makes the code more explicit and easier to understand. The function correctly opens files in binary mode ("rb"), which is appropriate since binary files don't need Unicode decoding.
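
To see why binary assets should bypass text decoding entirely, the short sketch below round-trips the standard eight-byte PNG signature through a lenient UTF-8 decode. It is only an illustration; nothing here is taken from the PR.

```python
# The eight-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Decoding with errors="replace" never raises, but 0x89 is not a valid
# UTF-8 start byte, so it is replaced with U+FFFD.
as_text = PNG_SIGNATURE.decode("utf-8", errors="replace")

# Encoding the text again yields the UTF-8 bytes of U+FFFD, not 0x89.
round_tripped = as_text.encode("utf-8")

print(round_tripped == PNG_SIGNATURE)
print(round_tripped[:3])
```

Because the round trip changes the bytes, an image read through the text path would be silently corrupted, which is why icons and logos are read in binary mode.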

259-259: Proper use of binary file reader for icons.

The icon API endpoint now correctly uses the binary file reader, which is appropriate for image files.

268-268: Proper use of binary file reader for logos.

The logo API endpoint now correctly uses the binary file reader, which is appropriate for image files.

282-284: Good implementation of error handling for changelog files.

The changelog API endpoint now correctly uses the text file reader with error handling, which addresses the issue with non-unicode characters in changelog files.

299-299: Good implementation of error handling for documentation files.

The documentation API endpoint now correctly uses the text file reader with error handling, which addresses the issue with non-unicode characters in documentation files.

tests/api/test_store.py (2)

354-371: Excellent test for corrupted documentation files.

This test properly verifies that the API can handle documentation files with invalid UTF-8 characters by:

Creating a file with an invalid UTF-8 character (\xff)

Verifying that the API returns the content with the replacement character

This is exactly what we need to test the error handling functionality implemented in the _read_static_text_file method.
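
The shape of such a test can be sketched without the Supervisor API. The example below is a simplified stand-in, assuming a plain `Path.read_text` call in place of the real endpoint; the function name `check_corrupted_text_is_replaced` and the file name are hypothetical.

```python
import tempfile
from pathlib import Path


def check_corrupted_text_is_replaced(tmp_path: Path) -> str:
    """Write a file with an invalid UTF-8 byte and read it back leniently."""
    corrupted = tmp_path / "CHANGELOG.md"
    corrupted.write_bytes(b"Text with an invalid UTF-8 char: \xff")

    # A strict read is expected to fail on the 0xFF byte.
    try:
        corrupted.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        strict_failed = True
    else:
        strict_failed = False
    assert strict_failed

    # A lenient read substitutes U+FFFD for the invalid byte.
    content = corrupted.read_text(encoding="utf-8", errors="replace")
    assert content == "Text with an invalid UTF-8 char: \ufffd"
    return content


# Stand-in for pytest's tmp_path fixture.
with tempfile.TemporaryDirectory() as tmp:
    result = check_corrupted_text_is_replaced(Path(tmp))

print(repr(result))
```

Under pytest, the same function body could take the built-in `tmp_path` fixture directly instead of the `tempfile` wrapper used here.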

373-389: Excellent test for corrupted changelog files.

This test properly verifies that the API can handle changelog files with invalid UTF-8 characters, similar to the documentation test. The test ensures that the API correctly replaces invalid characters with the Unicode replacement character (�).

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84b265a and 1064bcc.

📒 Files selected for processing (1)

supervisor/api/store.py (5 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (5)

GitHub Check: Build armv7 supervisor
GitHub Check: Build armhf supervisor
GitHub Check: Build aarch64 supervisor
GitHub Check: Run tests Python 3.13.2
GitHub Check: Check ruff

🔇 Additional comments (5)

supervisor/api/store.py (5)

72-78: Good implementation of text file handling with error replacement.

This function correctly implements reading text files with unicode error handling by using the errors="replace" parameter. This will replace non-unicode characters with the unicode replacement character (�), allowing files with encoding issues to be properly displayed instead of throwing UnicodeDecodeError exceptions.

81-87: Binary file reader function properly implemented.

The separation of binary file reading into its own function maintains clean code separation. Binary files don't need error replacement since they're not interpreted as text.

281-281: Text file reading with error replacement correctly implemented.

This change properly implements the new _read_static_text_file function for changelog files, which will handle non-unicode characters gracefully by replacing them with the unicode replacement character.

295-297: Text file reading with error replacement correctly implemented for documentation.

The implementation correctly uses the new _read_static_text_file function for documentation files, which will handle non-unicode characters properly.

72-297:
✅ Verification successful

Verify all usages of the replaced function in the codebase.

This PR replaces a generic _read_static_file function with two specialized functions. Let's verify that all usages of the previous function have been updated throughout the codebase.

🏁 Script executed:
#!/bin/bash
# Search for any remaining instances of the old _read_static_file function
echo "Checking for remaining usages of _read_static_file:"
rg "_read_static_file" --type py

# Check if there are any other places where files are read that might benefit from the same error handling approach
echo "Looking for other file opening patterns that might need similar error handling:"
rg "\.open\(" --type py | grep -v "errors=\"replace\""
Length of output: 2363
All usages of the old _read_static_file have been successfully removed

We verified that no instances of _read_static_file remain in the codebase.

The new specialized functions (_read_static_text_file with errors="replace" for text files and _read_static_binary_file for binary files) are correctly used in supervisor/api/store.py.

The other file opening patterns (e.g., in tests) do not interfere with this change and use the appropriate modes for their specific purposes.

supervisor/api/store.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

supervisor/api/store.py (2)
81-87: Consider adding a specific return type annotation.

The function signature uses Any as the return type, but it's always returning bytes for binary files. Consider improving type safety by using a more specific return type.
-def _read_static_binary_file(path: Path) -> Any:
+def _read_static_binary_file(path: Path) -> bytes:
72-72: Consider adding a specific return type annotation.

Similar to the binary file function, the text file function would benefit from a more specific return type instead of Any.
-def _read_static_text_file(path: Path) -> Any:
+def _read_static_text_file(path: Path) -> str:

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1064bcc and b6d8cf2.

📒 Files selected for processing (1)

supervisor/api/store.py (5 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: Build armv7 supervisor
GitHub Check: Build armhf supervisor
GitHub Check: Build aarch64 supervisor
GitHub Check: Run tests Python 3.13.2

🔇 Additional comments (5)

supervisor/api/store.py (5)

72-78: Good handling of Unicode decoding errors in text files.

The implementation of _read_static_text_file with errors="replace" parameter addresses the issue with non-unicode characters or file corruption. This will replace problematic characters with the Unicode replacement character "�" rather than causing the entire file read to fail.

259-259: LGTM! The function now correctly reads binary data for icons.

This change properly separates the binary file reading logic, which is appropriate for image files like icons.

268-268: LGTM! The function now correctly reads binary data for logos.

This change properly separates the binary file reading logic, which is appropriate for image files like logos.

282-284: Good enhancement to handle Unicode errors in changelog files.

Using _read_static_text_file with error replacement will prevent UnicodeDecodeError and improve user experience by displaying content even with problematic characters.

298-300: Good enhancement to handle Unicode errors in documentation files.

Using _read_static_text_file with error replacement will prevent UnicodeDecodeError and improve user experience by displaying content even with problematic characters.

supervisor/api/store.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/api/test_store.py (1)
378-378: Fix the docstring to match the test purpose.

The docstring incorrectly states that this is testing the documentation API endpoint, but it's actually testing the changelog endpoint.
-    """Test /store/addons/{addon}/documentation REST API.
+    """Test /store/addons/{addon}/changelog REST API.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6d8cf2 and 85756a6.

📒 Files selected for processing (2)

tests/api/test_store.py (1 hunks)
tests/conftest.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (6)

*/**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.

In step-by-step instructions, front the location phrase in the instructional sentence.

In step-by-step instructions, front the 'goal' in the instructional sentence.

In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.

do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

tests/conftest.py

*/**(html|markdown|md): - Use bold to mark UI strings.

If "" are used to mark UI strings, replace them by bold.

tests/conftest.py

*/**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"

tests/conftest.py

*/**(html|markdown|md): - Use sentence-style capitalization also in headings.

tests/conftest.py

*/**(html|markdown|md): do not comment on HTML used for icons

tests/conftest.py

*/**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

tests/conftest.py

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: Build armv7 supervisor
GitHub Check: Build armhf supervisor
GitHub Check: Build aarch64 supervisor
GitHub Check: Run tests Python 3.13.2

🔇 Additional comments (3)

tests/conftest.py (1)

519-519: Ensures test paths are available to addons in store.

This addition properly sets up the addon's location path in the store data, which is necessary for the new tests handling corrupted text files to access physical file paths.

tests/api/test_store.py (2)

354-370: Good test for handling corrupted documentation files.

This test properly verifies that the API can handle documentation files with invalid UTF-8 characters, replacing them with the Unicode replacement character (�) instead of failing with a UnicodeDecodeError. This aligns with the PR objective of improving user experience when encountering non-unicode characters.

373-389: Good test for handling corrupted changelog files.

This test correctly verifies that the API can handle changelog files with invalid UTF-8 characters, ensuring they're properly replaced with the Unicode replacement character (�) rather than failing with an exception.

agners added the refactor A code change that neither fixes a bug nor adds a feature label Mar 3, 2025

home-assistant bot added the cla-signed label Mar 3, 2025

coderabbitai bot reviewed Mar 3, 2025

supervisor/api/store.py (outdated, resolved)

supervisor/api/store.py (outdated, resolved)

coderabbitai bot reviewed Mar 3, 2025

supervisor/api/store.py (resolved)

agners requested a review from mdegat01 March 3, 2025 17:29

coderabbitai bot reviewed Mar 3, 2025

agners force-pushed the replace-non-unicode-characters-in-text-files branch from 85756a6 to 096e48c Compare March 3, 2025 17:32

agners added 4 commits March 3, 2025 19:22

Remove superflous parameter for binary files

07073b7

ruff format

96aaffc

Add pytests

4495b85

agners force-pushed the replace-non-unicode-characters-in-text-files branch from 096e48c to 4495b85 Compare March 3, 2025 18:22

mdegat01 approved these changes Mar 3, 2025

agners merged commit f8bab20 into main Mar 3, 2025
20 checks passed

agners deleted the replace-non-unicode-characters-in-text-files branch March 3, 2025 19:14

github-actions bot locked and limited conversation to collaborators Mar 5, 2025
