@ithallys

Summary

This PR addresses an issue where the CLI corrupted accented characters when modifying files encoded in ISO-8859-1 (or similar single-byte encodings). Previously, files without a Byte Order Mark (BOM) were decoded as UTF-8 by default, so their accented characters were lost. This change introduces encoding detection and preservation for non-UTF-8 files.
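The failure mode can be reproduced with Node.js Buffers alone. In this minimal sketch (the Portuguese word "ação" is an arbitrary example, not taken from the PR's tests), decoding ISO-8859-1 bytes as UTF-8 turns each accented character into U+FFFD, and writing that text back destroys the original bytes, while Node's latin1 encoding round-trips them unchanged.

```typescript
import { Buffer } from "node:buffer";

// "ação" as ISO-8859-1 bytes: 61 E7 E3 6F (ç = 0xE7, ã = 0xE3).
const original = Buffer.from("ação", "latin1");

// Decoding as UTF-8: 0xE7 and 0xE3 are UTF-8 lead bytes that are not followed
// by valid continuation bytes, so each one is replaced with U+FFFD.
const asUtf8 = original.toString("utf8");

// Writing that text back as UTF-8 stores each U+FFFD as EF BF BD;
// the original accented characters cannot be recovered.
const rewritten = Buffer.from(asUtf8, "utf8");

// latin1 maps every byte to exactly one code point and back again,
// so a read/write round trip reproduces the original bytes.
const asLatin1 = original.toString("latin1");
const roundTripped = Buffer.from(asLatin1, "latin1");

console.log(JSON.stringify(asUtf8));        // "a\ufffd\ufffdo"
console.log(rewritten.equals(original));    // false
console.log(asLatin1);                      // ação
console.log(roundTripped.equals(original)); // true
```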

Details

  • Intelligent File Reading: The readFileWithEncoding function in packages/core/src/utils/fileUtils.ts has been enhanced. After checking for a BOM, it now uses chardet (via systemEncoding) to detect the
    file's encoding. If an ISO-8859-* or windows-125* encoding is identified, the content is decoded with Node.js's latin1 encoding. Because latin1 maps each byte to exactly one character, this decodes
    ISO-8859-1 correctly and preserves the original bytes of the other detected code pages through a read/write round trip.
  • Encoding Preservation on Write: The StandardFileSystemService.writeTextFile method in packages/core/src/services/fileSystemService.ts now attempts to determine the original encoding of an existing file
    before writing new content. If the existing file is detected as latin1 (or compatible), the updated content is written back using latin1, ensuring that the original encoding is maintained and character
    corruption is prevented. New files or files with unsupported encodings will still default to utf-8.
  • New Test Case: A new dedicated test file, packages/core/src/utils/iso-encoding.test.ts, has been added to validate the correct reading and writing of ISO-8859-1 encoded files.
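The read and write behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the helper names (`isSingleByteLatinFamily`, `pickNodeEncoding`, `readText`, `writeText`) and the injected `Detector` function are hypothetical, only the UTF-8 BOM is checked, and chardet itself is not imported so the sketch runs with Node's standard library. Detection lives in one helper so the read and write paths make the same decision.

```typescript
import { Buffer } from "node:buffer";
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Assumed detector shape: raw bytes in, encoding name (or null) out.
// chardet's detect(buffer) has this shape and could be passed in directly.
type Detector = (bytes: Buffer) => string | null;

// True for the encoding families named in the PR: ISO-8859-* and windows-125*.
function isSingleByteLatinFamily(name: string | null): boolean {
  return name !== null && /^(iso-8859-\d+|windows-125\d)$/i.test(name);
}

// Hypothetical shared helper used by both the read and the write path.
function pickNodeEncoding(bytes: Buffer, detect: Detector): BufferEncoding {
  // A UTF-8 BOM (EF BB BF) takes precedence over statistical detection.
  const hasUtf8Bom =
    bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;
  if (hasUtf8Bom) return "utf8";
  // latin1 maps byte 0xNN to U+00NN, so unchanged bytes round-trip exactly,
  // although only ISO-8859-1 text is guaranteed to display the right characters.
  return isSingleByteLatinFamily(detect(bytes)) ? "latin1" : "utf8";
}

function readText(path: string, detect: Detector): string {
  const bytes = readFileSync(path);
  // Buffer#toString does not strip a BOM; a UTF-8 BOM stays as U+FEFF.
  return bytes.toString(pickNodeEncoding(bytes, detect));
}

function writeText(path: string, content: string, detect: Detector): void {
  // New files default to UTF-8; existing files keep the encoding detected on disk.
  // Code points above U+00FF cannot be represented in latin1 and would be written incorrectly.
  const encoding: BufferEncoding = existsSync(path)
    ? pickNodeEncoding(readFileSync(path), detect)
    : "utf8";
  writeFileSync(path, content, { encoding });
}
```

Injecting the detector keeps the encoding decision testable with a stub; whether the real service wires chardet in this way is an assumption.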

@ithallys requested a review from a team as a code owner on November 29, 2025 03:16
@gemini-code-assist
Contributor

Summary of Changes

Hello @ithallys, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where the CLI would inadvertently corrupt characters in files using single-byte encodings like ISO-8859-1 during modification. By implementing intelligent encoding detection and preservation mechanisms, the system can now correctly read and write such files, maintaining character integrity and preventing data loss for accented characters that were previously mishandled.

Highlights

  • Enhanced File Encoding Detection: The readFileWithEncoding function now intelligently detects file encodings, specifically ISO-8859-* and windows-125*, using chardet when no Byte Order Mark (BOM) is present, and decodes them using Node.js's latin1 encoding.
  • Preservation of Original Encoding on Write: The writeTextFile method has been updated to preserve the original encoding of existing files. If an existing file is detected as latin1 (or compatible), new content is written back using latin1 to prevent character corruption. New files or those with unsupported encodings will still default to UTF-8.
  • Comprehensive ISO-8859-1 Test Coverage: A new dedicated test file, iso-encoding.test.ts, has been introduced to thoroughly validate the correct reading and writing of ISO-8859-1 encoded files, ensuring the robustness of the new encoding handling logic.
@gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses the issue of character corruption in files with single-byte encodings like ISO-8859-1 by adding encoding detection on read and preserving it on write. The changes in fileUtils.ts and fileSystemService.ts correctly use chardet to identify these encodings and handle them using latin1 to prevent data loss. The new test suite provides solid validation for these changes. My main feedback is to refactor the duplicated encoding detection logic into a shared utility function to improve code maintainability.

@ithallys force-pushed the fix/iso-encoding-support branch from f039655 to 0be6833 on November 29, 2025 03:21