fix(core): add support for reading and writing ISO-8859-1/latin1 files #14126
Summary of Changes
Hello @ithallys, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request resolves an issue where the CLI would inadvertently corrupt characters in files using single-byte encodings like ISO-8859-1 during modification. By detecting a file's encoding on read and preserving it on write, the CLI can now correctly read and write such files, so accented characters that were previously mishandled are no longer lost.
Code Review
This pull request effectively addresses the issue of character corruption in files with single-byte encodings like ISO-8859-1 by adding encoding detection on read and preserving it on write. The changes in fileUtils.ts and fileSystemService.ts correctly use chardet to identify these encodings and handle them using latin1 to prevent data loss. The new test suite provides solid validation for these changes. My main feedback is to refactor the duplicated encoding detection logic into a shared utility function to improve code maintainability.
f039655 to 0be6833
Summary
This PR addresses an issue where the CLI would corrupt characters in files encoded in ISO-8859-1 (or similar single-byte encodings) when modifying them. Previously, files without a Byte Order Mark (BOM) were assumed to be UTF-8, which caused data loss for accented characters. This change introduces encoding detection and preservation for non-UTF-8 files.
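The corruption described above can be reproduced with Node.js built-ins alone. The following is a minimal sketch, not code from this PR; the sample bytes and variable names are illustrative assumptions.

```typescript
import { Buffer } from "node:buffer";

// "café" encoded as ISO-8859-1: the "é" is the single byte 0xE9.
const latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Decoding as UTF-8 replaces the invalid byte with U+FFFD (replacement character).
const misread = latin1Bytes.toString("utf8");

// Writing the misread text back as UTF-8 stores EF BF BD; the original 0xE9 is gone.
const corrupted = Buffer.from(misread, "utf8");

// Decoding as latin1 maps each byte to the code point with the same value,
// so encoding the string again reproduces the original bytes exactly.
const decoded = latin1Bytes.toString("latin1");
const roundTripped = Buffer.from(decoded, "latin1");

console.log(JSON.stringify(misread), corrupted.length, roundTripped.equals(latin1Bytes));
```

The UTF-8 path grows the file from 4 to 6 bytes and cannot be reversed, while the latin1 path is a lossless round trip.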
Details
file's encoding. If an ISO-8859-* or windows-125* encoding is identified, the file content is decoded using Node.js's latin1 encoding, which correctly handles these character sets.
before writing new content. If the existing file is detected as latin1 (or compatible), the updated content is written back using latin1, ensuring that the original encoding is maintained and character
corruption is prevented. New files or files with unsupported encodings will still default to utf-8.
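The read and write rules above could be captured in one shared helper, in the spirit of the review's suggestion to deduplicate the detection logic. This is a hedged sketch only: `toNodeEncoding` and `encodeForWrite` are hypothetical names, not functions from this PR, and the detector's result is assumed to arrive as a label such as "ISO-8859-1" or "windows-1252" (or `null` when nothing was detected or the file is new).

```typescript
import { Buffer } from "node:buffer";

type NodeTextEncoding = "latin1" | "utf8";

// Hypothetical helper: map a detected encoding label to the Node.js encoding
// used for both reading and writing. Anything unrecognized falls back to utf8.
function toNodeEncoding(detected: string | null): NodeTextEncoding {
  if (detected === null) return "utf8";
  const name = detected.toLowerCase();
  const isSingleByte =
    name.startsWith("iso-8859-") || name.startsWith("windows-125");
  return isSingleByte ? "latin1" : "utf8";
}

// Hypothetical helper: encode updated content using the encoding detected
// on the existing file, so the file keeps its original encoding.
function encodeForWrite(content: string, detectedExisting: string | null): Buffer {
  return Buffer.from(content, toNodeEncoding(detectedExisting));
}

// Existing ISO-8859-1 file: "é" stays one byte (0xE9).
console.log(encodeForWrite("café", "ISO-8859-1"));
// New file (nothing detected): "é" becomes the two UTF-8 bytes C3 A9.
console.log(encodeForWrite("café", null));
```

One caveat worth keeping in mind: Node.js's latin1 maps bytes one-to-one onto U+0000 to U+00FF, so unchanged bytes always round-trip, but for encodings other than ISO-8859-1 the decoded characters in memory may not match what the file's author saw (for example, windows-1252 bytes in the 0x80 to 0x9F range).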