
Fix: Replace manual unzip shell command with Python zipfile #2229

Open
jaytiwarihub wants to merge 4 commits into keras-team:master from jaytiwarihub:fix-audio-dataset-unzip


Conversation

@jaytiwarihub

Description
The current tutorial relies on a shell command (unzip) to extract the dataset. This fails on Windows environments where unzip is not installed by default, and it interrupts the flow for users who haven't manually configured the Kaggle CLI.

Changes

  • Replaced the manual shell instructions with Python's built-in zipfile module.
  • Added a check to verify whether the dataset already exists before attempting extraction.
  • Updated the code to be cross-platform (works on Windows, Linux, and macOS).
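For readers who want to see the shape of the change, here is a minimal sketch of the approach listed above. It is not the PR's actual code: `prepare_dataset` is a hypothetical helper, and it assumes the archive contains a top-level folder with the same name as the dataset root (e.g. 16000_pcm_speeches/), so extracting into the dataset root's parent directory recreates it.

```python
import os
import zipfile


def prepare_dataset(dataset_root: str, zip_file: str) -> bool:
    """Extract zip_file if dataset_root is missing (hypothetical helper).

    Returns True when dataset_root exists afterwards, False otherwise.
    """
    # Nothing to do if the dataset has already been extracted.
    if os.path.exists(dataset_root):
        return True

    # Use the standard library instead of the `unzip` shell command,
    # so the same code runs on Windows, Linux, and macOS.
    if os.path.exists(zip_file):
        # Assumption: the archive's top-level folder is named like dataset_root,
        # so extracting into its parent directory recreates dataset_root.
        parent = os.path.dirname(os.path.abspath(dataset_root))
        with zipfile.ZipFile(zip_file) as archive:
            archive.extractall(parent)
        return os.path.exists(dataset_root)

    # Neither the dataset nor the archive is present: tell the user what to do.
    print(f"Dataset not found. Save it as '{zip_file}' and run again.")
    return False
```

Returning a boolean instead of exiting keeps the helper easy to test; a tutorial script could stop when it returns False.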

Verification

  • I tested the script locally.
  • Verified that the zip file is correctly extracted to the 16000_pcm_speeches directory automatically.
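A repeatable version of the second check could compare the archive's member list against what actually landed on disk. This is an illustrative sketch only: `verify_extraction` is a hypothetical helper, and the PR does not say how the extraction was verified.

```python
import os
import zipfile


def verify_extraction(zip_file: str, target_dir: str) -> list[str]:
    """Return archive members missing under target_dir (empty list means OK)."""
    missing = []
    with zipfile.ZipFile(zip_file) as archive:
        # testzip() reads every member and returns the first corrupt name, or None.
        bad = archive.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"Corrupt member in archive: {bad}")
        for name in archive.namelist():
            # Directory entries end with "/"; only regular files are checked.
            if name.endswith("/"):
                continue
            # Zip member names always use "/", so split before joining for the host OS.
            if not os.path.isfile(os.path.join(target_dir, *name.split("/"))):
                missing.append(name)
    return missing
```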

@google-cla

google-cla bot commented Jan 3, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist
Contributor

Summary of Changes

Hello @jaytiwarihub, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness and cross-platform compatibility of the speaker recognition tutorial by refactoring the dataset extraction process. It moves away from platform-dependent shell commands to a pure Python solution using the zipfile module, which includes checks for dataset availability and provides user-friendly guidance.

Highlights

  • Cross-platform compatibility: Replaced the reliance on the unzip shell command with Python's built-in zipfile module, ensuring the tutorial works seamlessly across Windows, Linux, and macOS environments.
  • Robust dataset handling: Implemented checks to verify the existence of the dataset before attempting extraction, providing clearer instructions to users if the dataset or zip file is missing.


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a good improvement, replacing the platform-dependent unzip command with Python's zipfile module. This enhances cross-platform compatibility, especially for Windows users. The added check for the dataset's existence is also a thoughtful addition. I've pointed out a small redundancy in variable definition and suggested a minor change to improve consistency in path handling using pathlib.

Comment on lines +60 to +61
if not os.path.exists(DATASET_ROOT):
    if os.path.exists(ZIP_FILE):
Contributor


medium

The pathlib module is already imported and used in this file for path operations. For consistency and better cross-platform compatibility, it's good practice to use pathlib.Path for checking file existence as well, instead of os.path.exists.

Suggested change
- if not os.path.exists(DATASET_ROOT):
-     if os.path.exists(ZIP_FILE):
+ if not Path(DATASET_ROOT).exists():
+     if Path(ZIP_FILE).exists():
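For ordinary existing and missing paths, Path(p).exists() and os.path.exists(p) give the same answer, so the suggestion is a consistency change rather than a behavioral one. A small sketch that checks both spellings agree; `exists_both_ways` and the file names are illustrative only.

```python
import os
import tempfile
from pathlib import Path


def exists_both_ways(path: str) -> tuple[bool, bool]:
    """Return (os.path.exists(path), Path(path).exists()) for comparison."""
    return os.path.exists(path), Path(path).exists()


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "16000_pcm_speeches"
    present.mkdir()
    missing = Path(tmp) / "dataset.zip"  # hypothetical archive name, never created
    print(exists_both_ways(str(present)))  # (True, True)
    print(exists_both_ways(str(missing)))  # (False, False)
```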

print(f"Save it as '{ZIP_FILE}' in this directory and run again.")
exit()

DATASET_ROOT = "16000_pcm_speeches"
Contributor


medium

The DATASET_ROOT variable is already defined on line 56. This re-definition is redundant and can be safely removed.

@jaytiwarihub
Author

@googlebot I signed it!


@jaytiwarihub jaytiwarihub force-pushed the fix-audio-dataset-unzip branch from dd8b736 to 85896c5 on January 5, 2026 13:43
@jaytiwarihub
Author

@googlebot I signed it!



Labels

None yet

Projects

None yet


2 participants