feat: adds streamingV2 sample for speech #10079

Draft: wants to merge 7 commits into main

Conversation

amirx-cloud

Description

Fixes #

Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.

Checklist

  • I have followed Sample Format Guide
  • pom.xml parent set to latest shared-configuration
  • Appropriate changes to README are included in PR
  • These samples need a new API enabled in testing projects to pass (let us know which ones)
  • These samples need a new/updated env vars in testing projects set to pass (let us know which ones)
  • Tests pass: mvn clean verify required
  • Lint passes: mvn -P lint checkstyle:check required
  • Static Analysis: mvn -P lint clean compile pmd:cpd-check spotbugs:check advisory only
  • This sample adds a new sample directory, and I updated the CODEOWNERS file with the codeowners for this sample
  • This sample adds a new Product API, and I updated the Blunderbuss issue/PR auto-assigner with the codeowners for this sample
  • Please merge this PR for me once it is approved

@amirx-cloud amirx-cloud requested review from yoshi-approver and a team as code owners April 23, 2025 17:11

snippet-bot bot commented Apr 23, 2025

Here is the summary of changes.

You are about to add 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@product-auto-label product-auto-label bot added samples Issues that are directly related to samples. api: speech Issues related to the Speech-to-Text API. labels Apr 23, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Hello @amirx-cloud, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request adds a new sample, TranscribeStreamingV2.java, demonstrating how to transcribe audio from a file stream using the Google Cloud Speech-to-Text API V2. It includes a corresponding integration test, TranscribeStreamingV2IT.java, to verify the functionality. The sample reads an audio file, splits it into chunks, sends these chunks to the Speech-to-Text API, and then prints the transcribed text.

Highlights

  • New Sample: Streaming Speech-to-Text V2: Introduces a new sample demonstrating streaming transcription using the Speech-to-Text V2 API.
  • Audio Chunking: The sample code splits the audio file into smaller chunks for streaming to the API.
  • Integration Test: An integration test is included to verify the accuracy of the transcription.

Changelog

  • speech/src/main/java/com/example/speech/TranscribeStreamingV2.java
    • Added a new class TranscribeStreamingV2 to demonstrate streaming transcription.
    • Implements the transcribeStreamingV2 method to read audio from a file, chunk it, and send it to the Speech-to-Text API.
    • Includes a main method to run the transcription on a sample audio file (brooklyn_bridge.wav).
  • speech/src/test/java/com/example/speech/TranscribeStreamingV2IT.java
    • Added a new integration test TranscribeStreamingV2IT to verify the streaming transcription functionality.
    • The test reads the brooklyn_bridge.wav audio file and asserts that the transcribed text contains the expected phrase 'how old is the Brooklyn Bridge'.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A stream of sound,
To text it is bound,
Words from the cloud.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a new sample for streaming speech transcription using the V2 API. The code appears to be well-structured and includes a test case. However, there are a few areas that could be improved for clarity and robustness.

Summary of Findings

  • Missing Error Handling: The code lacks proper error handling for file operations and API calls. This could lead to unexpected behavior or crashes if errors occur during these operations. Specifically, the Files.readAllBytes method in TranscribeStreamingV2 and the audio file loading in the test case should include error handling.
  • Magic Number for Chunk Length: The chunk length calculation audioContent.length / 5 uses a magic number (5). It would be better to define this as a constant with a descriptive name to improve readability and maintainability.
  • Incomplete Javadoc: The Javadoc comments for the transcribeStreamingV2 method are incomplete. The Args and Returns sections are commented out, and the description is not very detailed. Providing comprehensive Javadoc is crucial for understanding and maintaining the code.

Merge Readiness

The pull request introduces a valuable new sample. However, the identified issues related to error handling and code clarity should be addressed before merging. I am unable to approve this pull request, and I recommend that others review and approve this code before merging. Addressing the high-severity issues is crucial for the stability and reliability of the sample.

try (SpeechClient client = SpeechClient.create()) {

Path path = Paths.get(streamFile);
byte[] audioContent = Files.readAllBytes(path);

high

Consider adding error handling for the Files.readAllBytes method. If the file does not exist or cannot be read, an IOException will be thrown, and the program will crash. Wrapping this in a try-catch block will make the code more robust.

What happens if the file does not exist?

            try {
                audioContent = Files.readAllBytes(path);
            } catch (IOException e) {
                System.err.println("Failed to read audio file: " + e.getMessage());
                return null; // Or throw the exception, depending on desired behavior
            }
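
As an alternative to returning null, here is a minimal sketch that lets the IOException propagate with extra context, so callers and tests can see why the read failed. The class and method names (AudioFileReader, readAudio) are hypothetical and are not part of the PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper, not part of the PR: reads the whole audio file or fails loudly.
final class AudioFileReader {
  private AudioFileReader() {}

  static byte[] readAudio(Path path) throws IOException {
    try {
      return Files.readAllBytes(path);
    } catch (NoSuchFileException e) {
      // Add the absolute path to the message and keep the original exception as the cause.
      throw new IOException("Audio file not found: " + path.toAbsolutePath(), e);
    }
  }
}
```

Other I/O failures (for example, a permissions error) still surface as the original IOException, so nothing is swallowed.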

String testFilePath = "./resources/brooklyn_bridge.wav";

// Call the method to test
List<StreamingRecognizeResponse> responses = TranscribeStreamingV2.transcribeStreamingV2(testFilePath);

high

Consider adding error handling for the TranscribeStreamingV2.transcribeStreamingV2 method call. If an IOException is thrown, the test will fail without a clear error message. Wrapping this in a try-catch block will make the test more robust.

        try {
            responses = TranscribeStreamingV2.transcribeStreamingV2(testFilePath);
        } catch (IOException e) {
            System.err.println("Failed to transcribe audio: " + e.getMessage());
            throw e; // Re-throw the exception to fail the test
        }

Comment on lines +36 to +43
// Transcribes audio from an audio file stream using Google Cloud Speech-to-Text API.
// Args:
// streamFile (String): Path to the local audio file to be transcribed.
// Example: "resources/audio.wav"
// Returns:
// List<StreamingRecognizeResponse>: A list of objects.
// Each response includes the transcription results for the corresponding audio segment.
//

medium

The Javadoc comments are incomplete. Please provide a comprehensive description of the method, its arguments, and return values. The Args and Returns sections are commented out. A complete Javadoc is important for understanding and maintaining the code.

    /**
     * Transcribes audio from an audio file stream using the Google Cloud Speech-to-Text API.
     *
     * @param streamFile path to the local audio file to be transcribed, for example
     *     "resources/audio.wav"
     * @return a list of {@code StreamingRecognizeResponse} objects; each response includes the
     *     transcription results for the corresponding audio segment
     * @throws IOException if an I/O error occurs reading the file
     */

Contributor

+1.

The Javadoc comments as-written look correct for Python; however, Javadoc style is different.

byte[] audioContent = Files.readAllBytes(path);

// In practice, stream should be a generator yielding chunks of audio data
int chunkLength = audioContent.length / 5;

medium

The value 5 is a magic number. Define a constant with a descriptive name to represent the number of chunks. This improves readability and maintainability.

            int numChunks = 5; // Define the number of chunks
            int chunkLength = audioContent.length / numChunks;

Contributor

+1.

question: why use 5? Provide a comment explaining why the number must be 5 and make this value a constant.
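
A minimal sketch of the chunking step with the chunk count as a named constant. Because audioContent.length / 5 is integer division, the length may not divide evenly, so this version uses ceiling division to make sure no trailing bytes are lost. The class name AudioChunker, the method split, and the constant NUM_CHUNKS are hypothetical; the value 5 is copied from the PR, which does not say why it was chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the PR.
final class AudioChunker {
  // Assumed value, copied from the PR; the thread does not explain why 5 was chosen.
  static final int NUM_CHUNKS = 5;

  private AudioChunker() {}

  // Splits the audio into at most numChunks consecutive chunks that together cover every byte.
  static List<byte[]> split(byte[] audioContent, int numChunks) {
    if (numChunks <= 0) {
      throw new IllegalArgumentException("numChunks must be positive");
    }
    // Ceiling division: a remainder goes into the chunks instead of being dropped.
    int chunkLength = (audioContent.length + numChunks - 1) / numChunks;
    List<byte[]> chunks = new ArrayList<>();
    for (int start = 0; start < audioContent.length; start += chunkLength) {
      int end = Math.min(start + chunkLength, audioContent.length);
      chunks.add(Arrays.copyOfRange(audioContent, start, end));
    }
    return chunks;
  }
}
```

An empty input yields an empty list, and the last chunk may be shorter than the others.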

.build();

StreamingRecognizeRequest configRequest = StreamingRecognizeRequest.newBuilder()
.setRecognizer(String.format("projects/%s/locations/global/recognizers/_", PROJECT_ID))

medium

The recognizer is set to _. Should this be a specific recognizer? If so, please specify it; otherwise, remove it.

                    .setRecognizer(String.format("projects/%s/locations/global/recognizers/{YOUR_RECOGNIZER}", PROJECT_ID))

Contributor

+1

If the user needs to provide a specific recognizer by ID, pass the recognizer ID into the sample function.
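
One way this could look, as a sketch only: build the recognizer resource name from parameters instead of a class-level constant. The helper name RecognizerNames and the fallback to "_" when no ID is supplied are assumptions; the fallback simply mirrors the value the PR currently hardcodes.

```java
// Hypothetical helper, not part of the PR.
final class RecognizerNames {
  // Assumption: fall back to the "_" value that the PR currently hardcodes.
  static final String DEFAULT_RECOGNIZER_ID = "_";

  private RecognizerNames() {}

  // Builds the recognizer resource name in the same format the PR uses.
  static String recognizerName(String projectId, String recognizerId) {
    String id =
        (recognizerId == null || recognizerId.isEmpty()) ? DEFAULT_RECOGNIZER_ID : recognizerId;
    return String.format("projects/%s/locations/global/recognizers/%s", projectId, id);
  }
}
```

The result would then be passed to setRecognizer(...) in place of the inline String.format call.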

@telpirion telpirion changed the title Adding StreamingV2 sample for speech feat: adds streamingV2 sample for speech Apr 28, 2025
@telpirion telpirion assigned telpirion and unassigned bourgeoisor Apr 28, 2025
@telpirion telpirion marked this pull request as draft April 28, 2025 22:59
Contributor

@telpirion telpirion left a comment

Hi @amirx-cloud -- thank you for submitting this PR!

I changed the status of this PR to "draft" because it needs some work. I've left comments to help you improve the quality of the PR. Please reopen the PR once you've addressed the comments.

Related -- the CI/CD reports show that there is an NPE somewhere in the code sample. Be sure to address that problem before re-opening.

import java.util.List;

public class TranscribeStreamingV2 {
private static final String PROJECT_ID = System.getenv("GOOGLE_CLOUD_PROJECT");
Contributor

issue: pass project ID into the sample function. Look at other samples for how to do this.

// [START speech_to_text_transcribe_streaming_v2]

import com.google.api.gax.rpc.BidiStream;
import com.google.cloud.speech.v2.*;
Contributor

issue: import each type individually -- don't use star imports.

// List<StreamingRecognizeResponse>: A list of objects.
// Each response includes the transcription results for the corresponding audio segment.
//
public static List<StreamingRecognizeResponse> transcribeStreamingV2(String streamFile) throws IOException {
Contributor

issue: provide a main() method that invokes the sample method.
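
A skeleton of the shape this comment and the earlier project ID comment seem to ask for: main() holds placeholder values and passes them to the sample method. The other samples are not shown in this thread, so this layout is an assumption about the convention being referred to. The class name, the placeholder values, and the stubbed method body are all hypothetical.

```java
// Hypothetical skeleton, not the PR's code; the transcription logic is stubbed out on purpose.
final class TranscribeStreamingV2Skeleton {
  private TranscribeStreamingV2Skeleton() {}

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these placeholder values before running the sample.
    String projectId = "your-project-id";
    String streamFile = "resources/brooklyn_bridge.wav";
    transcribeStreamingV2(projectId, streamFile);
  }

  // The project ID arrives as a parameter instead of being read from a static field.
  static void transcribeStreamingV2(String projectId, String streamFile) {
    // Stub: the real sample would create the client and stream the audio here.
    System.out.printf("Would transcribe %s using project %s%n", streamFile, projectId);
  }
}
```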

Path path = Paths.get(streamFile);
byte[] audioContent = Files.readAllBytes(path);

// In practice, stream should be a generator yielding chunks of audio data
Contributor

issue: fix the formatting on this comment.

StreamingRecognizeResponse response = responseIterator.next();
System.out.println(response);
// Process the response and extract the transcript
System.out.println("Transcript: " + response.getResultsList().get(0).getAlternativesList().get(0).getTranscript());
Contributor

issue: put individual method calls on separate lines for better readability

System.out.println("Transcript: " + response.getResultsList().get(0).getAlternativesList().get(0).getTranscript());
responses.add(response);
}

Contributor

issue: remove excess blank lines.

String transcript = "";
for (StreamingRecognizeResponse response : responses) {
if (response.getResultsCount() > 0) {
transcript += response.getResults(0).getAlternatives(0).getTranscript();
Contributor

issue: put individual method calls on separate lines.

}
}
// Use a regex to match the expected transcript
Pattern pattern = Pattern.compile("how old is the Brooklyn Bridge", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Contributor

issue: if the test checks the output from the model, check only for "Transcript:". The model doesn't deterministically produce "how old is the Brooklyn Bridge", but DOES deterministically return "Transcript:".
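
A sketch of how a test could check only for the fixed "Transcript:" prefix: capture standard output while the sample runs, then assert on the captured text. The helper name StdoutCapture is hypothetical, and in the usage note a plain print statement stands in for the sample call.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical test helper, not part of the PR.
final class StdoutCapture {
  private StdoutCapture() {}

  // Runs the action with System.out redirected and returns everything it printed.
  static String capture(Runnable action) {
    PrintStream original = System.out;
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    System.setOut(new PrintStream(buffer, true));
    try {
      action.run();
    } finally {
      // Always restore the real stream, even if the action throws.
      System.setOut(original);
    }
    return buffer.toString();
  }
}
```

In the integration test, the Runnable would invoke the sample method, and the assertion would be along the lines of output.contains("Transcript:") rather than a match on the model's wording.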


// In practice, stream should be a generator yielding chunks of audio data
int chunkLength = audioContent.length / 5;
List<byte[]> stream = new ArrayList<>();
Contributor

issue: rename this variable to something like "chunks".

@telpirion
Contributor

Tip: You should be able to find the linting errors by running:

mvn -P lint --quiet --batch-mode checkstyle:check

Labels
api: speech Issues related to the Speech-to-Text API. samples Issues that are directly related to samples.