Commit a8aec77

add audio scribe sample (#1475)
* add audio scribe sample
* Fix wrong parameter name
* Cleanup and more robust audio scribe
* better manifest descriptions
* demo chat app clean up
1 parent a42c412 commit a8aec77

20 files changed

+808
-6
lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@ node_modules
 _debug
 _metadata
 dist
-*.swp # vim temp files
+**/*.swp

functional-samples/ai.gemini-on-device-alt-texter/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# On-device multimodal AI with Gemini Nano - image understanding
+# Alt-texter: On-device multimodal AI with Gemini Nano - image understanding
 
 This sample demonstrates how to use the image understanding capabilities of the multi-modal Gemini Nano API preview together with [Chrome's translation API](https://developer.chrome.com/docs/ai/translator-api). To learn more about the API and how to sign-up for the origin trial, head over to [Built-in AI on developer.chrome.com](https://developer.chrome.com/docs/extensions/ai/prompt-api).
 

functional-samples/ai.gemini-on-device-alt-texter/background.js

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ chrome.contextMenus.onClicked.addListener(async (info, tab) => {
     ]);
     chrome.runtime.sendMessage({
       action: 'alt-text',
-      text: result.value === 'fulfilled' ? result.value : result.reason.message
+      text: result.status === 'fulfilled' ? result.value : result.reason.message
     });
   }
 });
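
Review note: the `status`, `value`, and `reason` fields in this hunk match the shape of the entries returned by `Promise.allSettled`. Assuming that is where `result` comes from, the old check could never succeed, because `value` holds the resolved data rather than the string `'fulfilled'`. A minimal sketch of the corrected check follows; `settledToText` and `demo` are hypothetical names that do not appear in the sample.

```javascript
// Hypothetical helper (not part of the sample): turn one entry of a
// Promise.allSettled() result into the text that should be displayed.
function settledToText(result) {
  // Fulfilled entries look like { status: 'fulfilled', value }.
  // Rejected entries look like { status: 'rejected', reason }.
  return result.status === 'fulfilled'
    ? result.value
    : String(result.reason?.message ?? result.reason);
}

// Usage: one promise resolves and one rejects.
async function demo() {
  const results = await Promise.allSettled([
    Promise.resolve('A dog on a beach'),
    Promise.reject(new Error('Model unavailable'))
  ]);
  return results.map(settledToText);
}

demo().then((texts) => console.log(texts));
```

The fallback to `String(result.reason)` is an addition of this sketch: it covers rejections whose reason is not an `Error` object.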

functional-samples/ai.gemini-on-device-alt-texter/manifest.json

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
   "manifest_version": 3,
   "name": "Alt Texter",
   "version": "1.0",
-  "description": "Generates alt text for images using the Prompt API.",
+  "description": "Generates alt text for images using the Gemini Nano Prompt API.",
   "permissions": ["contextMenus", "clipboardWrite"],
   "host_permissions": ["<all_urls>"],
   "minimum_chrome_version": "138",
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Audio-Scribe: On-device multimodal AI with Gemini Nano - audio transcription
+
+This sample demonstrates how to use the audio transcription capabilities of the multi-modal Gemini Nano API preview. To learn more about the API and how to sign-up for the origin trial, head over to [Built-in AI on developer.chrome.com](https://developer.chrome.com/docs/extensions/ai/prompt-api).
+
+## Overview
+
+This extension adds a sidepanel that will, when opened, display a transcription of all audio files on a web page (currently it looks only for audio files created using `URL.createObjectURL`).
+
+## Running this extension
+
+1. Clone this repository.
+1. Load this directory in Chrome as an [unpacked extension](https://developer.chrome.com/docs/extensions/get-started/tutorial/hello-world#load-unpacked).
+1. Open the audio-scribe sidepanel by clicking the audio-scribe action or by pressing the `ALT + A` keyboard shortcut.
+1. Open a chat app in the browser, for example https://web.whatsapp.com/. You can also run the demo chat app via:
+   ```
+   npx serve demo-chat-app
+   ```
+1. All audio messages in the current chat will be transcribed in the side panel.
+
+![Screenshot displaying a demo chat app with a few audio messages. On the right, there is the audio-scribe extension's sidepanel which displays the transcribed text messages](assets/screenshot.png)
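
Review note: the README says the extension only finds audio created through `URL.createObjectURL`. The detection code is not shown in this part of the diff, so the following is only a sketch of one way such audio could be observed, namely by wrapping the function. `observeAudioObjectUrls`, `onAudio`, `seen`, and `restore` are hypothetical names, and the sample may work differently.

```javascript
// Sketch only: wrap URL.createObjectURL so that every audio Blob turned into
// an object URL is reported to a callback. `observeAudioObjectUrls` and
// `onAudio` are hypothetical names, not taken from the sample.
function observeAudioObjectUrls(onAudio) {
  const original = URL.createObjectURL;
  URL.createObjectURL = function (object) {
    const url = original.call(URL, object);
    // Only Blobs with an audio MIME type are of interest here.
    if (object instanceof Blob && object.type.startsWith('audio/')) {
      onAudio({ url, blob: object });
    }
    return url;
  };
  // Return a function that restores the original implementation.
  return () => {
    URL.createObjectURL = original;
  };
}

// Usage: collect audio object URLs while the wrapper is installed.
const seen = [];
const restore = observeAudioObjectUrls((entry) => seen.push(entry));
URL.createObjectURL(new Blob(['fake'], { type: 'audio/ogg' }));
URL.createObjectURL(new Blob(['text'], { type: 'text/plain' }));
restore();
console.log(seen.length);
```

A wrapper like this only sees calls made after it is installed, which would be consistent with the limitation the README states.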
Binary file: 13.5 KB
Binary file: 114 KB
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+// Copyright 2025 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+chrome.sidePanel.setPanelBehavior({ openPanelOnActionClick: true });
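
Review note: `chrome.sidePanel.setPanelBehavior` returns a promise, so a failure in the call above would surface as an unhandled rejection. A small sketch of the same call with the rejection logged is shown here. The helper name `openPanelOnActionClick` and the injected `sidePanel` parameter are assumptions of this sketch, introduced so the function can also run outside Chrome.

```javascript
// Sketch: same call as in the sample, with the rejection logged. The
// `sidePanel` parameter is injected so the helper can run outside Chrome;
// `openPanelOnActionClick` is a hypothetical helper name.
function openPanelOnActionClick(sidePanel) {
  return sidePanel
    .setPanelBehavior({ openPanelOnActionClick: true })
    .catch((error) => console.error('setPanelBehavior failed:', error));
}

// In the extension's service worker, pass the real API object.
if (typeof chrome !== 'undefined' && chrome.sidePanel) {
  openPanelOnActionClick(chrome.sidePanel);
}
```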
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+// Copyright 2025 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Forward messages from the content script in the MAIN world to the
+// side panel
+window.addEventListener('message', ({ data }) => {
+  if (data.type !== 'audio-scribe') {
+    return;
+  }
+  chrome.runtime.sendMessage({ data });
+});
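
Review note: the forwarder above reads `data.type` directly, which throws if another script on the page posts a message whose `data` is `null`. The receiving side in the side panel is not shown in this part of the diff. The sketch below shows a tolerant type check and one possible receiver for the `{ data }` envelope; `isAudioScribeMessage` and `createPanelListener` are hypothetical names, and the real side panel code may differ.

```javascript
// Hypothetical guard: the same type check as in the sample, but tolerant of
// messages whose data is null or not an object.
function isAudioScribeMessage(data) {
  return (
    typeof data === 'object' && data !== null && data.type === 'audio-scribe'
  );
}

// Hypothetical receiver for the side panel: unwraps the { data } envelope
// that the content script sends and passes valid payloads to `handle`.
function createPanelListener(handle) {
  return (message) => {
    if (isAudioScribeMessage(message?.data)) {
      handle(message.data);
    }
  };
}

// In the side panel, register the listener with the extension runtime.
if (typeof chrome !== 'undefined' && chrome.runtime?.onMessage) {
  chrome.runtime.onMessage.addListener(
    createPanelListener((data) => console.log('audio-scribe payload', data))
  );
}
```

Keeping the check in a plain function makes it usable from both the forwarding content script and the side panel.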
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <title>Chat App Demo</title>
+    <link rel="stylesheet" href="style.css" />
+  </head>
+  <body>
+    <div class="app-container">
+      <div class="sidebar">
+        <!-- Chat list will go here -->
+        <h2>Chats</h2>
+        <ul id="chat-list">
+          <!-- Example chat items -->
+          <li class="chat-item active" data-chat="Alice">
+            <span class="avatar">😊</span>
+            <span>Alice</span>
+          </li>
+          <li class="chat-item" data-chat="Bob">
+            <span class="avatar">😎</span>
+            <span>Bob</span>
+          </li>
+          <li class="chat-item" data-chat="Charlie">
+            <span class="avatar">🥳</span>
+            <span>Charlie</span>
+          </li>
+        </ul>
+      </div>
+      <div class="chat-panel">
+        <div class="chat-header">
+          <!-- Header for the current chat -->
+          <span class="avatar" id="current-chat-avatar">😊</span>
+          <h3 id="current-chat-name">Alice</h3>
+        </div>
+        <div class="message-list" id="message-list">
+          <!-- Messages will be loaded here by JavaScript -->
+        </div>
+        <div class="message-input">
+          <input
+            type="text"
+            id="message-input-field"
+            placeholder="Type a message..."
+          />
+          <button id="send-button">Send</button>
+        </div>
+      </div>
+    </div>
+
+    <script src="script.js"></script>
+  </body>
+</html>
