Commit 77ec997

author: clams-bot
committed: adding metadata of whisper-wrapper.v10
1 parent 06edd69 commit 77ec997

File tree

5 files changed: +279 -2 lines changed
+139
@@ -0,0 +1,139 @@
---
layout: posts
classes: wide
title: "Whisper Wrapper (v10)"
date: 2024-08-29T22:13:54+00:00
---

## About this version

- Submitter: [keighrim](https://github.com/keighrim)
- Submission Time: 2024-08-29T22:13:54+00:00
- Prebuilt Container Image: [ghcr.io/clamsproject/app-whisper-wrapper:v10](https://github.com/clamsproject/app-whisper-wrapper/pkgs/container/app-whisper-wrapper/v10)
- Release Notes

> This version adds some delegation parameters to whisper.transcribe
> - `task`: delegate to `--task`
> - `initialPrompt`: delegate to `--initial-prompt`
> - `conditionOnPreviousText`: delegate to `--condition-on-previous-text`
> - `noSpeechThreshold`: delegate to `--no-speech-threshold`
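
For orientation, the four new parameters correspond to keyword arguments of `whisper.transcribe` in the `openai-whisper` package. The sketch below shows that mapping directly; it illustrates the delegation and is not the wrapper's actual call site, and the audio path is a placeholder.

```python
# Sketch of how the four new v10 parameters delegate to whisper.transcribe;
# assumes the `openai-whisper` package (analyzer version 20231117).
import whisper

model = whisper.load_model("tiny")    # modelSize
result = model.transcribe(
    "audio.wav",                      # placeholder input file
    language=None,                    # modelLang; None triggers language detection
    task="transcribe",                # task                    -> --task
    initial_prompt=None,              # initialPrompt           -> --initial-prompt
    condition_on_previous_text=True,  # conditionOnPreviousText -> --condition-on-previous-text
    no_speech_threshold=0.6,          # noSpeechThreshold       -> --no-speech-threshold
)
print(result["text"])
```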

## About this app (See raw [metadata.json](metadata.json))

**A CLAMS wrapper for Whisper-based ASR software originally developed by OpenAI.**

- App ID: [http://apps.clams.ai/whisper-wrapper/v10](http://apps.clams.ai/whisper-wrapper/v10)
- App License: Apache 2.0
- Source Repository: [https://github.com/clamsproject/app-whisper-wrapper](https://github.com/clamsproject/app-whisper-wrapper) ([source tree of the submitted version](https://github.com/clamsproject/app-whisper-wrapper/tree/v10))
- Analyzer Version: 20231117
- Analyzer License: MIT

#### Inputs

(**Note**: "*" as a property value means that the property is required but can be any value.)

One of the following is required:

- [http://mmif.clams.ai/vocabulary/AudioDocument/v1](http://mmif.clams.ai/vocabulary/AudioDocument/v1) (of any properties)
- [http://mmif.clams.ai/vocabulary/VideoDocument/v1](http://mmif.clams.ai/vocabulary/VideoDocument/v1) (of any properties)
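
For concreteness, an input MMIF only needs to carry one of these document types. A minimal sketch of building such an input with the standard library follows; the file path, document id, and MMIF version URI are illustrative assumptions, not app requirements.

```python
# Sketch: a minimal input MMIF carrying a single AudioDocument.
# Path, id, and version URI are placeholders; adjust to your data.
import json

mmif_in = {
    "metadata": {"mmif": "http://mmif.clams.ai/1.0.5"},
    "documents": [
        {
            "@type": "http://mmif.clams.ai/vocabulary/AudioDocument/v1",
            "properties": {
                "id": "d1",
                "location": "file:///data/audio/program.wav",
                "mime": "audio/wav",
            },
        }
    ],
    "views": [],
}

with open("input.mmif", "w") as f:
    json.dump(mmif_in, f, indent=2)
```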

#### Configurable Parameters

(**Note**: _Multivalued_ means the parameter can have one or more values.)

- `modelSize`: optional, defaults to `tiny`
    - Type: string
    - Multivalued: False
    - Choices: **_`tiny`_**, `t`, `base`, `b`, `small`, `s`, `medium`, `m`, `large`, `l`, `large-v2`, `l2`, `large-v3`, `l3`

    > The size of the model to use. When `modelLang=en` is given, for non-`large` models, English-only models will be used instead of multilingual models for speed and accuracy. (For `large` models, English-only models are not available.) Sizes can also be given as aliases: tiny=t, base=b, small=s, medium=m, large=l, large-v2=l2, large-v3=l3.

- `modelLang`: optional, defaults to `""` (empty; triggers language detection)
    - Type: string
    - Multivalued: False

    > Language of the model to use; accepts two- or three-letter ISO 639 language codes, though Whisper only supports a subset of languages. If the language is not supported, an error will be raised. For the full list of supported languages, see https://github.com/openai/whisper/blob/20231117/whisper/tokenizer.py . In addition to the language code, a two-letter region code can be appended, e.g. "en-US" for US English. Note that the region code is kept only for compatibility and recording purposes: Whisper neither detects regional dialects nor uses the given region for transcription. When no language code is given, Whisper runs in language-detection mode and uses the first few seconds of the audio to detect the language.

- `task`: optional, defaults to `transcribe`
    - Type: string
    - Multivalued: False
    - Choices: **_`transcribe`_**, `translate`

    > (from whisper CLI) whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate')

- `initialPrompt`: optional, defaults to `""` (empty)
    - Type: string
    - Multivalued: False

    > (from whisper CLI) optional text to provide as a prompt for the first window.

- `conditionOnPreviousText`: optional, defaults to `true`
    - Type: boolean
    - Multivalued: False
    - Choices: `false`, **_`true`_**

    > (from whisper CLI) if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop

- `noSpeechThreshold`: optional, defaults to `0.6`
    - Type: number
    - Multivalued: False

    > (from whisper CLI) if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence

- `pretty`: optional, defaults to `false`
    - Type: boolean
    - Multivalued: False
    - Choices: **_`false`_**, `true`

    > The JSON body of the HTTP response will be re-formatted with 2-space indentation

- `runningTime`: optional, defaults to `false`
    - Type: boolean
    - Multivalued: False
    - Choices: **_`false`_**, `true`

    > The running time of the app will be recorded in the view metadata

- `hwFetch`: optional, defaults to `false`
    - Type: boolean
    - Multivalued: False
    - Choices: **_`false`_**, `true`

    > The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata
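
Like other CLAMS apps, this wrapper runs as an HTTP service: the input MMIF goes in the POST body and the parameters above are passed as URL query parameters. A minimal sketch, reusing the `input.mmif` built under Inputs; the host and port are assumptions about a local setup, not part of the app metadata.

```python
# Sketch: calling the app over HTTP with runtime parameters as query params.
# Assumes the container is running locally, e.g. via
#   docker run --rm -p 5000:5000 ghcr.io/clamsproject/app-whisper-wrapper:v10
# (host, port, and input file name are assumptions; adjust to your setup)
import requests

with open("input.mmif") as f:
    mmif_in = f.read()

resp = requests.post(
    "http://localhost:5000/",
    data=mmif_in,
    params={
        "modelSize": "small",       # or an alias like "s"
        "task": "transcribe",
        "noSpeechThreshold": "0.4",
        "pretty": "true",           # 2-space-indented response body
    },
)
resp.raise_for_status()
print(resp.text)  # the output MMIF
```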

#### Outputs

(**Note**: "*" as a property value means that the property is required but can be any value.)

(**Note**: Not all output annotations are always generated.)

- [http://mmif.clams.ai/vocabulary/TextDocument/v1](http://mmif.clams.ai/vocabulary/TextDocument/v1) (of any properties)
- [http://mmif.clams.ai/vocabulary/TimeFrame/v5](http://mmif.clams.ai/vocabulary/TimeFrame/v5)
    - _timeUnit_ = "milliseconds"
- [http://mmif.clams.ai/vocabulary/Alignment/v1](http://mmif.clams.ai/vocabulary/Alignment/v1) (of any properties)
- [http://vocab.lappsgrid.org/Token](http://vocab.lappsgrid.org/Token) (of any properties)
- [http://vocab.lappsgrid.org/Sentence](http://vocab.lappsgrid.org/Sentence) (of any properties)
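
Since the output MMIF is plain JSON, it can be inspected downstream without any CLAMS tooling. A small sketch follows; the file name is an assumption, and the property names are read off the MMIF vocabulary types listed above.

```python
# Sketch: walk an output MMIF with the standard library only.
# Each view carries "annotations" with an "@type" URI and a "properties" map.
import json

with open("output.mmif") as f:  # assumed local copy of the app's response
    mmif = json.load(f)

for view in mmif.get("views", []):
    for ann in view.get("annotations", []):
        at_type = ann["@type"]
        props = ann.get("properties", {})
        if "/TextDocument/" in at_type:
            # a TextDocument's text is a nested {"@value": ...} object
            print("text:", props.get("text", {}).get("@value", "")[:80])
        elif "/TimeFrame/" in at_type:
            # start/end are in the view's timeUnit (milliseconds here)
            print("frame:", props.get("start"), "-", props.get("end"), "ms")
```
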
@@ -0,0 +1,128 @@
{
    "name": "Whisper Wrapper",
    "description": "A CLAMS wrapper for Whisper-based ASR software originally developed by OpenAI.",
    "app_version": "v10",
    "mmif_version": "1.0.5",
    "analyzer_version": "20231117",
    "app_license": "Apache 2.0",
    "analyzer_license": "MIT",
    "identifier": "http://apps.clams.ai/whisper-wrapper/v10",
    "url": "https://github.com/clamsproject/app-whisper-wrapper",
    "input": [
        [
            {
                "@type": "http://mmif.clams.ai/vocabulary/AudioDocument/v1",
                "required": true
            },
            {
                "@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
                "required": true
            }
        ]
    ],
    "output": [
        {
            "@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1"
        },
        {
            "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v5",
            "properties": {
                "timeUnit": "milliseconds"
            }
        },
        {
            "@type": "http://mmif.clams.ai/vocabulary/Alignment/v1"
        },
        {
            "@type": "http://vocab.lappsgrid.org/Token"
        },
        {
            "@type": "http://vocab.lappsgrid.org/Sentence"
        }
    ],
    "parameters": [
        {
            "name": "modelSize",
            "description": "The size of the model to use. When `modelLang=en` is given, for non-`large` models, English-only models will be used instead of multilingual models for speed and accuracy. (For `large` models, English-only models are not available.) Sizes can also be given as aliases: tiny=t, base=b, small=s, medium=m, large=l, large-v2=l2, large-v3=l3.",
            "type": "string",
            "choices": [
                "tiny",
                "t",
                "base",
                "b",
                "small",
                "s",
                "medium",
                "m",
                "large",
                "l",
                "large-v2",
                "l2",
                "large-v3",
                "l3"
            ],
            "default": "tiny",
            "multivalued": false
        },
        {
            "name": "modelLang",
            "description": "Language of the model to use; accepts two- or three-letter ISO 639 language codes, though Whisper only supports a subset of languages. If the language is not supported, an error will be raised. For the full list of supported languages, see https://github.com/openai/whisper/blob/20231117/whisper/tokenizer.py . In addition to the language code, a two-letter region code can be appended, e.g. \"en-US\" for US English. Note that the region code is kept only for compatibility and recording purposes: Whisper neither detects regional dialects nor uses the given region for transcription. When no language code is given, Whisper runs in language-detection mode and uses the first few seconds of the audio to detect the language.",
            "type": "string",
            "default": "",
            "multivalued": false
        },
        {
            "name": "task",
            "description": "(from whisper CLI) whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate')",
            "type": "string",
            "choices": [
                "transcribe",
                "translate"
            ],
            "default": "transcribe",
            "multivalued": false
        },
        {
            "name": "initialPrompt",
            "description": "(from whisper CLI) optional text to provide as a prompt for the first window.",
            "type": "string",
            "default": "",
            "multivalued": false
        },
        {
            "name": "conditionOnPreviousText",
            "description": "(from whisper CLI) if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop",
            "type": "boolean",
            "default": true,
            "multivalued": false
        },
        {
            "name": "noSpeechThreshold",
            "description": "(from whisper CLI) if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence",
            "type": "number",
            "default": 0.6,
            "multivalued": false
        },
        {
            "name": "pretty",
            "description": "The JSON body of the HTTP response will be re-formatted with 2-space indentation",
            "type": "boolean",
            "default": false,
            "multivalued": false
        },
        {
            "name": "runningTime",
            "description": "The running time of the app will be recorded in the view metadata",
            "type": "boolean",
            "default": false,
            "multivalued": false
        },
        {
            "name": "hwFetch",
            "description": "The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata",
            "type": "boolean",
            "default": false,
            "multivalued": false
        }
    ]
}
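
Because this metadata is machine-readable, per-version tooling can consume it directly. A quick sketch that summarizes the parameters, assuming the JSON above is saved locally as metadata.json:

```python
# Sketch: list each parameter of the app with its type and default,
# reading the metadata.json shown above (assumed saved locally).
import json

with open("metadata.json") as f:
    meta = json.load(f)

print(f"{meta['name']} {meta['app_version']} - {meta['identifier']}")
for p in meta["parameters"]:
    default = p.get("default", "(no default)")
    print(f"  {p['name']} ({p['type']}), default: {default!r}")
```
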
@@ -0,0 +1,6 @@
{
    "time": "2024-08-29T22:13:54+00:00",
    "submitter": "keighrim",
    "image": "ghcr.io/clamsproject/app-whisper-wrapper:v10",
    "releasenotes": "This version adds some delegation parameters to whisper.transcribe\n\n- `task`: delegate to `--task`\n- `initialPrompt`: delegate to `--initial-prompt`\n- `conditionOnPreviousText`: delegate to `--condition-on-previous-text`\n- `noSpeechThreshold`: delegate to `--no-speech-threshold`\n\n"
}

docs/_data/app-index.json

+5-1
@@ -1,8 +1,12 @@
 {
   "http://apps.clams.ai/whisper-wrapper": {
     "description": "A CLAMS wrapper for Whisper-based ASR software originally developed by OpenAI.",
-    "latest_update": "2024-08-16T15:05:09+00:00",
+    "latest_update": "2024-08-29T22:13:54+00:00",
     "versions": [
+      [
+        "v10",
+        "keighrim"
+      ],
       [
         "v9",
         "keighrim"

docs/_data/apps.json

+1-1
Large diffs are not rendered by default.
