| 1 | +--- |
| 2 | +layout: posts |
| 3 | +classes: wide |
| 4 | +title: "Distil Whisper Wrapper (v1.2)" |
| 5 | +date: 2024-08-08T15:48:34+00:00 |
| 6 | +--- |
| 7 | +## About this version |
| 8 | + |
| 9 | +- Submitter: [BenLambright](https://github.com/BenLambright) |
| 10 | +- Submission Time: 2024-08-08T15:48:34+00:00 |
| 11 | +- Prebuilt Container Image: [ghcr.io/clamsproject/app-distil-whisper-wrapper:v1.2](https://github.com/clamsproject/app-distil-whisper-wrapper/pkgs/container/app-distil-whisper-wrapper/v1.2) |
| 12 | +- Release Notes |
| 13 | + |
| 14 | + > reverting back to HF pipeline using chunking transcription |
| 15 | + |
| 16 | +## About this app (See raw [metadata.json](metadata.json)) |
| 17 | + |
| 18 | +**A wrapper for Distil-Whisper. Available models: distil-large-v3, distil-large-v2, distil-medium.en, distil-small.en. The default model is distil-small.en.** |
| 19 | + |
| 20 | +- App ID: [http://apps.clams.ai/distil-whisper-wrapper/v1.2](http://apps.clams.ai/distil-whisper-wrapper/v1.2) |
| 21 | +- App License: Apache 2.0 |
| 22 | +- Source Repository: [https://github.com/clamsproject/app-distil-whisper-wrapper](https://github.com/clamsproject/app-distil-whisper-wrapper) ([source tree of the submitted version](https://github.com/clamsproject/app-distil-whisper-wrapper/tree/v1.2)) |
| 23 | +- Analyzer Version: 1.0 |
| 24 | +- Analyzer License: MIT |
| 25 | + |
| 26 | + |
| 27 | +#### Inputs |
| 28 | +(**Note**: "*" as a property value means that the property is required but can be any value.) |
| 29 | + |
| 30 | +One of the following is required: [ |
| 31 | +- [http://mmif.clams.ai/vocabulary/AudioDocument/v1](http://mmif.clams.ai/vocabulary/AudioDocument/v1) (required) |
| 32 | +(of any properties) |
| 33 | + |
| 34 | +- [http://mmif.clams.ai/vocabulary/VideoDocument/v1](http://mmif.clams.ai/vocabulary/VideoDocument/v1) (required) |
| 35 | +(of any properties) |
| 36 | + |
| 37 | + |
| 38 | + |
| 39 | +] |
| 40 | + |
| 41 | + |
| 42 | +#### Configurable Parameters |
| 43 | +(**Note**: _Multivalued_ means the parameter can have one or more values.) |
| 44 | + |
| 45 | +- `modelSize`: optional, defaults to `distil-small.en` |
| 46 | + |
| 47 | + - Type: string |
| 48 | + - Multivalued: False |
| 49 | + - Choices: `distil-large-v3`, `distil-large-v2`, `distil-medium.en`, **_`distil-small.en`_**, `small`, `s`, `medium`, `m`, `large-v2`, `l2`, `large-v3`, `l3` |
| 50 | + |
| 51 | + |
| 52 | + > The size of the model to use. Four models are available: distil-large-v3, distil-large-v2, distil-medium.en, and distil-small.en. You can also pass an abbreviation as the parameter value: 'small' or 's' for distil-small.en; 'medium' or 'm' for distil-medium.en; 'large-v2' or 'l2' for distil-large-v2; 'large-v3' or 'l3' for distil-large-v3. The default model is distil-small.en. |
| 53 | +- `pretty`: optional, defaults to `false` |
| 54 | + |
| 55 | + - Type: boolean |
| 56 | + - Multivalued: False |
| 57 | + - Choices: **_`false`_**, `true` |
| 58 | + |
| 59 | + |
| 60 | + > The JSON body of the HTTP response will be re-formatted with 2-space indentation |
| 61 | +- `runningTime`: optional, defaults to `false` |
| 62 | + |
| 63 | + - Type: boolean |
| 64 | + - Multivalued: False |
| 65 | + - Choices: **_`false`_**, `true` |
| 66 | + |
| 67 | + |
| 68 | + > The running time of the app will be recorded in the view metadata |
| 69 | +- `hwFetch`: optional, defaults to `false` |
| 70 | + |
| 71 | + - Type: boolean |
| 72 | + - Multivalued: False |
| 73 | + - Choices: **_`false`_**, `true` |
| 74 | + |
| 75 | + |
| 76 | + > The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata |
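The parameters above can be illustrated with a minimal sketch that resolves a `modelSize` abbreviation to its full model name and encodes the four documented parameters as a URL query string. The alias table is copied from the `modelSize` description; the function names, the base URL, and the assumption that parameters are passed as query-string values are illustrative and are not taken from the app's source.

```python
from urllib.parse import urlencode

# Alias table copied from the `modelSize` description above.
MODEL_ALIASES = {
    "small": "distil-small.en", "s": "distil-small.en",
    "medium": "distil-medium.en", "m": "distil-medium.en",
    "large-v2": "distil-large-v2", "l2": "distil-large-v2",
    "large-v3": "distil-large-v3", "l3": "distil-large-v3",
}
FULL_NAMES = set(MODEL_ALIASES.values())
DEFAULT_MODEL = "distil-small.en"  # default declared for `modelSize`


def resolve_model_size(value=None):
    """Return the full model name for a full name or an abbreviation."""
    if value is None:
        return DEFAULT_MODEL
    if value in FULL_NAMES:
        return value
    if value in MODEL_ALIASES:
        return MODEL_ALIASES[value]
    raise ValueError(f"unknown modelSize: {value!r}")


def build_query(model_size=None, pretty=False, running_time=False, hw_fetch=False):
    """Encode the four documented parameters as a URL query string."""
    params = {
        "modelSize": resolve_model_size(model_size),
        "pretty": str(pretty).lower(),
        "runningTime": str(running_time).lower(),
        "hwFetch": str(hw_fetch).lower(),
    }
    return urlencode(params)


# Hypothetical base URL; the real host and port depend on how the container is run.
BASE_URL = "http://localhost:5000"
print(f"{BASE_URL}/?{build_query('l3', pretty=True)}")
```

Resolving the abbreviation on the client side is optional, since the app accepts the short forms directly; it is done here only to make the mapping explicit.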
| 77 | + |
| 78 | + |
| 79 | +#### Outputs |
| 80 | +(**Note**: "*" as a property value means that the property is required but can be any value.) |
| 81 | + |
| 82 | +(**Note**: Not all output annotations are always generated.) |
| 83 | + |
| 84 | +- [http://mmif.clams.ai/vocabulary/TextDocument/v1](http://mmif.clams.ai/vocabulary/TextDocument/v1) |
| 85 | + - _@lang_ = "en" |
| 86 | + |
| 87 | + > Fully serialized text content of the recognized text in the input audio/video. |
| 88 | +- [http://mmif.clams.ai/vocabulary/TimeFrame/v5](http://mmif.clams.ai/vocabulary/TimeFrame/v5) |
| 89 | + - _timeUnit_ = "milliseconds" |
| 90 | + |
| 91 | +- [http://mmif.clams.ai/vocabulary/Alignment/v1](http://mmif.clams.ai/vocabulary/Alignment/v1) |
| 92 | +(of any properties) |
| 93 | + |
| 94 | + > Alignments between 1) `TimeFrame` <-> `SENTENCE`, 2) `audio/video document` <-> `TextDocument` |
| 95 | +- [http://vocab.lappsgrid.org/Sentence](http://vocab.lappsgrid.org/Sentence) |
| 96 | +(of any properties) |
| 97 | + |
| 98 | + > The smallest recognized unit of distil-whisper. Normally a complete sentence. |
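As a rough illustration of how the output types listed above might be inspected, the sketch below tallies annotations by type in an MMIF-like dictionary. It assumes the usual MMIF layout of a top-level `views` list whose entries each hold an `annotations` list with an `@type` URI per annotation. The sample data is hand-written and heavily simplified, not real output from this app, and the helper names and ids are hypothetical.

```python
import re
from collections import Counter

# Hand-written, heavily simplified stand-in for an output MMIF file.
# It is NOT real output from the app; the ids are illustrative only.
SAMPLE_MMIF = {
    "views": [
        {
            "id": "v_0",
            "annotations": [
                {"@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1",
                 "properties": {"id": "td_1"}},
                {"@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v5",
                 "properties": {"id": "tf_1"}},
                {"@type": "http://vocab.lappsgrid.org/Sentence",
                 "properties": {"id": "se_1"}},
                {"@type": "http://mmif.clams.ai/vocabulary/Alignment/v1",
                 "properties": {"id": "al_1"}},
                {"@type": "http://mmif.clams.ai/vocabulary/Alignment/v1",
                 "properties": {"id": "al_2"}},
            ],
        }
    ]
}


def short_type_name(type_uri):
    """Strip the URI prefix and any trailing version segment such as 'v5'."""
    parts = type_uri.rstrip("/").split("/")
    if re.fullmatch(r"v\d+", parts[-1]):
        return parts[-2]
    return parts[-1]


def count_annotation_types(mmif):
    """Count annotations per short type name across all views."""
    counts = Counter()
    for view in mmif.get("views", []):
        for annotation in view.get("annotations", []):
            counts[short_type_name(annotation["@type"])] += 1
    return dict(counts)


print(count_annotation_types(SAMPLE_MMIF))
```

Because not all output annotations are always generated, a tally like this is a quick way to check which of the four types a given run actually produced.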