@@ -9,6 +9,20 @@ your favorite Digital Audio Workstation.
- Lightweight and very fast transcription
- Can scale and time quantize transcribed MIDI directly in the plugin
+ ## Install NeuralNote
+
+ Download the latest release for your platform [here](https://github.com/DamRsn/NeuralNote/releases) (Windows and
+ Mac (Universal) supported)!
+
+ Currently, only the raw `.vst3`, `.component` (Audio Unit), `.app` and `.exe` (Standalone) files are provided.
+ Installers will be created soon. In the meantime, you can manually copy the plugin/app file into the appropriate
+ directory. Also, the code is not yet signed (it will be soon), so you might have to authorize the plugin in your
+ security settings, as it currently comes from an unidentified developer.
+
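Until installers exist, the copy step can be scripted. A minimal sketch for macOS, assuming the stock per-user plug-in locations (these are standard VST3/AU conventions, not paths from the NeuralNote docs; on Windows, VST3 plugins typically go under `C:\Program Files\Common Files\VST3`):

```shell
# Standard per-user plug-in folders on macOS (assumed stock locations,
# not NeuralNote-specific paths).
VST3_DIR="$HOME/Library/Audio/Plug-Ins/VST3"
AU_DIR="$HOME/Library/Audio/Plug-Ins/Components"
mkdir -p "$VST3_DIR" "$AU_DIR"

# From the folder holding the downloaded bundles:
#   cp -R NeuralNote.vst3 "$VST3_DIR/"
#   cp -R NeuralNote.component "$AU_DIR/"
# If macOS blocks the unsigned plugin, clearing the quarantine flag is one option:
#   xattr -dr com.apple.quarantine "$VST3_DIR/NeuralNote.vst3"

echo "VST3 -> $VST3_DIR"
echo "AU   -> $AU_DIR"
```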
+ ## Usage
+
+ ![UI](NeuralNote_UI.png)
+
NeuralNote comes as a simple AudioFX plugin (VST3/AU/Standalone app) to be applied on the track to transcribe.
The workflow is very simple:
@@ -19,26 +33,18 @@ The workflow is very simple:
- The MIDI transcription instantly appears in the piano roll section. Play with the different settings to adjust it.
- Export the MIDI transcription with a simple drag and drop from the plugin to a MIDI track.
+ **Watch our presentation video for the Neural Audio Plugin
+ competition [here](https://www.youtube.com/watch?v=6_MC0_aG_DQ)**.
+
NeuralNote internally uses the model from Spotify's [basic-pitch](https://github.com/spotify/basic-pitch). See
their [blogpost](https://engineering.atspotify.com/2022/06/meet-basic-pitch/)
- and [paper](https://arxiv.org/abs/2203.09893) for more information.
-
- In NeuralNote, basic-pitch is run
+ and [paper](https://arxiv.org/abs/2203.09893) for more information. In NeuralNote, basic-pitch is run
using [RTNeural](https://github.com/jatinchowdhury18/RTNeural) for the CNN part
and [ONNXRuntime](https://github.com/microsoft/onnxruntime) for the feature part (Constant-Q transform calculation +
Harmonic Stacking).
As part of this project, [we contributed to RTNeural](https://github.com/jatinchowdhury18/RTNeural/pull/89) to add 2D
convolution support.
- ## Install and use the plugin
-
- To simply install and start to use the plugin right away, download the latest release for your platform! (Windows and
- Mac (Universal) supported)
-
- Currently, only the .vst3, .component (Audio Unit), .app and .exe files are provided. Installers will be created soon.
- Also, the code is not yet signed (will be soon), so you might have to authorize the plugin in your security settings, as
- it currently comes from an unidentified developer.
-
## Build from source
Use this when cloning:
@@ -65,23 +71,40 @@ with [ort-builder](https://github.com/olilarkin/ort-builder)) before calling CMa
#### IDEs
- Once the build script corresponding as been executed at least once, you can load this project in your favorite IDE
+ Once the build script has been executed at least once, you can load this project in your favorite IDE
(CLion/Visual Studio/VSCode/etc) and click 'build' for one of the targets.
+ ## Reuse code from NeuralNote's transcription engine
+
+ All the code to perform the transcription is in `Lib/Model` and all the model weights are in `Lib/ModelData/`. Feel free
+ to use only this part of the code in your own project! We'll try to isolate it further from the rest of the repo in the
+ future and make it a library.
+
+ The code to generate the files in `Lib/ModelData/` is not currently available, as it required a lot of manual operations.
+ But here's a description of the process we followed to create those files:
+
+ - `features_model.onnx` was generated by converting a Keras model containing only the CQT + Harmonic Stacking part of
+   the full basic-pitch graph using `tf2onnx` (with manually added weights for batch normalization).
+ - The `.json` files containing the weights of the basic-pitch CNN were generated from the TensorFlow.js model available
+   in the [basic-pitch-ts repository](https://github.com/spotify/basic-pitch-ts), then converted to ONNX with `tf2onnx`.
+   The weights were then extracted manually to `.npy` files with [Netron](https://netron.app/) and applied to a
+   split Keras model created with [basic-pitch](https://github.com/spotify/basic-pitch) code.
+
+ The original basic-pitch CNN was split into 4 sequential models wired together so that they can be run with RTNeural.
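For context, converting a saved TensorFlow/Keras model to ONNX is typically done with the `tf2onnx` command-line tool. A generic sketch only; the paths and opset below are placeholders, and the manual batch-normalization weight edits described above are not covered by this command:

```shell
# Generic tf2onnx invocation (requires `pip install tf2onnx tensorflow`).
# `path/to/saved_model` and the opset are illustrative placeholders,
# not NeuralNote's actual conversion setup.
python -m tf2onnx.convert \
    --saved-model path/to/saved_model \
    --output features_model.onnx \
    --opset 13
```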
+
## Roadmap
- - Improve stability.
- - Save plugin internal state properly, so it can be loaded back when reentering a session.
+ - Improve stability
+ - Save plugin internal state properly, so it can be loaded back when reentering a session
- Add tooltips
- Build a simple synth in the plugin so that one can listen to the transcription while playing with the settings, before
- export.
- - Allow pitch bends on non-overlapping parts of overlapping notes.
+ export
+ - Allow pitch bends on non-overlapping parts of overlapping notes
- Support transcription of mp3 files
## Bug reports and feature requests
- If you have any request/suggestion concerning the plugin or encounter a bug, please fill a Github issue, we'll
- do our best to address it.
+ If you have any request/suggestion concerning the plugin or encounter a bug, please file a GitHub issue.
## Contributing
@@ -103,6 +126,18 @@ Here's a list of all the third party libraries used in NeuralNote and the licens
- [basic-pitch](https://github.com/spotify/basic-pitch) (Apache-2.0 license)
- [basic-pitch-ts](https://github.com/spotify/basic-pitch-ts) (Apache-2.0 license)
+ ## Could NeuralNote transcribe audio in real-time?
+
+ Unfortunately no, for a few reasons:
+
+ - Basic Pitch uses the Constant-Q transform (CQT) as its input feature. The CQT requires very long audio chunks (> 1 s)
+   to get amplitudes for the lowest frequency bins. This makes the latency too high for real-time transcription.
+ - The Basic Pitch CNN adds a further latency of approximately 120 ms.
+ - Very few DAWs support audio-input/MIDI-output plugins, as far as I know. This is partly why NeuralNote is an
+   Audio FX plugin (audio-to-audio) and why MIDI is exported via drag and drop.
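As a rough sanity check on the first point: the CQT analysis window for a bin at frequency f_k spans N_k = Q * f_s / f_k samples, so the window duration for the lowest bin is Q / f_min seconds. Assuming illustrative values of B = 36 bins per octave and a lowest bin near C1 (about 32.7 Hz), which are not necessarily Basic Pitch's exact settings, the lowest bin alone needs over 1.5 s of audio:

```latex
Q = \left(2^{1/B} - 1\right)^{-1} \approx 51.4 \quad (B = 36), \qquad
\frac{N_{\min}}{f_s} = \frac{Q}{f_{\min}} \approx \frac{51.4}{32.7\ \text{Hz}} \approx 1.6\ \text{s}
```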
+
+ But if you have ideas, please share!
+
## Credits
NeuralNote was developed by [Damien Ronssin](https://github.com/DamRsn) and [Tibor Vass](https://github.com/tiborvass).