- Lightweight and very fast transcription
- Can scale and time quantize transcribed MIDI directly in the plugin

## Install NeuralNote

Download the latest release for your platform [here](https://github.com/DamRsn/NeuralNote/releases)! Windows and Mac
(Universal) are supported.

Currently, only the raw `.vst3`, `.component` (Audio Unit), `.app` and `.exe` (Standalone) files are provided.
Installers will be created soon. In the meantime, you can manually copy the plugin/app file to the appropriate
directory. Also, the code is not yet signed (it will be soon), so you might have to authorize the plugin in your
security settings, as it currently comes from an unidentified developer.

## Usage

![UI](NeuralNote_UI.png)

NeuralNote comes as a simple AudioFX plugin (VST3/AU/Standalone app) to be applied on the track to transcribe.

The workflow is very simple:
- The MIDI transcription instantly appears in the piano roll section. Play with the different settings to adjust it.
- Export the MIDI transcription with a simple drag and drop from the plugin to a MIDI track.

**Watch our presentation video for the Neural Audio Plugin
competition [here](https://www.youtube.com/watch?v=6_MC0_aG_DQ)**.

Internally, NeuralNote uses the model from Spotify's [basic-pitch](https://github.com/spotify/basic-pitch). See
their [blogpost](https://engineering.atspotify.com/2022/06/meet-basic-pitch/)
and [paper](https://arxiv.org/abs/2203.09893) for more information. In NeuralNote, basic-pitch is run
using [RTNeural](https://github.com/jatinchowdhury18/RTNeural) for the CNN part
and [ONNXRuntime](https://github.com/microsoft/onnxruntime) for the feature part (Constant-Q transform calculation +
Harmonic Stacking).
As part of this project, [we contributed to RTNeural](https://github.com/jatinchowdhury18/RTNeural/pull/89) to add 2D
convolution support.
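To make this split concrete, here is a minimal, hypothetical sketch of running an exported feature model with ONNX
Runtime's C++ API. The model path and the tensor names (`audio`, `features`) are placeholders rather than the names
NeuralNote actually uses; the real inference code lives in `Lib/Model`.

```cpp
// Minimal ONNX Runtime (C++ API) sketch: run a feature-extraction model on a mono audio buffer.
// Tensor names, shapes and the model path are hypothetical placeholders, not NeuralNote's real ones.
#include <onnxruntime_cxx_api.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main()
{
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "features-example");
    Ort::SessionOptions options;
    options.SetIntraOpNumThreads(1);

    // On Windows the path must be a wide string (e.g. L"features_model.onnx").
    Ort::Session session(env, "features_model.onnx", options);

    // Dummy mono input: 2 seconds of silence at 22050 Hz.
    std::vector<float> audio(2 * 22050, 0.0f);
    const std::vector<int64_t> shape = {1, static_cast<int64_t>(audio.size())};

    auto memInfo = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input = Ort::Value::CreateTensor<float>(memInfo, audio.data(), audio.size(),
                                                       shape.data(), shape.size());

    // Placeholder tensor names: query the real ones with session.GetInputNameAllocated(...).
    const char* inputNames[] = {"audio"};
    const char* outputNames[] = {"features"};

    auto outputs = session.Run(Ort::RunOptions {nullptr}, inputNames, &input, 1, outputNames, 1);

    auto info = outputs[0].GetTensorTypeAndShapeInfo();
    std::printf("Feature tensor with %zu dimensions\n", info.GetShape().size());
    return 0;
}
```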
## Build from source

Use this when cloning:
#### IDEs

Once the build script has been executed at least once, you can load this project in your favorite IDE
(CLion/Visual Studio/VSCode/etc.) and click 'build' for one of the targets.

## Reuse code from NeuralNote’s transcription engine

All the code to perform the transcription is in `Lib/Model` and all the model weights are in `Lib/ModelData/`. Feel
free to use only this part of the code in your own project! We'll try to isolate it more from the rest of the repo in
the future and make it a library.

The code to generate the files in `Lib/ModelData/` is not currently available, as it required a lot of manual
operations. But here's a description of the process we followed to create those files:

- `features_model.onnx` was generated by converting a Keras model containing only the CQT + Harmonic Stacking part of
  the full basic-pitch graph using `tf2onnx` (with manually added weights for batch normalization).
- The `.json` files containing the weights of the basic-pitch CNN were generated from the TensorFlow.js model available
  in the [basic-pitch-ts repository](https://github.com/spotify/basic-pitch-ts), then converted to ONNX with `tf2onnx`.
  The weights were then gathered manually into `.npy` files thanks to [Netron](https://netron.app/) and applied to a
  split Keras model created with the [basic-pitch](https://github.com/spotify/basic-pitch) code.

The original basic-pitch CNN was split into 4 sequential models wired together, so they can be run with RTNeural.
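To illustrate what "wired together" means, here is a hypothetical RTNeural sketch that chains sub-models by feeding
each one's output buffer into the next. The file names, the use of RTNeural's json model format and the purely
sequential wiring are assumptions made for illustration; NeuralNote's actual loading and wiring code is in `Lib/Model`.

```cpp
// Sketch: chain several sequential RTNeural sub-models, feeding each one's output to the next.
// File names and the use of RTNeural's json model format are assumptions for illustration only.
#include <RTNeural/RTNeural.h>
#include <cstddef>
#include <fstream>
#include <memory>
#include <vector>

int main()
{
    // Hypothetical file names for four split sub-models.
    const char* files[] = {"cnn_part_1.json", "cnn_part_2.json", "cnn_part_3.json", "cnn_part_4.json"};

    std::vector<std::unique_ptr<RTNeural::Model<float>>> models;
    for (const char* f : files)
    {
        std::ifstream stream(f, std::ifstream::binary);
        models.push_back(RTNeural::json_parser::parseJson<float>(stream));
        models.back()->reset();
    }

    // Placeholder input frame; its size must match the first sub-model's input size.
    constexpr size_t kNumInputs = 8;
    std::vector<float> frame(kNumInputs, 0.0f);

    // Feed each sub-model's output buffer to the next one (the real wiring is somewhat more involved).
    const float* data = frame.data();
    for (auto& model : models)
    {
        model->forward(data);
        data = model->getOutputs();
    }
    return 0;
}
```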
## Roadmap

- Improve stability
- Save plugin internal state properly, so it can be loaded back when reentering a session
- Add tooltips
- Build a simple synth in the plugin so that one can listen to the transcription while playing with the settings,
  before export
- Allow pitch bends on non-overlapping parts of overlapping notes
- Support transcription of mp3 files
## Bug reports and feature requests

If you have any request/suggestion concerning the plugin or encounter a bug, please file a GitHub issue.

## Contributing
Here's a list of all the third-party libraries used in NeuralNote and their licenses:
- [basic-pitch](https://github.com/spotify/basic-pitch) (Apache-2.0 license)
- [basic-pitch-ts](https://github.com/spotify/basic-pitch-ts) (Apache-2.0 license)

## Could NeuralNote transcribe audio in real-time?

Unfortunately no, for a few reasons:

- Basic Pitch uses the Constant-Q transform (CQT) as its input feature. The CQT requires really long audio chunks
  (> 1 s) to get amplitudes for the lowest frequency bins, which makes the latency too high for real-time
  transcription (see the rough calculation below).
- The basic-pitch CNN adds a latency of approximately 120 ms.
- Very few DAWs support plugins with audio input and MIDI output, as far as I know. This is partially why NeuralNote
  is an Audio FX plugin (audio-to-audio) and why MIDI is exported via drag and drop.
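As a rough back-of-the-envelope check of the first point (the exact CQT parameters of basic-pitch may differ), a
constant-Q bin centred on frequency `f` needs roughly `Q / f` seconds of audio, with `Q ≈ 1 / (2^(1/B) - 1)` for `B`
bins per octave:

```cpp
// Back-of-the-envelope estimate of the audio window a CQT needs for its lowest bin.
// The bin spacing and lowest frequency below are assumptions, not basic-pitch's exact parameters.
#include <cmath>
#include <cstdio>

int main()
{
    const double binsPerOctave = 36.0; // assumed CQT resolution (3 bins per semitone)
    const double fMin = 32.7;          // assumed lowest bin, roughly C1, in Hz
    const double Q = 1.0 / (std::pow(2.0, 1.0 / binsPerOctave) - 1.0); // ~51 cycles per bin
    std::printf("lowest bin needs ~%.2f s of audio\n", Q / fMin);      // ~1.6 s, hence the > 1 s chunks
    return 0;
}
```

With these assumed values, the lowest bin alone already needs about 1.6 s of audio, far beyond typical real-time
buffer sizes.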
But if you have ideas, please share!

## Credits

NeuralNote was developed by [Damien Ronssin](https://github.com/DamRsn) and [Tibor Vass](https://github.com/tiborvass).