Speech enhancement is the task of removing distorting sounds from speech recordings, for example when someone is talking in a crowded place with a lot of background noise, as in this picture:
Deep neural networks can considerably reduce the noise in such recordings while preserving the speech signal, which is why speech enhancement is currently an active field of research.
We are happy to announce that Open-Unmix, an open-source implementation developed by INRIA in close collaboration with Sony, now also supports speech enhancement. Although it was originally developed for music separation, the same approach carries over to separating speech from background noise.
We trained Open-Unmix on the VoiceBank+DEMAND corpus (28-speaker version, 16 kHz sampling rate), and the trained ONNX model is available here. It achieves the following scores on the VoiceBank+DEMAND test set: