Côme Peladeau and Geoffroy Peeters
LTCI, Télécom Paris, Institut Polytechnique de Paris, France.
Accepted to ICASSP 2024
Blind estimation of audio effects (BE-AFX) aims at estimating the audio effects (AFX) applied to an original, unprocessed audio sample solely from the processed audio sample. To train such a system, traditional approaches optimize a loss between ground-truth and estimated AFX parameters. This requires knowing the exact implementation of the AFX used for processing. In this work, we propose an alternative solution that removes this requirement. Instead, we introduce an auto-encoder approach, which optimizes an audio quality metric. We explore, suggest, and compare various implementations of commonly used mastering AFXs, using differentiable signal processing or neural approximations. Our findings demonstrate that our auto-encoder approach yields better estimates of the audio quality produced by a chain of AFXs than the traditional parameter-based approach, even though the latter provides a more accurate parameter estimation.
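To make the contrast between the two training objectives concrete, here is a minimal NumPy sketch. It assumes the audio quality metric is a multi-resolution STFT distance (spectral convergence plus log-magnitude L1); the metric actually used in the paper may differ. `encoder` and `afx` are hypothetical callables: in the auto-encoder view, the encoder predicts parameters from the processed audio and a differentiable AFX (or a neural proxy) acts as the decoder by re-applying them to the unprocessed input. NumPy computes no gradients, so this only shows how each loss is formed; actual training would be written in an autodiff framework so gradients can flow through the AFX into the encoder.

```python
import numpy as np


def parameter_loss(p_true, p_est):
    # Traditional objective: MSE between ground-truth and estimated AFX parameters.
    # Needs the exact parameters, hence the exact AFX implementation.
    return float(np.mean((np.asarray(p_true) - np.asarray(p_est)) ** 2))


def stft_mag(x, n_fft, hop):
    # Magnitude STFT with a Hann window (trailing samples that do not fill a frame are dropped).
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))


def multi_res_stft_loss(y_true, y_est, fft_sizes=(256, 512, 1024)):
    # Assumed audio metric: spectral convergence + log-magnitude L1,
    # averaged over several STFT resolutions.
    total = 0.0
    for n_fft in fft_sizes:
        s_true = stft_mag(y_true, n_fft, n_fft // 4)
        s_est = stft_mag(y_est, n_fft, n_fft // 4)
        sc = np.linalg.norm(s_true - s_est) / (np.linalg.norm(s_true) + 1e-8)
        log_l1 = np.mean(np.abs(np.log(s_true + 1e-7) - np.log(s_est + 1e-7)))
        total += sc + log_l1
    return float(total / len(fft_sizes))


def autoencoder_objective(x, y, encoder, afx):
    # x: unprocessed audio; y: processed audio (the only input the encoder sees).
    # encoder (hypothetical) predicts AFX parameters from y; afx (hypothetical,
    # DSP or neural proxy) "decodes" them by re-applying the effect to x.
    # The loss compares audio, so no ground-truth parameters are required.
    p_est = encoder(y)
    y_hat = afx(x, p_est)
    return multi_res_stft_loss(y, y_hat)


# Toy usage: a one-parameter gain stands in for a real effect.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
gain = lambda sig, p: sig * p[0]
y = gain(x, [0.5])
print(autoencoder_objective(x, y, lambda _: [0.8], gain))
```

With a perfect encoder the audio loss is zero; any parameter error shows up only through its audible effect, which is the property the auto-encoder approach relies on.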
The audio clips labelled "Input" are the unprocessed mixes, those labelled "Ground truth" are the processed audio to be reproduced, and the others are estimations of the ground truth. The figures were drawn using the first row of each table.
The figure shows EQ curves estimated by neural networks using either a parametric or a graphic EQ. Both networks were trained with our approach. Both replicate the general shape of the curve but fail to match its sharp peaks and notches.
Input | Ground truth | Parametric EQ | Graphic EQ |
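The difference between the two EQ types can be sketched with peaking biquads from the RBJ "Audio EQ Cookbook". This is an illustrative assumption, not necessarily the implementation used here: in the parametric EQ every band has a free center frequency, gain and Q, whereas in the hypothetical graphic EQ below (`GRAPHIC_CENTERS`, ten octave bands with a fixed Q of about 1.41) the network would predict only one gain per band. The magnitude response is evaluated directly on the unit circle.

```python
import numpy as np


def peaking_biquad(f0, gain_db, q, fs):
    # Peaking-EQ biquad from the RBJ Audio EQ Cookbook, normalized so that a[0] = 1.
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b / a[0], a / a[0]


def biquad_response_db(b, a, freqs, fs):
    # Evaluate H(z) on the unit circle with z^-1 = exp(-j * 2*pi*f / fs).
    zinv = np.exp(-2j * np.pi * np.asarray(freqs, dtype=float) / fs)
    h = (b[0] + b[1] * zinv + b[2] * zinv**2) / (a[0] + a[1] * zinv + a[2] * zinv**2)
    return 20.0 * np.log10(np.abs(h))


def parametric_eq_db(bands, freqs, fs):
    # Parametric EQ: each band is a free (f0, gain_db, q) triple.
    # Cascaded filters multiply, so their dB responses add.
    total = np.zeros(len(freqs))
    for f0, gain_db, q in bands:
        b, a = peaking_biquad(f0, gain_db, q, fs)
        total += biquad_response_db(b, a, freqs, fs)
    return total


# Hypothetical graphic EQ layout: 10 fixed octave bands from 31.25 Hz to 16 kHz.
GRAPHIC_CENTERS = 31.25 * 2.0 ** np.arange(10)


def graphic_eq_db(gains_db, freqs, fs, q=1.41):
    # Only the per-band gains are free; centers and Q are fixed.
    bands = [(f0, g, q) for f0, g in zip(GRAPHIC_CENTERS, gains_db)]
    return parametric_eq_db(bands, freqs, fs)
```

Under this layout, a notch narrower than one octave that falls between two fixed centers cannot be reproduced exactly, which is one plausible contributor to the missed peaks and notches described above.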
Here are the listening examples of dynamic range compression estimation.
Input | Ground truth | Neural proxy | Hybrid proxy | Simplified compressor |
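The exact design of the "Simplified compressor" is not described on this page. As a minimal sketch, assuming a hard-knee, feed-forward compressor with an instantaneous level detector and one-pole attack/release smoothing of the gain reduction, the processing could look like this (`simple_compressor` and its default values are hypothetical):

```python
import numpy as np


def simple_compressor(x, fs, threshold_db=-20.0, ratio=4.0,
                      attack_ms=5.0, release_ms=50.0, makeup_db=0.0):
    # Instantaneous level in dB (assumed detector; real designs often smooth it first).
    level_db = 20.0 * np.log10(np.abs(x) + 1e-10)
    # Static hard-knee gain computer: target gain reduction in dB (<= 0).
    over = np.maximum(level_db - threshold_db, 0.0)
    target_gr = -over * (1.0 - 1.0 / ratio)
    # One-pole smoothing: attack when the gain reduction deepens, release otherwise.
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    gr = np.empty_like(target_gr)
    state = 0.0
    for n, g in enumerate(target_gr):
        coeff = a_att if g < state else a_rel
        state = coeff * state + (1.0 - coeff) * g
        gr[n] = state
    # Apply the smoothed gain reduction plus make-up gain.
    return x * 10.0 ** ((gr + makeup_db) / 20.0)
```

For example, a constant full-scale input with the defaults settles at a gain reduction of (0 - (-20)) × (1 - 1/4) = 15 dB. The sample-by-sample recursion is slow to differentiate through in a deep-learning framework, which is one reason proxies are attractive for this effect.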
The figure below shows the harmonic patterns produced by each effect when applied to a sine wave of amplitude 1 (0 dBFS). Note that the effects are memoryless, so their harmonic patterns do not depend on the input signal's frequency (except for aliasing). Using the same model for estimation as the one used for synthesis facilitates the estimation. It also allows for better matching of the upper harmonics without requiring too many parameters.
Input | Ground truth | Soft clipper | Taylor | Chebyshev |
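Why a Chebyshev parameterization matches upper harmonics with few parameters can be checked numerically: since T_k(cos θ) = cos(kθ), applying Σ c_k T_k to an amplitude-1 sinusoid puts exactly c_k into harmonic k. The sketch below assumes that "Taylor" denotes a plain power-series polynomial and uses tanh as a stand-in soft clipper; the coefficient values are illustrative only.

```python
import numpy as np
from numpy.polynomial import chebyshev, polynomial

N, CYCLES = 4096, 8  # integer number of periods, so no spectral leakage


def harmonic_amplitudes(y, n_harmonics, cycles=CYCLES):
    # Amplitudes of harmonics 1..n_harmonics; harmonic k sits in FFT bin k * cycles.
    spectrum = 2.0 * np.abs(np.fft.rfft(y)) / len(y)
    return spectrum[[k * cycles for k in range(1, n_harmonics + 1)]]


t = np.arange(N) / N
x = np.cos(2.0 * np.pi * CYCLES * t)  # amplitude-1 (0 dBFS) sinusoid

coeffs = [0.0, 1.0, 0.0, 0.3, 0.0, 0.1]  # c_0 .. c_5 (illustrative values)

# Chebyshev: T_k(cos θ) = cos(kθ), so c_k is exactly the amplitude of harmonic k.
y_cheb = chebyshev.chebval(x, coeffs)
# Power series with the same coefficients: x**3 = (3 cos θ + cos 3θ) / 4, so
# higher powers also feed the lower harmonics.
y_taylor = polynomial.polyval(x, coeffs)
# Soft clipper: tanh is an odd function, so only odd harmonics appear.
y_soft = np.tanh(2.0 * x)

print(harmonic_amplitudes(y_cheb, 5))
print(harmonic_amplitudes(y_taylor, 5))
print(harmonic_amplitudes(y_soft, 6))
```

Analytically, the Chebyshev output has harmonics (1, 0, 0.3, 0, 0.1), while the power series with the same coefficients gives (1.2875, 0, 0.10625, 0, 0.00625), so its coefficients do not map one-to-one onto harmonics. The exact Chebyshev mapping holds only at amplitude 1; at other input levels the harmonic pattern changes.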
Here are the listening examples for the estimation of the entire AFX chain.
Input | Ground truth | Estimation |