Blind estimation of audio effects using an auto-encoder approach and differentiable signal processing

Côme Peladeau and Geoffroy Peeters

LTCI, Télécom Paris, Institut Polytechnique de Paris, France.

Accepted to ICASSP 2024


Abstract

Blind estimation of audio effects (BE-AFX) aims at estimating the audio effects (AFX) applied to an original, unprocessed audio sample solely based on the processed audio sample. To train such a system, traditional approaches optimize a loss between ground truth and estimated AFX parameters. This involves knowing the exact implementation of the AFX used for the processing. In this work, we propose an alternative solution that eliminates the requirement for knowing this implementation. Instead, we introduce an auto-encoder approach, which optimizes an audio quality metric. We explore, suggest, and compare various implementations of commonly used mastering AFXs, using differentiable signal processing or neural approximations. Our findings demonstrate that our auto-encoder approach yields superior estimates of the audio quality produced by a chain of AFXs, compared to the traditional parameter-based approach, even if the latter provides a more accurate parameter estimation.

Proposed approach

The audio clips labelled "Input" are the unprocessed mixes, those labelled "Ground Truth" are the processed clips to be reproduced, and the remaining ones are estimates of the ground truth. The figures were drawn using the first row of each table.
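The two training objectives contrasted in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the one-parameter `apply_gain` effect, the log-magnitude spectral distance standing in for the audio quality metric, and the grid search standing in for gradient-based training of an encoder. It only shows that the audio-domain objective can be evaluated from the input and the processed audio alone, whereas the parameter-based objective needs the ground-truth parameter.

```python
import numpy as np

def apply_gain(x, gain_db):
    """Hypothetical one-parameter effect: a plain gain expressed in dB."""
    return x * 10.0 ** (gain_db / 20.0)

def parameter_loss(p_est, p_true):
    """Parameter-based objective: requires the ground-truth parameter."""
    return float(np.mean((np.asarray(p_est) - np.asarray(p_true)) ** 2))

def audio_loss(y_est, y_true, eps=1e-8):
    """Audio-domain objective (assumed stand-in for an audio quality metric):
    mean absolute difference between log-magnitude spectra."""
    s_est = np.log(np.abs(np.fft.rfft(y_est)) + eps)
    s_true = np.log(np.abs(np.fft.rfft(y_true)) + eps)
    return float(np.mean(np.abs(s_est - s_true)))

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)        # unprocessed "Input"
true_gain_db = 6.0                   # never seen by the audio objective
y = apply_gain(x, true_gain_db)      # processed "Ground Truth"

# Grid search over candidate gains (0.5 dB steps), scored on audio only.
candidates = np.linspace(-12.0, 12.0, 49)
losses = [audio_loss(apply_gain(x, g), y) for g in candidates]
best_gain_db = float(candidates[int(np.argmin(losses))])
```

In this toy case the effect used for estimation is the same as the one that produced the ground truth, so minimising the audio loss also recovers the parameter. With a different or approximate effect model, the two objectives need not agree.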


Equalization estimation

The figure shows EQ curves estimated by neural networks using either a parametric or a graphic EQ. Both networks were trained using our approach. They both replicate the general shape of the curve, but fail to match sharp peaks and notches.

Matching an EQ curve
Ground truth and estimated EQ curves.
Audio table columns: Input, Ground truth, Parametric EQ, Graphic EQ
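To make the difference between the two EQ types concrete, here is a minimal sketch of how an EQ curve can be built from peaking bands. It assumes the widely used "Audio EQ Cookbook" peaking biquad, a sample rate of 44.1 kHz, and made-up band settings; none of this is taken from the paper's implementation. In this sketch a parametric band has a free centre frequency, gain and Q, while a graphic EQ keeps centre frequencies and Q fixed and only varies the gains.

```python
import numpy as np

def peaking_biquad(f0, gain_db, q, fs):
    """Peaking-filter coefficients (Audio EQ Cookbook form)."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b, a

def magnitude_db(b, a, freqs, fs):
    """Magnitude response in dB of one biquad at the given frequencies."""
    z1 = np.exp(-1j * 2.0 * np.pi * np.asarray(freqs) / fs)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return 20.0 * np.log10(np.abs(num / den))

def eq_curve_db(bands, freqs, fs):
    """Bands in series: their dB responses add up."""
    total = np.zeros(len(freqs))
    for f0, gain_db, q in bands:
        b, a = peaking_biquad(f0, gain_db, q, fs)
        total += magnitude_db(b, a, freqs, fs)
    return total

fs = 44100.0
freqs = np.array([100.0, 1000.0, 10000.0])

# Parametric EQ: centre frequency, gain and Q are all free parameters.
parametric = [(1000.0, 6.0, 2.0)]
# Graphic EQ: fixed centres and Q (illustrative values), only gains vary.
graphic = [(f, g, 1.4) for f, g in zip([125.0, 1000.0, 8000.0], [0.0, 6.0, 0.0])]

parametric_curve = eq_curve_db(parametric, freqs, fs)
graphic_curve = eq_curve_db(graphic, freqs, fs)
```

Each band contributes a smooth bell-shaped bump or dip centred on its frequency, so the overall curve is a sum of a small number of such shapes.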

Dynamic range compression estimation

Here are the listening examples of dynamic range compression estimation.

Audio table columns: Input, Ground truth, Neural proxy, Hybrid proxy, Simplified compressor

Distortion estimation

The figure below shows the harmonic patterns produced by each effect when applied to a sine wave of amplitude 1 (0 dBFS). Because the effects are memoryless, these patterns do not depend on the input signal's frequency, except when aliasing occurs. Estimation is easier when the estimator uses the same model as the one used for synthesis. This also allows the upper harmonics to be matched more closely without requiring many parameters.

Harmonic patterns of ground truth and estimated memoryless distortions.
Audio table columns: Input, Ground truth, Soft clipper, Taylor, Chebyshev
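The harmonic patterns discussed in this section can be reproduced numerically with a short sketch. It rests on assumptions not stated on this page: a `tanh` curve with an arbitrary drive stands in for the soft clipper, the Chebyshev distortion is taken to be a weighted sum of Chebyshev polynomials, and the coefficient values are made up. The sketch relies on the identity T_m(cos θ) = cos(mθ), which means that, for a unit-amplitude sine, the weight of the m-th Chebyshev polynomial is the amplitude of the m-th harmonic.

```python
import numpy as np
from numpy.polynomial import chebyshev

N = 4096  # analysis length; sines are placed exactly on FFT bins

def unit_sine(k0):
    """Amplitude-1 sine completing k0 periods over N samples."""
    n = np.arange(N)
    return np.sin(2.0 * np.pi * k0 * n / N)

def harmonic_amplitudes(y, k0, n_harmonics):
    """Amplitudes of harmonics 1..n_harmonics of the fundamental bin k0."""
    spectrum = np.abs(np.fft.rfft(y)) * 2.0 / N
    return spectrum[k0 * np.arange(1, n_harmonics + 1)]

def soft_clip(x, drive=3.0):
    """Assumed soft clipper: tanh waveshaper (memoryless)."""
    return np.tanh(drive * x)

def chebyshev_distortion(x, coeffs):
    """Weighted sum of Chebyshev polynomials T_0..T_M applied per sample."""
    return chebyshev.chebval(x, coeffs)

# Illustrative weights for T_0..T_5 (hypothetical values).
coeffs = [0.0, 1.0, 0.3, 0.2, 0.0, 0.05]
cheb_harmonics = harmonic_amplitudes(
    chebyshev_distortion(unit_sine(16), coeffs), 16, 5)

# Same soft clipper, two input frequencies: the harmonic pattern is unchanged
# as long as the harmonics stay well below the Nyquist bin.
clip_low = harmonic_amplitudes(soft_clip(unit_sine(16)), 16, 5)
clip_high = harmonic_amplitudes(soft_clip(unit_sine(32)), 32, 5)
```

Here `cheb_harmonics` matches `coeffs[1:]`, and `clip_low` matches `clip_high`, which illustrates the frequency independence of a memoryless effect. The `tanh` curve is an odd function, so it produces only odd harmonics.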

Effects chain estimation

Here are the listening examples for the entire AFX chain estimation.

Audio table columns: Input, Ground truth, Estimation