Peladeau - ICASSP 2024

Abstract

Blind estimation of audio effects (BE-AFX) aims to estimate the audio effects (AFX) applied to an original, unprocessed audio sample solely from the processed audio sample. To train such a system, traditional approaches optimize a loss between ground-truth and estimated AFX parameters. This requires knowing the exact implementation of the AFX used to process the sample. In this work, we propose an alternative solution that removes this requirement. Instead, we introduce an auto-encoder approach that optimizes an audio quality metric. We explore, propose, and compare various implementations of commonly used mastering AFXs, using differentiable signal processing or neural approximations. Our findings show that our auto-encoder approach yields better estimates of the audio quality produced by a chain of AFXs than the traditional parameter-based approach, even though the latter provides more accurate parameter estimates.
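To make the difference between the two training objectives concrete, here is a minimal NumPy sketch of the forward computation only. Everything in it is an assumption for illustration: `estimate_params` stands in for the estimation network (the encoder), `afx` for a differentiable or proxy effect (the decoder), and a log-magnitude spectrogram distance stands in for the audio quality metric, which is not specified on this page. Actual training would need an autodiff framework to back-propagate through `afx`; NumPy only shows what is being compared.

```python
import numpy as np

def log_spectrogram(x, n_fft=512, hop=128, eps=1e-7):
    """Log-magnitude STFT with a Hann window (frames that do not fit are dropped)."""
    window = np.hanning(n_fft)
    frames = np.stack([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    return np.log(np.abs(np.fft.rfft(frames, axis=-1)) + eps)

def audio_loss(y_hat, y):
    """Stand-in audio quality metric (assumption): mean absolute log-spectral distance."""
    return float(np.mean(np.abs(log_spectrogram(y_hat) - log_spectrogram(y))))

def parameter_loss(p_hat, p):
    """Traditional objective: mean squared error between parameter vectors."""
    p_hat = np.asarray(p_hat, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.mean((p_hat - p) ** 2))

def autoencoder_loss(x, y, estimate_params, afx):
    """Hypothetical auto-encoder objective.

    The encoder sees only the processed audio y; the decoder re-applies an
    effect to the unprocessed input x; the loss compares the result with y.
    """
    p_hat = estimate_params(y)
    y_hat = afx(x, p_hat)
    return audio_loss(y_hat, y)

# Toy usage: a one-parameter gain "effect" (hypothetical).
rng = np.random.default_rng(0)
x = 0.5 * rng.standard_normal(4096)
gain_afx = lambda sig, p: p[0] * sig
y = gain_afx(x, [0.5])
print(autoencoder_loss(x, y, lambda _: [1.0], gain_afx))  # close to log(2), about 0.69
```

Because the loss compares audio rather than parameters, `afx` does not have to be the implementation that produced `y`; the parameter loss, by contrast, is only meaningful when both parameter vectors refer to the same effect implementation.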

The audio examples labelled "Input" are the unprocessed mixes, those labelled "Ground truth" are the processed audio to be reproduced, and the others are estimations of the ground truth. The figures were drawn using the first row of each table.

Equalization estimation

The figure shows EQ curves estimated by neural networks using either a parametric or a graphic EQ. Both networks were trained using our approach. They both replicate the general shape of the curve, but fail to match sharp peaks and notches.
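The two EQ types differ in what the network is asked to output. As an illustration, assuming RBJ "Audio EQ Cookbook" peaking biquads (a common textbook choice, not necessarily the filters used here), a parametric EQ lets each band choose its centre frequency, gain and Q, while a graphic EQ fixes the centre frequencies and Q and only exposes per-band gains. The names `peaking_db`, `parametric_eq_db` and `graphic_eq_db`, the octave-spaced centres and the fixed Q of 1.41 are all hypothetical.

```python
import numpy as np

def peaking_db(f, fc, gain_db, q, fs=44100.0):
    """Magnitude response (dB) of an RBJ-cookbook peaking biquad at frequencies f (Hz)."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * fc / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a, -2.0 * np.cos(w0), 1.0 - alpha * a)
    den = (1.0 + alpha / a, -2.0 * np.cos(w0), 1.0 - alpha / a)
    z_inv = np.exp(-2j * np.pi * np.asarray(f, dtype=float) / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z_inv + b[2] * z_inv ** 2) / (den[0] + den[1] * z_inv + den[2] * z_inv ** 2)
    return 20.0 * np.log10(np.abs(h))

def parametric_eq_db(f, bands, fs=44100.0):
    """Cascade of peaking filters; each band is a free (fc, gain_db, q) triple."""
    return sum(peaking_db(f, fc, g, q, fs) for fc, g, q in bands)

# Hypothetical graphic EQ: ten octave-spaced centres (31.25 Hz to 16 kHz), fixed Q.
GRAPHIC_CENTRES = 31.25 * 2.0 ** np.arange(10)

def graphic_eq_db(f, gains_db, q=1.41, fs=44100.0):
    """Only the per-band gains are free; centre frequencies and Q are fixed."""
    return sum(peaking_db(f, fc, g, q, fs) for fc, g in zip(GRAPHIC_CENTRES, gains_db))

freqs = np.geomspace(20.0, 20000.0, 512)
notch = parametric_eq_db(freqs, [(1500.0, -12.0, 8.0)])  # a narrow parametric notch
flat = graphic_eq_db(freqs, np.zeros(10))                # all graphic bands at 0 dB
```

Since the bands are cascaded, their magnitude responses in dB simply add, so either curve can be compared directly with a target EQ curve on a common frequency grid.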

Matching an EQ curve: ground truth and estimated EQ curves.

Input	Ground truth	Parametric EQ	Graphic EQ

Dynamic range compression estimation

Here are the listening examples of dynamic range compression estimation.

Input	Ground truth	Neural proxy	Hybrid proxy	Simplified compressor
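The three compressor models compared above are not detailed on this page. For orientation only, below is one textbook design of a simplified feed-forward compressor: a hard-knee static gain curve (threshold, ratio) followed by one-pole attack/release smoothing of the gain in dB. It is not claimed to be the "Simplified compressor", neural proxy or hybrid proxy evaluated here; all function names and default values are hypothetical.

```python
import numpy as np

def gain_computer_db(level_db, threshold_db, ratio):
    """Hard-knee static curve: above the threshold, the output level
    rises by only 1/ratio dB per input dB."""
    over = np.maximum(level_db - threshold_db, 0.0)
    return -over * (1.0 - 1.0 / ratio)

def simple_compressor(x, fs, threshold_db=-20.0, ratio=4.0,
                      attack_ms=5.0, release_ms=100.0, eps=1e-9):
    """Feed-forward compressor with per-sample peak level detection."""
    x = np.asarray(x, dtype=float)
    level_db = 20.0 * np.log10(np.abs(x) + eps)
    target = gain_computer_db(level_db, threshold_db, ratio)
    a_att = np.exp(-1.0 / (fs * attack_ms * 1e-3))
    a_rel = np.exp(-1.0 / (fs * release_ms * 1e-3))
    gain_db = np.empty_like(target)
    g = 0.0
    for n, t in enumerate(target):
        # More gain reduction requested: use the attack time; otherwise release.
        coeff = a_att if t < g else a_rel
        g = coeff * g + (1.0 - coeff) * t
        gain_db[n] = g
    return x * 10.0 ** (gain_db / 20.0)

fs = 8000
out = simple_compressor(np.ones(fs), fs)  # steady 0 dBFS input settles to -15 dB
```

The sample-by-sample loop keeps the recursion explicit but is slow in pure Python; a trainable version would have to be written in an autodiff framework.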

Distortion estimation

The figure below shows the harmonic patterns produced when each effect is applied to a full-scale sine wave (amplitude 1, i.e. 0 dBFS). Since the effects are memoryless, these patterns do not depend on the input signal's frequency (apart from aliasing). Estimating with the same model as the one used for synthesis makes the estimation easier; it also allows the upper harmonics to be matched more closely without requiring many parameters.
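Why a Chebyshev waveshaper suits harmonic matching can be checked numerically. Chebyshev polynomials of the first kind satisfy T_k(cos θ) = cos(kθ), so a shaper Σ c_k T_k applied to a full-scale cosine produces harmonic k with amplitude exactly c_k, whereas the coefficients of a plain power (Taylor) series spread over several harmonics (for instance x³ gives 3/4 of a fundamental and 1/4 of a third harmonic). This sketch assumes `tanh` as the soft clipper and uses a cosine instead of a sine, which only shifts the signal in time and leaves harmonic magnitudes unchanged. Function names and parameters are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev, polynomial

def soft_clipper(x, drive=1.0):
    """Assumed soft clipper: tanh (odd function, so odd harmonics only)."""
    return np.tanh(drive * x)

def taylor_shaper(x, coeffs):
    """Power series sum_k coeffs[k] * x**k."""
    return polynomial.polyval(x, coeffs)

def chebyshev_shaper(x, coeffs):
    """Chebyshev series sum_k coeffs[k] * T_k(x)."""
    return chebyshev.chebval(x, coeffs)

def harmonic_amplitudes(shaper, n_harmonics, n=4096, cycles=8):
    """Amplitudes of harmonics 1..n_harmonics for a 0 dBFS input.

    An integer number of cycles per window means no spectral leakage,
    so each harmonic falls exactly on FFT bin cycles * k.
    """
    x = np.cos(2.0 * np.pi * cycles * np.arange(n) / n)
    spectrum = np.abs(np.fft.rfft(shaper(x))) * 2.0 / n
    return spectrum[cycles * np.arange(1, n_harmonics + 1)]

cheb = harmonic_amplitudes(lambda x: chebyshev_shaper(x, [0.0, 1.0, 0.0, 0.3, 0.0, 0.1]), 5)
cube = harmonic_amplitudes(lambda x: taylor_shaper(x, [0.0, 0.0, 0.0, 1.0]), 3)
```

Here `cheb` reads back the Chebyshev coefficients directly, while `cube` shows how a single Taylor term maps onto two harmonics.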

Input	Ground truth	Soft clipper	Taylor	Chebyshev

Effects chain estimation

Here are the listening examples for the entire AFX chain estimation.

Input	Ground truth	Estimation