Audio Declipping

A Survey and an Extensive Evaluation of Popular Audio Declipping Methods

Pavel Záviška, Pavel Rajmic, Alexey Ozerov, Lucas Rencker

Audio Declipping Performance Enhancement via Crossfading

Pavel Záviška, Pavel Rajmic, Ondřej Mokrý

Audio Declipping with (Weighted) Analysis Social Sparsity

Pavel Záviška, Pavel Rajmic

About

This website accompanies three articles on audio declipping (see the abstracts below). It provides supplementary material: a detailed description of the audio dataset used, additional plots with results for each individual audio excerpt, a link to the repository with the MATLAB source code, and the restored audio excerpts, which you can listen to directly on this page.

A Survey and an Extensive Evaluation of Popular Audio Declipping Methods

Abstract: Dynamic range limitations in signal processing often lead to clipping, or saturation, in signals. Audio declipping is the task of estimating the original audio signal given its clipped measurements and has attracted a lot of interest in recent years. Audio declipping algorithms often make assumptions about the underlying signal, such as sparsity or low-rankness, as well as the measurement system. In this paper, we provide an extensive review of audio declipping algorithms proposed in the literature. For each algorithm, we present the assumptions being made about the audio signal, the modeling domain, as well as the optimization algorithm. Furthermore, we provide an extensive numerical evaluation of popular declipping algorithms, on real audio data. We evaluate each algorithm in terms of the Signal-to-Distortion Ratio, as well as using perceptual metrics of sound quality. The article is accompanied with the repository containing the evaluated methods.

Full-text: IEEE Xplore, arXiv postprint.
Citations: Plain Text, BibTeX.

Audio Declipping Performance Enhancement via Crossfading

Abstract: Some audio declipping methods produce waveforms that do not fully respect the physical process of clipping, which is why we refer to them as inconsistent. This letter reports what effect on perception it has if the solution by inconsistent methods is forced consistent by postprocessing. We first propose a simple sample replacement method, then we identify its main weaknesses and propose an improved variant. The experiments show that the vast majority of inconsistent declipping methods significantly benefit from the proposed approach in terms of objective perceptual metrics. In particular, we show that the SS PEW method based on social sparsity combined with the proposed method performs comparable to top methods from the consistent class, but at a computational cost of one order of magnitude lower.

Full-text: ScienceDirect, arXiv preprint.
Citations: Plain Text, BibTeX.

Audio Declipping with (Weighted) Analysis Social Sparsity

Abstract: We develop the analysis (cosparse) variant of the popular audio declipping algorithm of Siedenburg et al. (2014). Furthermore, we extend both the old and the new variants by the possibility of weighting the time-frequency coefficients. We examine the audio reconstruction performance of several combinations of weights and shrinkage operators. The weights are shown to improve the reconstruction quality in some cases; however, the best scores achieved by the non-weighted methods are not surpassed with the help of weights. Yet, the analysis Empirical Wiener (EW) shrinkage was able to reach the quality of a computationally more expensive competitor, the Persistent Empirical Wiener (PEW). Moreover, the proposed analysis variant incorporating PEW slightly outperforms the synthesis counterpart in terms of an auditorily motivated metric.

Full-text: IEEE Xplore, arXiv postprint.
Citations: Plain Text, BibTeX.

Reproducible Research

Following the idea of reproducible research, we make all the implementations freely available in the GitHub repository.
Please note that the LTFAT toolbox (version 2.4.0 or later, available here) must be installed and loaded in order to run the scripts and reproduce the results.

Algorithms

The following table contains abbreviations and full names of the algorithms used in the evaluation.
Algorithms inconsistent in the reliable part of the clipped signal are marked with an asterisk *.
Note that the analysis variant of the Social Sparsity declipper (ASS) was introduced later, in the article Audio Declipping with (Weighted) Analysis Social Sparsity, and is therefore part of neither the Declipping Survey nor the Crossfading article.

Abbreviation Full name
C-OMP* Constrained Orthogonal Matching Pursuit
A-SPADE Analysis SParse Audio DEclipper
S-SPADE Synthesis SParse Audio DEclipper
ℓ1 CP ℓ1-minimization using Chambolle–Pock
ℓ1 DR ℓ1-minimization using Douglas–Rachford
Rℓ1CC CP Reweighted ℓ1-minimization with Clipping Constraints using Chambolle–Pock (analysis)
Rℓ1CC DR Reweighted ℓ1-minimization with Clipping Constraints using Douglas–Rachford (synthesis)
SS EW* Social Sparsity with Empirical Wiener
SS PEW* Social Sparsity with Persistent Empirical Wiener
CSL1* Compressed Sensing method minimizing ℓ1-norm
PCSL1* Perceptual Compressed Sensing method minimizing ℓ1-norm
PWCSL1* Parabola-Weighted Compressed Sensing method minimizing ℓ1-norm
PWℓ1 CP Parabola-Weighted ℓ1-minimization using Chambolle–Pock (analysis)
PWℓ1 DR Parabola-Weighted ℓ1-minimization using Douglas–Rachford (synthesis)
DL* Dictionary Learning approach
NMF Nonnegative Matrix Factorization
Janssen Janssen method for inpainting
ASS EW* Analysis Social Sparsity with Empirical Wiener
ASS PEW* Analysis Social Sparsity with Persistent Empirical Wiener
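
The distinction marked by the asterisk can be illustrated with a minimal sketch, assuming symmetric hard clipping at a known threshold. The function names (`hard_clip`, `check_consistency`), the tolerance, and the toy signals are hypothetical and are not taken from the repository; the sketch only tests whether a restored signal matches the observation on the reliable samples and stays at or beyond the clipping level on the clipped ones.

```python
def hard_clip(signal, theta):
    """Symmetric hard clipping at threshold theta (theta > 0)."""
    return [max(-theta, min(theta, s)) for s in signal]


def check_consistency(observed, restored, theta, tol=1e-9):
    """Return (reliable_ok, clipped_ok) for a restored signal.

    reliable_ok: restored equals observed wherever |observed| < theta.
    clipped_ok:  restored lies at or beyond the clipping level, with the
                 same sign, wherever the observation is saturated.
    """
    reliable_ok = True
    clipped_ok = True
    for y, r in zip(observed, restored):
        if abs(y) < theta:            # reliable sample
            if abs(r - y) > tol:
                reliable_ok = False
        elif y > 0:                   # clipped from above
            if r < theta - tol:
                clipped_ok = False
        else:                         # clipped from below
            if r > -theta + tol:
                clipped_ok = False
    return reliable_ok, clipped_ok


# Toy data (hypothetical): a short signal clipped at theta = 0.6.
original = [0.0, 0.4, 0.9, 0.5, -0.2, -0.8, -0.3]
theta = 0.6
observed = hard_clip(original, theta)

consistent = list(original)                                  # fully consistent
inconsistent = [0.01, 0.42, 0.85, 0.5, -0.2, -0.75, -0.3]    # alters reliable samples

print(check_consistency(observed, consistent, theta))
print(check_consistency(observed, inconsistent, theta))
```

A method marked with * would correspond to the second case: its output may deviate from the observation on samples that were never clipped.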

Audio Excerpts

The audio database used for the evaluation consists of 10 mono musical excerpts, sampled at 44.1 kHz, each approximately 7 seconds long. They were extracted from the EBU SQAM database. The excerpts were carefully selected to cover a wide range of audio signal characteristics. Since a significant number of the methods are based on signal sparsity, the selection deliberately includes signals with different levels of sparsity (w.r.t. the Gabor transform).

01. violin
02. clarinet
03. bassoon
04. harp
05. glockenspiel
06. celesta
07. accordion
08. guitar
09. piano
10. wind ensemble

The table below contains listenable excerpts from all three articles. It is possible to select the initial level of degradation (input SDR) and the displayed evaluation metric. The postprocessing switch is relevant only for the algorithms inconsistent in the reliable part, which are marked with *. To listen to results related only to the Audio Declipping Survey, leave the option “Inconsistent restoration” switched on. The other two options relate to methods presented in the article Audio Declipping Performance Enhancement via Crossfading.
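
As a rough illustration of what an input SDR level means, the sketch below searches for the clipping threshold at which a hard-clipped signal reaches a given SDR with respect to the original. It assumes symmetric hard clipping and the usual definition SDR(u, v) = 20 log10(‖u‖ / ‖u − v‖); the function names, the bisection search, and the synthetic test tone are hypothetical and do not reproduce the MATLAB scripts in the repository.

```python
import math


def hard_clip(signal, theta):
    """Symmetric hard clipping at threshold theta."""
    return [max(-theta, min(theta, s)) for s in signal]


def sdr_db(reference, estimate):
    """SDR in dB: 20*log10(||reference|| / ||reference - estimate||)."""
    num = math.sqrt(sum(r * r for r in reference))
    den = math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)))
    return math.inf if den == 0.0 else 20.0 * math.log10(num / den)


def threshold_for_input_sdr(signal, target_db, iterations=60):
    """Bisection on theta; the SDR of the clipped signal grows with theta."""
    lo, hi = 0.0, max(abs(s) for s in signal)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sdr_db(signal, hard_clip(signal, mid)) < target_db:
            lo = mid   # clipping too severe, raise the threshold
        else:
            hi = mid   # clipping too mild, lower the threshold
    return 0.5 * (lo + hi)


# Synthetic 440 Hz tone standing in for an audio excerpt (hypothetical data).
fs = 44100
tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(4410)]

thresholds = {}
for target in (1, 3, 5, 7, 10, 15, 20):
    thresholds[target] = threshold_for_input_sdr(tone, target)
    print(target, "dB ->", round(thresholds[target], 4))
```

Lower input SDR values correspond to lower thresholds, i.e., more severe clipping.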

Playback can be started by clicking on one of the table cells (the cells turn light blue when the cursor hovers over them). Your browser must support the HTML5 audio player. Alternatively, the file path is shown below the player, and the file can be downloaded via Save Link As ...

Select input SDR: 1 dB 3 dB 5 dB 7 dB 10 dB 15 dB 20 dB
Select table values: None ∆SDRc PEAQ ODG PEMO-Q ODG Rnonlin
Select postprocessing of reliable samples: Inconsistent restoration Replace Reliable Crossfaded Replace


01 02 03 04 05 06 07 08 09 10
Original X X X X X X X X X X
Clipped X X X X X X X X X X
C-OMP* X X X X X X X X X X
A-SPADE X X X X X X X X X X
S-SPADE X X X X X X X X X X
ℓ1 CP X X X X X X X X X X
ℓ1 DR X X X X X X X X X X
Rℓ1CC CP X X X X X X X X X X
Rℓ1CC DR X X X X X X X X X X
SS EW* X X X X X X X X X X
SS PEW* X X X X X X X X X X
CSL1* X X X X X X X X X X
PCSL1* X X X X X X X X X X
PWCSL1* X X X X X X X X X X
PWℓ1 CP X X X X X X X X X X
PWℓ1 DR X X X X X X X X X X
DL* X X X X X X X X X X
NMF X X X X X X X X X X
Janssen X X X X X X X X X X
ASS EW* X X X X X X X X X X
ASS PEW* X X X X X X X X X X

Individual results

The following figure supplements the Audio Declipping Survey with the results of the declipping algorithms for each audio excerpt individually. The audio excerpt and the objective metric can be selected using the buttons below. In the plots that follow, algorithms coming from the same family share the same color. If a method was examined in both the analysis and the synthesis variant, the analysis variant is distinguished by square markers. Other variants (e.g., multiple shrinkage operators in the SS algorithms or different weights within the CSL family) use diamond or triangle markers.

Select audio excerpt: 01 02 03 04 05 06 07 08 09 10
Select objective metric: ∆SDRc PEAQ PEMO-Q Rnonlin

Average ∆SDRc results, i.e., SDR improvement computed on the clipped samples only.

Overall results of the survey

In the following figures, the performance of the declipping algorithms is presented. The comparison is done in terms of four objective metrics — ∆SDRc (SDR improvement computed on the clipped samples only), PEAQ, PEMO-Q, and Rnonlin. In the bar graphs that follow, algorithms coming from the same family share the same color. If a method was examined in both the analysis and the synthesis variant, the analysis variant is graphically distinguished via hatching. Other variants (e.g., multiple shrinkage operators in the SS algorithms or different weights within the CSL family) use gray stippling.
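
For readers who want to see the ∆SDRc metric spelled out, here is a minimal sketch. It assumes the usual definition SDR(u, v) = 20 log10(‖u‖ / ‖u − v‖) and takes ∆SDRc as the SDR of the restored signal minus the SDR of the clipped signal, both evaluated on the clipped samples only. The function names and the toy signals are hypothetical; the perceptual metrics (PEAQ, PEMO-Q, Rnonlin) are not covered here.

```python
import math


def sdr_db(reference, estimate):
    """SDR in dB: 20*log10(||reference|| / ||reference - estimate||)."""
    num = math.sqrt(sum(r * r for r in reference))
    den = math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)))
    return math.inf if den == 0.0 else 20.0 * math.log10(num / den)


def delta_sdr_c(original, clipped, restored, theta):
    """SDR improvement evaluated on the clipped samples only."""
    idx = [n for n, y in enumerate(clipped) if abs(y) >= theta]
    if not idx:
        raise ValueError("no clipped samples at this threshold")
    ref = [original[n] for n in idx]
    sdr_restored = sdr_db(ref, [restored[n] for n in idx])
    sdr_clipped = sdr_db(ref, [clipped[n] for n in idx])
    return sdr_restored - sdr_clipped


# Toy data (hypothetical): two peaks clipped at theta = 0.6.
theta = 0.6
original = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
clipped = [max(-theta, min(theta, s)) for s in original]
# A restoration that recovers half of the missing peak amplitude.
restored = [0.0, 0.5, 0.8, 0.5, 0.0, -0.5, -0.8, -0.5]

improvement = delta_sdr_c(original, clipped, restored, theta)
print(round(improvement, 2), "dB")
```

In this toy case the restoration halves the error on the clipped samples, so the improvement equals 20 log10(2), roughly 6 dB.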

Select order of the bar graphs: Algorithm-sorted Value-sorted

Average ∆SDRc results, i.e., SDR improvement computed on the clipped samples only.
Average PEAQ ODG results. The PEAQ ODG of the clipped signal is depicted in gray.
Average PEMO-Q ODG results.
Average Rnonlin results.

Results of the replacing approach

The figures below present average PEAQ ODG and PEMO-Q ODG values of the replacing approach proposed in the article Audio Declipping Performance Enhancement via Crossfading. The individual declipping algorithms are distinguished using different bar colors. Within a single bar, the lightest shade represents the quality of the originally declipped signal, i.e., inconsistent in the reliable part. The respective medium shade marks the results of the Replace Reliable (RR) strategy, and finally, the darkest shade corresponds to Crossfaded Replace (CR). In addition, the black dotted lines represent the average ODG value of the clipped signals, and each black dashed line indicates the best result obtained in the Audio Declipping Survey.
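
The two postprocessing strategies can be sketched as follows. Replace Reliable simply copies the observed samples back into the reliable positions. For Crossfaded Replace, the exact transition shape and length used in the article are not stated on this page, so the linear ramp and the `fade_len` parameter below are assumptions made purely for illustration, as are all function names and the toy data.

```python
def replace_reliable(observed, declipped, theta):
    """RR: copy the observed value back wherever the sample was not clipped."""
    return [y if abs(y) < theta else d for y, d in zip(observed, declipped)]


def crossfaded_replace(observed, declipped, theta, fade_len=4):
    """CR-style blend near clipped segments (linear ramp is an assumption)."""
    n = len(observed)
    clipped = [abs(y) >= theta for y in observed]
    # Distance from every sample to the nearest clipped sample.
    dist = [float("inf")] * n
    last = None
    for i in range(n):                      # nearest clipped sample to the left
        if clipped[i]:
            last = i
        if last is not None:
            dist[i] = i - last
    last = None
    for i in range(n - 1, -1, -1):          # nearest clipped sample to the right
        if clipped[i]:
            last = i
        if last is not None:
            dist[i] = min(dist[i], last - i)
    out = []
    for y, d, k in zip(observed, declipped, dist):
        w = min(1.0, k / fade_len)          # 0 on clipped samples, 1 far from them
        out.append(w * y + (1.0 - w) * d)
    return out


# Toy data (hypothetical): samples 2 and 3 are clipped at theta = 0.6.
theta = 0.6
observed = [0.1, 0.3, 0.6, 0.6, 0.4, 0.2, 0.0, -0.1, -0.2, -0.1]
declipped = [0.14, 0.33, 0.8, 0.75, 0.44, 0.22, 0.02, -0.1, -0.2, -0.1]

rr = replace_reliable(observed, declipped, theta)
cr = crossfaded_replace(observed, declipped, theta)
print(rr)
print(cr)
```

In this sketch, RR restores exact agreement with the observation on the reliable samples but can leave jumps at the segment boundaries, whereas the crossfaded variant trades exact agreement in a short transition region for a smoother junction.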

Average PEAQ results.
Average PEMO-Q results.
Average Rnonlin results.