This is a website accompanying three articles about audio declipping (see the abstracts below). Here you can find supplementary material: a detailed description of the audio dataset used, additional plots with results for each individual audio excerpt, a link to the repository with the MATLAB source code, and, last but not least, the restored audio excerpts to listen to.
Abstract: Dynamic range limitations in signal processing often lead to clipping, or saturation, in signals. Audio declipping is the task of estimating the original audio signal given its clipped measurements and has attracted a lot of interest in recent years. Audio declipping algorithms often make assumptions about the underlying signal, such as sparsity or low-rankness, as well as the measurement system. In this paper, we provide an extensive review of audio declipping algorithms proposed in the literature. For each algorithm, we present the assumptions being made about the audio signal, the modeling domain, as well as the optimization algorithm. Furthermore, we provide an extensive numerical evaluation of popular declipping algorithms, on real audio data. We evaluate each algorithm in terms of the Signal-to-Distortion Ratio, as well as using perceptual metrics of sound quality. The article is accompanied with the repository containing the evaluated methods.
Full-text: IEEE Xplore, arXiv postprint.
Citations: Plain Text, BibTeX.
Abstract: Some audio declipping methods produce waveforms that do not fully respect the physical process of clipping, which is why we refer to them as inconsistent. This letter reports what effect on perception it has if the solution by inconsistent methods is forced consistent by postprocessing. We first propose a simple sample replacement method, then we identify its main weaknesses and propose an improved variant. The experiments show that the vast majority of inconsistent declipping methods significantly benefit from the proposed approach in terms of objective perceptual metrics. In particular, we show that the SS PEW method based on social sparsity combined with the proposed method performs comparable to top methods from the consistent class, but at a computational cost of one order of magnitude lower.
Full-text: ScienceDirect, arXiv preprint.
Citations: Plain Text, BibTeX.
Abstract: We develop the analysis (cosparse) variant of the popular audio declipping algorithm of Siedenburg et al. (2014). Furthermore, we extend both the old and the new variants by the possibility of weighting the time-frequency coefficients. We examine the audio reconstruction performance of several combinations of weights and shrinkage operators. The weights are shown to improve the reconstruction quality in some cases; however, the best scores achieved by the non-weighted methods are not surpassed with the help of weights. Yet, the analysis Empirical Wiener (EW) shrinkage was able to reach the quality of a computationally more expensive competitor, the Persistent Empirical Wiener (PEW). Moreover, the proposed analysis variant incorporating PEW slightly outperforms the synthesis counterpart in terms of an auditorily motivated metric.
Full-text: IEEE Xplore, arXiv postprint.
Citations: Plain Text, BibTeX.
Following the idea of reproducible research, we make all the implementations freely available at the GitHub repository.
Please note that the LTFAT toolbox (version >= 2.4.0, available here) must be installed and loaded in order to run the scripts and reproduce the results.
The following table contains abbreviations and full names of the algorithms used in the evaluation.
Algorithms inconsistent in the reliable part of the clipped signal are marked with an asterisk *.
Note that the analysis variant of the Social Sparsity declipper (ASS) was introduced later, in the above-mentioned conference paper, and is therefore part of neither the Declipping Survey nor the Crossfading article.
Abbreviation | Full name |
---|---|
C-OMP* | Constrained Orthogonal Matching Pursuit |
A-SPADE | Analysis SParse Audio DEclipper |
S-SPADE | Synthesis SParse Audio DEclipper |
ℓ1 CP | ℓ1-minimization using Chambolle–Pock |
ℓ1 DR | ℓ1-minimization using Douglas–Rachford |
Rℓ1CC CP | Reweighted ℓ1-minimization with Clipping Constraints using Chambolle–Pock (analysis) |
Rℓ1CC DR | Reweighted ℓ1-minimization with Clipping Constraints using Douglas–Rachford (synthesis) |
SS EW* | Social Sparsity with Empirical Wiener |
SS PEW* | Social Sparsity with Persistent Empirical Wiener |
CSL1* | Compressed Sensing method minimizing ℓ1-norm |
PCSL1* | Perceptual Compressed Sensing method minimizing ℓ1-norm |
PWCSL1* | Parabola-Weighted Compressed Sensing method minimizing ℓ1-norm |
PWℓ1 CP | Parabola-Weighted ℓ1-minimization using Chambolle–Pock (analysis) |
PWℓ1 DR | Parabola-Weighted ℓ1-minimization using Douglas–Rachford (synthesis) |
DL* | Dictionary Learning approach |
NMF | Nonnegative Matrix Factorization |
Janssen | Janssen method for inpainting |
ASS EW* | Analysis Social Sparsity with Empirical Wiener |
ASS PEW* | Analysis Social Sparsity with Persistent Empirical Wiener |
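The asterisk in the table above refers to consistency in the reliable part, i.e., whether a restored signal keeps the samples that were not affected by clipping. A minimal sketch of such a check is given below, assuming symmetric hard clipping at a known threshold `theta`. The function names and the tolerance `tol` are hypothetical and are not taken from the MATLAB repository.

```python
def clip(signal, theta):
    """Hard-clip every sample to the interval [-theta, theta]."""
    return [max(-theta, min(theta, s)) for s in signal]


def is_consistent_reliable(restored, clipped, theta, tol=1e-12):
    """True if the restored signal equals the observation on reliable samples.

    A sample is treated as reliable when its observed magnitude is
    strictly below the clipping threshold.
    """
    return all(
        abs(x - y) <= tol
        for x, y in zip(restored, clipped)
        if abs(y) < theta
    )


def is_consistent_clipped(restored, clipped, theta):
    """True if restored samples lie beyond the threshold wherever clipping occurred."""
    for x, y in zip(restored, clipped):
        if y >= theta and x < theta:
            return False
        if y <= -theta and x > -theta:
            return False
    return True


# Toy example: only the second restoration alters a reliable sample.
original = [0.1, 0.9, -0.8, 0.3]
theta = 0.5
observed = clip(original, theta)
fully_consistent = [0.1, 0.7, -0.6, 0.3]
inconsistent_reliable = [0.12, 0.7, -0.6, 0.3]
```

In this toy example, `inconsistent_reliable` would correspond to a method marked with an asterisk: it respects the clipping constraints but modifies a sample that was observed without distortion.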
The audio database used for the evaluation consists of 10 musical excerpts in mono, sampled at 44.1 kHz, with an approximate length of 7 seconds. They were extracted from the EBU SQAM database. The excerpts were carefully selected to cover a wide range of audio signal characteristics. Since a significant number of the methods are based on signal sparsity, the selection took care to include signals with different levels of sparsity (w.r.t. the Gabor transform).
The table below contains listenable excerpts from all three articles. It is possible to select the initial level of degradation (input SDR) and the displayed evaluation metric. The postprocessing switch is relevant only for the algorithms inconsistent in the reliable part, which are marked with *. To listen to results related only to the Audio Declipping Survey, leave the option “Inconsistent restoration” switched on. The other two options relate to methods presented in the article Audio declipping performance enhancement via crossfading.
Playback can be started by clicking on one of the table cells (the cells turn light blue when the cursor hovers over them). Your browser must support the HTML5 audio player. Alternatively, the file path is shown below the player, and the file can be downloaded via Save Link As ...
Select input SDR: 1 dB, 3 dB, 5 dB, 7 dB, 10 dB, 15 dB, 20 dB
Select table values: None, ∆SDRc, PEAQ ODG, PEMO-Q ODG, Rnonlin
Select postprocessing of reliable samples: Inconsistent restoration, Replace Reliable, Crossfaded Replace
Signal | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Original | X | X | X | X | X | X | X | X | X | X |
Clipped | X | X | X | X | X | X | X | X | X | X |
C-OMP* | X | X | X | X | X | X | X | X | X | X |
A-SPADE | X | X | X | X | X | X | X | X | X | X |
S-SPADE | X | X | X | X | X | X | X | X | X | X |
ℓ1 CP | X | X | X | X | X | X | X | X | X | X |
ℓ1 DR | X | X | X | X | X | X | X | X | X | X |
Rℓ1CC CP | X | X | X | X | X | X | X | X | X | X |
Rℓ1CC DR | X | X | X | X | X | X | X | X | X | X |
SS EW* | X | X | X | X | X | X | X | X | X | X |
SS PEW* | X | X | X | X | X | X | X | X | X | X |
CSL1* | X | X | X | X | X | X | X | X | X | X |
PCSL1* | X | X | X | X | X | X | X | X | X | X |
PWCSL1* | X | X | X | X | X | X | X | X | X | X |
PWℓ1 CP | X | X | X | X | X | X | X | X | X | X |
PWℓ1 DR | X | X | X | X | X | X | X | X | X | X |
DL* | X | X | X | X | X | X | X | X | X | X |
NMF | X | X | X | X | X | X | X | X | X | X |
Janssen | X | X | X | X | X | X | X | X | X | X |
ASS EW* | X | X | X | X | X | X | X | X | X | X |
ASS PEW* | X | X | X | X | X | X | X | X | X | X |
The following figure presents results supplementary to the Audio Declipping Survey, showing the performance of the declipping algorithms for each audio excerpt individually. The respective audio excerpt and objective metric can be selected using the buttons below. In the plots that follow, algorithms coming from the same family share the same color. If a method was examined in both the analysis and the synthesis variant, the analysis variant is graphically distinguished via square markers. Other variants (e.g., multiple shrinkage operators in the SS algorithms or different weights within the CSL family) use diamond or triangle markers.
The following figures present the performance of the declipping algorithms. The comparison is done in terms of four objective metrics: ∆SDRc (SDR improvement computed on the clipped samples only), PEAQ, PEMO-Q, and Rnonlin. In the bar graphs that follow, algorithms coming from the same family share the same color. If a method was examined in both the analysis and the synthesis variant, the analysis variant is graphically distinguished via hatching. Other variants (e.g., multiple shrinkage operators in the SS algorithms or different weights within the CSL family) use gray stippling.
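To make the ∆SDRc metric concrete, the following sketch computes the SDR of an estimate against a reference and then the improvement restricted to the clipped samples. It assumes the common definition SDR = 10·log10(‖reference‖² / ‖reference − estimate‖²) and symmetric hard clipping at a known threshold `theta`. The function names are hypothetical and this is not the evaluation code from the repository.

```python
import math


def sdr(reference, estimate):
    """Signal-to-Distortion Ratio in dB of `estimate` with respect to `reference`."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal_energy / error_energy)


def delta_sdr_c(original, clipped, restored, theta):
    """SDR improvement evaluated on the clipped samples only.

    Clipped samples are those whose observed magnitude reaches `theta`.
    """
    idx = [i for i, y in enumerate(clipped) if abs(y) >= theta]
    if not idx:
        raise ValueError("the observed signal contains no clipped samples")

    def pick(signal):
        return [signal[i] for i in idx]

    sdr_restored = sdr(pick(original), pick(restored))
    sdr_clipped = sdr(pick(original), pick(clipped))
    return sdr_restored - sdr_clipped


# Toy example: two of the four samples are clipped at theta = 0.5.
original = [0.0, 1.0, 0.0, -1.0]
clipped = [0.0, 0.5, 0.0, -0.5]
restored = [0.0, 0.9, 0.0, -0.9]
improvement = delta_sdr_c(original, clipped, restored, theta=0.5)
```

Restricting the metric to the clipped samples means that reliable samples, which a consistent method reproduces exactly, do not contribute to the score.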
Select order of the bar graphs: Algorithm-sorted, Value-sorted
The figures below present the average PEAQ ODG and PEMO-Q ODG values for the replacement approach proposed in the article Audio Declipping Performance Enhancement via Crossfading. The individual declipping algorithms are distinguished using different bar colors. Within a single bar, the lightest shade represents the quality of the originally declipped signal, i.e., inconsistent in the reliable part. The respective medium shade marks the results of the Replace Reliable (RR) strategy, and finally, the darkest shade corresponds to Crossfaded Replace (CR). In addition, the black dotted lines represent the average ODG value of the clipped signals, and each black dashed line indicates the best result obtained in the Audio Declipping Survey.
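The two postprocessing strategies can be illustrated with a short sketch. Replace Reliable is taken here to mean copying the observed samples into the reliable positions of the restored signal. The crossfaded variant below is only one possible realization: it assumes a linear ramp of hypothetical length `fade_len` located in the reliable samples adjacent to clipped segments, which may differ from the transition shape and placement used in the article. All function names are hypothetical.

```python
def replace_reliable(restored, clipped, theta):
    """Copy observed samples into all reliable positions (|y| < theta)."""
    return [y if abs(y) < theta else x for x, y in zip(restored, clipped)]


def distance_to_clipped(clipped, theta):
    """For each sample, the distance (in samples) to the nearest clipped sample."""
    n = len(clipped)
    dist = [float("inf")] * n
    last = None
    for i in range(n):  # nearest clipped sample to the left
        if abs(clipped[i]) >= theta:
            last = i
        if last is not None:
            dist[i] = i - last
    last = None
    for i in range(n - 1, -1, -1):  # nearest clipped sample to the right
        if abs(clipped[i]) >= theta:
            last = i
        if last is not None:
            dist[i] = min(dist[i], last - i)
    return dist


def crossfaded_replace(restored, clipped, theta, fade_len=4):
    """Blend restored and observed samples near clipped segments.

    The weight of the observed sample grows linearly from 0 at a clipped
    sample to 1 at distance fade_len + 1 (an assumed ramp shape).
    """
    out = []
    for x, y, d in zip(restored, clipped, distance_to_clipped(clipped, theta)):
        w = min(1.0, d / (fade_len + 1))
        out.append(w * y + (1.0 - w) * x)
    return out


# Toy example with two clipped samples in the middle (theta = 0.5).
observed = [0.1, 0.2, 0.5, 0.5, 0.2, 0.1, 0.0]
declipped = [0.3, 0.3, 0.8, 0.9, 0.3, 0.3, 0.3]
rr = replace_reliable(declipped, observed, theta=0.5)
cr = crossfaded_replace(declipped, observed, theta=0.5, fade_len=1)
```

In this sketch, the crossfaded output deviates from the observed samples only inside the short transition regions, trading exact consistency there for a smoother junction between the replaced and the restored segments.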