This is the accompanying page for the article Janssen 2.0: Audio Inpainting in the Time-frequency Domain authored by Ondřej Mokrý, Peter Balušík and Pavel Rajmic, submitted to ICASSP 2025.
The paper focuses on inpainting missing parts of an audio signal spectrogram. First, a recent successful approach based on an untrained neural network is revised and several modifications of it are proposed, improving the signal-to-noise ratio of the restored audio. Second, the Janssen algorithm, the autoregression-based state of the art for time-domain audio inpainting, is adapted for the time-frequency setting. This novel method, coined Janssen-TF, is compared to the neural network approach using both objective metrics and a subjective listening test, showing Janssen-TF to be superior in all the considered measures.
The preprint is available at arXiv.
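To make the task concrete, a gap in the spectrogram can be pictured as a run of consecutive time frames (columns) whose coefficients are missing. The following is only a minimal sketch of that corruption model, not the authors' code: the helper name `drop_columns`, the toy random spectrogram, and its size are assumptions made for illustration, and the STFT settings actually used in the paper are not reproduced here.

```python
import numpy as np

def drop_columns(spec, start, width):
    """Zero out `width` consecutive columns of `spec`, beginning at `start`.

    Returns the corrupted copy and a boolean mask over the columns
    (True = reliable column, False = missing column).
    """
    mask = np.ones(spec.shape[1], dtype=bool)
    mask[start:start + width] = False
    corrupted = spec * mask  # the mask broadcasts over the frequency rows
    return corrupted, mask

# Toy complex "spectrogram": 5 frequency bins, 12 time frames (hypothetical sizes).
rng = np.random.default_rng(0)
spec = rng.standard_normal((5, 12)) + 1j * rng.standard_normal((5, 12))

# A gap of 2 columns, as in the shortest gap size listed on this page.
corrupted, mask = drop_columns(spec, start=4, width=2)
print(mask.astype(int))
```

An inpainting method then has to estimate the columns where the mask is `False`, using only the reliable columns.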
Audio examples from the listening test
You can listen to the audio excerpts used in the listening test. The five examples are labeled as in the article Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks.
Example0 (piano)
Gap size | 2 columns | 4 columns | 6 columns |
---|---|---|---|
Original audio | |||
Corrupted audio | |||
DPAI with context | |||
DPAI averaged (4 outputs) | |||
DPAI without context | |||
Janssen gapwise | |||
JanssenTF ADMM | |||
Example1 (piano)
Gap size | 2 columns | 4 columns | 6 columns |
---|---|---|---|
Original audio | |||
Corrupted audio | |||
DPAI with context | |||
DPAI averaged (4 outputs) | |||
DPAI without context | |||
Janssen gapwise | |||
JanssenTF ADMM | |||
Example3 (voice)
Gap size | 2 columns | 4 columns | 6 columns |
---|---|---|---|
Original audio | |||
Corrupted audio | |||
DPAI with context | |||
DPAI averaged (4 outputs) | |||
DPAI without context | |||
Janssen gapwise | |||
JanssenTF ADMM | |||
Example4 (music)
Gap size | 2 columns | 4 columns | 6 columns |
---|---|---|---|
Original audio | |||
Corrupted audio | |||
DPAI with context | |||
DPAI averaged (4 outputs) | |||
DPAI without context | |||
Janssen gapwise | |||
JanssenTF ADMM | |||
Example5 (music)
Gap size | 2 columns | 4 columns | 6 columns |
---|---|---|---|
Original audio | |||
Corrupted audio | |||
DPAI with context | |||
DPAI averaged (4 outputs) | |||
DPAI without context | |||
Janssen gapwise | |||
JanssenTF ADMM | |||
Example7 (voice)
Gap size | 2 columns | 4 columns | 6 columns |
---|---|---|---|
Original audio | |||
Corrupted audio | |||
DPAI with context | |||
DPAI averaged (4 outputs) | |||
DPAI without context | |||
Janssen gapwise | |||
JanssenTF ADMM | |||
Supplementary plots
The following boxplot is identical to Fig. 3 in our paper, up to the order of the evaluated stimuli.
The plots below are not presented in the paper due to lack of space. The next boxplot presents the same listening test scores as before, but split according to gap size. For short gaps, Janssen-TF is the clear winner, but for the longest gaps, its results are comparable to those of the DPAI approach.
For speech, the superiority of Janssen-TF is a bit more pronounced than for music, as seen in the next figure.
The next four plots illustrate the effect of averaging the results of several training epochs. For DPAI without context, the SNR and ODG improve slightly more than for DPAI with context, but overall the improvement is negligible in most cases.
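As a rough illustration of how averaging several outputs interacts with the SNR, the sketch below averages four synthetic "restorations" and compares their SNR values. It rests on stated assumptions: the SNR is taken as 10·log10(‖x‖² / ‖x − x̂‖²) over the whole signal, and the four outputs are a sine wave plus independent Gaussian errors. Neither the test signal nor the error model comes from the paper, and the paper's exact evaluation procedure may differ.

```python
import numpy as np

def snr_db(reference, estimate):
    """SNR in dB, assumed here as 10*log10(||ref||^2 / ||ref - est||^2)."""
    error = reference - estimate
    return 10.0 * np.log10(np.sum(np.abs(reference) ** 2) / np.sum(np.abs(error) ** 2))

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
reference = np.sin(2 * np.pi * 440.0 * t)  # synthetic stand-in for the original audio

# Four hypothetical restorations: the reference plus independent errors.
outputs = [reference + 0.1 * rng.standard_normal(t.size) for _ in range(4)]
averaged = np.mean(outputs, axis=0)

individual_snrs = [snr_db(reference, y) for y in outputs]
averaged_snr = snr_db(reference, averaged)
print("individual SNRs [dB]:", np.round(individual_snrs, 2))
print("averaged SNR [dB]:", round(float(averaged_snr), 2))
```

In this toy setting the errors are independent, so averaging helps noticeably. Outputs taken from nearby epochs of the same network are likely to have strongly correlated errors, in which case averaging gains much less, which would be consistent with the small improvements described above.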