This is the accompanying page for the article Janssen 2.0: Audio Inpainting in the Time-frequency Domain authored by Ondřej Mokrý, Peter Balušík and Pavel Rajmic, submitted to ICASSP 2025.

The paper focuses on inpainting missing parts of an audio signal spectrogram. First, a recent successful approach based on an untrained neural network is revised, and several modifications are proposed that improve the signal-to-noise ratio of the restored audio. Second, the Janssen algorithm, the autoregression-based state of the art for time-domain audio inpainting, is adapted to the time-frequency setting. This novel method, coined Janssen-TF, is compared with the neural network approach using both objective metrics and a subjective listening test, and it proves superior in all the considered measures.
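To make the autoregressive idea concrete, here is a minimal sketch of the classical time-domain alternating scheme that the Janssen algorithm builds on: fit an AR model to the current signal estimate, then re-estimate the missing samples so that the prediction error is minimized. This is a simplified illustration only, not the Janssen-TF method and not the authors' implementation. The function names, the dense matrix formulation, the model order and the test signal are all assumptions chosen for brevity.

```python
import numpy as np


def estimate_ar(x, order):
    """Least-squares fit of x[n] ~ sum_k c[k] * x[n-k].

    Returns the prediction-error filter a = [1, -c[1], ..., -c[order]].
    """
    n = len(x)
    past = np.column_stack([x[order - k:n - k] for k in range(1, order + 1)])
    coef, *_ = np.linalg.lstsq(past, x[order:], rcond=None)
    return np.concatenate(([1.0], -coef))


def fill_gap(x, missing, a):
    """Choose the missing samples that minimize the AR prediction error."""
    n, order = len(x), len(a) - 1
    rows = np.arange(n - order)
    # Row r of E computes the prediction error at time r + order.
    E = np.zeros((n - order, n))
    for k in range(order + 1):
        E[rows, rows + order - k] = a[k]
    # Known samples move to the right-hand side of the least-squares problem.
    rhs = -E[:, ~missing] @ x[~missing]
    filled, *_ = np.linalg.lstsq(E[:, missing], rhs, rcond=None)
    out = x.copy()
    out[missing] = filled
    return out


def janssen_sketch(x, missing, order, n_iter=10):
    """Alternate between AR estimation and gap filling (hypothetical helper)."""
    estimate = np.where(missing, 0.0, x)  # start from a zero-filled gap
    for _ in range(n_iter):
        a = estimate_ar(estimate, order)
        estimate = fill_gap(estimate, missing, a)
    return estimate


# Toy example (assumed data): one sinusoid with a 10-sample gap.
n = np.arange(800)
clean = np.sin(2 * np.pi * 0.21 * n + 0.3)
missing = np.zeros(clean.size, dtype=bool)
missing[400:410] = True

restored = janssen_sketch(clean, missing, order=2)
gap_error = np.sum((clean[missing] - restored[missing]) ** 2)
gap_snr_db = 10 * np.log10(np.sum(clean[missing] ** 2) / gap_error)
```

The dense matrix `E` keeps the sketch short; a practical implementation would exploit its banded structure. Real audio would also need a much higher model order than this single sinusoid does.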

The preprint is available at arXiv.

Audio examples from the listening test

You can listen to the audio excerpts used in the listening test. The examples are labeled in the same way as in the article Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks.

Example0 (piano)

Gap size 2 columns 4 columns 6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Example1 (piano)

Gap size 2 columns 4 columns 6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Example3 (voice)

Gap size 2 columns 4 columns 6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Example4 (music)

Gap size 2 columns 4 columns 6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Example5 (music)

Gap size 2 columns 4 columns 6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Example7 (voice)

Gap size 2 columns 4 columns 6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Supplementary plots

The following boxplot is identical to Fig. 3 in our paper, up to the order of the evaluated stimuli.

The plots below are not presented in the paper due to lack of space. The next boxplot presents the same listening test scores as before, but split according to the gap size. For short gaps, Janssen-TF is the clear winner, but for the longest gaps, its results are comparable to those of the DPAI approach.

For speech, the superiority of Janssen-TF is a bit more pronounced than for music, as seen in the next figure.

The next four plots illustrate the effect of averaging the outputs from several training epochs. For DPAI without context, the SNR and ODG improve slightly more than for DPAI with context, but overall the improvement is negligible in most cases.
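As a rough illustration of how such averaging can be evaluated, the sketch below averages several candidate reconstructions and compares their SNR with that of the individual outputs. It assumes the common definition SNR = 10 log10(||x||^2 / ||x - y||^2), which may differ in detail from the evaluation used in the paper (for example, in which part of the signal is taken into account). The synthetic signal, the noise level and the function names are assumptions for demonstration only.

```python
import numpy as np


def snr_db(reference, estimate):
    """SNR in dB of an estimate with respect to the reference signal."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error_energy = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10(np.sum(reference ** 2) / error_energy)


def average_outputs(outputs):
    """Sample-wise mean of several equally long reconstructions."""
    return np.mean(np.stack(outputs, axis=0), axis=0)


# Synthetic stand-in for four outputs: the reference plus independent noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
reference = np.sin(2 * np.pi * 440.0 * t)
outputs = [reference + 0.1 * rng.standard_normal(t.size) for _ in range(4)]

averaged = average_outputs(outputs)
individual_snrs = [snr_db(reference, y) for y in outputs]
averaged_snr = snr_db(reference, averaged)
```

Independent errors are the most favorable case for averaging. Outputs of the same network taken at nearby epochs are unlikely to have independent errors, so a much smaller gain should be expected in practice.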