This is the accompanying page for the article Janssen 2.0: Audio Inpainting in the Time-frequency Domain authored by Ondřej Mokrý, Peter Balušík and Pavel Rajmic, presented at EUSIPCO 2025.

The paper focuses on inpainting missing parts of an audio signal spectrogram, i.e., estimating the missing time-frequency coefficients. The autoregression-based Janssen algorithm, a state-of-the-art method for time-domain audio inpainting, is adapted to the time-frequency setting. This novel method, termed Janssen-TF, is compared with the deep-prior neural network approach (DPAI) using both objective metrics and a subjective listening test; Janssen-TF proves superior in all the considered measures.

The preprint is available at arXiv; the official version is published at IEEE Xplore.

Audio examples from the listening test

You can listen to the audio excerpts used in the listening test. The labeling of the six examples follows the article Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks.

* Excerpts marked with an asterisk correspond to a modification of DPAI that was not included in the paper because it brought only negligible reconstruction improvements. The resulting signal is created by averaging four DPAI reconstructions, each started from a different noise initialization.
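
The averaging described in the footnote can be illustrated with a minimal sketch. It assumes that the four reconstructions are time-domain signals of equal length and that they are averaged sample by sample; the page does not state in which domain the averaging was actually performed. The names `average_reconstructions` and `toy_reconstruction` are hypothetical, and the toy generator only stands in for a real DPAI run.

```python
import random


def average_reconstructions(reconstructions):
    """Average several equally long signals sample by sample."""
    if not reconstructions:
        raise ValueError("need at least one reconstruction")
    length = len(reconstructions[0])
    if any(len(r) != length for r in reconstructions):
        raise ValueError("all reconstructions must have the same length")
    count = len(reconstructions)
    # zip(*...) groups the samples that share the same time index.
    return [sum(samples) / count for samples in zip(*reconstructions)]


def toy_reconstruction(seed, length=8):
    """Hypothetical stand-in for one DPAI run (NOT the real network).

    Each seed mimics a different noise initialization by perturbing
    a constant signal with a small seed-dependent error.
    """
    rng = random.Random(seed)
    return [0.5 + rng.uniform(-0.1, 0.1) for _ in range(length)]


# Four runs with different seeds, then one averaged signal.
runs = [toy_reconstruction(seed) for seed in range(4)]
averaged = average_reconstructions(runs)
```

In this toy setting, the averaged signal stays within the error bounds of the individual runs; whether averaging helps on real audio depends on how correlated the errors of the individual runs are.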

Example0 (piano)

Gap size	2 columns	4 columns	6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged* (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Example1 (piano)

Gap size	2 columns	4 columns	6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged* (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Example3 (voice)

Gap size	2 columns	4 columns	6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged* (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Example4 (music)

Gap size	2 columns	4 columns	6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged* (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Example5 (music)

Gap size	2 columns	4 columns	6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged* (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Example7 (voice)

Gap size	2 columns	4 columns	6 columns
Original audio
Corrupted audio
DPAI with context
DPAI averaged* (4 outputs)
DPAI without context
Janssen gapwise
JanssenTF ADMM

Supplementary plots

The following boxplot is identical to Fig. 3 in our paper, up to the order of the evaluated stimuli.

The plots below are not presented in the paper due to lack of space. The next plot presents the same listening test scores as before, but split according to gap size. For short gaps, Janssen-TF is the clear winner, but for the longest gaps, its results are comparable to those of the DPAI approach.

For speech, the superiority of Janssen-TF is a bit more pronounced than for music, as seen in the next figure.

The next four plots illustrate the effect of averaging the results of several training epochs, another modification that sometimes leads to tiny improvements but is not presented in the paper. For DPAI without context, the SNR and ODG improve slightly more than for DPAI with context, but overall the improvement is negligible in most cases.
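
For readers unfamiliar with the SNR metric mentioned above, the following sketch uses its common definition, 10·log10(‖x‖² / ‖x − x̂‖²), where x is the reference signal and x̂ the reconstruction. This is an assumption: the page does not specify whether the paper evaluates the SNR on the whole signal or only on the inpainted part, and the function name `snr_db` is hypothetical.

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference and its estimate."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in reference)
    error_energy = sum((x - y) ** 2 for x, y in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0:
        # A perfect reconstruction has unbounded SNR.
        return math.inf
    return 10 * math.log10(signal_energy / error_energy)


# A 10 % amplitude error on a single nonzero sample gives roughly 20 dB.
example_snr = snr_db([1.0, 0.0], [0.9, 0.0])
```

A higher value means the reconstruction is closer to the reference in the least-squares sense, which does not always agree with perceived quality; this is why the paper also reports ODG and a listening test.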