This is the accompanying page for the preprint article Audio Inpainting in Time-Frequency Domain with Phase-Aware Prior authored by Peter Balušík and Pavel Rajmic.
The so-called audio inpainting problem in the time domain refers to estimating missing segments of samples within a signal. Over the years, several methods have been developed for this type of audio inpainting. In contrast, a time-frequency variant of inpainting has appeared in the literature, where the challenge is to reconstruct missing spectrogram columns with reliable information. We propose a method to address this time-frequency audio inpainting problem. Our approach is based on the recently introduced phase-aware signal prior that exploits an estimate of the instantaneous frequency. An optimization problem is formulated and solved using the generalized Chambolle–Pock algorithm. The proposed method is evaluated both objectively and subjectively against other time-frequency inpainting methods, specifically a deep-prior neural network and the autoregression-based approach known as Janssen-TF. Our proposed approach surpassed these methods in the objective evaluation as well as in the listening test. Moreover, this outcome is achieved at a substantially lower computational cost than the alternative methods.
The preprint is available at arXiv.
Audio examples from the listening test
You can listen to the audio excerpts used in the listening test. The six examples are named as in the article Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks.
Example0 (piano)
| Gap size | 2 columns | 4 columns | 6 columns |
|---|---|---|---|
| Original audio | | | |
| Corrupted audio | | | |
| DPAI with context | | | |
| JanssenTF ADMM | | | |
| U-PHAIN-TF | | | |
Example1 (piano)
| Gap size | 2 columns | 4 columns | 6 columns |
|---|---|---|---|
| Original audio | | | |
| Corrupted audio | | | |
| DPAI with context | | | |
| JanssenTF ADMM | | | |
| U-PHAIN-TF | | | |
Example3 (voice)
| Gap size | 2 columns | 4 columns | 6 columns |
|---|---|---|---|
| Original audio | | | |
| Corrupted audio | | | |
| DPAI with context | | | |
| JanssenTF ADMM | | | |
| U-PHAIN-TF | | | |
Example4 (music)
| Gap size | 2 columns | 4 columns | 6 columns |
|---|---|---|---|
| Original audio | | | |
| Corrupted audio | | | |
| DPAI with context | | | |
| JanssenTF ADMM | | | |
| U-PHAIN-TF | | | |
Example5 (music)
| Gap size | 2 columns | 4 columns | 6 columns |
|---|---|---|---|
| Original audio | | | |
| Corrupted audio | | | |
| DPAI with context | | | |
| JanssenTF ADMM | | | |
| U-PHAIN-TF | | | |
Example7 (voice)
| Gap size | 2 columns | 4 columns | 6 columns |
|---|---|---|---|
| Original audio | | | |
| Corrupted audio | | | |
| DPAI with context | | | |
| JanssenTF ADMM | | | |
| U-PHAIN-TF | | | |
Supplementary plots
The plot below is not included in the paper due to lack of space.
The box plot presents the same test scores as in the paper, but split according to gap size. For shorter gaps, the results of Janssen-TF and U-PHAIN-TF are comparable and near-perfect; for larger gaps, U-PHAIN-TF is the clear winner.
Masks
Only three masks were shown in the paper. Here, all of the masks are shown.
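As a rough illustration of what a mask of this kind encodes, the sketch below marks whole spectrogram columns as missing and zeroes them in a toy time-frequency matrix. It is only a minimal sketch: the function names `column_mask` and `apply_mask`, the 3-by-8 toy spectrogram, and the gap position are assumptions made for illustration and are not taken from the paper or its code.

```python
def column_mask(num_columns, gap_starts, gap_size):
    """Return a list of booleans: True = reliable column, False = missing column.

    gap_starts and gap_size are hypothetical parameters; the masks used in the
    paper may place their gaps differently.
    """
    mask = [True] * num_columns
    for start in gap_starts:
        if start < 0 or start + gap_size > num_columns:
            raise ValueError("gap falls outside the spectrogram")
        for col in range(start, start + gap_size):
            mask[col] = False
    return mask


def apply_mask(spectrogram, mask):
    """Zero every coefficient in a missing column.

    spectrogram is a list of rows (frequency bins), each a list of complex
    coefficients indexed by column (time frame).
    """
    return [
        [value if keep else 0j for value, keep in zip(row, mask)]
        for row in spectrogram
    ]


# Toy "spectrogram": 3 frequency bins by 8 time frames of arbitrary values.
spec = [[complex(f + 1, t) for t in range(8)] for f in range(3)]

# One gap of 2 columns starting at column 3 (the smallest gap size listed above).
mask = column_mask(num_columns=8, gap_starts=[3], gap_size=2)
corrupted = apply_mask(spec, mask)
```

An inpainting method would then receive `corrupted` together with `mask` and try to fill in the columns where the mask is `False`, leaving the reliable columns untouched.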