Enhancing Hierarchical Vector Quantized Autoencoders for Image Synthesis Through Multiple Decoders

Serez, Dario; Cristani, Marco; Murino, Vittorio; Del Bue, Alessio; Morerio, Pietro

doi:10.1007/978-3-031-43153-1_33

Vector Quantized Variational Autoencoders (VQ-VAEs) have gained popularity in recent years due to their ability to represent images as discrete sequences of tokens that index a learned codebook of vectors, enabling efficient image compression. One variant of particular interest is VQ-VAE 2, which extends previous work by representing images as a hierarchy of sequences, resulting in finer-grained representations. In this study, we further enhance this hierarchical autoencoder approach by introducing multiple decoders, which allow images to be represented as a sum of multi-scale contributions in pixel space. Our proposed model, the Multi Scale (MS) VQ-VAE, not only enables better control over the encoding of each sequence (resulting in improved explainability and codebook usage) but, as a consequence, also shows advantages in image synthesis. Our experiments demonstrate that the MS-VQVAE achieves comparable or superior reconstructions on various datasets and resolutions, as well as greater stability across runs. Moreover, we include a proof-of-concept trial to showcase the potential applications of our model in image synthesis.
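The two ideas in the abstract, tokens that index a learned codebook and a reconstruction formed as a sum of per-scale contributions, can be illustrated with a toy one-dimensional sketch. Everything here is an assumption made for illustration: the function names, the codebook values, and the stand-in "decoder" (a lookup followed by nearest-neighbour repetition) are hypothetical and are not the architecture described in the paper.

```python
# Toy illustration only: names, sizes, and the stand-in "decoders" are
# hypothetical and do not reproduce the model from the paper.

def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
            for v in vectors]

def decode(tokens, codebook, out_len):
    """Stand-in decoder: look up each token, keep the first component of its
    codebook vector, and repeat it so the output has out_len samples."""
    repeat = out_len // len(tokens)
    signal = []
    for t in tokens:
        signal.extend([codebook[t][0]] * repeat)
    return signal

def reconstruct(token_seqs, codebooks, out_len):
    """Sum the per-scale decoder outputs in the output ("pixel") space."""
    total = [0.0] * out_len
    for tokens, codebook in zip(token_seqs, codebooks):
        contribution = decode(tokens, codebook, out_len)
        total = [a + b for a, b in zip(total, contribution)]
    return total

# One codebook per scale (assumed values).
coarse_codebook = [(0.0, 0.0), (1.0, 1.0)]
fine_codebook = [(-0.1, 0.0), (0.1, 0.0)]

# Pretend encoder outputs: a short coarse sequence and a longer fine one.
coarse_tokens = quantize([(0.9, 1.2), (0.1, -0.2)], coarse_codebook)
fine_tokens = quantize(
    [(0.2, 0.0), (-0.3, 0.1), (0.05, 0.0), (-0.05, 0.0)], fine_codebook
)

image = reconstruct(
    [coarse_tokens, fine_tokens], [coarse_codebook, fine_codebook], out_len=4
)
print(coarse_tokens, fine_tokens, image)
```

In this sketch the coarse sequence supplies the overall level of the signal and the fine sequence adds small corrections, so each sequence's contribution can be inspected separately before summing.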


2023-01-01


Year

2023

Appears in types:

02.01 - Contribution in volume (Chapter or essay)

Files in this item:

File	Size	Format
Enhancing Hierarchical Vector Quantized Autoencoders for Image Synthesis Through Multiple Decoders.pdf (open access; type: publisher's version)	1.32 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1204155
