A Semisupervised CRF Model for CNN-Based Semantic Segmentation with Sparse Ground Truth

Maggiolo L.; Moser G.; Serpico S. B.; Tuia D.
2022-01-01

Abstract

Convolutional neural networks (CNNs) have become the reference approach for semantic segmentation of very-high-resolution (VHR) images, owing to their ability to capture semantic information automatically while learning relevant features. However, as with most supervised methods, map accuracy depends on the quantity and quality of the ground truth (GT) used for training. Densely annotated data (i.e., a detailed, exhaustive, pixel-level GT) yield effective CNN models but normally require substantial annotation effort. Such GT is often available in the benchmark datasets on which new methods are tested, but not for real data in land-cover applications, where only sparse annotations may be sufficiently cost-effective. A CNN trained with such incomplete GT maps tends to smooth object boundaries, because they are never precisely delineated in the GT. To cope with these shortcomings, we propose exploiting the intermediate activation maps of the CNN and deploying a semisupervised fully connected conditional random field (CRF). Compared with competing methods that use the same sparse annotations, the proposed method is better able to close part of the performance gap with respect to a CNN trained on the densely annotated, but generally unavailable, GT.
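For readers unfamiliar with fully connected CRFs, the following is a minimal, generic sketch of mean-field inference with a Potts model and a single Gaussian kernel. It is not the authors' semisupervised formulation: the function name `dense_crf_mean_field`, the parameters `bandwidth` and `weight`, the naive O(N^2) kernel, and the toy data are all assumptions made for illustration. The `features` array merely stands in for per-pixel descriptors such as intermediate CNN activations.

```python
import numpy as np

def dense_crf_mean_field(unary_probs, features, n_iters=5, bandwidth=1.0, weight=1.0):
    """Naive mean-field inference for a fully connected Potts CRF (illustrative only).

    unary_probs: (N, C) per-pixel class posteriors, e.g. CNN softmax outputs.
    features:    (N, D) per-pixel feature vectors, e.g. activations (assumed here).
    """
    eps = 1e-12
    unary = -np.log(unary_probs + eps)  # unary energies

    # Gaussian similarity between every pair of pixels (fully connected graph).
    sq_dist = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(kernel, 0.0)  # a pixel sends no message to itself

    q = unary_probs.copy()
    for _ in range(n_iters):
        # Support each label receives from similar pixels; with a Potts model
        # this equals the negative pairwise energy up to a per-pixel constant.
        msg = kernel @ q
        logits = -unary + weight * msg
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)
    return q

# Toy example: six "pixels" in two feature clusters; pixel 2 has an
# ambiguous unary prediction that disagrees with its cluster.
features = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
unary_probs = np.array([
    [0.90, 0.10],
    [0.90, 0.10],
    [0.45, 0.55],
    [0.10, 0.90],
    [0.10, 0.90],
    [0.10, 0.90],
])
q = dense_crf_mean_field(unary_probs, features)
labels = q.argmax(axis=1)
print(labels)
```

In this toy setting the pairwise term pulls the ambiguous pixel toward the label of its feature-space neighbours. Practical implementations replace the explicit N×N kernel with efficient high-dimensional filtering, since the naive version does not scale to VHR images.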

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1093199