Upper Aero Digestive Tract Cancer Diagnosis using Deep Learning Methods

AZAM, MUHAMMAD ADEEL
2024-02-19

Abstract

Objective: Narrow band imaging (NBI) and white light (WL) endoscopy are used to visualize upper aero digestive tract (UADT) cancers. However, these techniques are less effective for diagnosing tumors in less experienced centers because they depend on skilled medical experts. Recent evidence suggests that deep learning (DL) has potential applications in UADT video endoscopy. This research aims to develop DL models for the automatic identification and delineation of UADT cancer.
Approach: An ensemble of YOLO DL models (YOLOv5s with YOLOv5m) was used to detect laryngeal squamous cell carcinoma (LSCC) in both WL and NBI frames. Six external LSCC laryngoscopy videos were used to test cancer detection in real time. SegMENT, a segmentation convolutional neural network (CNN) based on a modified DeepLabV3+ model, was proposed for precise UADT cancer delineation using an in-domain transfer learning ensemble technique. Its accuracy was further validated on external datasets of NBI images of oral cavity SCC (OCSCC) and oropharyngeal SCC (OPSCC). SegMENT-Plus is an improved version of SegMENT designed for large LSCC datasets. It uses an EfficientNetB5 backbone as the encoder, together with a modified atrous spatial pyramid pooling (m-ASPP) block. Attention blocks (squeeze-and-excitation, SE, and convolutional block attention module, CBAM) were integrated into the m-ASPP module to improve cancer segmentation. The m-ASPP extracts local and global LSCC features to overcome the limitations of conventional ASPP modules in the literature. SegMENT-Plus was evaluated on a multi-center dataset from three hospitals (Genoa and Brescia, Italy; Seoul, South Korea). The model was tested on LSCC frames, and its delineation performance was compared with that of three otolaryngology experts. Unseen intraoperative laryngoscopy videos were also used to validate real-time performance. SegMENT-Plus was compared with its predecessor SegMENT and with other DL models (UNET, ResUNET, DeepLabV3+, DoubleUNET).
Main results: For the LSCC detection task, 219 patients from Genoa, Italy were enrolled, providing 624 LSCC video frames. The YOLO models were trained with the data split into 82.6% training, 8.2% validation, and 9.2% testing sets. The ensemble algorithm (YOLOv5s with YOLOv5m, using test-time augmentation) achieved the best LSCC detection performance, with 66% precision, 62% recall, and 63% mean average precision at an intersection over union (IoU) threshold of 0.5. The average computation time per frame on laryngoscopy videos was 0.026 seconds. The SegMENT model for UADT cancer delineation was developed using 219 patients (624 larynx frames), and external validation on the OPSCC and OCSCC cohorts from Brescia, Italy involved 116 and 102 NBI images, respectively. SegMENT achieved an IoU of 0.68 and a Dice similarity coefficient (DSC) of 0.81, outperforming other DL models. On the OCSCC and OPSCC datasets, the median DSC improved significantly, by 10.3% and 11.9%, respectively. For the development of SegMENT-Plus, aimed at improving LSCC delineation, the study included 557 patients with 3933 laryngeal images from Genoa, Italy. External testing cohorts from Seoul, South Korea, and Brescia, Italy confirmed the performance and generalization of the algorithm, with DSC between 81.4% and 84.9% and IoU between 81.8% and 85.7%.
Significance: The study identified a suitable CNN model for LSCC detection in WL and NBI video laryngoscopy. SegMENT outperformed previous results in external validation cohorts, showing promise for precise tumor segmentation. SegMENT-Plus holds potential for improved early tumor detection and delineation, laying the foundation for a clinical system for LSCC margin delineation.
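The abstract reports segmentation quality as IoU and DSC and detection speed as time per frame. The sketch below shows how these two overlap metrics are commonly computed for one predicted mask against one expert annotation. It is an illustration only, not the thesis code: the names `overlap_counts`, `iou`, and `dice`, the toy 3x3 masks, and the convention of returning 1.0 when both masks are empty are all assumptions made here.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask: 1 = tumor pixel, 0 = background


def overlap_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count intersection, predicted-area, and ground-truth-area pixels."""
    inter = pred_area = truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            pred_area += 1 if p else 0
            truth_area += 1 if t else 0
    return inter, pred_area, truth_area


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union: |A and B| / |A or B|."""
    inter, pred_area, truth_area = overlap_counts(pred, truth)
    union = pred_area + truth_area - inter
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def dice(pred: Mask, truth: Mask) -> float:
    """Dice similarity coefficient: 2 * |A and B| / (|A| + |B|)."""
    inter, pred_area, truth_area = overlap_counts(pred, truth)
    total = pred_area + truth_area
    # Same assumed convention for two empty masks.
    return 2 * inter / total if total else 1.0


# Hypothetical toy masks (not data from the thesis).
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
truth = [[0, 0, 1],
         [0, 1, 1],
         [0, 1, 0]]

print(f"IoU = {iou(pred, truth):.2f}")   # 3 shared pixels / 5 in the union
print(f"DSC = {dice(pred, truth):.2f}")  # 2 * 3 shared pixels / (4 + 4)

# Throughput implied by the reported 0.026 s average time per frame.
print(f"frames per second = {1 / 0.026:.1f}")
```

For a single image pair the two metrics are linked by DSC = 2·IoU / (1 + IoU), so the toy IoU of 0.60 corresponds to a DSC of 0.75. The reported 0.026 seconds per frame corresponds to roughly 38 frames per second.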
19-feb-2024
Larynx cancer, deep learning, narrow band imaging, tumor segmentation, HNSCC
Files in this record:
phdunige_4953500.pdf (open access)
Description: Upper Aero Digestive Tract Cancer Diagnosis using Deep Learning Methods
Type: Doctoral thesis
Size: 5.99 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1160223