Binarization is an extreme quantization technique that is attracting research in the Internet of Things (IoT) field, as it radically reduces the memory footprint of deep neural networks without a correspondingly significant accuracy drop. To support the effective deployment of Binarized Neural Networks (BNNs), we propose CBin-NN, a library of layer operators that allows the building of simple yet flexible convolutional neural networks (CNNs) with binary weights and activations. CBin-NN is platform-independent and is thus portable to virtually any software-programmable device. Experimental analysis on the CIFAR-10 dataset shows that our library, compared to a set of state-of-the-art inference engines, speeds up inference by 3.6 times and reduces the memory required to store model weights and activations by 7.5 times and 28 times, respectively, at the cost of slightly lower accuracy (2.5%). An ablation study stresses the importance of a Quantized Input Quantized Kernel Convolution layer to improve accuracy and reduce latency at the cost of a slight increase in model size.
CBin-NN: An Inference Engine for Binarized Neural Networks
Sakr F.;Berta R.;Capello A.;Dabbous A.;Lazzaroni L.;Bellotti F.
2024-01-01
Abstract
Binarization is an extreme quantization technique that is attracting research in the Internet of Things (IoT) field, as it radically reduces the memory footprint of deep neural networks without a correspondingly significant accuracy drop. To support the effective deployment of Binarized Neural Networks (BNNs), we propose CBin-NN, a library of layer operators that allows the building of simple yet flexible convolutional neural networks (CNNs) with binary weights and activations. CBin-NN is platform-independent and is thus portable to virtually any software-programmable device. Experimental analysis on the CIFAR-10 dataset shows that our library, compared to a set of state-of-the-art inference engines, speeds up inference by 3.6 times and reduces the memory required to store model weights and activations by 7.5 times and 28 times, respectively, at the cost of slightly lower accuracy (2.5%). An ablation study stresses the importance of a Quantized Input Quantized Kernel Convolution layer to improve accuracy and reduce latency at the cost of a slight increase in model size.File | Dimensione | Formato | |
---|---|---|---|
electronics-13-01624 (1).pdf
accesso aperto
Tipologia:
Documento in versione editoriale
Dimensione
2.07 MB
Formato
Adobe PDF
|
2.07 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.