Performance Comparison of Imputation Methods in Building Energy Data Sets

Dhungana H.; Bellotti F.; Berta R.; De Gloria A.
2021-01-01

Abstract

Statistical procedures for missing-data imputation have vastly improved, yet selecting the optimal imputation technique for a particular application, dataset, and context remains confusing. This work frames the missing-data problem in building energy measurement systems, reviews different imputation methods, and suggests the optimal imputation technique for missing values in an energy metering data set. The main objective of this paper is to show the performance of different imputation techniques with respect to accuracy and computation time on energy meter data. Missing values in the energy metering data set are imputed by seven methods: last value carried forward (LVCF), Mean, Median, Mode, multiple imputation by chained equations (MICE), K-nearest neighbors (K-NN), and long short-term memory (LSTM). The performance of each imputation method is compared with respect to accuracy and execution time under a missing completely at random (MCAR) assumption. Based on these two evaluation criteria, LVCF is very fast and highly accurate among the single-point imputation methods. LSTM performs best among the seven imputation methods for the energy metering data set, but the tradeoff is a longer computation time compared to LVCF.
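The evaluation protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic hourly load series, the 10% missing rate, the use of RMSE as the accuracy measure, and all function names are assumptions made for illustration. Only three of the single-point methods (LVCF, Mean, Median) are shown; MICE, K-NN, and LSTM are omitted.

```python
import math
import random
import statistics
import time


def make_series(n=500, seed=0):
    """Synthetic hourly load (kWh) with a daily cycle; illustrative data only."""
    rng = random.Random(seed)
    return [5.0 + 2.0 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 0.2)
            for t in range(n)]


def mask_mcar(series, rate=0.1, seed=1):
    """Drop readings completely at random; None marks a missing value."""
    rng = random.Random(seed)
    masked = list(series)
    # Keep the first reading so LVCF always has a value to carry forward.
    for i in range(1, len(masked)):
        if rng.random() < rate:
            masked[i] = None
    return masked


def impute_lvcf(masked):
    """Last value carried forward: repeat the most recent observed reading."""
    out, last = [], None
    for v in masked:
        if v is not None:
            last = v
        out.append(last)
    return out


def impute_constant(masked, stat):
    """Fill every gap with one statistic (e.g. mean or median) of the observed values."""
    observed = [v for v in masked if v is not None]
    fill = stat(observed)
    return [fill if v is None else v for v in masked]


def rmse_on_missing(truth, masked, imputed):
    """RMSE computed only at the positions that were masked out."""
    errs = [(t - p) ** 2 for t, m, p in zip(truth, masked, imputed) if m is None]
    return math.sqrt(sum(errs) / len(errs))


def compare(truth, masked):
    """Return {method name: (RMSE, elapsed seconds)} for each imputation method."""
    methods = {
        "LVCF": impute_lvcf,
        "Mean": lambda m: impute_constant(m, statistics.mean),
        "Median": lambda m: impute_constant(m, statistics.median),
    }
    results = {}
    for name, fn in methods.items():
        start = time.perf_counter()
        imputed = fn(masked)
        elapsed = time.perf_counter() - start
        results[name] = (rmse_on_missing(truth, masked, imputed), elapsed)
    return results


if __name__ == "__main__":
    truth = make_series()
    masked = mask_mcar(truth)
    for name, (rmse, secs) in compare(truth, masked).items():
        print(f"{name:6s} RMSE={rmse:.3f}  time={secs * 1e6:.0f} us")
```

Because the gaps are created artificially from a complete series, the true values are known and the error can be measured only at the imputed positions, which is what makes a comparison under an MCAR assumption possible.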
2021
978-3-030-66728-3
978-3-030-66729-0
Files in this item:
File: 20.pdf
Access: closed access
Description: Conference proceedings contribution
Type: Pre-print document
Size: 338.87 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1055346