Performance Comparison of Imputation Methods in Building Energy Data Sets
Dhungana H.; Bellotti F.; Berta R.; De Gloria A.
2021-01-01
Abstract
Statistical procedures for missing-data imputation have improved vastly, yet selecting the most suitable imputation technique for a particular application, dataset, or context remains confusing. This work frames the missing-data problem in building energy measurement systems, reviews different imputation methods, and suggests the optimal imputation technique for missing values in an energy metering data set. The main objective of this paper is to compare the performance of different imputation techniques on energy meter data with respect to accuracy and computation time. Missing values in the energy metering data set are imputed with seven methods: last value carried forward (LVCF), mean, median, mode, multiple imputation by chained equations (MICE), k-nearest neighbors (K-NN), and long short-term memory (LSTM). The performance of each method is compared in terms of accuracy and execution time under the missing completely at random (MCAR) assumption. Judged by these two criteria, LVCF is very fast and highly accurate among the single-point imputation methods. LSTM performs best of the seven methods on the energy metering data set, but at the cost of a longer computation time than LVCF.
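The abstract names four single-point methods (LVCF, mean, median, mode) without giving implementation details. The following is a minimal sketch in plain Python of what each one does to a meter series, assuming gaps are marked with `None`; the function names `impute_lvcf` and `impute_constant` and the sample readings are hypothetical, and the handling of gaps before the first observation is an assumption, not something the paper specifies.

```python
from statistics import mean, median, mode
from typing import Callable, List, Optional, Sequence

Series = List[Optional[float]]  # None marks a missing meter reading


def impute_lvcf(series: Series) -> List[float]:
    """Last value carried forward: fill each gap with the most recent observed reading."""
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("series has no observed readings")
    # Assumption: gaps before the first observation take the first observed value.
    last = observed[0]
    filled: List[float] = []
    for x in series:
        if x is not None:
            last = x
        filled.append(last)
    return filled


def impute_constant(series: Series, stat: Callable[[Sequence[float]], float]) -> List[float]:
    """Fill every gap with one summary statistic (mean, median or mode) of the observed readings."""
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("series has no observed readings")
    fill = stat(observed)
    return [fill if x is None else x for x in series]


readings = [1.0, 2.0, None, 2.0, None, 6.0]  # hypothetical kWh readings
print(impute_lvcf(readings))              # [1.0, 2.0, 2.0, 2.0, 2.0, 6.0]
print(impute_constant(readings, mean))    # [1.0, 2.0, 2.75, 2.0, 2.75, 6.0]
print(impute_constant(readings, median))  # [1.0, 2.0, 2.0, 2.0, 2.0, 6.0]
print(impute_constant(readings, mode))    # [1.0, 2.0, 2.0, 2.0, 2.0, 6.0]
```

LVCF needs only one pass and keeps the local level of the series. This is consistent with the abstract's finding that it is both fast and accurate on meter data, whose consecutive readings tend to be close. The mean, median, and mode fill every gap with the same global value.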
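K-NN imputation fills a missing value from the most similar complete observations. The abstract does not state the distance metric, the value of k, or the features used. The sketch below is therefore only an illustration under assumptions: Euclidean distance over the features observed in both rows, an unweighted mean of the k nearest donors, and a hypothetical feature layout `[outdoor temp, occupied flag, kWh]`. The name `impute_knn` is also hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one observation; None marks a missing value


def _distance(a: Row, b: Row, skip: int) -> float:
    """Euclidean distance over the features observed in both rows, ignoring column `skip`."""
    shared = [(x, y) for j, (x, y) in enumerate(zip(a, b))
              if j != skip and x is not None and y is not None]
    if not shared:
        return math.inf  # no common evidence: rank this donor last
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def impute_knn(rows: List[Row], k: int = 2) -> List[Row]:
    """Fill each missing cell with the mean of that column over the k nearest donor rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows that actually observed column j.
            donors = [(_distance(row, other, j), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            if not donors:
                raise ValueError(f"column {j} has no observed values")
            donors.sort(key=lambda d: d[0])  # stable: ties keep input order
            nearest = donors[:k]
            filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


rows = [
    [10.0, 1.0, 5.0],   # hypothetical [outdoor temp, occupied flag, kWh]
    [11.0, 1.0, 5.5],
    [30.0, 0.0, 9.0],
    [10.5, 1.0, None],
]
print(impute_knn(rows, k=2)[3])  # [10.5, 1.0, 5.25]
```

Distances are computed on the original rows rather than on already-imputed ones, so the result does not depend on the order in which gaps are visited.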
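The comparison protocol in the abstract (hide values completely at random, impute, then score accuracy and execution time) can be sketched as a small harness. This version does not reproduce the paper's setup. The abstract does not name its accuracy metric, so RMSE on the hidden readings is an assumption here. The names `mcar_mask`, `evaluate`, and `fill_with_mean` are hypothetical.

```python
import math
import random
import time
from typing import Callable, List, Optional, Tuple

Series = List[Optional[float]]


def mcar_mask(truth: List[float], rate: float, seed: int = 0) -> Series:
    """Hide each reading independently with probability `rate` (missing completely at random)."""
    rng = random.Random(seed)
    return [None if rng.random() < rate else x for x in truth]


def evaluate(imputer: Callable[[Series], List[float]],
             truth: List[float], masked: Series) -> Tuple[float, float]:
    """Return (RMSE on the hidden readings, wall-clock seconds spent imputing)."""
    start = time.perf_counter()
    filled = imputer(masked)
    elapsed = time.perf_counter() - start
    # Assumption: accuracy is RMSE over the positions that were hidden.
    errors = [(f - t) ** 2 for f, t, m in zip(filled, truth, masked) if m is None]
    rmse = math.sqrt(sum(errors) / len(errors)) if errors else 0.0
    return rmse, elapsed


def fill_with_mean(series: Series) -> List[float]:
    """Hypothetical baseline imputer used only to exercise the harness."""
    observed = [x for x in series if x is not None]
    fill = sum(observed) / len(observed)
    return [fill if x is None else x for x in series]


truth = [1.0, 2.0, 3.0, 4.0]
masked = [1.0, None, 3.0, None]  # fixed mask so the result is easy to check by hand
rmse, seconds = evaluate(fill_with_mean, truth, masked)
print(round(rmse, 4))  # 1.4142

hourly = [float(h % 12) for h in range(48)]  # hypothetical load profile
print(evaluate(fill_with_mean, hourly, mcar_mask(hourly, rate=0.2, seed=1)))
```

Because the mask is drawn independently of the values, any imputer that accepts a `Series` can be passed to `evaluate` and scored on the same hidden positions by reusing the same seed.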
File: 20.pdf (closed access)
Description: Contribution in conference proceedings
Type: Pre-print document
Size: 338.87 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.