Using unsupervised learning to determine risk level for left ventricular diastolic dysfunction

2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Kaidi, Ma; Canepa, Marco; Strait, James B.; Shatkay, Hagit

doi:10.1109/BIBM.2014.6999182

Left Ventricular Diastolic Dysfunction (LVDD) is a decompensatory change in the relaxation properties of the heart, the risk for which increases with age. Currently, physicians use a decision-tree-like algorithm to distinguish between discrete LVDD levels. This approach, based on cut-off thresholds, can lead to information loss and possibly to misdiagnosis. This paper explores an alternative diagnostic method for determining LVDD risk level that takes into account a wide variety of attributes available in patient records, without pre-setting cut-off thresholds. Using a large dataset derived from the Baltimore Longitudinal Study of Aging (BLSA), and adjusting the data for age and gender, we employ the chi-square test and the information gain criterion to identify attributes that correlate well with the physician-assigned grades; we refer to these as distinguishing attributes. We then apply the expectation maximization (EM) algorithm, as well as K-Means, to cluster records represented by their distinguishing attributes. While the clusters produced by K-Means are not stable, the EM algorithm yields three stable, tightly formed clusters that roughly correspond to the physician-assigned categories. Based on the EM results, we can compute a patient's probability of having no, low, or high risk for LVDD, and use this probability as the basis for a risk score that determines the patient's LVDD severity.
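To illustrate how cluster membership probabilities from EM might be turned into a risk score, the following is a minimal sketch and not the method used in the paper. It fits a three-component, one-dimensional Gaussian mixture by EM and combines the posterior probabilities into a single score. Everything specific here is an assumption made for illustration: the synthetic `values` (standing in for one age- and gender-adjusted distinguishing attribute), the initial means, the hypothetical `SEVERITY` weights, and the premise that a larger attribute value indicates higher risk. The attribute selection step (chi-square test, information gain) is omitted.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posteriors(x, weights, means, variances):
    """Probability that x belongs to each mixture component (the E-step)."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [p / total for p in joint]

def fit_em(values, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM, starting from init_means."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities of every component for every record.
        resp = [posteriors(x, weights, means, variances) for x in values]
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, min_var)  # floor avoids a collapsed component
    return weights, means, variances

# Synthetic, made-up values of a single attribute (not BLSA data).
values = [-0.4, -0.2, 0.0, 0.1, 0.3,
          2.6, 2.9, 3.0, 3.2, 3.4,
          5.7, 5.9, 6.0, 6.2, 6.3]

# Assumption: initialize the three means at the minimum, median, and maximum.
weights, means, variances = fit_em(values, init_means=[-0.4, 3.0, 6.3])

# Hypothetical severity weights for the no-, low-, and high-risk clusters.
SEVERITY = [0.0, 0.5, 1.0]

def risk_score(x):
    """Posterior-weighted severity in [0, 1]; clusters are ranked by their mean."""
    post = posteriors(x, weights, means, variances)
    order = sorted(range(len(means)), key=lambda j: means[j])
    return sum(SEVERITY[rank] * post[j] for rank, j in enumerate(order))
```

Calling `risk_score(x)` on a new attribute value returns a number between 0 and 1 that moves continuously as the posterior mass shifts between clusters, which is the property that distinguishes a probabilistic score from a hard cut-off threshold. A real analysis would use many attributes and a multivariate mixture model.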