Approximate Subtree Identification in Heterogeneous XML Documents Collections