K-means clustering for content-based document management in intelligence

Gastaldo, Paolo; Decherchi, S; Zunino, Rodolfo

Text-mining methods have become a key feature of homeland-security technologies, as they help analysts explore ever-growing volumes of digital documents effectively in the search for relevant information. This chapter presents a model for document clustering that arranges unstructured documents into content-based homogeneous groups. The overall paradigm is hybrid because it combines pattern-recognition grouping algorithms with semantic-driven processing. First, a semantic-based metric measures distances between documents by combining content-based and behavioral analysis; the metric considers both lexical properties and the structure and styles that characterize the processed documents. Second, the model relies on a Radial Basis Function (RBF) kernel-based mapping for clustering. The main novelty of the proposed approach is therefore to exploit the implicit mapping of RBF kernel functions to tackle the crucial task of normalizing similarities while embedding semantic information in the whole mechanism. In addition, the present work uses a real-world benchmark to compare the performance of the conventional k-means algorithm with that of recent k-means clustering schemes, which apply Johnson-Lindenstrauss-type random projections to reduce dimensionality before clustering. Experimental results show that the document clustering framework based on kernel k-means provides an effective tool to generate consistent structures for information access and retrieval.
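
As a rough illustration of the kernel k-means idea mentioned in the abstract, the following minimal sketch clusters points using only an RBF kernel matrix, so that distances to cluster centroids are computed in the implicit feature space. It is not the chapter's implementation: the semantic document metric is not specified in this record, so plain squared Euclidean distance between numeric vectors stands in for it, and the names `rbf_kernel`, `kernel_kmeans`, `gamma`, and `init` are assumptions made for this example.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).

    Assumption: squared Euclidean distance stands in for the
    semantic document distance described in the abstract.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * d2)

def kernel_kmeans(K, n_clusters, init=None, n_iter=50, seed=0):
    """Kernel k-means on a precomputed kernel matrix K.

    `init` is an optional array of starting labels (hypothetical
    parameter); if omitted, labels are drawn at random.
    """
    n = K.shape[0]
    if init is None:
        rng = np.random.default_rng(seed)
        labels = rng.integers(n_clusters, size=n)
    else:
        labels = np.asarray(init).copy()
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # empty cluster: leave its distances at +inf
            # ||phi(x_i) - mu_c||^2
            #   = K_ii - (2/m) * sum_j K_ij + (1/m^2) * sum_{j,l} K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

A call such as `kernel_kmeans(rbf_kernel(X, gamma=1.0), n_clusters=2)` returns one integer label per row of `X`. Because the RBF kernel satisfies K(x, x) = 1, every mapped document has unit norm in the feature space, which is one way to read the abstract's remark about normalizing similarities.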

2010-01-01
