SEMIDISCRETE DECOMPOSITION: A BUMP HUNTING TECHNIQUE

Sabine McConnell and David B. Skillicorn
Queen's University, Canada

Abstract

Semidiscrete decomposition (SDD) is usually presented as a storage-efficient analogue of the singular value decomposition (SVD). We show, however, that SDD works in a fundamentally different way and is best thought of as a bump-hunting technique: it is extremely effective at finding outlier clusters in datasets. We suggest that SDD's success in text retrieval applications such as latent semantic indexing is fortuitous, occurring because such datasets typically contain a large number of small clusters.