Saturday, August 18, 2007
Paper Abstract - as at 19 August 2007
The entrenchment of the Internet in modern society has led to a proliferation of unstructured information, compounded by the widespread adoption of information systems. There is therefore a growing need to extract the knowledge hidden within collections of documents, which has driven the emergence of text mining. Unsupervised learning schemes allow patterns to be abstracted from data, and are typically implemented through clustering algorithms. Recent progress in computational processing has provided greater opportunity to develop and employ clustering algorithms in text mining, leading to a myriad of contributions. Modern search engines continue to move towards automated web page retrieval that is more efficient and that provides the searcher with understandable and relevant search results. This paper focuses on clustering algorithms for web applications. It presents a critical review of current algorithms and their evolution, together with an evaluation. This analysis forms a framework through which to consider the nature of this progress and some of the limitations of current clustering technologies. A number of text mining experiments are replicated and extended, and a comprehensive discussion further illustrates these issues.