Tuesday, July 31, 2007

Clustering Algorithms: General Background

Clustering is the process of abstracting patterns from data sets. More specifically, it is the labelling of each data point as belonging to a group of similar data points.


Clustering is carried out by algorithms, which are well-defined instructions for performing some task. A clustering algorithm typically encompasses preprocessing the data, extracting clusters from the data, generating descriptions (labels) of the identified clusters, and any post-processing such as assessing cluster quality.
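
The four stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: k-means is just one of many clustering techniques, the customer figures are invented, and the function names (scale, kmeans, describe, inertia) are hypothetical helpers written for this example.

```python
import math

def scale(points):
    """Preprocess: min-max scale every feature to the range [0, 1]."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(p, lows, spans)) for p in points]

def kmeans(points, k, iterations=20):
    """Extract clusters: plain k-means, seeded with the first k points."""
    centroids = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

def describe(raw_points, assignment, k, feature_names):
    """Label: summarise each cluster by its mean feature values (original units)."""
    labels = {}
    for c in range(k):
        members = [p for p, a in zip(raw_points, assignment) if a == c]
        means = [sum(col) / len(members) for col in zip(*members)]
        labels[c] = ", ".join(f"mean {n} = {m:.1f}" for n, m in zip(feature_names, means))
    return labels

def inertia(points, assignment, centroids):
    """Post-process: within-cluster sum of squared distances (lower is tighter)."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))

# Invented data: (number of purchases, average spend) per customer.
customers = [(1, 20), (2, 25), (1, 22), (8, 200), (9, 210), (7, 190)]

scaled = scale(customers)
assignment, centroids = kmeans(scaled, k=2)
labels = describe(customers, assignment, 2, ["purchases", "spend"])
quality = inertia(scaled, assignment, centroids)

print(assignment)
print(labels)
print(round(quality, 4))
```

Scaling comes first because the two features are measured in very different units; without it, the spend column would dominate the distance calculation.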


Clustering schemes have become common in many industries. As an example, retailers use clustering to abstract customer types and their related behaviours. Consider Joe Bloggs Clothes Shop. The managers of this shop can employ a range of clustering algorithms to isolate specific types of shoppers, perhaps within a customer loyalty program. For instance, a clustering algorithm may identify a group of female shoppers who made more than three high-value purchases from the previous summer season's range. Coming into the next summer, Joe Bloggs can send these customers specific catalogues and promotions that encourage them to return to the store and repeat their previous behaviour.


Web applications are arguably the area in which clustering and clustering algorithms were forged. Web search is perhaps the most obvious of these applications. Many modern web search engines return results to search queries as ordered lists of site links, each with a small description of the related site. However, this can be a somewhat clumsy and cumbersome approach to search, especially in the case of a general search entry (see previous "rugby" example). An alternative approach is search result clustering, through which results are clustered into groups of similar results.
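
As a rough illustration of search result clustering, the sketch below groups result descriptions by the words they share. It is a toy under stated assumptions: the snippets are invented, the Jaccard threshold of 0.3 and the tiny stopword list are arbitrary choices, and the greedy "leader" grouping is only one simple way to do it, not how any particular search engine works.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for this example only.
STOPWORDS = {"and", "the", "of", "a", "to", "for", "in"}

def tokens(snippet, query):
    """Lowercase the snippet and keep distinct words, minus stopwords and the query."""
    words = re.findall(r"[a-z]+", snippet.lower())
    kept = [w for w in words if w not in STOPWORDS and w != query]
    return list(dict.fromkeys(kept))  # de-duplicate, preserving order

def jaccard(a, b):
    """Share of words two snippets have in common (0 = none, 1 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_results(snippets, query, threshold=0.3):
    """Greedy grouping: join the most similar existing group, else start a new one."""
    toks = [tokens(s, query) for s in snippets]
    clusters = []  # each cluster is a list of snippet indices; the first is its leader
    for i, t in enumerate(toks):
        best, best_sim = None, threshold
        for members in clusters:
            sim = jaccard(t, toks[members[0]])
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters, toks

def label(members, toks, size=2):
    """Describe a cluster by its most frequent words."""
    counts = Counter(w for i in members for w in toks[i])
    return " ".join(w for w, _ in counts.most_common(size))

# Invented result descriptions for the query "rugby".
snippets = [
    "Rugby World Cup fixtures and results",
    "World Cup rugby results and tables",
    "Buy rugby shirts and boots online",
    "Rugby boots and shirts shop online",
]

clusters, toks = cluster_results(snippets, "rugby")
for members in clusters:
    print(label(members, toks), "->", members)
```

The query word itself is dropped before comparison because every result contains it, so it says nothing about how the results differ.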


Clustering and clustering algorithms emerged with the advent of the technologies that facilitate them, namely computers. As a field of science, clustering is therefore still in its relative infancy. See the progression in the following chart.

In recent years there have been significant developments in clustering algorithms for web applications. There has also been an observable emergence of new applications for these algorithms. Recent advances in clustering are due in part to broad multi-disciplinary contributions in the wake of wide adoption.

1 comment:

Amateur said...

Hi,
It is a nice introduction on clustering. Can you please mention the source for the blog, especially for the last paragraph?
Thank you.

Prashant