Monday, July 23, 2007

The Lingo Algorithm

The Lingo algorithm was invented by Stanislaw Osinski at Poznan University of Technology, Poland. Carrot2, the project set up by Dawid Weiss, is an open source framework for building search results clustering engines.
Within Carrot2, Weiss and Osinski develop an open-source search results clustering engine (Weiss and Osinski, 2007a). The engine uses the Lingo algorithm to cluster text, producing understandable and diverse thematic clusters (Ibid). The project has spawned a number of applications that use this approach to automatic document classification. The release of the Carrot2 open-source standalone graphical user interface (“GUI”) clustering tool provides an API framework through which Lingo experiments can be replicated.

Carrot2 is a Java-based application consisting of a number of well-defined modules. These modules cover each of the text mining processes described in a previous posting. The Carrot2 API specification presents the structure of these modules, describing their Java interfaces, classes and methods (Ibid). The API also comes with a number of well-explained demos and examples that make it easier to use. The specification also houses a comprehensive library of Carrot2 filters and, for each algorithm, presents a hierarchy of its interface and class summaries. Finally, the Carrot2 source code provides a basis for replicating the Lingo-based clustering scheme and a platform for implementing the framework necessary for my experiments.

1 comment:

Dawid Weiss said...

Just a clarification -- the Lingo algorithm was actually invented by Stanislaw Osinski. I'm just lucky enough to be working with this guy (and I indeed set up the Carrot2 project, but these two things shouldn't be confused).