Sunday, October 7, 2007

Custom Lingo Document Clustering Application

The custom Lingo document clustering application is a hybrid of the Carrot2 and KEA text mining applications. It uses KEA's information retrieval module, which efficiently collects text files from a local directory. Beyond that, the system largely resembles a Carrot2 Lingo-type application: it uses the Porter stemmer for stemming and an adjusted stop word list that combines all unique stop words used by KEA and Carrot2, for a total of 577 stop words.
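As a rough illustration of how such a combined stop word list can be built, here is a minimal sketch in Python (the application itself is built on the Java-based KEA and Carrot2 libraries, and this is not its code). The function name `merge_stop_words` and the two sample lists are hypothetical; the real KEA and Carrot2 lists are far longer and together yield the 577 unique words mentioned above.

```python
def merge_stop_words(*word_lists):
    """Return the sorted union of several stop word lists.

    Words are stripped and lower-cased first, so a word that appears
    in more than one list (or with different casing) is kept once.
    """
    merged = set()
    for words in word_lists:
        for word in words:
            cleaned = word.strip().lower()
            if cleaned:
                merged.add(cleaned)
    return sorted(merged)


# Hypothetical excerpts only; these are not the actual KEA or Carrot2 lists.
kea_sample = ["a", "about", "above", "The"]
carrot2_sample = ["a", "the", "then", "there "]

combined = merge_stop_words(kea_sample, carrot2_sample)
print(len(combined), combined)
```

Normalising before taking the union is a design assumption here: without it, "The" and "the" would be counted as two different stop words.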
The application outputs document cluster assignments, document assignment scores and cluster descriptions. A separate module was needed to collect all the information relevant to interpreting and evaluating the clustering. This custom output module appends human-defined labels and descriptive document data to the Lingo clustering information. The result is written to an output text file in which each line represents a document and that document's fields are separated by commas. Because this layout is consistent with the comma-separated values (CSV) format, the file can be imported into Excel automatically and analysed there.
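The one-line-per-document layout described above can be sketched as follows, again in Python rather than the application's own code. The column names, the helper `write_cluster_rows` and the two sample rows are all assumptions made for illustration; they are not the actual output schema or real results.

```python
import csv
import io


def write_cluster_rows(rows, handle):
    """Write a header plus one CSV line per document to an open text handle."""
    # Hypothetical column names; the real output module may use different fields.
    fieldnames = ["document", "human_label", "cluster", "score", "cluster_description"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Made-up example rows, not actual clustering output.
rows = [
    {"document": "doc01.txt", "human_label": "finance", "cluster": 0,
     "score": 0.82, "cluster_description": "Stock Market"},
    {"document": "doc02.txt", "human_label": "sport", "cluster": 1,
     "score": 0.64, "cluster_description": "Football, League Results"},
]

buffer = io.StringIO()
write_cluster_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Using a CSV writer rather than joining strings by hand means that a cluster description containing a comma is quoted automatically, so the file still opens with the correct columns in Excel. To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")`.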
Relying on the KEA API, the Carrot2 API and their source code for many of the peripheral text mining services provides a stable environment in which to test how changes to the assumed key phrase affect the clustering outcomes.
