Sunday, October 14, 2007
There are three principal clustering experiments:
1. 750 Documents: this experiment uses only the concepts that have 50 or more documents. Only 15 groups satisfy this condition, and I take 50 documents from each of them, giving 15 x 50 = 750 documents;
2. 9032 Documents: this is the complete corpus, as described in the post of 29 September. This corpus has 65 groups; and
3. 2894 Documents: this is the entire corpus less the two most commonly occurring groups, acq and earn. Together these account for 6138 documents, leaving 9032 - 6138 = 2894 documents in 63 groups.
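
The way the first and third subsets are derived from the full corpus can be sketched roughly as follows. This is only an illustration, not the code used for the experiments: it assumes the corpus is held as a list of (doc_id, label) pairs with one label per document, the function names are hypothetical, and drawing the 50 documents per group at random (with a fixed seed) is my assumption, since any rule for choosing the 50 would fit the description above.

```python
import random
from collections import defaultdict


def group_by_label(docs):
    """Map each label to the list of document ids carrying it."""
    groups = defaultdict(list)
    for doc_id, label in docs:
        groups[label].append(doc_id)
    return groups


def balanced_sample(docs, min_size=50, per_group=50, seed=0):
    """Keep only groups with at least min_size documents and take
    per_group documents from each (random choice is an assumption)."""
    rng = random.Random(seed)
    groups = group_by_label(docs)
    sample = []
    for label, ids in sorted(groups.items()):
        if len(ids) >= min_size:
            chosen = rng.sample(ids, per_group)
            sample.extend((doc_id, label) for doc_id in chosen)
    return sample


def drop_groups(docs, excluded=("acq", "earn")):
    """Return the corpus without the documents of the excluded groups."""
    return [(doc_id, label) for doc_id, label in docs if label not in excluded]
```

Applied to the full 9032-document corpus, balanced_sample would correspond to the first experiment and drop_groups to the third; the second experiment uses the corpus unchanged.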