What am I left with?
How are these clusters distributed?
In the above chart, only clusters with greater than 50 documents are included. That is, clusters whose documents represent greater than 0.6% of the corpus. These are the 'top 15' documents, and in total they account for 91.4% of all documents. The rest are allocated to other: 774 documents. The 'top two' clusters represent 68% of all documents, and thus this dataset is not normally distributed.