Saturday, July 28, 2007
How well can passage meaning be derived without using word order? A comparison of Latent Semantic Analysis and humans
This 1997 paper uses experiments to compare how well humans and a computer judge the quality and quantity of knowledge conveyed in short student essays. The premise of the experiments is summed up in the paper's title: the computer assesses knowledge from a 'bag of words', ignoring word order entirely.
The computer uses Latent Semantic Analysis (LSA), a corpus-based statistical model for "inducing and representing aspects of the meaning of words and passages reflected in their usage" (p. 412). LSA first represents a corpus (a collection of text documents) as a term-document matrix of word frequencies. It then applies singular value decomposition (SVD) to reduce that matrix to a smaller number of dimensions, in which words and passages with similar usage end up close together.
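To make the bag-of-words idea concrete, here is a minimal sketch of the LSA pipeline, assuming raw word counts and a toy four-document corpus. The documents, the choice of `k=2` dimensions, and the helper names (`tokenize`, `term_document_matrix`, `lsa_doc_vectors`, `cosine`) are all hypothetical. Practical LSA systems usually weight the counts before the SVD and use far larger corpora; this sketch skips the weighting step and does not reproduce the paper's setup.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic runs; word order is thrown away below.
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs: list[str]) -> tuple[list[str], np.ndarray]:
    """Rows are terms, columns are documents, cells are raw counts."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = np.array([[c[t] for c in counts] for t in vocab], dtype=float)
    return vocab, matrix


def lsa_doc_vectors(matrix: np.ndarray, k: int) -> np.ndarray:
    """Project documents into a k-dimensional latent space via truncated SVD."""
    # matrix = U @ diag(s) @ Vt; keep only the k largest singular values.
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Row j holds document j's coordinates in the reduced space.
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy corpus; doc 1 is doc 0 with its words scrambled.
docs = [
    "the heart pumps blood through the body",
    "blood the body through pumps heart the",
    "arteries carry blood away from the heart",
    "the essay discusses poetry and rhyme",
]
vocab, A = term_document_matrix(docs)
D = lsa_doc_vectors(A, k=2)

print(round(cosine(D[0], D[1]), 6))  # same bag of words, so 1.0
print(cosine(D[0], D[2]), cosine(D[0], D[3]))
```

Because the scrambled sentence has exactly the same word counts as the original, the two produce identical columns and identical latent vectors: LSA literally cannot see word order. In this toy run the heart/arteries pair also scores well above the heart/poetry pair, which is the kind of usage-based similarity the reduced space is meant to capture.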
Landauer, Laham, Rehder and Schreiner describe two well-defined experiments in which LSA and human experts evaluated short essays on scientific topics. The result was that "LSA-based measures were closely related to human judgments as the latter were to each other and LSA measures predicted external measures of the same knowledge as well or better than did the human judgements" (p. 418).