Probabilistic Topic Models for Semantic Memory and Information Retrieval
Department of Cognitive Sciences, University of California, Irvine, USA
Topic models take a probabilistic approach to semantic cognition and information retrieval, representing "topics" as probability distributions over words. The semantic content of a document is represented by a context-dependent probability distribution over topics. Such dimensionality-reduced, "gist"-based representations have been useful for understanding and explaining the structure of semantic networks such as word association norms and Roget's thesaurus. In information retrieval, these representations can lead to better retrieval performance when the search is focused on matching the content of documents irrespective of the exact words used in the search query. One limitation of gist-based representations is that they operate at only one level of abstraction and do not explain how context-specific (i.e., episodic) and content-specific (i.e., semantic) information can be simultaneously encoded and retrieved. Similarly, gist-based representations have limitations in information retrieval when the matching operation requires multiple levels of abstraction, at both the topic level and the word level. We propose a new probabilistic model that represents documents at both the topic level ("what is this document about?") and the word level ("what words are unique in this context?"). We show how this can explain von Restorff effects in memory experiments, where single unrelated words are remembered better than words fitting the gist of the list. We also illustrate how such models are useful in information retrieval, matching documents at both the topic level and a specific word level.
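As a rough illustration of the two-route idea described above, the following sketch generates a document by mixing a topic-level route (a gist distribution over topics, each topic a distribution over words) with a document-specific word distribution. All names, sizes, and hyperparameter values here are illustrative assumptions, not the authors' actual model specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and sizes (illustrative assumptions)
vocab = ["memory", "recall", "list", "word", "retrieval",
         "query", "document", "topic", "zebra", "xylophone"]
V = len(vocab)   # vocabulary size
T = 2            # number of topics
doc_len = 20     # words per document

# Topic-level route: each topic is a probability distribution over words
phi = rng.dirichlet(np.full(V, 0.5), size=T)   # shape (T, V)

# Document state: a distribution over topics (the "gist") ...
theta = rng.dirichlet(np.full(T, 0.5))         # length T
# ... plus a document-specific word distribution (the word-level route)
psi = rng.dirichlet(np.full(V, 0.5))           # length V
# Mixing weight: probability that a word comes from the topic route
lam = 0.8

def generate_word():
    """Draw one word: choose a route, then sample from that route."""
    if rng.random() < lam:
        z = rng.choice(T, p=theta)     # pick a topic from the gist
        w = rng.choice(V, p=phi[z])    # pick a word from that topic
    else:
        w = rng.choice(V, p=psi)       # document-specific word
    return vocab[w]

document = [generate_word() for _ in range(doc_len)]
print(document)
```

Inference would run in the opposite direction, inferring for each observed word whether it is better explained by the gist or by the document-specific route; this is what lets the model treat an isolated, off-gist word as distinctive, in the spirit of the von Restorff effect discussed above.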
Mark Steyvers (University of California, Irvine) [PRESENTER], Tom Griffiths (Brown University), Padhraic Smyth (University of California, Irvine), Kelly Addis (Indiana University)