Generalized vector space model

The generalized vector space model is a generalization of the vector space model used in information retrieval. Wong et al.[1] presented an analysis of the problems that the pairwise orthogonality assumption of the vector space model (VSM) creates. From there they extended the VSM to the generalized vector space model (GVSM).



GVSM introduces term-to-term correlations, which remove the pairwise orthogonality assumption. More specifically, Wong et al. considered a new space in which each term vector t_i is expressed as a linear combination of 2^n vectors m_r, where r = 1, ..., 2^n and n is the number of index terms.

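For a document d_k and a query q, the similarity function now becomes:

$$\operatorname{sim}(d_k, q) = \frac{\sum_{j=1}^{n} \sum_{i=1}^{n} w_{i,k} \, w_{j,q} \; t_i \cdot t_j}{\sqrt{\sum_{i=1}^{n} w_{i,k}^2} \, \sqrt{\sum_{i=1}^{n} w_{i,q}^2}}$$

where w_{i,k} and w_{j,q} are the weights of terms i and j in the document and the query respectively, and t_i and t_j are now vectors of a 2^n-dimensional space.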
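
To make the role of the term correlations concrete, here is a minimal Python sketch of such a similarity function. It assumes a cosine-style form in which every pair of document and query weights is multiplied by the correlation t_i · t_j and the result is normalized by the norms of the two weight vectors. The function name `gvsm_similarity` and the toy correlation matrices are illustrative assumptions, not taken from Wong et al.

```python
import math


def gvsm_similarity(doc_weights, query_weights, term_corr):
    """Similarity of a document and a query under given term correlations.

    doc_weights[i]   -- weight of term i in the document
    query_weights[j] -- weight of term j in the query
    term_corr[i][j]  -- assumed correlation (inner product) of terms i and j
    """
    n = len(doc_weights)
    # Every document term interacts with every query term through term_corr.
    numerator = sum(
        doc_weights[i] * query_weights[j] * term_corr[i][j]
        for i in range(n)
        for j in range(n)
    )
    doc_norm = math.sqrt(sum(w * w for w in doc_weights))
    query_norm = math.sqrt(sum(w * w for w in query_weights))
    if doc_norm == 0 or query_norm == 0:
        return 0.0
    return numerator / (doc_norm * query_norm)


# Toy data (hypothetical): the document uses only term 0, the query only term 1.
doc = [1.0, 0.0]
query = [0.0, 1.0]
orthogonal = [[1.0, 0.0], [0.0, 1.0]]   # classic VSM assumption
correlated = [[1.0, 0.5], [0.5, 1.0]]   # terms 0 and 1 are related

sim_vsm = gvsm_similarity(doc, query, orthogonal)
sim_gvsm = gvsm_similarity(doc, query, correlated)
```

With the identity matrix the sketch reduces to ordinary cosine similarity, so the document and query above do not match at all. With correlated terms they receive a nonzero score even though they share no keyword.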

The term correlation t_i · t_j can be implemented in several ways. For example, Wong et al. use the term occurrence frequency matrix obtained from automatic indexing as input to their algorithm; the output is the term correlation between any pair of index terms.
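
One way to turn a term occurrence matrix into pairwise term correlations can be sketched as follows. This is a textbook-style construction in the spirit of the minterm idea, not necessarily the exact algorithm of Wong et al. It assumes that documents are grouped by their term occurrence pattern, that each observed pattern contributes one coordinate, and that each term vector is normalized before inner products are taken. The function name `term_correlations` and the toy matrix are hypothetical.

```python
import math


def term_correlations(term_doc):
    """Return a term-by-term correlation matrix.

    term_doc[i][j] is the weight (e.g. frequency) of term i in document j.
    """
    n_terms = len(term_doc)
    n_docs = len(term_doc[0]) if n_terms else 0

    # Group documents by occurrence pattern; only patterns that actually
    # occur are kept, so at most n_docs of the 2^n patterns are stored.
    coeffs = {}
    for j in range(n_docs):
        pattern = tuple(term_doc[i][j] > 0 for i in range(n_terms))
        acc = coeffs.setdefault(pattern, [0.0] * n_terms)
        for i in range(n_terms):
            acc[i] += term_doc[i][j]

    # Term i gets one coordinate per observed pattern, then is normalized.
    unit_vectors = []
    for i in range(n_terms):
        vec = [acc[i] for acc in coeffs.values()]
        norm = math.sqrt(sum(x * x for x in vec))
        unit_vectors.append([x / norm if norm else 0.0 for x in vec])

    # Correlation of two terms is the inner product of their unit vectors.
    return [
        [sum(a * b for a, b in zip(u, v)) for v in unit_vectors]
        for u in unit_vectors
    ]


# Toy matrix (hypothetical): 2 terms, 3 documents; both terms occur in document 1.
toy_term_doc = [
    [1, 1, 0],  # term 0
    [0, 1, 1],  # term 1
]
corr = term_correlations(toy_term_doc)
```

In the toy matrix the two terms co-occur in one of the documents that contain each of them, so their correlation is positive but below 1.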

Semantic information on GVSM


There are at least two basic directions for embedding term to term relatedness, other than exact keyword matching, into a retrieval model:

  1. compute semantic correlations between terms
  2. compute frequency co-occurrence statistics from large corpora

Recently Tsatsaronis[2] focused on the first approach.

They measure semantic relatedness (SR) using a thesaurus (O) such as WordNet. The measure considers the path length, captured by compactness (SCM), and the path depth, captured by semantic path elaboration (SPE). They estimate the t_i · t_j inner product by:

$$t_i \cdot t_j = SR((t_i, t_j), (s_i, s_j), O)$$

where s_i and s_j are senses of terms t_i and t_j respectively, maximizing SCM · SPE.
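
The sense-selection step can be illustrated with a small Python sketch. It assumes that the relatedness of two terms is taken as the maximum, over all pairs of their senses, of the product of a compactness score and a path elaboration score. The function `semantic_relatedness`, the sense labels, and the score tables are hypothetical stand-ins; a real system would derive SCM and SPE from paths in the thesaurus.

```python
from itertools import product


def semantic_relatedness(senses_i, senses_j, scm, spe):
    """Return the best SCM * SPE score over all sense pairs and that pair.

    senses_i, senses_j -- candidate senses of the two terms
    scm, spe           -- callables scoring a pair of senses (assumed given)
    """
    best_score, best_pair = 0.0, None
    for s_i, s_j in product(senses_i, senses_j):
        score = scm(s_i, s_j) * spe(s_i, s_j)
        if score > best_score:
            best_score, best_pair = score, (s_i, s_j)
    return best_score, best_pair


# Hypothetical score tables standing in for thesaurus-derived values.
SCM_TABLE = {("bank#1", "river#1"): 0.2, ("bank#2", "river#1"): 0.8}
SPE_TABLE = {("bank#1", "river#1"): 0.5, ("bank#2", "river#1"): 0.5}


def toy_scm(s_i, s_j):
    return SCM_TABLE.get((s_i, s_j), 0.0)


def toy_spe(s_i, s_j):
    return SPE_TABLE.get((s_i, s_j), 0.0)


score, pair = semantic_relatedness(["bank#1", "bank#2"], ["river#1"], toy_scm, toy_spe)
```

The returned score would then serve as the t_i · t_j value for the two terms, and the returned pair shows which senses were selected.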

Building also on the first approach, Waitelonis et al.[3] have computed semantic relatedness from Linked Open Data resources, including DBpedia as well as the YAGO taxonomy. They thereby exploit taxonomic relationships among semantic entities in documents and queries after named entity linking.


