Title: A context vector model for information retrieval
Authors: Billhardt, Holger; Borrajo Millán, Daniel; Maojo, Víctor
Date issued: 2002
Date available: 2010-02-08
Citation: Journal of the American Society for Information Science and Technology, 2002, vol. 53, n. 3, p. 236-249
Journal: Journal of the American Society for Information Science and Technology
Volume: 53; Issue: 3; Pages: 236-249
DOI: 10.1002/asi.10032
URI: https://hdl.handle.net/10016/6790
Type: research article
Subject: Computer Science (Informática)
Keywords: Vector space models; Document retrieval; Vector analysis; Co-occurrence analysis; Contextual information
Format: application/pdf
Language: English
Rights: © Wiley Periodicals
Access: open access

Abstract: In the vector space model for information retrieval, term vectors are pairwise orthogonal, that is, terms are assumed to be independent. It is well known that this assumption is too restrictive. In this article, we present our work on an indexing and retrieval method that, based on the vector space model, incorporates term dependencies and thus obtains semantically richer representations of documents. First, we generate term context vectors based on the co-occurrence of terms in the same documents. These vectors are used to calculate context vectors for documents. We present different techniques for estimating the dependencies among terms. We also define term weights that can be employed in the model. Experimental results on four text collections (MED, CRANFIELD, CISI, and CACM) show that the incorporation of term dependencies in the retrieval process performs statistically significantly better than the classical vector space model with IDF weights. We also show that the degree of semantic matching versus direct word matching that performs best varies across the four collections. We conclude that the model performs well for certain types of queries and, generally, for information tasks with high recall requirements. Therefore, we propose the use of the context vector model in combination with other, direct word-matching methods.
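The abstract's pipeline (term context vectors from document co-occurrence, document context vectors built from them, cosine-style matching) can be sketched as follows. This is a minimal illustrative implementation under assumed details: the toy corpus, the product-of-counts co-occurrence weighting, and the plain vector-sum document representation are assumptions, not the paper's exact formulas or weighting schemes.

```python
# Illustrative sketch of context-vector retrieval; the corpus, the
# co-occurrence weighting, and the normalization choices are assumptions.
import math
from collections import Counter

docs = [
    "heart disease treatment",
    "heart attack symptoms",
    "information retrieval model",
]

# Vocabulary and index
tokenized = [d.split() for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})
idx = {t: i for i, t in enumerate(vocab)}
n = len(vocab)

# Term co-occurrence matrix: terms co-occur when they appear in the
# same document (weighted here by the product of their in-document counts).
cooc = [[0.0] * n for _ in range(n)]
for doc in tokenized:
    counts = Counter(doc)
    for a in counts:
        for b in counts:
            cooc[idx[a]][idx[b]] += counts[a] * counts[b]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

# Term context vectors: normalized rows of the co-occurrence matrix.
term_ctx = [normalize(row) for row in cooc]

def context_vector(text):
    # Document (or query) context vector: normalized sum of the
    # context vectors of its terms.
    v = [0.0] * n
    for t in text.split():
        if t in idx:
            v = [a + b for a, b in zip(v, term_ctx[idx[t]])]
    return normalize(v)

doc_vecs = [context_vector(d) for d in docs]

def retrieve(query):
    # Rank documents by inner product with the query's context vector.
    qv = context_vector(query)
    sims = [sum(a * b for a, b in zip(qv, dv)) for dv in doc_vecs]
    return sorted(range(len(docs)), key=lambda i: -sims[i])

print(retrieve("disease symptoms"))
```

Note that "disease symptoms" shares no literal term with document 1 ("heart attack symptoms" shares only "symptoms", and none with document 0), yet both medical documents rank above the unrelated one through the shared "heart" context, which is the semantic-matching effect the abstract describes.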