RT Journal Article
T1 A context vector model for information retrieval
A1 Billhardt, Holger
A1 Borrajo Millán, Daniel
A1 Maojo, Víctor
AB In the vector space model for information retrieval, term vectors are pair-wise orthogonal, that is, terms are assumed to be independent. It is well known that this assumption is too restrictive. In this article, we present our work on an indexing and retrieval method that, based on the vector space model, incorporates term dependencies and thus obtains semantically richer representations of documents. First, we generate term context vectors based on the co-occurrence of terms in the same documents. These vectors are used to calculate context vectors for documents. We present different techniques for estimating the dependencies among terms. We also define term weights that can be employed in the model. Experimental results on four text collections (MED, CRANFIELD, CISI, and CACM) show that the incorporation of term dependencies in the retrieval process performs statistically significantly better than the classical vector space model with IDF weights. We also show that the degree of semantic matching versus direct word matching that performs best varies across the four collections. We conclude that the model performs well for certain types of queries and, generally, for information tasks with high recall requirements. Therefore, we propose the use of the context vector model in combination with other, direct word-matching methods.
PB Wiley & Sons
YR 2002
FD 2002
LK https://hdl.handle.net/10016/6790
UL https://hdl.handle.net/10016/6790
LA eng
DS e-Archivo
RD 18 May 2024
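The abstract outlines the general pipeline of the context vector model: term context vectors are estimated from the co-occurrence of terms in the same documents, document and query context vectors are built from those term vectors, and retrieval ranks documents by similarity to the query. The sketch below illustrates only that general idea, not the authors' exact formulation; the term-frequency weighting, the row normalisation of the co-occurrence matrix, and the cosine ranking are assumptions made for this example.

```python
# Illustrative sketch of a co-occurrence-based context vector retrieval
# pipeline (not the paper's exact weighting or normalisation scheme).
import numpy as np

def build_term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix A with shape (|vocab|, |docs|)."""
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.split():
            if token in index:
                A[index[token], j] += 1
    return A

def term_context_vectors(A):
    """Term-term co-occurrence counts, row-normalised so each term's
    context vector sums to 1 (one simple way to estimate dependencies)."""
    C = A @ A.T                      # co-occurrence within the same documents
    return C / C.sum(axis=1, keepdims=True)

def doc_context_vectors(A, T):
    """Document context vector = tf-weighted sum of its terms' context
    vectors, length-normalised for cosine comparison."""
    D = T.T @ A                      # shape (|vocab|, |docs|)
    return D / np.linalg.norm(D, axis=0, keepdims=True)

def retrieve(query, docs, vocab):
    """Rank documents by cosine similarity of context vectors.
    The query must share at least one term with the vocabulary."""
    A = build_term_doc_matrix(docs, vocab)
    T = term_context_vectors(A)
    D = doc_context_vectors(A, T)
    q = build_term_doc_matrix([query], vocab)[:, 0]  # query as a pseudo-document
    q_ctx = T.T @ q
    q_ctx = q_ctx / np.linalg.norm(q_ctx)
    scores = D.T @ q_ctx                             # cosine similarities
    return np.argsort(-scores)                       # document indices, best first

docs = ["heart disease treatment",
        "heart attack risk factors",
        "database query optimisation"]
vocab = sorted({t for d in docs for t in d.split()})
print(retrieve("heart treatment", docs, vocab))
```

In this toy example the two heart-related documents rank above the unrelated one: the second document matches the query only on "heart", but its context vector still overlaps the query's context vector through terms that co-occur with "heart", which is the kind of semantic matching the abstract describes.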