This paper introduces an extension of the p-median problem and its application to clustering,
in which the distance/dissimilarity function between units is calculated as the distance sum on
the q most important variables. These variables are to be chosen from a set of m variables, which
adds a new combinatorial feature to the problem; we call the result the p-median model
with distance selection. This problem has its origin in cluster analysis, often applied to
sociological surveys, where it is common practice for a researcher to select the q statistical
variables they predict will be the most important in discriminating the statistical units before
applying the clustering algorithm. Here we show how this selection can be formulated as a
non-linear mixed-integer optimization model, and we show how this model can be linearized in
several different ways. These linearizations are compared in a computational study, and the
results indicate that the radius formulation of the p-median problem is the most efficient model for
solving this problem.
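
To make the combinatorial structure concrete, the following is a minimal brute-force sketch of the problem on a toy instance. It is not any of the linearized formulations compared in the paper. It assumes that the q variables and the p medians are chosen jointly to minimize the total assignment cost, and that the per-variable distance is the absolute difference; the function name, the data, and both assumptions are illustrative only.

```python
from itertools import combinations


def pmedian_distance_selection(data, p, q):
    """Enumerate every choice of q variables and p medians (illustration only).

    data: list of n units, each a list of m numeric variable values.
    Returns (cost, selected_variables, medians) for the first minimum found.
    Assumption: dissimilarity = sum of absolute differences on the selected
    variables; each unit is assigned to its closest median.
    """
    n = len(data)
    m = len(data[0])
    best = None
    for variables in combinations(range(m), q):
        # Distance matrix restricted to the selected variables.
        dist = [
            [sum(abs(data[i][k] - data[j][k]) for k in variables) for j in range(n)]
            for i in range(n)
        ]
        for medians in combinations(range(n), p):
            # Each unit contributes its distance to the nearest median.
            cost = sum(min(dist[i][j] for j in medians) for i in range(n))
            if best is None or cost < best[0]:
                best = (cost, variables, medians)
    return best


# Hypothetical toy data: 4 units, 3 variables; only variable 0 separates
# the units into two tight groups.
example_data = [
    [0, 0, 5],
    [0, 3, 9],
    [10, 1, 1],
    [10, 7, 7],
]
result = pmedian_distance_selection(example_data, p=2, q=1)
```

The enumeration grows as C(m, q) times C(n, p), so it is usable only on very small instances; this growth is the reason a mixed-integer formulation and its linearizations are of interest.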