Funder:
Ministerio de Economía y Competitividad (España)
Sponsor:
This work has been partially funded by the Spanish Ministry of Education, Culture and Sports FPU Grant FPU13/04904, and the National Grants TEC2014-53390-P, TEC2014-61729-EXP and TEC2017-84395-P of the Spanish Ministry of Economy and Competitiveness.
Project:
Gobierno de España. TEC2014-53390-P
Gobierno de España. TEC2014-61729-EXP
Gobierno de España. TEC2017-84395-P
Modern computer vision techniques have to deal
with vast amounts of visual data, which implies a computational effort that must often be carried out in broad and challenging scenarios. The interest in efficiently solving these image
and video applications has led researchers to develop methods
to expertly drive the corresponding processing to conspicuous
regions that either depend on the context or are based on specific
requirements. In this paper, we propose a general hierarchical
probabilistic framework, independent of the application scenario,
and grounded in leading psychological studies of
attention and eye movements, which support the view that guidance is
not based directly on the information provided by early visual
processes but on a contextual representation that arises from
them. The approach defines the task of context-driven visual
attention as a mixture of latent sub-tasks, which are, in turn,
modeled as a combination of specific distributions associated
with low-, mid-, and high-level spatio-temporal features. Learning
from fixations gathered from human observers, we incorporate
an intermediate level between feature extraction and visual
attention estimation, which yields comprehensive
guiding representations. The experiments show that our proposal
successfully learns hierarchical explanations of visual attention
tailored to diverse video genres, outperforming several
leading models in the literature.
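
The mixture structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: the function name `attention_map`, the fixed weights, and the choice of a convex combination of normalized feature maps for each sub-task are all assumptions made for illustration. In the paper the corresponding quantities are learned from human fixations, which this sketch does not attempt.

```python
import numpy as np

def attention_map(feature_maps, subtask_weights, feature_weights):
    """Hypothetical sketch: attention as a mixture of latent sub-tasks.

    feature_maps:    (F, H, W) non-negative feature responses (assumed inputs)
    subtask_weights: (K,) mixture weights over sub-tasks, summing to 1
    feature_weights: (K, F) per-sub-task feature weights, each row summing to 1
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    subtask_weights = np.asarray(subtask_weights, dtype=float)
    feature_weights = np.asarray(feature_weights, dtype=float)

    # Turn each feature map into a spatial distribution over pixels.
    sums = feature_maps.sum(axis=(1, 2), keepdims=True)
    if np.any(sums <= 0):
        raise ValueError("each feature map must have a positive sum")
    dists = feature_maps / sums

    # Assumption: each sub-task is a convex combination of the
    # feature distributions. Result shape: (K, H, W).
    subtask_maps = np.tensordot(feature_weights, dists, axes=([1], [0]))

    # Attention is the mixture over sub-tasks. Result shape: (H, W).
    return np.tensordot(subtask_weights, subtask_maps, axes=([0], [0]))

# Toy example: two features on a 2x2 frame, two sub-tasks that each
# rely on a single feature.
features = np.array([[[1.0, 1.0], [1.0, 1.0]],
                     [[4.0, 0.0], [0.0, 0.0]]])
out = attention_map(features, [0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
```

Because every level is a normalized combination, the output is itself a distribution over pixel locations, so it can be compared directly against fixation density maps.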