Publication:
Interpretable global-local dynamics for the prediction of eye fixations in autonomous driving scenarios

dc.affiliation.dpto: UC3M. Departamento de Teoría de la Señal y Comunicaciones
dc.affiliation.grupoinv: UC3M. Grupo de Investigación: Procesado Multimedia
dc.contributor.author: Martinez Cebrian, Javier
dc.contributor.author: Fernández Torres, Miguel Ángel
dc.contributor.author: Díaz de María, Fernando
dc.date.accessioned: 2021-12-16T09:39:06Z
dc.date.available: 2021-12-16T09:39:06Z
dc.date.issued: 2020-12-01
dc.description.abstract: Human eye movements while driving reveal that visual attention largely depends on the context in which it occurs. Moreover, an autonomous vehicle performing this function would be more reliable if its outputs were understandable. Capsule Networks have been presented as a great opportunity to explore new horizons in the Computer Vision field, owing to their ability to structure and relate latent information. In this article, we present a hierarchical approach for the prediction of eye fixations in autonomous driving scenarios. Context-driven visual attention can be modeled by considering different conditions which, in turn, are represented as combinations of several spatio-temporal features. With the aim of learning these conditions, we have built an encoder-decoder network that merges visual feature information using a global-local definition of capsules. Two types of capsules are distinguished: representational capsules for features and discriminative capsules for conditions. The latter, together with eye fixations recorded with wearable eye-tracking glasses, allow the model to learn both to predict contextual conditions and to estimate visual attention, by means of a multi-task loss function. Experiments show how our approach is able to express either frame-level (global) or pixel-wise (local) relationships between features and contextual conditions, allowing for interpretability while maintaining or improving the performance of related black-box systems in the literature. Indeed, our proposal offers an improvement of 29% in terms of information gain with respect to the best performance reported in the literature.
dc.description.sponsorship: The authors would like to thank the authors of the DR(eye)VE Project [49] for the support provided during this work, as well as the Multimedia Processing Group of the Universidad Carlos III de Madrid for their full personal and academic involvement.
dc.format.extent: 18
dc.identifier.bibliographicCitation: Martinez-Cebrian, J., Fernandez-Torres, M. A. & Diaz-De-Maria, F. (2020). Interpretable Global-Local Dynamics for the Prediction of Eye Fixations in Autonomous Driving Scenarios. IEEE Access, 8, 217068–217085.
dc.identifier.doi: https://doi.org/10.1109/ACCESS.2020.3041606
dc.identifier.issn: 2169-3536
dc.identifier.publicationfirstpage: 217068
dc.identifier.publicationlastpage: 217085
dc.identifier.publicationtitle: IEEE Access
dc.identifier.publicationvolume: 8
dc.identifier.uri: https://hdl.handle.net/10016/33780
dc.identifier.uxxi: AR/0000027338
dc.language.iso: eng
dc.publisher: IEEE
dc.rights: © The authors, 2020. This work is licensed under a Creative Commons Attribution 4.0 License.
dc.rights: Attribution 3.0 Spain
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/
dc.subject.eciencia: Telecommunications
dc.subject.other: Top-down visual attention
dc.subject.other: Eye fixation prediction
dc.subject.other: Context-based learning
dc.subject.other: Interpretability
dc.subject.other: Capsule networks
dc.subject.other: Convolutional neural networks
dc.subject.other: Autonomous driving
dc.title: Interpretable global-local dynamics for the prediction of eye fixations in autonomous driving scenarios
dc.type: research article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
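
Note: the abstract above describes a multi-task loss that couples contextual-condition prediction (discriminative capsules) with eye-fixation estimation. The record itself does not include that loss; the following is a minimal illustrative sketch in PyTorch of how such a two-term objective could be combined. The function name `multi_task_loss`, the weight `lambda_cond`, and the choice of a KL-divergence saliency term are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): combine a fixation-map
# term with a contextual-condition classification term, as suggested by the
# abstract's description of a multi-task loss.
import torch
import torch.nn.functional as F

def multi_task_loss(pred_map, gt_map, cond_logits, cond_labels, lambda_cond=0.5):
    """pred_map    : (B, 1, H, W) predicted attention map (unnormalised).
       gt_map      : (B, 1, H, W) ground-truth fixation density map.
       cond_logits : (B, C) logits over contextual driving conditions.
       cond_labels : (B,) integer condition labels.
       lambda_cond : assumed weighting between the two terms."""
    B = pred_map.size(0)

    # KL divergence between normalised attention distributions, a common
    # choice for saliency / fixation-prediction objectives (assumed here).
    log_p = F.log_softmax(pred_map.view(B, -1), dim=1)
    q = F.softmax(gt_map.view(B, -1), dim=1)
    saliency_loss = F.kl_div(log_p, q, reduction="batchmean")

    # Cross-entropy over the condition (discriminative-capsule) outputs.
    condition_loss = F.cross_entropy(cond_logits, cond_labels)

    return saliency_loss + lambda_cond * condition_loss
```

In such a formulation, the weighting factor trades off the condition-recognition supervision (which provides the interpretable, context-level signal) against the accuracy of the predicted fixation maps.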
Files
Original bundle
Name: Interpretable_IEEEA_2020.pdf
Size: 2.3 MB
Format: Adobe Portable Document Format