A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera


dc.contributor.author Pham, Huy-Hieu
dc.contributor.author Salmane, Houssam
dc.contributor.author Khoudour, Louahdi
dc.contributor.author Crouzil, Alain
dc.contributor.author Velastin Carroza, Sergio Alejandro
dc.contributor.author Zegers, Pablo
dc.date.accessioned 2021-05-26T12:04:10Z
dc.date.available 2021-05-26T12:04:10Z
dc.date.issued 2020-03-25
dc.identifier.bibliographicCitation Pham HH, Salmane H, Khoudour L, Crouzil A, Velastin SA, Zegers P. A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera. Sensors. 2020; 20(7):1825
dc.identifier.issn 1424-8220
dc.identifier.uri http://hdl.handle.net/10016/32764
dc.description.abstract We present a deep learning-based multitask framework for joint 3D human pose estimation and action recognition from a single RGB camera. The approach proceeds in two stages. In the first, a real-time 2D pose detector is run to determine the precise pixel locations of important keypoints of the human body. A two-stream deep neural network is then designed and trained to map the detected 2D keypoints into 3D poses. In the second stage, the Efficient Neural Architecture Search (ENAS) algorithm is deployed to find an optimal network architecture that models the spatio-temporal evolution of the estimated 3D poses via an image-based intermediate representation and performs action recognition. Experiments on the Human3.6M, MSR Action3D, and SBU Kinect Interaction datasets verify the effectiveness of the proposed method on the targeted tasks. Moreover, we show that the method requires a low computational budget for training and inference. In particular, the experimental results show that, using a monocular RGB sensor, we can develop a 3D pose estimation and human action recognition approach that reaches the performance of methods based on RGB-depth sensors. This opens up many opportunities for leveraging RGB cameras (which are much cheaper than depth cameras and extensively deployed in private and public places) to build intelligent recognition systems.
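To make the two-stage pipeline described in the abstract concrete, the sketch below outlines its data flow in PyTorch. Everything in it is an illustrative assumption, not the authors' implementation: the layer sizes, the two-stream design, the joint count NUM_JOINTS, the pose-to-image encoding, and the fixed placeholder CNN standing in for the ENAS-discovered classifier are all hypothetical.

```python
# Minimal sketch of the two-stage pipeline, assuming hypothetical
# architecture details; the paper's ENAS-found classifier is replaced
# here by a fixed placeholder CNN.
import torch
import torch.nn as nn

NUM_JOINTS = 17  # assumed COCO-style joint set; the paper's may differ

class TwoStreamLifter(nn.Module):
    """Stage 1, second step: lift detected 2D keypoints to 3D poses
    with two parallel MLP streams whose features are fused."""
    def __init__(self, hidden=1024):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Linear(NUM_JOINTS * 2, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
        self.stream_a = stream()
        self.stream_b = stream()
        self.head = nn.Linear(2 * hidden, NUM_JOINTS * 3)

    def forward(self, kp2d):  # kp2d: (B, NUM_JOINTS * 2)
        fused = torch.cat([self.stream_a(kp2d), self.stream_b(kp2d)], dim=1)
        return self.head(fused).view(-1, NUM_JOINTS, 3)

def poses_to_image(pose_seq):
    """Stage 2, first step: encode a 3D pose sequence (T, J, 3) as a
    pseudo-image by normalizing coordinates to [0, 1] and treating
    (x, y, z) as the three color channels, one column per frame."""
    lo = pose_seq.amin(dim=(0, 1), keepdim=True)
    hi = pose_seq.amax(dim=(0, 1), keepdim=True)
    img = (pose_seq - lo) / (hi - lo + 1e-8)       # (T, J, 3)
    return img.permute(2, 1, 0).unsqueeze(0)       # (1, 3, J, T)

# Placeholder classifier standing in for the ENAS-discovered network.
classifier = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 20),  # e.g., 20 action classes (MSR Action3D)
)

kp2d = torch.rand(8, NUM_JOINTS * 2)        # 8 frames of detected 2D keypoints
pose3d = TwoStreamLifter()(kp2d)            # (8, 17, 3) estimated 3D poses
logits = classifier(poses_to_image(pose3d.detach()))  # action scores
print(pose3d.shape, logits.shape)
```

In the paper the classifier architecture is searched by ENAS rather than fixed by hand; the sketch only shows how the lifted 3D poses feed an image-based recognizer.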
dc.description.sponsorship Sergio A. Velastin is grateful for funding received from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement No. 600371, the Ministerio de Economía, Industria y Competitividad (COFUND2013-51509), the Ministerio de Educación, Cultura y Deporte (CEI-15-17) and Banco Santander.
dc.language.iso eng
dc.publisher MDPI
dc.rights © 2020 by the authors
dc.rights Atribución 3.0 España
dc.rights.uri http://creativecommons.org/licenses/by/3.0/es/
dc.subject.other Human action recognition
dc.subject.other 3D pose estimation
dc.subject.other RGB sensors
dc.subject.other Deep learning
dc.title A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera
dc.type article
dc.subject.eciencia Informática
dc.identifier.doi https://doi.org/10.3390/s20071825
dc.rights.accessRights openAccess
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/600371
dc.relation.projectID Gobierno de España. COFUND2013-51509
dc.type.version publishedVersion
dc.identifier.publicationfirstpage 1825-1
dc.identifier.publicationissue 7
dc.identifier.publicationlastpage 1825-15
dc.identifier.publicationtitle Sensors
dc.identifier.publicationvolume 20
dc.identifier.uxxi AR/0000027935
dc.contributor.funder European Commission
dc.contributor.funder Ministerio de Economía, Industria y Competitividad (España)
dc.contributor.funder Ministerio de Educación, Cultura y Deporte (España)
dc.contributor.funder Banco Santander