DI - GIAA - Comunicaciones en Congresos y otros eventos

Recent Submissions

Now showing 1 - 20 of 132
  • Publication
    Fall Detection using Human Skeleton Features
    (IET Digital Library, 2021-03-17) Ramirez, Heilym; Velastin Carroza, Sergio Alejandro; Fabregas, Ernesto; Meza, Ignacio; Makris, Dimitrios; Farias, Gonzalo
    Falls are among the leading causes of death and serious injury, especially for the elderly. In addition, fall accidents have a direct financial cost for health systems and, indirectly, for the productivity of society. Among the most important problems in fall detection systems are privacy, the limitations of operating devices, and the comparison of machine learning techniques for detection. This article presents a fall detection system based on a k-Nearest Neighbour (KNN) classifier that uses camera-based pose detection of the human skeleton for feature extraction. The proposed method is evaluated on the UP-FALL dataset, surpassing the results of other fall detection systems that use the same database. The method achieves an accuracy of 98.84% and an F1-score of 97.41%.
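As an illustration of the k-Nearest Neighbour idea behind this system, the following minimal sketch classifies toy skeleton-derived feature vectors by majority vote. The feature values, labels and choice of k are invented for the example and do not come from the paper.

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Classify feature vector x by majority vote among its k nearest
    training samples (Euclidean distance)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy skeleton-derived features (e.g. normalised joint heights);
# label 1 = "fall", 0 = "no fall". Values are illustrative only.
train_X = np.array([[0.9, 0.8], [0.85, 0.9], [0.1, 0.2], [0.15, 0.1]])
train_y = [0, 0, 1, 1]

print(knn_predict(train_X, train_y, np.array([0.12, 0.15])))  # → 1
```

A production system would extract such features from an actual pose estimator rather than hand-coded values.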
  • Publication
    Multiple Object Tracking for Robust Quantitative Analysis of Passenger Motion While Boarding and Alighting a Metropolitan Train
    (IET Digital Library, 2021-03-17) Gómez Meza, José Sebastián; Delpiano, José; Velastin Carroza, Sergio Alejandro; Fernández, Rodrigo; Seriani Awad, Sebastián
    To achieve significant improvements in public transport, it is necessary to develop an autonomous system that locates and counts passengers in real time in scenarios with a high level of occlusion, providing tools to efficiently solve problems such as the reduction and stabilisation of travel times, greater fluency, better fleet control and less congestion. A deep learning method based on transfer learning is used to accomplish this: You Only Look Once (YOLO) version 3 and Faster R-CNN Inception version 2 architectures are fine-tuned using the PAMELA-UANDES dataset, which contains annotated images of passengers boarding and alighting on a subway platform, captured from an overhead perspective. The locations given by the detector are passed to a multiple-object tracking system based on a Markov decision process, which associates subjects in consecutive frames and assigns identities by considering overlaps between past detections and positions predicted using a Kalman filter.
  • Publication
    Sparse LiDAR and Stereo Fusion (SLS-Fusion) for Depth Estimation and 3D Object Detection
    (IET Digital Library, 2021-03-17) Mai, Nguyen-Anh-Minh; Duthon, Pierre; Khoudour, Louahdi; Crouzil, Alain; Velastin Carroza, Sergio Alejandro
    The ability to accurately detect and localise objects is recognised as one of the most important requirements for the perception of self-driving cars. From 2D to 3D object detection, the hardest part is determining the distance from the ego-vehicle to objects. Expensive technology like LiDAR can provide precise and accurate depth information, so most studies have tended to focus on this sensor, showing a performance gap between LiDAR-based and camera-based methods. Although many authors have investigated how to fuse LiDAR with RGB cameras, as far as we know there are no studies that fuse LiDAR and stereo in a deep neural network for the 3D object detection task. This paper presents SLS-Fusion, a new approach that fuses data from a 4-beam LiDAR and a stereo camera via a neural network for depth estimation, to obtain denser depth maps and thereby improve 3D object detection performance. Since a 4-beam LiDAR is cheaper than the well-known 64-beam LiDAR, this approach is also classified as a low-cost-sensor-based method. Evaluation on the KITTI benchmark shows that the proposed method significantly improves depth estimation performance compared to a baseline method. When applied to 3D object detection, it also achieves a new state of the art among low-cost-sensor-based methods.
  • Publication
    Analysis of real data with sensors and estimation outputs in configurable UAV platforms
    (Ieee Computer Society., 2017-10-10) García Herrero, Jesús; Molina López, José Manuel; Trincado Castán, Jorge; Sánchez, Jorge
    This paper presents a methodology to assess the performance of sensor fusion in UAVs based on the PixHawk flight controller and peripherals used to create ad-hoc unmanned vehicles, and its adequacy for building different projects on this architecture. The selected platform is described with emphasis on the available sensors and data-processing software, and an experimental methodology is proposed to characterise the sensor data fusion output.
  • Publication
    Silhouette-based human action recognition with a multi-class support vector machine
    (Institution Of Engineering And Technology (IET), 2018-05) González, Luis; Velastin Carroza, Sergio Alejandro; Acuña Leiva, Gonzalo; European Commission; Ministerio de Economía y Competitividad (España); Ministerio de Educación, Cultura y Deporte (España)
    Computer vision systems have become increasingly popular, being used to solve a wide range of problems. In this paper, a computer vision algorithm with a support vector machine (SVM) classifier is presented. The work focuses on the recognition of human actions through computer vision, using a multi-camera dataset of human actions called MuHAVi. The algorithm uses a silhouette-based method to extract features. The challenge is that in MuHAVi these silhouettes are noisy and in many cases include shadows. As there are many actions to recognise, we take a multi-class classification approach that combines binary SVM classifiers. The results are compared with previous results on the same dataset and show a significant improvement, especially for recognising actions from a different view, obtaining overall accuracies of 85.5% and 93.5% for leave-one-camera-out and leave-one-actor-out tests respectively.
  • Publication
    Detection of People Boarding/Alighting a Metropolitan Train using Computer Vision
    (Institution Of Engineering And Technology (IET), 2018-05) Belloc, M.; Velastin Carroza, Sergio Alejandro; Fernández, R.; Jara, M.; European Commission; Ministerio de Economía y Competitividad (España)
    Pedestrian detection and tracking have seen major progress in the last two decades. Nevertheless, there are always application areas that either require further improvement, have not been sufficiently explored, or where production-level performance (accuracy and computing efficiency) has not been demonstrated. One such area is pedestrian monitoring and counting on metropolitan railway platforms. In this paper we first present a new, partly annotated dataset of a full-size laboratory observation of people boarding and alighting from a public transport vehicle. We then present baseline results for the automatic detection of such passengers, based on computer vision, which could open the way to computing variables of interest to traffic engineers and vehicle designers, such as counts and flows, and how they relate to vehicle and platform layout.
  • Publication
    Motorcycle detection and classification in urban Scenarios using a model based on Faster R-CNN
    (The Institution of Engineering and Technology, 2018-05-22) Espinosa, Jorge E.; Velastin Carroza, Sergio Alejandro; Branch, John W.; European Commission
    This paper introduces a deep learning Convolutional Neural Network model based on Faster R-CNN for motorcycle detection and classification in urban environments. The model is evaluated in occluded scenarios where more than 60% of the vehicles present some degree of occlusion. For training and evaluation, we introduce a new dataset of 7500 annotated images captured in real traffic scenes using a drone-mounted camera. Several tests were carried out to design the network, achieving promising results of 75% average precision (AP), even with the high number of occluded motorbikes, the low capture angle and the moving camera. The model is also evaluated on low-occlusion datasets, reaching results of up to 92% AP.
  • Publication
    Evaluation Framework for Crowd Behaviour Simulation and Analysis based on Real Videos and Scene Reconstruction
    (IEEE, 2017-01-23) Jablonski, Konrad; Argyriou, Vasileios; Greenhill, Darrel; Velastin Carroza, Sergio Alejandro
    Crowd simulation has been regarded as an important research topic in computer graphics, computer vision and related areas. Various approaches have been proposed to simulate real-life scenarios. In this paper, a novel framework that evaluates the accuracy and realism of crowd simulation algorithms is presented. The framework is based on the concept of recreating real video scenes in 3D environments and applying crowd and pedestrian simulation algorithms to the agents using a plug-in architecture. The real videos are compared with recorded videos of the simulated scene, and novel Human Visual System (HVS) based similarity features and metrics are introduced to compare and evaluate simulation methods. The experiments show that the proposed framework provides efficient methods to evaluate crowd and pedestrian simulation algorithms with high accuracy and low cost.
  • Publication
    Characterisation of the spatial sensitivity of classifiers in pedestrian detection
    (IEEE, 2015-09) Quinteros, Daniel; Velastin Carroza, Sergio Alejandro; Acuña Leiva, Gonzalo; European Commission; Ministerio de Economía y Competitividad (España); Ministerio de Educación, Cultura y Deporte (España)
    In this paper, a study of spatial sensitivity in the pedestrian detection context is carried out by comparing two descriptor-classifier combinations, using the well-known sliding-window approach and looking for a well-tuned detector response. By well-tuned, we mean that multiple detections are minimised so as to facilitate the usual non-maximal suppression stage. To guide the evaluation we introduce the concept of spatial sensitivity, such that a pedestrian detection algorithm with good spatial sensitivity can reduce the number of classifications in the pedestrian neighbourhood, ideally to one. To characterise spatial sensitivity we propose and use a new metric to measure it. Finally, we carry out a statistical analysis (ANOVA) to validate the results obtained from the metric.
  • Publication
    Multi-view Human Action Recognition using Histograms of Oriented Gradients (HOG) Description of Motion History Images (MHIs)
    (IEEE, 2015-12) Murtaza, Fiza; Yousaf, Muhammad Haroon; Velastin Carroza, Sergio Alejandro; European Commission; Ministerio de Economía y Competitividad (España)
    In this paper, a silhouette-based, view-independent human action recognition scheme is proposed for multi-camera datasets. To overcome the high dimensionality incurred by multi-camera data, a low-dimensional representation based on the Motion History Image (MHI) is extracted, with a single MHI computed for each view/action video. Histograms of Oriented Gradients (HOG) are employed for efficient description of the MHIs, and the HOG-based descriptions are classified with a Nearest Neighbour (NN) classifier. The proposed method does not employ feature fusion for multi-view data and therefore does not require a fixed camera setup during the training and testing stages, making it suitable for both multi-view and single-view datasets. Experimental results on the multi-view MuHAVi-14 and MuHAVi-8 datasets give high accuracy rates of 92.65% and 99.26% respectively using the Leave-One-Sequence-Out (LOSO) cross-validation technique, compared to similar state-of-the-art approaches. The proposed method is computationally efficient and hence suitable for real-time action recognition systems.
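The Motion History Image used here is a standard construction: moving pixels are stamped with a maximum value and static pixels decay over time. A minimal numpy sketch of one common formulation (the decay constant and frame size are arbitrary choices for the example, not the paper's settings):

```python
import numpy as np

TAU = 10  # temporal window: a pixel fades to zero TAU frames after its last motion

def update_mhi(mhi, silhouette, tau=TAU):
    """One MHI update step: moving pixels are set to tau, static pixels decay by 1."""
    decayed = np.maximum(mhi - 1, 0)
    return np.where(silhouette > 0, tau, decayed)

mhi = np.zeros((4, 4))
s1 = np.zeros((4, 4)); s1[1, 1] = 1   # motion at (1,1) in frame 1
s2 = np.zeros((4, 4)); s2[2, 2] = 1   # motion at (2,2) in frame 2
mhi = update_mhi(mhi, s1)
mhi = update_mhi(mhi, s2)
print(mhi[1, 1], mhi[2, 2])  # → 9.0 10.0
```

Recent motion thus appears brighter than older motion, which is what makes HOG a natural descriptor for the resulting image.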
  • Publication
    People Counting in Videos by Fusing Temporal Cues from Spatial Context-Aware Convolutional Neural Networks
    (Springer, 2016-11-03) Sourtzinos, Panos; Velastin Carroza, Sergio Alejandro; Jara, Miguel; Zegers, Pablo; Makris, Dimitrios
    We present an efficient method for people counting in video sequences from fixed cameras by utilising the responses of spatially context-aware convolutional neural networks (CNN) in the temporal domain. For stationary cameras, the background information remains fairly static, while foreground characteristics such as size and orientation may depend on image location, so using whole frames for training a CNN improves the differentiation between background and foreground pixels. Foreground density, representing the presence of people in the environment, can then be associated with people counts. Moreover, fusing the count estimation responses in the temporal domain can further enhance the accuracy of the final count. Our methodology was tested on the publicly available Mall dataset and achieved a mean deviation error of 0.091.
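To illustrate the general idea of temporal-domain fusion of per-frame counts, here is a simple sliding-median smoother; this is a generic stand-in for the fusion step, not the paper's actual method, and the count values are invented.

```python
import numpy as np

def fuse_counts(per_frame_counts, window=5):
    """Fuse noisy per-frame count estimates with a sliding median,
    suppressing single-frame outliers."""
    c = np.asarray(per_frame_counts, dtype=float)
    half = window // 2
    padded = np.pad(c, half, mode='edge')  # repeat edge values at the ends
    return np.array([np.median(padded[i:i + window]) for i in range(len(c))])

fused = fuse_counts([10, 11, 30, 10, 9], window=3)
print(fused)  # the spurious count of 30 at index 2 is suppressed to 11
```

Because people counts change slowly relative to the frame rate, even this simple temporal pooling tends to reduce per-frame estimation noise.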
  • Publication
    DA-VLAD: Discriminative action vector of locally aggregated descriptors for action recognition
    (IEEE, 2018-09-06) Murtaza, Fiza; Yousaf, Muhammad Haroon; Velastin Carroza, Sergio Alejandro; European Commission; Ministerio de Economía y Competitividad (España)
    In this paper, we propose a novel encoding method for the representation of human action videos, which we call Discriminative Action Vector of Locally Aggregated Descriptors (DA-VLAD). DA-VLAD is motivated by the fact that many unnecessary and overlapping frames cause non-discriminative codewords during the training process. DA-VLAD deals with this issue by extracting class-specific clusters and learning the discriminative power of these codewords in the form of informative weights. We use these discriminative action weights with standard VLAD encoding as the contribution of each codeword. DA-VLAD reduces inter-class similarity efficiently by diminishing the effect of codewords common to multiple action classes during the encoding process. We demonstrate the effectiveness of DA-VLAD on two challenging action recognition datasets, UCF101 and HMDB51, improving on the state of the art with accuracies of 95.1% and 80.1% respectively.
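For context, standard VLAD encoding sums the residuals of local descriptors to their nearest codeword; a per-codeword weight vector, as in weighted variants like the one described above, can then scale each codeword's contribution. The sketch below shows only this generic mechanism (the codebook, descriptors and weighting are illustrative, not the DA-VLAD training procedure):

```python
import numpy as np

def vlad_encode(descriptors, codebook, weights=None):
    """Standard VLAD: accumulate residuals of descriptors to their nearest
    codeword, optionally scale per codeword, then L2-normalise."""
    K, D = codebook.shape
    v = np.zeros((K, D))
    for x in descriptors:
        k = np.argmin(np.linalg.norm(codebook - x, axis=1))
        v[k] += x - codebook[k]          # residual to nearest codeword
    if weights is not None:
        v *= weights[:, None]            # per-codeword discriminative weight
    v = v.flatten()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

v = vlad_encode(np.array([[0.1, 0.0], [0.9, 1.0]]),
                np.array([[0.0, 0.0], [1.0, 1.0]]))
print(v.shape)  # (4,): K codewords x D dimensions, L2-normalised
```

Down-weighting codewords shared across classes is what lets such an encoding reduce inter-class similarity.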
  • Publication
    3D-Hog Embedding Frameworks for Single and Multi-Viewpoints Action Recognition Based on Human Silhouettes
    (IEEE, 2018-09-13) Angelini, Federico; Fu, Zeyu; Velastin Carroza, Sergio Alejandro; Chambers, Jonathon A.; Naqvi, Syed Mohsen
    Given the high demand for automated human action recognition systems, great efforts have been undertaken in recent decades to progress the field. In this paper, we present frameworks for single- and multi-viewpoint action recognition based on Space-Time Volumes (STV) of human silhouettes and 3D-Histogram of Oriented Gradient (3D-HOG) embedding. We exploit computationally fast approaches involving Principal Component Analysis (PCA) over the local feature spaces, for compactly describing actions as combinations of local gestures, and L2-Regularized Logistic Regression (L2-RLR) for learning the action model from local features. Outperforming results on the Weizmann and i3DPost datasets confirm the efficacy of the proposed approaches compared to the baseline method and other works, in terms of accuracy and robustness to appearance changes.
  • Publication
    An Optimized and Fast Scheme for Real-time Human Detection using Raspberry Pi
    (IEEE, 2016-11-30) Noman, Mubashir; Yousaf, Muhammad Haroon; Velastin Carroza, Sergio Alejandro
    Real-time human detection is a challenging task due to appearance variance, occlusion and rapidly changing content; it therefore requires efficient hardware and optimised software. This paper presents a real-time human detection scheme on a Raspberry Pi. An efficient algorithm is proposed that processes regions of interest (ROI) selected by foreground estimation. Different numbers of scales are considered for computing Histogram of Oriented Gradients (HOG) features for the selected ROIs. A support vector machine (SVM) classifies the HOG feature vectors into human and non-human regions, and detected human regions are further filtered by analysing the area of overlapping regions. Considering the limited capabilities of the Raspberry Pi, the proposed scheme is evaluated using six different testing schemes on the Town Centre and CAVIAR datasets. Of these six, Single Window with two Scales (SW2S) processes 3 frames per second with acceptably lower accuracy than the original HOG. The proposed algorithm is about 8 times faster than the original multi-scale HOG and is recommended for real-time human detection on a Raspberry Pi.
  • Publication
    Feature Similarity and Frequency-Based Weighted Visual Words Codebook Learning Scheme for Human Action Recognition
    (Springer, 2017-11-20) Nazir, Saima; Yousaf, Muhammad Haroon; Velastin Carroza, Sergio Alejandro; European Commission; Ministerio de Economía y Competitividad (España)
    Human action recognition has become a popular field for computer vision researchers in the last decade. This paper presents a human action recognition scheme based on a textual information concept inspired by document retrieval systems. Videos are represented using a commonly used local feature representation. In addition, we formulate a new weighted class-specific dictionary learning scheme to reflect the importance of visual words for a particular action class, enriching the scheme to learn a sparse representation for that class. To evaluate our scheme on realistic and complex scenarios, we tested it on the UCF Sports and UCF11 benchmark datasets. The reported experimental results outperform recent state-of-the-art methods, with average accuracies of 98.93% on UCF Sports and 93.88% on UCF11. To the best of our knowledge, this contribution is the first to apply a weighted class-specific dictionary learning method to realistic human action recognition datasets.
  • Publication
    Inter and Intra Class Correlation Analysis (IIcCA) for Human Action Recognition in Realistic Scenarios
    (The Institution Of Engineering And Technology, 2017-07-11) Nazir, Saima; Yousaf, Muhammad Haroon; Velastin Carroza, Sergio Alejandro; European Commission; Ministerio de Economía y Competitividad (España); Ministerio de Educación, Cultura y Deporte (España)
    Human action recognition in realistic scenarios is an important yet challenging task. In this paper we propose a new method, Inter and Intra Class Correlation Analysis (IICCA), to handle the inter- and intra-class variations observed in realistic scenarios. Our contribution includes learning a class-specific visual representation that efficiently represents a particular action class and has high discriminative power with respect to other action classes. We use statistical measures to extract visual words that are highly intra-class correlated and weakly inter-class correlated. We evaluated our approach and compared it with state-of-the-art work using a realistic benchmark human action recognition dataset.
  • Publication
    Shadow Detection for Vehicle Detection in Urban Environments
    (Springer, 2017-07-02) Hanif, Muhammad; Hussain, Fawad; Yousaf, Muhammad Haroon; Velastin Carroza, Sergio Alejandro; Chen, Zezhi
    Finding an accurate and computationally efficient vehicle detection and classification algorithm for urban environments is challenging due to large video datasets and the complexity of the task. Many algorithms have been proposed, but none is fully efficient due to various real-time issues. This paper proposes an algorithm that addresses shadow detection (a cause of vehicle misdetection and misclassification) and incorporates solutions to other challenges such as camera vibration, blurred images, and illumination and weather changes. For accurate vehicle detection and classification, a combination of a self-adaptive GMM and a multi-dimensional Gaussian density transform is used to model the distribution of colour image data. Shadow detection based on the RGB and HSV colour spaces is proposed. Measurement-based features and intensity-based pyramid histograms of oriented gradients are used for classification into four main vehicle categories. The proposed method achieved 96.39% accuracy when tested on the Chile (MTT) dataset, recorded at different times and in different weather conditions, and is hence suitable for urban traffic environments.
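A classic HSV shadow test illustrates the kind of rule such systems build on: a foreground pixel is flagged as shadow if it darkens the background's V channel within a band while hue and saturation stay close. The sketch below uses this generic rule with invented thresholds; it is not the paper's specific formulation.

```python
import numpy as np

def shadow_mask(hsv_frame, hsv_bg, alpha=0.4, beta=0.93, tau_s=60, tau_h=30):
    """Flag pixels whose V ratio to the background lies in [alpha, beta]
    while H and S differences stay within tau_h and tau_s (illustrative values)."""
    h, s, v = (hsv_frame[..., i].astype(float) for i in range(3))
    hb, sb, vb = (hsv_bg[..., i].astype(float) for i in range(3))
    ratio = v / np.maximum(vb, 1)        # avoid division by zero
    return ((alpha <= ratio) & (ratio <= beta)
            & (np.abs(s - sb) <= tau_s) & (np.abs(h - hb) <= tau_h))

bg = np.zeros((1, 1, 3)); bg[..., 2] = 200       # bright background pixel
shadow = np.zeros((1, 1, 3)); shadow[..., 2] = 120  # same colour, darker
print(shadow_mask(shadow, bg)[0, 0])  # → True
```

The thresholds are normally tuned per scene, since shadow strength depends on illumination.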
  • Publication
    Skeletal Movement to Color Map: A Novel Representation for 3D Action Recognition with Inception Residual Networks
    (IEEE, 2018-09-06) Hieu Pham, Huy; Khoudour, Louahdi; Crouzil, Alain; Zegers, Pablo; Velastin Carroza, Sergio Alejandro
    We propose a novel skeleton-based representation for 3D action recognition in videos using Deep Convolutional Neural Networks (D-CNNs). Two key issues are addressed: first, how to construct a robust representation that easily captures the spatial-temporal evolution of motion from skeleton sequences; second, how to design D-CNNs capable of learning discriminative features from the new representation in an effective manner. To address these tasks, we propose a skeleton-based representation called SPMF (Skeleton Pose-Motion Feature). SPMFs are built from two of the most important properties of a human action, postures and their motions, and are therefore able to represent complex actions effectively. For the learning and recognition tasks, we design and optimise new D-CNNs based on the idea of Inception Residual networks to predict actions from SPMFs. Our method is evaluated on two challenging datasets, MSR Action3D and NTU-RGB+D. Experimental results indicate that the proposed method surpasses state-of-the-art methods whilst requiring less computation.
  • Publication
    Motorcycle Classification in Urban Scenarios using Convolutional Neural Networks for Feature Extraction
    (Institution Of Engineering And Technology (IET), 2017-07-12) Espinosa, Jorge E.; Velastin Carroza, Sergio Alejandro; Branch, John W.; Ministerio de Economía y Competitividad (España); European Commission
    This paper presents a motorcycle classification system for urban scenarios using a Convolutional Neural Network (CNN). Significant results in image classification have been achieved using CNNs, at the expense of a high computational cost for training with thousands or even millions of examples. Nevertheless, features can be extracted from already-trained CNNs. In this work AlexNet, as included in the Caffe framework (CaffeNet), is used to extract features from frames captured in a real urban scenario. The extracted CNN features are used to train a support vector machine (SVM) classifier to discriminate motorcycles from other road users. The results show mean accuracies of 99.40% and 99.29% on classification tasks of three and five classes respectively. Further experiments on a validation set of images show satisfactory classification.
  • Publication
    Learning and Recognizing Human Action from Skeleton Movement with Deep Residual Neural Networks
    (The Institution Of Engineering And Technology, 2017-07-11) Pham, Huy-Hieu; Khoudour, Louahdi; Crouzil, Alain; Zegers, Pablo; Velastin Carroza, Sergio Alejandro
    Automatic human action recognition is indispensable for almost all artificial intelligence systems, such as video surveillance, human-computer interfaces and video retrieval. Despite much progress, recognising actions in an unknown video is still a challenging task in computer vision. Recently, deep learning algorithms have proved their great potential in many vision-related recognition tasks. In this paper, we propose the use of Deep Residual Neural Networks (ResNets) to learn and recognise human actions from skeleton data provided by a Kinect sensor. First, the body joint coordinates are transformed into 3D arrays and saved as RGB images. Five different deep learning models based on ResNet are designed to extract image features and classify them into action classes. Experiments are conducted on two public video datasets for human action recognition containing various challenges. The results show that our method achieves state-of-the-art performance compared with existing approaches.