DTSC - GPM - Contributions to conferences and other events

Permanent URI for this collection

http://dspace.uc3m.es/handle/10016/2314

Now showing 1 - 20 of 34

A probabilistic topic approach for context-aware visual attention modeling
(IEEE, 2015-06-30) Fernández Torres, Miguel Ángel; González Díaz, Iván; Díaz de María, Fernando; Ministerio de Economía y Competitividad (España)
The modeling of visual attention has gained much interest during the last few years since it allows complex visual processes to be efficiently directed to particular areas of images or video frames. Although the literature concerning bottom-up saliency models is vast, we still lack generic approaches for modeling top-down, task- and context-driven visual attention. Indeed, many top-down models simply modulate the weights associated with low-level descriptors to learn more accurate representations of visual attention than those of the generic fusion schemes in bottom-up techniques. In this paper we propose a hierarchical generic probabilistic framework that decomposes the complex process of context-driven visual attention into a mixture of latent subtasks, each of which is in turn modeled as a combination of specific distributions of low-level descriptors. The inclusion of this intermediate level bridges the gap between low-level features and visual attention and enables more comprehensive representations of the latter. Our experiments on a dataset in which videos are organized by genre demonstrate that, by learning specific distributions for each video category, we can notably enhance the system performance.
Random forest-based prediction of Parkinson's disease progression using acoustic, ASR and intelligibility features
(International Speech Communication Association (ISCA), 2015) Zlotnik, Alexander; Montero Martínez, Juan Manuel; San Segundo Hernández, Rubén; Gallardo Antolín, Ascensión; European Commission; Ministerio de Economía y Competitividad (España)
The Interspeech ComParE 2015 PC Sub-Challenge consists of automatically determining the degree of Parkinson's condition using exclusively the patient's voice. In this paper, we approach this problem as a regression task and propose the use of an ensemble learning method, Random Forest (RF), combined with features of different nature: acoustic characteristics, features derived from the output of an Automatic Speech Recognition (ASR) system, and non-intrusive intelligibility measures. The system outperforms the baseline, achieving a relative improvement of more than 19% on the development set.
A saliency-based attention LSTM model for cognitive load classification from speech
(International Speech Communication Association (ISCA), 2019) Gallardo Antolín, Ascensión; Montero Martínez, Juan Manuel; Ministerio de Economía y Competitividad (España)
Cognitive Load (CL) refers to the amount of mental demand that a given task imposes on an individual's cognitive system, and it can affect his/her productivity in very high load situations. In this paper, we propose an automatic system capable of classifying the CL level of a speaker by analyzing his/her voice. Our research on this topic follows two main directions. In the first, we focus on the use of Long Short-Term Memory (LSTM) networks with different weighted pooling strategies for CL level classification. In the second, to overcome the need for a large amount of training data, we propose a novel attention mechanism that uses Kalinli's auditory saliency model. Experiments show that our proposal significantly outperforms both a baseline system based on Support Vector Machines (SVM) and an LSTM-based system with a logistic regression attention model.
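The weighted pooling step can be illustrated with a minimal sketch: per-frame output vectors (e.g. from an LSTM) are averaged with weights derived from per-frame saliency scores. It assumes the saliency scores are already computed (Kalinli's auditory saliency model itself is not implemented here) and that they are normalized with a softmax, which is an assumption rather than a detail stated in the abstract; the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, saliency):
    """Weighted average of per-frame vectors (e.g. LSTM outputs), with the
    weights given by a softmax over per-frame saliency scores (assumed to be
    precomputed, e.g. by an auditory saliency model)."""
    if len(frames) != len(saliency):
        raise ValueError("one saliency score per frame is required")
    weights = softmax(saliency)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
```

With equal saliency scores the pooling reduces to a plain mean, so `attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])` returns `[0.5, 0.5]`; a strongly salient frame dominates the pooled vector.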
Automatic learning of image representations combining content and metadata
(IEEE, 2018-10-07) Martínez Cortés, Tomás; González Díaz, Iván; Díaz de María, Fernando; Ministerio de Economía y Competitividad (España)
Content-based image representation is a very challenging task if we restrict ourselves to visual content alone. However, associated metadata (such as tags or geolocation) are a valuable source of complementary information that may help to enhance system performance. In this paper, we propose an automatic training framework that uses both the visual content of images and their metadata to fine-tune deep Convolutional Neural Networks (CNNs), providing better image descriptors adapted to certain locations, such as cities or regions. Specifically, we propose to estimate weak labels by combining visual and location-related information, and to incorporate them into a novel loss function over pairs of images. Our experiments on a landmark discovery task show that this novel training procedure improves performance by up to 55% over well-established CNN-based models and is free from overfitting.
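The paper's pairwise loss is described as novel and is not given in the abstract. As a point of reference only, the sketch below implements the standard contrastive loss for one image pair, with a weak label in [0, 1] standing in for the label estimated from visual and location information; treating fractional labels as soft weights is an assumption, and all names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(desc_a, desc_b, weak_label, margin=1.0):
    """Standard contrastive loss for one pair of image descriptors.

    weak_label = 1 means "same place/landmark" (pull descriptors together),
    0 means "different" (push them at least `margin` apart). Fractional
    labels interpolate between the two terms (an assumption of this sketch)."""
    d = euclidean(desc_a, desc_b)
    pull = weak_label * d ** 2
    push = (1.0 - weak_label) * max(0.0, margin - d) ** 2
    return 0.5 * (pull + push)
```

Dissimilar pairs that are already farther apart than the margin contribute nothing, so training effort concentrates on pairs the weak labels mark as similar but that are still far apart, and on close pairs marked as different.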
Speaker recognition under stress conditions
(2018-11) Rituerto González, Esther; Gallardo Antolín, Ascensión; Peláez Moreno, Carmen; Ministerio de Economía y Competitividad (España)
Speaker recognition systems exhibit a decrease in performance when the input speech is not produced in optimal circumstances, for example when the user is under emotional or stress conditions. The objective of this paper is to measure the effects of stress on speech in order to ultimately mitigate its consequences on a speaker recognition task. In this paper, we develop a stress-robust speaker identification system using data selection and augmentation by means of the manipulation of the original speech utterances. Extensive experiments have been carried out to assess the effectiveness of the proposed techniques. We conclude, first, that the best performance is always obtained when naturally stressed samples are included in the training set and, second, that when these are not available, substituting and augmenting them with synthetically generated stress-like samples improves the performance of the system.
Towards Galois Connections over Positive Semifields
(Springer, 2016-06-11) Peláez Moreno, Carmen; Valverde Albacete, Francisco José
In this paper we try to extend the Galois connection construction of K-Formal Concept Analysis to handle semifields that are not idempotent. Important examples of such algebras are the extended non-negative reals and the extended non-negative rationals, but we provide a construction suggesting that such semifields are much more abundant than suspected. This would greatly broaden the scope and applications of K-Formal Concept Analysis.
Towards the algebraization of Formal Concept Analysis over complete dioids
(2014) Valverde Albacete, Francisco José; Peláez Moreno, Carmen; Universidad de Zaragoza
Complete dioids are already complete residuated lattices. Formal contexts with entries in them generate Concept Lattices with the help of the polar maps. Previous work has already established the spectral nature of some formal concepts for contexts over certain kinds of dioids. This paper aims to raise awareness that linear algebra over exotic semirings should be one place to look to understand the properties of FCA over L-lattices.
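In the classical Boolean case, the polar maps send a set of objects to the attributes they all share, and a set of attributes to the objects that have all of them; formal concepts are the pairs fixed by both maps. The sketch below covers only this Boolean special case, not contexts over general dioids, and enumerates concepts by brute force, which is practical only for tiny contexts. The function names are hypothetical.

```python
from itertools import combinations


def common_attributes(context, objects, all_attrs):
    """Polar of a set of objects: the attributes shared by all of them."""
    shared = set(all_attrs)
    for g in objects:
        shared &= context[g]
    return frozenset(shared)


def common_objects(context, attrs):
    """Polar of a set of attributes: the objects that have all of them."""
    return frozenset(g for g, has in context.items() if attrs <= has)


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a Boolean context.

    Every concept arises as (A'', A') for some object set A, so trying all
    subsets is complete; this brute force is exponential in the number of
    objects and only suitable for tiny examples."""
    all_attrs = set().union(*context.values())
    objects = list(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = common_attributes(context, subset, all_attrs)
            extent = common_objects(context, intent)
            concepts.add((extent, intent))
    return concepts
```

For the context `{"g1": {"a", "b"}, "g2": {"b", "c"}, "g3": {"b"}}` this yields four concepts, ranging from ({g1, g2, g3}, {b}) down to (∅, {a, b, c}).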
Speech Denoising Using Non-Negative Matrix Factorization with Kullback-Leibler Divergence and Sparseness Constraints
(Springer, 2012) Gallardo Antolín, Ascensión; Ludeña Choez, Jimmy D.
A speech denoising method based on Non-Negative Matrix Factorization (NMF) is presented in this paper. With respect to previous related works, this paper makes two contributions. First, our method does not assume a priori knowledge about the nature of the noise. Second, it combines the use of the Kullback-Leibler divergence with sparseness constraints on the activation matrix, improving the performance of similar techniques that minimize the Euclidean distance and/or do not consider any sparsification. We evaluate the proposed method on both speech enhancement and automatic speech recognition tasks, and compare it to conventional spectral subtraction, showing improvements in speech quality and recognition accuracy, respectively, under different noisy conditions.
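A minimal pure-Python sketch of NMF under the generalized Kullback-Leibler divergence with an L1 sparseness penalty on the activation matrix H. It uses the well-known multiplicative update rules of Lee and Seung, with the penalty weight added to the denominator of the H update. The exact cost function and normalization used in the paper, and the subsequent denoising step (for example, separating speech and noise bases), are not specified in the abstract and are not reproduced here; function names are hypothetical.

```python
import math
import random

EPS = 1e-12  # guards against division by zero and log(0)


def matmul(a, b):
    """Plain-Python matrix product of lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kl_divergence(v, wh):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    return sum(x * math.log((x + EPS) / (y + EPS)) - x + y
               for row_v, row_wh in zip(v, wh) for x, y in zip(row_v, row_wh))


def sparse_kl_nmf(v, rank, sparsity=0.0, iters=200, seed=0):
    """Factorize a non-negative matrix V (F x T) as W (F x rank) times H (rank x T)
    with multiplicative updates aimed at minimizing D(V || WH) + sparsity * sum(H)."""
    rng = random.Random(seed)
    f, t = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(f)]
    h = [[rng.random() + 0.1 for _ in range(t)] for _ in range(rank)]
    for _ in range(iters):
        # Activation (H) update: the L1 penalty adds `sparsity` to the denominator.
        wh = matmul(w, h)
        ratio = [[v[i][j] / (wh[i][j] + EPS) for j in range(t)] for i in range(f)]
        for k in range(rank):
            col_sum = sum(w[i][k] for i in range(f))
            for j in range(t):
                num = sum(w[i][k] * ratio[i][j] for i in range(f))
                h[k][j] *= num / (col_sum + sparsity + EPS)
        # Basis (W) update: the standard KL rule.
        wh = matmul(w, h)
        ratio = [[v[i][j] / (wh[i][j] + EPS) for j in range(t)] for i in range(f)]
        for k in range(rank):
            row_sum = sum(h[k])
            for i in range(f):
                num = sum(h[k][j] * ratio[i][j] for j in range(t))
                w[i][k] *= num / (row_sum + EPS)
        # Rescale so each column of W sums to 1 while WH stays unchanged;
        # otherwise the penalty could be dodged by shrinking H and growing W.
        for k in range(rank):
            norm = sum(w[i][k] for i in range(f)) + EPS
            for i in range(f):
                w[i][k] /= norm
            for j in range(t):
                h[k][j] *= norm
    return w, h
```

With `sparsity=0.0` these updates never increase the KL divergence; a positive `sparsity` trades some reconstruction accuracy for activations with fewer large entries.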
Morphological processing of a dynamic compressive gammachirp filterbank for automatic speech recognition
(Universidad Autónoma de Madrid. Escuela Politécnica Superior, 2012) Cadore, Joyner; Peláez Moreno, Carmen; Gallardo Antolín, Ascensión
The Dynamic Compressive Gammachirp filterbank is used to produce auditory-inspired features for Automatic Speech Recognition. The proposed acoustic features combine spectral subtraction with a two-dimensional non-linear filtering technique most commonly employed in image processing: morphological filtering. These features have been shown to be more robust to noisy speech than those based on simpler auditory filterbanks, such as the classical mel-scaled triangular filterbank, the Gammatone filterbank and the passive Gammachirp, on a noisy Isolet database.
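Morphological filtering treats a time-frequency representation as a grayscale image. The sketch below applies a grayscale opening (erosion followed by dilation) with a flat square structuring element, which removes isolated bright peaks smaller than the element while keeping larger structures. The choice of operator, the structuring element size and how it is combined with spectral subtraction in the paper are assumptions here; function names are hypothetical.

```python
def _filter(img, reducer, size=3):
    """Apply a flat square structuring element of side `size` using `reducer`
    (min for erosion, max for dilation); at the borders only the neighbours
    that exist are used."""
    r = size // 2
    rows, cols = len(img), len(img[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [img[a][b]
                      for a in range(max(0, i - r), min(rows, i + r + 1))
                      for b in range(max(0, j - r), min(cols, j + r + 1))]
            row.append(reducer(window))
        out.append(row)
    return out


def erode(img, size=3):
    return _filter(img, min, size)


def dilate(img, size=3):
    return _filter(img, max, size)


def opening(img, size=3):
    """Erosion followed by dilation: removes bright details smaller than the
    structuring element (e.g. isolated noise peaks in a spectrogram)."""
    return dilate(erode(img, size), size)
```

A single-bin spike disappears after opening, whereas a 3x3 region of high energy survives unchanged.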
A Bayesian model for brain tumor classification using clinical-based features
(IEEE - The Institute of Electrical and Electronics Engineers, Inc., 2014) Martínez-Cortés, Tomás; Fernández Torres, Miguel Ángel; Jiménez Moreno, Amaya; González Díaz, Iván; Díaz de María, Fernando; Guzmán de Villoria, Juan Adán; Fernández, Pilar
This paper tackles the problem of automatic brain tumor classification from Magnetic Resonance Imaging (MRI), where general-purpose texture and shape features extracted from the Region of Interest (tumor) have traditionally been the usual parameterization of the problem. Two main contributions are made in this context. First, a novel set of clinical-based features intended to model the intuitions and expert knowledge of physicians is suggested. Second, a system is proposed that is able to fuse multiple individual scores (each based on a particular MRI sequence and a pathological indicator present in that sequence) by using a Bayesian model that produces a global system decision. This approach provides a flexible solution able to handle missing data, a very likely case in a realistic scenario where the number of clinical tests varies from one patient to another. Furthermore, the Bayesian model provides extra information concerning the uncertainty of the final decision. Our experimental results show that the use of clinical-based features leads to a significant increase in performance in terms of Area Under the Curve (AUC) when compared to a state-of-the-art reference. Furthermore, the proposed Bayesian fusion model clearly outperforms other fusion schemes, especially when few diagnostic tests are available.
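One simple way to fuse per-test scores while tolerating missing tests is a naive Bayes combination in log-odds form: each observed score adds its log-likelihood ratio, and a missing score is marginalized out, which under conditional independence amounts to skipping it. This sketch assumes Gaussian class-conditional score models and hypothetical test names ("T1", "T2"); it is not the paper's model, whose structure and uncertainty estimate the abstract does not detail.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse_scores(scores, models, prior_pos=0.5):
    """Posterior probability of the positive class given the observed scores.

    scores: dict test name -> score, or None when the test is missing.
    models: dict test name -> ((mean_pos, std_pos), (mean_neg, std_neg)),
            Gaussian class-conditional score models (an assumption).
    Scores are treated as conditionally independent given the class (naive
    Bayes), so a missing test is marginalized out simply by skipping it."""
    log_odds = math.log(prior_pos / (1.0 - prior_pos))
    for test, value in scores.items():
        if value is None:
            continue
        (m_pos, s_pos), (m_neg, s_neg) = models[test]
        log_odds += gaussian_logpdf(value, m_pos, s_pos) - gaussian_logpdf(value, m_neg, s_neg)
    return sigmoid(log_odds)
```

When every test is missing the output falls back to the prior, and each additional available test moves the posterior according to how well its score separates the two classes.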
Mid-level feature set for specific event and anomaly detection in crowded scenes
(IEEE, 2013) Calle Silos, Fernando de la; González Díaz, Iván; Díaz de María, Fernando
In this paper we propose a system for the automatic detection of specific events and abnormal behaviors in crowded scenes. In particular, we focus on the parametrization by proposing a set of mid-level spatio-temporal features that successfully model the characteristic motion of typical events in crowd behaviors. Furthermore, since some features are more suitable than others for modeling specific events of interest, we also present an automatic feature selection process. Our experiments show that the suggested feature set works successfully for both explicit event detection and distance-based anomaly detection tasks. The results on PETS for explicit event detection are generally better than those previously reported. Regarding anomaly detection, the performance of the proposed method is comparable to that of state-of-the-art methods on PETS and substantially better than that reported for the Web dataset.
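Distance-based anomaly detection can be sketched as scoring a feature vector by its distance to the k-th nearest vector observed under normal behaviour, and flagging it when that distance exceeds a threshold. The features, distance, k and threshold used in the paper are not given in the abstract; the function names and values below are illustrative assumptions.

```python
import math


def knn_anomaly_score(sample, normal_set, k=1):
    """Distance from `sample` to its k-th nearest neighbour among feature
    vectors of normal behaviour; large values suggest an anomaly."""
    dists = sorted(math.dist(sample, ref) for ref in normal_set)
    return dists[k - 1]


def is_anomalous(sample, normal_set, threshold, k=1):
    """Flag `sample` when its k-NN distance exceeds `threshold`."""
    return knn_anomaly_score(sample, normal_set, k) > threshold
```

Using k > 1 makes the score less sensitive to a single stray vector in the normal set.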
A generative model for concurrent image retrieval and ROI segmentation
(IEEE - The Institute of Electrical and Electronics Engineers, Inc., 2012) González Díaz, Iván; Baz-Hormigos, Carlos E.; Berdonces, Moisés; Díaz de María, Fernando
This paper proposes a probabilistic generative model that concurrently tackles the problems of image retrieval and region-of-interest (ROI) detection. By introducing a latent variable that classifies matches as true or false, we focus specifically on applying geometric constraints to the keypoint matching process and on obtaining robust estimates of the geometric transformation between two images showing the same object. Our experiments on a challenging image retrieval database demonstrate that our approach outperforms the most prevalent approach for geometrically constrained matching and compares favorably to other state-of-the-art methods. Furthermore, the proposed technique concurrently provides very good segmentations of the region of interest.
Standard compliant flicker reduction method with PSNR loss control
(IEEE - The Institute of Electrical and Electronics Engineers, Inc., 2013-05-26) Jiménez Moreno, Amaya; Martínez Enríquez, Eduardo; Díaz de María, Fernando
Flicker is a common video coding artifact that occurs especially at low and medium bit rates. In this paper we propose a temporal filter-based method to reduce flicker. The proposed method has been designed to be compliant with conventional video coding standards, i.e., to generate a bitstream that is decodable by any standard decoder implementation. The aim of the proposed method is to make the luminance changes between consecutive frames smoother on a block-by-block basis. To this end, we propose a selective temporal low-pass filtering that smooths these luminance changes on flicker-prone blocks. Furthermore, since the low-pass filtering can cause a noticeable blurring effect, an adaptive algorithm that limits the PSNR loss (and thus the blur) has also been designed. The proposed method has been extensively assessed on the reference software of the H.264/AVC video coding standard and compared to a state-of-the-art method. The experimental results show the effectiveness of the proposed method and demonstrate that its performance is superior to that of the state-of-the-art method.
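A simplified illustration of selective temporal low-pass filtering under a PSNR constraint: a flicker-prone block (flattened luminance samples) is blended with the co-located block of the previous frame, trying the strongest smoothing first and keeping the first blend whose PSNR with respect to the unfiltered block stays above a target. The blend weights, the PSNR target and the choice of measuring PSNR against the unfiltered block are assumptions; the paper's flicker-prone block detection and its integration into the H.264/AVC encoder are not shown, and the function names are hypothetical.

```python
import math


def psnr(ref, test, peak=255.0):
    """PSNR in dB between two equally long sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def smooth_block(cur, prev, min_psnr, weights=(0.5, 0.625, 0.75, 0.875, 1.0)):
    """Temporal low-pass filtering of one flicker-prone block.

    cur, prev: flattened luminance of the block in the current frame and the
    co-located block in the previous frame. Each candidate output is
    a * cur + (1 - a) * prev; weights are tried from strongest smoothing
    (small a) to none (a = 1.0), and the first output whose PSNR against the
    unfiltered block reaches min_psnr is returned together with its weight."""
    for a in weights:
        out = [a * c + (1.0 - a) * p for c, p in zip(cur, prev)]
        if psnr(cur, out) >= min_psnr:
            return out, a
    return list(cur), 1.0
```

Raising `min_psnr` bounds the blur more tightly at the cost of removing less flicker; in the limit the block is left unfiltered.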
Perceptually-aware bilateral filtering for quality improvement in low bit rate video coding
(IEEE - The Institute of Electrical and Electronics Engineers, Inc., 2012) Frutos-López, Manuel de; Medina Chanca, Helen; Sanz Rodríguez-Escalona, Sergio; Peláez Moreno, Carmen; Díaz de María, Fernando
Perceptual coding has become of great interest in modern video coding due to the need for higher compression rates. Many previous works have incorporated perceptual information into hybrid video encoders, either by modifying the quantization parameter according to a perceptual resource allocation map or by preprocessing video sequences to remove information that is not perceptually relevant. The first strategy is limited by the presence of blocking artifacts, and the second lacks adaptation to video content. In this paper, a novel and simple approach is proposed that performs smart filtering prior to the encoding process, preserving both structural and motion information. The experiments show that the proposed method, implemented on an H.264 encoder, significantly improves perceptual quality at low bit rates.
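The title refers to bilateral filtering. A plain bilateral filter averages each pixel with its neighbours using weights that decay with both spatial distance and intensity difference, so flat regions are smoothed while edges are preserved. This sketch omits the perceptual weighting and motion handling described in the paper; the function name and parameter values are arbitrary assumptions.

```python
import math


def bilateral_filter(img, radius=1, sigma_s=1.0, sigma_r=10.0):
    """Bilateral filter for a grayscale image given as a list of rows.

    Each output pixel is a normalized weighted mean of its neighbours within
    `radius`; the weight is the product of a spatial Gaussian (sigma_s, in
    pixels) and a range Gaussian on the intensity difference (sigma_r)."""
    rows, cols = len(img), len(img[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            num = den = 0.0
            for a in range(max(0, i - radius), min(rows, i + radius + 1)):
                for b in range(max(0, j - radius), min(cols, j + radius + 1)):
                    w_s = math.exp(-((a - i) ** 2 + (b - j) ** 2) / (2.0 * sigma_s ** 2))
                    w_r = math.exp(-((img[a][b] - img[i][j]) ** 2) / (2.0 * sigma_r ** 2))
                    num += w_s * w_r * img[a][b]
                    den += w_s * w_r
            row.append(num / den)
        out.append(row)
    return out
```

On a sharp luminance step much larger than `sigma_r`, the range weight across the edge is negligible, so both sides keep their values; a Gaussian blur with the same spatial kernel would instead blend values across the step.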
A simplified subjective video quality assessment method based on signal detection theory
(IEEE - The Institute of Electrical and Electronics Engineers, Inc., 2012) Frutos-López, Manuel de; Mejía-Ocaña, Ana Belén; Sanz Rodríguez-Escalona, Sergio; Peláez Moreno, Carmen; Díaz de María, Fernando; Pizlo, Zigmunt
A simplified protocol and associated metrics based on Signal Detection Theory (SDT) are proposed for subjective Video Quality Assessment (VQA), with the aim of filling the gap between objective quality estimates, which lack discrimination ability (especially when perceptually motivated processing methods are involved), and costly normative subjective quality tests. The proposed protocol employs a reduced number of assessors and provides a quality ranking of the methods being evaluated. It is intended to provide the rapid experimental turnaround necessary for developing algorithms. We have validated our proposal by performing the test on a result well known to the video coding community: namely, that the inclusion of an in-loop deblocking filter provides a quality enhancement. The results obtained corroborate this fact. A software interface to design and administer the test is also made publicly available.
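The basic Signal Detection Theory quantity is the sensitivity index d' = z(H) - z(F), where H is the hit rate, F the false-alarm rate and z the inverse of the standard normal CDF. Whether the paper's metrics use d' directly, and how assessor responses map to hits and false alarms, is not stated in the abstract. The log-linear correction below (adding 0.5 to each count) is one common choice, assumed here to keep rates of 0 or 1 finite; the function name is hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(H) - z(F) with a log-linear correction,
    so that perfect or zero rates do not produce infinite values."""
    h = (hits + 0.5) / (hits + misses + 1.0)
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(h) - z(f)
```

A d' near 0 means assessors cannot tell the two conditions apart; larger positive values indicate a consistently detected quality difference.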
A Proposal for New Evaluation Metrics and Result Visualization Technique for Sentiment Analysis Tasks
(Springer, 2013) Valverde Albacete, Francisco José; Carrillo de Albornoz, Jorge; Peláez Moreno, Carmen
In this paper we propose the use of a number of entropy-based metrics and a visualization tool for the intrinsic evaluation of Sentiment and Reputation Analysis tasks. We provide a theoretical justification for their use and discuss how they complement other accuracy-based metrics. We apply the proposed techniques to the analysis of the TASS-SEPLN and RepLab 2012 results and show how the metrics are effective for system comparison, system development and postmortem evaluation.
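Entropy-based metrics of this kind are computed from a classifier's confusion matrix. The sketch below derives the entropies of the true and predicted labels, their mutual information (MI) and the variation of information (VI = H(X,Y) - MI). It illustrates the general idea only; the paper's exact metrics and visualization may differ, and the function name is hypothetical.

```python
import math


def entropy_metrics(confusion):
    """Entropy-based summary of a confusion matrix (rows: true class,
    columns: predicted class), in bits."""
    total = sum(sum(row) for row in confusion)
    joint = [[c / total for c in row] for row in confusion]
    px = [sum(row) for row in joint]          # true-label distribution
    py = [sum(col) for col in zip(*joint)]    # predicted-label distribution

    def h(ps):
        return -sum(p * math.log2(p) for p in ps if p > 0)

    h_x, h_y = h(px), h(py)
    h_xy = h([p for row in joint for p in row])
    mi = h_x + h_y - h_xy
    return {"H_true": h_x, "H_pred": h_y, "MI": mi, "VI": h_xy - mi}
```

Such metrics complement accuracy: a classifier that always predicts the majority class on a 90/10 split, `[[90, 0], [10, 0]]`, reaches 90% accuracy but has zero mutual information with the true labels.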
NMF-Based Spectral Analysis for Acoustic Event Classification Tasks
(Springer, 2013) Gallardo Antolín, Ascensión; Ludeña Choez, Jimmy D.
In this paper, we propose a new front-end for Acoustic Event Classification (AEC) tasks. First, we study the spectral content of different acoustic events by applying Non-Negative Matrix Factorization (NMF) to their spectral magnitude and compare it with the structure of speech spectra. Second, based on the findings of this study, we propose a new parameterization for AEC, which is an extension of the conventional Mel Frequency Cepstrum Coefficients (MFCC) and is based on high-pass filtering of acoustic event spectra. The influence of different frequency scales on the classification rate of the whole system is also studied. The evaluation of the proposed features for AEC shows relative error reductions of about 12% at segment level and about 11% at target event level with respect to conventional MFCC.
Deep Maxout Networks applied to Noise-Robust Speech Recognition
(Springer, 2014) Calle Silos, Fernando de la; Gallardo Antolín, Ascensión; Peláez Moreno, Carmen
Deep Neural Networks (DNN) have become very popular for acoustic modeling due to the improvements found over traditional Gaussian Mixture Models (GMM). However, few works have addressed the robustness of these systems under noisy conditions. Recently, the machine learning community has proposed new methods to improve the accuracy of DNNs, such as dropout and maxout. In this paper, we investigate Deep Maxout Networks (DMN) for acoustic modeling in a noisy automatic speech recognition environment. Experiments show that DMNs substantially improve recognition accuracy over DNNs and other traditional techniques in both clean and noisy conditions on the TIMIT dataset.
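A maxout unit outputs the maximum over k affine functions of its input, so it learns a piecewise-linear convex activation; ReLU (pieces x and 0) and the absolute value (pieces x and -x) are special cases. The sketch shows only the forward pass of one maxout layer with hypothetical function names; training, dropout and the network configuration used in the paper are not covered.

```python
def maxout(x, weights, biases):
    """One maxout unit: the maximum over k affine pieces w_k . x + b_k."""
    return max(sum(wi * xi for wi, xi in zip(w, x)) + b
               for w, b in zip(weights, biases))


def maxout_layer(x, units):
    """Forward pass of a maxout layer.

    units: list of (weights, biases) pairs, one per output unit, where
    weights is a list of k weight vectors and biases a list of k scalars."""
    return [maxout(x, w, b) for w, b in units]
```

For example, `maxout([-2.0], [[1.0], [0.0]], [0.0, 0.0])` behaves like ReLU and returns `0.0`.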
On Concept Lattices as Information Channels
(CEUR Workshop Proceedings, 2014-10) Valverde Albacete, Francisco José; Peláez Moreno, Carmen; Peñas, Anselmo
This paper explores the idea that a concept lattice is an information channel between objects and attributes. For this purpose we study the behaviour of incidences in L-formal contexts where L is the range of an information-theoretic entropy function. Examples of such data abound in machine learning and data mining, e.g. confusion matrices of multi-class classifiers or document-term matrices. We use a well-motivated information-theoretic heuristic, the maximization of mutual information, which in our conclusions provides a flavour of feature selection, offering an information-theoretic explanation of an established practice in Data Mining, Natural Language Processing and Information Retrieval applications, viz. stop-wording and frequency thresholding. We also introduce a post-clustering class identification in the presence of confusions and a flavour of term selection for a multi-label document classification task.
Spectral Lattices of reducible matrices over completed idempotent semifields
(CEUR Workshop Proceedings, 2013) Valverde Albacete, Francisco José; Peláez Moreno, Carmen
Previous work has shown a relation between L-valued extensions of FCA and the spectra of some matrices related to L-valued contexts. We investigate the spectra of reducible matrices over completed idempotent semifields in the framework of dioids, naturally-ordered semirings, that encompass several of those extensions. Considering special sets of eigenvectors also brings out complete lattices in the picture and we argue that such structure may be more important than standard eigenspace structure for matrices over completed idempotent semifields.
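The max-plus algebra (real numbers with -∞, using max as addition and + as multiplication) is the standard example of an idempotent semifield. For an irreducible matrix, the unique max-plus eigenvalue is the maximum cycle mean of the associated weighted graph, which the sketch computes from the diagonals of the first n max-plus powers. This covers only the irreducible, non-completed case; completion, which adds +∞, and the reducible matrices studied in the paper are not modelled here. Function names are hypothetical.

```python
NEG_INF = float("-inf")  # the zero element of the max-plus semifield


def mp_matmul(a, b):
    """Max-plus matrix product: (A ⊗ B)_ij = max_k (A_ik + B_kj)."""
    return [[max(a[i][k] + b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mp_matvec(a, v):
    """Max-plus matrix-vector product: (A ⊗ v)_i = max_j (A_ij + v_j)."""
    return [max(a_ij + v_j for a_ij, v_j in zip(row, v)) for row in a]


def max_cycle_mean(a):
    """Largest mean weight of a cycle in the graph of A, computed as
    max over k = 1..n of max_i (A^{⊗k})_ii / k (every elementary cycle
    has length at most n)."""
    n = len(a)
    power = a
    best = NEG_INF
    for k in range(1, n + 1):
        best = max(best, max(power[i][i] for i in range(n)) / k)
        power = mp_matmul(power, a)
    return best
```

For A = [[0, 3], [1, 0]], the only non-trivial cycle 1 → 2 → 1 has mean (3 + 1) / 2 = 2, and v = [0, -1] satisfies A ⊗ v = 2 ⊗ v = [2, 1], so it is an eigenvector for that eigenvalue.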
