DTSC - GPM - Capítulos de Monografías

Permanent URI for this collection

https://hdl.handle.net/10016/1592

Random forest-based prediction of Parkinson's disease progression using acoustic, ASR and intelligibility features
(International Speech Communication Association (ISCA), 2015) Zlotnik, Alexander; Montero Martínez, Juan Manuel; San Segundo Hernández, Rubén; Gallardo Antolín, Ascensión; European Commission; Ministerio de Economía y Competitividad (España)
The Interspeech ComParE 2015 PC Sub-Challenge consists of automatically determining the degree of Parkinson's condition using exclusively the patient's voice. In this paper, we treat this problem as a regression task and propose the use of an ensemble learning method, Random Forest (RF), in combination with features of different kinds: acoustic characteristics, features derived from the output of an Automatic Speech Recognition (ASR) system and non-intrusive intelligibility measures. The system outperforms the baseline results, achieving a relative improvement higher than 19% on the development set.
A saliency-based attention LSTM model for cognitive load classification from speech
(International Speech Communication Association (ISCA), 2019) Gallardo Antolín, Ascensión; Montero Martínez, Juan Manuel; Ministerio de Economía y Competitividad (España)
Cognitive Load (CL) refers to the amount of mental demand that a given task imposes on an individual's cognitive system, and it can affect his/her productivity in very high load situations. In this paper, we propose an automatic system capable of classifying the CL level of a speaker by analyzing his/her voice. Our research on this topic goes in two main directions. In the first one, we focus on the use of Long Short-Term Memory (LSTM) networks with different weighted pooling strategies for CL level classification. In the second, to overcome the need for a large amount of training data, we propose a novel attention mechanism that uses Kalinli's auditory saliency model. Experiments show that our proposal significantly outperforms both a baseline system based on Support Vector Machines (SVM) and an LSTM-based system with a logistic regression attention model.
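As an illustration of the weighted pooling idea mentioned in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes that each frame-level LSTM output receives a scalar score (here a hypothetical saliency value) and that the scores are turned into weights with a softmax; the abstract does not specify how scores are mapped to weights, so that choice and all names and values are illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, scores):
    """Weighted average of frame vectors, with weights = softmax(scores)."""
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

# Hypothetical LSTM outputs for three frames (2-dimensional hidden states).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Equal scores reduce attention pooling to plain mean pooling.
pooled_uniform = attention_pool(frames, [0.0, 0.0, 0.0])

# A high (hypothetical) saliency score on the last frame makes it dominate.
pooled_peaked = attention_pool(frames, [0.0, 0.0, 5.0])
```

With equal scores the result is the mean of the frames; raising one score pulls the pooled vector towards that frame, which is the behaviour a saliency-driven attention weight is meant to provide.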
Towards the algebraization of Formal Concept Analysis over complete dioids
(2014) Valverde Albacete, Francisco José; Peláez Moreno, Carmen; Universidad de Zaragoza
Complete dioids are already complete residuated lattices. Formal contexts with entries in them generate Concept Lattices with the help of the polar maps. Previous work has already established the spectral nature of some formal concepts for contexts over certain kinds of dioids. This paper tries to raise the awareness that linear algebra over exotic semirings should be one place to look to understand the properties of FCA over L-lattices.
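For readers unfamiliar with the polar maps mentioned in the abstract above, the sketch below shows them in the ordinary Boolean case only; it does not cover contexts with entries in a dioid, which is the subject of the paper. The toy context and all function names are assumptions made for illustration.

```python
from itertools import combinations

def up(objects, context, attributes):
    """Polar map: the attributes shared by every object in `objects`."""
    return frozenset(a for a in attributes if all(a in context[o] for o in objects))

def down(attrs, context):
    """Polar map: the objects that have every attribute in `attrs`."""
    return frozenset(o for o, has in context.items() if attrs <= has)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Exponential in the number of objects, so only suitable for toy contexts.
    """
    attributes = frozenset().union(*context.values())
    concepts = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = up(subset, context, attributes)
            extent = down(intent, context)
            concepts.add((extent, intent))
    return concepts

# Hypothetical Boolean context: object -> set of attributes it has.
context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies"},
    "trout": {"swims"},
}
concepts = formal_concepts(context)
```

Each pair in `concepts` is closed under both polar maps; ordering the pairs by inclusion of extents gives the concept lattice of the toy context.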
Speech Denoising Using Non-Negative Matrix Factorization with Kullback-Leibler Divergence and Sparseness Constraints
(Springer, 2012) Gallardo Antolín, Ascensión; Ludeña Choez, Jimmy D.
A speech denoising method based on Non-Negative Matrix Factorization (NMF) is presented in this paper. With respect to previous related works, this paper makes two contributions. First, our method does not assume a priori knowledge about the nature of the noise. Second, it combines the use of the Kullback-Leibler divergence with sparseness constraints on the activation matrix, improving the performance of similar techniques that minimize the Euclidean distance and/or do not consider any sparsification. We evaluate the proposed method for both speech enhancement and automatic speech recognition tasks, and compare it to conventional spectral subtraction, showing improvements in speech quality and recognition accuracy, respectively, for different noisy conditions.
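The combination of Kullback-Leibler divergence and a sparseness constraint on the activation matrix can be sketched as follows. This is a generic formulation, assumed here for illustration rather than taken from the paper: multiplicative updates for the generalized KL divergence D(V || WH) with an L1 penalty of weight `lam` on the activations H. The toy matrix, the rank and the value of `lam` are hypothetical, and practical systems usually also normalize the columns of W, which this sketch omits.

```python
import math
import random

EPS = 1e-12  # guards against division by zero and log(0)

def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def objective(V, W, H, lam):
    """Generalized KL divergence D(V || WH) plus lam times the sum of H."""
    WH = matmul(W, H)
    kl = sum(v * math.log((v + EPS) / (wh + EPS)) - v + wh
             for row_v, row_wh in zip(V, WH) for v, wh in zip(row_v, row_wh))
    return kl + lam * sum(sum(row) for row in H)

def ratio(V, W, H):
    """Element-wise ratio V / (WH) used by both multiplicative updates."""
    WH = matmul(W, H)
    return [[v / (wh + EPS) for v, wh in zip(rv, rwh)] for rv, rwh in zip(V, WH)]

def nmf_kl_sparse(V, rank, lam=0.1, iters=100, seed=0):
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = [objective(V, W, H, lam)]
    for _ in range(iters):
        # Activation update; the extra `lam` in the denominator encourages sparsity.
        R = ratio(V, W, H)
        w_col = [sum(W[i][k] for i in range(n)) for k in range(rank)]
        H = [[H[k][j] * sum(W[i][k] * R[i][j] for i in range(n)) / (w_col[k] + lam + EPS)
              for j in range(m)] for k in range(rank)]
        # Basis update (standard KL multiplicative rule, no penalty on W).
        R = ratio(V, W, H)
        h_row = [sum(H[k]) for k in range(rank)]
        W = [[W[i][k] * sum(H[k][j] * R[i][j] for j in range(m)) / (h_row[k] + EPS)
              for k in range(rank)] for i in range(n)]
        history.append(objective(V, W, H, lam))
    return W, H, history

# Hypothetical non-negative "spectrogram" (frequency bins x frames).
V = [[1.0, 0.0, 2.0, 0.0],
     [2.0, 0.0, 4.0, 0.0],
     [0.0, 3.0, 0.0, 1.0]]
W, H, history = nmf_kl_sparse(V, rank=2)
```

The `history` list records the penalized objective after each iteration, which is a convenient way to check that the factorization is converging.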
A generative model for concurrent image retrieval and ROI segmentation
(IEEE - The Institute of Electrical and Electronics Engineers, Inc., 2012) González Díaz, Iván; Baz-Hormigos, Carlos E.; Berdonces, Moisés; Díaz de María, Fernando
This paper proposes a probabilistic generative model that concurrently tackles the problems of image retrieval and detection of the region-of-interest (ROI). By introducing a latent variable that classifies the matches as true or false, we specifically focus on the application of geometric constraints to the keypoint matching process and the achievement of robust estimates of the geometric transformation between two images showing the same object. Our experiments on a challenging image retrieval database demonstrate that our approach outperforms the most prevalent approach for geometrically constrained matching, and compares favorably to other state-of-the-art methods. Furthermore, the proposed technique concurrently provides very good segmentations of the region of interest.
Standard compliant flicker reduction method with PSNR loss control
(IEEE - The Institute of Electrical and Electronics Engineers, Inc., 2013-05-26) Jiménez Moreno, Amaya; Martínez Enríquez, Eduardo; Díaz de María, Fernando
Flicker is a common video coding artifact that occurs especially at low and medium bit rates. In this paper we propose a temporal filter-based method to reduce flicker. The proposed method has been designed to be compliant with conventional video coding standards, i.e., to generate a bitstream that is decodable by any standard decoder implementation. The aim of the proposed method is to make the luminance changes between consecutive frames smoother on a block-by-block basis. To this end, a selective temporal low-pass filtering is proposed that smooths these luminance changes on flicker-prone blocks. Furthermore, since the low-pass filtering can introduce a noticeable blurring effect, an adaptive algorithm that limits the PSNR loss (and thus the blur) has also been designed. The proposed method has been extensively assessed on the reference software of the H.264/AVC video coding standard and compared to a state-of-the-art method. The experimental results show the effectiveness of the proposed method and prove that its performance is superior to that of the state-of-the-art method.
A simplified subjective video quality assessment method based on signal detection theory
(IEEE - The Institute of Electrical and Electronics Engineers, Inc., 2012) Frutos-López, Manuel de; Mejía-Ocaña, Ana Belén; Sanz Rodríguez-Escalona, Sergio; Peláez Moreno, Carmen; Díaz de María, Fernando; Pizlo, Zigmunt
A simplified protocol and associated metrics based on Signal Detection Theory (SDT) for subjective Video Quality Assessment (VQA) are proposed with the aim of filling the gap between objective quality estimates, which lack discrimination ability, especially when perceptually motivated processing methods are involved, and the costly normative subjective quality tests. The proposed protocol employs a reduced number of assessors and provides a quality ranking of the methods being evaluated. It is intended to provide the rapid experimental turnaround necessary for developing algorithms. We have validated our proposal by performing the test on a well-known result in the video coding community, namely, that the inclusion of an in-loop deblocking filter provides a quality enhancement. The results obtained corroborate this fact. A software interface to design and administer the test is also made publicly available.
A Proposal for New Evaluation Metrics and Result Visualization Technique for Sentiment Analysis Tasks
(Springer, 2013) Valverde Albacete, Francisco José; Carrillo de Albornoz, Jorge; Peláez Moreno, Carmen
In this paper we propound the use of a number of entropy-based metrics and a visualization tool for the intrinsic evaluation of Sentiment and Reputation Analysis tasks. We provide a theoretical justification for their use and discuss how they complement other accuracy-based metrics. We apply the proposed techniques to the analysis of TASS-SEPLN and RepLab 2012 results and show how the metric is effective for system comparison purposes, for system development and postmortem evaluation.
NMF-Based Spectral Analysis for Acoustic Event Classification Tasks
(Springer, 2013) Gallardo Antolín, Ascensión; Ludeña Choez, Jimmy D.
In this paper, we propose a new front-end for Acoustic Event Classification (AEC) tasks. First, we study the spectral contents of different acoustic events by applying Non-Negative Matrix Factorization (NMF) on their spectral magnitude and compare them with the structure of speech spectra. Second, from the findings of this study, we propose a new parameterization for AEC, which is an extension of the conventional Mel Frequency Cepstrum Coefficients (MFCC) and is based on the high pass filtering of acoustic event spectra. Also, the influence of different frequency scales on the classification rate of the whole system is studied. The evaluation of the proposed features for AEC shows that relative error reductions of about 12% at segment level and about 11% at target event level with respect to the conventional MFCC are achieved.
Deep Maxout Networks applied to Noise-Robust Speech Recognition
(Springer, 2014) Calle Silos, Fernando de la; Gallardo Antolín, Ascensión; Peláez Moreno, Carmen
Deep Neural Networks (DNN) have become very popular for acoustic modeling due to the improvements found over traditional Gaussian Mixture Models (GMM). However, not many works have addressed the robustness of these systems under noisy conditions. Recently, the machine learning community has proposed new methods to improve the accuracy of DNNs by using techniques such as dropout and maxout. In this paper, we investigate Deep Maxout Networks (DMN) for acoustic modeling in a noisy automatic speech recognition environment. Experiments show that DMNs substantially improve the recognition accuracy over DNNs and other traditional techniques in both clean and noisy conditions on the TIMIT dataset.
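The maxout activation named in the abstract above can be illustrated with a minimal sketch: a maxout unit outputs the maximum over several affine functions of its input. The weights below are hypothetical and chosen only to show that familiar activations arise as special cases; this is not the network configuration used in the paper.

```python
def maxout_unit(x, weights, biases):
    """One maxout unit: the maximum over k affine pieces w_j . x + b_j."""
    return max(sum(w * xi for w, xi in zip(wj, x)) + bj
               for wj, bj in zip(weights, biases))

def maxout_layer(x, layer):
    """A layer is a list of (weights, biases) pairs, one pair per unit."""
    return [maxout_unit(x, weights, biases) for weights, biases in layer]

# Two pieces (x and -x) reproduce the absolute value.
abs_like = maxout_unit([-3.0], [[1.0], [-1.0]], [0.0, 0.0])

# Two pieces (x and 0) reproduce a ReLU.
relu_like = maxout_unit([-2.0], [[1.0], [0.0]], [0.0, 0.0])

# A hypothetical layer with two units over a 2-dimensional input.
layer = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),    # max(x1, x2)
    ([[1.0, 1.0], [-1.0, -1.0]], [0.0, 0.5]),  # max(x1 + x2, -(x1 + x2) + 0.5)
]
outputs = maxout_layer([2.0, -1.0], layer)
```

Because each unit learns its own piecewise-linear shape, no fixed non-linearity has to be chosen in advance.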
On Concept Lattices as Information Channels
(CEUR Workshop Proceedings, 2014-10) Valverde Albacete, Francisco José; Peláez Moreno, Carmen; Peñas, Anselmo
This paper explores the idea that a concept lattice is an information channel between objects and attributes. For this purpose we study the behaviour of incidences in L-formal contexts where L is the range of an information-theoretic entropy function. Examples of such data abound in machine learning and data mining, e.g. confusion matrices of multi-class classifiers or document-term matrices. We use a well-motivated information-theoretic heuristic, the maximization of mutual information, that in our conclusions provides a flavour of feature selection, giving an information-theoretic explanation of an established practice in Data Mining, Natural Language Processing and Information Retrieval applications, viz. stop-wording and frequency thresholding. We also introduce a post-clustering class identification in the presence of confusions and a flavour of term selection for a multi-label document classification task.
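The quantity behind the heuristic mentioned in the abstract above, mutual information, can be computed directly from a matrix of joint counts such as a confusion matrix. The sketch below only evaluates that quantity; it is not the selection procedure of the paper, and the example matrices are hypothetical.

```python
import math

def mutual_information(counts):
    """Mutual information in bits between the row and column variables.

    `counts` is a matrix of non-negative joint counts, for example a
    confusion matrix (true class x predicted class).
    """
    total = sum(sum(row) for row in counts)
    row_p = [sum(row) / total for row in counts]
    col_p = [sum(col) / total for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n > 0:  # zero cells contribute nothing
                p = n / total
                mi += p * math.log2(p / (row_p[i] * col_p[j]))
    return mi

# A perfect two-class classifier transmits one full bit about the class.
mi_perfect = mutual_information([[5, 0], [0, 5]])

# A classifier whose output is independent of the input transmits nothing.
mi_random = mutual_information([[1, 1], [1, 1]])
```

Comparing this value before and after dropping rows or columns of a count matrix is one simple way to see how much information a candidate feature or term carries.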
Spectral Lattices of reducible matrices over completed idempotent semifields
(CEUR Workshop Proceedings, 2013) Valverde Albacete, Francisco José; Peláez Moreno, Carmen
Previous work has shown a relation between L-valued extensions of FCA and the spectra of some matrices related to L-valued contexts. We investigate the spectra of reducible matrices over completed idempotent semifields in the framework of dioids, naturally-ordered semirings, that encompass several of those extensions. Considering special sets of eigenvectors also brings out complete lattices in the picture and we argue that such structure may be more important than standard eigenspace structure for matrices over completed idempotent semifields.
Systems vs. Methods: an Analysis of the Affordances of Formal Concept Analysis for Information Retrieval
(CEUR Workshop Proceedings, 2013) Valverde Albacete, Francisco José; Peláez Moreno, Carmen
We review previous work using Formal Concept Analysis (FCA) to build Information Retrieval (IR) applications, seeking a wider adoption of the FCA paradigm in IR. We conclude that although a number of systems have been built with such a paradigm (FCA in IR), the most effective contribution would be to help establish IR on firmer grounds (FCA for IR). Since such an approach is only incipient, we contribute to the general discussion by discussing affordances and challenges of FCA for IR.
ASR Feature Extraction with Morphologically-Filtered Power-Normalized Cochleograms
(International Speech Communication Association, 2014) Calle Silos, Fernando de la; Valverde Albacete, Francisco José; Gallardo Antolín, Ascensión; Peláez Moreno, Carmen
In this paper we present advances in the modeling of the masking behavior of the Human Auditory System to enhance the robustness of the feature extraction stage in Automatic Speech Recognition. The solution adopted is based on a non-linear filtering of a spectro-temporal representation applied simultaneously on both the frequency and time domains, by processing it using mathematical morphology operations as if it were an image. A particularly important component of this architecture is the so-called structuring element: biologically-based considerations are addressed in the present contribution to design an element that closely resembles the masking phenomena taking place in the cochlea. The second feature of this contribution is the choice of the underlying spectro-temporal representation. The best results were achieved by the representation introduced as part of the Power Normalized Cepstral Coefficients (PNCC) together with a spectral subtraction step. On the Aurora 2 noisy continuous digits task, we report relative error reductions of 18.7% compared to PNCC and 39.5% compared to MFCC.
Combining audio-visual features for viewers' perception classification of Youtube car commercials
(International Speech Communication Association, 2014) Fernández Martínez, Fernando; Hernández García, Alejandro; Gallardo Antolín, Ascensión; Díaz de María, Fernando
In this paper, we present a computational model capable of predicting the viewer perception of Youtube car TV commercials by using a set of low-level audio and visual descriptors. Our research goal relies on the hypothesis that these descriptors could reflect to some extent the objective value of the videos and, in turn, the average viewer's perception. To that end, and as a novel approach to this problem, we automatically annotate our video corpus, grouped into 2 classes corresponding to different satisfaction levels, by means of a regular k-means algorithm applied to the video metadata related to user feedback. Evaluation results show that simple linear logistic regression models based on the 10 best visual descriptors and on the 10 best audio descriptors individually perform reasonably well, achieving a classification accuracy of roughly 70% and 75%, respectively. Combining audio and visual descriptors yields better performance, roughly 86% for the top 20 selected from the entire descriptor set, but tips the balance in favor of the audio ones (i.e. 17 vs 3). The greater influence of audio content in this domain is also evidenced by a side analysis of the video comments.
A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis
(International Speech Communication Association, 2014) Gallardo Antolín, Ascensión; Montero, Juan Manuel; King, Simon
Traditional Text-To-Speech (TTS) systems have been developed using specially designed non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, for processing this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization and music and noise detection which improve the precision and overall quality of the segmentation.
NMF-based temporal feature integration for acoustic event classification
(International Speech Communication Association, 2013) Gallardo Antolín, Ascensión; Ludeña Choez, Jimmy D.
In this paper, we propose a new front-end for Acoustic Event Classification tasks (AEC) based on the combination of the temporal feature integration technique called Filter Bank Coefficients (FC) and Non-Negative Matrix Factorization (NMF). FC aims to capture the dynamic structure in the short-term features by means of the summarization of the periodogram of each short-term feature dimension in several frequency bands using a predefined filter bank. As the commonly used filter bank has been devised for other tasks (such as music genre classification), it can be suboptimal for AEC. In order to overcome this drawback, we propose an unsupervised method based on NMF for learning the filters which collect the most relevant temporal information in the short-time features for AEC. The experiments show that the features obtained with this method achieve significant improvements in the classification performance of a Support Vector Machine (SVM) based AEC system in comparison with the baseline FC features.
Filter optimization and complexity reduction for video coding using graph-based transforms
(IEEE, 2013) Martínez Enríquez, Eduardo; Díaz de María, Fernando; Cid Sueiro, Jesús; Ortega Gómez, Román Antonio
The basis functions of lifting transform on graphs are completely determined by finding a bipartition of the graph and defining the prediction and update filters to be used. In this work we consider the design of prediction filters that minimize the quadratic prediction error and therefore the energy of the detail coefficients, which will give rise to higher energy compaction. Then, to determine the graph bipartition, we propose a distributed maximum-cut algorithm that significantly reduces the computational cost with respect to the centralized version used in our previous work. The proposed techniques show improvements in coding performance and computational cost as compared to our previous work.
Detecting Features from Confusion Matrices using Generalized Formal Concept Analysis
(Springer, 2010) Peláez Moreno, Carmen; Valverde Albacete, Francisco José
We claim that the confusion matrices of multiclass problems can be analyzed by means of a generalization of Formal Concept Analysis to obtain symbolic information about the feature sets of the underlying classification task. We prove our claims by analyzing the confusion matrices of human speech perception experiments and comparing our results to those elicited by experts.
Spectral lattices of R̄max,+-Formal contexts
(Springer, 2008-02-15) Valverde Albacete, Francisco José; Peláez Moreno, Carmen
In [13] a generalisation of Formal Concept Analysis was introduced with data mining applications in mind, K-Formal Concept Analysis, where incidences take values in certain kinds of semirings, instead of the standard Boolean carrier set. Subsequently, the structural lattice of such generalised contexts was introduced in [15], to provide a limited equivalent to the main theorem of K-Formal Concept Analysis, resting on a crucial parameter, the degree of existence of the object-attribute pairs φ. In this paper we introduce the spectral lattice of a concrete instance of K-Formal Concept Analysis, as a further means to clarify the structural and the K-Concept Lattices and the choice of φ. Specifically, we develop techniques to obtain the join- and meet-irreducibles of a R̄max,+-Concept Lattice independently of φ and try to clarify its relation to the corresponding structural lattice.
