Publication:
Combining audio-visual features for viewers' perception classification of Youtube car commercials

dc.affiliation.dptoUC3M. Departamento de Teoría de la Señal y Comunicacioneses
dc.affiliation.grupoinvUC3M. Grupo de Investigación: Procesado Multimediaes
dc.contributor.authorFernández Martínez, Fernandoes
dc.contributor.authorHernández García, Alejandroes
dc.contributor.authorGallardo Antolín, Ascensiónes
dc.contributor.authorDíaz de María, Fernandoes
dc.date.accessioned2015-07-30T09:52:10Z
dc.date.available2015-07-30T09:52:10Z
dc.date.issued2014
dc.descriptionProceedings of: 2nd International Workshop on Speech, Language and Audio in Multimedia. Penang, Malaysia, 11-12 September 2014.en
dc.description.abstractIn this paper, we present a computational model capable of predicting the viewer perception of Youtube car TV commercials by using a set of low-level audio and visual descriptors. Our research goal relies on the hypothesis that these descriptors could reflect to some extent the objective value of the videos and, in turn, the average viewer's perception. To that end, and as a novel approach to this problem, we automatically annotate our video corpus, grouped into 2 classes corresponding to different satisfaction levels, by means of a regular k-means algorithm applied to the video metadata related to user feedback. Evaluation results show that simple linear logistic regression models based on the 10 best visual descriptors and on the 10 best audio descriptors individually perform reasonably well, achieving a classification accuracy of roughly 70% and 75%, respectively. Combining audio and visual descriptors yields better performance, roughly 86% for the top-20 selected from the entire descriptor set, but tips the balance in favor of the audio ones (i.e. 17 vs 3). The greater influence of audio content in this domain is also evidenced by a side analysis of the video comments.en
dc.description.statusPublisheden
dc.format.extent5es
dc.format.mimetypeapplication/pdf
dc.identifier.bibliographicCitationTien-Ping Tan et al. (eds.) (2014). Proceeding of the 2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM 2014), 11-12 September 2014, Penang, Malaysia. (pp. 14-18). International Speech Communication Association.en
dc.identifier.isbn978-967-394-199-5
dc.identifier.publicationfirstpage14
dc.identifier.publicationlastpage18
dc.identifier.publicationtitleProceeding of the 2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM 2014), 11-12 September 2014, Penang, Malaysia.en
dc.identifier.urihttps://hdl.handle.net/10016/21479
dc.identifier.uxxiCC/0000022421
dc.language.isoengen
dc.publisherInternational Speech Communication Association.en
dc.relation.eventdate2014-09-11
dc.relation.eventnumber2
dc.relation.eventplacePenang, Malaysiaen
dc.relation.eventtitleWorkshop on Speech, Language and Audio in Multimedia (SLAM 2014)en
dc.relation.publisherversionhttp://www.isca-speech.org/archive/slam_2014/papers/slm4_014.pdfen
dc.rights© 2014 ISCAen
dc.rights.accessRightsopen accessen
dc.subject.ecienciaTelecomunicacioneses
dc.subject.otherSubjective assessmenten
dc.subject.otherVideo aestheticses
dc.subject.otherMusic Information Retrievalen
dc.subject.otherVideo metadataen
dc.titleCombining audio-visual features for viewers' perception classification of Youtube car commercialsen
dc.typeconference proceedings*
dc.type.hasVersionVoR*
dspace.entity.typePublication
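Illustrative sketch (not part of the original record): the abstract describes annotating the corpus automatically by running k-means with 2 clusters on user-feedback metadata. The snippet below shows that general idea in plain Python. The two per-video features (a like ratio and a normalized comment rate), the sample values, and the function name `kmeans` are all assumptions made for illustration; the record does not state which metadata fields or which implementation the authors used.

```python
import random

def kmeans(points, k=2, iters=100, seed=0):
    """Plain Lloyd's k-means over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical metadata per video: (like_ratio, normalized_comment_rate).
videos = [(0.95, 0.80), (0.92, 0.75), (0.90, 0.85),
          (0.40, 0.20), (0.35, 0.30), (0.45, 0.25)]

labels, centroids = kmeans(videos, k=2)

# Treat the cluster with the larger centroid as the "high satisfaction" class.
high = max(range(2), key=lambda c: sum(centroids[c]))
annotations = ["high" if lab == high else "low" for lab in labels]
print(annotations)
```

The resulting class labels could then serve as targets for a classifier trained on audio and visual descriptors, which is the role the abstract assigns to logistic regression.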
Files
Original bundle
Name:
combining_SLAM_2014.pdf
Size:
738.77 KB
Format:
Adobe Portable Document Format