RT Dissertation/Thesis
T1 Training deep retrieval models with noisy datasets
A1 Martínez Cortés, Tomás
AB In this thesis we study loss functions that make it possible to train Convolutional Neural Networks (CNNs) on noisy datasets for the particular task of Content-Based Image Retrieval (CBIR). Specifically, we propose two novel losses to fit models that generate global image representations. First, a Soft-Matching (SM) loss, exploiting both image content and metadata, is used to specialize general CNNs to particular cities or regions using weakly annotated datasets. Second, a Bag Exponential (BE) loss, inspired by the Multiple Instance Learning (MIL) framework, is employed to train CNNs for CBIR on noisy datasets.

The first part of the thesis introduces a novel training framework that, relying on image content and metadata, learns location-adapted deep models providing fine-tuned image descriptors for specific visual contents. Our networks, which start from a baseline model originally learned for a different task, are specialized using a custom pairwise loss function, our proposed SM loss, that uses weak labels based on image content and metadata (an illustrative sketch of such a pairwise loss follows the record below).

The experimental results show that the proposed location-adapted CNNs achieve an improvement of up to 55% over the baseline networks on a landmark discovery task. This implies that the models successfully learn the visual cues and peculiarities of the region for which they are trained, and generate image descriptors that are better location-adapted. In addition, for landmarks that are not present in the training set, or even for other cities, our proposed models perform at least as well as the baseline network, which indicates good resilience against overfitting.

The second part of the thesis introduces the BE loss function to train CNNs for image retrieval, borrowing inspiration from the MIL framework. The loss combines an exponential function acting as a soft margin with a MIL-based mechanism working on bags of positive and negative pairs of images. The method makes it possible to train deep retrieval networks on noisy datasets by weighing the influence of the different samples at the loss level, which increases the performance of the generated global descriptors. The rationale behind the improvement is that noise is handled in an end-to-end manner, avoiding both its negative influence and the unintentional biases introduced by fixed pre-processing cleaning procedures. In addition, our method is general enough to suit other scenarios requiring different weights for the training instances (e.g. boosting the influence of hard positives during training). The proposed bag exponential function can be seen as a back door to guide the learning process according to a certain objective in an end-to-end manner, allowing the model to approach such an objective smoothly and progressively (a second sketch below illustrates one possible form of this weighting).

Our results show that the proposed loss allows CNN-based retrieval systems to be trained with noisy training sets and to achieve state-of-the-art performance. Furthermore, we have found that it is better to use training sets that are highly correlated with the final task, even if they are noisy, than to train with a clean set that is only weakly related to the topic at hand. From our point of view, this result represents a big leap in the applicability of retrieval systems and helps reduce the effort needed to set up new CBIR applications: e.g. by allowing fast automatic generation of noisy training datasets and then using our bag exponential loss to deal with the noise.
Moreover, we also consider that this result opens a new line of research for CNN-based image retrieval: letting the models decide not only on the best features to solve the task, but also on the most relevant samples to do it.
YR 2021
FD 2021-02
LK https://hdl.handle.net/10016/32942
UL https://hdl.handle.net/10016/32942
LA eng
DS e-Archivo
RD 2 May 2024
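
The abstract describes the SM loss only at a high level: a pairwise loss whose labels are weak match scores built from image content and metadata. The following PyTorch sketch is an editor's illustration under that reading, not the thesis's exact formulation; the name soft_matching_loss, the 0-to-1 soft_label, and the margin value are all hypothetical.

import torch
import torch.nn.functional as F

def soft_matching_loss(desc_a, desc_b, soft_label, margin=0.7):
    # desc_a, desc_b: (B, D) L2-normalised global image descriptors
    # for the two images of each pair.
    # soft_label: (B,) weak match score in [0, 1] derived from image
    # content and metadata (e.g. geotag proximity); 1.0 = confident match.
    d = F.pairwise_distance(desc_a, desc_b)
    # The soft label interpolates between a pulling term applied to
    # matching pairs and a hinged pushing term applied to non-matches.
    pull = soft_label * d.pow(2)
    push = (1.0 - soft_label) * F.relu(margin - d).pow(2)
    return (pull + push).mean()

# Toy usage with random descriptors and random weak labels.
a = F.normalize(torch.randn(8, 512), dim=1)
b = F.normalize(torch.randn(8, 512), dim=1)
w = torch.rand(8)
print(soft_matching_loss(a, b, w))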
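
The BE loss is described as an exponential function acting as a soft margin, combined with a MIL-style aggregation over bags of positive and negative pairs that re-weights samples at the loss level so that noisy "positives" are smoothly down-weighted. The sketch below is one plausible instantiation under those assumptions; the softmax aggregation and the alpha, beta and margin hyper-parameters are hypothetical choices, not the published formulation.

import torch
import torch.nn.functional as F

def bag_exponential_loss(q, pos_bag, neg_bag, margin=0.7, alpha=5.0, beta=2.0):
    # q: (D,) query descriptor; pos_bag: (P, D) descriptors weakly
    # labelled as matches (possibly noisy); neg_bag: (N, D) non-matches.
    d_pos = (pos_bag - q).norm(dim=1)   # (P,) distances to weak positives
    d_neg = (neg_bag - q).norm(dim=1)   # (N,) distances to negatives
    # MIL-style exponential weighting inside each bag: close (confident)
    # positives dominate their bag, so far-away mislabelled 'positives'
    # are smoothly down-weighted; close (hard) negatives dominate theirs.
    w_pos = torch.softmax(-alpha * d_pos, dim=0)
    w_neg = torch.softmax(-alpha * d_neg, dim=0)
    # Exponential soft margin: negatives inside the margin are strongly
    # penalised and the penalty decays smoothly past it, instead of the
    # abrupt cut-off of a hard hinge.
    pos_term = (w_pos * d_pos.pow(2)).sum()
    neg_term = (w_neg * torch.exp(beta * (margin - d_neg))).sum()
    return pos_term + neg_term

# Toy usage: one query, a noisy positive bag and a negative bag.
q = F.normalize(torch.randn(512), dim=0)
pos = F.normalize(torch.randn(6, 512), dim=1)
neg = F.normalize(torch.randn(32, 512), dim=1)
print(bag_exponential_loss(q, pos, neg))

Re-weighting inside the bag, rather than discarding samples in a fixed pre-processing step, lets the gradient itself decide which training pairs matter, which matches the end-to-end argument made in the abstract.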