Publication:
Subsampling and aggregation: a solution to the scalability problem in distance based prediction for mixed-type data

dc.affiliation.dpto: UC3M. Departamento de Estadística [es]
dc.contributor.author: Baíllo, Amparo
dc.contributor.author: Grané Chávez, Aurea
dc.contributor.funder: Ministerio de Ciencia y Tecnología (España) [es]
dc.date.accessioned: 2022-06-20T10:35:54Z
dc.date.available: 2022-06-20T10:35:54Z
dc.date.issued: 2021-09-13
dc.description.abstract: The distance-based linear model (DB-LM) extends the classical linear regression to the framework of mixed-type predictors or when the only available information is a distance matrix between regressors (as it sometimes happens with big data). The main drawback of these DB methods is their computational cost, particularly due to the eigendecomposition of the Gram matrix. In this context, ensemble regression techniques provide a useful alternative to fitting the model to the whole sample. This work analyzes the performance of three subsampling and aggregation techniques in DB regression on two specific large, real datasets. We also analyze, via simulations, the performance of bagging and DB logistic regression in the classification problem with mixed-type features and large sample sizes. [en]
dc.description.sponsorship: A. Baíllo is supported by the Spanish MCyT grant PID2019-109387GB-I00. [en]
dc.format.extent: 17 [es]
dc.identifier.bibliographicCitation: Baíllo, A., & Grané, A. (2021). Subsampling and Aggregation: A Solution to the Scalability Problem in Distance-Based Prediction for Mixed-Type Data. Mathematics, 9(18), 2247. [es]
dc.identifier.doi: https://doi.org/10.3390/math9182247
dc.identifier.issn: 2227-7390
dc.identifier.publicationfirstpage: 1
dc.identifier.publicationlastpage: 17
dc.identifier.publicationtitle: Mathematics [en]
dc.identifier.publicationvolume: 9
dc.identifier.uri: https://hdl.handle.net/10016/35186
dc.identifier.uxxi: AR/0000028437
dc.language.iso: eng
dc.publisher: MDPI [en]
dc.relation.projectID: Gobierno de España. PID2019-109387GB-I00 [es]
dc.rights: © 2021 by the authors [en]
dc.rights: Atribución 3.0 España [*]
dc.rights.accessRights: open access [en]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/ [*]
dc.subject.eciencia: Matemáticas [es]
dc.subject.other: Big data [en]
dc.subject.other: Classification [en]
dc.subject.other: Dissimilarities [en]
dc.subject.other: Ensemble [en]
dc.subject.other: Generalized linear model [en]
dc.subject.other: Gower [en]
dc.subject.other: S metric [en]
dc.subject.other: Machine learning [en]
dc.title: Subsampling and aggregation: a solution to the scalability problem in distance based prediction for mixed-type data [en]
dc.type: research article [*]
dc.type.hasVersion: VoR [*]
dspace.entity.type: Publication
Files
Original bundle
Name: Subsampling_M_2021.pdf
Size: 586.83 KB
Format: Adobe Portable Document Format