Use of the p-values as a size-dependent function to address practical differences when analyzing large datasets

e-Archivo Repository


dc.contributor.author Gómez de Mariscal, Estíbaliz
dc.contributor.author Guerrero Lozano, Vanesa
dc.contributor.author Sneider, Alexandra
dc.contributor.author Jayatilaka, Hasini
dc.contributor.author Phillip, Jude M.
dc.contributor.author Wirtz, Denis
dc.contributor.author Muñoz Barrutia, María Arrate
dc.date.accessioned 2022-02-01T10:17:51Z
dc.date.available 2022-02-01T10:17:51Z
dc.date.issued 2021-10-22
dc.identifier.bibliographicCitation Gómez-de-Mariscal, E., Guerrero, V., Sneider, A., Jayatilaka, H., Phillip, J. M., Wirtz, D. & Muñoz-Barrutia, A. (2021). Use of the p-values as a size-dependent function to address practical differences when analyzing large datasets. Scientific Reports, 11: 20942.
dc.identifier.issn 2045-2322
dc.identifier.uri http://hdl.handle.net/10016/34003
dc.description.abstract Biomedical research has come to rely on p-values as a deterministic measure for data-driven decision-making. In the widely used null hypothesis significance testing for identifying statistically significant differences among groups of observations, a single p-value is computed from sample data. It is then routinely compared with a threshold, commonly set to 0.05, to assess the evidence against the hypothesis of having non-significant differences among groups, or the null hypothesis. Because the estimated p-value tends to decrease when the sample size is increased, applying this methodology to datasets with large sample sizes results in the rejection of the null hypothesis, making it not meaningful in this specific situation. We propose a new approach to detect differences based on the dependence of the p-value on the sample size. We introduce new descriptive parameters that overcome the effect of the size in the p-value interpretation in the framework of datasets with large sample sizes, reducing the uncertainty in the decision about the existence of biological differences between the compared experiments. The methodology enables the graphical and quantitative characterization of the differences between the compared experiments, guiding the researchers in the decision process. An in-depth study of the methodology is carried out on simulated and experimental data. Code is available at https://github.com/BIIG-UC3M/pMoSS.
dc.description.sponsorship This work was supported by Ministerio de Ciencia, Innovación y Universidades, Agencia Estatal de Investigación, under Grants TEC2015-73064-EXP, TEC2016-78052, and PID2019-109820RB-I00, MCIN/AEI/10.13039/501100011033/, co-financed by European Regional Development Fund (ERDF), "A way of making Europe" (AMB); BBVA Foundation under a 2017 Leonardo Grant for Researchers and Cultural Creators (AMB); the US National Institutes of Health under Grants UO1AG060903 (DW, JMP), P30AG021334 (JMP) and U54CA143868 (DW); the National Science Foundation Graduate Research Fellowship under Grant No. 1746891 (AS, DW). We also want to acknowledge the support of NVIDIA Corporation with the donation of the Titan X (Pascal) GPU used for this research. We thank Claire Jordan Brooks, Prof. Joachim Goedhart (University of Amsterdam), Laura Nicolás-Sáenz, Pedro Macías-Gordaliza and Prof. Naomi Altman (Pennsylvania State University) for their constructive comments and fruitful discussions.
dc.format.extent 13
dc.language.iso eng
dc.publisher Nature Research
dc.rights © The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
dc.rights Attribution 3.0 Spain
dc.rights.uri http://creativecommons.org/licenses/by/3.0/es/
dc.title Use of the p-values as a size-dependent function to address practical differences when analyzing large datasets
dc.type article
dc.subject.eciencia Biology and Biomedicine
dc.subject.eciencia Statistics
dc.identifier.doi https://doi.org/10.1038/s41598-021-00199-5
dc.rights.accessRights openAccess
dc.relation.projectID Gobierno de España. TEC2016-78052-R
dc.relation.projectID Gobierno de España. TEC2015-73064-EXP
dc.relation.projectID Gobierno de España. PID2019-109820RB-I00
dc.type.version publishedVersion
dc.identifier.publicationfirstpage 1
dc.identifier.publicationissue 20942
dc.identifier.publicationlastpage 13
dc.identifier.publicationtitle Scientific Reports
dc.identifier.publicationvolume 11
dc.identifier.uxxi AR/0000028911
dc.contributor.funder Ministerio de Ciencia, Innovación y Universidades (España)
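
The abstract's central observation, that the estimated p-value tends to decrease as the sample size grows, can be illustrated with a small simulation. The following is a minimal sketch using only the Python standard library; it is not the authors' pMoSS implementation. The two synthetic populations, the mean shift of 0.2, the large-sample z-test, and all function names are assumptions chosen for illustration.

```python
import math
import random


def mean_and_variance(x):
    """Return the sample mean and the unbiased sample variance of x."""
    m = math.fsum(x) / len(x)
    v = math.fsum((xi - m) ** 2 for xi in x) / (len(x) - 1)
    return m, v


def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means (normal approximation).

    The approximation is rough for very small samples; it is used here
    only to keep the sketch dependency-free.
    """
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    z = (mean_a - mean_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def mean_pvalue_by_size(pop_a, pop_b, sizes, repeats=100, seed=0):
    """For each size n, draw `repeats` random subsamples of n observations
    from each population and average the resulting p-values."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        pvals = [
            two_sample_z_pvalue(rng.sample(pop_a, n), rng.sample(pop_b, n))
            for _ in range(repeats)
        ]
        curve[n] = math.fsum(pvals) / repeats
    return curve


# Hypothetical data: two normal populations whose means differ by 0.2.
data_rng = random.Random(42)
pop_a = [data_rng.gauss(0.0, 1.0) for _ in range(20000)]
pop_b = [data_rng.gauss(0.2, 1.0) for _ in range(20000)]

curve = mean_pvalue_by_size(pop_a, pop_b, sizes=[10, 100, 1000, 5000])
for n, p in curve.items():
    print(f"n = {n:5d}   mean p-value = {p:.4g}")
```

With a fixed, small difference between the groups, the averaged p-value falls steadily as n increases, so a comparison that looks non-significant at small n crosses the 0.05 threshold once enough observations are included. Treating the p-value as a curve over n, rather than a single number, is the perspective the abstract describes.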