DES - Working Papers. Statistics and Econometrics. WS

Recent Submissions

Now showing 1 - 20 of 645
  • Publication
    A stochastic volatility model for volatility asymmetry and propagation
    (Universidad Carlos III de Madrid, 2024-05-07) Marín Díazaraque, Juan Miguel; Romero, Eva; Lopes Moreira Da Veiga, María Helena; Agencia Estatal de Investigación (España)
    In this paper, we propose a novel asymmetric stochastic volatility model that, unlike traditional approaches, uses a heterogeneous autoregressive process to capture the persistence and decay of volatility asymmetry over time. We analyze the properties of the model in terms of volatility asymmetry and propagation using a recently introduced concept in the field and find that the new model can generate both volatility asymmetry and propagation effects. We also introduce Data Cloning for parameter estimation, which provides robustness and computational efficiency compared to conventional techniques. Our empirical analysis shows that the new proposal outperforms a recent competitor in terms of in-sample fit and out-of-sample volatility prediction across different financial return series, making it a more effective tool for capturing the dynamics of volatility asymmetry in financial markets.
  • Publication
    A Bayesian semi-parametric approach to stochastic frontier models with inefficiency heterogeneity
    (2024-04-23) Deng, Yaguo; Wiper, Michael Peter; Lopes Moreira Da Veiga, María Helena; Universidad Carlos III de Madrid. Departamento de Estadística
    In this chapter, we present a semi-parametric Bayesian approach for stochastic frontier (SF) models that incorporates exogenous covariates into the inefficiency component by using a Dirichlet process model for conditional distributions. We highlight the advantages of our method by contrasting it with traditional SF models and parametric Bayesian SF models using two different applications in the agricultural sector. In the first application, the accounting data of 2,500 dairy farms from five countries are analyzed. In the second case study, data from forty-three smallholder rice producers in the Tarlac region of the Philippines from 1990 to 1997 are analyzed. Our empirical results suggest that the semi-parametric Bayesian stochastic frontier model outperforms its counterparts in predictive efficiency, highlighting its robustness and utility in different agricultural contexts.
  • Publication
    Clustering and forecasting of day-ahead electricity supply curves using a market-based distance
    (2024) Li, Zehang; Alonso Fernández, Andrés Modesto; Elías, Antonio; Morales, Juan M.; Universidad Carlos III de Madrid. Departamento de Estadística; European Commission; Ministerio de Ciencia e Innovación (España); European Commission
    Gathering knowledge of supply curves in electricity markets is critical to both energy producers and regulators. Indeed, power producers strategically plan their generation of electricity considering various scenarios to maximize profit, leveraging the characteristics of these curves. For their part, regulators need to forecast the supply curves to monitor the market’s performance and identify market distortions. However, the prevailing approaches in the technical literature for analyzing, clustering, and predicting these curves are based on structural assumptions that electricity supply curves do not satisfy in practice, namely, boundedness and smoothness. Furthermore, any attempt to satisfactorily cluster the supply curves observed in a market must take into account the market’s specific features. Against this background, this article introduces a hierarchical clustering method based on a novel weighted distance that is specially tailored to non-bounded and non-smooth supply curves and embeds information on the price distribution of offers, thus overcoming the drawbacks of conventional clustering techniques. Once the clusters have been obtained, a supervised classification procedure is used to characterize them as a function of relevant market variables. Additionally, the proposed distance is used in a learning procedure by which explanatory information is exploited to forecast the supply curves in a day-ahead electricity market. This procedure combines the idea of nearest neighbors with a machine-learning method. The prediction performance of our proposal is extensively evaluated and compared against two nearest-neighbor benchmarks and existing competing methods. To this end, supply curves from the markets of Spain, Pennsylvania-New Jersey-Maryland (PJM), and West Australia are considered.
  • Publication
    A Quantile Neural Network Framework for Two-Stage Stochastic Optimization
    (Universidad Carlos III de Madrid, 2024-03-19) Alcántara Mata, Antonio; Ruiz Mora, Carlos; Tsay, Calvin; Universidad Carlos III de Madrid. Departamento de Estadística
    Two-stage stochastic programming is a popular framework for optimization under uncertainty, where decision variables are split between first-stage decisions, and second-stage (or recourse) decisions, with the latter being adjusted after uncertainty is realized. These problems are often formulated using Sample Average Approximation (SAA), where uncertainty is modeled as a finite set of scenarios, resulting in a large “monolithic” problem, i.e., where the model is repeated for each scenario. The resulting models can be challenging to solve, and several problem-specific decomposition approaches have been proposed. An alternative approach is to approximate the expected second-stage objective value using a surrogate model, which can then be embedded in the first-stage problem to produce good heuristic solutions. In this work, we propose to instead model the distribution of the second-stage objective, specifically using a quantile neural network. Embedding this distributional approximation enables capturing uncertainty and is not limited to expected-value optimization, e.g., the proposed approach enables optimization of the Conditional Value at Risk (CVaR). We discuss optimization formulations for embedding the quantile neural network and demonstrate the effectiveness of the proposed framework using several computational case studies including a set of mixed-integer optimization problems.
  • Publication
    Observability analysis for structural system identification based on state estimation
    (2023-11-23) Alahmad, Ahmad; Mínguez Solana, Roberto; Porras, Rocío; Lozano Galant, José Antonio; Turmo, José; Universidad Carlos III de Madrid. Departamento de Estadística
    The concept of observability analysis (OA) has garnered substantial attention in the field of Structural System Identification. Its primary aim is to identify a specific set of structural characteristics, such as Young's modulus, area, inertia, and possibly their combinations (e.g., flexural or axial stiffness). These characteristics can be uniquely determined when provided with a suitable subset of deflections, forces, and/or moments at the nodes of the structure. This problem is particularly intricate within the realm of Structural System Identification, mainly due to the presence of nonlinear unknown variables, such as the product of vertical deflection and flexural stiffness, in accordance with modern methodologies. Consequently, the mechanical and geometrical properties of the structure are intricately linked with node deflections and/or rotations. The paper at hand serves a dual purpose: firstly, it introduces the concept of State Estimation (SE), specially tailored for the identification of structural systems; and secondly, it presents a novel OA method grounded in SE principles, designed to overcome the aforementioned challenges. Computational experiments shed light on the algorithm's potential for practical Structural System Identification applications, demonstrating significant advantages over the existing state-of-the-art methods found in the literature. It is noteworthy that these advantages could potentially be further amplified by addressing the SE problem, which constitutes a subject for future research. Solving this problem would help address the additional challenge of developing efficient techniques that can accommodate redundancy and uncertainty when estimating the current state of the structure.
  • Publication
    Deep Learning and Bayesian Calibration Approach to Hourly Passenger Occupancy Prediction in Beijing Metro: A Study Exploiting Cellular Data and Metro Conditions
    (2023-11-07) Sun, He; Cabras, Stefano; Universidad Carlos III de Madrid. Departamento de Estadística
    In burgeoning urban landscapes, population growth necessitates swift and accurate urban transit solutions to meet citizens' commuting requirements. A pivotal aspect of optimizing traffic management and ensuring resilient responses to unanticipated passenger surges is accurately forecasting hourly occupancy levels within urban subway systems. This study proposes a two-tiered model designed to address this need: 1. Preliminary phase, employing a Feed Forward Neural Network (FFNN): in the initial phase, an FFNN is used to estimate the occupancy levels across various subway stations. The FFNN, a class of artificial neural networks, is well suited to this task because it can learn from the data and make predictions or decisions without being explicitly programmed to perform the task. Through a series of interconnected nodes, known as neurons, arranged in layers, the FFNN processes the input data, adjusts its weights based on the error of its predictions, and optimizes the network for accurate forecasting. For the random process of occupancy levels in time and space, this phase encapsulates the so-called process filtration, wherein the underlying patterns and dynamics of subway occupancy are captured and represented in a structured format, ready for subsequent analysis. The estimates obtained in this phase form the foundation for the subsequent modelling stage. 2. Subsequent phase, implementing a Bayesian Proportional-Odds Model with Hourly Random Effects: with the estimates from the FFNN at its disposal, the study moves to the second phase, in which a Bayesian Proportional-Odds Model is used. This model is particularly well suited to scenarios where the response variable is ordinal, as in the case of occupancy levels (Low, Medium, High).
    The Bayesian framework allows prior distributions to be placed on the model parameters and updates this knowledge with observed data to make informed predictions. A distinctive feature of this model is a random effect for hours, which accounts for the inherent variability across different hours of the day. This is important in urban transit systems, where passenger influx varies significantly with the hour. Combining the two models yields calibrated estimates of occupancy levels, both conditionally (relative to the sample) and unconditionally (on a held-out test set). This dual-phase methodology gives analysts robust and reliable insight into the quality of the model's predictions. This, in turn, provides a data-driven foundation for informed decisions in real-time traffic management, emergency response planning, and the overall operational optimization of urban subway systems. The model described in this study is currently being evaluated for potential deployment by the Beijing Metro Group Ltd. This initiative reflects a practical step towards adopting sophisticated analytical models to improve urban transit management, thereby contributing to the broader objective of fostering sustainable and efficient urban living environments amid rapid urban population growth.
  • Publication
    Economic activity and CO2 emissions in Spain
    (2023-07-24) Juan, Aranzazu de; Poncela, Maria Pilar; Ruiz Ortega, Esther; Universidad Carlos III de Madrid. Departamento de Estadística
    Carbon dioxide (CO2) emissions, largely by-products of energy consumption, account for the largest share of greenhouse gases (GHG). The addition of GHG to the atmosphere disturbs the earth's radiative balance, leading to an increase in the earth's surface temperature and to related effects on climate, sea level rise, ocean acidification and world agriculture, among other effects. Forecasting and designing policies to curb CO2 emissions globally is gaining interest. In this paper, we look at the relationship between CO2 emissions and economic activity using Spanish data from 1964 to 2020. We consider a structural (contemporaneous) equation between selected indicators of economic activity and CO2 emissions, that we further augment with dynamic common factors extracted from a large macroeconomic database. We show that the way the common factors are extracted is crucial to exploit their information content. In particular, when using standard methods to extract the common factors from large data sets, once private consumption and maritime transportation are considered, the information contained in the macroeconomic data set has only negligible explanatory power for emissions. However, if we extract the common factors oriented towards CO2 emissions, they add valuable information not contained in the individual economic indicators.
  • Publication
    Effects of extreme temperature on the European equity market
    (2023-07-24) Bellocca, Gian Pietro Enzo; Alessi, Lucia; Poncela Blanco, Maria Pilar; Ruiz Ortega, Esther; Universidad Carlos III de Madrid. Departamento de Estadística
    The increasing frequency and severity of extreme temperatures are potential threats to financial stability. Indeed, physical risk related to these extreme phenomena can affect the whole financial system and, in particular, the equity market. In this study, we analyze the impact of extreme temperature exposure on firms' performance in Europe during the twenty-first century. We show that extreme temperatures can affect firms' profitability depending on their industry and the quarter of the year. Our results are of interest both for investors operating in the equity market and for regulators in charge of securing financial stability.
  • Publication
    Modelling intervals of minimum/maximum temperatures in the Iberian Peninsula
    (2023-07-24) González-Rivera, Gloria; Rodríguez Caballero, Carlos Vladimir; Ruiz Ortega, Esther; Universidad Carlos III de Madrid. Departamento de Estadística
    In this paper, we propose to model intervals of minimum/maximum temperatures observed at a given location by fitting unobserved component models to bivariate systems of center and log-range temperatures. In doing so, the center and log-range temperatures are decomposed into potentially stochastic trends, seasonal and transitory components. We contribute to the debate on whether the trend and seasonal components are better represented by stochastic or deterministic components. The methodology is applied to intervals of minimum/maximum temperatures observed monthly in four locations in the Iberian Peninsula, namely, Barcelona, Coruña, Madrid and Seville. We show that, at each location, the center temperature can be represented by a smooth integrated random walk with time-varying slope, while the log-range seems to be better represented by a stochastic level. We also show that center and log-range temperatures are unrelated. The methodology is then extended to model simultaneously minimum/maximum temperatures observed at several locations. We fit a multi-level dynamic factor model to extract potential commonalities among center (log-range) temperatures while also allowing for heterogeneity in different areas. The model is fitted to intervals of minimum/maximum temperatures observed at a large number of locations in the Iberian Peninsula.
  • Publication
    Penalized function-on-function partial least-squares regression
    (2023-07-05) Hernandez Roig, Harold Antonio; Aguilera Morillo, María del Carmen; Aguilera, Ana M.; Preda, Cristian; Universidad Carlos III de Madrid. Departamento de Estadística
    This paper deals with the "function-on-function" or "fully functional" linear regression problem. We address the problem by proposing a novel penalized Function-on-Function Partial Least-Squares (pFFPLS) approach that imposes smoothness on the PLS weights. Our proposal introduces an appropriate finite-dimensional functional space with an associated set of bases on which to represent the data and controls smoothness with a roughness penalty operator. Penalizing the PLS weights imposes smoothness on the resulting coefficient function, improving its interpretability. In a simulation study, we demonstrate the advantages of pFFPLS compared to non-penalized FFPLS. Our comparisons indicate a higher accuracy of pFFPLS when predicting the response and estimating the true coefficient function from which the data were generated. We also illustrate the advantages of our proposal with two case studies involving two well-known datasets from the functional data analysis literature. In the first one, we predict log precipitation curves from the yearly temperature profiles recorded in 35 weather stations in Canada. In the second case study, we predict the hip angle profiles during a gait cycle of children from their corresponding knee angle profiles.
  • Publication
    Tall big data time series of high frequency: stylized facts and econometric modelling
    (2023-07-04) Espasa, Antoni; Carlomagno Real, Guillermo; Universidad Carlos III de Madrid. Departamento de Estadística
    The paper starts by commenting on the hard tasks of data treatment (mainly cleaning, classification, and aggregation) that are required at the beginning of any analysis with big data. Subsequently, it focuses on non-financial big data time series of high frequency that, for many problems, are aggregated at daily, hourly, or higher-frequency levels of several minutes. Then, the paper discusses possible stylized facts present in these data. In this respect, it studies the relevant seasonal cycles (daily, weekly, monthly, and annual) and analyses how, for the data in question, these cycles could be affected by weather variables and by factors due to the annual composition of the calendar. Consequently, the paper investigates the possible main characteristics of the mentioned cycles and the types of responses to the exogenous weather and calendar factors that the data could show. The shorter cycles could change along the annual cycle and interact with the exogenous variables. The modelling strategy could require regime-switching, dynamic, non-linear structures, and interactions between the factors considered. Then the paper analyses the construction of explanatory variables that could be useful for taking into account all the above peculiarities. We propose the use of the automated procedure Autometrics to discover, in the words of Prof. Hendry, a parsimonious model not dominated by any other, which is able to explain all the characteristics of the data. The model can be used for structural analysis, forecasting, and, where appropriate, to build real-time quantitative macroeconomic leading indicators. Finally, the paper includes an application to the daily series of jobless claims in Chile.
  • Publication
    Adaptive posterior distributions for covariance matrix learning in Bayesian inversion problems for multioutput signals
    (2023-05-30) Curbelo Benitez, Ernesto Angel; Martino, Luca; Llorente Fernandez, Fernando; Delgado Gómez, David; Universidad Carlos III de Madrid. Departamento de Estadística
    In this work, we propose an adaptive importance sampling (AIS) scheme for multivariate Bayesian inversion problems, which is based on two main ideas: the inference procedure is divided into two parts and the variables of interest are split into two blocks. We assume that the observations are generated from a complex multivariate non-linear function perturbed by correlated Gaussian noise. We estimate both the unknown parameters of the multivariate non-linear model and the covariance matrix of the noise. In the first part of the proposed inference scheme, a novel AIS technique called adaptive target AIS (ATAIS) is designed, which alternates iteratively between an IS technique over the parameters of the non-linear model and a frequentist approach for the covariance matrix of the noise. In the second part of the proposed inference scheme, a prior density over the covariance matrix is considered and the cloud of samples obtained by ATAIS is recycled and re-weighted to obtain a complete Bayesian study over the model parameters and covariance matrix. Two numerical examples are presented that show the benefits of the proposed approach.
  • Publication
    Modelling physical activity profiles in COPD patients: a new approach to variable-domain functional regression models
    (2023-05-05) Hernandez Amaro, Pavel; Durbán Reguera, María Luz; Aguilera Morillo, María del Carmen; Esteban Gonzalez, Cristobal; Arostegui, Inma; Universidad Carlos III de Madrid. Departamento de Estadística
    Motivated by the increasingly common technology for collecting data, like cellphones, smartwatches, etc., functional data analysis has been intensively studied in recent decades, and along with it, functional regression models. However, the majority of functional data methods in general, and functional regression models in particular, assume that the observed data share the same domain. When the data have a variable domain, they need to be aligned or registered before the usual modeling techniques can be applied, which adds computational burden. To avoid this, a model that accounts for the variable-domain nature of the data is needed, but models of this type are scarce and their estimation methods have some limitations. In this article, we propose a new scalar-on-function regression model for variable-domain functional data that avoids the need for alignment, together with a new estimation methodology that we extend to other variable-domain regression models.
  • Publication
    Data cloning for a threshold asymmetric stochastic volatility model
    (2023-02-14) Marín Díazaraque, Juan Miguel; Lopes Moreira Da Veiga, María Helena; Universidad Carlos III de Madrid. Departamento de Estadística
    In this paper, we propose a new asymmetric stochastic volatility model whose asymmetry parameter can change depending on the intensity of the shock and is modeled as a threshold function whose threshold depends on past returns. We study the model in terms of leverage and propagation using a new concept that has recently appeared in the literature. We find that the new model can generate more leverage and propagation than a well-known asymmetric volatility model. We also propose to estimate the parameters of the model using data cloning. We compare the finite-sample estimates obtained by data cloning with those of a Bayesian approach and find that data cloning is often more accurate. Data cloning is a general technique for computing maximum likelihood estimators and their asymptotic variances using a Markov chain Monte Carlo (MCMC) method. The empirical application shows that the new model often improves the fit compared to the benchmark model. Finally, the new proposal together with data cloning estimation often leads to more accurate 1-day and 10-day volatility forecasts, especially for return series with high volatility.
  • Publication
    Risk Management of Energy Communities with Hydrogen Production and Storage Technologies
    (2023-01-16) Feng, Wenxiu; Ruiz Mora, Carlos; Universidad Carlos III de Madrid. Departamento de Estadística
    The distributed integration of renewable energy sources plays a central role in the decarbonization of economies. In this regard, energy communities arise as a promising entity to coordinate groups of proactive consumers (prosumers) and incentivize investment in clean technologies. However, the uncertain nature of renewable energy generation, residential loads, and trading tariffs poses important challenges, both at the operational and economic levels. We study how this management can be directly undertaken by an arbitrageur that, making use of an adequate price tariff system, serves as an intermediary with the central electricity market to coordinate different types of prosumers under risk aversion. In particular, we consider a sequential futures and spot market where the aggregated shortage or excess of energy within the community can be traded. We aim to study the impact of the integration of hydrogen production and storage systems, together with a parallel hydrogen market, on the community operation. These interactions are modeled as a game-theoretical setting in the form of a stochastic two-stage bilevel optimization problem, which is later reformulated without approximation as a single-level mixed-integer linear problem (MILP). An extensive set of numerical experiments based on real data is performed to study the operation of the energy community under different technical and economic conditions. Results indicate that the optimal involvement in futures and spot markets is highly conditioned by the community's risk aversion and self-sufficiency levels. Moreover, the external hydrogen market has a direct effect on the community's internal price-tariff system and, depending on the market conditions, may worsen the utility of individual prosumers.
  • Publication
    Ignoring cross-correlated idiosyncratic components when extracting factors in dynamic factor models
    (2022-12-12) Fresoli, Diego Eduardo; Poncela Blanco, Maria Pilar; Ruiz Ortega, Esther; Universidad Carlos III de Madrid. Departamento de Estadística; Ministerio de Ciencia y Tecnología (España)
    In economics, Principal Components, its generalized version that takes into account heteroscedasticity, and Kalman filter and smoothing procedures are among the most popular procedures for factor extraction in the context of Dynamic Factor Models. This paper analyses the consequences for point and interval factor estimation of using these procedures when the idiosyncratic components are wrongly assumed to be cross-sectionally uncorrelated. We show that not taking into account the presence of cross-sectional dependence increases the uncertainty of point estimates of the factors. Furthermore, the Mean Square Errors computed using the usual expressions based on asymptotic approximations are underestimated and may lead to prediction intervals with extremely low coverage.
  • Publication
    Measuring efficiency of Peruvian universities: a stochastic frontier analysis
    (2023-01-10) Orosco Gavilán, Juan Carlos; Lopes Moreira Da Veiga, María Helena; Wiper, Michael Peter; Universidad Carlos III de Madrid. Departamento de Estadística
    In comparison with other regions such as Europe or the USA, there have been relatively few studies of efficiency in the higher education sector in South America. The main objective of this paper is to examine the teaching efficiency of Peruvian public universities over the period 2011-2018, using stochastic frontier analysis. Our results suggest that efficiency depends on both the operating time of the university and its scientific production. We also show that the majority of universities studied maintain their efficiency levels over time, whereas most of the young universities started off as very inefficient but have improved their efficiency over time.
  • Publication
    A Neural Network-Based Distributional Constraint Learning Methodology for Mixed-Integer Stochastic Optimization
    (2022-11-21) Alcántara Mata, Antonio; Ruiz Mora, Carlos; Universidad Carlos III de Madrid. Departamento de Estadística
    The use of machine learning methods helps to improve decision making in different fields. In particular, the idea of bridging predictions (machine learning models) and prescriptions (optimization problems) is gaining attention within the scientific community. One of the main ideas to address this trade-off is the so-called Constraint Learning (CL) methodology, where the structures of the machine learning model can be treated as a set of constraints to be embedded within the optimization problem, establishing the relationship between a direct decision variable x and a response variable y. However, most CL approaches have focused on making point predictions for a certain variable, not taking into account the statistical and external uncertainty faced in the modeling process. In this paper, we extend the CL methodology to deal with uncertainty in the response variable y. The novel Distributional Constraint Learning (DCL) methodology makes use of a piece-wise linearizable neural network-based model to estimate the parameters of the conditional distribution of y (dependent on decisions x and contextual information), which can be embedded within mixed-integer optimization problems. In particular, we formulate a stochastic optimization problem by sampling random values from the estimated distribution by using a linear set of constraints. In this sense, DCL combines both the high predictive performance of the neural network method and the possibility of generating scenarios to account for uncertainty within a tractable optimization model. The behavior of the proposed methodology is tested in a real-world problem in the context of electricity systems, where a Virtual Power Plant seeks to optimize its operation, subject to different forms of uncertainty, and with price-responsive consumers.
  • Publication
    Contagion in sequential financial markets: an experimental analysis
    (2022-10-04) Peeters, Ronald; Lopes Moreira Da Veiga, María Helena; Vorstaz, Marc; Universidad Carlos III de Madrid. Departamento de Estadística; Ministerio de Ciencia e Innovación (España)
    Within an experimental financial market, we study how information about the true dividend of an asset, which is available to some traders, is absorbed in the asset’s price when all traders have access to the prices of another asset. We consider two treatments: in one, the dividends of the two assets are independent; in the other, the dividend of the own asset depends positively on the dividend of the other asset. Since there is no aggregate uncertainty in the own market, observed prices in the other market should not affect own prices according to the rational expectations equilibrium. We find that own prices reasonably converge in both treatments towards the rational expectations equilibrium if the dividend of the own asset is high. In contrast, if the dividend of the own asset is low, we find that own prices are substantially higher (and therefore further away from the rational expectations equilibrium) when asset prices are correlated. The prior information equilibrium predicts this treatment effect. Hence, a correlated asset structure can potentially obstruct the information transmission from the informed to the uninformed traders.
  • Publication
    Prescriptive selection of machine learning hyperparameters with applications in power markets: retailer's optimal trading
    (2022-10-03) Corredera, Alberto; Ruiz Mora, Carlos; Universidad Carlos III de Madrid. Departamento de Estadística
    We present a data-driven framework for optimal scenario selection in stochastic optimization with applications in power markets. The proposed methodology relies on the existence of auxiliary information and the use of machine learning techniques to narrow the set of possible realizations (scenarios) of the variables of interest. In particular, we implement a novel validation algorithm that allows optimizing each machine learning hyperparameter to further improve the prescriptive power of the resulting set of scenarios. Supervised machine learning techniques are examined, including kNN and decision trees, and the validation process is adapted to work with time-dependent datasets. Moreover, we extend the proposed methodology to work with unsupervised techniques with promising results. We test the proposed methodology in a realistic power market application: optimal trading strategy in forward and spot markets for an electricity retailer under uncertain spot prices. Results indicate that the retailer can greatly benefit from the proposed data-driven methodology and improve its market performance. Moreover, we perform an extensive set of numerical simulations to analyze under which conditions the best machine learning hyperparameters, in terms of prescriptive performance, differ from those that provide the best predictive accuracy.
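
The center/log-range representation used in "Modelling intervals of minimum/maximum temperatures in the Iberian Peninsula" can be illustrated with a minimal Python sketch. It assumes the usual interval definitions, center = (min + max) / 2 and range = max - min; the function names and the example values are hypothetical and are not taken from the paper, and the unobserved component models themselves are not shown.

```python
import math


def center_log_range(t_min, t_max):
    """Map a (min, max) temperature interval to (center, log-range).

    Assumes t_max > t_min so that the range is positive and its log is defined.
    """
    if t_max <= t_min:
        raise ValueError("t_max must exceed t_min")
    center = (t_min + t_max) / 2.0
    log_range = math.log(t_max - t_min)
    return center, log_range


def interval_from_center_log_range(center, log_range):
    """Invert the transform: recover (min, max) from (center, log-range)."""
    half_range = math.exp(log_range) / 2.0
    return center - half_range, center + half_range


# Hypothetical monthly interval for illustration, not data from the paper.
c, lr = center_log_range(4.0, 14.0)
print(c, round(math.exp(lr), 6))  # 9.0 10.0
```

Modelling the log of the range, rather than the range itself, means that any real-valued forecast of it maps back through the exponential to a strictly positive range, so a reconstructed interval always has its minimum below its maximum.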
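
Data cloning, described in "Data cloning for a threshold asymmetric stochastic volatility model" as a general technique for computing maximum likelihood estimators and their asymptotic variances, can be illustrated on a much simpler model. The sketch below is a toy under stated assumptions and not the authors' stochastic volatility estimator: it swaps in a Poisson model with a Gamma prior, for which the posterior given the data cloned k times is available in closed form, so no MCMC is needed. As k grows, the posterior mean should approach the MLE (the sample mean) and k times the posterior variance should approach the inverse Fisher information, here the MLE divided by n. The function names and counts are hypothetical.

```python
def cloned_gamma_posterior(y, k, shape=1.0, rate=1.0):
    """Posterior of a Poisson rate when the data y are cloned k times.

    With a Gamma(shape, rate) prior, the likelihood raised to the power k
    yields a Gamma(shape + k * sum(y), rate + k * n) posterior.
    Returns the posterior mean and variance.
    """
    n, s = len(y), sum(y)
    post_shape = shape + k * s
    post_rate = rate + k * n
    return post_shape / post_rate, post_shape / post_rate ** 2


def data_cloning_estimates(y, k, shape=1.0, rate=1.0):
    """Data-cloning estimates: posterior mean (approx. MLE) and k * posterior variance."""
    mean, var = cloned_gamma_posterior(y, k, shape, rate)
    return mean, k * var


y = [3, 5, 2, 4, 6]  # hypothetical counts, not data from the paper
for k in (1, 10, 1000):
    est, avar = data_cloning_estimates(y, k)
    print(k, round(est, 4), round(avar, 4))
```

With these hypothetical counts (n = 5, sum 20) the MLE is 4 and its asymptotic variance is 0.8; the printed estimates move from 3.5 at k = 1 towards those values as k increases, because the influence of the Gamma(1, 1) prior fades with more clones.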
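
For the quantile neural network framework for two-stage stochastic optimization, two generic ingredients can be sketched: the pinball (quantile) loss commonly used to train quantile regression models, and a CVaR estimate built from a set of predicted quantiles. This is an assumption-based illustration only; it does not include the neural network or its embedding in a mixed-integer formulation, and the helper names are hypothetical. It assumes the quantile levels are equally spaced midpoints in (0, 1), so that averaging the quantiles above alpha approximates CVaR_alpha of a cost, the mean of its upper (1 - alpha) tail.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss for predicting quantile q of outcome y at level tau."""
    diff = y - q
    # Under-prediction is penalised with weight tau, over-prediction with 1 - tau.
    return max(tau * diff, (tau - 1.0) * diff)


def cvar_from_quantiles(quantiles, levels, alpha):
    """Approximate CVaR_alpha of a cost by averaging the quantiles with level above alpha.

    Assumes `levels` are equally spaced midpoints in (0, 1), so every quantile
    stands for the same probability mass.
    """
    tail = [q for q, tau in zip(quantiles, levels) if tau > alpha]
    if not tail:
        raise ValueError("no quantile levels above alpha")
    return sum(tail) / len(tail)


# Hypothetical example: a cost uniform on [0, 100], whose exact CVaR_0.8 is 90.
m = 10
levels = [(i + 0.5) / m for i in range(m)]
quantiles = [100.0 * tau for tau in levels]
print(round(cvar_from_quantiles(quantiles, levels, 0.8), 6))  # 90.0
```

The asymmetric weights in the pinball loss are what make its minimiser a quantile rather than a mean, which is why a network trained on several levels tau can represent a whole distribution instead of only its expected value.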
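
The kNN variant of scenario selection mentioned in "Prescriptive selection of machine learning hyperparameters with applications in power markets" can be sketched under the following assumptions: each historical day is a pair of auxiliary features and an observed outcome (for example, a spot price), the k days nearest in Euclidean feature distance become equally weighted scenarios, and k is the hyperparameter that a validation procedure would tune. The function name and the data are hypothetical, and the paper's time-aware validation algorithm and trading model are not reproduced.

```python
import math


def knn_scenarios(history, x_new, k):
    """Return the outcomes of the k historical days whose features are closest to x_new.

    history: list of (features, outcome) pairs; features are equal-length tuples.
    Each returned scenario gets probability 1/k.
    """
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and len(history)")
    # Rank historical days by Euclidean distance between feature vectors.
    ranked = sorted(history, key=lambda pair: math.dist(pair[0], x_new))
    scenarios = [outcome for _, outcome in ranked[:k]]
    return scenarios, [1.0 / k] * k


# Hypothetical (features, outcome) pairs, not data from the paper.
history = [((10.0, 1.0), 40.0), ((12.0, 1.2), 45.0), ((30.0, 2.0), 80.0), ((11.0, 1.1), 42.0)]
print(knn_scenarios(history, (11.0, 1.05), 2))  # ([42.0, 40.0], [0.5, 0.5])
```

In practice, features on different scales would normally be standardised before computing distances; the sketch skips that step to stay short. A small k gives sharper but noisier scenario sets, and a large k smooths them, which is the trade-off a prescriptive validation of k would weigh against downstream trading cost rather than predictive error.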