Uncertainty-Aware AI: Conformal Prediction versus Reinforcement Learning for Optimal Trade Execution
DOI:
https://doi.org/10.19139/soic-2310-5070-4159
Keywords:
Optimal Execution, VWAP, Conformal Prediction, Uncertainty Quantification, Reinforcement Learning, Markov Decision Process, Market Microstructure
Abstract
Optimal trade execution is the problem of working a large parent order through a trading session at the lowest possible cost relative to a benchmark. The decisions are sequential and taken under uncertainty, so in principle the problem looks like an ideal candidate for learning-based control. In practice, a lot of the gains people report do not survive a second look by someone else. We go back to the volume-weighted average price (VWAP) execution problem and study it inside a controlled simulator that we can reproduce in full, one with stochastic volatility and AR(1) return momentum. Everything sits inside a single Markov decision process. Within that, we compare the usual schedules (TWAP, Almgren–Chriss, VWAP-tracking) with a proximal-policy-optimisation (PPO) agent and with forecast-driven policies that rest on a normalised split-conformal predictor of next-interval returns. The predictor is well calibrated. It reaches 90.2% empirical coverage at a 90% nominal level. Acting on its point forecast lowers mean slippage below VWAP-tracking, but it adds cost variance in the process. Gating the same bets by the half-width of the conformal interval behaves very differently: it produces a tunable, monotone reduction in cost variability, from 19.1 bps down to 10.0 bps, at almost no cost in the mean, and it sweeps out an explicit cost–risk frontier set by one threshold. The PPO agent, trained over three seeds, is high-variance. It does not beat the simple schedules in any reliable way. So, at least in this setting, a distribution-free conformal gate is a more reproducible and more interpretable path to uncertainty-aware execution than an off-the-shelf reinforcement-learning agent. We release the code and the simulator. On real intraday data (thirty US large-caps, 5-minute bars) the predictor’s coverage holds up almost exactly, at 90.7% empirical against the 90% level, and the variance reduction from gating is still there. The forecast edge, though, is thin, because real high-frequency returns are barely predictable to begin with.
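As a rough illustration of the two ideas named in the abstract, the sketch below calibrates a normalised split-conformal interval and then gates a directional bet by the interval's half-width. It is not the authors' released code. The abstract does not state the nonconformity score, the volatility proxy, or the exact gating rule, so the score |residual| / scale, the synthetic data, the function names `conformal_quantile` and `gated_signal`, and the thresholds are all assumptions made for this example.

```python
import numpy as np

def conformal_quantile(residuals, scales, alpha=0.1):
    """Split-conformal quantile of normalised absolute residuals.

    Assumed score: |residual| / scale. Uses the usual finite-sample
    level ceil((n + 1) * (1 - alpha)) / n, capped at 1.
    """
    scores = np.abs(residuals) / scales
    n = scores.size
    level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)
    return float(np.quantile(scores, level, method="higher"))

def gated_signal(forecast, scale, q, max_half_width):
    """Directional bet (+1 / -1) kept only where the interval is narrow.

    The interval is forecast +/- q * scale; bets whose half-width exceeds
    max_half_width are set to 0 (hypothetical gating rule).
    """
    half_width = q * scale
    return np.where(half_width <= max_half_width, np.sign(forecast), 0.0)

# Synthetic heteroscedastic returns (illustrative only, units loosely "bps").
rng = np.random.default_rng(0)
n_cal, n_test = 5000, 20000
n = n_cal + n_test
scale = rng.uniform(5.0, 20.0, size=n)      # assumed volatility proxy
forecast = rng.normal(0.0, 1.0, size=n)     # weak point forecast
returns = forecast + scale * rng.standard_normal(n)

# Calibrate on the first block, evaluate coverage on the held-out block.
q = conformal_quantile(returns[:n_cal] - forecast[:n_cal], scale[:n_cal], alpha=0.1)
half_width = q * scale[n_cal:]
coverage = np.mean(np.abs(returns[n_cal:] - forecast[n_cal:]) <= half_width)

# A tighter threshold keeps fewer bets, which is the single knob
# that trades mean edge against cost variability.
loose = gated_signal(forecast[n_cal:], scale[n_cal:], q, max_half_width=30.0)
tight = gated_signal(forecast[n_cal:], scale[n_cal:], q, max_half_width=15.0)

print(f"empirical coverage: {coverage:.3f}")
print(f"active bets (loose / tight): {np.count_nonzero(loose)} / {np.count_nonzero(tight)}")
```

Sweeping `max_half_width` over a grid and recording the mean and dispersion of execution cost at each value is one way to trace a cost–risk frontier of the kind the abstract describes; the cost model itself is outside this sketch.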
Published
2026-06-27
How to Cite
Irshad, A., & Biswas, S. (2026). Uncertainty-Aware AI: Conformal Prediction versus Reinforcement Learning for Optimal Trade Execution. Statistics, Optimization & Information Computing, 16(2), 1334–1349. https://doi.org/10.19139/soic-2310-5070-4159
License
Copyright (c) 2026 Asadullah Irshad, Shaon Biswas

This work is licensed under a Creative Commons Attribution 4.0 International License.