Uncertainty-Aware AI: Conformal Prediction versus Reinforcement Learning for Optimal Trade Execution

Authors

  • Asadullah Irshad, Centre of Excellence for Data Science, Artificial Intelligence and Modelling (DAIM), University of Hull, UK
  • Shaon Biswas, Centre of Excellence for Data Science, Artificial Intelligence and Modelling (DAIM), University of Hull, UK

DOI:

https://doi.org/10.19139/soic-2310-5070-4159

Keywords:

Optimal Execution, VWAP, Conformal Prediction, Uncertainty Quantification, Reinforcement Learning, Markov Decision Process, Market Microstructure

Abstract

Optimal trade execution is the problem of working a large parent order through a trading session at the lowest possible cost relative to a benchmark. It is sequential, and its decisions are made under uncertainty, so in principle it looks like an ideal candidate for learning-based control. In practice, many of the reported gains do not survive independent replication. We go back to the volume-weighted average price (VWAP) execution problem and study it inside a controlled, fully reproducible simulator with stochastic volatility and AR(1) return momentum. Everything sits inside a single Markov decision process. Within that, we compare the usual schedules (TWAP, Almgren–Chriss, VWAP-tracking) with a proximal-policy-optimisation (PPO) agent and with forecast-driven policies that rest on a normalised split-conformal predictor of next-interval returns. The predictor is well calibrated: it reaches 90.2% empirical coverage at a 90% nominal level. Acting on its point forecast lowers mean slippage below VWAP-tracking, but it adds cost variance in the process. Gating the same bets by the half-width of the conformal interval behaves very differently: it produces a tunable, monotone reduction in cost variability, from 19.1 bps down to 10.0 bps, at almost no cost in the mean, and it sweeps out an explicit cost–risk frontier set by one threshold. The PPO agent, trained over three seeds, is high-variance and does not beat the simple schedules in any reliable way. So, at least in this setting, a distribution-free conformal gate is a more reproducible and more interpretable path to uncertainty-aware execution than an off-the-shelf reinforcement-learning agent. We release the code and the simulator. On real intraday data (thirty US large-caps, 5-minute bars) the predictor’s coverage holds up almost exactly, at 90.7% empirical against the 90% level, and the variance reduction from gating is still there. The forecast edge, though, is thin, because real high-frequency returns are barely predictable to begin with.
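
The mechanism the abstract describes, a normalised split-conformal interval whose half-width gates a forecast-driven bet, can be illustrated with a minimal sketch. This is not the authors' released code: the function names, the synthetic data, the per-interval `scale` (standing in for whatever difficulty estimate normalises the residuals) and the median threshold are all assumptions made for illustration.

```python
import numpy as np

def conformal_quantile(y_cal, pred_cal, scale_cal, alpha=0.1):
    """Split-conformal quantile of normalised absolute residuals.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)); returns inf
    when the calibration set is too small for the requested level.
    """
    scores = np.abs(y_cal - pred_cal) / scale_cal
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")
    return float(np.sort(scores)[k - 1])

def interval_half_width(scale, q_hat):
    """Half-width of the interval pred +/- q_hat * scale."""
    return q_hat * scale

def gated_signal(pred, half_width, threshold):
    """Act on the sign of the forecast only when the interval is narrow."""
    return np.where(half_width <= threshold, np.sign(pred), 0.0)

# Synthetic, hypothetical data: heteroscedastic noise around a point forecast.
rng = np.random.default_rng(0)
n = 4000
scale = rng.uniform(0.5, 2.0, size=n)    # assumed per-interval difficulty estimate
pred = rng.normal(0.0, 0.2, size=n)      # assumed point forecast of the next return
y = pred + scale * rng.normal(size=n)    # realised next-interval return

cal, test = slice(0, 2000), slice(2000, n)
q_hat = conformal_quantile(y[cal], pred[cal], scale[cal], alpha=0.1)
half = interval_half_width(scale[test], q_hat)

# Empirical coverage on held-out data; should sit near the 90% nominal level.
coverage = float(np.mean(np.abs(y[test] - pred[test]) <= half))

# Gate: trade only where the half-width is at or below the chosen threshold.
signal = gated_signal(pred[test], half, threshold=float(np.median(half)))
```

Lowering `threshold` makes the gate stricter, so fewer intervals carry a bet. In this sketch that single parameter plays the role of the one threshold that the abstract says traces the cost–risk frontier.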

Published

2026-06-27

How to Cite

Irshad, A., & Biswas, S. (2026). Uncertainty-Aware AI: Conformal Prediction versus Reinforcement Learning for Optimal Trade Execution. Statistics, Optimization & Information Computing, 16(2), 1334–1349. https://doi.org/10.19139/soic-2310-5070-4159

Section

Research Articles