LLM-augmented data systems enable semantic querying over structured and unstructured data, but executing queries with LLM-powered operators introduces a fundamental runtime-accuracy trade-off. In this paper, we present Stretto, a new execution engine that provides end-to-end query guarantees while efficiently navigating this trade-off in a holistic manner. For this, Stretto formulates query planning as a constrained optimization problem and uses a gradient-based optimizer to jointly select operator implementations and allocate error budgets across pipelines. Moreover, to enable fine-grained execution choices, Stretto introduces a novel idea on how KV-caching can be used to realize a spectrum of different physical operators that transform a sparse design space into a dense continuum of runtime-accuracy trade-offs. Experiments show that Stretto outperforms state-of-the-art systems while consistently meeting quality guarantees.
The stretto execution engine for LLM-augmented data systems
Submitted to ArXiV, 4 February 2026
Type:
Report
Date:
2026-02-04
Department:
Data Science
Eurecom Ref:
8622
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Submitted to ArXiV, 4 February 2026 and is available at :
See also:
PERMALINK : https://www.eurecom.fr/publication/8622