Microarchitectural Attacks on the Stack Engine

Today’s computing performance is the result of many accumulated optimizations for frequently occurring special cases. While it is possible to gain performance by adding more raw resources to the hardware, very careful software design is required to actually use these resources to improve latency or throughput. In some cases, the CPU itself rewrites the software in the frontend, the part of the CPU that loads instructions and then sends them to the backend for processing.

Our paper uncovers and abuses a stateful frontend optimization for stack operations as a side channel to leak the structure of a prior execution.

Performance Problem and Solution

x86 uses a traditional stack, with push to insert a single element and pop to remove the most recently inserted one. This closely resembles machine models from theoretical computer science. However, these instructions cause a problem in out-of-order CPUs: every push both reads and updates the stack pointer, so when two or more push instructions occur consecutively, each later instruction has to wait for the prior one to complete. This breaks instruction-level parallelism and leaves large parts of the CPU core's resources unused.

To fix this problem, the frontend of practically every x86 CPU since the mid-2000s rewrites these stack operations to keep the stack pointer immutable and apply displacements (offsets) relative to the stack pointer to load and store operations. The current offset is tracked in the frontend and updated as instructions are passed to the backend for execution.
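
The rewriting described above can be illustrated with a small software model. The following is a minimal sketch, not the actual hardware behavior: the function name rewrite_stack_ops, the 8-byte slot size, and the textual store/load notation are assumptions made for illustration only.

```python
def rewrite_stack_ops(instructions, slot_size=8):
    """Toy frontend pass: turn push/pop into RSP-relative stores/loads.

    Returns the rewritten operations and the offset still pending in the
    frontend. The textual output format is purely illustrative.
    """
    offset = 0  # displacement tracked in the frontend; RSP itself is untouched
    rewritten = []
    for instruction in instructions:
        opcode, *operands = instruction.split()
        if opcode == "push":
            offset -= slot_size  # the stack grows downwards
            rewritten.append(f"store [rsp{offset:+d}], {operands[0]}")
        elif opcode == "pop":
            rewritten.append(f"load {operands[0]}, [rsp{offset:+d}]")
            offset += slot_size
        else:
            rewritten.append(instruction)  # everything else passes through
    return rewritten, offset


ops, pending = rewrite_stack_ops(["push rax", "push rbx", "pop rcx"])
for op in ops:
    print(op)
print("pending offset:", pending)
```

In this model, the two stores address memory relative to the same unchanged stack pointer, so neither has to wait for the other to update it. The pending offset that remains at the end is the state the frontend has to keep track of.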

The original idea for this was formulated by Bekerman et al. over two decades ago. The optimization itself is relatively safe, but newer security features do not take it into account in their threat models.

Attacking the Stack Engine

Because the tracked offset is limited in size and not all operations can be performed in the frontend, the tracked offset must be written to the backend under certain conditions, which sends one (or sometimes two) additional operations to the backend. With sufficiently precise measurement techniques, these additional operations can be observed to deduce information about the stack usage of previously executed code.

This breaks the assumption that memory isolation and register zeroing, as used with memory protection keys, suffice to isolate domains. We show that it is possible to extract information across protection-key-isolated domains in the case of a JSON library that uses a recursive descent parsing strategy. Because the depth reached when parsing depends on the input, we can observe the structure of otherwise inaccessible inputs through the stack engine.
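
To make the mechanism more concrete, the following toy model simulates a limited offset tracker and a victim that walks a JSON document recursively. It is a sketch under loudly stated assumptions, not a description of real hardware or of the attack in the paper: the 8-bit signed offset, the 8-byte slot size, the rule that an explicit read of the stack pointer forces a write-back, and all names (ToyStackEngine, walk, probe_after_parse) are hypothetical.

```python
import json


class ToyStackEngine:
    """Toy model of a frontend offset tracker; all parameters are assumptions."""

    def __init__(self, offset_bits=8, slot_size=8):
        self.limit = 1 << (offset_bits - 1)  # assumed signed range [-limit, limit)
        self.slot_size = slot_size
        self.offset = 0    # displacement not yet applied to the real RSP
        self.sync_ops = 0  # extra operations sent to the backend

    def sync(self):
        """Write a nonzero pending offset back to the backend."""
        if self.offset != 0:
            self.sync_ops += 1
            self.offset = 0

    def _adjust(self, delta):
        if not -self.limit <= self.offset + delta < self.limit:
            self.sync()  # the tracked offset would overflow
        self.offset += delta

    def push(self):
        self._adjust(-self.slot_size)

    def pop(self):
        self._adjust(self.slot_size)


def walk(value, engine):
    """Recursive descent over a JSON value; each call/return is a push/pop."""
    engine.push()
    if isinstance(value, dict):
        for child in value.values():
            walk(child, engine)
    elif isinstance(value, list):
        for child in value:
            walk(child, engine)
    engine.pop()


def probe_after_parse(document):
    """Return the extra operations an explicit RSP read costs after the victim."""
    engine = ToyStackEngine()
    walk(json.loads(document), engine)  # victim runs first
    before = engine.sync_ops
    engine.sync()  # assumption: the attacker's explicit RSP read forces a write-back
    return engine.sync_ops - before


shallow = '{"a": [1, 2], "b": {"c": 3}}'
deep = "[" * 40 + "]" * 40
print(probe_after_parse(shallow), probe_after_parse(deep))
```

In this model, the shallow document never overflows the tracked offset, so the pending offset returns to zero and the probe costs no extra operation. The deeply nested document forces write-backs in the middle of the recursion, leaving a nonzero offset behind that the probe observes as one additional operation. The victim's data is never read directly; only the residual frontend state differs.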

We show that this observed structure suffices to identify 5 out of 120 distinct patients in the FHIR dataset with a high success rate. Any other code that makes data-dependent function calls, especially recursive ones, is also affected, for example protobuf, as we show in the paper.

Mitigations

Disabling the stack engine altogether eliminates the leakage, at the cost of roughly 5% more execution time and additional operations. Context switching resets the stack engine and is therefore also safe. For in-process isolation techniques (like Intel MPK), writing a value to RSP directly or running xor rsp, 0 resets the stack engine and clears its internal state. As a general rule, data-invariant control flow and other constant-time programming techniques also provide a good defense against this and other attacks.

Resources

A paper about our reverse engineering of the stack engine and the attacks we developed will be presented at MICRO 2025. We are committed to reproducible research and provide a ready-to-run container image and the source code for our experiments on Zenodo.

FAQ

Is my machine affected?

We’ve built a PoC tuned for AMD Zen 5 machines, where we are confident that all models have this abusable hardware feature. However, recent versions of the AGESA (contained in the UEFI BIOS images) disable the add/sub support on Zen 5 by default on boot. This makes the attack infeasible for virtually all real-world applications. We also found that Intel has used a similar mechanism on its performance cores since Alder Lake (12th gen). Because of the performance impact, we believe that this or a similar feature will continue to exist in future x86 CPUs.

Is my security affected by this?

Probably not (yet). As in-process isolation techniques see increased adoption due to their performance benefits, attacks on microarchitectural features like the stack engine might become more frequent. Until then, shared resources (e.g. in the cloud) are affected by a variety of other, more easily exploitable security issues and side channels.