HartBreaker is the first general-purpose hardware fuzzer that tests the communication channels of multi-hart RISC-V CPUs, including shared memory and inter-processor interrupts. To make this possible, HartBreaker addresses a fundamental obstacle that has so far kept hardware fuzzing confined to single-core designs: the execution of a multi-hart program is inherently non-deterministic, so the deterministic golden-model comparison on which existing fuzzers rely does not work. HartBreaker overcomes this through a new technique we call determinism anchors, which allow non-deterministic behavior to occur freely at the instruction level while guaranteeing that the program as a whole still completes in a predictable state on correct hardware. Applied to five well-tested open-source RISC-V CPUs (Rocket, BOOM, Toooba, NaxRiscv, and XiangShan), HartBreaker discovered five previously unknown concurrency bugs.
HartBreaker is fully open-source and available on GitHub, with a user-friendly setup and extensive documentation. It is also permanently archived on Zenodo.
What is multi-hart fuzzing, and why is it hard?
A hart (hardware thread) is an independent execution unit in a RISC-V CPU. Modern processors contain several harts that communicate through shared memory and inter-processor interrupts (IPIs). Implementing these communication channels in hardware correctly is notoriously hard, because the underlying memory consistency model (RVWMO in RISC-V) deliberately allows many subtle reorderings of memory operations, and because interrupts arrive at times that the receiving hart cannot predict.
Hardware fuzzing has emerged as one of the most effective techniques for finding hardware bugs in single-core CPUs. The idea is simple: generate random programs, run them on the design under test and on a reference model in parallel, and flag any divergence as a candidate bug. This works because the program, the reference model, and the correctly implemented hardware all produce the same final state. Multi-hart execution breaks this assumption. The same program, run twice on the same correct CPU, can legitimately produce different intermediate states depending on how memory operations interleave or when an interrupt happens to fire. There is no single “expected” trace to compare against. Extending existing CPU fuzzers to this setting is therefore non-trivial, and until now it had not been done.
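The single-core workflow described above can be sketched in a few lines. This is a toy illustration only: the two-instruction "ISA", the `run` function that stands in for both the reference model and the design under test, and the injected bug are all hypothetical and are not taken from HartBreaker or any existing fuzzer.

```python
import random

MASK = 0xFFFFFFFF

def run(program, add_bug=False):
    """Toy executor for (opcode, immediate) pairs; returns the final accumulator.

    With add_bug=True it mimics a faulty design that corrupts bit 0
    whenever it executes `add 7` (a made-up bug for illustration).
    """
    acc = 0
    for op, imm in program:
        if op == "add":
            acc = (acc + imm) & MASK
            if add_bug and imm == 7:
                acc ^= 1
        elif op == "xor":
            acc ^= imm
    return acc

def random_program(rng, length=8):
    return [(rng.choice(["add", "xor"]), rng.randrange(16)) for _ in range(length)]

def fuzz(dut, iterations, seed=0):
    """Run random programs on the DUT and the reference; collect divergences."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(iterations):
        program = random_program(rng)
        if dut(program) != run(program):  # single expected final state
            mismatches.append(program)
    return mismatches

candidates = fuzz(lambda p: run(p, add_bug=True), iterations=500)
print(f"{len(candidates)} candidate bugs found")
```

The comparison `dut(program) != run(program)` only makes sense because there is exactly one legal final state per program, which is precisely the assumption that multi-hart execution breaks.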
Limitations of current multi-hart testing
Two main approaches have been adopted to verify the correctness of multi-hart CPUs, but both fall short:
- Litmus tests: Litmus tests are small concurrent programs that target specific scenarios in a memory consistency model. They are precise but small, usually containing only 2-4 memory operations, and their instruction diversity is extremely narrow. Compared to programs generated by state-of-the-art single-hart fuzzers, litmus tests cover only a small fraction of the available instructions: no exceptions, floating-point operations, or privilege switches, and only a thin slice of ALU and branch instructions. As a result, litmus tests rarely reach the complex microarchitectural conditions under which real bugs hide.
- Formal methods: Formal approaches exhaustively explore microarchitectural states, but they require a manually constructed abstract model of the hardware, which is difficult to derive for complex CPUs with caches and out-of-order execution.
A fuzzer that combines the instruction diversity of modern hardware fuzzers with a verification strategy that can handle non-determinism has been missing: this is the gap HartBreaker fills.
Sources of Non-Determinism
To handle non-determinism, we first need to understand where it comes from. We identify two root causes:
- Data-flow non-determinism: A load returns a value that depends on the interleaving of concurrent stores from other harts. The immediate control flow is unaffected, but the data in a register becomes unpredictable.
- Control-flow non-determinism: An asynchronous event diverts execution to a trap handler at an instant that the receiving hart cannot anticipate. The program counter, rather than just a data value, becomes unpredictable.
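Data-flow non-determinism can be made concrete with a small enumeration. The sketch below assumes a simplified model in which each hart executes in program order and only the interleaving between harts varies; RVWMO permits additional reorderings, so real hardware may legally show more outcomes than this model does. The two-hart program and all names are hypothetical.

```python
from itertools import combinations

# Hart 0 writes x then y; hart 1 reads y then x. Memory starts at zero.
HART0 = [("store", "x", 1), ("store", "y", 1)]
HART1 = [("load", "y", "r0"), ("load", "x", "r1")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slot_set = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slot_set else next(ib) for i in range(n)]

def execute(schedule):
    """Run one interleaving and return the values loaded into (r0, r1)."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "store":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return regs["r0"], regs["r1"]

outcomes = {execute(s) for s in interleavings(HART0, HART1)}
print(sorted(outcomes))  # [(0, 0), (0, 1), (1, 1)]
```

Even in this restricted model, the same program yields three different register states, none of which indicates a bug.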
Modeling these effects exposes why fuzzing multi-hart CPUs is hard. Each non-deterministic instruction can lead to several possible successor states, so the number of execution paths through a program grows exponentially with the number of such instructions. A generator that tried to anticipate every legal outcome ahead of time would therefore not scale. Worse, an RTL simulator follows only one path per run, so all the paths the generator reasoned about but the simulator did not take become dead code. Avoiding this path-explosion problem is what makes high-throughput multi-hart fuzzing possible.
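A quick back-of-the-envelope calculation shows how fast this grows. The sketch assumes, purely for illustration, that every non-deterministic instruction has exactly two possible outcomes; real instructions may have more.

```python
from math import prod

def path_count(outcomes_per_instruction):
    """Number of distinct execution paths, given the number of possible
    outcomes of each non-deterministic instruction in the program."""
    return prod(outcomes_per_instruction)

for n in (10, 20, 40):
    print(n, path_count([2] * n))
# 10 1024
# 20 1048576
# 40 1099511627776
```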
Determinism Anchors
The central technique behind HartBreaker is the determinism anchor: a short, carefully constructed sequence of instructions that lets non-determinism happen freely within a bounded region of a program, but guarantees that execution returns to a known state before that region ends. This solves the path-explosion problem: each non-deterministic section is observed and then steered back to a known state, so divergence never accumulates and the generator never has to account for more than one path forward.
- Control-flow anchors confine the effect of asynchronous events such as IPIs to a landing zone. The receiving hart enters the landing zone, signals to the sending hart that it is ready, and then loops over instructions that recompute the same architectural state on every iteration. Whenever the interrupt happens to arrive, the hart leaves the landing zone in the same architectural state regardless of when the interrupt fired.
- Data-flow anchors prevent the unpredictable values produced by concurrent memory operations from contaminating control-flow decisions, while preserving their visibility to the verifier. Registers that have been written by a non-deterministic load are flagged and forbidden from being used as source operands until a single deterministic instruction resets them (e.g., the register is used as a destination register). We preserve the syntactic dependency chain between memory operations by allowing the reset to result from a zeroing operation (an XOR of the register with itself, for example). The actual data flow remains non-deterministic and observable through the simulator’s commit log.
- Synchronization anchors bound the size of the region a verifier must analyze at once. They periodically synchronize all harts and fence all preceding memory operations, ensuring that any reorderings observed by the consistency solver are confined to a single section of the program.
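The register bookkeeping behind data-flow anchors can be sketched as a small taint tracker that a program generator consults before emitting each instruction. This is one reading of the rule described above, not HartBreaker's actual implementation: the `TaintTracker` class, the `nd_load` pseudo-opcode, and the decision to let a self-XOR write a clean destination are all assumptions made for illustration.

```python
class TaintTracker:
    """Tracks which registers hold values from non-deterministic loads."""

    def __init__(self):
        self.tainted = set()

    def emit(self, opcode, dst, srcs):
        """Return True and update state if the instruction may be emitted.

        Assumed rules:
        - a tainted register may not be a source operand, except in the
          zeroing idiom `xor dst, r, r`, whose result is always zero;
        - `nd_load` (a hypothetical load from shared memory) taints dst;
        - any other accepted instruction leaves dst deterministic.
        """
        zeroing = opcode == "xor" and len(srcs) == 2 and srcs[0] == srcs[1]
        if not zeroing and any(s in self.tainted for s in srcs):
            return False
        if opcode == "nd_load":
            self.tainted.add(dst)
        else:
            self.tainted.discard(dst)
        return True

t = TaintTracker()
print(t.emit("nd_load", "a0", ["s0"]))      # True: a0 is now tainted
print(t.emit("beq", None, ["a0", "a1"]))    # False: branch on a tainted value
print(t.emit("xor", "t0", ["a0", "a0"]))    # True: zeroing keeps the dependency
print(t.emit("add", "s1", ["s1", "t0"]))    # True: t0 is deterministic (zero)
print(t.emit("addi", "a0", ["zero"]))       # True: overwriting a0 resets it
print(t.emit("beq", None, ["a0", "a1"]))    # True: a0 is clean again
```

In this reading, the self-XOR still names the loaded register as an operand, so a later instruction that uses `t0` depends syntactically on the load even though the value it consumes is always zero.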

Verifying Non-Deterministic Executions
Control-flow anchors make bug detection straightforward: a violation of the architectural invariance propagates into the control flow and shows up as a timeout. Data-flow anchors block that mechanism by design, so we need a different way to detect memory-consistency violations. Our insight is that even though we cannot statically predict what a HartBreaker test program will produce, we can translate any observed execution into an equivalent litmus test, and feed that litmus test to an existing memory consistency solver such as Dartagnan. We extract three pieces of information for each test case:
- The store values, which are deterministic and can be obtained from an instruction set simulator.
- The syntactic dependencies between memory operations, which are statically defined at generation time and can be extracted from the generated assembly.
- The load return values that were actually observed, which we collect from the simulator’s commit log.
These three ingredients are enough to construct a litmus test that captures the exact ordering relations of the original execution. The solver then checks whether this outcome is permitted under RVWMO. Synchronization anchors keep the verification complexity bounded.
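As a rough sketch of how the three ingredients fit together, the function below assembles them into a litmus-like record. The data layout, the `build_litmus` name, and the pseudo-assembly lines are hypothetical; only the final condition is loosely modeled on herd-style `exists` clauses, and the exact input format Dartagnan expects is not reproduced here.

```python
def build_litmus(ops, store_values, observed_loads, deps):
    """Combine the three extracted ingredients into one litmus-like record.

    ops:            op_id -> (hart, kind, addr, reg), in program order
    store_values:   op_id -> value written (from an instruction set simulator)
    observed_loads: op_id -> value returned (from the commit log)
    deps:           (src_op, dst_op, kind) triples (from the generated assembly)
    """
    threads = {}
    clauses = []
    for op_id, (hart, kind, addr, reg) in ops.items():
        if kind == "store":
            line = f"store {addr} <- {store_values[op_id]}"
        else:
            line = f"load {reg} <- {addr}"
            clauses.append(f"{hart}:{reg}={observed_loads[op_id]}")
        threads.setdefault(hart, []).append(line)
    return {
        "threads": threads,
        "deps": list(deps),
        "condition": "exists (" + " /\\ ".join(clauses) + ")",
    }

# Hypothetical two-hart execution: hart 0 stores, hart 1 loads.
ops = {
    "s0": (0, "store", "x", None),
    "s1": (0, "store", "y", None),
    "l0": (1, "load", "y", "a0"),
    "l1": (1, "load", "x", "a1"),
}
litmus = build_litmus(
    ops,
    store_values={"s0": 1, "s1": 1},
    observed_loads={"l0": 1, "l1": 0},
    deps=[("l0", "l1", "addr")],
)
print(litmus["condition"])  # exists (1:a0=1 /\ 1:a1=0)
```

The sketch stops at constructing the question. Deciding whether the condition is reachable under RVWMO is the solver's job and is not attempted here.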
Discovered Bugs
We evaluated HartBreaker on five open-source RISC-V CPUs: Rocket, BOOM, Toooba, NaxRiscv, and XiangShan. It discovered five previously unknown concurrency bugs:
- Illegal load-load reordering in BOOM v4 (B1)
- Illegal load-load reordering in NaxRiscv (N1)
- CLINT access-size restriction in NaxRiscv (N2)
- IPI evaluation timing bug in Toooba (T1)
- Out-of-order MIP read in XiangShan (X1)
Several of these bugs require highly specific microarchitectural conditions to surface. B1 and N1 each depend on a precise interplay of stalled address computation, replay scheduling, and coherency probe timing across multiple harts. These conditions are exactly what existing tooling, either litmus tests with their narrow instruction repertoire, or single-hart fuzzers with their deterministic verification, cannot exercise.
Paper and code
You can read more about HartBreaker in our ISCA’26 paper. HartBreaker is open-source and available on GitHub. We hope it proves useful for anyone designing, testing, or studying multi-core RISC-V CPUs.
Frequently Asked Questions
Why not just generate random instructions one after the other and check that no violation was triggered? Because there’s no simple “violation” to check for. The number of legal outcomes grows exponentially with the number of non-deterministic instructions, and deciding whether any single observed trace is even legal is NP-complete.
Why translate executions into litmus tests rather than build a custom solver? Translating to litmus tests lets us reuse mature solvers like Dartagnan, which scales well. This works because validating that one observed trace is legal under RVWMO is far easier than enumerating all the reachable states.
Can a test case that triggered a bug be reduced to a minimal example? Not yet. The bugs depend on precise microarchitectural timing, so editing the instruction stream usually makes them vanish.
Can I contribute to HartBreaker? Yes, please do: open a pull request on GitHub and we will take a look.
Acknowledgements
This work was supported by the Swiss State Secretariat for Education, Research and Innovation under contract number MB22.00057 (ERC-StG PROMISE). We also want to thank Hernán Ponce de León for his valuable insights on Dartagnan during the development of HartBreaker.
