Pathfinder: Constructing Cycle-accurate Taint Graphs for Analyzing Information Flow Traces

TL;DR

Pathfinder is a debugging tool that reduces the manual effort of verification engineers when analyzing Information Flow Tracking (IFT) violation traces. Pathfinder constructs a clock-cycle accurate Taint Graph, highlighting only the relevant information of where, when, and why information was propagated in a given violation trace. Pathfinder is open-source and accepts taint tracking or self-composition (miter) traces obtained from formal verification or simulation.

The paper will be presented at the ICCAD 2025 conference. You can find the source code here.

Motivation

Security vulnerabilities, such as information leakage or integrity violations, in hardware designs can be detected with hardware Information Flow Tracking (IFT). For example, 𝜇CFI and MileSan use the taint tracking logic CellIFT to detect microarchitectural side channels.

IFT methods output Proof of Concept (PoC) violation traces that show how a vulnerability can be triggered. Understanding the root cause of the vulnerability requires manual analysis of the violation trace (typically a waveform), which often consists of thousands of Hardware Description Language (HDL) signals and can span over many clock cycles. Pathfinder alleviates verification engineers and hardware security researchers from this manual burden by generating a graphical Taint Graph (TG) that extracts the path that information took between a given information source and sink signal, or by finding all sources and sinks. It further augments the graph with valuable information, showing where, when, and why information was propagating.

Unification of taint tracking and self-composition

Taint tracking tracks information flow by instrumenting a hardware design with dedicated taint logic that calculates how information propagates through the design. Self-composition (also called miter circuits) tracks information flows by comparing the differences between two copies of the same design. In our paper, we prove that self-composition is at least as precise as taint tracking, meaning every flow existing in a self-composed design also exists as a taint flow if the same design is instrumented with taint logic. This allows us to translate differences between the two copies of a self-composed design into taint, and use the same graphical representation (Taint Graphs) for violation traces obtained from either of these methods.

Temporal Information Flow Graph (TIFG)

We introduce the Temporal Information Flow Graph (TIFG), which is generated by our Yosys pass from an HDL design. As shown in the figure below, it creates a graph with HDL signals as nodes, and directed edges between them that indicate local potential information flows. For example, there exists an information flow from signal ‘key’ to signal ‘SHIFTreg’ in all cases where signal ‘Tj_Trig’ (a hardware Trojan trigger) is true. To enable the cycle-accurate generation of Taint Graphs, the pass notes the clock cycle delay onto the edges. To further ease debugging, it also marks potential control flow edges.

The TIFG can be used on its own or serve as preparation for generating Taint Graphs.

Taint Graph (TG)

Taint Graphs visualize the clock cycle-accurate information flow existing in a concrete simulation or formal verification counterexample trace. A TG can either find all paths originating in an information source signal or leading to an information sink signal. The latter is helpful if multiple potential information sources (for example, microarchitectural buffers) exist, and the goal is to find out which of them leaked information to a destination. Finally, if an information source and sink are given, e.g., when verifying IFT properties, Pathfinder can identify the path taken by the information through the hardware design. For example, it can reveal through which unexpected paths (modules, buffers, …) information propagated before reaching the sink. So far, the TG represents where information was propagating to, and when. When analyzing violation traces, the most pressing question is why it was propagating. To answer this question, we introduce flow control signals.

Flow control signals

Taint propagation depends on the signal values of untainted signals. Untainted signals that directly interfere with the taint path can sometimes block an information flow. In the example below, which visualizes a concrete trace for the TIFG shown earlier, the value of signal ‘TSC.Tj_Trig’ controls the existence of the information flow from signal ‘TSC.key’ to ‘TSC.SHIFTReg’.

We refer to these signals as flow control signals, and the conditions they must satisfy to block information flows are known as flow control conditions. Pathfinder optionally visualizes them in the Taint Graphs.

Case studies

The figure below shows a formal verification violation trace obtained by 𝜇CFI.

In this trace, information propagates from an instruction’s operand stored in signal u_ex.u_csr.decode_op1 (containing data of a RISC-V addi instruction) into the Control and Status Register (CSR) write data (u_ex.u_csr.wr_data). This happens inadvertently, because the higher bits in the flow control signal u_ex.u_csr.decode_ir (which are the immediate value bits of the addi instruction) coincidentally match the machine trap-vector base-address register, and some other (not shown) flow control conditions are true as well. In a correctly designed system, these conditions should be false.

We further used Pathfinder to identify the leaking microarchitectural buffers detected by MileSan and AutoCC.

Take a look at the paper for more case studies, e.g., analyzing a Spectre vulnerability.

Frequently Asked Questions

What kinds of violation traces does Pathfinder accept?

Pathfinder can extract a Taint Graph from signal traces, obtained from formal verification, simulation, or any other method that generates a Value Change Dump (VCD) file.

Which information flow tracking methods can Pathfinder analyze traces from?

Pathfinder supports traces of any taint tracking logic (e.g., CellIFT), of arbitrary precision. It detects taint signals by their (customizable) signal name suffix and associates them with corresponding design signals. Further, it supports self-composition (miter) traces by translating differences between the same signal in the two design copies into taint. This is sound according to our unification proof in the paper.

Can I use the Temporal Information Flow Graph (TIFG) for other purposes?

Definitely, the TIFG is provided as a dot and a CSV file. It can be visually examined or processed for any further purposes.

Can Pathfinder detect vulnerabilities?

No, Pathfinder does not aim to find vulnerabilities. It is a graphical aid for analyzing violations found by IFT methods.

How scalable is Pathfinder?

TIFG and TG construction for large designs (e.g., large CPU cores) takes up to a few minutes each. Still, even longer runtimes would be justified by large reductions in manual signal inspection burden.

Is Pathfinder precise?

Pathfinder extracts IFT traces obtained by some IFT method. Thus, it is as precise as this method.

What are flow control signals?

These signals interfere with the taint path and are thus often able to control the existence of an information flow. Their design signal values provide valuable insights into why information was propagating.

Is this tool already usable in practice?

Yes, we have used it in various projects (for example, 𝜇CFI and MileSan). You can find the source code here.

Acknowledgements

This work was supported by the Swiss State Secretariat for Education, Research and Innovation under contract number MB22.00057 (ERC-StG PROMISE).