HybriDIFT: Scalable Memory-Aware Dynamic Information Flow Tracking for Hardware

Dynamic information flow tracking (DIFT) is gaining traction, but instrumenting complex designs, such as advanced CPUs, still faces performance and scalability challenges.

We make two key observations:

Key observation 1: Instrumenting deep memories with state-of-the-art DIFT mechanisms is expensive.
Key observation 2: Precise implicit-flow DIFT in memories appears to have no practical use so far.

State-of-the-art DIFT mechanisms operate at relatively low abstraction levels (GLIFT: gate level, RTLIFT: low-level Verilog constructs, CellIFT: macrocell level). Consequently, they are unaware of memories, which are a module-level construct, and they go to great lengths to instrument the implicit information flows within memories precisely.

To address this issue, we propose HybriDIFT. This new hybrid mechanism instruments memory modules more efficiently by tracking explicit flows precisely and implicit flows imprecisely. For all other modules, HybriDIFT relies on lower-level mechanisms, typically CellIFT. Under Verilator, HybriDIFT speeds up builds by 1.06× to 3.5× and simulation by 2.6× to 5.1×.

HybriDIFT is open source, and the paper has been accepted at ICCAD 2024.

Frequently Asked Questions

How does HybriDIFT instrument a given memory module?

HybriDIFT tracks explicit flows through the memory precisely and replaces the precise implicit flow-tracking logic with a cost-effective approximation.
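
To make the distinction concrete, here is a minimal behavioral sketch in Python. It is not the HybriDIFT implementation: the class name TaintedMemory, its methods, and the rule "any tainted address bit taints the whole read word" are assumptions chosen for illustration, and the approximation specified in the paper may differ. The sketch only covers writes whose address and write enable are untainted.

```python
class TaintedMemory:
    """Hypothetical behavioral model of a memory with a shadow taint copy."""

    def __init__(self, depth: int, width: int) -> None:
        self.mask = (1 << width) - 1
        self.data = [0] * depth
        # Shadow memory: one taint bit per data bit (explicit flows).
        self.taint = [0] * depth

    def write(self, addr: int, value: int, value_taint: int) -> None:
        # Explicit flow: the taint of the written data is stored alongside it.
        # Assumption: the address and write enable are untainted here.
        self.data[addr] = value & self.mask
        self.taint[addr] = value_taint & self.mask

    def read(self, addr: int, addr_taint: int) -> tuple[int, int]:
        # Explicit flow: precise, read from the shadow memory.
        explicit = self.taint[addr]
        # Implicit flow: assumed coarse rule, a tainted address taints
        # every output bit. This over-approximates but never under-taints.
        implicit = self.mask if addr_taint else 0
        return self.data[addr], explicit | implicit


mem = TaintedMemory(depth=4, width=4)
mem.write(1, 0b0111, 0b0100)
print(mem.read(1, addr_taint=0))     # explicit taint only
print(mem.read(1, addr_taint=0b01))  # tainted address: whole word tainted
```

In this model, a read touches one data word and one shadow word regardless of how many address bits are tainted, so the work per access does not grow with the memory depth.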

How does HybriDIFT detect memory modules?

HybriDIFT can rely either on automatic detection or on manual annotations. The automatic detection prototype that we provide might require improvements when applied to new designs. In most use cases, it is simple to annotate the memories manually, specifying which memory signal plays which role in the memory protocol. We provide examples of both options in the open-source repository.

Does HybriDIFT have false negatives?

In principle, HybriDIFT cannot have false negatives if implemented as specified in the paper.

What is the cost of the precise memory instrumentation?

For CellIFT (and GLIFT, which is far more expensive), the cost depends on the depth of the memories. For HybriDIFT, the growth of the cost with the memory depth is indistinguishable from that of non-instrumented designs. Structurally, regarding explicit flows, all IFT mechanisms, including HybriDIFT, will replicate the memory module.

HybriDIFT approximates implicit information flows in memories. What use cases does this approximation exclude?

The approximation made by HybriDIFT mainly rules out precise taint decisions that would structurally require accessing the whole memory, or accessing it twice in the same cycle.

For example, assume that some memory module contains the words 0b0101, 0b0111, 0b0101 and 0b0100 at addresses 0, 1, 2 and 3 respectively, that it is being read at address 0, and that exactly the two least significant bits of the address are tainted. Precisely calculating the output taint would require reading all the memory words accessible modulo taint, i.e., these 4 words, and then comparing them to see which bits are not identical in all these words (here, the two least significant bits).

As of today, we do not know of any real use case that would require this level of precision.
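
The read example above can be reproduced with a short Python sketch. The function names are hypothetical and not taken from the HybriDIFT repository, and the sketch assumes the stored words themselves are untainted, so that only the implicit flow from the address is computed. It illustrates why the precise calculation is costly: with k tainted address bits, 2^k words must be read and compared.

```python
from functools import reduce


def reachable_addresses(addr: int, addr_taint: int) -> list[int]:
    """Addresses that match `addr` on every untainted address bit."""
    base = addr & ~addr_taint
    found = []
    sub = addr_taint
    while True:
        # Enumerate every subset of the tainted address bits.
        found.append(base | sub)
        if sub == 0:
            break
        sub = (sub - 1) & addr_taint
    return sorted(found)


def precise_read_taint(mem: list[int], addr: int, addr_taint: int, width: int) -> int:
    """Output bits that differ between at least two reachable words."""
    words = [mem[a] for a in reachable_addresses(addr, addr_taint)]
    ones_everywhere = reduce(lambda x, y: x & y, words)
    ones_somewhere = reduce(lambda x, y: x | y, words)
    # A bit is tainted when it is 1 in some reachable word and 0 in another.
    return (ones_everywhere ^ ones_somewhere) & ((1 << width) - 1)


memory = [0b0101, 0b0111, 0b0101, 0b0100]
taint = precise_read_taint(memory, addr=0, addr_taint=0b11, width=4)
print(f"{taint:04b}")  # 0011
```

A coarse rule that taints every output bit whenever the address is tainted would report 0b1111 here: a superset of the precise result, obtained without reading the three other words.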

How do I use HybriDIFT with my design?

There are multiple ways; the fundamental idea is described in the paper. The experimental hybridift.py script in the open-source repository can be used with automatic memory detection or with manual memory annotations to perform the module-level instrumentation. We suggest looking at the Dockerfile in the open-source repository for some examples.

Acknowledgments

This work was partly supported by a Microsoft Swiss JRC grant and by the Swiss State Secretariat for Education, Research, and Innovation under contract number MB22.00057 (ERC-StG PROMISE).