First publicly disclosed in 2014, Rowhammer is still an unsolved security problem inside DRAM today, putting our devices at severe risk of being hacked even while running up-to-date software. In fact, the Rowhammer vulnerability is worsening: the Rowhammer threshold (i.e., the number of DRAM activations required to trigger a bit flip) keeps decreasing, while the blast diameter (i.e., the number of affected rows) keeps increasing. This makes current mitigations unsuitable for future devices.
With REGA, we propose the first fully in-DRAM mitigation capable of protecting devices independent of their blast diameter. REGA scales gracefully, allowing it to protect devices with very low Rowhammer thresholds of 261 activations with a negligible 3.7% performance overhead. This is the first time that such a low threshold has ever been considered. It is ~36 times lower than the minimum threshold reported.
We achieved this by parallelizing internal DRAM operations. We collaborated with the DRAM vendor Zentel Japan, which allowed us to design a high-fidelity circuit-level DRAM model, called REM, that we used to validate REGA.
Fig. 1: Rowhammer threshold for devices produced in different years (2016-2035). The orange lines show the minimum supported Rowhammer threshold for state-of-the-art mitigations and for REGA.
What is Rowhammer?
Rowhammer is a DRAM vulnerability caused by the interference between different rows that store data. This vulnerability enables attackers to change values in memory they are not supposed to have access to, ultimately enabling privilege escalation and escaping from sandboxes. To stop Rowhammer, DRAM vendors implemented mitigations known as Target Row Refresh (TRR). However, our recent work showed that these mitigations are vulnerable. Moreover, the number of rows that can generate interference has increased, leading to new forms of attacks. Given these trends, current mitigations can no longer protect future devices without significant costs.
To address these problems, we designed REGA, which, according to our estimates, will provide security for approximately the next 10 years.
How does REGA work?
REGA parallelizes normal READ/WRITE operations with row refreshes. To allow memory operations while row refreshes are in progress, REGA adds small buffering sense amplifiers to the DRAM. This enables simple and effective Rowhammer mitigations, such as REGAm, which refreshes rows in parallel with workloads.
Fig. 2: REGA’s concept. Stage 1 (logical ACT) performs the row’s activation by latching the logical values from the requested row. In Stage 2 (parallel REF), a parallel refresh to the shadow row is performed. Lastly, in Stage 3 (logical REF), the originally requested row is recharged to the up-to-date logical value.
How does REGAm work?
REGAm is a simple but powerful Rowhammer mitigation. Every T activations to the same subarray, V different rows are refreshed. The rows are chosen in a round-robin fashion inside a subarray. This way, all the possible victims will be protected. This design allows REGAm to be independent of the blast diameter.
Fig. 3: REGAm parallel refreshes. The bitline is precharged ❶, the shadow row is activated ❷, and the original sense amplifier refreshes it ❸. The same can be repeated for more shadow rows ❹. Lastly, the charge of the requested row is restored ❺.
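REGAm's refresh policy (every T activations to a subarray, V rows are refreshed round-robin) can be illustrated with a small behavioral sketch. This is a toy model, not the circuit-level design: the class `SubarrayRefresher`, the helper `max_refresh_gap`, and the `rows` parameter are hypothetical names introduced here for illustration, and the activation count and row pointer are simulation bookkeeping rather than a statement about how the hardware is built. The point it shows is that the refreshed rows never depend on which row was activated, so no knowledge of aggressor-victim relationships is needed.

```python
from dataclasses import dataclass


@dataclass
class SubarrayRefresher:
    """Toy model of a round-robin refresh policy for one subarray."""
    rows: int          # hypothetical number of rows in the subarray
    t: int             # T: activations between two refresh bursts
    v: int             # V: rows refreshed per burst
    acts: int = 0      # activations seen since the last burst
    next_row: int = 0  # round-robin position of the next row to refresh

    def activate(self, row: int) -> list[int]:
        """Record one activation and return the rows refreshed with it."""
        # The activated row is intentionally ignored: the rows to refresh
        # are chosen purely by position, not by who the aggressor is.
        self.acts += 1
        if self.acts < self.t:
            return []
        self.acts = 0
        refreshed = [(self.next_row + i) % self.rows for i in range(self.v)]
        self.next_row = (self.next_row + self.v) % self.rows
        return refreshed


def max_refresh_gap(rows: int, t: int, v: int) -> int:
    """Upper bound on subarray activations between two refreshes of a row.

    A full round-robin pass takes at most ceil(rows / v) bursts (assuming
    v <= rows), and bursts are t activations apart.
    """
    return -(-rows // v) * t  # ceil(rows / v) * t using integer arithmetic


if __name__ == "__main__":
    sub = SubarrayRefresher(rows=8, t=2, v=2)
    for i in range(8):
        # Hammer a single row; the refreshed rows still sweep the subarray.
        print(i, sub.activate(3))
    print("bound:", max_refresh_gap(8, 2, 2))
```

In this sketch, every row in the subarray is refreshed at least once within `max_refresh_gap(rows, t, v)` activations, regardless of the access pattern or of how many neighbors an aggressor can disturb. That bound is an arithmetic property of the toy model and should not be read as a threshold figure from the evaluation.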
What are the overheads of REGAm?
We evaluated our Rowhammer mitigation REGAm. It has a constant 2.1% area overhead and can protect DDR5 devices with Rowhammer thresholds as small as 261, 517, and 1029 with 23.9%, 11.5%, and 4.7% more power, respectively, and 3.7%, 0.8%, and 0% performance overhead, respectively. If required, REGAm can protect even more vulnerable devices.
We compared the power overhead to ProTRR, assuming a fixed die size.
What is new compared to previous mitigations?
REGAm, the new in-DRAM mitigation that we designed based on REGA, is the first mitigation that can protect against current and future Rowhammer attacks. It protects devices independent of their blast diameter and covers devices with very low Rowhammer thresholds (as small as 133).
REGAm does not use states or counters, removing the heavily limiting condition of knowing all the possible interactions between different rows a priori.
Below, we answer the most frequently asked questions about our work.
- What is the difference between REGA and ProTRR?
ProTRR is an optimal in-DRAM mitigation based on TRR or RFM. However, TRR and RFM are not sustainable in the long term. The periodic nature of the additional refreshes requires ProTRR to keep track of many victims, especially for smaller thresholds and more victims per aggressor. In contrast, REGA is stateless and has a constant area overhead, independent of the threshold and blast diameter.
- Is REGAm compatible with the JEDEC standard?
REGAm is fully compatible with the main DDR JEDEC standard. To use V greater than 2 in DDR5 (i.e., when the threshold is smaller than 517), the SPD standard needs to be slightly adjusted.
- Why is it important to have a fully in-DRAM Rowhammer mitigation?
Rowhammer is a DRAM problem, and as such, it should be resolved in DRAM. Forcing CPUs to fix Rowhammer will result in fragmented mitigations in both CPU and DRAM, making it very difficult to reason about security.
- Is REGA vendor-specific?
No. Although our model is based on a specific vendor, we also evaluated cases with worse transistor sizes and still obtained correct operation. Moreover, REM uses lower voltages than usual, which makes adhering to timings harder, not easier.
- Does REGA work for ECC devices?
Yes, it does. The timing parameter of interest for REGA is tRAS, which is the same for ECC devices.
- Is the REGA model (REM) open-source?
Yes! You can find it on GitHub.
We thank our anonymous reviewers for their valuable feedback. This work was supported by the Swiss National Science Foundation under NCCR Automation, grant agreement 51NF40 180545, the Swiss State Secretariat for Education, Research and Innovation under contract number MB22.00057 (ERC-StG PROMISE), and a Microsoft Swiss JRC grant.
- 15.05.2023: We found a configuration mistake in our gem5 simulations that effectively ran all our simulations without the L3 cache enabled, so the reported results show the worst case for REGA. We have added new results with L3 enabled in the appendix and updated the paper on our website. The previous version can be found here.