The article describes several vulnerabilities in the KeepKey hardware wallet. Flaws in the supervisor interface can be exploited to bypass important security restrictions on firmware operations.
Using these flaws, malicious firmware code can elevate privileges, permanently make the device inoperable or overwrite the trusted bootloader code to compromise the hardware wallet across reboots or storage wipes.
The new discovery has implications for code execution attacks such as CVE-2021-31616, attacks with some level of physical access as well as the general trust expectations for the wallet system integrity after the installation of unofficial firmware.
I’m a freelance Security Consultant and currently available for new projects.
If you are looking for assistance to secure your projects or organization, contact me.
The following article is highly technical, so here is a slightly less-technical summary.
The KeepKey hardware wallet has some basic protections in place to limit what some parts of its software can do.
This gives trust in the device by making it harder to backdoor permanently via malware, similar to modern smartphone systems.
The new flaws in KeepKey protections that I discovered basically allow a “Jailbreak” of the KeepKey. The main program on the device can break out of the protective cage it is in.
This may be useful for some power users who want more control over their device, but it’s also useful for attackers who temporarily made it onto the device somehow or have physical access and can install custom firmware. They can use these flaws to permanently corrupt the core device software.
A device with malicious core software no longer has to follow the normal rules.
It could generate new mnemonic secrets that an attacker has access to, lie to you about installing updates or attack your computer via USB. It can also erase itself and stop working at any time.
This is clearly a bad situation for trusting the device with funds, and the extra annoying part is that it is difficult to find out if a device is malicious, for example if you buy a new one tomorrow from a less-trustworthy seller. Unfortunately, the hologram stickers won’t help you and wiping the device storage or reinstalling the firmware is not enough.
My main recommendation is to swiftly install the new security patches.
However, if you have previously used firmware v7.0.3 on computers or websites you don’t fully trust, it may be a good time to read up on CVE-2021-31616, check your funds and change your mnemonic seed or device.
Be extra careful about new devices that you buy, as this vulnerability makes it cheaper for attackers to corrupt them.
This article focuses on breaking the security supervisor code implementation of the KeepKey hardware wallet.
To understand the context, first a little primer on what this software component is supposed to be doing.
The ARM Cortex M3 microcontroller series does not have any multi-tasking capability or sophisticated process security concepts that one may expect from larger processors.
Instead, the available protections consist of a two-level privilege concept for code separation at runtime, which the hardware enforces through privilege level handling and memory protection settings.
The Trezor and KeepKey system designs use this privilege system to limit potential actions of malicious firmware, especially for the flash write operations, with the goal to harden the overall system or at least make security issues observable to the user. This is done through a software root-of-trust concept based on a trusted bootloader, combined with cryptographically signed firmware releases. The bootloader controls firmware updates, checks firmware signatures on device startup, and provides the code for the supervisor component that is active after boot.
Essential configuration steps during startup:
The Memory Protection Unit (MPU) for the lower-privileged operation mode is configured to disallow access to flash controller related memory areas, the flash, the bootloader RAM section and other memory. This limits the internal access of the firmware.
The code drops permissions by switching into a lower-privileged mode before starting the main firmware (for custom firmware) or briefly after the start of main firmware execution (for signed firmware). From this state, hardware protections ensure that the firmware is not able to directly re-enter the privileged mode or change the MPU configuration. This helps to limit the impact of code issues or compromises of the main firmware during normal operations.
On the KeepKey, the supervisor logic mainly focuses on guarding flash operations. All flash writes of the firmware are proxied through the supervisor code via custom interrupts. The svc_handler_main() is tasked with the role of a gatekeeper for potentially dangerous accesses.
However, I’ve discovered that this code is broken in several ways, which completely undermines the sandbox design and allows the firmware to break out of it.
During security research in February 2022, I took a closer look at the supervise.c code and found several flaws. They are clustered into several sections with similar issue patterns.
Insufficient Protection of Flash Sector Erase Functionality (VULN-22004)
The ARM Cortex M onboard flash is divided into a number of differently sized flash sectors. On the STM32F205 chip that the KeepKey uses, they have the IDs 0 to 11. Sector numbers go up to 23 on other STM32 chip series.
For technical reasons, the supervisor function call parameters of svc_* functions are typically passed as unsigned 32-bit integer variables during the interrupt handling. As a result, despite the limited numerical range that is actually required to describe the target sector, svhandler_flash_erase_sector() accepts and internally uses the full 32-bit uint32_t sector for describing the flash sector ID that should be erased.
This choice of parameter type is problematic.
The defensive code checks on the flash erase are designed to reject the three specific sector numbers of 0, 5 and 6 that correspond to important flash areas for the bootloader and for the microcontroller configuration that are exclusively controlled by the bootloader.
Aside from the three numbers on the blocklist, they allow the main firmware to request erasures of all other sectors.
svhandler_flash_erase_sector() treats the sector number as an unsigned 32-bit number and incorrectly expects the flash library function to count the same way. Instead, the difference in sector integer type leads to a well-defined but lossy unsigned integer conversion of the sector number down to the uint8_t type before it is handed over to the library function.
This conversion maps multiple larger numbers into the forbidden sector numbers 0, 5 and 6.
An attacker can use this to completely bypass the defensive checks shown previously. For example, a deletion request for the sector 256 passes the checks but then actually asks the library to erase the forbidden sector 0.
Using this flaw, malicious firmware can request the erasure of any flash sector.
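The effect of the narrowing conversion can be reproduced on any host machine. The following is a minimal sketch and not the KeepKey code: model_supervisor_erase() and model_lib_erase_sector() are hypothetical stand-ins, and the library call only records the sector it was asked to erase. The cast is written out explicitly here, while in the scenario described above the conversion happens implicitly at the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the flash library call. Like the library
 * function described above, it takes the sector number as uint8_t.
 * It only records the request instead of touching any hardware. */
uint8_t last_erased_sector = 0xFF;

void model_lib_erase_sector(uint8_t sector) {
    last_erased_sector = sector;
}

/* Simplified model of a blocklist check on a 32-bit sector number. */
bool model_supervisor_erase(uint32_t sector) {
    if (sector == 0 || sector == 5 || sector == 6) {
        return false; /* rejected by the blocklist */
    }
    /* The narrowing conversion keeps only the low 8 bits. */
    model_lib_erase_sector((uint8_t)sector);
    return true;
}

/* 256 and 261 pass the check but arrive as sectors 0 and 5. */
void demo_truncation(void) {
    assert(!model_supervisor_erase(5));
    assert(model_supervisor_erase(256));
    assert(last_erased_sector == 0);
    assert(model_supervisor_erase(261));
    assert(last_erased_sector == 5);
}
```

Any value of the form 256 * k + n with k > 0 and n in {0, 5, 6} behaves the same way in this model.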
Similar Code Problem on Trezor One
During analysis of the erase problem, I found a similar problem in the Trezor One code.
It uses a uint16_t sector variable that theoretically has the same integer conversion problem during the flash_erase_sector(sector, FLASH_CR_PROGRAM_X32) call.
However, the Trezor code uses an allowlist approach for the sector checks, which doesn’t let any problematic values through:
Sectors 2 and 3 don’t have a conversion problem, therefore the Trezor One is not practically affected via this issue.
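To illustrate why the allowlist holds up, here is a small host-side sketch. It is not the Trezor code: the function names are hypothetical, and the library call is again replaced by a recorder. The sketch walks through all 65536 possible uint16_t inputs and confirms that only 2 and 3 are accepted and that both survive the conversion to uint8_t unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical recorder standing in for the flash library call. */
uint8_t last_erased_sector = 0xFF;

void model_lib_erase_sector(uint8_t sector) {
    last_erased_sector = sector;
}

/* Simplified model of an allowlist check on a 16-bit sector number. */
bool model_allowlist_erase(uint16_t sector) {
    if (sector != 2 && sector != 3) {
        return false; /* everything not explicitly allowed is rejected */
    }
    model_lib_erase_sector((uint8_t)sector);
    return true;
}

/* Exhaustive check over every possible uint16_t input. */
bool allowlist_is_sound(void) {
    for (uint32_t s = 0; s <= UINT16_MAX; s++) {
        last_erased_sector = 0xFF;
        bool accepted = model_allowlist_erase((uint16_t)s);
        if (accepted != (s == 2 || s == 3)) {
            return false; /* an unexpected value was accepted or rejected */
        }
        if (accepted && last_erased_sector != s) {
            return false; /* the conversion changed the sector number */
        }
    }
    return true;
}

void demo_allowlist(void) {
    assert(allowlist_is_sound());
    assert(!model_allowlist_erase(258)); /* 258 % 256 == 2, still rejected */
}
```

The conversion is still lossy in principle, but the check runs on the full-width value, so no value that would be changed by the conversion ever reaches it.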
Insufficient Protection of Flash Block Write Functionality (VULN-22005)
The KeepKey supervisor interface has two functions for flash writes:
svhandler_flash_pgm_word() for writing individual 32-bit words to flash
svhandler_flash_pgm_blk() for writing larger blocks of memory to flash
VULN-22005 concerns the block write functionality.
The code has existing defenses that detect overflows of the address calculation.
It also checks that the beginAddr and beginAddr + length pointers are not in the forbidden memory regions of sectors 0 or 5 & 6.
However, these defenses are incomplete.
They do not prevent a situation where beginAddr points in front of the forbidden region and beginAddr + length points behind it.
In other words, whole bootloader sections can be overwritten as long as at least one extra byte behind and in front of them is also overwritten.
Using this flaw, malicious firmware can modify protected flash memory in bulk.
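The gap can be shown with a minimal host-side sketch. The function names are hypothetical, and the address constants are assumptions based on the standard STM32F2 sector layout, where sectors 5 and 6 span 0x08020000 to 0x0805FFFF. The second function uses a generic interval-overlap test for comparison. It is one common way to write such a check and is not meant to represent the vendor patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: sectors 5 and 6 as one protected block. */
#define PROTECTED_START 0x08020000u /* first byte of sector 5 */
#define PROTECTED_END   0x08060000u /* one past the last byte of sector 6 */

bool in_protected(uint32_t addr) {
    return addr >= PROTECTED_START && addr < PROTECTED_END;
}

/* Models a check that only looks at the two endpoints of the write. */
bool endpoint_check_allows(uint32_t begin, uint32_t length) {
    uint32_t end = begin + length;
    if (end < begin) {
        return false; /* address calculation overflowed */
    }
    return !in_protected(begin) && !in_protected(end);
}

/* Generic interval-overlap test: rejects any write touching the block. */
bool overlap_check_allows(uint32_t begin, uint32_t length) {
    uint32_t end = begin + length;
    if (end < begin) {
        return false; /* address calculation overflowed */
    }
    bool overlaps = length > 0 && begin < PROTECTED_END && end > PROTECTED_START;
    return !overlaps;
}

/* A write starting 4 bytes before the block and ending 4 bytes after it. */
void demo_straddle(void) {
    uint32_t begin = 0x0801FFFCu;
    uint32_t length = 0x00040008u; /* begin + length == 0x08060004 */
    assert(endpoint_check_allows(begin, length));  /* slips through */
    assert(!overlap_check_allows(begin, length));  /* caught */
}
```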
Limitations of this Attack
Similarly to svhandler_flash_pgm_word(), the block write has the typical limitations when writing data to physical flash memory, which means it can only change flash memory bits from 1 to 0. If this were the only vulnerability a malicious firmware had access to, modifications would be limited to flipping bits in one direction in the existing flash data contents. However, this attack can be combined with vulnerability VULN-22004 from the previous section, which makes the data limitation go away. By first erasing the targeted flash region and then overwriting it, memory content can be modified arbitrarily.
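The one-directional nature of flash programming, and why a prior erase removes the limitation, can be modeled in a few lines. This is a simplified sketch under the assumption of plain NOR-flash semantics, where programming can only clear bits and erasing sets all bits. The function names are hypothetical, and the real flash controller has additional constraints that are not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Programming can only clear bits: the result is the bitwise AND of
 * the current cell contents and the data being written. */
uint32_t model_flash_program(uint32_t current, uint32_t data) {
    return current & data;
}

/* Erasing resets every bit of the word to 1. */
uint32_t model_flash_erase(void) {
    return 0xFFFFFFFFu;
}

void demo_erase_then_write(void) {
    uint32_t current = 0x12345678u; /* arbitrary existing content */
    uint32_t wanted = 0xDEADBEEFu;  /* arbitrary attacker-chosen value */

    /* Write without erase: no bit can go from 0 to 1. */
    uint32_t direct = model_flash_program(current, wanted);
    assert((direct & ~current) == 0);
    assert(direct != wanted);

    /* Erase first, then write: the chosen value lands unchanged. */
    uint32_t combined = model_flash_program(model_flash_erase(), wanted);
    assert(combined == wanted);
}
```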
During practical testing, writing into sector 0 using svhandler_flash_pgm_blk() does not work. The attack requires at least one write operation in front of the targeted sector. However, the required flash write in front of sector 0 is not seen as valid by the microcontroller and the operation gets stuck. The memory in front of sector 0 is “reserved” according to the datasheet. It may be possible to circumvent this problem by using some other undocumented edge case behavior. However, I haven’t explored this edge case further after the discovery of another attack that doesn’t share this limitation.
Writing over the combined sector block 5+6 works as described, see the proof-of-concept.
Unrestricted Memory and Flash Overwrite via Supervisor Functions (VULN-22006)
While looking into additional problems of VULN-22005, I noticed that the arbitrary pointer “write data from the source to the destination” construction of svhandler_flash_pgm_blk() and “write this value to the destination” of svhandler_flash_pgm_word() are very powerful primitives.
The blocklist-based defense has been shown to be incomplete. Are there other ways to misuse them?
After digging a bit deeper, I realized that one needs to view these functions as privileged memory write gadgets (both functions) or a privileged memory read gadget (via svhandler_flash_pgm_blk()).
This is because the STM32 uses memory-mapped IO to write to the flash and has one continuous memory region.
In other words, the microprocessor generally treats flash content as normal memory and writes to it word-wise with direct assignments, or smaller writes if necessary. Therefore, the libopencm3 flash functions can essentially be used to write or read any other data in the STM32 address space if they’re called with target pointers outside of flash space.
For example, the flash_program_word() essentially prepares the flash write, unlocks the flash and then does a simple write:
Crucially, the MMIO32(address) = data; succeeds even if it’s not in flash related memory space. The svhandler_flash_pgm_blk() works similarly and can also be used to copy secret information out of protected memory.
Since this write operation happens in the context of the privileged bootloader code, it does not fall under the restrictive MPU protections for the unprivileged thread.
This is a huge problem for the supervisor integrity. The supervisor operates on its own little memory stack that’s protected by the MPU from interference by the main firmware:
The memory region protection falls apart if the main firmware can make the privileged thread corrupt its own stack with targeted writes.
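The underlying pattern is a classic confused deputy, which can be sketched as a toy model. Everything here is hypothetical: the tiny word-indexed memory, the protected range and the function names are illustration only and do not reflect the real MPU configuration. The point is that a protection check that applies only to unprivileged stores does nothing once a privileged service writes to caller-chosen destinations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRAM_WORDS 64u
#define PRIV_FIRST 48u /* hypothetical: words 48..63 model the supervisor stack */

typedef enum { LEVEL_UNPRIVILEGED, LEVEL_PRIVILEGED } level_t;

uint32_t model_sram[SRAM_WORDS];

/* Models a store with an MPU-style rule that only restricts
 * unprivileged code. Returns false where a fault would occur. */
bool model_store(level_t level, uint32_t index, uint32_t value) {
    if (index >= SRAM_WORDS) {
        return false;
    }
    if (level == LEVEL_UNPRIVILEGED && index >= PRIV_FIRST) {
        return false; /* memory protection fault */
    }
    model_sram[index] = value;
    return true;
}

/* Models a supervisor write service that performs the store with its
 * own privileges and does not validate the destination. */
bool model_svc_write_word(uint32_t index, uint32_t value) {
    return model_store(LEVEL_PRIVILEGED, index, value);
}

void demo_confused_deputy(void) {
    /* The direct write from the firmware is blocked. */
    assert(!model_store(LEVEL_UNPRIVILEGED, 50u, 0x41414141u));
    assert(model_sram[50] == 0u);
    /* The same write routed through the supervisor goes through. */
    assert(model_svc_write_word(50u, 0x41414141u));
    assert(model_sram[50] == 0x41414141u);
}
```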
This has a significant impact on the bootloader code integrity at runtime. The practical impact may be limited somewhat by stack protection and other defenses, but those can likely be circumvented through additional writes.
Additionally, in the global address space of the STM32, important device control registers are memory-mapped to special positions. The unprivileged firmware can access them through the same flaw, for example the flash controller:
This can have additional impact, although the MPU still protects some parts of the flash, so there is a remaining barrier against direct modifications of sector 0.
How can we break the remaining defenses?
The explicit memory region defense logic of the mentioned flash write functions assumes that there is only one canonical way to address and overwrite the protected flash sections.
However, this assumption is wrong: as the STM32F205 datasheet hints at on page 66, other memory regions such as 0x0000 0000 to 0x000F FFFF can alias into the flash memory range. Here is a helpful visual overview of relevant memory regions.
What does this mean? Depending on the microcontroller system configuration, the lower memory ranges map directly into flash memory, just as the “main” flash memory section starting at 0x08000000 does.
The main difference is that the supervisor flash functions forbid access to the protected sectors in the 0x080.... regions due to the address comparisons, but they completely allow all writes to the 0x000.... region.
Bingo! We’ve just broken the remaining bootloader and trusted boot code integrity defenses.
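The alias gap can be sketched on the host as follows. This is a simplified model with hypothetical function names. The constants assume the standard STM32F2 layout (sector 0 at 0x08000000 to 0x08003FFF, sectors 5 and 6 at 0x08020000 to 0x0805FFFF) and a boot configuration in which the range starting at 0x00000000 aliases the main flash. The alias-aware variant is only one possible way to close the gap and is not meant to represent the vendor patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLASH_MAIN_BASE 0x08000000u
#define ALIAS_SIZE      0x00100000u /* 0x0000 0000 to 0x000F FFFF */

/* Protected ranges expressed only via their canonical addresses. */
bool is_protected_canonical(uint32_t addr) {
    bool sector0 = addr >= 0x08000000u && addr < 0x08004000u;
    bool sectors5_6 = addr >= 0x08020000u && addr < 0x08060000u;
    return sector0 || sectors5_6;
}

/* Models a check that only knows about the 0x080..... addresses. */
bool naive_write_allowed(uint32_t addr) {
    return !is_protected_canonical(addr);
}

/* Maps an aliased address back to its canonical flash address. */
uint32_t resolve_alias(uint32_t addr) {
    if (addr < ALIAS_SIZE) {
        return FLASH_MAIN_BASE + addr;
    }
    return addr;
}

/* Normalizes the address before applying the same check. */
bool alias_aware_write_allowed(uint32_t addr) {
    return !is_protected_canonical(resolve_alias(addr));
}

void demo_alias(void) {
    assert(!naive_write_allowed(0x08000100u));       /* canonical: blocked */
    assert(naive_write_allowed(0x00000100u));        /* alias: allowed */
    assert(resolve_alias(0x00000100u) == 0x08000100u);
    assert(!alias_aware_write_allowed(0x00000100u)); /* alias: blocked */
}
```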
At this point, I would like to give credit to Thomas Roth and the rest of the wallet.fail team. They published this memory alias based attack concept as part of the F00DBABE attack in 2018, see the talk section of their classic 35C3 presentation. I half-remembered, half re-discovered this on my own for the KeepKey, but their work is clearly a direct inspiration for the attack idea.
By making the privileged thread write into the aliased flash region, the write protections for sectors 0, 5 and 6 are circumvented without the strict need for special offsets or complete sector overwrites.
This allows more targeted overwrites of individual areas than the previously described VULN-22005 vulnerability.
As a result of this attack, the complete flash memory can be replaced with arbitrary contents, which breaks the core security model of the KeepKey root of trust.
Please read the following section carefully.
By the nature of the KeepKey hardware wallet design, access to SWD and other debug interfaces is permanently disabled on production devices and production firmware.
This is done with the explicit goal to prevent read or write access to the flash.
As a result, there is no intended or straightforward way to recover from problems with the boot-related flash memory.
Testing the issues discussed in this article directly requires erasing or modifying flash content in those essential sectors, so there is a good chance that you’ll permanently turn your test device into a dead device. No, it’s not resting - it’s stone dead! 🦜.
To prevent any devices from passing on due to catastrophic flash writes, it is required to both
Have a custom KeepKey with an unlocked STM32F205 microcontroller that is not in RDP2 state.
Use custom compiled variants of bootloader and firmware which do not lock it.
A custom KeepKey devkit can be built by SMD rework, specifically by replacing the TQFP64 chip with a new chip in factory configuration and programming the custom bootloader and firmware variants.
In this configuration, a hardware debugger like the STLINK-V3 can be connected and used to restore flash contents externally and to control execution. Note that the MPU and thread privilege mechanisms are still active, the unit is just at RDP0 debug protection level.
The POC section describes testing steps with such a setup.
The following proof-of-concept steps will be deadly to your device unless you have working hardware debugger access. You have been warned.
POC for VULN-22004 and VULN-22005
This is a combined proof-of-concept for two issues.
For VULN-22004, the sector number 261 is used to target sector 261 % 256 = 5.
POC VULN-22006 - Attacking Privileged SRAM Region
For comparison, the following call with firmware-level access would lead to a memory exception due to the MPU:
POC VULN-22006 - Attacking Privileged Flash Region
The discovered KeepKey issues apply to all recent bootloader versions since the problems in supervise.c have been present for multiple years.
The coordinated disclosure went similarly to the VULN-22003 disclosure that started slightly earlier in February with the same vendor.
I received a lot of good feedback and confirmation in a technical call about two weeks into the disclosure.
Unfortunately, there was a significant gap in the communication in April where I was unable to reach them via multiple communication channels. As a result, I did not have a chance to comment on their patch set before the release or coordinate with them on a publication date.
It’s good to see that they still released a firmware fix and public acknowledgment within the 90-day timeframe.
I have been able to re-establish communications in May.
I’m looking forward to the full vendor advisory, which has not been released at the time of writing.
I’m not aware of other hardware wallets that are practically affected by these issues.
Please note that I included SatoshiLabs in the disclosure communication because of the Trezor One product. After finding a minor code issue there, I wanted to ensure that there are no practical vulnerabilities on the Trezor side, where some of the code originated. Ultimately, the Trezor One did not have any practical issues and we did not switch to a full multi-vendor format for the coordinated disclosure. This approach was discussed with both vendors.
A Note About Research Affiliation and Work Time
I want to emphasize that the main work for this security research was done on my own time and initiative.
In particular, the original research that led to the discovery of the issue was not sponsored by SatoshiLabs.
With agreement by ShapeShift, I spent some paid hours on extended background research to evaluate the potential security impacts of related issues on the Trezor project for SatoshiLabs.
Confidential disclosure to ShapeShift, with CC to SatoshiLabs
Technical call with ShapeShift, ShapeShift acknowledges the issues
ShapeShift releases patched bootloader version bl_v2.1.4 together with firmware v7.3.2
ShapeShift publishes a short advisory summary via the GitHub tag description