KeepKey receive buffer vulnerability (VULN-1814)

The article describes the buffer overflow vulnerability in the USB receive buffer of the KeepKey hardware wallet that was fixed with bootloader v1.1.0 and firmware v5.5.0 in June 2018.

Please note: As with other articles, this is going to be a technical deep-dive into the specific details that are relevant for the issue.
Correspondingly, the article is written for technical readers with some experience in the area of IT security.

Research context
Technical background
The vulnerability
- The fix
- Attack scenario and security implications
Responsible disclosure

Consulting

I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.

Research context

I discovered this issue through manual code auditing during the disclosure of the Trezor One RX buffer overflow. The two vulnerabilities are triggered differently but are located in similar parts of the USB handling code.

Technical background

Since the KeepKey firmware is based on a fork of the Trezor One code, the two project’s low-level USB interfaces are very similar. This is also true for the packet encoding scheme, message fragmentation and general protobuf usage.

As with the Trezor One, the first USB packet of a message sequence contains a header with the relevant protobuf message type as well as a uint32_t size field with the intended overall message size (9 byte header in total, 55 byte payload). For fragmented messages, the state machine in the KeepKey will then switch to a mode where it handles subsequent packets (1 byte header, 63 byte payload) until the required number of bytes is received and the fully reassembled protobuf message is available. At this point the message will be evaluated and decoded.

Unfortunately, the packet handling logic of the KeepKey contains a dangerous overflow.

The vulnerability

The relevant variables are declared as static inside the affected usb_rx_helper function:

static void usb_rx_helper(UsbMessage *msg, MessageMapType type)
{
    static TrezorFrameHeaderFirst last_frame_header = { .id = 0xffff, .len = 0 };
    static uint8_t content_buf[MAX_FRAME_SIZE];
    static uint32_t content_pos = 0, content_size = 0;
    static bool mid_frame = false;

msg_dispatch.c

last_frame_header stores the overall message length field and protobuf message id given in the first USB packet
content_pos describes how much data has already been received
mid_frame is true if the state machine is still expecting more data via fragmented packets
content_buf is the receive buffer that has the capacity to store 12kB of raw message data

During the handling of the initial packet of a packet flow, the total amount of expected data is set to the attacker-controllable uint32_t length value:

last_frame_header.len = __builtin_bswap32(frame->header.len);

msg_dispatch.c

Notably, this value is never checked against the MAX_FRAME_SIZE limit that defines how much data the content_buf can actually hold. Instead, the authors of the code went with a guard condition around the dangerous copy operation itself. The guard condition is designed to skip any memcpy operation once the received data from the fragmentation packets exceeds the size limits of the target buffer:

if(sizeof(content_buf) >= content_pos)
{
    if(content_size == content_pos)
    {
        memcpy(content_buf, contents, content_pos);
    }
    else
    {
        memcpy(content_buf + (content_pos - (msg->len - 1)), contents,
               msg->len - 1);
    }
}

msg_dispatch.c

This construction has the unintended side effect that packet flows with expected data sizes far greater than MAX_FRAME_SIZE can be initiated without problems and the state machine will not error out even after the receive buffer is full (although the copy operations are then skipped).

Although content_pos > sizeof(content_buf) will skip the previously mentioned memcpy section, the mid_frame variable stays true and any packet that fulfills the basic frame->usb_header.hid_type == '?' requirement will still trigger the following code path:

else if(mid_frame)
{
    contents = frame_fragment->contents;
    content_pos += msg->len - 1;
    content_size = msg->len - 1;
}

msg_dispatch.c

As a result, an attacker can

initiate a packet flow with a last_frame_header.len value of 2^32 - 1
send enough fragmentation packets via USB so that the content_pos += msg->len - 1; addition will cause an integer overflow

This happens after sending around ~68.2 million (68174084) fragmentation packets of 63 bytes message payload to the device, which corresponds to a total data transfer of ~4GB.

Immediately after the integer overflow, the guard check around the memcpy operation is no longer effective and attacker-controlled data from one specific packet will be written in front of the content_buf buffer in an out of bounds write. This happens in the following operation since content_pos is smaller than msg->len - 1 after the integer overflow:

memcpy(content_buf + (content_pos - (msg->len - 1)), contents,
       msg->len - 1);

msg_dispatch.c

In this situation the content_pos counter is 68174084⋅63+55 modulo 2^32 == 51 where it is expected to be at least 63, so 12 bytes of memory are written in front of the buffer memory area.

The fix

ShapeShift quickly solved the practical out of bounds write issues through a patch that changes the bounds checking before the memcpy operation. On the 32-bit ARM Cortex-M3 processor of the KeepKey, this works as intended. However, the code relies on size_t having the behavior of the uint32_t type, which is not portable code since this is only true on certain 32-bit systems. For regular 64-bit desktop systems (like x86_64), the buffer overflow still happens.

I brought this up with ShapeShift during my initial review of the patch and this behavior was later fixed with subsequent patches.

Attack scenario and security implications

As described in detail in the attack scenario section of the Trezor One RX buffer overflow article, purely software-based attacks on hardware wallets can be particularly serious due to the potential scalability of the attack via malware.

All of the relevant characteristics of the Trezor One receive buffer overflow apply here as well. For a practical attack, the main difference is the large number of packets. In practical terms: attackers are likely limited to sending several hundred USB HID packets per second, which corresponds to several days of runtime for the attack.

Due to this factor, it’s more likely that this vulnerability will be of use to an attacker with extended physical access (for example after the theft of a KeepKey device) than a “regular” malware-based attack.

It is noteworthy that this vulnerability is exposed in the KeepKey bootloader as well. This increases the attack surface and potentially allows attacks with higher privileges than via the firmware depending on the security configuration of the ARM Cortex memory protection unit (MPU).

Existing countermeasures and mitigating factors

The actual issue is based on an unsigned integer overflow, which is well-defined behavior in C and not seen as an error
To my knowledge, the stack canary protections will not be able to detect the out of bounds write in this memory region
Existing MPU protections will not prevent the out of bounds write, but they might make additional exploitation steps harder

Proof of concept

My initial disclosure information contained a software-based proof of concept that shows AddressSanitizer exceptions in the affected program flow. After the disclosure, ShapeShift has included a functionally similar unit test into their codebase, which should help to understand the attack.

Thoughts on practical exploitation

Although the POC shows that the core vulnerability is reachable, a number of additional factors must be present for a practical exploit to perform actions like obtaining sensitive information (PIN, mnemonic seed) or achieving arbitrary code execution. One of the relevant questions is whether the affected memory region in front of content_buf contains trusted data that is used in other parts of the program during the consecutive program flow. If this is the case, chances are high that this can be leveraged for other memory corruptions or interesting behavior.

Note that the memory layout and functions of individual firmware and bootloader versions may have a large influence on what is exploitable or not, as observed with different firmware variations of the Trezor One.

It is a significant advantage for the attacker that large, arbitrary data structures can be written into the receive buffer itself (which is at a known location in memory) and that there are no constraints on the content of the out of bounds write.

Additionally, it is likely possible to trigger the overflow of the content_pos integer multiple times to expand the size of the out of bounds write (further increasing the necessary USB packets). Whether or not this is possible without unintended crashes depends on the memory contents of the overwritten area in front of content_buf.

Affected versions

The last changes to the msg_dispatch.c date back to 2016, so it is likely that many firmware versions between 2016 - 2018 are generally affected.

Responsible disclosure

I responsibly disclosed the issue to ShapeShift in early June 2018 during the disclosure of the Trezor One receive buffer overflow that affected only SatoshiLabs.

ShapeShift reacted quickly and provided confirmation as well as their planned patch, which was helpful for a fast progress on the issue.

Although the two buffer overflow vulnerabilities were caused by different code, they were similar enough that we decided to coordinate the disclosure of both issues to happen at the same time to prevent any issues from being re-discovered independently and misused before a firmware update was available. Despite some minor date adjustments, this worked out fairly well. In the end, the patched firmware was released by ShapeShift within ~20 days, as described in the timeline.

Relevant product

product	source	fixed version	patch	vendor references
ShapeShift KeepKey	GitHub	fw 5.5.0, bl1.1.0	1, 2	disclosure post, VULN-1814

Detailed timeline

Date	info
2018-06-06	Disclosure to ShapeShift including POC
2018-06-06	ShapeShift confirms the issue
2018-06-07	ShapeShift provides proposed patch for review
2018-06-08	Feedback to ShapeShift on proposed patch (portability issue)
2018-06-08	Planned coordinated release date: June 21th
2018-06-18	Planned coordinated release date: June 27th or earlier
2018-06-21	Planned coordinated release date: June 25th
2018-06-25	Patched bootloader v1.1.0 and firmware v5.5.0 are released
2018-06-25	Public ShapeShift blog post on the security update

Bug bounty

ShapeShift provided a bug bounty for VULN-1814.

Contents

Consulting

Research context

Technical background

The vulnerability

The fix

Attack scenario and security implications

Existing countermeasures and mitigating factors

Proof of concept

Thoughts on practical exploitation

Affected versions

Responsible disclosure

Relevant product

Detailed timeline

Bug bounty