devise-two-factor is a Time-based One-time Password (TOTP) library. With the help of Chris MacNaughton, we confirmed the vulnerability and informed the upstream vendor of the library.

I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
A popular Ruby TOTP two-factor server-side library lacks built-in protections against brute force attacks on the verification step. Due to design limitations of the underlying standards, this allows bypassing the second factor in targeted attacks against some web applications. The brute force attacks are practical in common configurations and take a few days or less to break into an account for which the main password is known.
Two-factor authentication (2FA) mechanisms are security systems designed to make account takeovers harder to accomplish and more difficult to scale, while at the same time having a bearable overhead cost to the users they’re protecting.
As such, 2FA mechanisms often have a number of trade-offs which weaken their overall effectiveness in order to make them easier to use.
The Time-based One-time Password (TOTP) standard at the center of this vulnerability is a popular 2FA mechanism specified in RFC6238. TOTP is basically a time-based adaptation of the HMAC-based One-Time Password (HOTP) algorithm standard as specified in RFC4226, and has inherited most of HOTP’s general design. If you have a smartphone security app that shows you some simple secret code which changes every 30s and can be typed into websites to confirm your login, TOTP is probably the standard that you’ve been using.
For this article, the most relevant security design choice of TOTP/HOTP was to keep the “one time password” secret very short and low in complexity. The commonly used TOTP default parameters require just six numerical digits for this temporary password.
Most readers are likely aware that a 6-digit account password is very insecure under normal conditions. If there are no consequences for wrong guesses, an attacker can simply try every number combination between 000000 and 999999 and find the valid password - a so-called “brute force” attack. Compare this to the short numerical PIN on your banking card: if the bank’s ATM system enforces no limit on incorrect PIN attempts, a persistent thief with your card can circumvent that second factor fairly easily by trying combinations for a while. It’s only the restriction on a few wrong PIN attempts before blocking the card that makes this a reasonable system.
The authors of the HOTP standard were fully aware of this design limitation, which they described in section 7.3 of their RFC:
Truncating the HMAC-SHA-1 value to a shorter value makes a brute force attack possible. Therefore, the authentication server needs to detect and stop brute force attacks.
They described two potential countermeasures that implement a lockout or delay based defense:
We RECOMMEND setting a throttling parameter T, which defines the maximum number of possible attempts for One-Time Password validation. […]
Another option would be to implement a delay scheme to avoid a brute force attack. After each failed attempt A, the authentication server would wait for an increased T*A number of seconds, e.g., say T = 5, then after 1 attempt, the server waits for 5 seconds, at the second failed attempt, it waits for 5*2 = 10 seconds, etc.
This security recommendation has been public since at least the October 2004 draft, almost two decades ago at this point. However, it is not always adopted correctly, making it a potential weak point for attacks that target the second factor.
Radically Open Security, a non-profit computer security consultancy that I work with as a freelancer, runs an instance of the open source EyeDP identity provider software. EyeDP is used for single-sign-on (SSO) login access to internal services. During some internal audit work, I discovered a suspicious absence of TOTP anti-brute-force defenses in the EyeDP code.

After reaching out to EyeDP’s developer Chris MacNaughton, we were able to confirm together that EyeDP is susceptible to brute forcing of the TOTP codes to bypass the 2FA. EyeDP is a Ruby application that uses the Ruby devise authentication framework for auth handling and the devise-two-factor library extension to implement its 2FA mechanisms. We quickly found out that the upstream devise-two-factor also does not have any protections against one-time password (OTP) brute force attacks, which EyeDP implicitly relied upon.

Here is a relevant code excerpt from devise-two-factor:
two_factor_authenticatable.rb L36-L52
As you can see, if the totp.verify() call does not succeed, the function returns false to signal failure, but doesn’t change any state in memory or in the database to count the failed attempt. Some form of bookkeeping would be necessary to recognize that a subsequent failed attempt has exceeded some threshold and to enforce any delay or lockout countermeasure scheme.
In case of a successful login, there is a protection mechanism which tracks the last used OTP code via self.consumed_timestep to prevent its re-use after a successful login:
two_factor_authenticatable.rb L79-L88
This solves previous security issues in devise-two-factor, namely CVE-2015-7225 and CVE-2021-43177, which concern OTP re-use that is forbidden by the standard. However, the OTP re-use detection for the last login unfortunately does not provide any protection against brute force attacks on new logins, so this defense is basically irrelevant here.
The brute force attack vector against OTP standards has been publicly known for as long as the standards have existed. Unsurprisingly, many public references and writeups for it exist. Here are some that are worth taking a look at:
Additionally, issue #19799 in GitLab from 2016-07-06 came very close to publicly describing this exact issue in devise-two-factor. GitLab is a Ruby application which uses devise-two-factor, and proper defenses in the library would have helped to prevent this. As far as we’re aware, the issue wasn’t raised upstream, and the GitLab TOTP handling was fixed with custom logic.
This vulnerability allows an attacker who knows the correct primary login credentials of a victim user to repeatedly guess the second factor OTP verification code until they randomly succeed, without getting locked out or delayed.
Without additional protections such as rate limiting, an attacker testing possible OTP codes at the maximum rate that the server can process them is able to break the 2FA protection of an account within a few days or even hours, under default conditions.
Since the attack is probabilistic, the attacker may get lucky and succeed on the first try, or try the whole number range and still not succeed. In this regard, the issue behaves differently from a normal password brute force, since the changing OTP code is a moving target. To account for this, it helps to focus on the “average” attack length required to have a 50% chance of getting in.
There are some configuration options which significantly affect the complexity of practical attacks:
- Codes using an a-z alphabet or alpha-numerical codes would be dramatically more complex, but this is not common.

Guessing the OTP code right once doesn’t help the attacker for future attempts against the same account, but this may not be necessary. Depending on the application, the attacker may be able to use the granted authentication session to make the 2FA protections ineffective by disabling 2FA on the account or adding new attacker-controlled 2FA tokens.
Let’s take the following example:
Borrowing the python-based calculation method from Michael Fincham’s article:
After a day of OTP testing, chances for success are already 64.54% in this scenario. This is bad!
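For readers who want to check the numbers themselves, the underlying math is simple: each guess succeeds with probability p (the number of currently valid codes divided by 10^6), so N guesses succeed with probability 1 - (1 - p)^N. Here is a minimal C sketch with illustrative assumptions (12 valid codes per guess due to a generous accepted drift window, one guess per second) that happen to reproduce the quoted 64.54% figure:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    // Illustrative assumptions, not the article's exact scenario values:
    // 12 valid codes out of 10^6 per guess, one guess per second.
    double p = 12.0 / 1000000.0;
    double n = 86400.0; // guesses in one day
    // Probability of at least one correct guess: 1 - (1 - p)^N
    printf("Success probability after one day: %.2f%%\n",
           100.0 * (1.0 - pow(1.0 - p, n)));
    return 0;
}
```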
Since the vendor decided not to include any targeted defenses against TOTP brute force attacks, this responsibility falls on the projects using the library.
Wherever possible, we recommend defense strategies that count failed attempts primarily per targeted account, and that kick in after the attacker has passed some initial form of authentication barrier. This design keeps the inherent new attack vector of locking out genuine users as small as possible. Additionally, this pattern cannot be bypassed by distributing an attack against the same user across many source addresses (IPv4 / IPv6), as shown in the sketch below.
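As a rough sketch of this pattern - not code from devise-two-factor, with all names invented for illustration - a per-account delay scheme along the lines of the RFC 4226 recommendation could look like this:

```c
#include <stdbool.h>
#include <time.h>

// Hypothetical per-account state; a real implementation would persist
// this in the user record or another shared store.
struct otp_state {
    unsigned failed_attempts;
    time_t last_failure;
};

// RFC 4226 style delay scheme: after A failed attempts, require a
// waiting period of T * A seconds before the next verification.
static const unsigned T = 5;

bool otp_attempt_allowed(const struct otp_state *s, time_t now) {
    return (unsigned)(now - s->last_failure) >= T * s->failed_attempts;
}

void otp_record_result(struct otp_state *s, bool success, time_t now) {
    if (success) {
        s->failed_attempts = 0; // reset the counter on success
    } else {
        s->failed_attempts++;
        s->last_failure = now;
    }
}
```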
Rack::Attack is a common Ruby library to block and throttle requests, which may be of help if user session data is available during a multi-stage login flow. Alternatively, it can be used to implement some additional secondary defense which limits login requests per IP address or IP subnet (with obvious limitations).
Devise has the Devise::Models::Lockable mechanism to block user accounts after some number of incorrect login attempts. We see this as an inferior solution that creates new issues: attackers who know neither the correct TOTP code nor the correct account password can still trigger account lockouts, which is undesirable.
For users:
The disclosure process had several busier-than-usual phases, since we had to work on coordinating the issue shortly before and after the Christmas holidays. Additionally, the embargo timeline changed from 60 days -> 90+ days -> 30 days. From the start, we supported a shortened embargo to get this information out sooner, but things still got unexpectedly busy with last-minute writeup and coordination work.
I discovered this issue during internal audit work for Radically Open Security (ROS). Radically Open Security supported the disclosure and sponsored the worktime for most steps such as initial analysis, triage, and disclosure to the vendor Synopsys.
Thanks go to Chris MacNaughton (Centauri Solutions) who was heavily involved in the analysis steps, and to the team at Radically Open Security who helped with coordination efforts.
This article was written on my own time.
Project | Source | Likely Affected Version | Fix | References |
---|---|---|---|---|
Synopsys devise-two-factor | GitHub | >1.0.0 | Not planned | GHSA-chcr-x7hc-8fp8 advisory, CVE-2024-0227 |
Centauri Solutions EyeDP | GitHub | <= 1.0.16, < 1.1 | 1.0.17, 1.1.0-rc4 | GHSA-qrqh-v2j6-3g7w advisory |
To be determined.
Related lookups:
The following projects use devise-two-factor, but have mitigations that can be effective under at least some conditions:
Project | Source | Comment |
---|---|---|
GitLab | GitLab | Uses dedicated layers of rate-limit / lockout mechanisms. |
Mastodon | GitHub | Uses rate limits on login endpoint, if accessed via normal network paths. |
Please note that we have not analyzed the defenses closely and do not vouch for their effectiveness.
Date | Information |
---|---|
2023-12-12 | Initial discovery of issue in local ROS EyeDP installation |
2023-12-12 | Triage of vulnerability together with EyeDP developer |
2023-12-12 | Brief exposure of EyeDP mitigation patch on GitHub |
2023-12-12 | Rollout of mitigations for local ROS EyeDP installation |
2023-12-13 | Coordinated disclosure of vulnerability to Synopsys PSIRT |
2023-12-20 | Status request to Synopsys PSIRT |
2023-12-20 | Synopsys PSIRT responds, asks us to re-send the disclosure |
2023-12-20 | Repeated transmission of vulnerability to Synopsys PSIRT |
2023-12-20 | Synopsys PSIRT confirms receipt of disclosure, announces goal of 90d embargo starting 2023-12-20 |
2023-12-21 | Followup with more information to Synopsys PSIRT |
2024-01-08 | Status request to Synopsys PSIRT |
2024-01-09 | Synopsys PSIRT provides some updates, CVE ID, and announces embargo end in less than 7d |
2024-01-10 | Coordination with Synopsys PSIRT |
2024-01-11 | Coordination with Synopsys PSIRT |
2024-01-11 | Synopsys publishes advisory |
2024-01-11 | Publication of this article |
Please note: additional steps after initial article publication are not covered in this timeline.
There was no bug bounty involved.
This post summarizes our disclosure of a vulnerability in the Libbitcoin Explorer bx software tool that left victims exposed to remote & automated wide-scale theft of funds. The thefts, in which bx users’ funds were targeted along with other weak wallet types, amounted to millions of dollars in damages across hundreds of victims and various blockchains and coin types.
We found that the core issue for bx was the usage of the unsuited Mersenne Twister Pseudo Random Number Generator (PRNG) algorithm, which led to cryptocurrency assets being stored on what is essentially a “32 bit number in a trench coat” instead of a strong private key. Anyone with knowledge of the issue and a moderate amount of computing power could reverse these keys without any access to the victim’s computer and use the recovered private keys to move funds away. We gave this vulnerability the codename Milk Sad after the first weak BIP39 mnemonic key output, and worked frantically during a short period of 2 1/2 weeks between detection and disclosure to learn, research and explore what we could about the issue and its backstory. Our motivation was to help users save their remaining funds and understand the problem, and to help developers fix and prevent issues like this in the future.
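To illustrate the weakness class (a sketch of the general pattern, not the actual bx code): a Mersenne Twister seeded with a single 32-bit value can only ever produce 2^32 distinct output streams, no matter how many “entropy” bytes are drawn from it, so all resulting keys can be enumerated offline.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

// Minimal MT19937 core, just enough to illustrate the point.
static uint32_t mt[624];
static int mti = 625;

static void mt_seed(uint32_t s) {
    mt[0] = s;
    for (mti = 1; mti < 624; mti++)
        mt[mti] = 1812433253u * (mt[mti - 1] ^ (mt[mti - 1] >> 30)) + (uint32_t)mti;
}

static uint32_t mt_next(void) {
    if (mti >= 624) { // regenerate the state block
        for (int i = 0; i < 624; i++) {
            uint32_t y = (mt[i] & 0x80000000u) | (mt[(i + 1) % 624] & 0x7fffffffu);
            mt[i] = mt[(i + 397) % 624] ^ (y >> 1) ^ ((y & 1u) ? 0x9908b0dfu : 0u);
        }
        mti = 0;
    }
    uint32_t y = mt[mti++];
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}

int main(void) {
    // Weak pattern: 256 bits of "wallet entropy" that are fully determined
    // by one 32-bit seed - an attacker can enumerate all 2^32 candidates.
    mt_seed((uint32_t)time(NULL)); // simplified stand-in for the seed source
    for (int i = 0; i < 8; i++)
        printf("%08x", mt_next());
    printf("\n");
    return 0;
}
```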
You can read the results in the full disclosure writeup.
For “normal” software vulnerabilities, most of the research work is done after identifying, reproducing, classifying and disclosing them.
Not in this case - exploring the complex and wide-reaching impacts of the vulnerability is a huge task, with practical challenges for coding the necessary custom tooling and analyzing the results. I’m investing a lot of research time to further understand and publish new information on Milk Sad and previous similar vulnerabilities as a series of research updates, since they’re both fascinating and under-reported. Head over there if you want to read more!
This article covers a memory handling flaw in the yubihsm_pkcs11.so driver library, which we disclosed together to Yubico. The YubiHSM PKCS#11 client-side library is designed to interact with Yubico HSM2 hardware security modules. Due to flaws in the memory handling, the library code accidentally returns 8192 bytes of previously used process memory under some circumstances. This impacts the memory confidentiality of the calling program for some usages.
This article will describe the issue.
I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
The C_GetAttributeValue() function in yubihsm_pkcs11.c can be used to query X.509 certificate attributes of particular types. However, some codepaths have problematic memory handling.
Consider client software that retrieves the certificate attribute CKA_SERIAL_NUMBER with the C_GetAttributeValue call from a regular, non-malicious YubiHSM2 device via the PKCS#11 API interface.
Internally, the PKCS#11 functions will establish a session with the YubiHSM2 device and then attempt to handle and parse the received object information. This happens via the following code positions:
Note that populate_template() reserves a local stack buffer CK_BYTE tmp[8192]; without explicit data initialization and passes its size as the len parameter to subsequent function calls in util_pkcs11.c L5072. This will become relevant later.
The particular security issue is associated with opaque attributes, which are handled via get_attribute_opaque().

In get_attribute_opaque(), some function paths overwrite the length parameter - which is passed as a reference - with the actual length of the field for the specific attribute they fetched. However, for at least three specific field types, this does not happen:
As a result, code flows hitting the quoted code lines in get_attribute_opaque() will return to the parent function without writing data into the tmp buffer, without changing the length parameter away from the maximum value, and without returning an error code. By convention, this appears like a successful operation which produced 8192 bytes of new output, although in reality the tmp buffer was not written to at all.
This constellation leads to a problematic memcpy() call which leaks the uninitialized memory contents of tmp into the output:
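The affected code itself isn’t reproduced here; instead, the following self-contained sketch (with invented names) shows the same bug class:

```c
#include <stdio.h>
#include <string.h>

// Sketch of the bug class, with hypothetical names. The "attribute getter"
// never writes the buffer and never shrinks *length, but also returns no
// error - so the caller believes it produced a full buffer of output.
static int get_attribute_unhandled_type(unsigned char *buf, size_t *length) {
    (void)buf;    // buffer is left untouched for this field type
    (void)length; // *length keeps its maximum value
    return 0;     // "success" by convention
}

static int leaky_get_attribute(unsigned char *out, size_t *out_len) {
    unsigned char tmp[8192]; // no explicit initialization
    size_t len = sizeof(tmp);
    if (get_attribute_unhandled_type(tmp, &len) != 0)
        return -1;
    memcpy(out, tmp, len); // copies 8192 uninitialized stack bytes
    *out_len = len;
    return 0;
}

int main(void) {
    unsigned char out[8192];
    size_t out_len = 0;
    leaky_get_attribute(out, &out_len);
    printf("received %zu \"attribute\" bytes of stale stack memory\n", out_len);
    return 0;
}
```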
Typically, uninitialized stack variables contain memory from stack frames which previously occupied the relevant memory region. Data copied from this region will therefore likely include stack variables and other stack-related information (stack canaries, pointers) from previous function calls.
The leaked data will then get returned to the PKCS#11 caller. Since the caller requested some specific certificate information, but instead gets data from an information leak, this represents a security issue. Additionally, since the problematic library function indicates no errors, this information may be passed on by the caller towards other components and trust spheres, depending on the specific behavior of the program.
Programs that use the YubiHSM are likely to handle secrets, possibly including secret key material, or sensitive plaintext. The sensitive key material may include the PIN secret used to secure the HSM communication. Because of this bug, a program that uses yubihsm_pkcs11.so may inadvertently return such information instead of the requested X.509 certificate attributes.
Since the vulnerable component is a flexible library, it is unclear which programs call into the problematic function, and under which circumstances. Additionally, the relevant memory accesses are undefined behavior (UB) in C and may depend on the compiler and system environment. If you have more information about specific integrating applications and their confirmed security impacts, please contact us.
To confirm that this data actually leaks from the tmp variable on the stack, we used a modified yubihsm_pkcs11.so library which specifically marks the memory in question with ASCII ‘A’ characters. During initial debugging, this allowed a straightforward identification of the problematic memory in returned data:
In order to confirm the issue and help Yubico reproduce it, we crafted a proof-of-concept (PoC). The PoC consists of a short dummy program written in C that triggers the issue: pkcs11-memleak.c.
WARNING: use the provided PoC code at your own risk, and only on non-production HSM devices.
Please see the code comments for setup details and explanations. The special put_dummy_secrets_on_stack() function may be of particular interest for understanding the leaked output and attack conditions.
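The idea behind that helper - sketched here with hypothetical contents, not the actual PoC code - is to deliberately park a recognizable marker in soon-to-be-dead stack memory before triggering the leaky attribute query, so the marker can then be searched for in the returned bytes:

```c
#include <string.h>

// Fill a local buffer so that its contents linger in the dead stack frame
// after returning. A subsequent call that reuses this stack region without
// initialization (like the tmp[8192] buffer) can then return the marker.
void put_dummy_secrets_on_stack(void) {
    volatile char secret[4096]; // forced onto the stack
    memset((void *)secret, 'X', sizeof(secret));
    memcpy((void *)secret, "MARKER-SECRET-0001", 18);
} // the frame is released here, but its contents typically remain
```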
Note: this section has been updated to include new information.
By our understanding, this issue was introduced via a combination of two commits:

1) a commit first released in version 2.0.3
2) a commit first released in version 2.4.0

Commit 1) introduces the weak tmp buffer initialization, and commit 2) introduces the problematic code paths and makes the issue reachable.
The main patch improves the problematic code paths in get_attribute_opaque(): commit d56f8567d4fe807dc097febbac7bb4e02ca9dea3.
A second patch improves the buffer initialization:
Additional references:
The described security issue affects the confidentiality of program memory. Due to the characteristics of the flaw, we think that program memory integrity and program availability is not impacted.
As with other memory- and library-related vulnerabilities, it is difficult to say generally what the sensitive information in memory is going to be, and how the leaked information will be processed or exposed by the caller. As a result, the practical worst-case impact will likely be very target-dependent.
ID | CVSS 3.1 Score | Parameters |
---|---|---|
CVE-2023-39908 stack information leak | 4.4 (Medium) | AV:N/AC:H/PR:H/UI:N/S:U/C:H/I:N/A:N |
The listed scoring maps the impact onto a network-enabled integrating program which allows a remote user to trigger the affected functionality and obtain secrets after some form of authentication as a high-privileged user. Other integrating programs that use the PKCS#11 driver may have different impacts. For example, if a lower-privileged user can trigger the issue, PR:L would turn this into a 5.3 CVSS base score (calculator).
As outlined in the timeline, the first ~60 days out of the overall 90 days of disclosure did not see a lot of activity or feedback from the vendor side. This follows a pattern seen with previous coordinated disclosures to Yubico, which also had significant delays between reporting and technical discussion & assessment coordination with the vendor. We recommend focusing on a quicker initial handling for future disclosures to reduce the time pressure on coordination tasks. We want to positively mention that Yubico has provided security patches and an advisory on the disclosure date for this disclosure, which is an improvement over the previous issue.
During work on integration of YubiHSM2 into an OpenPGP project, Heiko Schäfer found the memory safety issue. Christian Reitter assisted with triage, issue analysis, coordinated disclosure and report writeup. In references to this issue, please credit “Heiko Schäfer and Christian Reitter”.
Heiko Schäfer is available for commercial work with a focus on OpenPGP and Rust:
Variant | Source | Likely Affected | Fix | References |
---|---|---|---|---|
Yubico upstream | GitHub | 2.4.0 | SDK 2023.08, 2.4.1 | YSA-2023-01 advisory, CVE-2023-39908 |
Fedora package | rpm package | 2.4.0-1 | 2.4.1-1, commit | bugzilla #2232340 |
We originally reproduced the issue with yubihsm-shell-2.4.0-1.fc38.x86_64 under Fedora. It appears that earlier versions before 2.4.0 do not contain the problematic code path, see here.
Date | Information |
---|---|
2023-05-18 | Disclosure of issue to Yubico, including proof-of-concept code |
2023-05-24 | Response by Yubico, confirms receipt of disclosure |
2023-06-21 | Request to Yubico for a status update and severity assessment |
2023-07-17 | Status update request to Yubico after lack of response |
2023-07-17 | Response by Yubico with technical details and CVSS scoring |
2023-07-21 | Message to Yubico, discussing proposed CVSS scoring & CVE |
2023-07-25 | Response by Yubico, discussing proposed CVSS scoring |
2023-07-31 | Message to Yubico, discussing proposed CVSS scoring & CVE |
2023-08-02 | Response by Yubico, outlining CVE assignment and disclosure date plans |
2023-08-04 | Response by Yubico, disclosure date plans |
2023-08-05 | Message to Yubico, acknowledgment |
2023-08-14 | Yubico publishes YSA-2023-01 and patch release |
2023-08-14 | Publication of this article |
2023-08-16 | Original end date of 90-day coordinated disclosure period |
2023-08-23 | Update of this article, revising version information, adding patch details |
At the time of the disclosure, the vendor did not offer a bug bounty.
I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
This section outlines how Ethereum-related processing code introduced with firmware v7.5.2 can be used as an arbitrary read gadget to display confidential device memory on the OLED screen, which violates security goals.
Once an attacker sends a special Ethereum signing request message, the following code path in ethereum_signing_init() can be triggered:
This calls the recently added cf_confirmExecTx() function via the short intermediary function ethereum_cFuncConfirmed():
The code vulnerability is located in cf_confirmExecTx().
Before we dig deeper, first some context on this section of the firmware.
On an abstract level, the code for the Ethereum transaction confirmation functionality is supposed to parse and display the EthereumSignTx *msg request received via USB from the host computer.

Subsequent code stages then perform the actual Ethereum transaction signing, but they are not relevant to understanding this issue.
The confirmation flow in question has multiple stages to display and approve the individual components of the transaction; among them, a confirm(ButtonRequestType_ButtonRequest_ConfirmOutput, ...) call on the decoded receiver address.

It is at the third step where things go bad. 🌩
Here is the problematic code section:
The goal of the listed code instructions is to prepare the uint8_t* data pointer and uint32_t dlen length variables for the data that should be printed. The display logic then uses them to show hexadecimal-encoded text versions of the referenced data payload in the Ethereum transaction message to the user. Due to the limited screen size, the conversion and screen dialog operate on paginated chunks.
Display logic:
The crucial mistake in the message parsing logic is the lack of range checks for the variables. Both uint32_t offset and uint32_t dlen are assigned and used without ensuring that the referenced memory region is firmly within the msg->data_initial_chunk.bytes payload section. This leads to serious problems!
Let’s walk through one of the problematic assignments in more detail:
In simplified terms, the combination of void bn_from_bytes(const uint8_t *value, size_t value_len, bignum256 *val) and uint32_t bn_write_uint32(const bignum256 *in_number) reads a uint32_t value from a particular memory location without imposing any additional range limitations on the resulting number. In the code snippet shown above, the number conversion first reads a 256-bit bignum number from a fixed byte offset within msg->data_initial_chunk.bytes and then assigns the least significant four bytes to offset, discarding the rest of the input.
A similar operation happens for the dlen read, but from a flexible offset location (more on this later).
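Paraphrased into a compact form (the stand-in declarations and function body below are for illustration, not the verbatim firmware code), the pattern is roughly:

```c
#include <stddef.h>
#include <stdint.h>

// Stand-ins so the sketch is self-contained; the real definitions live in
// the firmware's bignum library. Only the conversion pattern matters here.
typedef struct { uint32_t limbs[9]; } bignum256;
void bn_from_bytes(const uint8_t *value, size_t value_len, bignum256 *val);
uint32_t bn_write_uint32(const bignum256 *in_number);

void parse_sketch(const uint8_t *chunk_bytes) { // msg->data_initial_chunk.bytes
    bignum256 val;

    // 1) read a 256-bit number from a fixed offset in the untrusted payload
    bn_from_bytes(chunk_bytes + 4, 32, &val);
    uint32_t offset = bn_write_uint32(&val); // attacker-controlled, unchecked

    // 2) derive the data pointer from it - may point far outside the buffer
    const uint8_t *data = chunk_bytes + 4 + 32 + offset;

    // 3) read dlen from that attacker-influenced location, also unchecked
    bn_from_bytes(data, 32, &val);
    uint32_t dlen = bn_write_uint32(&val);

    (void)dlen; // the display logic then hex-dumps dlen bytes starting at data
}
```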
It’s important to remember that the Ethereum transfer request message comes from an untrusted source - the computer acting as the USB host could be compromised by malware, which is the reason for showing the user confirmation steps on the hardware wallet display in the first place. In this particular code branch of Ethereum transaction signing, the format validation functions that run before cf_confirmExecTx() impose no meaningful limitations on the msg->data_initial_chunk.bytes content.
To summarize, msg->data_initial_chunk.bytes passes over a trust boundary, isn’t validated against any strict specification, and is then used without sufficient length checks.
An attacker with control over the message content can exploit the unbounded conversion flaws in two general ways:

1. Set a large uint32_t offset value, use it to move the data pointer, and leak content from an arbitrary memory location.
2. Keep the uint32_t offset value small, control uint32_t dlen, and run the memory printing function arbitrarily far beyond the packet buffer.
dialogs that leak raw memory from out-of-bounds regions via snprintf()
to the KeepKey device OLED screen. That’s a pretty powerful attack gadget on a hardware wallet, which is supposed to avoid data leaks at all costs!
The following attack description will focus on direct data pointer control via large offset values (variant no. 1), which I’ve found to be more powerful and practical for manual attacks without physical automation. It’s simpler to leak an interesting memory region directly on the screen in a few display pages, compared to setting an oversized dlen length and manually cycling through thousands of display pages before arriving there.
Digging deeper into the code behavior, we can see that the attacker can force arbitrary pointer addresses for data. Due to unsigned integer overflow wrapping, the msg->data_initial_chunk.bytes + 4 + 32 + offset calculation can end up with any address in front of or behind msg->data_initial_chunk.bytes! To make matters worse for the defenders, msg->data_initial_chunk.bytes is at a static and well-known absolute address. The currently processed Ethereum message will always be located in a special decode buffer after it is converted from the protobuf wire format:
Since decode_buffer[] is a static global variable and the ARM Cortex-M3 platform has no address space layout randomization, the buffer and the msg->data_initial_chunk.bytes struct field will always be located at the same absolute memory location for a given firmware version. This allows attackers precise and reliable exploitation of this issue without the need for guesses or the usage of other information leaks.
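Computing the required offset for a chosen target address is then simple modular arithmetic. The addresses below are made-up examples, not the real firmware layout:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // Example values only - the real decode buffer address is fixed per
    // firmware build and can be taken from the firmware image/symbols.
    uint32_t chunk_bytes_addr = 0x20001000u; // &msg->data_initial_chunk.bytes
    uint32_t target_addr      = 0x0800c000u; // memory region to dump

    // data = chunk_bytes_addr + 4 + 32 + offset, computed modulo 2^32,
    // so the attacker simply solves for offset:
    uint32_t offset = target_addr - (chunk_bytes_addr + 4 + 32);
    printf("crafted offset = 0x%08x\n", offset);
    return 0;
}
```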
For attacks that intend to read out a specific, narrow memory region via crafted offset values, the last remaining obstacle is the limited attacker control over dlen when manipulating data.
By picking crafted offset values in the attack message which move data towards other microcontroller memory outside of the message buffer, the dlen-defining read operation moves there as well:
Unfortunately for the defenders, this drawback can be worked around, since the bignum data read logic and display code are very forgiving and will treat basically any data as a meaningful length field.
The attackers can simply point to a memory region slightly in front of the targeted data that is known to have some non-null data bytes in the 4-byte window of dlen. As long as the converted dlen value is at least as large as the desired data readout section, the resulting memory readout will successfully leak all relevant data after some pagination.
For edge cases where dlen is unexpectedly small, the display code runs into another failure mode and leaks previously used stack memory via the uninitialized char confStr[131]; variable. However, compared to the arbitrary read gadget for specific memory address contents, this is not nearly as interesting or powerful.
Similarly, the attacker can set offset such that display reads will access forbidden memory regions and cause a crash. Given the requirements of this attack, this exploitation variant is also not of much interest, but it is technically part of the potential impact.
The problematic functionality can be triggered by local or remote attackers once the device is in an unlocked state (if a PIN is set on the target device) and the user physically confirms at least some steps of an Ethereum signing flow. The most limiting factor in the attack is that the secret information is only rendered on the physical KeepKey display as hexadecimal-encoded data and not leaked back towards the host computer.
The latter behavior is due to the confirm() handler at confirm_sm.c, which does not make use of the data field in the ButtonRequest message and therefore does not send the displayed string towards the computer, where malware could read it after tricking the user into confirming a supposedly low-value Ethereum transaction.
As with other KeepKey USB related vulnerabilities, a malicious website with user-granted WebUSB permissions could trigger this issue. However, in this particular vulnerability there is no return channel for the leaked information, so additional physical capabilities by the attacker are needed. Under some edge conditions, social engineering may be used to trick the victim user of the KeepKey to voluntarily copy or photograph the leaked information from the device screen, but I see this as difficult to achieve reliably given the circumstances.
From a threat model perspective, I see this vulnerability as relevant despite the high attack requirements since it undermines both the implicit and explicit security guarantees of the hardware wallet with regards to the confidentiality of long-term cryptographic key material.
One of the affected mechanisms is an advanced wallet initialization mode of the KeepKey wallet which doesn’t reveal the generated BIP39 mnemonic seed to the user at any point, see lib/firmware/reset.c. Wallets initialized with this mode permanently have the no_backup flag set to true, and the communicated goal is to make a recovery of the key impossible. The demonstrated attack for CVE-2023-27892 clearly violates this goal, as the no_backup flag stays unchanged despite the revealed secret.
Similarly, wallet users may have the expectation that the effects of hands-on attacks against their wallet have to be immediate, i.e., the transfer of funds during the attack, or that attacks are only possible if the unlocked wallet already has significant funds available at the time of the attack. While this doesn’t have to be 100% correct from the technical side - for example, attackers could delay the submission of their illegitimately obtained signed transactions to public networks - access to the underlying BIP39 seed by the attacker certainly allows for much more flexible and targeted theft months or years later, across various coins, wallet accounts and addresses.

In the case of wallets which were temporarily less protected - no PIN configured, accessible to other people, left connected to an unlocked and unsupervised computer for some minutes - this could make a significant difference in practical risk over the multi-year lifetime of a typical BIP39 seed.
Finally, there’s also the consideration with regards to BIP39 passphrases, which are an additional and highly recommended safety layer on top of the BIP39 mnemonic words to prevent the theft of funds. CVE-2023-27892 opens the door for two particular attacks against passphrases.

If a given hardware wallet is accessed/stolen by the attacker due to a lack of PIN protection (or by using a known PIN), even a moderately complex passphrase could prevent an attacker from discovering and using the custom passphrase-based wallet that holds some additional funds. An online brute-force attack against possible passphrases using the built-in firmware mechanisms is significantly rate-limited due to slow APIs, limited microcontroller processor speed for derivations, as well as physical confirmation steps, which results in very limited attack capabilities. Using CVE-2023-27892, an attacker can obtain the BIP39 seed and then scale offline brute-force attacks to an arbitrary number of powerful systems, making it much more feasible to determine the correct derivation with e.g. a dictionary-based attack.

In rare scenarios where the attacker temporarily gets access to a hardware wallet that is not just plugged in and unlocked, but also has a sensitive passphrase cached in-memory, the passphrase may also be revealed directly. This also applies to other volatile secrets in memory such as the PIN, but note that auto-locking and other functionality may interfere with this.
To summarize, CVE-2023-27892 does not benefit attackers who steal a PIN-protected KeepKey that is powered off, but it significantly increases attacker capabilities for delayed theft, circumvents no_backup mode guarantees, and enables BIP39 passphrase brute-forcing or direct retrieval, as well as other attacks in the case of temporarily unprotected and unsupervised devices.
Also noteworthy: this security issue may be beneficial to legitimate owners who have partially or completely forgotten/lost essential secrets of their configured devices. Under some conditions, it may be possible to recover secrets that are still in the device (see the previous paragraphs). Leveraging firmware up- and downgrade capability between vendor-signed official firmwares without the mandatory erasure of BIP39 seed secrets could help with this (disclaimer: perform at your own risk!). I’m looking forward to feedback from users in case this security research was helpful in particular recovery cases.
WARNING: use the provided PoC code at your own risk. The instructions will PERMANENTLY overwrite the configuration of the hardware wallet. Only test with an expendable unit.
- Configure the test device via keepkeyctl wipe_device and keepkeyctl load_device -l "poc_test" -m "keep key program problem process input result memory display defense broken inform", which is a custom seed with a valid checksum.
- Use firmware v7.5.2, which the PoC is prepared for.
- Run the PoC with the pyusb module installed.
- The Data payload #1 page reveals ep key program problem process input and the Data payload #2 page reveals result memory display defense broken in, with additional data following on the third page.

This disclosure was marked by significant delays and missing feedback when communicating with the vendor (KeepKey). Initially, they created a public patch for the issue on GitHub but did not respond to the confidential disclosure. After three weeks and two reminders, I got a direct response and technical confirmation, but then the contact broke off again and didn’t resume after multiple followups. Despite releasing public security patches and issuing a firmware release, I’m not aware of any public security notes or advisory by the vendor on this issue at the time of publishing of this blog post. This is a further regression in disclosure handling compared to the last disclosure process with this vendor (CVE-2022-30330, in 2022), and may be related to ownership and team changes of the KeepKey product.
In summary, the overall coordinated disclosure progress and publication handling was neither motivating on the researcher side nor overall adequate in my opinion.
In future disclosures, I’ll consider releasing my disclosure information sooner in cases where vendors silently fix security issues during the disclosure period, depending on the patch publication and software release circumstances.
Product | Source | Known Affected Version | Fixed Version | Patch | Vendor Publications | IDs |
---|---|---|---|---|---|---|
KeepKey | GitHub | firmware v.7.5.2 to v7.6.0 | v7.7.0 | PR337 | none | CVE-2023-27892 |
I want to emphasize that this research was done on my own time and initiative. In particular, it was not sponsored by SatoshiLabs, for whom I do some paid freelance security research on the related Trezor project.
Date | Information |
---|---|
2023-01-17 | Confidential disclosure to KeepKey |
2023-01-26 | KeepKey publishes GitHub Pull Request no. 337 with security patch |
2023-01-29 | POC and additional analysis communicated to KeepKey |
2023-02-05 | Followup email to KeepKey requesting feedback |
2023-02-06 | Issue confirmation by KeepKey |
2023-02-22 | GitHub Pull Request no. 337 is merged |
2023-03-06 | MITRE assigns requested CVE |
2023-03-07 | Release of KeepKey firmware v7.7.0 with security patch |
2023-04-17 | End of disclosure period |
2023-04-17 | Publication of this report |
2023-04-19 | Report: “Additional Attack Considerations” section extended |
At the time of the report publication, KeepKey had not offered a bug bounty.
This article covers two security issues that I found in the libykpiv library, affecting the 2.3.0 release.

Flaws in the memory handling of the auth handshake procedure with a PIV smartcard could lead to memory corruption, denial of service or other unexpected behavior under some conditions. The practical security impact on tested production binaries appears to be limited.
This article will describe the issues.
I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
The first issue is a code flaw related to insufficient length restrictions for smartcard-provided data. This issue is similar to previous libykpiv vulnerabilities and again leads to dangerous memory safety issues due to custom low-level memory handling.
The ykpiv_authenticate2() function performs a sequence of interactions with an external PIV smartcard, such as a Yubikey 5 device connected via USB, for smartcard actions that require authentication. During those steps, the host receives a cryptographic challenge from the smartcard:

While there is an upper bound on the received data that prevents any direct issues, the length recv_len of the reply is also reused for the cryptographic challenge from the host to the smartcard. This becomes an issue if recv_len is particularly large:
The manual memory management via custom pointer advances becomes a liability here, since the upper bound on recv_len is not sufficiently tied to how much data the apdu struct can hold at this point. As a result, _ykpiv_prng_generate(challenge, challenge_len) can end up writing behind the struct if the unchecked assumptions about the memory sizes are violated:
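Reduced to its core - with invented names and sizes, not the actual libykpiv structures - the problematic pattern looks like this:

```c
#include <stddef.h>
#include <stdint.h>

// Sketch of the bug class. The card-provided reply length passes only a
// coarse upper-bound check and is then reused as the outgoing challenge
// length, without re-checking the space left in the fixed-size struct.
struct apdu_sketch {
    uint8_t header[5];
    uint8_t data[255]; // space actually available for the challenge
};

// simplified prototype of the random fill helper
void _ykpiv_prng_generate(uint8_t *buf, size_t len);

void build_auth_response(struct apdu_sketch *apdu, size_t recv_len) {
    uint8_t *challenge = apdu->data;
    size_t challenge_len = recv_len;                // reused without clamping
    _ykpiv_prng_generate(challenge, challenge_len); // OOB write if too large
}
```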
Since _ykpiv_prng_generate() overwrites the target buffer with random data via OpenSSL’s RAND_DRBG_generate() function, a malicious smartcard that triggers this flaw doesn’t have control over the exact values that are written behind apdu on the stack during the out-of-bounds write, and the values will be different on each execution. This doesn’t mitigate the memory safety issue itself, but definitely makes it harder to manipulate the stack data in a controlled way.
Due to the compiler- and target-specific aspects of the program stack layout, it is difficult to make global statements about the expected security implications of the stack-buffer-overflow or lack thereof. To my knowledge, the yubico-piv-tool binary that includes the libykpiv library is always compiled with stack canary protections for production builds, which should turn any out-of-bounds write into the stack canary segment into a controlled program crash and therefore a denial of service. Some limited analysis of the situation for Linux x86_64 binaries of the yubico-piv-tool 2.3.0 release suggests that the OOB write can’t reach the stack canary segment and “only” overwrites stack memory of other local variables that are not used at this point.
Given the nature of libykpiv as a library intended for use within other applications, it’s plausible that the affected code is also in use with other build system configurations or compilers where those observations do not apply.
My current understanding is that this flaw cannot be used to hijack the execution flow of the program or manipulate essential internal variables in yubico-piv-tool 2.3.0 and will at worst cause a crash, but my confidence in this is limited due to the outlined complexity. For example, there may be additional variations of this attack by malicious smartcards which are aware of the mgm_key secret that is shared between host and smartcard.
ID | CVSS 3.1 Score | Parameters |
---|---|---|
ykpiv_authenticate2() stack OOB write | 2.9 (Low) | AV:P/AC:H/PR:N/UI:R/S:U/C:N/I:L/A:L |
Please note that this scoring assumes that there is a way to impact the availability of the libykpiv component, for example by writing into a segment of the stack memory that is protected by stack canaries or causing a segmentation fault, and that the attacker can get by without authentication secrets. During the disclosure process, we discussed Availability: High vs. Availability: Low impact scoring in such a scenario. Only the lower rating is reflected in the scoring above to accommodate the current uncertainty about the practical availability impact.
The second issue is a code flaw related to accessing a C variable’s memory content after its valid lexical program scope.
The ykpiv_authenticate2() function contains multiple code regions with locally scoped variables, as well as variables that are used across multiple regions. Consider the challenge pointer, which is defined early in the function:
As part of the challenge-response handshake, a locally scoped code region sets the challenge pointer to reference data in the apdu struct on the stack:
The problem now occurs in the following code, which uses the memory referenced by challenge:
While the challenge pointer variable itself is valid throughout the ykpiv_authenticate2() function, the stack memory it references has gone out of scope together with the apdu variable at that point. This leads to an AddressSanitizer: stack-use-after-scope error on debug builds with compiler sanitizers. AddressSanitizer warns on the memcmp(data + 4, challenge, challenge_len) call via __interceptor_memcmp, but the cipher_encrypt(mgm_key, challenge, challenge_len, challenge, &out_len) call should be affected by this as well.
There is no security mechanism to detect this in production builds.
In theory, the C compiler is allowed to make arbitrary changes to the referenced stack memory content once it is no longer in scope, e.g., to overwrite it with other variables or clear it. Using this memory again leads to undefined behavior.
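A minimal stand-alone reproduction of this bug class - unrelated to the libykpiv code itself - that AddressSanitizer flags when its use-after-scope detection is active:

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *challenge;
    {
        char apdu[16] = "challenge-bytes";
        challenge = apdu; // the pointer escapes the block scope
    }                     // lifetime of apdu ends here
    char reply[16] = "challenge-bytes";
    // Undefined behavior: reads apdu's stack memory after its scope ended.
    // Compile with -fsanitize=address to get a stack-use-after-scope report.
    printf("memcmp result: %d\n", memcmp(reply, challenge, 15));
    return 0;
}
```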
The stack-use-after-scope issue is triggered on each successful execution of ykpiv_authenticate2(). However, I’m not aware of any bug reports about functional issues in the handshake, which would be expected if there were a change in memory behavior, and nothing like that has been indicated by Yubico during the disclosure.

Therefore, I think Yubico got lucky with the bug behavior, since the relevant compilers for the production binaries apparently decided to leave the memory content intact long enough so that the logical program execution works as originally intended (likely because it is the fastest behavior). Since I’m not aware of a practical way for an attacker to influence this behavior and use it to their advantage, I’m handling it as a non-issue in terms of practical security impact on the 2.3.0 release.
The most worrying aspect to me is that this bug made it into a stable release without getting detected by static analysis tools or dynamic analysis at runtime despite being present during every smartcard authentication. This suggests that the associated test suites should be improved.
I have the impression that Yubico is currently not assigning a lot of resources or priority to security disclosure handling of their client-side open source libraries. During this coordinated disclosure process, it took almost two months to get a technical reply, and neither a release nor a patch was published during the 90-day disclosure timeframe as far as I’m aware.
The discovered security issues certainly aren’t the most severe, but memory corruption and undefined behavior issues are often difficult to classify as benign with a high certainty due to the amount of compiler- and architecture-related assumptions that may or may not hold in practice for all affected users. Since we didn’t identify a practical security impact on any of the tested production binaries, I decided not to ask for a CVE ID assignment at the moment.
In light of the other difficulties and delays observed during the previous disclosures to Yubico, the current situation is neither very encouraging for researchers who report issues nor adequately reducing the risk to end users via prompt security patches in my opinion.
To my knowledge, both regression issues were introduced after the 2.2.0 stable version release tag and are only present in the 2.3.0 stable release.
Variant | Source | Affected | Fix | References |
---|---|---|---|---|
Yubico upstream | GitHub | version 2.3.0 | version 2.3.1, patch 1 via PR402 | no known public references |
The previous libykpiv vulnerability article contains a list of related other sources such as Linux distributions.
Date | Information |
---|---|
2022-05-28 | Disclosure of issue to Yubico |
2022-06-01 | Yubico confirms receipt of disclosure |
2022-06-13 | Followup to Yubico to query disclosure status |
2022-07-21 | Yubico confirms the technical issue, describes some analysis, proposed CVSS scoring |
2022-07-30 | Reply to Yubico with technical discussion, feedback on proposed scoring, discussion about criteria for potential CVE assignment |
2022-08-08 | Yubico replies on proposed scoring |
2022-08-15 | Reply to Yubico with technical discussion, followup question on crashing behavior, discussion about criteria for CVE assignment |
2022-08-26 | End of 90-day disclosure period |
2022-08-29 | Publication of this article |
2022-10-03 | Yubico adds a patch for both issues to the public code repository |
2023-02-07 | Yubico releases patched libykpiv version |
The vendor did not offer a bug bounty.
The new discovery has implications for code execution attacks such as CVE-2021-31616, attacks with some level of physical access, as well as the general trust expectations for the wallet system integrity after the installation of unofficial firmware.
I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
The following article is highly technical, so here is a slightly less-technical summary.
The KeepKey hardware wallet has some basic protections in place to limit what some parts of its software can do. This gives trust in the device by making it harder to backdoor permanently via malware, similar to modern smartphone systems.
The new flaws in KeepKey protections that I discovered basically allow a “Jailbreak” of the KeepKey. The main program on the device can break out of the protective cage it is in. This may be useful for some power users who want more control over their device, but it’s also useful for attackers who temporarily made it onto the device somehow or have physical access and can install custom firmware. They can use these flaws to permanently corrupt the core device software.
A device with malicious core software no longer has to follow the normal rules. It could generate new mnemonic secrets that an attacker has access to, lie to you about installing updates or attack your computer via USB. It can also erase itself and stop working at any time. This is clearly a bad situation for trusting the device with funds, and the extra annoying part is that it is difficult to find out if a device is malicious, for example if you buy a new one tomorrow from a less-trustworthy seller. Unfortunately, the hologram stickers won’t help you and wiping the device storage or reinstalling the firmware is not enough.
My main recommendation is to swiftly install the new security patches.
However, if you have previously used firmware v7.0.3 on computers or websites you don’t fully trust, it may be a good time to read up on CVE-2021-31616, check your funds and change your mnemonic seed or device.
Be extra careful about new devices that you buy, as this vulnerability makes it cheaper for attackers to corrupt them.
This article focuses on breaking the security supervisor code implementation of the KeepKey hardware wallet. To understand the context, first a little primer on what this software component is supposed to be doing.
The ARM Cortex M3 microcontroller series does not have any multi-tasking capability or sophisticated process security concepts that one may expect from larger processors. Instead, the available hardware-assisted protections consist of a two-level privilege concept for code separation at runtime which is enforced through hardware-assisted privilege level handling and memory protection settings. The Trezor and KeepKey system designs use this privilege system to limit potential actions of malicious firmware, especially for the flash write operations, with the goal to harden the overall system or at least make security issues observable to the user. This is done through a software root-of-trust concept based on a trusted bootloader, combined with cryptographically signed firmware releases. The bootloader controls firmware updates, checks firmware signatures on device startup, and provides the code for the supervisor component that is active after boot.
Essential configuration steps during startup:
On the KeepKey, the supervisor logic mainly focuses on guarding flash operations. All flash writes of the firmware are proxied through the supervisor code via custom interrupts. The svc_handler_main() is tasked with the role of a gatekeeper for potentially dangerous accesses.
However, I’ve discovered that this code is broken in several ways, which completely undermines the sandbox design and allows the firmware to break out of it.
During security research in February 2022, I took a closer look at the supervise.c code and found several flaws. They are clustered into several sections with similar issue patterns.
The ARM Cortex M onboard flash is divided into a number of differently sized flash sectors. On the STM32F205 chip that the KeepKey uses, they have the IDs 0 to 11. Sector numbers go up to 23 on other STM32 chip series.
For technical reasons, the supervisor function call parameters of svc_* functions are typically passed as unsigned 32-bit integer variables during the interrupt handling. As a result, despite the limited numerical range that is actually required to describe the target sector, svhandler_flash_erase_sector() accepts and internally uses the full 32-bit uint32_t sector for describing the flash sector ID that should be erased.
This choice of parameter type is problematic.
The defensive code checks on the flash erase are designed to reject the three specific sector numbers of 0, 5 and 6 that correspond to important flash areas for the bootloader and for the microcontroller configuration that are exclusively controlled by the bootloader. Aside from the three numbers on the blocklist, they allow the main firmware to request erasures of all other sectors.
Here is the corresponding code:
The sector erase is done via a libopencm3 library call:
Crucially, the libopencm3 library function is defined as follows:
Why is this a problem?

svhandler_flash_erase_sector() treats the sector number as an unsigned 32-bit number and incorrectly expects the flash library function to count the same way. Instead, the difference in the sector integer type leads to a well-defined but lossy unsigned integer conversion of the sector number down to the uint8_t type before it is handed over to the library function.

This conversion maps multiple larger numbers onto the forbidden sector numbers 0, 5 and 6.
An attacker can use this to completely bypass the defensive checks shown previously. For example, a deletion request for sector 256 passes the checks but then actually asks the library to erase the forbidden sector 0.
Using this flaw, malicious firmware can request the erasure of any flash sector.
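The truncation is easy to demonstrate in isolation. The uint8_t parameter below matches the libopencm3 prototype quoted above; the rest is a simplified sketch of the check logic, not the verbatim supervisor code:

```c
#include <stdint.h>
#include <stdio.h>

// Stand-in with the same parameter type as the libopencm3 prototype.
static void flash_erase_sector(uint8_t sector, uint32_t program_size) {
    (void)program_size;
    printf("library erases sector %u\n", sector);
}

static void svhandler_flash_erase_sector_sketch(uint32_t sector) {
    // Simplified blocklist check on the full 32-bit value...
    if (sector == 0 || sector == 5 || sector == 6) {
        printf("request for sector %u rejected\n", sector);
        return;
    }
    // ...but the argument is silently truncated to 8 bits in this call:
    // 256 -> 0, 261 -> 5, 262 -> 6.
    flash_erase_sector(sector, 0);
}

int main(void) {
    svhandler_flash_erase_sector_sketch(0);   // rejected by the blocklist
    svhandler_flash_erase_sector_sketch(256); // erases forbidden sector 0
    return 0;
}
```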
During analysis of the erase problem, I found a similar problem in the Trezor One code. It uses a uint16_t sector variable that theoretically has the same integer conversion problem during the flash_erase_sector(sector, FLASH_CR_PROGRAM_X32) call.
However, the Trezor code uses an allowlist approach for the sector checks, which doesn’t let any problematic values through:

Sectors 2 and 3 don’t have a conversion problem; therefore, the Trezor One is not practically affected via this issue.
The KeepKey supervisor interface has two functions for flash writes:

- svhandler_flash_pgm_word() for writing individual 32-bit words to flash
- svhandler_flash_pgm_blk() for writing larger blocks of memory to flash

VULN-22005 concerns the block write functionality.
The code has existing defenses that detect overflows of the address calculation. It also checks that the beginAddr and beginAddr + length pointers are not in the forbidden memory regions of sectors 0 or 5 & 6.

Here is the first part of the code checks:
However, these defenses are incomplete. They do not prevent a situation where beginAddr points in front of the forbidden region and beginAddr + length points behind it. In other words, whole bootloader sections can be overwritten as long as at least one extra byte in front of and behind them is also overwritten.

Using this flaw, malicious firmware can modify protected flash memory in bulk.
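In simplified form - again not the verbatim supervisor code - the check pattern and its hole look like this:

```c
#include <stdbool.h>
#include <stdint.h>

// Illustrative protected region; real sector addresses differ.
#define PROTECTED_START 0x08000000u
#define PROTECTED_END   0x08003fffu

// Both endpoints are checked individually, but a range that starts before
// the protected region and ends behind it fully contains the region and
// still passes the validation.
bool write_range_allowed(uint32_t begin_addr, uint32_t length) {
    uint32_t end_addr = begin_addr + length; // overflow handled elsewhere
    if (begin_addr >= PROTECTED_START && begin_addr <= PROTECTED_END)
        return false;
    if (end_addr >= PROTECTED_START && end_addr <= PROTECTED_END)
        return false;
    return true; // hole: begin_addr < PROTECTED_START && end_addr > PROTECTED_END
}
```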
Similarly to svhandler_flash_pgm_word(), the block write has the typical limitations when writing data to physical flash memory, which means it can only change flash memory bits from 1 to 0. If this were the only vulnerability a malicious firmware had access to, modifications would be limited to flipping bits in one direction in the existing flash data contents. However, this attack can be combined with vulnerability VULN-22004 from the previous section, which makes the data limitation go away: by first erasing the targeted flash region and then overwriting it, memory content can be modified arbitrarily.
During practical testing, writing into sector 0 using svhandler_flash_pgm_blk() does not work. The attack requires at least one write operation in front of the targeted sector. However, the required flash write in front of sector 0 is not seen as valid by the microcontroller, and the operation gets stuck. The memory in front of sector 0 is “reserved” according to the datasheet. It may be possible to circumvent this problem by using some other undocumented edge case behavior. However, I haven’t explored this edge case further after the discovery of another attack that doesn’t share this limitation.

Writing over the combined sector block 5+6 works as described, see the proof-of-concept.
While looking into additional problems of VULN-22005, I noticed that the arbitrary pointer “write data from the source to the destination” construction of svhandler_flash_pgm_blk() and the “write this value to the destination” construction of svhandler_flash_pgm_word() are very powerful primitives.

The blocklist-based defense has been shown to be incomplete - are there other ways to misuse these functions?

After digging a bit deeper, I realized that one needs to view them as privileged memory write gadgets (both functions) or a privileged memory read gadget (via svhandler_flash_pgm_blk()).
This is because the STM32 uses memory-mapped IO to write to the flash and has one continuous memory region.
In other words, the microprocessor generally treats flash content as normal memory and writes to it word-wise with direct assignments, or smaller writes if necessary. Therefore, the libopencm3 flash functions can essentially be used to write or read any other data in the STM32 address space if they’re called with target pointers outside of flash space.
For example, flash_program_word() essentially prepares the flash write, unlocks the flash and then does a simple write:
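(Paraphrased from the libopencm3 sources and lightly abridged; consult the upstream code for the exact version.)

```c
void flash_program_word(uint32_t address, uint32_t data)
{
    /* Wait until any previous flash operation has finished. */
    flash_wait_for_last_operation();
    flash_set_program_size(FLASH_CR_PROGRAM_X32);

    /* Enable flash programming. */
    FLASH_CR |= FLASH_CR_PG;

    /* The actual write: a plain memory-mapped assignment. */
    MMIO32(address) = data;

    flash_wait_for_last_operation();

    /* Disable flash programming again. */
    FLASH_CR &= ~FLASH_CR_PG;
}
```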
Crucially, the MMIO32(address) = data; assignment succeeds even if the address is not in flash-related memory space. svhandler_flash_pgm_blk() works similarly and can also be used to copy secret information out of protected memory.
Since this write operation happens in the context of the privileged bootloader code, it does not fall under the restrictive MPU protections for the unprivileged thread. This is a huge problem for supervisor integrity. The supervisor operates on its own small memory stack that's protected by the MPU from interference by the main firmware:

The memory region protection falls apart if the main firmware can make the privileged thread corrupt its own stack with targeted writes. This has a significant impact on the bootloader code integrity at runtime. The practical impact may be limited a bit by stack protection and other defenses, but those can likely be circumvented through additional writes.
Additionally, in the global address space of the STM32, important device control registers are memory-mapped to special positions. The unprivileged firmware can access them through the same flaw, for example the flash controller.
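A hedged sketch of such an access; the FLASH_CR register address is taken from the STM32F2 documentation, while the svc_flash_pgm_word wrapper name is my assumption:

```c
/* FLASH_CR sits at the flash interface register block (0x40023C00)
 * plus offset 0x10 on the STM32F2 family. */
#define FLASH_CR_ADDR 0x40023C10u

/* The privileged word-write gadget pointed at a peripheral register
 * instead of flash memory (wrapper name hypothetical): */
svc_flash_pgm_word(FLASH_CR_ADDR, attacker_chosen_value);
```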
This can have additional impact, although the MPU still protects some parts of the flash, so there is a remaining barrier against direct modifications of sector 0.
How can we break the remaining defenses? The explicit memory region defense logic of the mentioned flash write functions assumes that there is only one canonical way to address and overwrite the protected flash sections. However, this assumption is wrong: as the STM32F205 datasheet hints at on page 66, other memory regions such as 0x0000 0000 to 0x000F FFFF can alias into the flash memory range. Here is a helpful visual overview of the relevant memory regions.
What does this mean? Depending on the microcontroller system configuration, the lower memory ranges map directly into flash memory, just as the "main" flash memory section starting at 0x08000000 does. The main difference is that the supervisor flash functions forbid access to the protected sectors in the 0x080.... region due to the address comparisons, but they completely allow all writes to the 0x000.... region.
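A minimal sketch of the address arithmetic, assuming the boot alias maps the main flash at offset zero (all names are mine):

```c
#define FLASH_MAIN_BASE  0x08000000u  /* canonical flash mapping      */
#define FLASH_ALIAS_BASE 0x00000000u  /* boot alias of the same flash */

/* A protected bootloader word in sector 0... */
uint32_t protected_addr = 0x08001234u;

/* ...is also reachable through an address the blocklist never checks: */
uint32_t aliased_addr = protected_addr - FLASH_MAIN_BASE + FLASH_ALIAS_BASE;
/* aliased_addr == 0x00001234, the same physical flash cell */
```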
Bingo! We’ve just broken the remaining bootloader and trusted boot code integrity defenses.
At this point, I would like to give credit to Thomas Roth and the rest of the wallet.fail team. They published this memory-alias-based attack concept as part of the F00DBABE attack in 2018; see the talk section of their classic 35C3 presentation. I half-remembered, half re-discovered this on my own for the KeepKey, but their work is clearly a direct inspiration for the attack idea.
By making the privileged thread write into the aliased flash region, the write protections for sectors 0, 5 and 6 are circumvented without the strict need for special offsets or complete sector overwrites. This allows more targeted overwrites of individual areas than the previously described VULN-22005 vulnerability.
As a result of this attack, the complete flash memory can be replaced with arbitrary contents, which breaks the core security model of the KeepKey root of trust.
Please read the following section carefully.
By the nature of the KeepKey hardware wallet design, access to SWD and other debug interfaces is permanently disabled on production devices and production firmware. This is done with the explicit goal of preventing read or write access to the flash. As a result, there is no intended or straightforward way to recover from problems with the boot-related flash memory.
Testing the issues discussed in this article directly requires erasing or modifying flash content in those essential sectors, so there is a good chance that you’ll permanently turn your test device into a dead device. No, it’s not resting - it’s stone dead! 🦜.
To prevent any devices from passing on due to catastrophic flash writes, it is required to both have a working hardware debugger setup and an STM32F205 microcontroller that is not in RDP2 state. A custom KeepKey devkit can be built by SMD rework, specifically by replacing the TQFP64 chip with a new chip in factory configuration and programming the custom bootloader and firmware variants.
In this configuration, a hardware debugger like the STLINK-V3 can be connected and used to restore flash contents externally as well as to control the execution. Note that the MPU and thread privilege mechanisms are still active; the unit is just at RDP0 debug protection level. The POC section describes testing steps with such a setup.
The following proof-of-concept steps will be deadly to your device unless you have working hardware debugger access. You have been warned.
This is a combined proof-of-concept for two issues.
For VULN-22004, the sector number 261 is used to target sector 261 % 256 = 5.
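A hedged sketch of the triggering call, assuming an unprivileged svc_flash_erase_sector wrapper around the supervisor interface (the wrapper name is my assumption):

```c
/* VULN-22004 trigger: the value 261 passes the sector blocklist
 * comparisons, but is truncated to uint8_t on the erase path, so the
 * hardware erases protected sector 261 % 256 == 5. */
svc_flash_erase_sector(261);
```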
For comparison, the following call with firmware-level access would lead to a memory exception due to the MPU:
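(Sketched with the libopencm3 API; the exact call in the original proof-of-concept may differ.)

```c
#include <libopencm3/stm32/flash.h>

/* Called directly from the unprivileged firmware thread, the MPU denies
 * the flash controller access and the core takes a memory management
 * fault instead of erasing anything: */
flash_erase_sector(5, FLASH_CR_PROGRAM_X32);
```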
See the high-level summary. The discovered KeepKey issues apply to all recent bootloader versions, since the problems in supervisor.c have been present for multiple years.
The coordinated disclosure went similarly to the VULN-22003 disclosure that started slightly earlier in February with the same vendor. I received a lot of good feedback and confirmation in a technical call about two weeks into the disclosure.
Unfortunately, there was a significant gap in the communication in April where I was unable to reach them via multiple communication channels. As a result, I did not have a chance to comment on their patch set before the release or coordinate with them on a publication date. It’s good to see that they still released a firmware fix and public acknowledgment within the 90-day timeframe. I have been able to re-establish communications in May.
I’m looking forward to the full vendor advisory, which has not been released at the time of writing.
Product | Source | Known Affected Version | Fixed Version | Patch | Vendor Publications | IDs |
---|---|---|---|---|---|---|
ShapeShift KeepKey | GitHub | bootloader ≤ bl_v2.0.0 | bootloader bl_v2.1.4 | patch1 | bl_v2.1.4 + v7.3.2 GitHub Changelog | CVE-2022-30330 VULN-22004, VULN-22005, VULN-22006 |
I'm not aware of other hardware wallets that are practically impacted by these issues.
Please note that I've included SatoshiLabs in the disclosure communication due to the Trezor One product: after finding a minor code issue, I wanted to ensure that there are no practical vulnerabilities on the Trezor side, where some of the code originated from. Ultimately, the Trezor One did not have any practical issues and we did not switch to a full multi-vendor format for the coordinated disclosure. This approach was discussed with both vendors.
I want to emphasize that the main work for this security research was done on my own time and initiative. In particular, the original research that led to the discovery of the issue was not sponsored by SatoshiLabs. With ShapeShift's agreement, I spent some paid hours on extended background research to evaluate the potential security impacts of related issues on the Trezor project for SatoshiLabs.
Date | Information |
---|---|
2022-02-23 | Confidential disclosure to ShapeShift, with CC to SatoshiLabs |
2022-03-10 | Technical call with ShapeShift, ShapeShift acknowledges the issues |
2022-04-26 | ShapeShift releases patched bootloader version bl_v2.1.4 together with firmware v7.3.2 |
2022-04-26 | ShapeShift publishes a short advisory summary via the GitHub tag description |
2022-05-07 | CVE-2022-30330 assigned by MITRE |
2022-05-18 | Publication of this blog article |
ShapeShift paid a bug bounty for this issue.
I'm a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
The details of this issue revolve around low-level concepts and implementation details in the KeepKey firmware. The code area in question has been the source of other serious issues before, for example CVE-2019-18671, and was originally derived from the Trezor One firmware several years ago.
The limited hardware capabilities of the KeepKey wallet and its finite state machine (FSM) message handling require strong restrictions on how new logical tasks can be scheduled or interrupted to prevent errors. For user-facing tasks such as the confirmation of cryptocurrency transactions, it is also meaningful to prevent user interface actions from interrupting each other. This helps keep the flow simple and unambiguous for the user while important actions such as transaction confirmations are performed.
In the codebase, this is implemented by the separation of communication messages into two classes: normal and tiny messages. Normal messages can trigger complex new tasks. In contrast, tiny messages are focused on essential user input and cancellations. If a message-related interaction is needed during a complex action, the global message handling is restricted to tiny messages, avoiding major interruptions and other problems. In the code, a global Boolean state variable determines if message processing is restricted or not.
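As a rough illustration of this gating pattern (my sketch; the type and function names are hypothetical, not the actual KeepKey code):

```c
#include <stdbool.h>

typedef struct Message Message;                 /* hypothetical message type */
extern bool is_tiny_message(const Message *m);
extern void handle_message(const Message *m);

/* Global restriction flag, as described above. */
static bool msg_tiny_flag = false;

/* While a complex action is running, only "tiny" messages (essential
 * user input and cancellations) are let through. */
void msg_dispatch(const Message *msg)
{
    if (msg_tiny_flag && !is_tiny_message(msg)) {
        return; /* drop normal messages to avoid major interruptions */
    }
    handle_message(msg);
}
```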
The KeepKey developers made a number of changes to the USB packet handling code after adopting it from the original codebase in 2014. One of the changes was to rename the global tiny variable to msg_tiny_flag. The msg_tiny_flag was still used in an identical role for global message state handling after the rename.
In 2018, the KeepKey developers adopted U2F support (a two-factor authentication protocol) based on code from the Trezor. During this software port, they apparently missed the difference in the message handling on the KeepKey side and re-introduced a global tiny variable for use with the U2F code:
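(Sketched from the description above; the exact declarations in the codebase may differ.)

```c
/* The effective result of the port: two independent flags guarding what
 * should be a single global message restriction state. */
static bool msg_tiny_flag = false;  /* KeepKey message handling      */
static char tiny = 0;               /* re-introduced by the U2F port */

/* Code paths that set msg_tiny_flag do not restrict the U2F handlers,
 * which consult only tiny, and vice versa. */
```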
As a result of the double state handling in the KeepKey, the message restrictions of the U2F functionality and of all other message handling functionality are independent of each other and don’t lock in the originally intended way. This means that U2F actions and U2F dialogs can still be invoked while other functionality of the KeepKey has triggered the restricted message mode, and the same is true in the other direction as well.
The possibility to interrupt important dialogs and button confirmations breaks user assumptions about the basic interaction with the device. It can be leveraged to trick the user into interacting with the new dialog B that pops up while dialog A is supposed to be ongoing. Fortunately, this attack is limited to U2F <> non-U2F dialog combinations.
The most relevant attack scenario that I could find is related to a two-factor authentication (2FA) bypass with user interaction:
Note that this attack is based on a number of preconditions with regards to the host malware capabilities, known information, existing use of the KeepKey as U2F hardware token and user interaction. It also operates at the edge of what U2F is normally protecting against, since most U2F tokens do not have a screen to show what is being accepted. Still, without the discovered message handling vulnerability, this attack scenario would not be possible.
In their current patch, the KeepKey developers have not solved the problem of the double message state handling. Instead, they have chosen to apply a partial mitigation for the described U2F bypass attack by preventing U2F dialogs from getting auto-accepted immediately. This is definitely an improvement, but doesn’t fully resolve the underlying issue in my opinion.
The mapping shown here aims to represent the impact of the U2F bypass scenario described above, but the issue is difficult to score. There may be different impact through other attack combinations.
Description | CVSS 3.1 | Score |
---|---|---|
VULN-22003 | CVSS:3.1/AV:L/AC:H/PR:L/UI:R/S:U/C:L/I:L/A:N | 3.3 (Low) |
The disclosure process with ShapeShift started out well, with good direct feedback. Unfortunately, there was a significant gap in the communication in April where I was unable to reach them via multiple communication channels. As a result, I did not have a chance to comment on their patch before the release or coordinate with them on a publication date. Still, it's good to see that they released a firmware fix and public acknowledgment within the 90-day timeframe. I have recently heard back from them in May.
Product | Source | Known Affected Version | Fixed Version | Patch | Publications | IDs |
---|---|---|---|---|---|---|
ShapeShift KeepKey | GitHub | v7.2.1 | v7.3.2 | patch1 | v7.3.2 Changelog | VULN-22003 |
I’m not aware of other affected hardware wallets.
Note that I've included SatoshiLabs in the disclosure communication to ensure that there are no related vulnerabilities on the Trezor side, where some of the code originated from. We did not find an issue in the Trezor One product that required switching to a multi-vendor format for the coordinated disclosure.
I want to emphasize that this security research was done on my own time and initiative. In particular, the original research that led to the discovery of the issue was not sponsored by SatoshiLabs, for whom I do some paid freelance security research on the related Trezor project.
Date | Information |
---|---|
2022-02-09 | Confidential disclosure to ShapeShift, with CC to SatoshiLabs |
2022-02-10 | ShapeShift acknowledges receipt of the disclosure and assigns a VULN ID |
2022-03-10 | Technical call with ShapeShift |
2022-04-26 | ShapeShift releases patched firmware version v7.3.2 |
2022-05-05 | Publication of this blog article |
ShapeShift paid a bug bounty for this issue.
I'm a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
What is EMFI? By applying a short electric pulse through a coil, a localized and strong electromagnetic field can be generated. This field injects voltages into inner parts of electronic chips that are near some parts of the coil, which triggers all kinds of side effects that the chip designers did not expect or want to happen. For some processors, EMFI can be used to alter the program behavior in ways that are interesting from a security perspective, such as skipping CPU instructions.
There are other fault injection methods like voltage glitching or clock glitching. Typically, they are based on interfering with electrical connections and components that are exposed outside of the chip. These techniques have a lot of value as well, and I use them via the ChipWhisperer (also made by NewAE), as described in other articles. However, EMFI differs from them in a few significant ways:

As a result, EMFI can be used to trigger some effects that are hard or impossible to reach with more classical techniques.
I saw the PicoEMP project release in January and quickly decided to build one. I ordered the PCB, PCB stencil and electronic parts for both the device and a number of custom coil probes. My choices were mainly based on the hardware bill of materials (BOM) recommendations in the project repository and what was available at electronics distributors at the time. Despite the component shortages, I managed to get suitable parts for everything.
The PicoEMP generates ↯ high voltage ↯ and may permanently harm you, your environment or any electronics around it. The following build photos and remarks have not gone through exhaustive safety checks, so assume that they are wrong and use your own judgment. This article is not at all a step-by-step tutorial and there are many aspects of a DIY project that can go catastrophically wrong.
Without warranty of any kind. Build and use at your own risk.
The assembly starts with the PCB. Due to known issues with the design files, the ordered PCB is missing the two milled slots near the high-voltage area and for the two enclosure tabs. I was able to work around those issues and still mount the safety shield (as explained later), but I recommend double-checking that the PCB manufacturer in question handles the design correctly in the manufacturing preview.
The rest of the board looked acceptable, so I went ahead with the build.
Since most of the components are SMD parts, I ordered a matching solder mask stencil to distribute the solder paste.
After applying the paste, the PCB looks like this:
With the paste applied, I started to place the components for the right-hand side of the PCB. The large component footprint in the center is for the pre-made Raspberry Pi Pico (RP2040) board, which I decided to hand-solder. It is not included in the solder paste steps and is added later.
Build suggestion: watch out for the correct orientation of the LEDs, as they can look visually different on the front even when oriented the same way, and the silkscreen direction marking is easy to miss. I think the switches only have to be correct left-right and are wired identically on both sides, but you should verify this yourself.
After the components of the right-hand side were in place, I used a hot air station to melt the solder paste.
A proper reflow oven would be more reliable and consistent, but the DIY technique works for prototypes like this.
The SMD switches are sensitive to heat, so I will likely hand-solder them with a soldering iron next time.
Next I started placing the components for the high voltage section.
Build suggestion: this requires some attention to detail, especially with the direction of the transformers, diodes and the phototransistor.
The hot air station soldering of the components went fine, except for the optional switch SW3 and pin connector J3, which deformed a bit in the heat but are still functional. As mentioned before, on potential future builds I will likely hand-solder these components to avoid heat damage.
The next photo also has the SMA connector soldered on at the left board edge and protected with white shrink-wrap:
On the right side of the PCB, the Raspberry Pi Pico and the battery connector are soldered on:
Due to the milling issue, it was necessary to customize the protective plastic enclosure by stripping off its tabs. The screws that come with the enclosure don’t fit into the mounting holes properly, so I’ve used a pair of smaller screws that do not have this problem.
Build result:
I've flashed the microcontroller with the C firmware, which worked without an external programmer thanks to the mass storage support. Once up and running, the PicoEMP works and successfully glitches targets. Great!
The only bug I’ve noticed so far is that the arming button is unreliable on my unit. I think this is a software problem, but will re-check the electrical button behavior at some point to debug this further.
The previous section covered the build process of the main device, but there are additional parts required for fault injection, namely probe tips. I’ll show some of them here as well.
The PicoEMP can be fitted with a number of different electrical coil probes. It is important that they are exchangeable since otherwise the PicoEMP would be limited to one specific coil characteristic. In general, the probes are mainly designed around their SMA connector that allows easy swapping, the coil with its ferrite material in various configurations and some protective shrink wrap.
To summarize a complex topic, different coils are required for individual injection targets and use cases. The coil dimensions, coil type, number of windings, winding direction, number of layers, ferrite core form and other design aspects play a role in the strength and size of electromagnetic field that is generated. The physics details go beyond this article, but the injection tips section of the upstream documentation is a good place to start if you’re looking for component ideas.
Here are some probes in various stages of assembly:
For my second set of custom probes, I decided to try out different variants of designs and also include some physically smaller probes for more accurate injections.
The PicoEMP is a really interesting DIY tool for advanced uses in hardware hacking, and I think it is great that it is available under an open license and accessible. The DIY nature of the tool also means that you have to get your hands dirty to get one (at least at this stage). Since the complex subject of EMFI needs a lot of experimentation time in any case, there are benefits in getting to know the low-level details of the device operation and probe design.
I have some future plans and ideas for potential tests of the device and experimental improvements, which may give some material for a second article at some point in the future.
I'm a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
The PhyWhisperer is a special tool for USB-related hardware security research. Its main selling point is the ability to quickly sniff and trigger on USB packets at USB 2.0 speed through the use of an FPGA, which is difficult to achieve with other tools in this price class. While computers can sniff their incoming and outgoing USB traffic fairly easily, the typical delays make it difficult to act upon that information with other equipment. Like other tools from NewAE, the PhyWhisperer is very open and both the software and the hardware schematics are public, which I think is really useful for complex equipment.
For this article, the most relevant functionality of the hardware is its ability to turn the 5V USB target power supply on and off under software control from the host computer. This is just a side feature, but it is very useful during research when frequent restarts of a USB target device are required.
I used the PhyWhisperer in this USB power switch role while experimenting with voltage glitching based fault injection on a connected USB target via the ChipWhisperer device. The ChipWhisperer glitch line was directly hooked up to the "Shunt Out" port and I used a custom variable shunt resistor. Due to errors in the glitching setup, the PhyWhisperer was exposed for some time to quickly repeating voltage glitches on the target's USB power supply line, which periodically shorted 5V USB to ground over a ~3 Ohm resistor.
I expected the PhyWhisperer to be robust against this sort of condition given its design; after all, the internal components (described later in the article) are designed to detect and limit repeated short circuits without going up in smoke. Unfortunately, that was not the case. One of the takeaway messages is that while the intended use case of the PhyWhisperer includes some basic side-channel measurements over a voltage shunt right in the device, that port is not designed to be used for fault injection.
After the mentioned incident, my PhyWhisperer unit did not work correctly anymore. The main symptom was the inability to power the USB target via the external USB port. So the natural next step was to open up the device and see if I could repair it with the help of the existing public design documents, and learn more about the device and its limitations in the process.
Disclaimer: Using and repairing a broken device like this can be dangerous for you or your equipment, even if it “just” involves 5 Volt DC. Do this at your own risk.
My first round of analysis was based on a visual inspection and some limited electrical probing of the PCB, focusing on the polyfuses, USB-related traces and components as well as power regulators, since they were potentially stressed to a breaking point.
My initial inspection didn’t show any obvious damage. I thought about checking some of the electrical characteristics of essential components and voltage rails, but this is very time-consuming.
For some faults, one can take a debugging shortcut by checking the device with a thermal camera. The basic idea is to identify misbehaving components, partial shorts and other issues by looking for unusual heat spots on the PCB while it is in operation.
The camera used here is "just" an entry-level Seek Thermal Compact camera that plugs into a smartphone. It has a lot of sensor noise, general design limitations, unreliable absolute temperature readings and a mediocre smartphone application, but the manually adjustable focus and decent pixel resolution (for its price range) still make it interesting for this sort of occasional device analysis.
In the following grayscale images, bright pixels indicate higher temperatures, while darker pixels indicate lower temperatures or reflective metal surfaces. The absolute temperature readings are unreliable.
Closer visual inspection shows that clearly something isn't right with the U4 component from the last thermal image. Following the surrounding traces suggests that it is related to the USB power switching functionality, without even looking at the schematic. Its chip package is warped, likely because the internal silicon shorted out, which would explain the heating when powered. The package damage was not yet present when I first looked at the board and showed up during the intermediary testing as the chip heated up more.
Knowing the broken component, I set out to learn more about the chip, find a spare and replace it. Additional electrical checks and remaining functionality suggested that the other chips of the device behaved normally.
The official schematic lists U4 and its twin neighbor U5 as the Diodes Incorporated AP22802 in a SOT-25 package.
Manufacturer datasheet description:
The AP22802 is a single channel current-limited integrated high-side power switch optimized for Universal Serial Bus […]
To summarize, the two switches control whether the USB power to the target is supplied from the external connector (via U4), from the USB host that controls the PhyWhisperer (via U5), or from none of those sources (= power off). In theory, both power sources could be enabled at the same time, but that would be a problem in case of any supply voltage mismatches, so the microcontroller control logic ensures at most one of the switches is active at any time.
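A minimal sketch of such break-before-make control logic (entirely illustrative; the pin names and helper functions are my assumptions, not the PhyWhisperer firmware API):

```c
enum power_source { POWER_OFF, POWER_EXTERNAL, POWER_HOST };

enum { EN_U4_EXTERNAL, EN_U5_HOST };        /* hypothetical enable pins */
extern void switch_enable(int pin);
extern void switch_disable(int pin);

/* Disable both switches before enabling one, so the two supplies can
 * never drive the target rail at the same time, even transiently. */
void set_usb_power_source(enum power_source src)
{
    switch_disable(EN_U4_EXTERNAL);
    switch_disable(EN_U5_HOST);

    if (src == POWER_EXTERNAL) {
        switch_enable(EN_U4_EXTERNAL);
    } else if (src == POWER_HOST) {
        switch_enable(EN_U5_HOST);
    }
    /* POWER_OFF: leave both switches disabled. */
}
```

Disabling both switches first guarantees that the supplies never overlap, which is the property the real control logic needs to provide.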
Early in the repair process, I checked for alternative component options to see if there are other parts with a similar pinout that have additional protection features or lower current limits, but did not find any ideal candidates.
During this process, I noticed in the datasheets that the AP22802 may be problematic if used in a dual switch arrangement. Similar to other high-side USB power switches from other manufacturers that I looked at, its design assumes that it is alone on the USB power supply line and can always pull it to ground with a ~100 Ohm resistor when it is not supplying power on its own. This discharge mechanism is used to empty any capacitors that are on the power supply line.
In the dual switch arrangement of the PhyWhisperer, a hard-wired discharge effect is counterproductive, since one switch partially shorts out the line (~100 Ohm) at the same time that the other switch is supplying power. Additionally, the PhyWhisperer has its own dedicated circuit to do discharges if necessary, so the switches themselves don't have to do this.
I reached out to NewAE via email to clarify whether there is a known problem and whether AP22802AW5-7 was in fact the right replacement part, but didn't hear back from them before the first replacement part order arrived some days later. Therefore, I went ahead and replaced the broken chip with my hot air rework station, hoping the replacement part would be good enough.
After replacing the broken component, the PhyWhisperer was working and USB target devices could be powered through U4 again. Yay!
However, something was still wrong after the first repair. In some switch configurations of U4 and U5, specifically when U5 is supplying the power, the replaced U4 chip still got hotter than expected. It wasn't getting very hot, but the issue was clearly visible on the thermal camera. At the same time, I was aware that the new chip marking XA 1K B didn't match the VW 6 yA of the chip I had replaced. This all pointed to the previously discovered fact that the replacement part was likely suboptimal.
Additional datasheet searching confirmed that the part number of what shipped with hardware revision Rev 04A corresponds to the Diodes Incorporated AP2171A switch, which does not have the discharge functionality built in. So the official NewAE schematic was incorrect and used an outdated part number. Grr. Schematics are not a great help during repair if they lead people in the wrong direction.
To ensure I didn’t miss any important details, I wrote a post on the issue in the official NewAE forum. This was quickly answered and led me to errata documentation that confirmed the expected technical details.
After this, the path was clear for another order of replacement parts, this time the AP2171AW-7. Once the second replacement component arrived, I did another round of hot air rework and replaced U4 again. This time, the part number prefix VW matched the one it originally shipped with.
While the repair took a significant amount of time, technical deep dives like this can also be a chance to learn a lot about the capabilities and limitations of your equipment. This is especially helpful when using experimental equipment in more or less unintended or untested ways, which happens a lot during hardware hacking.
I have the impression that a future hardware revision of the device could be more robust against electrical fault conditions by actually checking the fault indicator pin of the USB switches and shutting down power in case of anomalies, but there may be other reasons why the existing design has not done this. I may revisit this topic in the future for other USB-related projects.
As a result of the repairs, my PhyWhisperer is fully operational again and works without internal heating short circuits, which I’m happy about. If anyone runs into similar issues in the future, I hope this article can shed some light onto the topic.
(V1.3).
I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
This covers the basic components of the wallet, which helps in understanding how it operates. The core electronics are located under a soldered RF shield and are not yet visible; opening this up may follow at a later time. Notably, the wallet is still operational after the steps shown here, so it appears that there are no tamper detection sensors in this part of the hardware.