devise-two-factor is a Time-based One-time Password (TOTP) library. With the help of Chris MacNaughton, we confirmed the vulnerability and informed the upstream vendor of the library.

I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
A popular Ruby TOTP two-factor server-side library lacks built-in protections against brute force attacks on the verification step. Due to design limitations of the underlying standards, this allows bypassing the second factor in targeted attacks against some web applications. The brute force attacks are practical in common configurations and take a few days or less to break into an account for which the main password is known.
Two-factor authentication (2FA) mechanisms are security systems designed to make account takeovers harder to accomplish and more difficult to scale, while at the same time having a bearable overhead cost to the users they’re protecting.
As such, 2FA mechanisms often have a number of trade-offs which weaken their overall effectiveness in order to make them easier to use.
The Time-based One-time Password (TOTP) standard at the center of this vulnerability is a popular 2FA mechanism specified in RFC6238. TOTP is basically a time-based adaptation of the HMAC-based One-Time Password (HOTP) algorithm standard as specified in RFC4226, and has inherited most of HOTP’s general design. If you have a smartphone security app that shows you some simple secret code which changes every 30s and can be typed into websites to confirm your login, TOTP is probably the standard that you’ve been using.
For this article, the most relevant security design choice of TOTP/HOTP was to keep the “one time password” secret very short and low in complexity. The commonly used TOTP default parameters require just six numerical digits for this temporary password.
Most readers are likely aware that a 6-digit account password is very insecure under normal conditions. If there are no consequences for wrong guesses, an attacker can simply try every number combination between 000000 and 999999 and find the valid password - a so-called “brute force” attack. Compare this to the short numerical PIN on your banking card: if the bank’s ATM system enforces no limit on incorrect PIN attempts, a persistent thief with your card can circumvent that second factor fairly easily by trying combinations for a while. It’s only the restriction on a few wrong PIN attempts before blocking the card that makes this a reasonable system.
The authors of the HOTP standard were fully aware of this design limitation, which they described in section 7.3 of their RFC:
Truncating the HMAC-SHA-1 value to a shorter value makes a brute force attack possible. Therefore, the authentication server needs to detect and stop brute force attacks.
They described two potential countermeasures that implement a lockout or delay based defense:
We RECOMMEND setting a throttling parameter T, which defines the maximum number of possible attempts for One-Time Password validation. […]
Another option would be to implement a delay scheme to avoid a brute force attack. After each failed attempt A, the authentication server would wait for an increased T*A number of seconds, e.g., say T = 5, then after 1 attempt, the server waits for 5 seconds, at the second failed attempt, it waits for 5*2 = 10 seconds, etc.
This security recommendation has been public since at least the October 2004 draft, almost two decades ago at this point. However, it is not always adopted correctly, making it a potential weak point for attacks that target the second factor.
Radically Open Security, a non-profit computer security consultancy that I work with as a freelancer, runs an instance of the open source EyeDP identity provider software. EyeDP is used for single-sign-on (SSO) login access to internal services. During some internal audit work, I discovered a suspicious absence of TOTP anti-brute-force defenses in the EyeDP code.

After reaching out to EyeDP’s developer Chris MacNaughton, we were able to confirm together that EyeDP is susceptible to brute forcing of the TOTP codes to bypass the 2FA. EyeDP is a Ruby application that uses the Ruby devise authentication framework for auth handling and the devise-two-factor library extension to implement its 2FA mechanisms. We quickly found out that the upstream devise-two-factor also does not have any protections against one-time password (OTP) brute force attacks, which EyeDP implicitly relied upon.

Here is a relevant code excerpt from devise-two-factor:
two_factor_authenticatable.rb L36-L52
As you can see, if the totp.verify() call does not succeed, the function returns false to signal failure, but doesn’t change any state in memory or in the database to count the failed attempt. Some form of bookkeeping would be necessary to recognize that a subsequent failed attempt has exceeded some threshold and to enforce any delay or lockout countermeasure scheme.
In case of a successful login, there is a protection mechanism which tracks the last used OTP code via self.consumed_timestep to prevent its re-use after a successful login:
two_factor_authenticatable.rb L79-L88
This solves previous security issues in devise-two-factor, namely CVE-2015-7225 and CVE-2021-43177, which concern OTP re-use that is forbidden by the standard. However, the OTP re-use detection for the last login unfortunately does not provide any protection against brute force attacks on new logins, so this defense is basically irrelevant here.
The brute force attack vector against OTP standards has been publicly known for as long as the standards have existed. Unsurprisingly, many public references and writeups for it exist. Here are some that are worth taking a look at:
Additionally, issue #19799 in GitLab from 2016-07-06 came very close to publicly describing this exact issue in devise-two-factor. GitLab is a Ruby application which uses devise-two-factor, and proper defenses in the library would have helped to prevent this. As far as we’re aware, the issue wasn’t raised upstream, and the GitLab TOTP handling was fixed with custom logic.
This vulnerability allows an attacker who knows the correct primary login credentials of a victim user to repeatedly guess the second factor OTP verification code until they randomly succeed, without getting locked out or delayed.
Without additional protections such as rate limiting, an attacker testing possible OTP codes at the maximum rate that the server can process them is able to break the 2FA protection of an account within a few days or even hours, under default conditions.
Since the attack is probabilistic, the attacker may get lucky and succeed on the first try, or try the whole number range and still not succeed. In this regard, the issue behaves differently from a normal password brute force, since the changing OTP code is a moving target. To account for this, it helps to focus on the “average” attack length required to have a 50% chance of getting in.
There are some configuration options which significantly affect the complexity of practical attacks:
- Codes using an a-z alphabet or alpha-numerical codes would be dramatically more complex, but this is not common.

Guessing the OTP code right once doesn’t help the attacker for future attempts against the same account, but this may not be necessary. Depending on the application, the attacker may be able to use the granted authentication session to make the 2FA protections ineffective by disabling 2FA on the account or adding new attacker-controlled 2FA tokens.
Let’s take the following example:
Borrowing the python-based calculation method from Michael Fincham’s article:
After a day of OTP testing, chances for success are already 64.54% in this scenario. This is bad!
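For readers who want to check the numbers themselves, the underlying math is simple: each guess succeeds with probability p (the number of currently valid codes divided by 10^6), so N guesses succeed with probability 1 - (1 - p)^N. Here is a minimal C sketch with illustrative assumptions (12 valid codes per guess due to a generous accepted drift window, one guess per second) that happen to reproduce the quoted 64.54% figure:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    // Illustrative assumptions, not the article's exact scenario values:
    // 12 valid codes out of 10^6 per guess, one guess per second.
    double p = 12.0 / 1000000.0;
    double n = 86400.0; // guesses in one day
    // Probability of at least one correct guess: 1 - (1 - p)^N
    printf("Success probability after one day: %.2f%%\n",
           100.0 * (1.0 - pow(1.0 - p, n)));
    return 0;
}
```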
Since the vendor decided not to include any targeted defenses against TOTP brute force attacks, this responsibility falls on the projects using the library.
Wherever possible, we recommend defense strategies that count failed attempts primarily per targeted account, and that kick in after the attacker has passed some initial form of authentication barrier. This design keeps the inherent new attack vector of locking out genuine users as small as possible. Additionally, this pattern cannot be bypassed by distributing an attack against the same user across many source addresses (IPv4 / IPv6), as shown in the sketch below.
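As a rough sketch of this pattern - not code from devise-two-factor, with all names invented for illustration - a per-account delay scheme along the lines of the RFC 4226 recommendation could look like this:

```c
#include <stdbool.h>
#include <time.h>

// Hypothetical per-account state; a real implementation would persist
// this in the user record or another shared store.
struct otp_state {
    unsigned failed_attempts;
    time_t last_failure;
};

// RFC 4226 style delay scheme: after A failed attempts, require a
// waiting period of T * A seconds before the next verification.
static const unsigned T = 5;

bool otp_attempt_allowed(const struct otp_state *s, time_t now) {
    return (unsigned)(now - s->last_failure) >= T * s->failed_attempts;
}

void otp_record_result(struct otp_state *s, bool success, time_t now) {
    if (success) {
        s->failed_attempts = 0; // reset the counter on success
    } else {
        s->failed_attempts++;
        s->last_failure = now;
    }
}
```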
Rack::Attack is a common Ruby library to block and throttle requests, which may be of help if user session data is available during a multi-stage login flow. Alternatively, it can be used to implement some additional secondary defense which limits login requests per IP address or IP subnet (with obvious limitations).
Devise has the Devise::Models::Lockable mechanism to block user accounts after some number of incorrect login attempts. We see this as an inferior solution that creates new issues: attackers who know neither the correct TOTP code nor the correct account password can still trigger account lockouts, which is undesirable.
For users:
The disclosure process had several busier-than-usual phases, since we had to work on coordinating the issue shortly before and after the Christmas holidays. Additionally, the embargo timeline changed from 60 days -> 90+ days -> 30 days. From the start, we supported a shortened embargo to get this information out sooner, but things still got unexpectedly busy with last-minute writeup and coordination work.
I discovered this issue during internal audit work for Radically Open Security (ROS). Radically Open Security supported the disclosure and sponsored the worktime for most steps such as initial analysis, triage, and disclosure to the vendor Synopsys.
Thanks go to Chris MacNaughton (Centauri Solutions) who was heavily involved in the analysis steps, and to the team at Radically Open Security who helped with coordination efforts.
This article was written on my own time.
Project | Source | Likely Affected Version | Fix | References |
---|---|---|---|---|
Synopsys devise-two-factor | GitHub | >1.0.0 | Not planned | GHSA-chcr-x7hc-8fp8 advisory, CVE-2024-0227 |
Centauri Solutions EyeDP | GitHub | <= 1.0.16, < 1.1 | 1.0.17, 1.1.0-rc4 | GHSA-qrqh-v2j6-3g7w advisory |
To be determined.
Related lookups:
The following projects use devise-two-factor, but have mitigations that can be effective under at least some conditions:
Project | Source | Comment |
---|---|---|
GitLab | GitLab | Uses dedicated layers of rate-limit / lockout mechanisms. |
Mastodon | GitHub | Uses rate limits on login endpoint, if accessed via normal network paths. |
Please note that we have not analyzed the defenses closely and do not vouch for their effectiveness.
Date | Information |
---|---|
2023-12-12 | Initial discovery of issue in local ROS EyeDP installation |
2023-12-12 | Triage of vulnerability together with EyeDP developer |
2023-12-12 | Brief exposure of EyeDP mitigation patch on GitHub |
2023-12-12 | Rollout of mitigations for local ROS EyeDP installation |
2023-12-13 | Coordinated disclosure of vulnerability to Synopsys PSIRT |
2023-12-20 | Status request to Synopsys PSIRT |
2023-12-20 | Synopsys PSIRT responds, asks us to re-send the disclosure |
2023-12-20 | Repeated transmission of vulnerability to Synopsys PSIRT |
2023-12-20 | Synopsys PSIRT confirms receipt of disclosure, announces goal of 90d embargo starting 2023-12-20 |
2023-12-21 | Followup with more information to Synopsys PSIRT |
2024-01-08 | Status request to Synopsys PSIRT |
2024-01-09 | Synopsys PSIRT provides some updates, CVE ID, and announces embargo end in less than 7d |
2024-01-10 | Coordination with Synopsys PSIRT |
2024-01-11 | Coordination with Synopsys PSIRT |
2024-01-11 | Synopsys publishes advisory |
2024-01-11 | Publication of this article |
Please note: additional steps after initial article publication are not covered in this timeline.
There was no bug bounty involved.
This post summarizes our disclosure of a vulnerability in the Libbitcoin Explorer bx software tool that left victims exposed to remote & automated wide-scale theft of funds. The thefts, in which bx users’ funds were targeted along with other weak wallet types, amounted to millions of dollars in damages across hundreds of victims and various blockchains and coin types.
We found that the core issue for bx was the usage of the unsuited Mersenne Twister Pseudo Random Number Generator (PRNG) algorithm, which led to cryptocurrency assets being stored on what is essentially a “32 bit number in a trench coat” instead of a strong private key. Anyone with knowledge of the issue and a moderate amount of computing power could reverse these keys without any access to the victim’s computer and use the recovered private keys to move funds away. We gave this vulnerability the codename Milk Sad after the first weak BIP39 mnemonic key output, and worked frantically during a short period of 2 1/2 weeks between detection and disclosure to learn, research and explore what we could about the issue and its backstory. Our motivation was to help users save their remaining funds and understand the problem, and to help developers fix and prevent issues like this in the future.
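To illustrate the weakness class (a sketch of the general pattern, not the actual bx code): a Mersenne Twister seeded with a single 32-bit value can only ever produce 2^32 distinct output streams, no matter how many “entropy” bytes are drawn from it, so all resulting keys can be enumerated offline.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

// Minimal MT19937 core, just enough to illustrate the point.
static uint32_t mt[624];
static int mti = 625;

static void mt_seed(uint32_t s) {
    mt[0] = s;
    for (mti = 1; mti < 624; mti++)
        mt[mti] = 1812433253u * (mt[mti - 1] ^ (mt[mti - 1] >> 30)) + (uint32_t)mti;
}

static uint32_t mt_next(void) {
    if (mti >= 624) { // regenerate the state block
        for (int i = 0; i < 624; i++) {
            uint32_t y = (mt[i] & 0x80000000u) | (mt[(i + 1) % 624] & 0x7fffffffu);
            mt[i] = mt[(i + 397) % 624] ^ (y >> 1) ^ ((y & 1u) ? 0x9908b0dfu : 0u);
        }
        mti = 0;
    }
    uint32_t y = mt[mti++];
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}

int main(void) {
    // Weak pattern: 256 bits of "wallet entropy" that are fully determined
    // by one 32-bit seed - an attacker can enumerate all 2^32 candidates.
    mt_seed((uint32_t)time(NULL)); // simplified stand-in for the seed source
    for (int i = 0; i < 8; i++)
        printf("%08x", mt_next());
    printf("\n");
    return 0;
}
```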
You can read the results in the full disclosure writeup.
For “normal” software vulnerabilities, most of the research work is done after identifying, reproducing, classifying and disclosing them.
Not in this case - exploring the complex and wide-reaching impacts of the vulnerability is a huge task, with practical challenges for coding the necessary custom tooling and analyzing the results. I’m investing a lot of research time to further understand and publish new information on Milk Sad and previous similar vulnerabilities as a series of research updates, since they’re both fascinating and under-reported. Head over there if you want to read more!
This article covers a memory handling flaw in the yubihsm_pkcs11.so driver library, which we disclosed together to Yubico. The YubiHSM PKCS#11 client-side library is designed to interact with Yubico HSM2 hardware security modules. Due to flaws in the memory handling, the library code accidentally returns 8192 bytes of previously used process memory under some circumstances. This impacts the memory confidentiality of the calling program for some usages.
This article will describe the issue.
I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
The C_GetAttributeValue() function in yubihsm_pkcs11.c can be used to query X.509 certificate attributes of particular types. However, some codepaths have problematic memory handling.
Consider client software that retrieves the certificate attribute CKA_SERIAL_NUMBER with the C_GetAttributeValue call from a regular, non-malicious YubiHSM2 device via the PKCS#11 API interface.
Internally, the PKCS#11 functions will establish a session with the YubiHSM2 device and then attempt to handle and parse the received object information. This happens via the following code positions:
Note that populate_template() reserves a local stack buffer CK_BYTE tmp[8192]; without explicit data initialization and passes its size as the len parameter to subsequent function calls in util_pkcs11.c L5072. This will become relevant later.
The particular security issue is associated with opaque attributes, which are handled via get_attribute_opaque().

In get_attribute_opaque(), some function paths overwrite the length parameter - which is passed as a reference - with the actual length of the field for the specific attribute they fetched. However, for at least three specific field types, this does not happen:
As a result, code flows hitting the quoted code lines in get_attribute_opaque() will return to the parent function without writing data into the tmp buffer, without changing the length parameter away from the maximum value, and without returning an error code. By convention, this appears like a successful operation which produced 8192 bytes of new output, although in reality the tmp buffer was not written to at all.
This constellation leads to a problematic memcpy() call which leaks the uninitialized memory contents of tmp into the output:
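The affected code itself isn’t reproduced here; instead, the following self-contained sketch (with invented names) shows the same bug class:

```c
#include <stdio.h>
#include <string.h>

// Sketch of the bug class, with hypothetical names. The "attribute getter"
// never writes the buffer and never shrinks *length, but also returns no
// error - so the caller believes it produced a full buffer of output.
static int get_attribute_unhandled_type(unsigned char *buf, size_t *length) {
    (void)buf;    // buffer is left untouched for this field type
    (void)length; // *length keeps its maximum value
    return 0;     // "success" by convention
}

static int leaky_get_attribute(unsigned char *out, size_t *out_len) {
    unsigned char tmp[8192]; // no explicit initialization
    size_t len = sizeof(tmp);
    if (get_attribute_unhandled_type(tmp, &len) != 0)
        return -1;
    memcpy(out, tmp, len); // copies 8192 uninitialized stack bytes
    *out_len = len;
    return 0;
}

int main(void) {
    unsigned char out[8192];
    size_t out_len = 0;
    leaky_get_attribute(out, &out_len);
    printf("received %zu \"attribute\" bytes of stale stack memory\n", out_len);
    return 0;
}
```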
Typically, uninitialized stack variables contain memory from stack frames which previously occupied the relevant memory region. Data copied from this region will therefore likely include stack variables and other stack-related information (stack canaries, pointers) from previous function calls.
The leaked data will then get returned to the PKCS#11 caller. Since the caller requested some specific certificate information, but instead gets data from an information leak, this represents a security issue. Additionally, since the problematic library function indicates no errors, this information may be passed on by the caller towards other components and trust spheres, depending on the specific behavior of the program.
Programs that use the YubiHSM are likely to handle secrets, possibly including secret key material, or sensitive plaintext. The sensitive key material may include the PIN secret used to secure the HSM communication. Because of this bug, a program that uses yubihsm_pkcs11.so may inadvertently return such information instead of the requested X.509 certificate attributes.
Since the vulnerable component is a flexible library, it is unclear which programs call into the problematic function, and under which circumstances. Additionally, the relevant memory accesses are undefined behavior (UB) in C and may depend on the compiler and system environment. If you have more information about specific integrating applications and their confirmed security impacts, please contact us.
To confirm that this data actually leaks from the tmp variable on the stack, we used a modified yubihsm_pkcs11.so library which specifically marks the memory in question with ASCII ‘A’ characters. During initial debugging, this allowed a straightforward identification of the problematic memory in returned data:
In order to confirm the issue and help Yubico reproduce it, we crafted a proof-of-concept (PoC). The PoC consists of a short dummy program written in C that triggers the issue: pkcs11-memleak.c.
WARNING: use the provided PoC code at your own risk, and only on non-production HSM devices.
Please see the code comments for setup details and explanations. The special put_dummy_secrets_on_stack() function may be of particular interest for understanding the leaked output and attack conditions.
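The idea behind that helper - sketched here with hypothetical contents, not the actual PoC code - is to deliberately park a recognizable marker in soon-to-be-dead stack memory before triggering the leaky attribute query, so the marker can then be searched for in the returned bytes:

```c
#include <string.h>

// Fill a local buffer so that its contents linger in the dead stack frame
// after returning. A subsequent call that reuses this stack region without
// initialization (like the tmp[8192] buffer) can then return the marker.
void put_dummy_secrets_on_stack(void) {
    volatile char secret[4096]; // forced onto the stack
    memset((void *)secret, 'X', sizeof(secret));
    memcpy((void *)secret, "MARKER-SECRET-0001", 18);
} // the frame is released here, but its contents typically remain
```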
Note: this section has been updated to include new information.
By our understanding, this issue was introduced via a combination of two commits:

1) a commit first released in version 2.0.3
2) a commit first released in version 2.4.0

Commit 1) introduces the weak tmp buffer initialization, and commit 2) introduces the problematic code paths and makes the issue reachable.
The main patch improves the problematic code paths in get_attribute_opaque(): commit d56f8567d4fe807dc097febbac7bb4e02ca9dea3.
A second patch improves the buffer initialization:
Additional references:
The described security issue affects the confidentiality of program memory. Due to the characteristics of the flaw, we think that program memory integrity and program availability is not impacted.
As with other memory- and library-related vulnerabilities, it is difficult to say generally what the sensitive information in memory is going to be, and how the leaked information will be processed or exposed by the caller. As a result, the practical worst-case impact will likely be very target-dependent.
ID | CVSS 3.1 Score | Parameters |
---|---|---|
CVE-2023-39908 stack information leak | 4.4 (Medium) | AV:N/AC:H/PR:H/UI:N/S:U/C:H/I:N/A:N |
The listed scoring maps the impact onto a network-enabled integrating program which allows a remote user to trigger the affected functionality and obtain secrets after some form of authentication as a high-privileged user. Other integrating programs that use the PKCS#11 driver may have different impacts. For example, if a lower-privileged user can trigger the issue, PR:L would turn this into a 5.3 CVSS base score (calculator).
As outlined in the timeline, the first ~60 days out of the overall 90 days of disclosure did not see a lot of activity or feedback from the vendor side. This follows a pattern seen with previous coordinated disclosures to Yubico, which also had significant delays between reporting and technical discussion & assessment coordination with the vendor. We recommend focusing on a quicker initial handling for future disclosures to reduce the time pressure on coordination tasks. We want to positively mention that Yubico has provided security patches and an advisory on the disclosure date for this disclosure, which is an improvement over the previous issue.
During work on integration of YubiHSM2 into an OpenPGP project, Heiko Schäfer found the memory safety issue. Christian Reitter assisted with triage, issue analysis, coordinated disclosure and report writeup. In references to this issue, please credit “Heiko Schäfer and Christian Reitter”.
Heiko Schäfer is available for commercial work with a focus on OpenPGP and Rust:
Variant | Source | Likely Affected | Fix | References |
---|---|---|---|---|
Yubico upstream | GitHub | 2.4.0 | SDK 2023.08, 2.4.1 | YSA-2023-01 advisory, CVE-2023-39908 |
Fedora package | rpm package | 2.4.0-1 | 2.4.1-1, commit | bugzilla #2232340 |
We originally reproduced the issue with yubihsm-shell-2.4.0-1.fc38.x86_64 under Fedora. It appears that earlier versions before 2.4.0 do not contain the problematic code path, see here.
Date | Information |
---|---|
2023-05-18 | Disclosure of issue to Yubico, including proof-of-concept code |
2023-05-24 | Response by Yubico, confirms receipt of disclosure |
2023-06-21 | Request to Yubico for a status update and severity assessment |
2023-07-17 | Status update request to Yubico after lack of response |
2023-07-17 | Response by Yubico with technical details and CVSS scoring |
2023-07-21 | Message to Yubico, discussing proposed CVSS scoring & CVE |
2023-07-25 | Response by Yubico, discussing proposed CVSS scoring |
2023-07-31 | Message to Yubico, discussing proposed CVSS scoring & CVE |
2023-08-02 | Response by Yubico, outlining CVE assignment and disclosure date plans |
2023-08-04 | Response by Yubico, disclosure date plans |
2023-08-05 | Message to Yubico, acknowledgment |
2023-08-14 | Yubico publishes YSA-2023-01 and patch release |
2023-08-14 | Publication of this article |
2023-08-16 | Original end date of 90-day coordinated disclosure period |
2023-08-23 | Update of this article, revising version information, adding patch details |
At the time of the disclosure, the vendor did not offer a bug bounty.
I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
This section outlines how Ethereum-related processing code introduced with firmware v7.5.2 can be used as an arbitrary read gadget to display confidential device memory on the OLED screen, which violates security goals.
Once an attacker sends a special Ethereum signing request message, the following code path in ethereum_signing_init() can be triggered:
This calls the recently added cf_confirmExecTx() function via the short intermediary function ethereum_cFuncConfirmed():
The code vulnerability is located in cf_confirmExecTx().
Before we dig deeper, first some context on this section of the firmware.
On an abstract level, the code for the Ethereum transaction confirmation functionality is supposed to parse and display the EthereumSignTx *msg request received via USB from the host computer.

Subsequent code stages then perform the actual Ethereum transaction signing, but they are not relevant to understanding this issue.
The confirmation flow in question has multiple stages to display and approve the individual components of the transaction; among them, a confirm(ButtonRequestType_ButtonRequest_ConfirmOutput, ...) call on the decoded receiver address.

It is at the third step where things go bad. 🌩
Here is the problematic code section:
The goal of the listed code instructions is to prepare the uint8_t* data pointer and uint32_t dlen length variables for the data that should be printed. The display logic then uses them to show hexadecimal-encoded text versions of the referenced data payload in the Ethereum transaction message to the user. Due to the limited screen size, the conversion and screen dialog operate on paginated chunks.
Display logic:
The crucial mistake in the message parsing logic is the lack of range checks for the variables. Both uint32_t offset and uint32_t dlen are assigned and used without ensuring that the referenced memory region is firmly within the msg->data_initial_chunk.bytes payload section. This leads to serious problems!
Let’s walk through one of the problematic assignments in more detail:
In simplified terms, the combination of void bn_from_bytes(const uint8_t *value, size_t value_len, bignum256 *val) and uint32_t bn_write_uint32(const bignum256 *in_number) reads a uint32_t value from a particular memory location without imposing any additional range limitations on the resulting number. In the code snippet shown above, the number conversion first reads a 256-bit bignum number from a fixed byte offset within msg->data_initial_chunk.bytes and then assigns the least significant four bytes to offset, discarding the rest of the input.
A similar operation happens for the dlen read, but from a flexible offset location (more on this later).
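Paraphrased into a compact form (the stand-in declarations and function body below are for illustration, not the verbatim firmware code), the pattern is roughly:

```c
#include <stddef.h>
#include <stdint.h>

// Stand-ins so the sketch is self-contained; the real definitions live in
// the firmware's bignum library. Only the conversion pattern matters here.
typedef struct { uint32_t limbs[9]; } bignum256;
void bn_from_bytes(const uint8_t *value, size_t value_len, bignum256 *val);
uint32_t bn_write_uint32(const bignum256 *in_number);

void parse_sketch(const uint8_t *chunk_bytes) { // msg->data_initial_chunk.bytes
    bignum256 val;

    // 1) read a 256-bit number from a fixed offset in the untrusted payload
    bn_from_bytes(chunk_bytes + 4, 32, &val);
    uint32_t offset = bn_write_uint32(&val); // attacker-controlled, unchecked

    // 2) derive the data pointer from it - may point far outside the buffer
    const uint8_t *data = chunk_bytes + 4 + 32 + offset;

    // 3) read dlen from that attacker-influenced location, also unchecked
    bn_from_bytes(data, 32, &val);
    uint32_t dlen = bn_write_uint32(&val);

    (void)dlen; // the display logic then hex-dumps dlen bytes starting at data
}
```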
It’s important to remember that the Ethereum transfer request message comes from an untrusted source - the computer acting as the USB host could be compromised by malware, which is the reason for showing the user confirmation steps on the hardware wallet display in the first place. In this particular code branch of Ethereum transaction signing, the format validation functions that run before cf_confirmExecTx() impose no meaningful limitations on the msg->data_initial_chunk.bytes content.
To summarize, msg->data_initial_chunk.bytes passes over a trust boundary, isn’t validated against any strict specification, and is then used without sufficient length checks.
An attacker with control over the message content can exploit the unbounded conversion flaws in two general ways:

1. Set a large uint32_t offset value, use it to move the data pointer, and leak content from an arbitrary memory location.
2. Keep the uint32_t offset value small, control uint32_t dlen, and run the memory printing function arbitrarily far beyond the packet buffer.
dialogs that leak raw memory from out-of-bounds regions via snprintf()
to the KeepKey device OLED screen. That’s a pretty powerful attack gadget on a hardware wallet, which is supposed to avoid data leaks at all costs!
The following attack description will focus on direct data pointer control via large offset values (variant no. 1), which I’ve found to be more powerful and practical for manual attacks without physical automation. It’s simpler to leak an interesting memory region directly on the screen in a few display pages, compared to setting an oversized dlen length and manually cycling through thousands of display pages before arriving there.
Digging deeper into the code behavior, we can see that the attacker can force arbitrary pointer addresses for data. Due to unsigned integer overflow wrapping, the msg->data_initial_chunk.bytes + 4 + 32 + offset calculation can end up with any address in front of or behind msg->data_initial_chunk.bytes! To make matters worse for the defenders, msg->data_initial_chunk.bytes is at a static and well-known absolute address. The currently processed Ethereum message will always be located in a special decode buffer after it is converted from the protobuf wire format:
Since decode_buffer[] is a static global variable and the ARM Cortex-M3 platform has no address space layout randomization, the buffer and the msg->data_initial_chunk.bytes struct field will always be located at the same absolute memory location for a given firmware version. This allows attackers precise and reliable exploitation of this issue without the need for guesses or the usage of other information leaks.
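Computing the required offset for a chosen target address is then simple modular arithmetic. The addresses below are made-up examples, not the real firmware layout:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // Example values only - the real decode buffer address is fixed per
    // firmware build and can be taken from the firmware image/symbols.
    uint32_t chunk_bytes_addr = 0x20001000u; // &msg->data_initial_chunk.bytes
    uint32_t target_addr      = 0x0800c000u; // memory region to dump

    // data = chunk_bytes_addr + 4 + 32 + offset, computed modulo 2^32,
    // so the attacker simply solves for offset:
    uint32_t offset = target_addr - (chunk_bytes_addr + 4 + 32);
    printf("crafted offset = 0x%08x\n", offset);
    return 0;
}
```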
For attacks that intend to read out a specific, narrow memory region via crafted offset values, the last remaining obstacle is the limited attacker control over dlen when manipulating data.
By picking crafted offset values in the attack message which move data towards other microcontroller memory outside of the message buffer, the dlen-defining read operation moves there as well:
Unfortunately for the defenders, this drawback can be worked around, since the bignum data read logic and display code are very forgiving and will treat basically any data as a meaningful length field.
The attackers can simply point to a memory region slightly in front of the targeted data that is known to have some non-null data bytes in the 4-byte window of dlen. As long as the converted dlen value is at least as large as the desired data readout section, the resulting memory readout will successfully leak all relevant data after some pagination.
For edge cases where dlen is unexpectedly small, the display code runs into another failure mode and leaks previously used stack memory via the uninitialized char confStr[131]; variable. However, compared to the arbitrary read gadget for specific memory address contents, this is not nearly as interesting or powerful.
Similarly, the attacker can set offset such that display reads will access forbidden memory regions and cause a crash. Given the requirements of this attack, this exploitation variant is also not of much interest, but it is technically part of the potential impact.
The problematic functionality can be triggered by local or remote attackers once the device is in an unlocked state (if a PIN is set on the target device) and the user physically confirms at least some steps of an Ethereum signing flow. The most limiting factor in the attack is that the secret information is only rendered on the physical KeepKey display as hexadecimal-encoded data and not leaked back towards the host computer.
The latter behavior is due to the confirm() handler at confirm_sm.c, which does not make use of the data field in the ButtonRequest message and therefore does not send the displayed string towards the computer, where malware could read it after tricking the user into confirming a supposedly low-value Ethereum transaction.
As with other KeepKey USB related vulnerabilities, a malicious website with user-granted WebUSB permissions could trigger this issue. However, in this particular vulnerability there is no return channel for the leaked information, so additional physical capabilities by the attacker are needed. Under some edge conditions, social engineering may be used to trick the victim user of the KeepKey to voluntarily copy or photograph the leaked information from the device screen, but I see this as difficult to achieve reliably given the circumstances.
From a threat model perspective, I see this vulnerability as relevant despite the high attack requirements since it undermines both the implicit and explicit security guarantees of the hardware wallet with regards to the confidentiality of long-term cryptographic key material.
One of the affected mechanisms is an advanced wallet initialization mode of the KeepKey wallet which doesn’t reveal the generated BIP39 mnemonic seed to the user at any point, see lib/firmware/reset.c. Wallets initialized with this mode permanently have the no_backup flag set to true, and the communicated goal is to make a recovery of the key impossible. The demonstrated attack for CVE-2023-27892 clearly violates this goal, as the no_backup flag stays unchanged despite the revealed secret.
Similarly, wallet users may have the expectation that the effects of hands-on attacks against their wallet have to be immediate, i.e., the transfer of funds during the attack, or that attacks are only possible if the unlocked wallet already has significant funds available at the time of the attack. While this doesn’t have to be 100% correct from the technical side - for example, attackers could delay the submission of their illegitimately obtained signed transactions to public networks - access to the underlying BIP39 seed by the attacker certainly allows for much more flexible and targeted theft months or years later, across various coins, wallet accounts and addresses.

In the case of wallets which were temporarily less protected - no PIN configured, accessible to other people, left connected to an unlocked and unsupervised computer for some minutes - this could make a significant difference in practical risk over the multi-year lifetime of a typical BIP39 seed.
Finally, there’s also the consideration with regards to BIP39 passphrases, which are an additional and highly recommended safety layer on top of the BIP39 mnemonic words to prevent the theft of funds. CVE-2023-27892 opens the door for two particular attacks against passphrases.

If a given hardware wallet is accessed/stolen by the attacker due to a lack of PIN protection (or by using a known PIN), even a moderately complex passphrase could prevent an attacker from discovering and using the custom passphrase-based wallet that holds some additional funds. An online brute-force attack against possible passphrases using the built-in firmware mechanisms is significantly rate-limited due to slow APIs, limited microcontroller processor speed for derivations, as well as physical confirmation steps, which results in very limited attack capabilities. Using CVE-2023-27892, an attacker can obtain the BIP39 seed and then scale offline brute-force attacks to an arbitrary number of powerful systems, making it much more feasible to determine the correct derivation with e.g. a dictionary-based attack.

In rare scenarios where the attacker temporarily gets access to a hardware wallet that is not just plugged in and unlocked, but also has a sensitive passphrase cached in-memory, the passphrase may also be revealed directly. This also applies to other volatile secrets in memory such as the PIN, but note that auto-locking and other functionality may interfere with this.
To summarize, CVE-2023-27892 does not benefit attackers who steal a PIN-protected KeepKey that is powered off, but it significantly increases attacker capabilities for delayed theft, circumvents no_backup mode guarantees, and enables BIP39 passphrase brute-forcing or direct retrieval, as well as other attacks in the case of temporarily unprotected and unsupervised devices.
Also noteworthy: this security issue may be beneficial to legitimate owners who have partially or completely forgotten/lost essential secrets of their configured devices. Under some conditions, it may be possible to recover secrets that are still in the device (see the previous paragraphs). Leveraging firmware up- and downgrade capability between vendor-signed official firmwares without the mandatory erasure of BIP39 seed secrets could help with this (disclaimer: perform at your own risk!). I’m looking forward to feedback from users in case this security research was helpful in particular recovery cases.
WARNING: use the provided PoC code at your own risk. The instructions will PERMANENTLY overwrite the configuration of the hardware wallet. Only test with an expendable unit.
- Configure the test device via keepkeyctl wipe_device and keepkeyctl load_device -l "poc_test" -m "keep key program problem process input result memory display defense broken inform", which is a custom seed with a valid checksum.
- Use firmware v7.5.2, which the PoC is prepared for.
- Run the PoC with the pyusb module installed.
- The Data payload #1 page reveals ep key program problem process input and the Data payload #2 page reveals result memory display defense broken in, with additional data following on the third page.

This disclosure was marked by significant delays and missing feedback when communicating with the vendor (KeepKey). Initially, they created a public patch for the issue on GitHub but did not respond to the confidential disclosure. After three weeks and two reminders, I got a direct response and technical confirmation, but then the contact broke off again and didn’t resume after multiple followups. Despite releasing public security patches and issuing a firmware release, I’m not aware of any public security notes or advisory by the vendor on this issue at the time of publishing of this blog post. This is a further regression in disclosure handling compared to the last disclosure process with this vendor (CVE-2022-30330, in 2022), and may be related to ownership and team changes of the KeepKey product.
In summary, the overall coordinated disclosure progress and publication handling was neither motivating on the researcher side nor overall adequate in my opinion.
In future disclosures, I’ll consider releasing my disclosure information sooner in cases where vendors silently fix security issues during the disclosure period, depending on the patch publication and software release circumstances.
Product | Source | Known Affected Version | Fixed Version | Patch | Vendor Publications | IDs |
---|---|---|---|---|---|---|
KeepKey | GitHub | firmware v.7.5.2 to v7.6.0 | v7.7.0 | PR337 | none | CVE-2023-27892 |
I want to emphasize that this research was done on my own time and initiative. In particular, it was not sponsored by SatoshiLabs, for whom I do some paid freelance security research on the related Trezor project.
Date | Information |
---|---|
2023-01-17 | Confidential disclosure to KeepKey |
2023-01-26 | KeepKey publishes GitHub Pull Request no. 337 with security patch |
2023-01-29 | POC and additional analysis communicated to KeepKey |
2023-02-05 | Followup email to KeepKey requesting feedback |
2023-02-06 | Issue confirmation by KeepKey |
2023-02-22 | GitHub Pull Request no. 337 is merged |
2023-03-06 | MITRE assigns requested CVE |
2023-03-07 | Release of KeepKey firmware v7.7.0 with security patch |
2023-04-17 | End of disclosure period |
2023-04-17 | Publication of this report |
2023-04-19 | Report: “Additional Attack Considerations” section extended |
At the time of the report publication, KeepKey had not offered a bug bounty.
This article covers two security issues that I found in the libykpiv library, affecting the 2.3.0 release.

Flaws in the memory handling of the auth handshake procedure with a PIV smartcard could lead to memory corruption, denial of service or other unexpected behavior under some conditions. The practical security impact on tested production binaries appears to be limited.
This article will describe the issues.
I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
The first issue is a code flaw related to insufficient length restrictions for smartcard-provided data. This issue is similar to previous libykpiv vulnerabilities and again leads to dangerous memory safety issues due to custom low-level memory handling.
The ykpiv_authenticate2() function performs a sequence of interactions with an external PIV smartcard, such as a Yubikey 5 device connected via USB, for smartcard actions that require authentication. During those steps, the host receives a cryptographic challenge from the smartcard:

While there is an upper bound on the received data that prevents any direct issues, the length recv_len of the reply is also reused for the cryptographic challenge from the host to the smartcard. This becomes an issue if recv_len is particularly large:
The manual memory management via custom pointer advances becomes a liability here, since the upper bound on recv_len is not sufficiently tied to how much data the apdu struct can hold at this point. As a result, _ykpiv_prng_generate(challenge, challenge_len) can end up writing behind the struct if the unchecked assumptions about the memory sizes are violated:
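Reduced to its core - with invented names and sizes, not the actual libykpiv structures - the problematic pattern looks like this:

```c
#include <stddef.h>
#include <stdint.h>

// Sketch of the bug class. The card-provided reply length passes only a
// coarse upper-bound check and is then reused as the outgoing challenge
// length, without re-checking the space left in the fixed-size struct.
struct apdu_sketch {
    uint8_t header[5];
    uint8_t data[255]; // space actually available for the challenge
};

// simplified prototype of the random fill helper
void _ykpiv_prng_generate(uint8_t *buf, size_t len);

void build_auth_response(struct apdu_sketch *apdu, size_t recv_len) {
    uint8_t *challenge = apdu->data;
    size_t challenge_len = recv_len;                // reused without clamping
    _ykpiv_prng_generate(challenge, challenge_len); // OOB write if too large
}
```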
Since _ykpiv_prng_generate() overwrites the target buffer with random data via OpenSSL’s RAND_DRBG_generate() function, a malicious smartcard that triggers this flaw doesn’t have control over the exact values that are written behind apdu on the stack during the out-of-bounds write, and the values will be different on each execution. This doesn’t mitigate the memory safety issue itself, but definitely makes it harder to manipulate the stack data in a controlled way.
Due to the compiler- and target-specific aspects of the program stack layout, it is difficult to make global statements about the expected security implications of the stack-buffer-overflow or lack thereof. To my knowledge, the yubico-piv-tool binary that includes the libykpiv library is always compiled with stack canary protections for production builds, which should turn any out-of-bounds write into the stack canary segment into a controlled program crash and therefore a denial of service. Some limited analysis of the situation for Linux x86_64 binaries of the yubico-piv-tool 2.3.0 release suggests that the OOB write can’t reach the stack canary segment and “only” overwrites stack memory of other local variables that are not used at this point.
Given the nature of libykpiv as a library intended for use within other applications, it’s plausible that the affected code is also in use with other build system configurations or compilers where those observations do not apply.
My current understanding is that this flaw cannot be used to hijack the execution flow of the program or manipulate essential internal variables in yubico-piv-tool 2.3.0 and will at worst cause a crash, but my confidence in this is limited due to the outlined complexity. For example, there may be additional variations of this attack by malicious smartcards which are aware of the mgm_key secret that is shared between host and smartcard.
ID | CVSS 3.1 Score | Parameters |
---|---|---|
ykpiv_authenticate2() stack OOB write | 2.9 (Low) | AV:P/AC:H/PR:N/UI:R/S:U/C:N/I:L/A:L |
Please note that this scoring assumes that there is a way to impact the availability of the libykpiv component, for example by writing into a segment of the stack memory that is protected by stack canaries or causing a segmentation fault, and that the attacker can get by without authentication secrets. During the disclosure process, we discussed Availability: High vs. Availability: Low impact scoring in such a scenario. Only the lower rating is reflected in the scoring above to accommodate the current uncertainty about the practical availability impact.
The second issue is a code flaw related to accessing a C variable’s memory content after its valid lexical program scope.
The ykpiv_authenticate2() function contains multiple code regions with locally scoped variables, as well as variables that are used across multiple regions. Consider the challenge pointer, which is defined early in the function:
As part of the challenge-response handshake, a locally scoped code region sets the challenge pointer to reference data in the apdu struct on the stack:
The problem now occurs in the following code, which uses the memory referenced by challenge:
While the challenge pointer variable itself is valid throughout the ykpiv_authenticate2() function, the stack memory it references has gone out of scope together with the apdu variable at that point. This leads to an AddressSanitizer: stack-use-after-scope error on debug builds with compiler sanitizers. AddressSanitizer warns on the memcmp(data + 4, challenge, challenge_len) call via __interceptor_memcmp, but the cipher_encrypt(mgm_key, challenge, challenge_len, challenge, &out_len) call should be affected by this as well.
There is no security mechanism to detect this in production builds.
In theory, the C compiler is allowed to make arbitrary changes to the referenced stack memory content once it is no longer in scope, e.g., to overwrite it with other variables or clear it. Using this memory again leads to undefined behavior.
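A minimal stand-alone reproduction of this bug class - unrelated to the libykpiv code itself - that AddressSanitizer flags when its use-after-scope detection is active:

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *challenge;
    {
        char apdu[16] = "challenge-bytes";
        challenge = apdu; // the pointer escapes the block scope
    }                     // lifetime of apdu ends here
    char reply[16] = "challenge-bytes";
    // Undefined behavior: reads apdu's stack memory after its scope ended.
    // Compile with -fsanitize=address to get a stack-use-after-scope report.
    printf("memcmp result: %d\n", memcmp(reply, challenge, 15));
    return 0;
}
```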
The stack-use-after-scope issue is triggered on each successful execution of ykpiv_authenticate2(). However, I’m not aware of any bug reports about functional issues in the handshake, which would be expected if there were a change in memory behavior, and nothing like that has been indicated by Yubico during the disclosure.

Therefore, I think Yubico got lucky with the bug behavior, since the relevant compilers for the production binaries apparently decided to leave the memory content intact long enough so that the logical program execution works as originally intended (likely because it is the fastest behavior). Since I’m not aware of a practical way for an attacker to influence this behavior and use it to their advantage, I’m handling it as a non-issue in terms of practical security impact on the 2.3.0 release.
The most worrying aspect to me is that this bug made it into a stable release without getting detected by static analysis tools or dynamic analysis at runtime despite being present during every smartcard authentication. This suggests that the associated test suites should be improved.
I have the impression that Yubico is currently not assigning a lot of resources or priority to security disclosure handling of their client-side open source libraries. During this coordinated disclosure process, it took almost two months to get a technical reply, and neither a release nor a patch was published during the 90-day disclosure timeframe as far as I’m aware.
The discovered security issues certainly aren’t the most severe, but memory corruption and undefined behavior issues are often difficult to classify as benign with a high certainty due to the amount of compiler- and architecture-related assumptions that may or may not hold in practice for all affected users. Since we didn’t identify a practical security impact on any of the tested production binaries, I decided not to ask for a CVE ID assignment at the moment.
In light of the other difficulties and delays observed during the previous disclosures to Yubico, the current situation is neither very encouraging for researchers who report issues nor adequately reducing the risk to end users via prompt security patches in my opinion.
To my knowledge, both regression issues were introduced after the 2.2.0 stable version release tag and are only present in the 2.3.0 stable release.
Variant | Source | Affected | Fix | References |
---|---|---|---|---|
Yubico upstream | GitHub | version 2.3.0 | version 2.3.1, patch 1 via PR402 | no known public references |
The previous libykpiv vulnerability article contains a list of related other sources such as Linux distributions.
Date | Information |
---|---|
2022-05-28 | Disclosure of issue to Yubico |
2022-06-01 | Yubico confirms receipt of disclosure |
2022-06-13 | Followup to Yubico to query disclosure status |
2022-07-21 | Yubico confirms the technical issue, describes some analysis, proposed CVSS scoring |
2022-07-30 | Reply to Yubico with technical discussion, feedback on proposed scoring, discussion about criteria for potential CVE assignment |
2022-08-08 | Yubico replies on proposed scoring |
2022-08-15 | Reply to Yubico with technical discussion, followup question on crashing behavior, discussion about criteria for CVE assignment |
2022-08-26 | End of 90-day disclosure period |
2022-08-29 | Publication of this article |
2022-10-03 | Yubico adds a patch for both issues to the public code repository |
2023-02-07 | Yubico releases patched libykpiv version |
The vendor did not offer a bug bounty.
The new discovery has implications for code execution attacks such as CVE-2021-31616, attacks with some level of physical access, as well as the general trust expectations for the wallet system integrity after the installation of unofficial firmware.
I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
The following article is highly technical, so here is a slightly less-technical summary.
The KeepKey hardware wallet has some basic protections in place to limit what some parts of its software can do. This gives trust in the device by making it harder to backdoor permanently via malware, similar to modern smartphone systems.
The new flaws in KeepKey protections that I discovered basically allow a “Jailbreak” of the KeepKey. The main program on the device can break out of the protective cage it is in. This may be useful for some power users who want more control over their device, but it’s also useful for attackers who temporarily made it onto the device somehow or have physical access and can install custom firmware. They can use these flaws to permanently corrupt the core device software.
A device with malicious core software no longer has to follow the normal rules. It could generate new mnemonic secrets that an attacker has access to, lie to you about installing updates or attack your computer via USB. It can also erase itself and stop working at any time. This is clearly a bad situation for trusting the device with funds, and the extra annoying part is that it is difficult to find out if a device is malicious, for example if you buy a new one tomorrow from a less-trustworthy seller. Unfortunately, the hologram stickers won’t help you and wiping the device storage or reinstalling the firmware is not enough.
My main recommendation is to swiftly install the new security patches.
However, if you have previously used firmware v7.0.3 on computers or websites you don’t fully trust, it may be a good time to read up on CVE-2021-31616, check your funds and change your mnemonic seed or device.
Be extra careful about new devices that you buy, as this vulnerability makes it cheaper for attackers to corrupt them.
This article focuses on breaking the security supervisor code implementation of the KeepKey hardware wallet. To understand the context, first a little primer on what this software component is supposed to be doing.
The ARM Cortex M3 microcontroller series does not have any multi-tasking capability or sophisticated process security concepts that one may expect from larger processors. Instead, the available hardware-assisted protections consist of a two-level privilege concept for code separation at runtime which is enforced through hardware-assisted privilege level handling and memory protection settings. The Trezor and KeepKey system designs use this privilege system to limit potential actions of malicious firmware, especially for the flash write operations, with the goal to harden the overall system or at least make security issues observable to the user. This is done through a software root-of-trust concept based on a trusted bootloader, combined with cryptographically signed firmware releases. The bootloader controls firmware updates, checks firmware signatures on device startup, and provides the code for the supervisor component that is active after boot.
Essential configuration steps during startup:
On the KeepKey, the supervisor logic mainly focuses on guarding flash operations. All flash writes of the firmware are proxied through the supervisor code via custom interrupts. The svc_handler_main() is tasked with the role of a gatekeeper for potentially dangerous accesses.
However, I’ve discovered that this code is broken in several ways, which completely undermines the sandbox design and allows the firmware to break out of it.
During security research in February 2022, I took a closer look at the supervise.c code and found several flaws. They are clustered into several sections with similar issue patterns.
The ARM Cortex M onboard flash is divided into a number of differently sized flash sectors. On the STM32F205 chip that the KeepKey uses, they have the IDs 0 to 11. Sector numbers go up to 23 on other STM32 chip series.
For technical reasons, the supervisor function call parameters of svc_* functions are typically passed as unsigned 32-bit integer variables during the interrupt handling. As a result, despite the limited numerical range that is actually required to describe the target sector, svhandler_flash_erase_sector() accepts and internally uses the full 32-bit uint32_t sector for describing the flash sector ID that should be erased.
This choice of parameter type is problematic.
The defensive code checks on the flash erase are designed to reject the three specific sector numbers of 0, 5 and 6 that correspond to important flash areas for the bootloader and for the microcontroller configuration that are exclusively controlled by the bootloader. Aside from the three numbers on the blocklist, they allow the main firmware to request erasures of all other sectors.
Here is the corresponding code:
The sector erase is done via a libopencm3 library call:
Crucially, the libopencm3 library function is defined as follows:
Why is this a problem?

svhandler_flash_erase_sector() treats the sector number as an unsigned 32-bit number and incorrectly expects the flash library function to count the same way. Instead, the difference in the sector integer type leads to a well-defined but lossy unsigned integer conversion of the sector number down to the uint8_t type before it is handed over to the library function.

This conversion maps multiple larger numbers onto the forbidden sector numbers 0, 5 and 6.
An attacker can use this to completely bypass the defensive checks shown previously. For example, a deletion request for sector 256 passes the checks but then actually asks the library to erase the forbidden sector 0.
Using this flaw, malicious firmware can request the erasure of any flash sector.
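The truncation is easy to demonstrate in isolation. The uint8_t parameter below matches the libopencm3 prototype quoted above; the rest is a simplified sketch of the check logic, not the verbatim supervisor code:

```c
#include <stdint.h>
#include <stdio.h>

// Stand-in with the same parameter type as the libopencm3 prototype.
static void flash_erase_sector(uint8_t sector, uint32_t program_size) {
    (void)program_size;
    printf("library erases sector %u\n", sector);
}

static void svhandler_flash_erase_sector_sketch(uint32_t sector) {
    // Simplified blocklist check on the full 32-bit value...
    if (sector == 0 || sector == 5 || sector == 6) {
        printf("request for sector %u rejected\n", sector);
        return;
    }
    // ...but the argument is silently truncated to 8 bits in this call:
    // 256 -> 0, 261 -> 5, 262 -> 6.
    flash_erase_sector(sector, 0);
}

int main(void) {
    svhandler_flash_erase_sector_sketch(0);   // rejected by the blocklist
    svhandler_flash_erase_sector_sketch(256); // erases forbidden sector 0
    return 0;
}
```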
During analysis of the erase problem, I found a similar problem in the Trezor One code. It uses a uint16_t sector variable that theoretically has the same integer conversion problem during the flash_erase_sector(sector, FLASH_CR_PROGRAM_X32) call.
However, the Trezor code uses an allowlist approach for the sector checks, which doesn’t let any problematic values through:

Sectors 2 and 3 don’t have a conversion problem; therefore, the Trezor One is not practically affected via this issue.
The KeepKey supervisor interface has two functions for flash writes:

- svhandler_flash_pgm_word() for writing individual 32-bit words to flash
- svhandler_flash_pgm_blk() for writing larger blocks of memory to flash

VULN-22005 concerns the block write functionality.
The code has existing defenses that detect overflows of the address calculation. It also checks that the beginAddr and beginAddr + length pointers are not in the forbidden memory regions of sectors 0 or 5 & 6.

Here is the first part of the code checks:
However, these defenses are incomplete. They do not prevent a situation where beginAddr points in front of the forbidden region and beginAddr + length points behind it. In other words, whole bootloader sections can be overwritten as long as at least one extra byte in front of and behind them is also overwritten.

Using this flaw, malicious firmware can modify protected flash memory in bulk.
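In simplified form - again not the verbatim supervisor code - the check pattern and its hole look like this:

```c
#include <stdbool.h>
#include <stdint.h>

// Illustrative protected region; real sector addresses differ.
#define PROTECTED_START 0x08000000u
#define PROTECTED_END   0x08003fffu

// Both endpoints are checked individually, but a range that starts before
// the protected region and ends behind it fully contains the region and
// still passes the validation.
bool write_range_allowed(uint32_t begin_addr, uint32_t length) {
    uint32_t end_addr = begin_addr + length; // overflow handled elsewhere
    if (begin_addr >= PROTECTED_START && begin_addr <= PROTECTED_END)
        return false;
    if (end_addr >= PROTECTED_START && end_addr <= PROTECTED_END)
        return false;
    return true; // hole: begin_addr < PROTECTED_START && end_addr > PROTECTED_END
}
```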
Similarly to svhandler_flash_pgm_word(), the block write has the typical limitations when writing data to physical flash memory, which means it can only change flash memory bits from 1 to 0. If this were the only vulnerability a malicious firmware had access to, modifications would be limited to flipping bits in one direction in the existing flash data contents. However, this attack can be combined with vulnerability VULN-22004 from the previous section, which makes the data limitation go away: by first erasing the targeted flash region and then overwriting it, memory content can be modified arbitrarily.
During practical testing, writing into sector 0 using svhandler_flash_pgm_blk() does not work. The attack requires at least one write operation in front of the targeted sector. However, the required flash write in front of sector 0 is not seen as valid by the microcontroller, and the operation gets stuck. The memory in front of sector 0 is “reserved” according to the datasheet. It may be possible to circumvent this problem by using some other undocumented edge case behavior. However, I haven’t explored this edge case further after the discovery of another attack that doesn’t share this limitation.

Writing over the combined sector block 5+6 works as described, see the proof-of-concept.
While looking into additional problems of VULN-22005, I noticed that the arbitrary pointer “write data from the source to the destination” construction of svhandler_flash_pgm_blk() and the “write this value to the destination” construction of svhandler_flash_pgm_word() are very powerful primitives.

The blocklist-based defense has been shown to be incomplete - are there other ways to misuse these functions?

After digging a bit deeper, I realized that one needs to view them as privileged memory write gadgets (both functions) or a privileged memory read gadget (via svhandler_flash_pgm_blk()).
This is because the STM32 uses memory-mapped IO to write to the flash and has one continuous memory region.
In other words, the microprocessor generally treats flash content as normal memory and writes to it word-wise with direct assignments, or smaller writes if necessary. Therefore, the libopencm3 flash functions can essentially be used to write or read any other data in the STM32 address space if they’re called with target pointers outside of flash space.
For example, flash_program_word() essentially prepares the flash write, unlocks the flash and then does a simple write:
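(Paraphrased from the libopencm3 sources and lightly abridged; consult the upstream code for the exact version.)

```c
void flash_program_word(uint32_t address, uint32_t data)
{
    /* Wait until any previous flash operation has finished. */
    flash_wait_for_last_operation();
    flash_set_program_size(FLASH_CR_PROGRAM_X32);

    /* Enable flash programming. */
    FLASH_CR |= FLASH_CR_PG;

    /* The actual write: a plain memory-mapped assignment. */
    MMIO32(address) = data;

    flash_wait_for_last_operation();

    /* Disable flash programming again. */
    FLASH_CR &= ~FLASH_CR_PG;
}
```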
Crucially, the MMIO32(address) = data; assignment succeeds even if the address is not in flash-related memory space. svhandler_flash_pgm_blk() works similarly and can also be used to copy secret information out of protected memory.
Since this write operation happens in the context of the privileged bootloader code, it does not fall under the restrictive MPU protections for the unprivileged thread. This is a huge problem for supervisor integrity. The supervisor operates on its own small memory stack that's protected by the MPU from interference by the main firmware:

The memory region protection falls apart if the main firmware can make the privileged thread corrupt its own stack with targeted writes. This has a significant impact on the bootloader code integrity at runtime. The practical impact may be limited a bit by stack protection and other defenses, but those can likely be circumvented through additional writes.
Additionally, in the global address space of the STM32, important device control registers are memory-mapped to special positions. The unprivileged firmware can access them through the same flaw, for example the flash controller.
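A hedged sketch of such an access; the FLASH_CR register address is taken from the STM32F2 documentation, while the svc_flash_pgm_word wrapper name is my assumption:

```c
/* FLASH_CR sits at the flash interface register block (0x40023C00)
 * plus offset 0x10 on the STM32F2 family. */
#define FLASH_CR_ADDR 0x40023C10u

/* The privileged word-write gadget pointed at a peripheral register
 * instead of flash memory (wrapper name hypothetical): */
svc_flash_pgm_word(FLASH_CR_ADDR, attacker_chosen_value);
```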
This can have additional impact, although the MPU still protects some parts of the flash, so there is a remaining barrier against direct modifications of sector 0.
How can we break the remaining defenses? The explicit memory region defense logic of the mentioned flash write functions assumes that there is only one canonical way to address and overwrite the protected flash sections. However, this assumption is wrong: as the STM32F205 datasheet hints at on page 66, other memory regions such as 0x0000 0000 to 0x000F FFFF can alias into the flash memory range. Here is a helpful visual overview of the relevant memory regions.
What does this mean? Depending on the microcontroller system configuration, the lower memory ranges map directly into flash memory, just as the "main" flash memory section starting at 0x08000000 does. The main difference is that the supervisor flash functions forbid access to the protected sectors in the 0x080.... region due to the address comparisons, but they completely allow all writes to the 0x000.... region.
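A minimal sketch of the address arithmetic, assuming the boot alias maps the main flash at offset zero (all names are mine):

```c
#define FLASH_MAIN_BASE  0x08000000u  /* canonical flash mapping      */
#define FLASH_ALIAS_BASE 0x00000000u  /* boot alias of the same flash */

/* A protected bootloader word in sector 0... */
uint32_t protected_addr = 0x08001234u;

/* ...is also reachable through an address the blocklist never checks: */
uint32_t aliased_addr = protected_addr - FLASH_MAIN_BASE + FLASH_ALIAS_BASE;
/* aliased_addr == 0x00001234, the same physical flash cell */
```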
Bingo! We’ve just broken the remaining bootloader and trusted boot code integrity defenses.
At this point, I would like to give credit to Thomas Roth and the rest of the wallet.fail team. They published this memory-alias-based attack concept as part of the F00DBABE attack in 2018; see the talk section of their classic 35C3 presentation. I half-remembered, half re-discovered this on my own for the KeepKey, but their work is clearly a direct inspiration for the attack idea.
By making the privileged thread write into the aliased flash region, the write protections for sectors 0, 5 and 6 are circumvented without the strict need for special offsets or complete sector overwrites. This allows more targeted overwrites of individual areas than the previously described VULN-22005 vulnerability.
As a result of this attack, the complete flash memory can be replaced with arbitrary contents, which breaks the core security model of the KeepKey root of trust.
Please read the following section carefully.
By the nature of the KeepKey hardware wallet design, access to SWD and other debug interfaces is permanently disabled on production devices and production firmware. This is done with the explicit goal of preventing read or write access to the flash. As a result, there is no intended or straightforward way to recover from problems with the boot-related flash memory.
Testing the issues discussed in this article directly requires erasing or modifying flash content in those essential sectors, so there is a good chance that you’ll permanently turn your test device into a dead device. No, it’s not resting - it’s stone dead! 🦜.
To prevent any devices from passing on due to catastrophic flash writes, it is required to both have a working hardware debugger setup and an STM32F205 microcontroller that is not in RDP2 state. A custom KeepKey devkit can be built by SMD rework, specifically by replacing the TQFP64 chip with a new chip in factory configuration and programming the custom bootloader and firmware variants.
In this configuration, a hardware debugger like the STLINK-V3 can be connected and used to restore flash contents externally as well as to control the execution. Note that the MPU and thread privilege mechanisms are still active; the unit is just at RDP0 debug protection level. The POC section describes testing steps with such a setup.
The following proof-of-concept steps will be deadly to your device unless you have working hardware debugger access. You have been warned.
This is a combined proof-of-concept for two issues.
For VULN-22004, the sector number 261 is used to target sector 261 % 256 = 5.
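A hedged sketch of the triggering call, assuming an unprivileged svc_flash_erase_sector wrapper around the supervisor interface (the wrapper name is my assumption):

```c
/* VULN-22004 trigger: the value 261 passes the sector blocklist
 * comparisons, but is truncated to uint8_t on the erase path, so the
 * hardware erases protected sector 261 % 256 == 5. */
svc_flash_erase_sector(261);
```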
For comparison, the following call with firmware-level access would lead to a memory exception due to the MPU:
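(Sketched with the libopencm3 API; the exact call in the original proof-of-concept may differ.)

```c
#include <libopencm3/stm32/flash.h>

/* Called directly from the unprivileged firmware thread, the MPU denies
 * the flash controller access and the core takes a memory management
 * fault instead of erasing anything: */
flash_erase_sector(5, FLASH_CR_PROGRAM_X32);
```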
See the high-level summary. The discovered KeepKey issues apply to all recent bootloader versions, since the problems in supervisor.c have been present for multiple years.
The coordinated disclosure went similarly to the VULN-22003 disclosure that started slightly earlier in February with the same vendor. I received a lot of good feedback and confirmation in a technical call about two weeks into the disclosure.
Unfortunately, there was a significant gap in the communication in April where I was unable to reach them via multiple communication channels. As a result, I did not have a chance to comment on their patch set before the release or coordinate with them on a publication date. It’s good to see that they still released a firmware fix and public acknowledgment within the 90-day timeframe. I have been able to re-establish communications in May.
I’m looking forward to the full vendor advisory, which has not been released at the time of writing.
Product | Source | Known Affected Version | Fixed Version | Patch | Vendor Publications | IDs |
---|---|---|---|---|---|---|
ShapeShift KeepKey | GitHub | bootloader ≤ bl_v2.0.0 | bootloader bl_v2.1.4 | patch1 | bl_v2.1.4 + v7.3.2 GitHub Changelog | CVE-2022-30330 VULN-22004, VULN-22005, VULN-22006 |
I'm not aware of other hardware wallets that are practically impacted by these issues.
Please note that I've included SatoshiLabs in the disclosure communication due to the Trezor One product: after finding a minor code issue, I wanted to ensure that there are no practical vulnerabilities on the Trezor side, where some of the code originated from. Ultimately, the Trezor One did not have any practical issues and we did not switch to a full multi-vendor format for the coordinated disclosure. This approach was discussed with both vendors.
I want to emphasize that the main work for this security research was done on my own time and initiative. In particular, the original research that led to the discovery of the issue was not sponsored by SatoshiLabs. With ShapeShift's agreement, I spent some paid hours on extended background research to evaluate the potential security impacts of related issues on the Trezor project for SatoshiLabs.
Date | Information |
---|---|
2022-02-23 | Confidential disclosure to ShapeShift, with CC to SatoshiLabs |
2022-03-10 | Technical call with ShapeShift, ShapeShift acknowledges the issues |
2022-04-26 | ShapeShift releases patched bootloader version bl_v2.1.4 together with firmware v7.3.2 |
2022-04-26 | ShapeShift publishes a short advisory summary via the GitHub tag description |
2022-05-07 | CVE-2022-30330 assigned by MITRE |
2022-05-18 | Publication of this blog article |
ShapeShift paid a bug bounty for this issue.
I'm a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
The details of this issue revolve around low-level concepts and implementation details in the KeepKey firmware. The code area in question has been the source of other serious issues before, for example CVE-2019-18671, and was originally derived from the Trezor One firmware several years ago.
The limited hardware capabilities of the KeepKey wallet and its finite state machine (FSM) message handling require strong restrictions on how new logical tasks can be scheduled or interrupted to prevent errors. For user-facing tasks such as the confirmation of cryptocurrency transactions, it is also meaningful to prevent user interface actions from interrupting each other. This helps keep the flow simple and unambiguous for the user while important actions such as transaction confirmations are performed.
In the codebase, this is implemented by the separation of communication messages into two classes: normal and tiny messages. Normal messages can trigger complex new tasks. In contrast, tiny messages are focused on essential user input and cancellations. If a message-related interaction is needed during a complex action, the global message handling is restricted to tiny messages, avoiding major interruptions and other problems. In the code, a global Boolean state variable determines if message processing is restricted or not.
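As a rough illustration of this gating pattern (my sketch; the type and function names are hypothetical, not the actual KeepKey code):

```c
#include <stdbool.h>

typedef struct Message Message;                 /* hypothetical message type */
extern bool is_tiny_message(const Message *m);
extern void handle_message(const Message *m);

/* Global restriction flag, as described above. */
static bool msg_tiny_flag = false;

/* While a complex action is running, only "tiny" messages (essential
 * user input and cancellations) are let through. */
void msg_dispatch(const Message *msg)
{
    if (msg_tiny_flag && !is_tiny_message(msg)) {
        return; /* drop normal messages to avoid major interruptions */
    }
    handle_message(msg);
}
```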
The KeepKey developers made a number of changes to the USB packet handling code after adopting it from the original codebase in 2014. One of the changes was to rename the global tiny variable to msg_tiny_flag. The msg_tiny_flag was still used in an identical role for global message state handling after the rename.
In 2018, the KeepKey developers adopted U2F support (a two-factor authentication protocol) based on code from the Trezor. During this software port, they apparently missed the difference in the message handling on the KeepKey side and re-introduced a global tiny variable for use with the U2F code:
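(Sketched from the description above; the exact declarations in the codebase may differ.)

```c
/* The effective result of the port: two independent flags guarding what
 * should be a single global message restriction state. */
static bool msg_tiny_flag = false;  /* KeepKey message handling      */
static char tiny = 0;               /* re-introduced by the U2F port */

/* Code paths that set msg_tiny_flag do not restrict the U2F handlers,
 * which consult only tiny, and vice versa. */
```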
As a result of the double state handling in the KeepKey, the message restrictions of the U2F functionality and of all other message handling functionality are independent of each other and don’t lock in the originally intended way. This means that U2F actions and U2F dialogs can still be invoked while other functionality of the KeepKey has triggered the restricted message mode, and the same is true in the other direction as well.
The possibility to interrupt important dialogs and button confirmations breaks user assumptions about the basic interaction with the device. It can be leveraged to trick the user into interacting with the new dialog B that pops up while dialog A is supposed to be ongoing. Fortunately, this attack is limited to U2F <> non-U2F dialog combinations.
The most relevant attack scenario that I could find is related to a two-factor authentication (2FA) bypass with user interaction:
Note that this attack is based on a number of preconditions with regards to the host malware capabilities, known information, existing use of the KeepKey as U2F hardware token and user interaction. It also operates at the edge of what U2F is normally protecting against, since most U2F tokens do not have a screen to show what is being accepted. Still, without the discovered message handling vulnerability, this attack scenario would not be possible.
In their current patch, the KeepKey developers have not solved the problem of the double message state handling. Instead, they have chosen to apply a partial mitigation for the described U2F bypass attack by preventing U2F dialogs from getting auto-accepted immediately. This is definitely an improvement, but doesn’t fully resolve the underlying issue in my opinion.
The mapping shown here aims to represent the impact of the U2F bypass scenario described above, but the issue is difficult to score. There may be different impact through other attack combinations.
Description | CVSS 3.1 | Score |
---|---|---|
VULN-22003 | CVSS:3.1/AV:L/AC:H/PR:L/UI:R/S:U/C:L/I:L/A:N | 3.3 (Low) |
The disclosure process with ShapeShift started out well, with good direct feedback. Unfortunately, there was a significant gap in the communication in April where I was unable to reach them via multiple communication channels. As a result, I did not have a chance to comment on their patch before the release or coordinate with them on a publication date. Still, it's good to see that they released a firmware fix and public acknowledgment within the 90-day timeframe. I have recently heard back from them in May.
Product | Source | Known Affected Version | Fixed Version | Patch | Publications | IDs |
---|---|---|---|---|---|---|
ShapeShift KeepKey | GitHub | v7.2.1 | v7.3.2 | patch1 | v7.3.2 Changelog | VULN-22003 |
I’m not aware of other affected hardware wallets.
Note that I've included SatoshiLabs in the disclosure communication to ensure that there are no related vulnerabilities on the Trezor side, where some of the code originated from. We did not find an issue in the Trezor One product that required switching to a multi-vendor format for the coordinated disclosure.
I want to emphasize that this security research was done on my own time and initiative. In particular, the original research that led to the discovery of the issue was not sponsored by SatoshiLabs, for whom I do some paid freelance security research on the related Trezor project.
Date | Information |
---|---|
2022-02-09 | Confidential disclosure to ShapeShift, with CC to SatoshiLabs |
2022-02-10 | ShapeShift acknowledges receipt of the disclosure and assigns a VULN ID |
2022-03-10 | Technical call with ShapeShift |
2022-04-26 | ShapeShift releases patched firmware version v7.3.2 |
2022-05-05 | Publication of this blog article |
ShapeShift paid a bug bounty for this issue.
I'm a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
What is EMFI? By applying a short electric pulse through a coil, a localized and strong electromagnetic field can be generated. This field injects voltages into inner parts of electronic chips that are near some parts of the coil, which triggers all kinds of side effects that the chip designers did not expect or want to happen. For some processors, EMFI can be used to alter the program behavior in ways that are interesting from a security perspective, such as skipping CPU instructions.
There are other fault injection methods like voltage glitching or clock glitching. Typically, they are based on interfering with electrical connections and components that are exposed outside of the chip. These techniques have a lot of value as well, and I use them via the ChipWhisperer (also made by NewAE), as described in other articles. However, EMFI differs from them in a few significant ways:

As a result, EMFI can be used to trigger some effects that are hard or impossible to reach with more classical techniques.
I saw the PicoEMP project release in January and quickly decided to build one. I ordered the PCB, PCB stencil and electronic parts for both the device and a number of custom coil probes. My choices were mainly based on the hardware bill of materials (BOM) recommendations in the project repository and what was available at electronics distributors at the time. Despite the component shortages, I managed to get suitable parts for everything.
The PicoEMP generates ↯ high voltage ↯ and may permanently harm you, your environment or any electronics around it. The following build photos and remarks have not gone through exhaustive safety checks, so assume that they are wrong and use your own judgment. This article is not at all a step-by-step tutorial and there are many aspects of a DIY project that can go catastrophically wrong.
Without warranty of any kind. Build and use at your own risk.
The assembly starts with the PCB. Due to known issues with the design files, the ordered PCB is missing the two milled slots near the high-voltage area and for the two enclosure tabs. I was able to work around those issues and still mount the safety shield (as explained later), but I recommend double-checking that the PCB manufacturer in question handles the design correctly in the manufacturing preview.
The rest of the board looked acceptable, so I went ahead with the build.
Since most of the components are SMD parts, I ordered a matching solder mask stencil to distribute the solder paste.
After applying the paste, the PCB looks like this:
With the paste applied, I started to place the components for the right-hand side of the PCB. The large component footprint in the center is for the pre-made Raspberry Pi Pico (RP2040) board, which I decided to hand-solder. It is not included in the solder paste steps and is added later.
Build suggestion: watch out for the correct orientation of the LEDs, as they can look visually different on the front even when oriented the same way, and the silkscreen direction marking is easy to miss. I think the switches only have to be correct left-right and are wired identically on both sides, but you should verify this yourself.
After the components of the right-hand side were in place, I used a hot air station to melt the solder paste.
A proper reflow oven would be more reliable and consistent, but the DIY technique works for prototypes like this.
The SMD switches are sensitive to heat, so I will likely hand-solder them with a soldering iron next time.
Next I started placing the components for the high voltage section.
Build suggestion: this requires some attention to detail, especially with the direction of the transformers, diodes and the phototransistor.
The hot air station soldering of the components went fine, except for the optional switch SW3 and pin connector J3, which deformed a bit in the heat but are still functional. As mentioned before, on potential future builds I will likely hand-solder these components to avoid heat damage.
The next photo also has the SMA connector soldered on at the left board edge and protected with white shrink-wrap:
On the right side of the PCB, the Raspberry Pi Pico and the battery connector are soldered on:
Due to the milling issue, it was necessary to customize the protective plastic enclosure by stripping off its tabs. The screws that come with the enclosure don’t fit into the mounting holes properly, so I’ve used a pair of smaller screws that do not have this problem.
Build result:
I've flashed the microcontroller with the C firmware, which worked without an external programmer thanks to the mass storage support. Once up and running, the PicoEMP works and successfully glitches targets. Great!
The only bug I’ve noticed so far is that the arming button is unreliable on my unit. I think this is a software problem, but will re-check the electrical button behavior at some point to debug this further.
The previous section covered the build process of the main device, but there are additional parts required for fault injection, namely probe tips. I’ll show some of them here as well.
The PicoEMP can be fitted with a number of different electrical coil probes. It is important that they are exchangeable since otherwise the PicoEMP would be limited to one specific coil characteristic. In general, the probes are mainly designed around their SMA connector that allows easy swapping, the coil with its ferrite material in various configurations and some protective shrink wrap.
To summarize a complex topic, different coils are required for individual injection targets and use cases. The coil dimensions, coil type, number of windings, winding direction, number of layers, ferrite core form and other design aspects play a role in the strength and size of electromagnetic field that is generated. The physics details go beyond this article, but the injection tips section of the upstream documentation is a good place to start if you’re looking for component ideas.
Here are some probes in various stages of assembly:
For my second set of custom probes, I decided to try out different variants of designs and also include some physically smaller probes for more accurate injections.
The PicoEMP is a really interesting DIY tool for advanced uses in hardware hacking, and I think it is great that it is available under an open license and accessible. The DIY nature of the tool also means that you have to get your hands dirty to get one (at least at this stage). Since the complex subject of EMFI needs a lot of experimentation time in any case, there are benefits in getting to know the low-level details of the device operation and probe design.
I have some future plans and ideas for potential tests of the device and experimental improvements, which may give some material for a second article at some point in the future.
I'm a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
The PhyWhisperer is a special tool for USB-related hardware security research. Its main selling point is the ability to quickly sniff and trigger on USB packets at USB 2.0 speed through the use of an FPGA, which is difficult to achieve with other tools in this price class. While computers can sniff their incoming and outgoing USB traffic fairly easily, the typical delays make it difficult to act upon that information with other equipment. Like other tools from NewAE, the PhyWhisperer is very open and both the software and the hardware schematics are public, which I think is really useful for complex equipment.
For this article, the most relevant functionality of the hardware is its ability to turn the 5V USB target power supply on and off under software control from the host computer. This is just a side feature, but it is very useful during research when frequent restarts of a USB target device are required.
I used the PhyWhisperer in this USB power switch role while experimenting with voltage glitching based fault injection on a connected USB target via the ChipWhisperer device. The ChipWhisperer glitch line was directly hooked up to the "Shunt Out" port and I used a custom variable shunt resistor. Due to errors in the glitching setup, the PhyWhisperer was exposed for some time to quickly repeating voltage glitches on the target's USB power supply line, which periodically shorted 5V USB to ground over a ~3 Ohm resistor.
I expected the PhyWhisperer to be robust against this sort of condition given its design; after all, the internal components (described later in the article) are designed to detect and limit repeated short circuits without going up in smoke. Unfortunately, that was not the case. One of the takeaway messages is that while the intended use case of the PhyWhisperer includes some basic side-channel measurements over a voltage shunt right in the device, that port is not designed to be used for fault injection.
After the mentioned incident, my PhyWhisperer unit did not work correctly anymore. The main symptom was the inability to power the USB target via the external USB port. So the natural next step was to open up the device and see if I could repair it with the help of the existing public design documents, and learn more about the device and its limitations in the process.
Disclaimer: Using and repairing a broken device like this can be dangerous for you or your equipment, even if it “just” involves 5 Volt DC. Do this at your own risk.
My first round of analysis was based on a visual inspection and some limited electrical probing of the PCB, focusing on the polyfuses, USB-related traces and components as well as power regulators, since they were potentially stressed to a breaking point.
My initial inspection didn’t show any obvious damage. I thought about checking some of the electrical characteristics of essential components and voltage rails, but this is very time-consuming.
For some faults, one can take a debugging shortcut by checking the device with a thermal camera. The basic idea is to identify misbehaving components, partial shorts and other issues by looking for unusual heat spots on the PCB while it is in operation.
The camera used here is "just" an entry-level Seek Thermal Compact camera that plugs into a smartphone. It has a lot of sensor noise, general design limitations, unreliable absolute temperature readings and a mediocre smartphone application, but the manually adjustable focus and decent pixel resolution (for its price range) still make it interesting for this sort of occasional device analysis.
In the following grayscale images, bright pixels indicate higher temperatures, while darker pixels indicate lower temperatures or reflective metal surfaces. The absolute temperature readings are unreliable.
Closer visual inspection shows that clearly something isn't right with the U4 component from the last thermal image. Following the surrounding traces suggests that it is related to the USB power switching functionality, without even looking at the schematic. Its chip package is warped, likely because the internal silicon shorted out, which would explain the heating when powered. The package damage was not yet present when I first looked at the board and showed up during the intermediary testing as the chip heated up more.
Knowing the broken component, I set out to learn more about the chip, find a spare and replace it. Additional electrical checks and remaining functionality suggested that the other chips of the device behaved normally.
The official schematic lists U4 and its twin neighbor U5 as the Diodes Incorporated AP22802 in a SOT-25 package.
Manufacturer datasheet description:
The AP22802 is a single channel current-limited integrated high-side power switch optimized for Universal Serial Bus […]
To summarize, the two switches control whether the USB power to the target is supplied from the external connector (via U4), from the USB host that controls the PhyWhisperer (via U5), or from none of those sources (= power off). In theory, both power sources could be enabled at the same time, but that would be a problem in case of any supply voltage mismatches, so the microcontroller control logic ensures at most one of the switches is active at any time.
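A minimal sketch of such break-before-make control logic (entirely illustrative; the pin names and helper functions are my assumptions, not the PhyWhisperer firmware API):

```c
enum power_source { POWER_OFF, POWER_EXTERNAL, POWER_HOST };

enum { EN_U4_EXTERNAL, EN_U5_HOST };        /* hypothetical enable pins */
extern void switch_enable(int pin);
extern void switch_disable(int pin);

/* Disable both switches before enabling one, so the two supplies can
 * never drive the target rail at the same time, even transiently. */
void set_usb_power_source(enum power_source src)
{
    switch_disable(EN_U4_EXTERNAL);
    switch_disable(EN_U5_HOST);

    if (src == POWER_EXTERNAL) {
        switch_enable(EN_U4_EXTERNAL);
    } else if (src == POWER_HOST) {
        switch_enable(EN_U5_HOST);
    }
    /* POWER_OFF: leave both switches disabled. */
}
```

Disabling both switches first guarantees that the supplies never overlap, which is the property the real control logic needs to provide.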
Early in the repair process, I checked for alternative component options to see if there are other parts with a similar pinout that have additional protection features or lower current limits, but did not find any ideal candidates.
During this process, I noticed in the datasheets that the AP22802 may be problematic if used in a dual switch arrangement. Similar to other high-side USB power switches from other manufacturers that I looked at, its design assumes that it is alone on the USB power supply line and can always pull it to ground with a ~100 Ohm resistor when it is not supplying power on its own. This discharge mechanism is used to empty any capacitors that are on the power supply line.
In the dual switch arrangement of the PhyWhisperer, a hard-wired discharge effect is counterproductive, since one switch partially shorts out the line (~100 Ohm) at the same time that the other switch is supplying power. Additionally, the PhyWhisperer has its own dedicated circuit to do discharges if necessary, so the switches themselves don't have to do this.
I reached out to NewAE via email to clarify whether there is a known problem and whether AP22802AW5-7 was in fact the right replacement part, but didn't hear back from them before the first replacement part order arrived some days later. Therefore, I went ahead and replaced the broken chip with my hot air rework station, hoping the replacement part would be good enough.
After replacing the broken component, the PhyWhisperer was working and USB target devices could be powered through U4 again. Yay!
However, something was still wrong after the first repair. In some switch configurations of U4 and U5, specifically when U5 is supplying the power, the replaced U4 chip still got hotter than expected. It wasn't getting very hot, but the issue was clearly visible on the thermal camera. At the same time, I was aware that the new chip marking XA 1K B didn't match the VW 6 yA of the chip I had replaced. This all pointed to the previously discovered fact that the replacement part was likely suboptimal.
Additional datasheet searching confirmed that the part number of what shipped with hardware revision Rev 04A corresponds to the Diodes Incorporated AP2171A switch, which does not have the discharge functionality built in. So the official NewAE schematic was incorrect and used an outdated part number. Grr. Schematics are not a great help during repair if they lead people in the wrong direction.
To ensure I didn’t miss any important details, I wrote a post on the issue in the official NewAE forum. This was quickly answered and led me to errata documentation that confirmed the expected technical details.
After this, the path was clear for another order of replacement parts, this time the AP2171AW-7. Once the second replacement component arrived, I did another round of hot air rework and replaced U4 again. This time, the part number prefix VW matched the one it originally shipped with.
While the repair took a significant amount of time, technical deep dives like this can also be a chance to learn a lot about the capabilities and limitations of your equipment. This is especially helpful when using experimental equipment in more or less unintended or untested ways, which happens a lot during hardware hacking.
I have the impression that a future hardware revision of the device could be more robust against electrical fault conditions by actually checking the fault indicator pin of the USB switches and shutting down power in case of anomalies, but there may be other reasons why the existing design has not done this. I may revisit this topic in the future for other USB-related projects.
As a result of the repairs, my PhyWhisperer is fully operational again and works without internal heating short circuits, which I’m happy about. If anyone runs into similar issues in the future, I hope this article can shed some light onto the topic.
(V1.3).
I’m a freelance Security Consultant and currently available for new projects. If you are looking for assistance to secure your projects or organization, contact me.
This covers the basic components of the wallet, which helps in understanding how it operates. The core electronics are located under a soldered RF shield and are not yet visible; opening this up may follow at a later time. Notably, the wallet is still operational after the steps shown here, so it appears that there are no tamper detection sensors in this part of the hardware.