Felix Jen – 03 February 2023 – 9 min read
For a workaround on AT90USB-based keyboards, jump down.
As an average user, I don’t bat an eye when plugging in a USB device. When I plug in a USB device of any sort into an open USB port on my computer, I expect the device to connect and begin working almost seamlessly, at most requiring an automatic driver install. This “automagic” connection, while completely transparent in its process, relies on a complex myriad of enumeration and negotiation routines behind the scenes. Most of us don’t even notice this is happening until something goes wrong.
Recently (or more like, it’s been a while), things have gone wrong with the AT90USB series of embedded AVR microcontrollers designed by Atmel (now Microchip). The AT90USB series, as its name implies, includes a USB PiP (Programming-in-Place) bootloader which allows programming of the IC through a standard USB connection to the programmer. Upon entering the bootloader state, the microcontroller (hereinafter, “MCU”) presents itself to the computer as a libusb0-based device, and the computer downloads and installs the appropriate device driver.
As used in multiple FJLaboratories projects (e.g., Solanis, Scarlet, Velvet, among others), the AT90USB646 (the baseline model within the AT90USB series) is a core MCU in my lineup. However, it was routinely brought to my attention that flashing the MCU through USB directly was resulting in errors for users across Windows machines. While the device would appear to initially connect, any attempts to flash would result in the programmer failing to detect a device available and thus failing to flash. This problem caught my eye because it was (1) a core issue affecting usability and functionality; (2) repeatable across multiple occurrences, devices, and users; and (3) reproducible at close to 100%.
To narrow the problem further, it occurs exclusively on Windows-based machines, whereas Unix-based environments (e.g., macOS, Linux, Android) appear entirely unaffected.[1] The problem was also limited to Windows 8.1 and newer, with older installations such as Windows 7 being entirely unaffected.[2] With the scope narrowed, the original belief was that a driver issue was to blame. Windows is notorious for a subpar driver experience across products. While this seemed to be the perfect root cause candidate, repeated attempts to replace the core driver with the QMK-provided, Atmel-provided, Microchip-provided, and Zadig-provided drivers did not clear up the problem across the affected user base.
Originally, the plan was to write off this issue as a fact of life. On or around December 27, 2022, the Solanis project was delivered to Dallagen.xyz and some of the PCBs within the batch were delivered without a proper PiP procedure performed. Without the PiP, the boards did not contain firmware and therefore were nonfunctional. Unfortunately, some of this batch was delivered to customers and three reports were received of users unable to flash the firmware onto the PCB due to the Windows error. While some users readily had access to Unix based systems, a few did not. Therefore, a deeper dive into the problem had to be performed.
To further grasp the problem at hand, we need to quickly examine the USB protocol. As a bidirectional data protocol with multiple coexisting functions available, the USB device needs a way to communicate its properties to the USB host (the computer). This occurs immediately upon the USB device being connected, in a step known as “enumeration.” During this process, the host device requests and the device sends various bits of data describing itself. These typically include the Vendor ID (VID), Product ID (PID), name of the device, device power requirements, and some of the device’s USB capabilities. One of these capabilities is the USB operating specification, such as USB 1.0, USB 1.1, USB 2.0, USB 3.0, and so on.
In order to perform this enumeration quickly, USB devices come with a “hierarchy of descriptors” which contain the relevant data for each property. The descriptors branch from an overall singular “Device Descriptor,” following into “Configuration Descriptors,” then “Interface Descriptors,” then “Endpoint Descriptors,” and finally “String Descriptors.”
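As a rough illustration of the top of that hierarchy, here is a minimal Python sketch that unpacks the 18-byte standard Device Descriptor into its named fields. The sample bytes are hypothetical and made up for this example: only the VID and PID match the AT90USB64_ bootloader values used later in this post, and the remaining fields are placeholders rather than a dump from real hardware.

```python
import struct
from collections import namedtuple

# Fields of the 18-byte standard USB device descriptor, in wire order.
DeviceDescriptor = namedtuple(
    "DeviceDescriptor",
    "bLength bDescriptorType bcdUSB bDeviceClass bDeviceSubClass "
    "bDeviceProtocol bMaxPacketSize0 idVendor idProduct bcdDevice "
    "iManufacturer iProduct iSerialNumber bNumConfigurations",
)
_FORMAT = "<BBHBBBBHHHBBBB"  # little-endian, 18 bytes in total


def parse_device_descriptor(raw: bytes) -> DeviceDescriptor:
    """Unpack the first 18 bytes of raw into a DeviceDescriptor."""
    if len(raw) < struct.calcsize(_FORMAT):
        raise ValueError("device descriptor must be at least 18 bytes")
    return DeviceDescriptor(*struct.unpack_from(_FORMAT, raw))


# Hypothetical sample descriptor (placeholder values, not a hardware capture).
sample = bytes([
    0x12, 0x01,        # bLength = 18, bDescriptorType = 1 (DEVICE)
    0x00, 0x02,        # bcdUSB = 0x0200, stored low byte first
    0x00, 0x00, 0x00,  # device class, subclass, protocol
    0x40,              # bMaxPacketSize0 = 64 (placeholder)
    0xEB, 0x03,        # idVendor = 0x03EB
    0xF9, 0x2F,        # idProduct = 0x2FF9
    0x00, 0x00,        # bcdDevice = 0x0000
    0x01, 0x02, 0x03,  # string descriptor indices (placeholders)
    0x01,              # bNumConfigurations = 1
])

desc = parse_device_descriptor(sample)
print(f"VID={desc.idVendor:04X} PID={desc.idProduct:04X} bcdUSB={desc.bcdUSB:04X}")
# prints: VID=03EB PID=2FF9 bcdUSB=0200
```

Note that multi-byte fields such as bcdUSB travel low byte first, which is why the sketch unpacks them as little-endian.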
Microsoft has provided a comprehensive look at how Windows 7 machines[3] perform their device enumeration. While the exact details of this process are not critical, the important step is the “Configuration Descriptor Request” stage, where the USB driver stack will issue a request for the device’s USB Configuration Descriptor. Note that as part of this request, the sole validation that is performed involves the bLength and bDescriptorType fields. No additional validation is performed on the data.
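To make that limited validation concrete, here is a small Python sketch of a check that looks only at those two fields. The exact comparison the Windows 7 stack performs is not spelled out here, so the rule below (bLength equal to 9 and bDescriptorType equal to 2, the standard values for a Configuration Descriptor) is my assumption, and the function name and sample bytes are hypothetical.

```python
CONFIGURATION_DESCRIPTOR_TYPE = 0x02   # standard type code for CONFIGURATION
CONFIGURATION_DESCRIPTOR_LENGTH = 9    # standard size of the descriptor header


def config_header_looks_valid(raw: bytes) -> bool:
    """Check only bLength and bDescriptorType; every other byte is ignored."""
    if len(raw) < 2:
        return False
    b_length, b_descriptor_type = raw[0], raw[1]
    return (
        b_length == CONFIGURATION_DESCRIPTOR_LENGTH
        and b_descriptor_type == CONFIGURATION_DESCRIPTOR_TYPE
    )


# Hypothetical 9-byte configuration descriptor header (placeholder values).
good_header = bytes([0x09, 0x02, 0x1B, 0x00, 0x01, 0x01, 0x00, 0x80, 0x32])
# Same bytes, but with the type code of a DEVICE descriptor instead.
wrong_type = bytes([0x09, 0x01]) + good_header[2:]

print(config_header_looks_valid(good_header))  # True
print(config_header_looks_valid(wrong_type))   # False
```

Under a check this narrow, nothing about the device’s claimed USB version enters the picture at this stage.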
By the time Windows 8 and its successors were released, the USB protocol had become more complicated, with additional capabilities in the USB 2.1 and USB 2.0 LPM (“Link Power Management”) versions. This necessitated a more thorough enumeration stage for the machine to understand what the device is capable of. If this detailed enumeration is not performed, the machine has no way of knowing what the device can actually do and may push it harder than the device allows.
Microsoft responded to this requirement by changing the enumeration protocol in Windows 8. Specifically, they added an additional query as part of the “Configuration Descriptor Request” stage. The USB 3.0 and USB 2.0 LPM specifications define a new descriptor known as the Binary Device Object Store (BOS). This BOS holds the descriptors for the additional capabilities of the device. Older USB protocols do not have this BOS and therefore there is nothing to query there. To know the USB protocol version, the machine queries the bcdUSB value, a two-byte data value. This value contains the protocol’s Major Revision Number, Minor Revision Number, and sub-Minor Revision Number. For example, a USB 2.0 device will have a bcdUSB of 0200, whereas a USB 3.1 device will have a bcdUSB of 0310. As per Microsoft’s description of changes, if the bcdUSB is greater than 0200, the machine will immediately request the BOS from the device. Otherwise, the machine will skip this step.
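The bcdUSB encoding is easier to see in code. It is a binary-coded decimal value of the form 0xJJMN, where JJ is the major revision, M the minor revision, and N the sub-minor revision. The Python sketch below decodes it; the function name is my own and is not taken from any USB library.

```python
from typing import Tuple


def decode_bcd_usb(bcd_usb: int) -> Tuple[int, int, int]:
    """Split a bcdUSB value (0xJJMN) into (major, minor, sub_minor)."""
    # The two high nibbles are the tens and ones digits of the major revision.
    major = ((bcd_usb >> 12) & 0xF) * 10 + ((bcd_usb >> 8) & 0xF)
    minor = (bcd_usb >> 4) & 0xF
    sub_minor = bcd_usb & 0xF
    return major, minor, sub_minor


for value in (0x0200, 0x0210, 0x0310):
    major, minor, sub_minor = decode_bcd_usb(value)
    print(f"{value:04X} -> USB {major}.{minor}.{sub_minor}")
# prints:
# 0200 -> USB 2.0.0
# 0210 -> USB 2.1.0
# 0310 -> USB 3.1.0
```

Because each nibble is one decimal digit, comparing two bcdUSB values as plain integers orders them by version, which is what makes a simple “greater than 0200” test workable.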
In programming the bootloader for the AT90USB series of MCUs, Microchip (or Atmel) made a single typo in the bcdUSB value. The AT90USB series fully supports USB 2.0 functionality but nothing higher. Therefore, bcdUSB should be 0200. However, Microchip had set it to 2000. This corresponds to USB protocol support of USB 20.0, instead of USB 2.0. Consequently, under the Windows 8.1+ enumeration procedure, Windows sees that 2000 is greater than 0200 and requests the BOS from the device. The AT90 series was never designed for this query, and therefore does not return anything.
Thus, while Windows expects a BOS to be returned by the device due to receiving an erroneous 2000, it gets nothing. Enumeration therefore fails and the AT90USB series bootloaders become nonfunctional in Windows.
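The failure path can be modeled in a few lines of Python. This is a deliberately simplified toy of the decision described above, not Microsoft’s actual implementation: the function name, the return strings, and the device_has_bos flag are all assumptions made for illustration.

```python
import struct

BOS_THRESHOLD = 0x0200  # the host asks for a BOS only above this bcdUSB value


def enumerate_device(bcd_usb: int, device_has_bos: bool) -> str:
    """Toy model of the BOS step in Windows 8.1+ style enumeration."""
    if bcd_usb > BOS_THRESHOLD:
        if not device_has_bos:
            # The host insists on a BOS that the device cannot provide.
            return "enumeration failed: BOS requested but not returned"
        return "enumerated (BOS read)"
    return "enumerated (BOS query skipped)"


intended = 0x0200  # what a USB 2.0 device should report
shipped = 0x2000   # what the bootloader reports

# How each value is laid out in the descriptor (low byte first).
print(struct.pack("<H", intended).hex())  # 0002
print(struct.pack("<H", shipped).hex())   # 0020

print(enumerate_device(intended, device_has_bos=False))
# prints: enumerated (BOS query skipped)
print(enumerate_device(shipped, device_has_bos=False))
# prints: enumeration failed: BOS requested but not returned
```

The same device with the intended value would never have been asked for a BOS at all, so the single wrong value is enough to turn a working bootloader into a failing one.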
Unix systems, I can only assume, handle the bcdUSB to BOS query differently and do not have a “hard” validation check, but perhaps fail softly.
As a result of the change, it appears that Microsoft’s engineers are acutely aware of the impact of their “fail safe” methodology for verifying the BOS. In a lengthy support article about enumeration failures suddenly appearing in Windows 8.1, they clearly mention the change to the enumeration procedure. While it’s difficult to determine whether the fail-safe approach taken by Microsoft or the fail-open approach taken by Unix systems is “better,” we can clearly see the inconvenience of a fail-safe approach through Microsoft’s article. It is another classic tradeoff between “security” and “convenience,” but that’s a discussion for another day.
As provided by Microsoft, the official workaround in case you have a misbehaving device (like the AT90USB series’ bootloader) involves adding a few Windows Registry values that tell Windows to skip the failing query. Luckily, this is a very easy change to perform, with just three lines to copy and paste into Windows Command Prompt.
For AT90USB64_ devices, copy and paste the following lines into Windows Command Prompt as an administrator:
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\usbflags\03EB2FF90000" /v "osvc" /t REG_BINARY /d "0000" /f
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\usbflags\03EB2FF90000" /v "SkipContainerIdQuery" /t REG_BINARY /d "01000000" /f
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\usbflags\03EB2FF90000" /v "SkipBOSDescriptorQuery" /t REG_BINARY /d "01000000" /f
For AT90USB128_ devices, copy and paste the following lines into Windows Command Prompt as an administrator:
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\usbflags\03EB2FFB0000" /v "osvc" /t REG_BINARY /d "0000" /f
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\usbflags\03EB2FFB0000" /v "SkipContainerIdQuery" /t REG_BINARY /d "01000000" /f
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\usbflags\03EB2FFB0000" /v "SkipBOSDescriptorQuery" /t REG_BINARY /d "01000000" /f
This step only needs to be performed once.
In a broad stroke, the workaround above instructs Windows to skip the querying of the BOS for a specific Vendor ID, Product ID, and revision. For AT90USB64_ devices, the Vendor ID is 03EB and the Product ID is 2FF9. For AT90USB128_ devices, the Product ID changes to 2FFB. This is why the two sets of commands are different.
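To show how the key name in those commands is put together, here is a short Python sketch that concatenates the VID, PID, and revision as four hex digits each and prints the three commands for a given device. The helper names are hypothetical, and the script only generates text; it does not touch the registry.

```python
USBFLAGS_ROOT = "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\usbflags"

# Value names and data exactly as used in the commands in this post.
WORKAROUND_VALUES = [
    ("osvc", "0000"),
    ("SkipContainerIdQuery", "01000000"),
    ("SkipBOSDescriptorQuery", "01000000"),
]


def usbflags_key(vid: int, pid: int, revision: int = 0) -> str:
    """Build the usbflags key path: VID, PID and revision, 4 hex digits each."""
    return f"{USBFLAGS_ROOT}\\{vid:04X}{pid:04X}{revision:04X}"


def workaround_commands(vid: int, pid: int, revision: int = 0) -> list:
    """Return the three reg add command lines for one VID/PID/revision."""
    key = usbflags_key(vid, pid, revision)
    return [
        f'reg add "{key}" /v "{name}" /t REG_BINARY /d "{data}" /f'
        for name, data in WORKAROUND_VALUES
    ]


# AT90USB64_ bootloader (PID 2FF9), then AT90USB128_ bootloader (PID 2FFB).
for pid in (0x2FF9, 0x2FFB):
    for command in workaround_commands(0x03EB, pid):
        print(command)
```

Running it reproduces the six commands above, which makes it clear that the only thing differing between the two sets is the Product ID portion of the key name.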
As the AT90USB series bootloader operates as an independent device from the main MCU’s function (which has its own VID and PID), skipping BOS queries on the bootloader’s own VID and PID is harmless. Furthermore, as each USB device should theoretically have a unique VID and PID, this registry edit should never overlap with any other devices connected to the computer, besides other AT90USB series bootloaders. This is intended, as devices of the same type (same VID and PID) will have the same issue.
[1] Prior to this workaround, the standard workaround I had been advocating for was using a Unix-based machine for any flashing tasks. All users with the problem reported that this resulted in a success. Personal testing using exclusively macOS systems confirms no problems.
[2] I personally have no users on, or experience with, Windows 7 or earlier. However, the factory performs a functional test and PiP prior to shipping any PCBs. Initially the factory line reported the identical issue, but was later able to perform PiP without issue. Internal discussions revealed that while employees’ individual machines run Windows 10, the line dedicated to PiP relies exclusively on Windows 7 machines. The Windows 7 machines were able to properly perform the suite of PiP tasks without issue.
[3] While this article is dated October 12, 2018, the article was originally posted on October 30, 2009 in reference to the Windows 7 USB stack.