Dell & HPE Issue Updates to Fix 40K Hour Runtime Flaw in Enterprise SSDs
by Anton Shilov on March 27, 2020 4:00 PM EST
In a second SSD snafu in as many years, Dell and HPE have revealed that they shipped enterprise drives with a critical firmware bug, one that will eventually cause data loss. The bug, seemingly related to an internal runtime counter in the SSDs, causes the drives to fail once they reach 40,000 hours of runtime, losing all data in the process. As a result, both companies have had to issue firmware updates for their respective drives, as customers who have been running them 24/7 (or nearly so) are starting to trigger the bug.
Ultimately, both issues, while announced and documented separately, seem to stem from the same basic flaw. HPE and Dell both used the same upstream supplier (believed to be SanDisk) for the SSD controllers and firmware in certain now-legacy SSDs that the two computer makers sold. With the oldest of these drives having reached 40,000 hours of runtime (4 years, 206 days, and 16 hours), the firmware bug has come to light, along with the need to patch it quickly. To that end, both companies have begun rolling out firmware updates.
As reported by Blocks & Files, the actual firmware bug seems to be a relatively simple off-by-one error that nonetheless has significant repercussions.
The fault fixed by the Dell EMC firmware concerns an Assert function which had a bad check to validate the value of a circular buffer’s index value. Instead of checking the maximum value as N, it checked for N-1. The fix corrects the assert check to use the maximum value as N.
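The kind of mistake described in that statement can be illustrated with a minimal sketch. The actual firmware source is not public, so everything here is an assumption made for illustration: the function names, the choice of Python, and the idea that the index may legitimately take any value from 0 through N. The sketch only shows how a range check written against N-1 rejects a legal index of N, while the corrected check accepts it.

```python
def check_index_buggy(index: int, n: int) -> None:
    # Hypothetical buggy check: treats N-1 as the largest legal index,
    # so a perfectly valid index of N is rejected.
    if not (0 <= index <= n - 1):
        raise AssertionError(f"index {index} outside 0..{n - 1}")


def check_index_fixed(index: int, n: int) -> None:
    # Hypothetical corrected check: the largest legal index is N.
    if not (0 <= index <= n):
        raise AssertionError(f"index {index} outside 0..{n}")


def first_rejected_index(check, n: int):
    """Return the first index in 0..N that `check` rejects, or None."""
    for index in range(n + 1):
        try:
            check(index, n)
        except AssertionError:
            return index
    return None


if __name__ == "__main__":
    n = 8  # arbitrary illustrative maximum index
    print(first_rejected_index(check_index_buggy, n))
    print(first_rejected_index(check_index_fixed, n))
```

In this sketch the buggy check passes for every index until the index first reaches its maximum, which is consistent with a failure that only shows up after a long period of normal operation. How the real index relates to the 40,000-hour mark is not stated in the source.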
Overall, Dell EMC shipped a number of the faulty 12 Gbps SAS enterprise drives over the years, ranging in capacity from 200 GB to 1.6 TB, all of which will require the new D417 firmware update to avoid an untimely death at 40,000 hours.
Meanwhile, HPE shipped 800 GB and 1.6 TB drives using the faulty firmware. These drives were, in turn, used in numerous server and storage products, including HPE ProLiant, Synergy, Apollo 4200, Synergy Storage Modules, D3000 Storage Enclosure, and StoreEasy 1000 Storage, and they require HPE's firmware update to remain stable.
As for the supplier of the faulty SSDs, while HPE declined to name its vendor, Dell EMC did reveal that the affected drives were made by SanDisk (now a part of Western Digital). Furthermore, based on an image of HPE’s MO1600JVYPR SSDs published by Blocks & Files, it would appear that HPE’s drives were also made by SanDisk. To that end, it is highly likely that the affected Dell EMC and HPE SSDs are essentially the same drives from the same maker.
Overall, this is the second time in less than a year that a major SSD runtime bug has been revealed. Late last year HPE ran into a similar issue at 32,768 hours with a different series of drives. So as SSDs are now reliable enough to be put into service for several years, we're going to start seeing the long-term impact of such a long service life.
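The runtime figures quoted in this article can be checked with a little arithmetic. The sketch below is only a sanity check, not anything taken from the vendors: the helper name `split_runtime` is hypothetical, and it assumes 365-day years with leap days ignored, which is the convention that reproduces the "4 years, 206 days, and 16 hours" figure. It also notes that 32,768 equals 2**15, one more than the largest value a signed 16-bit integer can hold; the article does not state the cause of the earlier bug, so that is an observation about the number rather than a diagnosis.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: leap days ignored


def split_runtime(hours: int) -> tuple[int, int, int]:
    """Split a power-on hour count into (years, days, hours)."""
    days, rem_hours = divmod(hours, HOURS_PER_DAY)
    years, rem_days = divmod(days, DAYS_PER_YEAR)
    return years, rem_days, rem_hours


if __name__ == "__main__":
    print(split_runtime(40_000))   # (4, 206, 16)
    print(split_runtime(32_768))   # (3, 270, 8)
    print(32_768 == 2 ** 15)       # True
```

Under these assumptions, a drive powered on continuously reaches the 32,768-hour mark a little under four years into service and the 40,000-hour mark a little over four and a half years in.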
Sources: Blocks & Files, ZDNet
eastcoast_pete - Friday, March 27, 2020
Occurrences such as this - planned obsolescence baked right into SSD firmware - are just another reason why I like to keep backups on spinning rust for cold storage.
Also, I guess they got the timing wrong; these SSDs bricked before the 5 year warranty was up (: tsk, tsk
FunBunny2 - Friday, March 27, 2020
"backups on spinning rust for cold storage"
arguably, tape is more durable.
ballsystemlord - Friday, March 27, 2020
Not when the pets get a hold of it. :)
InTheMidstOfTheInBeforeCrowd - Saturday, March 28, 2020
You should have checked the sysadmin certifications of your pets before adopting them. Due diligence, my man... ;-P
eastcoast_pete - Sunday, March 29, 2020
I thought those certs looked shifty (: But then, they are great at shredding data on the hardware level..
Samus - Saturday, March 28, 2020
It is hard to wrap my head around this not being intentional. But by whom? HPE and EMC have a vested interest in keeping continuing support subscriptions in place, but this bug seems to be the direct result of a SanDisk QA failure. SanDisk has nothing to gain from this bug; it actually hurts their reputation. So maybe HPE and EMC REQUESTED similar “features”?
InTheMidstOfTheInBeforeCrowd - Saturday, March 28, 2020
Intentional? So the intention was to force a new firmware on the devices at time X or else they lose their data?
Why? Doesn't matter... Never let questions like "Why?" get in the way of a "good" (=absurd) conspiracy theory.
FunBunny2 - Saturday, March 28, 2020
"conspiracy theory"
but wasn't that the explanation of early Intel SSD bricking?
Samus - Sunday, March 29, 2020
That’s exactly what this situation reminded me of. I am not a conspiracy theorist; my career requires me to be fact driven. But this just doesn’t add up when you consider such a ridiculous flaw in such a mission-critical scenario, and that HPE and EMC are the only two enterprise suppliers in their segment that require continuing support subscriptions for out-of-warranty hardware (typically 1-3 years, in other words before this bug would materialize), when every other competitor only discontinues free firmware and ongoing driver support when hardware hits EOL.
Kvaern1 - Sunday, March 29, 2020
Making older SSDs/GPUs/whatever perform worse via driver, or not delivering driver updates after a certain time period has passed, are examples of planned obsolescence.
Secret planned drive bricking (or any other undocumented "deliberate" self-destruction of any item you have procured) is NOT planned obsolescence, it's a planned crime.