Facebook’s servers house a massive amount of data, so it should come as no surprise that the company regularly publishes research based on that data, covering everything from relationships to the color of the infamous dress. This study, however, was not about the data itself but about the effect it has on the SSDs that store it. The researchers found that it was not usage, as is commonly believed, but temperature that had the greatest effect on data integrity in flash memory.
Flash storage, the technology behind SSDs, offers some big advantages over the mechanical drives that still hold most data. It has its own issues, though, one of which is wear-out. Each flash cell is rated for only a limited number of write (program/erase) cycles, and beyond that point the cell has trouble storing information accurately.
That’s not something a home user would notice until after a decade of heavy usage. But when drives are rewritten entirely every day or two, as in enterprise server settings, wear shows up quickly and can affect data integrity. The massive scale of Facebook’s server fleet is what allowed the researchers to identify trends in data integrity over time.
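To make the gap between home and server workloads concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the study: the function name, the rated cycle count, the rewrite rates, and the write amplification factor are all hypothetical values chosen for illustration, and the model assumes perfectly even wear across cells.

```python
def years_until_worn_out(rated_cycles: float,
                         full_rewrites_per_day: float,
                         write_amplification: float = 1.0) -> float:
    """Estimate the years until every cell has used up its rated write cycles.

    Assumes ideal wear leveling (all cells wear evenly). The
    write_amplification factor models the extra internal writes a
    drive performs for each write requested by the host.
    """
    if rated_cycles <= 0 or full_rewrites_per_day <= 0 or write_amplification <= 0:
        raise ValueError("all arguments must be positive")
    cycles_used_per_day = full_rewrites_per_day * write_amplification
    return rated_cycles / cycles_used_per_day / 365.0


# Hypothetical figures for illustration only, not values from the study.
RATED_CYCLES = 3000          # assumed write cycles each cell is rated for
HOME_REWRITES_PER_DAY = 0.05   # assumed: a full drive rewrite every 20 days
SERVER_REWRITES_PER_DAY = 1.0  # a full drive rewrite every day

home_years = years_until_worn_out(RATED_CYCLES, HOME_REWRITES_PER_DAY)
server_years = years_until_worn_out(RATED_CYCLES, SERVER_REWRITES_PER_DAY)

print(f"Home drive:   about {home_years:.0f} years of rated writes")
print(f"Server drive: about {server_years:.1f} years of rated writes")
print(f"The server drive wears about {home_years / server_years:.0f}x faster")
```

Under these assumptions, the only thing that differs between the two cases is how often the drive is rewritten, so the ratio of lifetimes is simply the ratio of rewrite rates. Real drives complicate the picture with uneven wear and, as the study suggests, temperature.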
When a drive is first put into service, it goes through an initial period the researchers call “early detection,” in which the memory controller tracks data loss and learns which cells are less stable. This reduces the failure rate for some time, until the drive nears the end of its useful life and begins to wear out. The researchers found that drives that ran hotter tended to fail fastest, while cooler drives had a longer effective lifetime.
The study has implications not just for the use of SSDs in server settings, but also for their design. If manufacturers know that better temperature management is the way to achieve longer cell lifetime, they can build better cooling into the drives and into the servers that hold them.