Back on February 22 this year, AT&T suffered a massive outage after a botched network update. Now, months later, the full scale of the outage has been revealed — and it’s much bigger than we thought at the time.
“All voice and 5G data services for AT&T wireless customers were unavailable, affecting more than 125 million devices, blocking more than 92 million voice calls, and preventing more than 25,000 calls to 911 call centers,” the Federal Communications Commission (FCC) said in its report.
The 911 failures are the biggest problem here, for reasons that are easy to understand. There’s nothing worse than trying to reach 911 in an emergency and being unable to get through. The FCC report does show that AT&T attempted to restore FirstNet (First Responder Network Authority) before service for residential and commercial AT&T users, which sounds like the correct way to go about it.
However, the FCC also criticized AT&T for several failures on top of failing to test the implementation of the network change. The FCC pointed to a lack of oversight and controls to ensure that test processes were followed, or to the possibility that the processes themselves were insufficient. AT&T was also unprepared for the congestion caused by user devices attempting to reconnect to the network at the same time, and its network was not robust enough to mitigate that congestion.
The outage affected all 50 states as well as Washington, D.C., Puerto Rico, and the U.S. Virgin Islands. The FCC criticized AT&T in the report for not following best practices, which require all network changes to be thoroughly tested, reviewed, and approved before implementation. It took AT&T more than 12 hours to fully restore service.
If this sounds a little familiar, it’s because it’s very similar to the CrowdStrike outage, which took out Windows devices globally last week and also involved the rollout of an untested update with faulty code that broke everything. It’s also familiar because Verizon suffered a similar outage caused by a flawed update file in December 2022, something that earned it a penalty from the FCC and subjected it to a consent decree.
To AT&T’s credit, the carrier at least acknowledged responsibility for the outage and provided impacted customers with a $5 credit, which likely covered only a small fraction of most people’s phone bills. On the plus side, the consequences from the FCC are likely to be more severe, with the Public Safety and Homeland Security Bureau referring the matter to the FCC Enforcement Bureau to investigate a potential violation of FCC rules.
If found responsible, AT&T could be made to pay a more significant fine and to implement a compliance plan, just as Verizon was last month, when it had to pay a $1.05 million fine for its outage in December 2022. In terms of fixes, AT&T implemented new technical controls within two days of the outage.
“This included scanning the network for any network elements lacking the controls that would have prevented the outage, and promptly putting those controls in place. AT&T has engaged in ongoing forensic work and implemented additional enhancements to promote network robustness and resilience,” the FCC said. The company also “implemented additional steps for peer review and adopted procedures to ensure that maintenance work cannot take place without confirmation that required peer reviews have been completed.”
This should help prevent future outages, at least for AT&T. However, it seems that the failures that lead to untested updates being pushed out are widespread across the industry, so hopefully, this will serve as a wake-up call for companies.