Airlines grounded flights. Operators of 911 lines could not respond to emergencies. Hospitals canceled surgeries. Retailers closed for the day. And the actions all traced back to a batch of bad computer code.
A flawed software update sent out by a little-known cybersecurity company caused chaos and disruption around the world Friday. The company, CrowdStrike, based in Austin, Texas, makes software used by multinational corporations, government agencies and scores of other organizations to protect against hackers and online intruders.
But when CrowdStrike sent its update Thursday to its customers that run Microsoft Windows software, computers began to crash.
The fallout, which was immediate and inescapable, highlighted the brittleness of global technology infrastructure. The world has become reliant on Microsoft and a handful of cybersecurity firms like CrowdStrike. So when a single flawed piece of software is released over the internet, it can almost instantly damage countless companies and organizations that depend on the technology as part of everyday business.
“This is a very, very uncomfortable illustration of the fragility of the world’s core internet infrastructure,” said Ciaran Martin, the former CEO of Britain’s National Cyber Security Center and a professor at the Blavatnik School of Government at Oxford University.
A cyberattack did not cause the widespread outage, but the effects Friday showed how devastating the damage can be when a main artery of the global technology system is disrupted. It raised broader questions about CrowdStrike’s testing processes and what repercussions such software firms should face when flaws in their code cause major disruptions.
While outages are common, often caused by technical errors or cyberattacks, the scale of what unfolded Friday was unparalleled.
“This is historic,” said Mikko Hypponen, the chief research officer at WithSecure, a cybersecurity company. “We haven’t had an incident like this.”
George Kurtz, CrowdStrike’s CEO, said that the company took responsibility for the mistake and that a software fix had been released. He warned that it could be some time before tech systems returned to normal.
“We’re deeply sorry for the impact that we’ve caused to customers, to travelers, to anyone affected by this,” he said in an interview Friday on NBC’s “Today” show.
Satya Nadella, Microsoft’s CEO, blamed CrowdStrike and said the company was working to help customers “bring their systems back online.” Apple and Linux machines were not affected by the CrowdStrike software update.
A White House official said the administration was in “regular contact” with CrowdStrike and had convened agencies to assess the impact of the outage on the federal government’s operations.
CrowdStrike, founded in 2011 by Kurtz and others, has built a reputation over the years as a firm that could solve even the toughest security problems. It was tapped to investigate a 2014 hack of Sony Pictures and the 2016 hack of the Democratic National Committee, which exposed Hillary Clinton’s emails.
But problems stemming from CrowdStrike’s products have surfaced before. In April, the company pushed a software update to customers running the Linux system that crashed computers, according to an internal CrowdStrike report sent to customers about the incident, which was obtained by The New York Times.
The bug, which did not appear to be related to Friday’s outage, took CrowdStrike nearly five days to fix, the report said. CrowdStrike promised to improve its testing process going forward, according to the report.
On Thursday, the tech issues began when Microsoft dealt with an outage on its cloud service system, Azure, which affected some airlines.
Then CrowdStrike sent an update for its software called Falcon Sensor, which scans a computer for intrusions and signs of hacking. If everything had gone according to plan, CrowdStrike’s software would have received minor improvements and customers would have hardly noticed.
Instead, when CrowdStrike’s faulty update reached computers running Microsoft Windows, it caused the machines to shut down and then endlessly reboot. Workers around the world were greeted with what is known as the “blue screen of death” on their computers. Insufficient testing at CrowdStrike was a likely source of the problem, experts said.
As computers restarted themselves over and over, a cycle known as the “doom loop,” there was little CrowdStrike could do to fix the problem. Tech staff at affected companies were faced with a choice: walk around to each machine and remove the bit of flawed code, or wait and hope for a solution from CrowdStrike.
The problems cascaded instantly. At Sydney Airport in Australia, travelers encountered delays and cancellations, as did those in Hong Kong, India, the United Arab Emirates, Berlin and Amsterdam. At least five U.S. airlines — Allegiant Air, American, Delta, Spirit and United — grounded all flights for a time, according to the Federal Aviation Administration.
Health care systems were crippled, forcing hospitals to cancel noncritical surgeries. In the United States, 911 lines went down in multiple states, though many of those problems were being resolved later Friday. Britain’s National Health Service also reported issues.
“We knew we had a catastrophe on our hands,” said B.J. Moore, the chief information officer for Providence Health, which has 52 hospitals in seven states. He said 15,000 servers were down and 40,000 of Providence’s 150,000 computers were affected, adding that it was “worse than a cyberattack.”
UPS and FedEx said they were affected. Customers with TD Bank, one of the biggest banks in the United States, reported issues accessing their online accounts. Several state and municipal court systems closed for the day because of the outage.
At CrowdStrike, engineers described an atmosphere of confusion as the company struggled to contain the damage.
Executives urged employees not to speculate on why the mistake happened and directed them to focus on a fix for the computers that were affected, said two engineers who spoke on condition of anonymity because they were not authorized to speak publicly. Computers not connected to the cloud required a physical fix to the error introduced by CrowdStrike, they said, which could take weeks.
Within several hours of the faulty software going out, CrowdStrike sent out a software patch that would stop computers from endlessly rebooting.
Lukasz Olejnik, an independent cybersecurity researcher and consultant, said the outage would still take time to resolve because a suggested solution for some organizations involved rebooting each computer manually into safe mode, deleting a specific file and then restarting the computer.
While that is a relatively straightforward process, security experts said, it may not be easy to do at scale. Those with organized and well-staffed information technology teams could potentially fix the issues more quickly, Olejnik said.
The incident highlighted information technology systems that, unlike the iPhone software updates Apple sends to customers, operate in the background. The CrowdStrike issues were compounded because the software being updated performed critical cybersecurity tasks, giving it access to scan a computer to look for viruses and other malicious attacks.
Cybersecurity tools operate quietly in the background to defend computers against attacks. The software is frequently updated with new defenses as hackers develop fresh methods of attack, but constant updates mean there are many opportunities for mistakes to happen.
“One of the tricky parts of security software is it needs to have absolute privileges over your entire computer in order to do its job,” said Thomas Parenty, a cybersecurity consultant and a former U.S. National Security Agency analyst. “So if there’s something wrong with it, the consequences are vastly greater than if your spreadsheet doesn’t work.”
On Friday, the stock price of CrowdStrike, which reported $3 billion in annual revenue last year, closed down 11%.
© 2024 The New York Times Company