CrowdStrike outage shows business continuity still a must

Disaster recovery has centered on cyberattacks the past few years, but the CrowdStrike outage illustrates why companies can't forget about traditional business continuity.

A faulty update issued by cybersecurity vendor CrowdStrike brought numerous companies to their knees July 19, exposing critical vulnerabilities in disaster recovery plans.

IT analysts said the interconnected nature of SaaS, cloud services and modern applications contributed to the massive IT outage, which triggered a looping blue screen of death on devices running the Windows OS with CrowdStrike's threat detection software.

This connectivity in software and hardware is needed for organizations to operate and meet consumer demands, but they must also consider traditional business continuity plans and recovery, analysts said.

The advent of cloud services and automation has led to a lax attitude toward patching, updates and system access, according to Chris Steffen, vice president of research at Enterprise Management Associates. Customers that rely on vendor automation for updates should plan for worst-case scenarios and exert more control, both in business contracts and when rolling out software updates, he added.

"I don't know how seriously people are taking disaster recovery anymore because [with the cloud] that's someone else's problem," Steffen said. "If there's a silver lining to this [event], it'll get the attention of the not-technically-minded members of your board."

Automated automation

CrowdStrike has issued guidance on how to undo the error, which has affected an estimated 8.5 million devices, and now offers an automated remediation capability.

Those tools might come as cold comfort for industries such as airlines, which suffered massive service disruptions and grounded flights for days due to the thousands of Windows endpoints that accepted the faulty update.

Companies that use security and threat detection software from vendors such as CrowdStrike are still better off accepting updates automatically despite this setback, Steffen said, because those updates patch new vulnerabilities and rarely cause problems.

"We've been screaming for years about the necessity of having automatic updates," Steffen said. "Most of the world is starting to get that message, and then it happens that an automatic update destroys the world."

Automating updates is also needed to ensure that software continues to function across the entire organization's IT infrastructure, according to Jerome Wendt, president and founder of Data Center Intelligence Group.

Organizations adding software and complexity to their IT environments should map their resources and understand how each software or hardware component could affect those services, as well as the blast radius if a component fails, he said.

"We have to live in the real world," Wendt said. "There's going to be interdependencies. … I don't put that on CrowdStrike. I put that on the organizations. That's on you to know what's going on."
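As a rough illustration of the resource map and blast radius Wendt describes, the sketch below walks a dependency map to list everything downstream of a failed component. The component names and the `DEPENDENTS` map are hypothetical, invented for this example; a real inventory would come from an organization's own asset or configuration data.

```python
from collections import deque

# Hypothetical dependency map: each key is a component, each value lists
# the components that depend on it directly. All names are illustrative.
DEPENDENTS = {
    "endpoint-agent": ["windows-desktops", "windows-servers"],
    "windows-desktops": ["check-in-kiosks", "call-center"],
    "windows-servers": ["booking-api"],
    "booking-api": ["check-in-kiosks", "mobile-app"],
}


def blast_radius(failed, dependents):
    """Return every component downstream of `failed` in the dependency map."""
    affected = set()
    queue = deque([failed])
    while queue:
        component = queue.popleft()
        for downstream in dependents.get(component, []):
            if downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)
    return affected


if __name__ == "__main__":
    # List everything that would be hit if the endpoint agent failed.
    for name in sorted(blast_radius("endpoint-agent", DEPENDENTS)):
        print(name)
```

A breadth-first walk is used here only because it is simple to read; any graph traversal that visits each downstream component once would give the same set.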

Figure: Timeline for the July 2024 CrowdStrike outage and its effects

Road to recovery

Ideally, a recovery plan for a software issue should include automated rollback capabilities, according to Keith Townsend, chief technology adviser at the Futurum Group.

"This is definitely one of those things you can prepare for," Townsend said. "A lot of this is standard stuff [for recovery]."

These rollbacks should separate the application data from the OS, enabling a more granular recovery without affecting data, he said.
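One way to picture the kind of rollback Townsend describes is a staged rollout: update a small canary group first, check its health, and revert only the system layer if something breaks. The sketch below is a simplified model under stated assumptions, not a description of how CrowdStrike or any other vendor ships updates. The `Endpoint` class, the `staged_rollout` function and the `is_healthy` check are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Endpoint:
    """Hypothetical endpoint with the system layer kept apart from app data."""
    name: str
    system_version: str                            # OS/agent layer, safe to roll back
    app_data: dict = field(default_factory=dict)   # never touched by a rollback
    previous_version: Optional[str] = None

    def apply(self, version):
        # Remember the current version so the update can be undone.
        self.previous_version = self.system_version
        self.system_version = version

    def rollback(self):
        # Restore only the system layer; app_data is left as-is.
        if self.previous_version is not None:
            self.system_version = self.previous_version
            self.previous_version = None


def staged_rollout(endpoints, version, is_healthy, canary_count=1):
    """Update a canary group first; roll back and stop if any canary fails."""
    canaries = endpoints[:canary_count]
    rest = endpoints[canary_count:]
    for endpoint in canaries:
        endpoint.apply(version)
    if not all(is_healthy(endpoint) for endpoint in canaries):
        for endpoint in canaries:
            endpoint.rollback()
        return False
    for endpoint in rest:
        endpoint.apply(version)
    return True
```

In this model a failed canary stops the update before it reaches the rest of the fleet, and the rollback touches only `system_version`, which mirrors the idea of recovering the OS layer without affecting application data.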

In this case, deploying automated remediation tools was further complicated because many of the affected systems were Windows desktops that employees typically manage themselves, according to Steven McDowell, founder and chief analyst at NAND Research.

As remote or hybrid work becomes increasingly standardized, IT teams will need to make sure they have recovery options available for endpoints like these alongside servers and other hardware, since getting access on demand could be difficult, McDowell said.

"We're realizing the desktop is critical infrastructure," he said.

Organizations have spent the past decade approaching backup and disaster recovery from the perspective of cyberattacks such as ransomware, but traditional disasters and outages remain a threat, according to Mike Matchett, founder and president of Small World Big Data.

Developing a business continuity strategy might require additional hardware resources, a costly addition for some organizations. It might also mean finding ways to keep the business operational even if it comes down to using pen and paper. It's a hard lesson IT has learned again and again throughout history.

"It's maybe time to take a deeper look at your business continuity plan over disaster recovery," Matchett said. "If you're going to talk recovery, it's a multi-day process."

Tim McCarthy is a news writer for TechTarget Editorial covering cloud and data storage.
