Vince Kuchar, CEO of RMC, & Brad Bekampis, Senior Cybersecurity Specialist
In July 2024, a major cybersecurity incident involving CrowdStrike’s endpoint detection and response (EDR) software sent shockwaves through industries around the globe. A seemingly innocuous update to CrowdStrike’s Falcon product resulted in the dreaded Blue Screen of Death (BSOD) on millions of devices, most notably at Delta Air Lines, which suffered an estimated $500 million in operational losses as a result. Most of the ensuing conversation, including a recent hearing of the House Homeland Security Committee, focused on single points of failure in the IT realm. The incident also raised critical questions for those in the operational technology (OT) sector – especially in environments where failures can lead to far more severe consequences than grounded flights.
The Critical Difference Between IT and OT
For organizations managing critical infrastructure – such as power plants, water treatment facilities, or pharmaceutical companies – the risks associated with applying standard IT approaches to OT environments are immense. A BSOD in an IT system may mean lost data or disrupted services, but in an OT system, such failures could result in operational shutdowns, safety hazards, or even the loss of life.
The CrowdStrike incident highlighted a fundamental truth: OT environments demand a different level of rigor when it comes to system updates, testing, and deployment. Unlike IT environments, where automation and speed often take priority, OT systems require a far more cautious, manual approach to minimize the risk of failure. For industries dependent on continuous operations, every system update must be carefully vetted in a controlled environment before being rolled out.
The Risk of Automation in Critical Infrastructure
CrowdStrike’s update, which involved kernel-level access, demonstrates the potential dangers of untested configurations in OT environments. While automation is a core component of OT operations, applying updates immediately after release without thorough testing can introduce significant risk. For OT systems controlling critical infrastructure, delaying updates by 30, 60, or even 90 days can provide more stability and reduce risk by allowing more time for testing in real-world conditions. In environments where operational continuity is paramount, rushing updates into production can lead to costly outages, as seen with the BSOD debacle, which required hands-on recovery of affected machines on a massive scale – an unacceptable outcome for critical infrastructure.
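The hold-period idea described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any vendor's or RMC's tooling: the `HOLD_DAYS` table, the criticality tiers, and the function names are all assumptions introduced for the example, with the 30, 60, and 90 day values taken from the paragraph above.

```python
from datetime import date, timedelta

# Hypothetical mapping of asset criticality to a hold period in days.
# The 30/60/90 values mirror the delays discussed in the text; the tier
# names are illustrative assumptions.
HOLD_DAYS = {"low": 30, "medium": 60, "high": 90}


def earliest_deploy_date(released: date, criticality: str) -> date:
    """Return the first date an update may be considered for deployment."""
    return released + timedelta(days=HOLD_DAYS[criticality])


def is_eligible(released: date, criticality: str, today: date, qa_passed: bool) -> bool:
    """An update is eligible only if QA testing passed AND the hold period has elapsed."""
    return qa_passed and today >= earliest_deploy_date(released, criticality)
```

The point of the design is that the waiting period and the QA result are both required: an update that has aged 90 days but never passed testing is still blocked, and so is a tested update that has not yet aged.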
For companies managing OT systems, this incident raises essential questions:
- To what extent can we trust automated updates in OT environments?
- How do we ensure third-party vendors like CrowdStrike don’t inadvertently cause catastrophic failures?
- What is our backup plan to minimize operational impact in the event of an unexpected outage? Are we prepared to implement this plan?
RMC has long emphasized the importance of rigorous, staged testing in OT environments, recognizing that the stakes are simply too high to allow for the kinds of automated processes that are common in IT. Our approach ensures that updates, patches, and configurations are vetted thoroughly in a Quality Assurance environment, minimizing the chances of unplanned disruptions.
Lessons for OT and ICS Professionals
This event provides a cautionary tale for OT security professionals: when systems and processes are designed for IT environments and applied to OT, the results can be disastrous. Here are some key takeaways from the CrowdStrike incident for those managing critical infrastructure:
- Manual Control Over Updates: Automated rollouts may work in IT, but for OT, every update must be manually tested, confirmed, and deployed with caution. This ensures that any potential issues are caught before they have the chance to impact critical operations.
- Vendor Access and Oversight: Third-party vendors providing software deployments must be thoroughly vetted, and their access to critical systems should be tightly controlled. CrowdStrike’s ability to push updates directly into the kernel of millions of systems raises concerns about the potential for future supply chain vulnerabilities. By contrast, OT vendors such as Rockwell, Siemens, and Emerson often test software updates in their own labs before releasing approved versions.
- The Importance of Segmentation: OT systems should be segmented from IT environments to prevent updates from unexpectedly reaching critical OT systems. Many organizations still struggle with segmentation, leaving them vulnerable to the kind of widespread disruption we saw with this incident.
- Staged Rollouts: Only vendor-tested and approved updates should be deployed in OT environments, and only at designated times, to avoid disrupting operations. Starting with a small-scale deployment to non-critical systems provides a window to catch issues before they cascade across an entire organization.
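As a rough illustration of the staged rollout takeaway, the sketch below deploys an update ring by ring and stops before the next ring if any system in the current ring fails. Everything here is hypothetical: the `staged_rollout` function, the `deploy` callback, and the idea of ordering rings from least to most critical are assumptions made for the example, not a specific product's behavior.

```python
from typing import Callable, Dict, List


def staged_rollout(
    rings: List[List[str]],
    deploy: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Deploy to each ring in order, least critical first.

    `deploy` is a caller-supplied function (hypothetical) that applies the
    update to one system and returns True on success. If any system in a
    ring fails, every later ring is skipped so the problem cannot spread.
    """
    updated: List[str] = []
    failed: List[str] = []
    skipped: List[str] = []
    halted = False

    for ring in rings:
        if halted:
            # A previous ring had a failure: leave these systems untouched.
            skipped.extend(ring)
            continue
        for system in ring:
            if deploy(system):
                updated.append(system)
            else:
                failed.append(system)
        if failed:
            halted = True

    return {"updated": updated, "failed": failed, "skipped": skipped}
```

In practice the `deploy` step in an OT environment would be a manual, operator-confirmed action during a designated maintenance window rather than an automated call; the function only models the ordering and the stop condition.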
Implications for RMC’s Clients: Why OT Security Requires a Different Approach
The recent CrowdStrike incident highlights a critical lesson for RMC’s clients, particularly in sectors like power, utilities, and pharmaceuticals: relying on IT-centric solutions for OT environments can introduce significant risks. The potential for downtime or operational failures in critical infrastructure is too great to apply solutions without rigorous testing and OT-specific considerations.
At RMC, we understand that OT systems demand a unique approach to cybersecurity and risk management. Our strategies are tailored to the specific needs of industries that require uninterrupted operations, balancing security with operational integrity. By focusing on manual control, staged updates, and comprehensive patch management, we protect your critical systems from both external threats and internal disruptions.
The key takeaway is clear: IT solutions alone won’t suffice in OT environments. For industries where downtime is unacceptable, the right approach combines security, careful planning, and extensive testing to ensure systems remain both secure and operational. RMC remains committed to providing this level of protection, safeguarding your infrastructure against evolving threats.
For more insights on how RMC helps secure OT environments, explore our latest resources and stay informed on best practices in OT cybersecurity, mission assurance, and risk management.
How can RMC help your organization?
Contact us today: sales@rmcglobal.com
Be sure to follow RMC on LinkedIn, and bookmark our News & Perspectives website to stay apprised of industry insights and topical advice on establishing cyber resiliency in OT environments.