Microsoft Global Outage – All You Need to Know
On July 19th, many Windows users woke up to a blue screen of death instead of a working system. The failure caused a global outage that halted key operations worldwide, including at airports, hospitals, and emergency services. There have been many theories about its cause, with some saying it was a cyberattack by Russia or China. In fact, the outage was caused unintentionally by a faulty update from CrowdStrike, one of the world’s leading cybersecurity companies, whose security software runs on many Windows machines.
What’s Behind Microsoft’s Global Outage?
In the wake of the event, CrowdStrike founder and CEO George Kurtz took to the company’s blog to apologize and explain the cause of the outage.
On July 19, 2024, at 04:09 UTC, CrowdStrike released a sensor configuration update for Windows systems. Such updates are part of the Falcon platform’s ongoing protection mechanisms. The configuration files, called Channel Files, are usually updated several times a day to keep up with the new tactics, techniques, and procedures that CrowdStrike discovers.
However, the update contained a defect that had gone undetected. The defect triggered a logic error that crashed Windows systems and left them displaying a blue screen of death. The outage affected hosts running version 7.11 and higher of the Falcon sensor for Windows.
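The version threshold is easy to get wrong when versions are compared as text, since "7.9" sorts after "7.11" as a string. The following is a minimal sketch of a numeric check, assuming sensor versions are dotted strings such as "7.11.18110"; the function names and the sample version strings are hypothetical, and the check covers only the version criterion mentioned above.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '7.11.18110' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Lowest affected Falcon sensor for Windows release, as stated in the article.
FIRST_AFFECTED = (7, 11)


def is_affected_sensor(version: str) -> bool:
    """Return True if the major.minor part of the version is 7.11 or higher."""
    # Compare only major and minor numbers, numerically rather than as text.
    return parse_version(version)[:2] >= FIRST_AFFECTED


if __name__ == "__main__":
    for sample in ("7.9", "7.11.18110", "7.15", "6.58"):
        print(sample, is_affected_sensor(sample))
```

Comparing tuples of integers keeps 7.9 below 7.11, which a plain string comparison would not.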
The failure did not affect other operating systems such as Linux and macOS. It hit only Microsoft Windows because the update was built for the Windows sensor. So, while the fault was caused by CrowdStrike, Microsoft users bore the brunt of the failure. CrowdStrike also published a detailed technical analysis of the incident.
About CrowdStrike
CrowdStrike is an American cybersecurity technology company headquartered in Austin, Texas. The company provides several cybersecurity services, including advanced malware detection, endpoint threat hunting, endpoint activity monitoring, and endpoint lockdown and containment. It has helped investigate major cyberattacks such as the Sony Pictures hack in 2014 and the Russian cyberattacks on the Democratic National Committee in 2015 and 2016.
CrowdStrike Falcon, one of its platforms, is a multipurpose platform that helps to stop breaches using a set of cloud-delivered technologies. Falcon utilizes robust solutions like next-generation antivirus (NGAV) and endpoint detection and response (EDR). CrowdStrike ranks first for endpoint security market share, making it the biggest provider, serving thousands of crucial organizations.
Besides the July 2024 failure, CrowdStrike has been involved in several small-scale failures, especially those involving Linux OS. This includes a May 2024 incident reported on the Rocky Linux forums where CrowdStrike software froze after an upgrade to Rocky Linux 9.4. There was also the Red Hat incident in June 2024.
The Effect of the Outage
Millions of people were affected by the outage. According to Microsoft’s estimate, however, only 8.5 million Windows devices were hit, which represents less than 1% of Windows devices worldwide.
While this seems like a small percentage, the effect was widespread because many critical infrastructure services and large organizations, from hospitals to airports, use CrowdStrike’s cybersecurity services.
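To put the less-than-1% figure in perspective, the arithmetic can be run backwards: if 8.5 million devices are under 1% of the total, the total must exceed 850 million. A minimal sketch of that calculation follows; the function name is hypothetical, and the result is only a lower bound implied by Microsoft’s two figures, not a reported device count.

```python
def implied_minimum_devices(affected: int, share_percent: float) -> float:
    """Smallest installed base for which `affected` is at most `share_percent` percent."""
    return affected * 100 / share_percent


if __name__ == "__main__":
    affected_devices = 8_500_000   # Microsoft's estimate of affected devices
    upper_share_percent = 1        # "less than 1%" of Windows devices
    minimum = implied_minimum_devices(affected_devices, upper_share_percent)
    print(f"Implied installed base: more than {minimum:,.0f} devices")
```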
The most affected sectors were banking and aviation. Banks like the Bank of England couldn’t operate for hours, and airports across Europe, Canada, the United States, and India had to cancel or reschedule flights. Up to 3,000 flights were canceled and over 10,000 were delayed.
Train services, restaurants, telecom companies, stock exchanges, and broadcast stations were also affected. Even emergency services experienced issues, with some 911 services being operated manually. In all, the outage affected tens of thousands of businesses.
Fixes to the Microsoft Outage
Hours after the crash, CrowdStrike released steps to fix the issue, and some users regained access as early as that morning. However, the process was fairly complex and could be difficult for regular users to perform.
The quick fix did not work in all scenarios. As a result, CrowdStrike and Microsoft engineers had to repair some systems manually, which slowed the recovery. Meanwhile, some users recovered their systems by rebooting them repeatedly.
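CrowdStrike’s published manual workaround came down to booting the machine into Safe Mode or the Windows Recovery Environment and deleting the channel file matching C-00000291*.sys from the CrowdStrike driver directory. The sketch below only locates such files and deletes nothing. The function name is hypothetical, the default path assumes Windows is installed on drive C:, and on a machine stuck in a crash loop the deletion itself would have to be done from the recovery environment rather than from Python.

```python
from pathlib import Path

# File name pattern of the faulty channel file, from the published workaround.
FAULTY_PATTERN = "C-00000291*.sys"

# Driver directory on a standard install (assumption: Windows lives on C:).
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")


def find_faulty_channel_files(driver_dir: Path = DEFAULT_DIR) -> list[Path]:
    """Return files matching the faulty pattern, sorted by name. Nothing is deleted."""
    if not driver_dir.is_dir():
        return []
    return sorted(driver_dir.glob(FAULTY_PATTERN))


if __name__ == "__main__":
    matches = find_faulty_channel_files()
    if matches:
        for path in matches:
            print(f"Would need removal: {path}")
    else:
        print("No matching channel file found.")
```

Keeping the script read-only is deliberate: it shows what the workaround targets without risking the removal of a driver file on a healthy machine.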
A day after the outage, Microsoft released a recovery tool that was faster than CrowdStrike’s manual steps. Two days later, CrowdStrike announced it was testing a faster recovery technique of its own. The Department of Homeland Security said it was also working hand in hand with CrowdStrike, Microsoft, and its critical infrastructure partners to address the system outages.
The Microsoft Recovery Tool was last updated on July 22 as version 3.1. The tool offers two repair options: ‘recover from WinPE,’ which produces boot media to help repair the device, and ‘recover from safe mode,’ which produces boot media that lets the device boot into safe mode.
CrowdStrike, for its part, set up a dedicated hub offering updated remediation guidance and best practices for resolving the error, including videos that walk users through host remediation.
For the latest updates, visit the CrowdStrike support portal or the Microsoft Azure status dashboard. If the stated recovery method doesn’t work, CrowdStrike advises customers to contact their CrowdStrike representative or technical support.
Cybersecurity Threats Following the Outage
Several cybersecurity threats have emerged since the incident, as hackers have used the outage as a hook for social engineering. In response, the United States Cybersecurity and Infrastructure Security Agency (CISA) encouraged users to stay vigilant and to follow guidance only from legitimate sources.
CrowdStrike, through its counter-adversary operations team, also published blog posts listing websites that impersonate CrowdStrike and describing the methods they use.
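One simple defense against lookalike sites is to check the actual hostname of a link instead of looking for the brand name somewhere in the URL. The following is a minimal sketch, assuming crowdstrike.com is the only domain to be trusted; the function name and the sample URLs (which use the reserved .example domain) are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: the only trusted registered domain.
TRUSTED_DOMAIN = "crowdstrike.com"


def is_official_crowdstrike_url(url: str) -> bool:
    """Return True only if the URL's host is the trusted domain or a subdomain of it."""
    # hostname is None when the URL has no scheme or authority part.
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return host == TRUSTED_DOMAIN or host.endswith("." + TRUSTED_DOMAIN)


if __name__ == "__main__":
    samples = [
        "https://www.crowdstrike.com/blog/",
        "https://crowdstrike-helpdesk.example/fix",
        "https://crowdstrike.com.recovery.example/",
    ]
    for url in samples:
        print(url, is_official_crowdstrike_url(url))
```

Matching on the end of the hostname, with the leading dot, is what rejects addresses that merely contain the brand name.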
Lessons Learnt From the Global Outage
The Microsoft global outage has raised several concerns, such as the effect of over-dependence on a single service provider.
Microsoft averages over one billion users per month, and the global failure made it evident that thousands of organizations, including federal organizations, depend on Microsoft. CrowdStrike, meanwhile, holds about 24 percent of the endpoint security market.
This overdependence is the reason for the failure’s large-scale impact, which grounded tens of thousands of organizations and caused millions of dollars in losses across several industries. Organizations therefore need to look into diversifying their tech infrastructure to prevent such a complete shutdown.
There’s also the issue of cybersecurity’s vulnerability. While this outage resulted from an error, it shows the likely effect of a global cyberattack. Such a large-scale attack could shut down most of the world’s industries, so there is a need to step up cybersecurity with rigorous methods to prevent malicious activity.
Another issue is the need for a thorough incident response plan to ensure quick recovery. There’s also a need for cyber insurance that covers losses from causes other than cyberattacks, such as unintentional errors. With organizations losing millions due to the crash, such coverage can help offset the losses.