
Why Enterprises Are Running AI on OpenStack Private Cloud

Three forces are converging in enterprise IT right now, and their intersection is forcing a decision that most CIOs did not expect to make this decade: where, exactly, do you run AI?

The first force is economics. Broadcom’s 2023 acquisition of VMware produced one of the most disruptive licensing shocks in enterprise infrastructure history. Reported cost increases range widely, in some cases exceeding several hundred percent, depending on organization size and deployment topology. Enterprises that had optimized their VMware spend over years of careful purchasing now face a choice: absorb costs that no longer fit any reasonable IT budget model, or migrate. Over 80% of OpenInfra members have already talked to customers about migrating workloads from VMware to OpenStack.

The second force is sovereignty. AI workloads run on training data, and that training data is often among your most valuable organizational assets. Running it on infrastructure governed by foreign law creates a jurisdiction problem that no contractual commitment can resolve. European enterprises, in particular, are confronting a structural conflict between GDPR, the approaching EU AI Act, and the reality that most public cloud infrastructure is ultimately subject to US jurisdiction.

The third force is maturity. OpenStack, which spent the better part of a decade being dismissed as “too complex for production,” has quietly become the most capable open-source platform for the kind of heterogeneous, GPU-intensive compute that AI requires. It runs the UK’s Dawn AI supercomputer. It runs NVIDIA’s internal training infrastructure. It runs critical cloud services at Walmart, Bloomberg, and CERN. The narrative that OpenStack is not enterprise-ready has not been accurate for several years.

These three forces are pushing enterprises toward the same decision: move AI workloads to OpenStack private cloud. And that decision, once made, immediately raises a question that most organizations are not prepared to answer: what does your backup strategy look like now?

Force 1: The VMware Cost Shock and the Migration Wave

To understand the current momentum toward OpenStack, you have to understand the scale of disruption that Broadcom’s VMware restructuring created. This was not a modest price adjustment. Starting April 10, 2025, the minimum VMware license purchase increased from 16 cores to 72 cores per product. For organizations with smaller deployments or distributed edge infrastructure, this single change produced extreme cost multipliers regardless of actual compute usage.

The bundling model compounded this. Broadcom consolidated VMware’s product catalog from approximately 8,000 SKUs to a few bundled offerings, primarily VMware Cloud Foundation and vSphere Foundation, forcing customers to purchase bundled components such as NSX Networking and vSAN even if they do not need them. Organizations that had spent years purchasing only what they needed now find themselves paying for capabilities they have no use for, with no opt-out mechanism.

The result is a migration wave that is real, large, and accelerating. The OpenInfra Foundation reports that VMware migrations and public cloud repatriations are expected to significantly increase OpenStack adoption in the coming years.

For organizations planning AI investments, this migration wave has a specific implication: the infrastructure budget that was previously allocated to VMware licensing is now available for redeployment. And the destination of choice — for enterprises that want control over their AI stack — is OpenStack. Not because it is cheaper as a slogan, but because it is genuinely cheaper in practice, and because it provides capabilities that VMware, by design, does not: hardware-level GPU control, open API extensibility, and no dependency on a vendor that can unilaterally restructure its pricing model.

The enterprises making this shift are not doing so naively. They are trading known complexity for control. And control, when it comes to AI infrastructure, is not a preference. It is a strategic requirement.

Force 2: AI on Public Cloud Is a Sovereignty Problem

There is a conversation that European CIOs are having with their legal and compliance teams that goes roughly like this: “We have selected a hyperscaler with a European data center. Our data is physically in Frankfurt. Are we compliant?”

The answer is: not necessarily. And with AI workloads specifically, the compliance risk is significantly higher than for standard SaaS or infrastructure workloads.

The fundamental issue is jurisdictional. The US CLOUD Act of 2018 authorizes American authorities to compel US-headquartered technology companies to produce data they control, regardless of where that data is physically stored. AWS, Microsoft Azure, and Google Cloud are all American companies. Selecting an EU region in their consoles does not change the legal structure of who controls the underlying infrastructure. Sovereignty cannot be configured after the fact — it must be designed into the foundation through open, auditable platforms like OpenStack and Kubernetes, deployed on infrastructure the organization controls.

For AI workloads, this creates three specific risks that standard cloud workloads do not carry at the same severity:

  • Training data contains personal data

Many AI training datasets contain personal data — customer records, behavioral data, health information, financial transactions. Under GDPR, this data must be processed under conditions where the organization can demonstrate full control and the ability to honor data subject rights. When the infrastructure processing that data is subject to foreign jurisdiction, demonstrating that control becomes structurally more difficult. “We have a Data Processing Agreement with the cloud provider” is no longer a sufficient answer when the provider can be legally compelled to act contrary to that agreement.

  • The EU AI Act imposes data lineage requirements

The EU AI Act’s most consequential provisions take effect on August 2, 2026, covering high-risk AI systems in biometrics, critical infrastructure, employment, law enforcement, and more — with penalties reaching up to €35 million or 7% of global turnover. For high-risk AI systems, the Act requires documentation of training data provenance, validation datasets, and model development processes. This is not purely a technical requirement — it is an audit trail requirement. And that audit trail is significantly easier to maintain when the infrastructure is under your direct control, with full logging and immutable records, rather than mediated through a third-party cloud provider’s compliance documentation.

  • AI training data is a geopolitical asset

This may sound abstract, but it is increasingly concrete for regulated industries. The datasets used to train AI models in financial services, healthcare, defense, and critical infrastructure have strategic value that goes beyond commercial sensitivity. The question of who can access those datasets, and under what legal compulsion, is now a board-level governance question in many European enterprises, not an IT procurement question.

OpenStack private cloud resolves this structurally. When the infrastructure runs on hardware you own, in a data center you control, under a jurisdiction you have selected, the CLOUD Act problem is significantly reduced. There is no American intermediary who can be compelled to produce your data. The jurisdiction is the jurisdiction of the hardware, which is yours.

→ For the full regulatory analysis covering EU AI Act, GDPR, and DORA requirements mapped to specific backup and data protection configurations, see: EU AI Act, GDPR, and DORA: What OpenStack Operators Must Do Before August 2026.

Force 3: OpenStack Is Now Production-Ready for AI

The case against OpenStack for AI infrastructure used to have some merit. GPU scheduling was immature. The operational complexity of managing a distributed compute platform without dedicated engineering resources was real. The upgrade process, in the early years, was legitimately painful.

None of those objections hold today.

The Caracal release (OpenStack 2024.1) delivered production-grade GPU support that addresses the core requirements of AI training and inference workloads. Among the new features is support for vGPU live migration in Nova: users can move GPU workloads from one physical server to another with minimal impact (in supported configurations), a capability enterprises have been asking for so they can keep costly GPU hardware as fully utilized as possible. This is not a workaround or a preview feature. It is production capability used in production environments.
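As a rough illustration of what this looks like at the API level, the sketch below live-migrates a running instance using the openstacksdk Python client. The cloud profile and instance name are illustrative assumptions, and whether vGPU state actually follows the instance depends on the hypervisor, GPU driver stack, and Nova configuration in use.

```python
# Minimal sketch: live-migrating a GPU-backed instance with openstacksdk.
# "private-ai" (clouds.yaml profile) and "training-node-07" are illustrative.
import openstack

conn = openstack.connect(cloud="private-ai")

server = conn.compute.find_server("training-node-07", ignore_missing=False)

# Ask Nova to live-migrate the instance; with host=None the scheduler
# selects a suitable target hypervisor.
conn.compute.live_migrate_server(server, host=None, block_migration="auto")

# Wait until the instance is active again on its new host.
conn.compute.wait_for_server(server, status="ACTIVE")
```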

The GPU scheduling stack now includes PCI passthrough for direct hardware access, vGPU for multi-tenant GPU sharing, NVIDIA MIG (Multi-Instance GPU) support for partitioning high-end GPUs across multiple workloads (depending on hardware and driver stack), and SR-IOV for AMD GPUs. It also includes NUMA-aware placement, which ensures GPU-intensive VMs are scheduled on hosts where the GPU and memory sit on the same NUMA node, a detail that matters for training job performance. For inference workloads, vLLM can be integrated with OpenStack-based environments to deliver LLM-as-a-Service with an OpenAI-compatible API endpoint, meaning tenants can consume model inference through familiar interfaces without managing GPU infrastructure directly.
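To make the scheduling side concrete, here is a minimal sketch, again using openstacksdk, of a flavor that requests one vGPU and keeps the guest on a single NUMA node with dedicated CPUs. The flavor name and sizing are illustrative, and the available vGPU types and supported extra specs depend on your hardware, hypervisor, and driver configuration.

```python
# Minimal sketch: a GPU-aware Nova flavor defined via openstacksdk.
# Name and sizing are illustrative assumptions.
import openstack

conn = openstack.connect(cloud="private-ai")

flavor = conn.compute.create_flavor(
    name="ai.train.1xvgpu", ram=65536, vcpus=16, disk=200
)

conn.compute.create_flavor_extra_specs(flavor, {
    "resources:VGPU": "1",         # request one vGPU from Placement
    "hw:numa_nodes": "1",          # keep vCPUs and RAM on a single NUMA node
    "hw:cpu_policy": "dedicated",  # pin vCPUs so CPU, memory, and GPU stay local
})
```

Booting an instance from a flavor like this is what lets the scheduler honor the NUMA and GPU constraints described above.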

The operational story has also improved. Distributions like Canonical Charmed OpenStack, Red Hat OpenStack Platform, and Platform9 Private Cloud Director remove most of the day-two operational complexity that made early OpenStack deployments challenging. They handle upgrades, security patches, and configuration management. The investment in specialized OpenStack engineering that was once a prerequisite for production deployment is now optional for organizations using a managed distribution.

The evidence of production-scale AI on OpenStack is no longer anecdotal. It is documented, public, and recent. FPT Smart Cloud built a significant OpenStack footprint spanning three regions, two zones per region, and more than 100 physical servers per zone — specifically for AI and HPC workloads — selecting OpenStack over VMware and Azure after extensive benchmarking. The OpenInfra AI Working Group exists specifically to accelerate this pattern across the broader enterprise ecosystem.

The maturity threshold has been crossed. For enterprises evaluating private AI infrastructure, OpenStack is not a compromise. It is one of the leading options.

What This Means for Your AI Infrastructure Strategy

The decision to run AI workloads on OpenStack private cloud is, at its core, a decision about where control sits in your technology stack. Public cloud provides flexibility and speed at the cost of control. OpenStack private cloud provides control at the cost of operational ownership.

For most organizations, the right answer is not either/or. Burst compute for non-sensitive workloads can run on public cloud. Training runs on sensitive data, inference for regulated applications, and any AI system that handles personal data under GDPR jurisdiction are typically better suited to infrastructure you control. This is the hybrid AI architecture pattern that is emerging as the practical norm across European enterprises.

The decision framework is straightforward:

AI Workload Characteristic | Infrastructure Recommendation | Rationale
Training on personal or sensitive data | OpenStack private cloud | GDPR, EU AI Act data lineage requirements; CLOUD Act risk elimination
Inference for regulated applications (finance, health, critical infrastructure) | OpenStack private cloud | Auditability, DORA resilience requirements, data residency
Non-sensitive experimentation, R&D, burst training | Public cloud or hybrid | Speed, flexibility, no long-term GPU commitment
Production inference at scale on non-sensitive data | Either, based on TCO analysis | Depends on volume, latency requirements, and cost model

For workloads in the first two rows of that table — which is where most enterprise AI investment is actually concentrated — OpenStack private cloud is the right answer, and the regulatory and commercial forces described above are making that conclusion increasingly obvious to CIOs who previously might have defaulted to public cloud.

The Backup Problem Nobody Planned For

Here is where the strategic picture intersects with an operational reality that most organizations discover too late.

Moving AI workloads to OpenStack private cloud solves the sovereignty problem. It solves the cost problem. It provides the control and auditability that regulated industries require. But it creates — or rather, reveals — a data protection gap that existed all along but was previously someone else’s problem.

On public cloud, backup is partially managed for you at the infrastructure level. S3 has versioning. RDS has automated snapshots. The hyperscaler’s infrastructure has redundancy built in. None of these are adequate data protection (redundancy is not backup, and “the cloud” losing a region is a documented event, not a theoretical risk), but they create a baseline that lulls many organizations into a false sense of security about their AI data.

On OpenStack private cloud, there is no baseline. You own the infrastructure. You own the backup responsibility. And the data you are now responsible for protecting includes some of your organization’s most valuable assets: training datasets that took months to build, model checkpoints that represent hundreds of hours of GPU compute, inference infrastructure that your business-critical applications depend on.

The organizations that move AI workloads to OpenStack and immediately ask “how do we protect this?” are the ones that avoid expensive lessons. The ones that treat backup as a follow-on project — something to figure out after the AI platform is running — are the ones that learn about the gap at the worst possible moment.

What changes about backup when you add AI workloads

AI workloads on OpenStack have specific data protection characteristics that standard VM backup approaches do not address:

  • Scale: Training datasets are large. A well-curated image recognition dataset can be hundreds of gigabytes. An NLP training corpus can be terabytes. Backup approaches that work for standard enterprise workloads become operationally and economically unviable when applied naively to AI data volumes. You need Change Block Tracking and intelligent incremental strategies rather than nightly full image copies (a minimal sketch of an incremental volume backup follows this list).
  • Heterogeneity: AI data does not live only on VM disks. Training datasets may live on Cinder block volumes, Ceph object storage, and Swift simultaneously. Protecting only the VM misses most of the valuable data. Protection needs to span the full storage layer of the OpenStack platform.
  • Regulation: The EU AI Act’s data lineage requirements mean backup is not just an operational concern — it is a compliance artifact. You need immutable, auditable backup records that can demonstrate to a regulator what data trained your AI system, when, and that the data has not been tampered with. This requires WORM-immutable backup destinations and comprehensive audit logging, not just recovery capability.
  • Kubernetes: If your AI platform includes Kubernetes-based inference or MLOps workloads alongside OpenStack VMs — which is the common pattern — you need unified protection that covers both layers. Running separate backup solutions for VMs and containers doubles operational overhead and creates coverage gaps at the boundary between them.
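As a minimal sketch of what the incremental strategy mentioned above looks like at the OpenStack API level, the snippet below triggers an incremental Cinder backup of a training-data volume with openstacksdk. The volume and backup names are illustrative, Cinder requires an existing full backup before an incremental one can be taken, and a production deployment would drive this from a dedicated backup platform rather than ad-hoc scripts.

```python
# Minimal sketch: incremental backup of a training-data Cinder volume.
# Volume/backup names are illustrative assumptions.
import openstack

conn = openstack.connect(cloud="private-ai")

volume = conn.block_storage.find_volume("training-dataset-vol", ignore_missing=False)

backup = conn.block_storage.create_backup(
    volume_id=volume.id,
    name="training-dataset-incr",
    incremental=True,  # capture only blocks changed since the previous backup
    force=True,        # allow the backup while the volume is attached
)

# Block until Cinder reports the backup as usable.
conn.block_storage.wait_for_status(backup, status="available")
```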

→ For the technical architecture of backup for AI workloads on OpenStack — including CBT configuration, Ceph RBD integration, and GPU instance handling — see: How to Protect GPU Workloads on OpenStack: Backup Architecture for AI Training Infrastructure.

Data Protection That Matches the Infrastructure Decision

Storware Backup and Recovery has supported OpenStack natively since 2019 — not as an afterthought integration, but as a foundational platform for which the product was specifically designed. The architecture is agentless, which matters for GPU instances where agent overhead is unacceptable. The storage integration covers the full OpenStack storage layer: Nova instances, Cinder volumes, and Ceph RBD directly, not through a generic mount-and-copy abstraction.

For the regulatory requirements introduced by the EU AI Act, GDPR, and DORA, Storware provides WORM-immutable backup destinations, IsoLayer air-gap protection against ransomware, AES encryption at rest, full RBAC and audit logging, and Recovery Plans with schedulable testing — the specific capabilities that compliance frameworks require, not generic backup marketing language.

The single-license model is worth noting for organizations managing heterogeneous OpenStack deployments. One Storware license covers all supported OpenStack distributions — Red Hat OpenStack Platform, Canonical Charmed OpenStack, vanilla upstream, Platform9, OpenMetal, Virtuozzo VHI — as well as Kubernetes and OpenShift container workloads. When your AI platform spans multiple OpenStack distributions and a Kubernetes inference layer, you do not need a separate license for each component.

The infrastructure decision — moving AI to OpenStack private cloud — is the right decision. The data protection posture needs to match it.

→ For the complete technical reference covering all backup requirements for AI workloads on OpenStack, see the pillar page: Backup and Data Protection for AI Workloads on OpenStack: The Complete Guide.

Ready to Protect Your OpenStack AI Infrastructure?

If you are evaluating OpenStack as the foundation for your private AI platform — or if you have already made that decision and are now asking the backup question — the right next step is a 30-minute architecture conversation. Not a demo of features you can read about. A conversation about your specific infrastructure, your regulatory context, and what a production-ready protection posture looks like for your environment.

Book a technical consultation →

Or start with a 60-day free trial — deploy Storware against your OpenStack environment and see the protection posture in place within hours.

text written by:

Paweł Piskorz, Presales Engineer at Storware