EU AI Act, GDPR & DORA: Data Protection Requirements for OpenStack AI
Table of contents
- Three Regulations, One Infrastructure Problem
- EU AI Act: What It Actually Requires from Your Infrastructure
- DORA: Already in Force, Already Being Enforced
- GDPR: The Foundation That AI Regulation Builds On
- NIS2: The One Regulation That Is Still Catching Organizations Off Guard
- Regulatory Requirements Mapped to Technical Controls
- Audit-Ready vs. Audit-Surviving: The Distinction That Matters
- Where to Start: A Compliance Readiness Checklist
- Frequently Asked Questions
- Compliance Is Not a Future Project
Compliance officers have a term for the situation I am about to describe. They call it “retroactive exposure.” It means discovering that something you have been doing for years — or not doing — turns out to have been a regulatory requirement all along, and you are now subject to enforcement for the period during which you were non-compliant.
For organizations running AI workloads on OpenStack private cloud, the August 2026 enforcement deadline for the EU AI Act creates a specific version of this problem. AI systems that have been in production for months or years — trained on datasets, producing decisions that affect people, operating in regulated domains — suddenly fall within a compliance framework that has specific, auditable requirements for data governance, record-keeping, and operational resilience. The systems already exist. The data has already been processed. The question is whether the infrastructure that protects that data can demonstrate compliance after the fact.
The answer, in most cases, is: not yet. But the gap between “not yet” and “compliance-ready” is smaller than many organizations assume, because the technical expectations introduced by the EU AI Act, GDPR, and DORA frequently converge on the same infrastructure layer: backup, auditability, and data protection.
Three Regulations, One Infrastructure Problem
The first thing to understand about the EU regulatory landscape for AI is that there is no clean separation between the EU AI Act, GDPR, and DORA. They overlap. They reinforce each other. And they all make demands on your infrastructure that are, at their technical core, demands on how you protect and retain data.
Here is the convergence point: all three frameworks require that you can demonstrate, to an auditor, what happened to specific data — when it was processed, who had access, what state it was in at a given point in time, and that it has not been modified without authorization. This is, precisely, what a correctly configured backup and data protection system provides. The audit trail that satisfies GDPR’s accountability requirement is the same audit trail that satisfies the EU AI Act’s technical documentation requirement and DORA’s ICT risk register. The immutable backup destination that protects against ransomware also provides the tamper-evidence that a regulator demands for AI training data.
This is the insight that changes the budget conversation: backup infrastructure for AI workloads on OpenStack is not three separate compliance line items. It is one set of technical controls that simultaneously satisfies three separate regulatory obligations. That changes the cost-benefit calculation significantly.
→ For the financial framing of this calculation, see: The Hidden Cost of Unprotected AI Infrastructure.
EU AI Act: What It Actually Requires from Your Infrastructure
The EU AI Act (Regulation (EU) 2024/1689) entered into force on 1 August 2024 and becomes fully applicable on 2 August 2026 for high-risk AI systems listed in Annex III. These include AI systems used in biometrics, critical infrastructure management, education and vocational training, employment and worker management, essential private and public services, law enforcement, migration and border control, and the administration of justice.
One important nuance that most summaries miss: the European Commission proposed a “Digital Omnibus” package in late 2025 that could postpone Annex III high-risk obligations for some categories until December 2027. Organizations should not build compliance planning around this potential delay — prudent risk management treats 2 August 2026 as the binding deadline and treats any extension as a bonus. The organizations that will be prepared for an August 2026 audit are the ones planning for August 2026, not the ones waiting for a postponement that may not materialize.
What “high-risk” means in practice
The high-risk classification is broader than many organizations initially assume. A credit scoring model is high-risk. A CV screening tool that ranks job applicants is high-risk. An AI system used in workers’ management — scheduling, performance assessment, termination recommendations — is high-risk. An AI system that makes decisions affecting access to essential services — insurance, banking, healthcare — is high-risk. If your organization is deploying AI in any of these contexts and running that AI on OpenStack private cloud, the data protection requirements below apply to you.
The data governance obligation (Article 10)
Article 10 of the EU AI Act imposes data governance requirements on providers of high-risk AI systems. Training, validation, and testing datasets must be subject to appropriate data governance and management practices — covering the design choices, the data collection process, the data preparation operations (annotation, labelling, cleaning, enrichment, aggregation), and any known data limitations or biases.
Critically: this must be documented and demonstrable. Not described in a policy document. Demonstrable to a regulatory authority on request. The technical implementation of “demonstrable data governance” includes version-controlled datasets with provenance tracking, audit logs of data access and modification, and the ability to reconstruct what dataset, at what version, was used to train a specific model version. In practice, achieving this level of demonstrable governance typically requires a backup and data protection architecture capable of maintaining versioned, auditable copies of training datasets together with associated metadata and retention controls.
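What does “demonstrable” look like in code? Below is a minimal sketch, assuming file-based training datasets: a content-addressed manifest that binds a dataset version, the hashes of its files, and the model version trained from it. The schema, paths, and naming convention are illustrative assumptions, not a prescribed format.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Content hash of one dataset file, so any later modification is detectable."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(dataset_dir: Path, dataset_version: str, model_version: str) -> dict:
    """Build an Article 10-style provenance record: which files, in which state,
    trained which model version. The schema is illustrative, not a standard."""
    return {
        "dataset_version": dataset_version,
        "model_version": model_version,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "files": {
            str(p.relative_to(dataset_dir)): sha256_of(p)
            for p in sorted(dataset_dir.rglob("*")) if p.is_file()
        },
    }

if __name__ == "__main__":
    manifest = build_manifest(Path("/data/training/v42"), "v42", "credit-model-1.3.0")
    # Store the manifest alongside the dataset backup and protect both
    # under the same retention policy.
    Path("manifest-v42.json").write_text(json.dumps(manifest, indent=2))
```

Backing up the manifest together with the dataset gives you the reconstruction capability described above: restore the dataset version, re-hash it, and compare against the manifest to prove it is in its training-time state.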
The technical documentation obligation (Article 11)
Before a high-risk AI system is placed on the market, providers must draw up technical documentation demonstrating compliance. This documentation must remain up to date throughout the system’s lifecycle and must be retained for ten years after the system is placed on the market or put into service. The documentation includes a general description of the AI system, a description of the elements of the AI system and the process for its development, information on monitoring and functioning, and the data, data collection, and data preparation methodology.
Ten years. Organizations deploying high-risk AI systems should plan for retention of technical documentation, audit records, and associated training-data evidence across that full decade. Short-term retention configurations — 30 days, 90 days, one year — do not satisfy this requirement for high-risk AI systems.
The record-keeping obligation (Article 12)
High-risk AI systems must be designed to enable automatic logging of events that are relevant for identifying risks and serious incidents. Deployers must retain these logs for at least six months. For providers of high-risk AI systems, the retention period aligns with the technical documentation retention — effectively the lifecycle of the system.
Operationally: your AI platform on OpenStack needs comprehensive, tamper-evident logging of training runs, model versions, dataset versions, and inference operations. That logging needs to be retained in a way that cannot be retrospectively modified. WORM-immutable storage is one common implementation approach used to support tamper-evident retention requirements.
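As an illustration of that approach, here is a minimal sketch of archiving audit log segments to an S3-compatible WORM destination using Object Lock in COMPLIANCE mode. The endpoint, bucket name, and key layout are hypothetical assumptions, and the target bucket must have Object Lock enabled at creation time.

```python
from datetime import datetime, timedelta, timezone

import boto3  # assumes an S3-compatible endpoint with Object Lock enabled on the bucket

s3 = boto3.client("s3", endpoint_url="https://worm-store.example.internal")

def archive_log(bucket: str, key: str, payload: bytes, retention_days: int = 183) -> None:
    """Write a log segment under COMPLIANCE-mode Object Lock: the object cannot be
    deleted or overwritten by anyone, including an administrator, until the
    retain-until date. 183 days covers the Article 12 six-month minimum with margin."""
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=payload,
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=retention_days),
    )

archive_log(
    "ai-audit-worm",
    f"training-runs/{datetime.now(timezone.utc):%Y/%m/%d}/run-0001.jsonl",
    b'{"event": "training_run_started", "model": "credit-model-1.3.0"}\n',
)
```

The same pattern applies whether the WORM destination is object storage, a dedicated backup appliance, or a backup platform feature; the essential property is that retention is enforced by the storage layer, not by policy documents.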
DORA: Already in Force, Already Being Enforced
Unlike the EU AI Act, DORA is not approaching — it is here. The Digital Operational Resilience Act (Regulation (EU) 2022/2554) has been in full force since 17 January 2025. Regulators are actively conducting supervisory reviews. For financial entities and their ICT providers running AI on OpenStack, DORA compliance is not a future planning exercise. It is a current operational requirement.
Who DORA applies to
DORA applies to financial entities — banks, investment firms, insurance companies, payment institutions, crypto-asset service providers, and over a dozen other categories of financial sector entities — and to their critical ICT third-party providers. This second category is significant: if your organization provides OpenStack-based private cloud infrastructure or managed backup services to financial entities, you may fall within DORA’s scope as a critical ICT provider, subject to direct supervision by European Supervisory Authorities and potential fines of up to €5 million or 1% of average daily worldwide turnover.
The specific backup requirements
DORA’s backup requirements are unusually specific for EU regulation — they go beyond principles and describe technical implementation. DORA requires financial entities to:
- Implement a documented backup policy specifying which data is backed up and the minimum backup frequency, with frequency determined by data criticality and sensitivity.
- Implement a third, immutable backup copy of data that is physically and logically segregated from both the primary data and the secondary backup — in practice, many regulated organizations implement air-gapped protection to satisfy operational resilience expectations and ransomware recovery requirements.
- Create systems to recover data within specified recovery time objectives — recovery objectives are expected to align with the entity’s defined criticality classifications, operational resilience strategy, and documented RTO/RPO commitments.
- Test backup and recovery systems at least annually through manual testing, with evidence retained to demonstrate testing occurred.
Let that second point sit for a moment. A third, immutable backup that is physically and logically segregated from primary and secondary sources. This is the regulatory definition of air-gap backup. IsoLayer — Storware’s air-gap protection mechanism — provides exactly this: a backup destination that is logically isolated from the production network, inaccessible during normal operations, and immutable once written. Organizations running AI infrastructure in financial services contexts that are subject to DORA need this capability. It is not a feature. It is a compliance requirement.
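One way to make the four requirements above operational is to encode the backup policy as data and check it mechanically. The sketch below is illustrative only: the field names, thresholds, and gap checks are assumptions about how an entity might express its own criticality classifications, not text from the regulation.

```python
from dataclasses import dataclass

@dataclass
class BackupPolicy:
    """One entry in a documented, criticality-driven backup policy (DORA Art. 12(1)).
    Field names and thresholds are illustrative, not prescribed by the regulation."""
    data_class: str                  # e.g. "critical", "important", "standard"
    backup_frequency_hours: int
    copies: int                      # 3 = primary + secondary + segregated immutable copy
    immutable_copy_segregated: bool
    rto_hours: float
    last_recovery_test_days_ago: int

def dora_gaps(p: BackupPolicy) -> list[str]:
    """Flag obvious gaps against the four requirements listed above."""
    gaps = []
    if p.copies < 3 or not p.immutable_copy_segregated:
        gaps.append("no segregated immutable third copy")
    if p.data_class == "critical" and p.backup_frequency_hours > 24:
        gaps.append("backup frequency not aligned with criticality")
    if p.last_recovery_test_days_ago > 365:
        gaps.append("recovery not tested within the last year")
    return gaps

policy = BackupPolicy("critical", 4, 3, True, rto_hours=8, last_recovery_test_days_ago=120)
print(dora_gaps(policy) or "no gaps detected")
```

A policy expressed this way can be versioned, reviewed, and produced to a supervisor on request — which is precisely the “documented backup policy” DORA asks for.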
The ICT risk register requirement
DORA requires financial entities to maintain a comprehensive register of all ICT assets and ICT-related contractual arrangements with third-party providers. For AI infrastructure on OpenStack, this means documenting every component of your AI platform — compute nodes, storage clusters, network infrastructure, software components — along with their dependencies, failure modes, and backup/recovery configurations. The register must be kept current and must be submitted to regulators on request.
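A register entry for one component of an OpenStack AI platform might carry fields like the following. The schema is a sketch of the information such a register needs to hold, not a regulator-mandated format; the asset names and IDs are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class IctAssetRecord:
    """One row of a DORA-style ICT asset register for an OpenStack AI platform."""
    asset_id: str
    description: str
    criticality: str                        # per the entity's own classification scheme
    dependencies: list[str] = field(default_factory=list)
    third_party_provider: str | None = None
    backup_policy_id: str | None = None
    last_recovery_test: str | None = None   # ISO date of the last evidenced test

register = [
    IctAssetRecord(
        asset_id="ceph-rbd-training-01",
        description="Ceph RBD pool holding AI training datasets",
        criticality="critical",
        dependencies=["cinder-volume-az1", "storage-network-vlan"],
        backup_policy_id="POL-CRIT-4H",
        last_recovery_test="2025-11-03",
    ),
]
```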
The audit trail that Storware’s backup platform produces — job logs, retention records, recovery test results, access audit logs — contributes directly to this register. A backup system that produces no audit evidence may be operationally adequate, but it is useless for regulatory purposes.
Penalties for non-compliance
DORA fines for financial entities reach up to 2% of global annual turnover. For critical ICT providers, up to €5 million or 1% of average daily worldwide turnover. Regulators can also suspend or terminate contracts between financial entities and their ICT service providers — which for a managed cloud or backup service provider is an existential penalty, not merely a financial one.
GDPR: The Foundation That AI Regulation Builds On
GDPR (Regulation (EU) 2016/679) is not new, and most compliance officers are familiar with its core requirements. What is less commonly understood is how GDPR’s requirements change when applied specifically to AI training data and AI inference infrastructure on OpenStack private cloud.
Training data as personal data processing
If your AI training datasets contain personal data — and for most enterprise AI applications they do, whether explicitly (customer records, transaction data, healthcare information) or implicitly (behavioral data, communication logs, activity histories) — GDPR’s data processing requirements apply to the entire training pipeline. This includes the storage infrastructure on which training data sits.
Article 5(1)(f) requires that personal data be processed in a manner that ensures appropriate security, including protection against accidental loss, destruction, or damage. For AI training data on OpenStack Cinder volumes and Ceph RBD clusters, “appropriate security” includes regular backup to independent storage, encryption at rest, and access controls. These are commonly accepted technical controls used to support GDPR security and resilience obligations for AI-related infrastructure and datasets.
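As a concrete illustration, the sketch below uses the openstacksdk library to back up Cinder volumes flagged as holding training data. The `clouds.yaml` profile name (`ai-prod`) and the `data_class` metadata convention are assumptions, not OpenStack defaults.

```python
import openstack  # openstacksdk; assumes a clouds.yaml profile named "ai-prod"

conn = openstack.connect(cloud="ai-prod")

# Back up every Cinder volume tagged as holding training data. Tagging volumes
# via metadata is a convention chosen here for illustration.
for volume in conn.block_storage.volumes(details=True):
    if (volume.metadata or {}).get("data_class") == "training-data":
        backup = conn.block_storage.create_backup(
            volume_id=volume.id,
            name=f"{volume.name}-gdpr-{volume.updated_at}",
            incremental=True,   # incrementals on top of a periodic full backup
            force=True,         # allow backup of in-use (attached) volumes
        )
        print(f"backup {backup.id} started for volume {volume.name}")
```

In practice this logic would live inside a policy-driven backup platform rather than a script, but the underlying API calls are the same.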
The accountability principle and its documentation requirement
GDPR’s accountability principle (Article 5(2)) requires controllers to demonstrate compliance. “We have a privacy policy” is not demonstration. Demonstration means being able to show, to a Data Protection Authority on request, exactly what personal data you hold, where it is, who has access, how it is protected, and what would happen to it in a security incident. For AI training data specifically, this means being able to produce a complete lineage of how personal data moved from collection through preprocessing through training and into any model that makes automated decisions about people.
This is only achievable with infrastructure that maintains versioned, auditable records of data states over time. Backup with retention history — the ability to recover not just the current state but specific historical states of datasets — is a direct technical implementation of the GDPR accountability principle for AI training data.
The right to erasure and AI models
Here is the compliance problem that nobody has a clean answer to: the right to erasure under GDPR Article 17 requires that personal data be deleted on request. If that personal data was used to train an AI model, does the model itself need to be retrained without that individual’s data? The European Data Protection Board has issued guidance suggesting that in some cases, yes — if the model’s outputs can be attributed to specific training data, erasure may require model retraining.
This creates an unexpected backup requirement: you need to be able to restore training datasets to a specific historical state (pre-erasure request), identify the affected records, remove them, and either demonstrate that the model does not retain that information or retrain the model. None of this is achievable without versioned, long-retention backup of training datasets. The right to erasure has become, inadvertently, a backup retention design constraint.
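Here is a toy sketch of what that workflow implies for versioned backups. The in-memory catalog stands in for whatever your backup platform actually provides; the record schema and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class VersionedDataset:
    """Toy stand-in for a versioned dataset backup catalog. In production the
    versions would live in the backup platform, not in memory."""
    versions: dict[str, list[dict]] = field(default_factory=dict)

    def erase_subject(self, subject_id: str) -> dict[str, str]:
        """Article 17 sketch: for each historical version containing the subject,
        produce a cleaned successor version and record the lineage as evidence."""
        evidence = {}
        for version, records in list(self.versions.items()):
            if any(r.get("subject_id") == subject_id for r in records):
                cleaned = [r for r in records if r.get("subject_id") != subject_id]
                new_version = f"{version}-erased-{subject_id}"
                self.versions[new_version] = cleaned
                evidence[version] = new_version
        return evidence  # original -> cleaned version map, retained as audit evidence

ds = VersionedDataset({"v41": [{"subject_id": "A", "x": 1}, {"subject_id": "B", "x": 2}]})
print(ds.erase_subject("A"))  # {'v41': 'v41-erased-A'}
# Models trained on v41 must then be assessed and, if required, retrained
# on the cleaned version.
```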
NIS2: The One Regulation That Is Still Catching Organizations Off Guard
NIS2 (Directive (EU) 2022/2555) required transposition into national law by 17 October 2024 and applies to essential and important entities — covering energy, transport, banking, financial market infrastructure, health, digital infrastructure, and more. Unlike DORA, which is narrowly scoped to financial services, NIS2’s “essential entities” category is wide enough to catch many organizations running AI infrastructure that would not consider themselves primarily “financial.”
NIS2’s backup requirements are less prescriptive than DORA’s but still concrete: appropriate technical and organizational measures to manage cybersecurity risks, including backup management, disaster recovery, and incident handling. “Appropriate” is interpreted by national competent authorities — and in the post-NIS2 environment it increasingly converges on what DORA already specifies, as organizations align their NIS2 resilience practices with DORA-style operational controls and recovery expectations.
The practical implication: organizations in NIS2 scope that have not yet treated backup as a compliance artifact — with documented policies, defined retention periods tied to data criticality, tested recovery procedures, and audit evidence — are exposed. Not in the distant future. Now.
Regulatory Requirements Mapped to Technical Controls
The table below maps common enterprise interpretations of these regulatory requirements to technical controls frequently used in regulated infrastructure environments. The examples shown are implementation patterns, not the only possible compliance approaches.
| Regulation | Specific Requirement | Technical Control Required | Storware Capability |
|---|---|---|---|
| EU AI Act Art. 10 | Training data governance, demonstrable provenance | Versioned, auditable dataset backup with metadata retention | Policy-based retention, backup versioning, audit log |
| EU AI Act Art. 11 | Technical documentation retained for 10 years | Long-term immutable backup of model artifacts and training records | WORM destinations, configurable multi-year retention |
| EU AI Act Art. 12 | Automatic logging retained ≥6 months; tamper-evident | Immutable audit log with SIEM export capability | Full audit log, external SIEM support via API |
| DORA Art. 12(1) | Documented backup policy with criticality-based frequency | Tiered backup policies per data classification | Policy-driven backup with configurable schedules per workload |
| DORA Art. 12(1) | Third, immutable, physically/logically segregated backup copy | Air-gap backup destination, inaccessible from production | IsoLayer air-gap protection |
| DORA Art. 12(1) | Recovery within defined RTO; annual tested failover | Recovery Plans with schedulable automated testing | Recovery Plans, schedulable DR testing, test evidence log |
| GDPR Art. 5(1)(f) | Appropriate security for personal data; protection against loss | Regular backup, AES encryption, access controls | AES encryption at rest, RBAC, Keycloak MFA |
| GDPR Art. 5(2) | Demonstrable compliance (accountability) | Auditable record of data protection measures | Full audit log, configurable reporting |
| GDPR Art. 17 | Right to erasure; potential model retraining requirements | Versioned dataset backup enabling point-in-time recovery | Backup versioning, instant restore |
| NIS2 | Backup management and disaster recovery for essential entities | Documented backup policy, DR capability | Policy-based backup, Recovery Plans, DR testing |
Audit-Ready vs. Audit-Surviving: The Distinction That Matters
There is a meaningful difference between having backup infrastructure that is technically compliant and being able to demonstrate that compliance to a regulatory authority. Most organizations that have invested in backup systems are in the first category. Fewer are in the second.
Being audit-ready for EU AI Act, DORA, or GDPR purposes means being able to answer the following questions — on short notice, with documentary evidence, not from memory:
- What datasets were used to train model version X, and can you produce those datasets in their training-time state?
- What backup policy applies to your AI training infrastructure, and what was the last successful backup of each critical component?
- When did you last test recovery of your AI platform, and what evidence do you have that the test succeeded?
- Who has had access to your AI training data in the last 12 months, and can you produce an audit log of those accesses?
- Is your backup data immutable, and how would you demonstrate that it has not been modified since it was written?
- Where is your third backup copy, and can you demonstrate that it is physically and logically segregated from your primary systems?
If any of those questions produces a pause followed by “I would need to check with the infrastructure team,” you are audit-surviving — getting through reviews because nobody has asked exactly the right questions yet. That is not a stable position as regulatory enforcement matures.
The audit-ready position is producing documentary answers to all of those questions from your backup platform’s reporting interface within minutes of the request. Storware’s audit logging, retention reporting, and Recovery Plan execution records are designed to make this possible.
Where to Start: A Compliance Readiness Checklist
For organizations that need to close the gap between current backup infrastructure and regulatory compliance, the following checklist provides a structured starting point. It is ordered by compliance priority — the items at the top have the most immediate enforcement exposure:
| # | Action | Regulatory Driver | Priority |
|---|---|---|---|
| 1 | Classify your AI systems by EU AI Act risk level | EU AI Act Annex III | Immediate |
| 2 | Document your backup policy including criticality-based frequency per data type | DORA Art. 12(1) | Immediate (DORA in force) |
| 3 | Implement immutable, air-gapped third backup copy for critical AI data | DORA Art. 12(1) | Immediate (DORA in force) |
| 4 | Enable full audit logging for all backup and data access operations | GDPR Art. 5(2), EU AI Act Art. 12 | High |
| 5 | Configure WORM-immutable backup for AI training datasets and model artifacts | EU AI Act Art. 11, DORA Art. 12(1) | High |
| 6 | Implement Recovery Plans with schedulable automated testing | DORA Art. 12(1) | High |
| 7 | Configure long-term retention (minimum 6 months for logs; up to 10 years for high-risk AI documentation) | EU AI Act Art. 11, 12; DORA | Medium |
| 8 | Enable MFA and RBAC for all backup system access | GDPR Art. 5(1)(f), DORA | Medium |
| 9 | Integrate backup audit logs with SIEM for continuous monitoring | NIS2, DORA | Medium |
| 10 | Document dataset versions used for each model version (data lineage) | EU AI Act Art. 10 | Before August 2026 |
Items 2 and 3 have no future deadline — DORA has been in force since January 2025. Organizations within DORA scope that have not implemented resilient, well-documented backup and recovery controls may face significant compliance exposure and operational resilience gaps.
Frequently Asked Questions
Does the EU AI Act apply to AI systems that are already in production?
Yes, subject to transitional rules. Under Article 111, high-risk AI systems placed on the market or put into service before 2 August 2026 come into scope when they undergo significant design changes after that date, and systems intended for use by public authorities must comply by 2030 in any case — so organizations cannot assume that pre-existing systems are exempt. For Annex III high-risk systems, the operative deadline is 2 August 2026 (with a possible extension to December 2027 proposed under the Digital Omnibus package, but not yet confirmed). For AI systems embedded in regulated products (medical devices, machinery), the deadline is 2 August 2027.
Does DORA apply to OpenStack private cloud infrastructure operated by a financial entity?
Yes. DORA’s ICT risk management requirements apply to the financial entity’s own ICT infrastructure, not only to third-party providers. An OpenStack private cloud operated by a bank for running AI workloads is in scope for DORA’s backup, resilience testing, and ICT risk register requirements. The financial entity is the accountable party for compliance with DORA for its own infrastructure.
What does DORA’s requirement for a “third, immutable backup copy” mean in practice?
DORA requires a backup copy that is physically and logically segregated from both the primary data and the secondary backup — meaning it must not be reachable through the same network path or administrative interface as the primary data or secondary backup. This is commonly implemented using air-gapped or logically isolated backup architectures designed to prevent simultaneous compromise of all copies. An off-site copy that is reachable via the same network infrastructure is not sufficient — the segregation must be designed to prevent ransomware or a single administrative compromise from destroying all copies simultaneously.
What is the EU AI Act’s requirement for training data retention?
Article 11 requires technical documentation to be retained for ten years after the AI system is placed on the market. For high-risk AI systems, this documentation includes information about the training data methodology and data governance practices. While the regulation does not explicitly state that the training datasets themselves must be retained for ten years, many organizations interpret Article 10’s governance expectations as requiring long-term retention of sufficient evidence regarding training data provenance, versioning, and governance processes. Conservative compliance planning includes long-term retention of training dataset metadata and version records.
Does GDPR’s right to erasure require retraining AI models?
This remains one of the open questions in EU data protection law. The European Data Protection Board has indicated that where an AI model’s outputs can be meaningfully attributed to specific training data, and where those outputs produce decisions about data subjects, erasure of the training data may require model remediation. The practical answer for most organizations: document your training datasets sufficiently to identify whether specific individuals’ data was used, maintain dataset versioning that would allow retraining with removed records, and retain model versions with associated dataset version documentation. This does not necessarily mean you will need to retrain models — it means you will be able to demonstrate that you have assessed the question.
Compliance Is Not a Future Project
DORA is already in force. GDPR has always applied. The EU AI Act enforcement clock is running. The organizations that will navigate the August 2026 deadline without incident are the ones that started treating backup as a compliance artifact in 2025, not the ones that schedule a compliance sprint in July 2026.
Storware Backup and Recovery provides the technical controls mapped in this article — WORM immutability, IsoLayer air-gap protection, AES encryption, RBAC, Keycloak MFA, full audit logging, Recovery Plans with schedulable testing, and long-term configurable retention — as a unified platform for OpenStack AI infrastructure. The compliance architecture and the operational backup architecture are the same system.
If you need to understand what your current gap looks like against the requirements above, a 30-minute technical consultation will produce a specific assessment of your OpenStack environment against the regulatory checklist.
Book a compliance architecture consultation →
→ For the strategic context on why private OpenStack infrastructure is the right foundation for EU-compliant AI, see: Why Enterprises Are Running AI on OpenStack Private Cloud.
→ For the complete technical reference on OpenStack AI backup architecture, see the pillar page: Backup and Data Protection for AI Workloads on OpenStack: The Complete Guide.
