Synthetic Backup Provider

We often receive questions about synthetic backup providers and the advantages they offer, so we have put together this overview. In this article, we explain what a synthetic backup provider is, how it differs from a conventional (non-synthetic) one, and why it has become so valuable for data protection. Whether you're an IT professional, a business owner, or simply interested in backup technology, this article should give you a brief but useful introduction to synthetic backup providers and their advantages.

The problem

When Storware Backup & Recovery stores backups in a non-synthetic backup provider, full and incremental backups are kept separate. They can be separate files (on plain file systems), objects or object versions (on object storage), or other kinds of separate "files" or "file versions" (on legacy backup providers).

This means that a full backup of a VM or storage instance consists of just a few files: metadata, one or two files per disk, and usually nothing else. Each incremental backup then adds its own files, typically one or two per disk, stored separately from the full backup.

This approach has a significant advantage in portability: it works with virtually any kind of storage, because you only need to map each backup artifact to an "object" that the external system understands.

The drawback appears at restore time: the chain of files (the full backup plus every subsequent incremental backup) must be merged before the VM can be restored.
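A minimal Python sketch of that merge, assuming a disk is modelled as a mapping from block index to block contents and each incremental backup stores only the blocks that changed since the previous one. The function and variable names are hypothetical and do not describe Storware's actual backup format.

```python
from typing import Dict, List

DiskImage = Dict[int, bytes]  # block index -> block contents


def merge_chain(full: DiskImage, increments: List[DiskImage]) -> DiskImage:
    """Rebuild the disk state as of the last increment in the chain."""
    merged = dict(full)      # start from a copy of the full backup
    for inc in increments:   # apply increments oldest first
        merged.update(inc)   # newer blocks overwrite older ones
    return merged


full = {0: b"A0", 1: b"B0", 2: b"C0"}
increments = [
    {1: b"B1"},            # incremental 1: block 1 changed
    {2: b"C2", 3: b"D2"},  # incremental 2: block 2 changed, block 3 added
]

restored = merge_chain(full, increments)
print(restored)  # {0: b'A0', 1: b'B1', 2: b'C2', 3: b'D2'}
```

Every increment in the chain has to be read and applied in order, so restore time grows with the length of the chain.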

The solutions

Synthetic backup providers can be implemented in several ways. The key question is how incremental backups are handled, so let's start with a popular approach: snapshots of the backup storage.

The idea is that the backup storage takes a snapshot after each backup. Starting from a full backup and its snapshot, the next backup exposes the volume, writes the changes on top of the previous state, and takes a new snapshot. This repeats after every backup. To restore a VM, we expose the selected snapshot, and all of its data is already in merged form. Note that each snapshot occupies only the space needed to keep its changes, not the whole data set.
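To illustrate why a snapshot is already "merged", here is a toy Python model, assuming a volume is a map from block index to a block id in a shared block store. Writes always go to newly allocated blocks (copy-on-write), and taking a snapshot copies only the pointer map, never the data. `SnapshotVolume` is a hypothetical name; real implementations (ZFS, LVM, or vendor-specific code) keep far more elaborate metadata.

```python
from typing import Dict, List


class SnapshotVolume:
    """Toy copy-on-write volume: snapshots copy block pointers, not data."""

    def __init__(self) -> None:
        self.store: Dict[int, bytes] = {}          # block id -> data, stored once
        self.live: Dict[int, int] = {}             # block index -> block id
        self.snapshots: List[Dict[int, int]] = []  # frozen pointer maps
        self._next_id = 0

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: new data goes to a freshly allocated block, so
        # blocks referenced by earlier snapshots are never modified.
        self.store[self._next_id] = data
        self.live[index] = self._next_id
        self._next_id += 1

    def snapshot(self) -> int:
        # Only the pointer map is copied; no block data is duplicated.
        self.snapshots.append(dict(self.live))
        return len(self.snapshots) - 1

    def read_snapshot(self, snap_id: int) -> Dict[int, bytes]:
        # Every snapshot is a complete view of the volume: no chain to merge.
        return {i: self.store[b] for i, b in self.snapshots[snap_id].items()}


vol = SnapshotVolume()
for i, data in enumerate([b"A0", b"B0", b"C0"]):  # full backup
    vol.write(i, data)
s0 = vol.snapshot()
vol.write(1, b"B1")  # next backup: only block 1 changed
s1 = vol.snapshot()

print(vol.read_snapshot(s0))  # {0: b'A0', 1: b'B0', 2: b'C0'}
print(vol.read_snapshot(s1))  # {0: b'A0', 1: b'B1', 2: b'C0'}
print(len(vol.store))         # 4
```

Two snapshots reference six blocks in total, but only four blocks are stored, because the unchanged blocks are shared.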

The technology behind snapshots varies: some vendors use ZFS file system snapshots, some implement their own mechanism from scratch, and others rely on LVM or something similar. In all cases, you will find the same concepts of a volume and its snapshots.

Many vendors store backup data this way. However, snapshot management adds overhead. Typically, each instance (VM, storage instance, etc.) needs its own volume, so a large environment may have to maintain thousands of volumes, each with its own snapshot history.

The second approach, which does not rely on snapshots (and is the one used by Storware Backup & Recovery), is based on reflinks, the file system's virtual copy feature. Instead of sharing data at the volume level (where a volume usually corresponds to an instance), data is shared at the file level: only the blocks that different files have in common are shared.

Reflinks are a copy-on-write mechanism: we request a copy of a full VM disk file, and a new entry appears in the folder almost instantly. No data is copied at this point; we simply get a separate, visible copy of the file that references the same blocks as the original. When new data is written to the copy, the file system takes care of the rest by allocating new blocks only for the changed regions. The result is two versions of the file, both in merged form (and therefore ready to restore), while each block common to both files is stored only once.
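On Linux, a reflink is requested with the `FICLONE` ioctl, the same mechanism GNU `cp --reflink` relies on. The Python sketch below assumes a Linux host; `0x40049409` is the value of `FICLONE` from `linux/fs.h`, used as a fallback because only recent Python versions expose `fcntl.FICLONE`. The helper names `reflink_or_copy` and `demo` are hypothetical, and falling back to a full copy on any clone error is a simplifying choice for illustration, not how Storware handles it.

```python
import fcntl
import os
import shutil
import tempfile
from typing import Tuple

# FICLONE = _IOW(0x94, 9, int) on Linux; recent Pythons expose fcntl.FICLONE.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def reflink_or_copy(src: str, dst: str) -> bool:
    """Clone src to dst; return True if blocks are shared via a reflink."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the file system to make dst reference src's data blocks.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            pass  # e.g. unsupported file system or cross-device clone
    shutil.copyfile(src, dst)  # fallback: full copy, blocks duplicated
    return False


def demo() -> Tuple[bool, bool, bool, bool]:
    """Clone a fake disk image, then modify only the clone."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "disk-full.img")
        dst = os.path.join(d, "disk-backup2.img")
        original = os.urandom(1 << 20)  # 1 MiB stand-in for a VM disk
        with open(src, "wb") as f:
            f.write(original)
        shared = reflink_or_copy(src, dst)
        with open(dst, "r+b") as f:
            f.write(b"changed")  # overwrite the first bytes of the clone
        with open(src, "rb") as f:
            src_after = f.read()
        with open(dst, "rb") as f:
            dst_after = f.read()
    return (
        shared,
        src_after == original,          # original file is untouched
        dst_after[:7] == b"changed",    # clone holds the new data
        dst_after[7:] == original[7:],  # rest of the clone is unchanged
    )


if __name__ == "__main__":
    print("reflinked" if demo()[0] else "copied (no reflink support here)")
```

On a reflink-capable file system the clone is created without copying data; on other file systems the helper still produces a correct, independent copy, just without block sharing.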

With newer Linux kernels, this feature is available on XFS, Btrfs, OCFS2, NFS 4.2+, CIFS, and overlayfs, and more file systems will probably follow. Note: Storware Backup & Recovery does not currently support encryption on synthetic backup providers, although encryption can be provided by the underlying storage.

Additional benefits

Because the data is already merged, synthetic backup providers offer two additional benefits on top of faster restores.

The first is the forever-incremental approach. With a conventional provider, backup chains grow significantly over time and become harder to restore. This is usually mitigated with periodic full backups. With a synthetic backup provider, however, each backup is independent and always ready to be restored, so periodic full backups are not required. That said, we still recommend taking one from time to time.
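Why each synthetic backup is independent can be shown with a small Python model. It is a sketch under stated assumptions, not how Storware stores data: every version is a full map of block pointers into a shared, reference-counted block store (roughly what a reflinked disk file provides), each new backup starts from a virtual copy of the latest map and stores only the changed blocks, and deleting an old version frees only the blocks no remaining version uses. `SyntheticRepository` and its methods are hypothetical names.

```python
from collections import Counter
from typing import Dict, Optional


class SyntheticRepository:
    """Toy forever-incremental store built on shared, refcounted blocks."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}              # block id -> data
        self.refs: Counter = Counter()                  # block id -> users
        self.versions: Dict[int, Dict[int, int]] = {}   # version -> pointer map
        self._latest: Optional[int] = None
        self._next_block = 0
        self._next_version = 0

    def backup(self, changes: Dict[int, bytes]) -> int:
        # Virtual copy of the latest version: pointers only, no data copied.
        base = self.versions[self._latest] if self._latest is not None else {}
        new = dict(base)
        for index, data in changes.items():  # store only changed blocks
            self.blocks[self._next_block] = data
            new[index] = self._next_block
            self._next_block += 1
        self.refs.update(new.values())
        vid = self._next_version
        self.versions[vid] = new
        self._latest = vid
        self._next_version += 1
        return vid

    def restore(self, vid: int) -> Dict[int, bytes]:
        # Any version is complete on its own: no chain to merge.
        return {i: self.blocks[b] for i, b in self.versions[vid].items()}

    def delete(self, vid: int) -> None:
        # Retention: drop a version, freeing blocks nobody else references.
        if vid == self._latest:
            raise ValueError("the latest version is the base for the next backup")
        for b in self.versions.pop(vid).values():
            self.refs[b] -= 1
            if self.refs[b] == 0:
                del self.refs[b]
                del self.blocks[b]


repo = SyntheticRepository()
v0 = repo.backup({0: b"A0", 1: b"B0", 2: b"C0"})  # initial full backup
v1 = repo.backup({1: b"B1"})                       # incremental
v2 = repo.backup({2: b"C2"})                       # incremental
repo.delete(v0)  # expire the oldest backup; no new full backup is needed

print(repo.restore(v2))  # {0: b'A0', 1: b'B1', 2: b'C2'}
print(len(repo.blocks))  # 4
```

Deleting the original full backup frees only block `B0`, which no later version uses; both remaining versions stay fully restorable.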

The second is instant restore. Merged data is not the only requirement for this feature, but without fast access to the data you cannot expect "instant" recovery. Because the data is already merged, backups can be exposed to the virtualization platform over NFS as a "datastore" or "storage repository", and a new VM can be created that points directly to the exposed (synthesized) VM disks.

Wrap up

Storware Backup & Recovery allows a plain file system (local or shared) to be used as a synthetic backup provider. This enables a forever-incremental backup approach and Instant Restore capabilities, and it makes recovery significantly faster.

Text written by:

Marcin Kubacki, CSA at Storware