Backup on object stores v1
EDB Postgres Distributed for Kubernetes supports online/hot backup of PGD clusters through physical backup and WAL archiving on an object store. This means that the database is always up (no downtime required) and that point-in-time recovery (PITR) is available.
Common object stores
Multiple object stores are supported, such as AWS S3, Microsoft Azure Blob Storage, Google Cloud Storage, MinIO Gateway, or any S3-compatible provider. Because EDB Postgres Distributed for Kubernetes relies on EDB Postgres for Kubernetes to configure the connection to object stores, see the EDB Postgres for Kubernetes common object stores for backups documentation for more information.
Important
The EDB Postgres for Kubernetes documentation's Cloud Provider configuration section is available at spec.backup.barmanObjectStore. In EDB Postgres Distributed for Kubernetes examples, the object store section is at a different path: spec.backup.configuration.barmanObjectStore.
WAL archive
WAL archiving is the process that sends WAL files to the object storage, and it's essential to execute online/hot backups or PITR. In EDB Postgres Distributed for Kubernetes, each PGD node is set up to archive WAL files in the object store independently.
The WAL archive is defined in the PGD Group spec.backup.configuration.barmanObjectStore stanza, and is enabled as soon as a destination path and cloud credentials are set.
You can choose to compress WAL files before they're uploaded, and you can encrypt them. You can also enable parallel WAL archiving.
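As a minimal sketch, the WAL options might look like this in the PGD group manifest. The field names under wal (compression, encryption, maxParallel) follow the EDB Postgres for Kubernetes barmanObjectStore schema. The bucket path, Secret name, and chosen values are hypothetical placeholders, not recommendations.

```yaml
# Fragment of a PGD group manifest; all values are illustrative.
spec:
  backup:
    configuration:
      barmanObjectStore:
        destinationPath: s3://my-bucket/pgd-backups/   # hypothetical bucket
        s3Credentials:
          accessKeyId:
            name: backup-creds        # hypothetical Secret name
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-creds
            key: ACCESS_SECRET_KEY
        wal:
          compression: gzip           # compress WAL files before upload
          encryption: AES256          # encrypt the uploaded WAL objects
          maxParallel: 8              # archive up to 8 WAL files in parallel
```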
For more information, see the EDB Postgres for Kubernetes WAL archiving documentation.
Scheduled backups
Scheduled backups are the recommended way to configure your backup strategy in EDB Postgres Distributed for Kubernetes.
When the PGD group spec.backup.configuration.barmanObjectStore and .spec.backup.schedulers[].schedule stanzas are configured, the operator selects one of the PGD data nodes as the elected backup node, for which it creates a ScheduledBackup resource.
The .spec.backup.schedulers[].method field allows you to define the scheduled backup method. Two backup methods are supported:
- volumeSnapshot
- barmanObjectStore (the default)
You can define more than one scheduler, but each method can be used by only one scheduler. That is, two schedulers aren't allowed to use the same method.
For object store backups, with the default barmanObjectStore method, use the stanza spec.backup.configuration.barmanObjectStore to define the object store information for both backup and WAL archiving.
For more information, see Backup on object stores in the EDB Postgres for Kubernetes documentation.
To perform volumeSnapshot backups, you can select the volumeSnapshot method. Use the stanza spec.backup.configuration.barmanObjectStore.volumeSnapshot to define the volumeSnapshot configuration.
For more information, see Backup on volume snapshots in the EDB Postgres for Kubernetes documentation.
When you use the volumeSnapshot method for backup, WAL archiving is still done onto the Barman object store.
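The following fragment is a sketch of a scheduler that uses the volumeSnapshot method while the object store remains configured for WAL archiving. The bucket path and schedule are hypothetical. The volume snapshot settings themselves (such as the snapshot class) and the object store credentials are deliberately left out, so check their exact fields against the documentation referenced above.

```yaml
# Fragment of a PGD group manifest; all values are illustrative.
spec:
  backup:
    configuration:
      barmanObjectStore:
        # Still required: WAL files are archived to the object store.
        destinationPath: s3://my-bucket/pgd-backups/   # hypothetical bucket
        # Credentials and the volume snapshot configuration are omitted here.
    schedulers:
      - method: volumeSnapshot        # base backups are taken as volume snapshots
        schedule: "0 0 0 * * *"       # hypothetical: every day at midnight
```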
For a comparison of these two backup methods, see Object stores or volume snapshots in the EDB Postgres for Kubernetes documentation.
The .spec.backup.schedulers[].schedule field allows you to define a cron schedule, expressed in the Go cron package format.
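As an illustration, here is a sketch of a daily schedule. Unlike the traditional Unix crontab format, the format used here has six fields, with seconds as the first one. The schedule value is an example only.

```yaml
# Fragment of a PGD group manifest; the schedule is illustrative.
spec:
  backup:
    schedulers:
      - method: barmanObjectStore
        # Field order: seconds minutes hours day-of-month month day-of-week
        schedule: "0 0 0 * * *"       # every day at 00:00:00
```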
If necessary, you can suspend scheduled backups by setting .spec.backup.schedulers[].suspend to true. This setting prevents new backups from being scheduled.
If you want to execute a backup as soon as the ScheduledBackup resource is created, set .spec.backup.schedulers[].immediate to true.
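Both flags sit next to the schedule in each scheduler entry. The following sketch uses hypothetical values to show where they go:

```yaml
# Fragment of a PGD group manifest; all values are illustrative.
spec:
  backup:
    schedulers:
      - method: barmanObjectStore
        schedule: "0 0 0 * * *"
        immediate: true    # take a first backup as soon as the ScheduledBackup is created
        suspend: false     # set to true to prevent new backups from being scheduled
```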
.spec.backupOwnerReference indicates the ownerReference to use in the created backup resources. The options are:
- none: Doesn't set an owner reference for created backup objects.
- self: Sets the ScheduledBackup object as owner of the backup.
- cluster: Sets the cluster as owner of the backup.
Warning
The .spec.backup.cron field is deprecated. Use .spec.backup.schedulers instead. While you can still use .spec.backup.cron, you can't use it at the same time as .spec.backup.schedulers.
Note
The EDB Postgres for Kubernetes ScheduledBackup object contains the cluster option to specify the cluster to back up. This option currently isn't supported by EDB Postgres Distributed for Kubernetes and is ignored if specified.
If an elected backup node is deleted, the operator transparently elects a new backup node and reconciles the ScheduledBackup resource accordingly.
Retention policies
EDB Postgres Distributed for Kubernetes can manage the automated deletion of backup files from the backup object store using retention policies based on the recovery window. This process also takes care of removing unused WAL files and WALs associated with backups that are scheduled for deletion.
For example, you can define your backups with a retention policy of 30 days.
For more information, see Retention policies in the EDB Postgres for Kubernetes documentation.
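A 30-day recovery window could be expressed as in the following sketch. The retentionPolicy field name and the "30d" syntax come from EDB Postgres for Kubernetes. Placing the field under spec.backup.configuration is an assumption made by analogy with the barmanObjectStore path used on this page, so verify it against your operator version.

```yaml
# Fragment of a PGD group manifest; field placement is an assumption (see above).
spec:
  backup:
    configuration:
      retentionPolicy: "30d"   # keep what is needed to recover to any point in the last 30 days
```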
Important
Currently, the retention policy is applied only to the backups and WAL files of the elected backup node. Because every other PGD node also archives its own WALs independently, it's your responsibility to manage the lifecycle of those WAL files, for example, by using the object storage's data retention policy. Also, if you have an object storage data retention policy set up on every PGD node directory, make sure it doesn't overlap or interfere with the retention policy managed by the operator.
Compression algorithms
Backups and WAL files are uncompressed by default. However, multiple compression algorithms are supported. For more information, see the EDB Postgres for Kubernetes compression algorithms documentation.
Tagging of backup objects
You can specify tags as key-value pairs for the backup objects, namely base backups, WAL files, and history files. For more information, see the EDB Postgres for Kubernetes documentation about tagging of backup objects.
On-demand backups of a PGD node
A PGD node is represented as a single-instance EDB Postgres for Kubernetes Cluster object. As such, if you need to, you can request an on-demand backup of a specific PGD node by creating an EDB Postgres for Kubernetes Backup resource.
To do that, see On-demand backups in the EDB Postgres for Kubernetes documentation.
Hint
You can retrieve the list of EDB Postgres for Kubernetes clusters that make up your PGD group by running kubectl get cluster -l k8s.pgd.enterprisedb.io/group=my-pgd-group -n my-namespace.
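Once you know the name of the node's Cluster object, an on-demand backup request might look like the following sketch. The Backup kind and its spec.cluster.name field belong to EDB Postgres for Kubernetes. The resource name and the cluster name my-pgd-group-1 are hypothetical, so substitute a name returned by the kubectl command and confirm the apiVersion against the CRDs installed in your environment.

```yaml
# On-demand backup of a single PGD node; names are illustrative.
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Backup
metadata:
  name: on-demand-backup-example   # hypothetical resource name
  namespace: my-namespace
spec:
  cluster:
    name: my-pgd-group-1           # hypothetical: the Cluster object of the PGD node to back up
```

After you apply the manifest with kubectl apply -f, you can follow the backup's progress with kubectl get backup -n my-namespace.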