Client-side encrypted backups of Talos etcd to AWS S3

Client-side automated, encrypted backup of Talos Linux etcd to AWS S3 cloud storage

Talos Linux is a Linux distribution designed solely to run Kubernetes, with many advantages: security, the flexibility to run on-prem or in the cloud, and ease of deployment. It's worth checking out if you use Kubernetes, especially if you're interested in on-prem or hybrid deployments.

Kubernetes on Talos uses etcd to store its state and, unlike a cloud provider-managed Kubernetes service, you'll have to back this up yourself. Care must be taken to secure the backups, since etcd typically contains many secrets like passwords and API keys.

Here, I'll show how to schedule automated, client-side encrypted backups to an AWS S3 bucket (or any S3-compatible storage, like Backblaze B2). The approach uses the command-line tool talosctl and a dedicated Talos API account with limited privileges to create a snapshot file. The snapshot file is then encrypted client-side and backed up using restic.

Step 1: Clone the example repository

Use git to clone the example manifests that we'll adapt and use.

git clone https://github.com/alubbock/talos-etcd-backup
cd talos-etcd-backup

Step 2: Enable Talos API access from Kubernetes

On your local machine, set environment variables for any control plane node (replace with the node IP):

export TALOS_NODE=<control-plane-node-ip>
export TALOS_ENDPOINT=<control-plane-node-ip>

We can apply the talos-api-patch.yaml to enable Talos API access with the os:etcd:backup role from Kubernetes' kube-system namespace:

talosctl -n $TALOS_NODE -e $TALOS_ENDPOINT patch machineconfig --patch-file talos-api-patch.yaml
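For reference, the patch enables the Talos API access feature in the machine config. It looks roughly like this (a sketch following the Talos machine config schema; the copy in the repository is authoritative):

```yaml
# talos-api-patch.yaml (sketch): allow pods in kube-system to talk to the
# Talos API with only the os:etcd:backup role
machine:
  features:
    kubernetesTalosAPIAccess:
      enabled: true
      allowedRoles:
        - os:etcd:backup
      allowedKubernetesNamespaces:
        - kube-system
```

Limiting the role to os:etcd:backup means a compromised backup pod can take etcd snapshots but can't otherwise reconfigure or control your nodes.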

Step 3: Create an AWS S3 bucket and AWS API keys

Next, you'll need to create an S3 bucket and a user with keys to access it. Best practice is to use a dedicated account with access only to the one bucket - the restic documentation shows how to do this.

When complete, you should have:

  • A bucket name and AWS region name, which you can use to construct a URL, e.g. https://s3.eu-west-2.amazonaws.com/example-bucket
  • AWS access key ID which has access to the bucket
  • AWS secret access key for the above key ID
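If you're setting up the IAM user by hand, a minimal policy scoped to the one bucket looks something like this (illustrative, adapted from the pattern in the restic docs; replace example-bucket with your bucket name):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::example-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```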

Step 4: Create a Secret with the S3 access key and restic encryption key

Open the talos-etcd-s3-secret.yaml file and set the bucket URL (prefixed with s3:), AWS access key ID and AWS secret access key. Note that the values should be base64 encoded, which you can do with:

echo -n "some value" | base64

You also need to set a base64-encoded encryption key for restic to use. This can be any long random string - you can generate one like this on Linux:

tr -dc 'A-Za-z0-9!"#$%&'\''()*+,-./:;<=>?@[\]^_`{|}~' </dev/urandom | head -c 32 | base64

Make sure to keep a copy of your restic encryption key somewhere safe, as you won't be able to access the Secret if etcd gets corrupted!
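For orientation, a Secret carrying these four values has roughly this shape (the metadata name and key names here are illustrative; keep whatever the repository's talos-etcd-s3-secret.yaml already uses):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: talos-etcd-s3          # illustrative; keep the name in the repo's file
  namespace: kube-system
type: Opaque
data:
  # All values base64-encoded, e.g. echo -n "some value" | base64
  RESTIC_REPOSITORY: <base64 of s3:https://s3.eu-west-2.amazonaws.com/example-bucket>
  RESTIC_PASSWORD: <base64 of your restic encryption key>
  AWS_ACCESS_KEY_ID: <base64 of your AWS access key ID>
  AWS_SECRET_ACCESS_KEY: <base64 of your AWS secret access key>
```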

With the values in the secret set, apply the manifest:

kubectl apply -f talos-etcd-s3-secret.yaml

Step 5: Edit and install the backup script

Open the cronjob-backup-etcd.yaml file and set the values marked with "change me". Adjust the schedule to your needs using cron syntax (the default is a daily backup in the early morning; note that Talos uses the UTC timezone), and set the CP_NODE_IP environment variable to the IP address of a control plane node.
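The two values being edited sit roughly here in the manifest (a trimmed sketch; the schedule and IP shown are illustrative, and other fields follow the repository's file):

```yaml
spec:
  schedule: "15 3 * * *"              # cron syntax: daily at 03:15 UTC
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: talosctl
              env:
                - name: CP_NODE_IP
                  value: "10.0.0.2"   # change me: a control plane node IP
```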

When ready, apply the manifest using kubectl:

kubectl apply -f cronjob-backup-etcd.yaml

Step 6: Initialise the restic repository

Restic requires that you initialise the repository before first use. You can do this by generating a job from the cronjob, patching it to run restic init rather than restic backup, and applying it to run the job.

You can do this with a one-liner, like this:

kubectl -n kube-system create job --from=cronjob/etcd-backup etcd-backup-init \
    --dry-run=client -o yaml | \
  kubectl patch --dry-run=client -f - --type=json \
    -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["init"]}]' -o yaml | \
  kubectl apply -f -

You can check that the talosctl etcd snapshot worked like so:

kubectl logs -l job-name=etcd-backup-init -c talosctl

Example output:

etcd snapshot saved to "/data/etcd.snapshot" (27582496 bytes)
snapshot info: hash d8e1732f, revision 603822, total keys 2054, total size 27582464

And you can check that restic init worked like this:

kubectl logs -l job-name=etcd-backup-init -c restic

Example output:

created restic repository ab2fdd510b at s3:<your-url>

Please note that knowledge of your password is required to access
the repository. Losing your password means that your data is
irrecoverably lost.

Step 7: Test the backup

You can manually create a job from the cronjob to test the backup like this:

kubectl create job --from=cronjob/etcd-backup etcd-backup-test

And you can check the job log to make sure the backup worked (you might need to wait a minute for it to complete):

kubectl logs -l job-name=etcd-backup-test -c restic

Example output:

no parent snapshot found, will read all files

Files:           1 new,     0 changed,     0 unmodified
Dirs:            1 new,     0 changed,     0 unmodified
Added to the repository: 26.308 MiB (2.303 MiB stored)

processed 1 files, 26.305 MiB in 0:04
snapshot 31fdc2cf saved

Step 8: Set a backup retention policy

Restic stores every backup as a snapshot in the bucket, so you'll probably want a second cronjob that applies a retention policy and automatically deletes older backups.

Open the cronjob-backup-etcd-prune.yaml and edit the values marked "change me" to your needs. The restic docs describe the arguments to the forget command and how they can be combined into a retention policy. Don't forget to include --prune at the end to actually delete expired snapshots.
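As a concrete example, a policy keeping 7 daily, 4 weekly and 6 monthly snapshots would pass arguments like these to the restic container (values are illustrative; adjust to your needs):

```yaml
args:
  - forget
  - --keep-daily=7
  - --keep-weekly=4
  - --keep-monthly=6
  - --prune
```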

Apply the prune cronjob like so:

kubectl apply -f cronjob-backup-etcd-prune.yaml

Make a note of the restore process

If you need to restore etcd, it's probably easiest to use restic on your local machine to download the etcd snapshot. You'll need to set the following environment variables:

  • RESTIC_REPOSITORY, e.g. s3:https://s3.eu-west-2.amazonaws.com/example-bucket
  • RESTIC_PASSWORD, your encryption key
  • AWS_ACCESS_KEY_ID, AWS key ID for your bucket
  • AWS_SECRET_ACCESS_KEY, AWS secret access key for your bucket
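Putting those together, a local restore session starts with exports along these lines (placeholder values shown; substitute your own):

```shell
# Restic repository and encryption key from Step 4
export RESTIC_REPOSITORY='s3:https://s3.eu-west-2.amazonaws.com/example-bucket'
export RESTIC_PASSWORD='your-restic-encryption-key'

# AWS credentials with access to the bucket, from Step 3
export AWS_ACCESS_KEY_ID='your-access-key-id'
export AWS_SECRET_ACCESS_KEY='your-secret-access-key'
```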

You can then restore to a local directory:

restic restore latest --target /tmp/etcd-restore

You can choose a version other than latest if needed - see the restic docs on restoring from backup.

You can then use your etcd snapshot to recover your cluster - see the Talos docs on disaster recovery for details.

Limitations and gotchas

A couple of things to be aware of:

  • Make sure to offset the times of the backup and prune cronjobs - restic has a locking mechanism, so only one job will be able to access the backup repository at a time.

  • Restic's backups won't protect you if your AWS keys are compromised and an attacker deletes all the files, so it's best to set a lifecycle policy on the bucket to retain files for e.g. 30 days after deletion.
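With bucket versioning enabled, a lifecycle rule along these lines keeps noncurrent (deleted or overwritten) object versions for 30 days before expiring them (illustrative; apply it via the S3 console or the put-bucket-lifecycle-configuration API):

```json
{
  "Rules": [
    {
      "ID": "retain-deleted-30-days",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 30
      }
    }
  ]
}
```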

Conclusion

I've shown how to back up Talos Linux's Kubernetes etcd state store to AWS S3 using a tool called restic. This approach should work with any S3-compatible storage, and it encrypts backups client-side.

I also have a guide on encrypted backups of your PVCs to S3 or Backblaze B2 as well.

I hope you've found the guide useful. Thanks for reading!
