Talos Linux is a Linux distribution designed solely to run Kubernetes. Its advantages include security, support for both on-prem and cloud environments, and ease of deployment. It's worth checking out if you use Kubernetes, especially if you're interested in on-prem or hybrid deployments.
Kubernetes on Talos uses etcd to store its state and, unlike on a cloud provider-managed Kubernetes service, you'll have to back this up yourself. Care must be taken to secure the backups, since etcd typically contains many secrets like passwords and API keys.
Here, I'll show how to schedule automated, client-side encrypted backups to an AWS S3 bucket (or any S3-compatible storage, like Backblaze B2). The approach uses the command-line tool talosctl and a dedicated Talos API account with limited privileges to create a snapshot file. The snapshot file is then encrypted client-side and backed up using restic.
Step 1: Clone the example repository
Use git to clone the example manifests that we'll adapt and use.
git clone https://github.com/alubbock/talos-etcd-backup
cd talos-etcd-backup
Step 2: Enable Talos API access from Kubernetes
On your local machine, set environment variables for any control plane node (replace with the node IP):
export TALOS_NODE=<control-plane-node-ip>
export TALOS_ENDPOINT=<control-plane-node-ip>
We can apply the talos-api-patch.yaml to enable Talos API access with the os:etcd:backup role from Kubernetes' kube-system namespace:
talosctl -n $TALOS_NODE -e $TALOS_ENDPOINT patch machineconfig --patch-file talos-api-patch.yaml
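For orientation, a patch of this kind might look roughly like the sketch below. The contents are an assumption based on Talos's kubernetesTalosAPIAccess machine feature, not a copy of the repository's file, so treat the talos-api-patch.yaml in the repository as the source of truth. The snippet only writes the sketch to a temporary file and prints it. It does not touch your cluster.

```shell
# Hypothetical sketch of a Talos API access patch (assumed contents, not the
# repository's actual file). Written to a temp file so nothing is overwritten.
patch_sketch="$(mktemp)"
cat > "$patch_sketch" <<'EOF'
machine:
  features:
    kubernetesTalosAPIAccess:
      enabled: true
      allowedRoles:
        - os:etcd:backup            # role named in this guide
      allowedKubernetesNamespaces:
        - kube-system               # namespace named in this guide
EOF
cat "$patch_sketch"
```

Limiting both the role and the namespace keeps the backup job's Talos credentials as narrow as possible.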
Step 3: Create an AWS S3 bucket and AWS API keys
Next, you'll need to create an S3 bucket and a user with keys to access it. Best practice is to use a dedicated account with access only to the one bucket - the restic documentation shows how to do this.
When complete, you should have:
- A bucket name and AWS region name, which you can use to construct a URL, e.g. https://s3.eu-west-2.amazonaws.com/example-bucket
- AWS access key ID which has access to the bucket
- AWS secret access key for the above key ID
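As a small illustration, the bucket URL can be assembled from the bucket name and region like this. The variable names are made up for the example, and the values are the placeholder bucket and region used in this guide.

```shell
# Example values from this guide; replace with your own bucket and region.
bucket='example-bucket'
region='eu-west-2'

# HTTPS endpoint for the bucket, in the form shown in the list above.
bucket_url="https://s3.${region}.amazonaws.com/${bucket}"

# restic expects the same URL with an "s3:" prefix.
restic_repository="s3:${bucket_url}"

printf '%s\n' "$restic_repository"
# prints: s3:https://s3.eu-west-2.amazonaws.com/example-bucket
```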
Step 4: Create a Secret with the S3 access key and restic encryption key
Open the talos-etcd-s3-secret.yaml file and set the bucket URL (prefixed with s3:), AWS access key ID and AWS secret access key. Note that the values should be base64 encoded, which you can do with:
echo -n "some value" | base64
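A stray newline or wrapped output is an easy way to end up with a broken Secret, so it can be worth checking that a value survives a round trip. The sketch below assumes a base64 that accepts -d for decoding (as GNU coreutils does); the flag may differ on other systems.

```shell
value='s3:https://s3.eu-west-2.amazonaws.com/example-bucket'

# printf '%s' avoids a trailing newline; tr strips any line wrapping that
# some base64 implementations add to long output.
encoded="$(printf '%s' "$value" | base64 | tr -d '\n')"

# Decode again to confirm the round trip is lossless.
decoded="$(printf '%s' "$encoded" | base64 -d)"

if [ "$decoded" = "$value" ]; then
  printf 'ok: %s\n' "$encoded"
else
  printf 'mismatch after decoding\n' >&2
fi
```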
You also need to set a base64-encoded encryption key for restic to use. This can be any long random string - you can generate one like this on Linux:
tr -dc 'A-Za-z0-9!"#$%&'\''()*+,-./:;<=>?@[\]^_`{|}~' </dev/urandom | head -c 32 | base64
Make sure to keep a copy of your restic encryption key somewhere safe, as you won't be able to access the Secret if etcd gets corrupted!
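One possible way to generate the key and keep a local copy in a single pass is sketched below. It is a variation on the command above, not the method the repository prescribes: it uses only letters and digits for simplicity, and the temporary file is a stand-in for wherever you actually keep secrets.

```shell
# Restrict permissions on anything created below to the current user.
umask 077

# Hypothetical location for the local copy; move it to a password manager
# or other safe storage afterwards.
key_file="$(mktemp)"

# Read a bounded chunk of random bytes and keep the first 32 alphanumeric
# characters. LC_ALL=C makes tr treat its input as raw bytes.
head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c 32 > "$key_file"

# Base64-encode the key for the Secret manifest, without line wrapping.
encoded_key="$(base64 < "$key_file" | tr -d '\n')"

printf 'key saved to %s\n' "$key_file"
printf 'base64 for the Secret: %s\n' "$encoded_key"
```

The file holds the raw key, which is what restic needs as its password when restoring. The base64 form is only for the Secret manifest.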
With the values in the secret set, apply the manifest:
kubectl apply -f talos-etcd-s3-secret.yaml
Step 5: Edit and install the backup script
Open the cronjob-backup-etcd.yaml file and set the values marked with "change me". Adjust the schedule to your needs using cron syntax (the default is to back up daily, early in the morning; note that Talos uses the UTC timezone), and set the CP_NODE_IP environment variable to the IP address of a control plane node.
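If cron syntax is unfamiliar, the five fields can be read off as in the sketch below. The schedule shown is a hypothetical example, not necessarily the repository's default.

```shell
# Hypothetical schedule for illustration: 03:15 UTC every day.
schedule='15 3 * * *'

# Split the five cron fields; read does not glob-expand the asterisks.
read -r minute hour day_of_month month day_of_week <<EOF
$schedule
EOF

printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
  "$minute" "$hour" "$day_of_month" "$month" "$day_of_week"
# prints: minute=15 hour=3 day-of-month=* month=* day-of-week=*
```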
When ready, apply the manifest using kubectl:
kubectl apply -f cronjob-backup-etcd.yaml
Step 6: Initialise the restic repository
Restic requires that you initialise the repository before first use. You can do this by generating a job from the cronjob, patching it to run restic init rather than restic backup, and applying it to run the job. This can be done with a one-liner:
kubectl -n kube-system create job --from=cronjob/etcd-backup etcd-backup-init --dry-run=client -o yaml | kubectl patch --dry-run=client -f - --type=json -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["init"]}]' -o yaml | kubectl apply -f -
You can check that the talosctl etcd snapshot worked like so:
kubectl -n kube-system logs -l job-name=etcd-backup-init -c talosctl
Example output:
etcd snapshot saved to "/data/etcd.snapshot" (27582496 bytes)
snapshot info: hash d8e1732f, revision 603822, total keys 2054, total size 27582464
And you can check that restic init worked like this:
kubectl -n kube-system logs -l job-name=etcd-backup-init -c restic
Example output:
created restic repository ab2fdd510b at s3:<your-url>
Please note that knowledge of your password is required to access
the repository. Losing your password means that your data is
irrecoverably lost.
Step 7: Test the backup
You can manually create a job from the cronjob to test the backup like this:
kubectl -n kube-system create job --from=cronjob/etcd-backup etcd-backup-test
And you can check the job log to make sure the backup worked (you might need to wait a minute for it to complete):
kubectl -n kube-system logs -l job-name=etcd-backup-test -c restic
Example output:
no parent snapshot found, will read all files
Files: 1 new, 0 changed, 0 unmodified
Dirs: 1 new, 0 changed, 0 unmodified
Added to the repository: 26.308 MiB (2.303 MiB stored)
processed 1 files, 26.305 MiB in 0:04
snapshot 31fdc2cf saved
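Rather than guessing how long to wait, you could let kubectl block until the job completes and then print the log. The helper below is a sketch: the function name is made up, the 180 second timeout is arbitrary, and it assumes the job runs in the kube-system namespace, as in the init one-liner.

```shell
# Wait for a backup job to finish, then print the restic container's log.
# Assumes the job lives in kube-system, as in the init one-liner.
wait_and_show_logs() {
  job_name="$1"
  kubectl -n kube-system wait --for=condition=complete \
    "job/${job_name}" --timeout=180s &&
    kubectl -n kube-system logs -l "job-name=${job_name}" -c restic
}

# Usage: wait_and_show_logs etcd-backup-test
```

If the job fails instead of completing, the wait times out and no log is printed, which is a cue to inspect the job and its pods directly.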
Step 8: Set a backup retention policy
Restic will store all of your backups using its own snapshot mechanism in the bucket, so you probably want to have a cronjob to set an appropriate retention policy to automatically delete older backups.
Open the cronjob-backup-etcd-prune.yaml and edit the values marked "change me" to your needs. The restic docs describe the arguments to the forget command and how they can be used to create a backup policy. Don't forget to include --prune at the end to actually delete expired snapshots.
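As an illustration, a policy that keeps a week of daily snapshots, a month of weekly ones and half a year of monthly ones could be expressed with the arguments below. The numbers are arbitrary examples rather than a recommendation, and the snippet only prints the resulting command so you can review it.

```shell
# Hypothetical retention policy: keep 7 daily, 4 weekly and 6 monthly
# snapshots. Adjust the numbers to your own needs.
forget_args='forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune'

# Print the command for review instead of running it here.
printf 'restic %s\n' "$forget_args"
# prints: restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
```

When trying a policy by hand, restic's forget command also accepts --dry-run, which reports what would be removed without deleting anything.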
Apply the prune cronjob like so:
kubectl apply -f cronjob-backup-etcd-prune.yaml
Make a note of the restore process
If you need to restore etcd, it's probably easiest to use restic on your local machine to download the etcd snapshot. You'll need to set the following environment variables:
- RESTIC_REPOSITORY, e.g. s3:https://s3.eu-west-2.amazonaws.com/example-bucket
- RESTIC_PASSWORD, your encryption key
- AWS_ACCESS_KEY_ID, AWS key ID for your bucket
- AWS_SECRET_ACCESS_KEY, AWS secret access key for your bucket
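In a shell session, setting these might look like the sketch below. All values are placeholders, and the check loop at the end is just a convenience to catch a variable that was left unset.

```shell
# Placeholder values; substitute your own. Avoid leaving real secrets in
# shell history or in files committed to version control.
export RESTIC_REPOSITORY='s3:https://s3.eu-west-2.amazonaws.com/example-bucket'
export RESTIC_PASSWORD='<your-restic-encryption-key>'
export AWS_ACCESS_KEY_ID='<your-access-key-id>'
export AWS_SECRET_ACCESS_KEY='<your-secret-access-key>'

# Sanity check: report any variable that is unset or empty.
missing=0
for name in RESTIC_REPOSITORY RESTIC_PASSWORD AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
  if [ -z "$(printenv "$name")" ]; then
    printf 'missing: %s\n' "$name" >&2
    missing=1
  fi
done
[ "$missing" -eq 0 ] && printf 'all restic variables are set\n'
```

Note that these should be the original values, not the base64-encoded forms that were placed in the Kubernetes Secret.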
You can then restore to a local directory:
restic restore latest --target /tmp/etcd-restore
You can choose a version other than latest if needed - see the restic docs on restoring from backup.
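Restic recreates the original directory structure under the target directory, so the snapshot file is likely to sit in a subdirectory rather than at the top level. A small helper like the one below can locate it. The function name is made up, and the etcd.snapshot file name is an assumption taken from the example log output earlier in this guide.

```shell
# Return the path of the first etcd.snapshot file found under a directory.
find_snapshot() {
  find "$1" -type f -name 'etcd.snapshot' | head -n 1
}

# Usage: find_snapshot /tmp/etcd-restore
```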
You can then use your etcd snapshot to recover your cluster - see the Talos docs on disaster recovery for details.
Limitations and gotchas
A couple of things to be aware of:
- Make sure to offset the times of the backup and prune cronjobs - restic has a locking mechanism, so only one job will be able to access the backup repository at a time.
- Restic's incremental backups won't help if the keys are compromised and an attacker deletes all the files, so it's best to set a lifecycle policy on the bucket to retain files for e.g. 30 days after deletion.
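On AWS S3, one way to get this kind of retention is to enable bucket versioning and expire noncurrent object versions after 30 days. The sketch below is an assumption about how you might set that up, not something taken from the repository; the rule ID and bucket name are placeholders, and other S3-compatible providers have their own equivalents. The snippet only writes and prints the rule. The AWS CLI commands are left as comments.

```shell
# Hypothetical lifecycle rule: remove noncurrent (overwritten or deleted)
# object versions 30 days after they are superseded. This only has an
# effect if bucket versioning is enabled.
lifecycle_file="$(mktemp)"
cat > "$lifecycle_file" <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-noncurrent-after-30-days",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}
EOF
cat "$lifecycle_file"

# To apply with the AWS CLI (not run here; example-bucket is a placeholder):
#   aws s3api put-bucket-versioning --bucket example-bucket \
#     --versioning-configuration Status=Enabled
#   aws s3api put-bucket-lifecycle-configuration --bucket example-bucket \
#     --lifecycle-configuration file://lifecycle.json
```

For this to protect against a compromised key, the backup user should not be allowed to delete object versions or change the bucket's lifecycle and versioning settings.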
Conclusion
I've shown how to back up Talos Linux's Kubernetes etcd state store to AWS S3 using a tool called restic. This approach should work with any S3-compatible storage and it encrypts backups client-side.
I also have a guide on encrypted backups of your PVCs to S3 or Backblaze B2.
I hope you've found the guide useful. Thanks for reading!