# Full cluster backup and restore
With Velero, you can perform a full disaster recovery: backing up a cluster and restoring all of its API objects and associated volumes.
Before doing a cluster migration with Velero, please consider the following:
- The cluster versions MUST match.
- The Velero server versions MUST match.
- In the case of Big Bang, the Flux versions MUST match.
- Ideally, the cloud providers should match. Velero can do cross-cloud-provider migration with restic, but note that this support is in beta.
With these caveats in mind, we can proceed with the cluster migration.
At a high level, the steps are:
- Back up the source cluster.
  - Exclude the following namespaces: `kube-system`, `flux`, `velero`.
- Copy the secret used by the Velero account into a file.
  - This secret enables the new cluster to connect to the bucket used to store backups.
- Create a shell cluster with only the Velero server and Flux installed.
- Create a `BackupStorageLocation` and a `VolumeSnapshotLocation` custom resource in the destination cluster that point to the same location as the source cluster.
- Confirm that the destination cluster can see the backups created by the source cluster.
- Initiate a restore on the destination cluster.
- Perform validation and ensure that the restored objects are running correctly.
## Before we begin
We are going to be using two clusters for the migration.

```
ubuntu@ip-172-31-32-130:~$ kubectl config get-contexts
CURRENT   NAME                                            CLUSTER               AUTHINFO                    NAMESPACE
          dr-dogfood-admin@dr-dogfood                     dr-dogfood            dr-dogfood-admin
          integration-dogfood-admin@integration-dogfood   integration-dogfood   integration-dogfood-admin   velero
```

We have a source cluster `integration-dogfood` and a destination cluster `dr-dogfood`.
Both clusters are on the same Kubernetes version.
We confirm the image versions of Velero in both clusters.
We also compare the versions of Flux in both clusters.
Both clusters are deployed in AWS using [Konvoy](https://docs.d2iq.com/dkp/konvoy/1.6/install/install-aws/).
Now that we have satisfied the prerequisites, we can go ahead with the migration.
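
The image comparison described above can be scripted. The following is a minimal sketch, not the procedure the authors used: the function names are hypothetical, and it assumes Velero runs in the `velero` namespace and Flux in `flux-system`. It collects the deployment images from each kubectl context and reports whether they are identical.

```shell
#!/usr/bin/env bash
# Sketch only: function names and namespaces are illustrative assumptions.

# Print the sorted, unique container images used by deployments in a namespace.
deployment_images() {
  local context="$1" namespace="$2"
  kubectl --context "$context" --namespace "$namespace" get deployments \
    -o jsonpath='{.items[*].spec.template.spec.containers[*].image}' \
    | tr -s '[:space:]' '\n' | sort -u
}

# Return 0 when both contexts run identical images in the namespace.
images_match() {
  local namespace="$1" source_ctx="$2" dest_ctx="$3"
  local src dst
  src="$(deployment_images "$source_ctx" "$namespace")"
  dst="$(deployment_images "$dest_ctx" "$namespace")"
  if [ "$src" = "$dst" ]; then
    echo "OK: $namespace images match"
  else
    echo "MISMATCH: $namespace images differ" >&2
    return 1
  fi
}
```

For the clusters in this guide, a call could look like `images_match velero integration-dogfood-admin@integration-dogfood dr-dogfood-admin@dr-dogfood`. The Kubernetes versions themselves can be compared by running `kubectl --context <context> version` against each context.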
## Back up the source cluster
We assume here that you have a source cluster with Velero up and running, with credentials that enable it to connect to the bucket that will contain the backups.
* To create a full backup on the source cluster (`integration-dogfood`), run `velero backup create` with the `--exclude-namespaces` flag.
In this command, we exclude the `kube-system`, `flux`, `velero`, and `default` namespaces. You can exclude any namespace by adding it to the comma-separated list.
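
As an illustration of the backup step, here is a hedged sketch that wraps `velero backup create`. The function name `create_full_backup` and the backup name used in the usage example are hypothetical; the namespace list follows the text above, and `--wait` is added so the command blocks until the backup finishes.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and backup name are illustrative.

# Create a backup of the whole cluster, skipping the listed namespaces.
# Usage: create_full_backup <backup-name> <namespace> [<namespace> ...]
create_full_backup() {
  local backup_name="$1"; shift
  local excluded
  excluded="$(IFS=,; echo "$*")"   # join the remaining arguments with commas
  velero backup create "$backup_name" --exclude-namespaces "$excluded" --wait
}
```

For example, `create_full_backup mybackup kube-system flux velero default` runs `velero backup create mybackup --exclude-namespaces kube-system,flux,velero,default --wait`.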
To confirm the backup succeeded, run the command
`velero backup get $backupname`
The important columns to look at are `STATUS` and `ERRORS`; together they tell you whether the backup was successful. You should see a status of `Completed` with no errors before proceeding.
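
The same check can be automated by reading the phase of the Backup object directly. This is a sketch under the assumption that Velero stores its Backup resources in the `velero` namespace; the function names are hypothetical. It only inspects the phase, so the error count should still be reviewed with `velero backup get`.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative; the namespace is an assumption.

# Print the phase of a Velero backup (for example "Completed").
backup_phase() {
  local backup_name="$1"
  kubectl --namespace velero get backups.velero.io "$backup_name" \
    -o jsonpath='{.status.phase}'
}

# Return 0 only when the backup finished with phase "Completed".
backup_completed() {
  local phase
  phase="$(backup_phase "$1")"
  if [ "$phase" = "Completed" ]; then
    echo "backup $1 completed"
  else
    echo "backup $1 is in phase '${phase:-unknown}'" >&2
    return 1
  fi
}
```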
### Copy secret from source cluster
Velero uses a secret containing the credentials needed to access the bucket that holds the backups. The destination cluster needs this secret, so export its contents to a text file.
The text file containing the credentials will be used to create a secret in the destination cluster (`dr-dogfood`).
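
One way to export the credentials is sketched below. It assumes the secret is named `cloud-credentials`, lives in the `velero` namespace, and stores the credentials under the key `cloud`, as in the manifests in this guide; the function name and the output file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: secret name, key, and namespace are assumptions taken from this guide.

# Write the decoded credentials from the Velero secret to a local file.
# Usage: export_velero_credentials <kubectl-context> <output-file>
export_velero_credentials() {
  local context="$1" output_file="$2"
  ( umask 077   # keep the credentials file private to the current user
    kubectl --context "$context" --namespace velero get secret cloud-credentials \
      -o jsonpath='{.data.cloud}' | base64 -d > "$output_file" )
}
```

For example, `export_velero_credentials integration-dogfood-admin@integration-dogfood ./velero_secret.txt` would write the credentials to `./velero_secret.txt`.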
### Install Velero and create a backup storage location
Earlier, we made the assumption that Velero had already been installed in the destination cluster. However, for completeness, you can install Velero in the destination cluster with `velero install`, passing the bucket, the region, and the secret file.
Here, the bucket is the storage bucket referenced by the source cluster, the region is the region where the bucket is located, and the secret file is the file we created by exporting the secret.
If the installation succeeds, you should see a secret in the destination cluster called `cloud-credentials`, which contains the credentials needed to access the bucket containing the backups.
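
A possible form of the install command is sketched below. The function name is hypothetical, and the AWS plugin image tag is an assumption that should be replaced with the version matching your Velero server; the bucket, region, and secret file are the three values described above.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and the default plugin tag are assumptions.

# Install Velero with the AWS provider, pointing at an existing bucket.
# Usage: install_velero <bucket> <region> <secret-file>
install_velero() {
  local bucket="$1" region="$2" secret_file="$3"
  # The plugin image tag is an assumption; pick the one matching your Velero version.
  local plugin_image="${VELERO_AWS_PLUGIN:-velero/velero-plugin-for-aws:v1.2.0}"
  velero install \
    --provider aws \
    --plugins "$plugin_image" \
    --bucket "$bucket" \
    --backup-location-config "region=$region" \
    --snapshot-location-config "region=$region" \
    --secret-file "$secret_file"
}
```

With the values used elsewhere in this guide, the call would be `install_velero bbotest us-gov-west-1 ./velero_secret.txt`, where `./velero_secret.txt` stands for whatever file holds the exported credentials.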
### Create backup storage and volume snapshot locations that point to the source
As mentioned earlier, we have to create two custom resources on the destination cluster, [BackupStorageLocation](https://velero.io/docs/v1.6/api-types/backupstoragelocation/) and [VolumeSnapshotLocation](https://velero.io/docs/v1.6/api-types/volumesnapshotlocation/), which point to the same location as the source cluster.
The following manifests need to be created:
#### Custom resources
<details><summary>BackupStorageLocation</summary>
<p>

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: mybackupstoragelocation
  namespace: velero
spec:
  backupSyncPeriod: 2m0s
  provider: aws
  objectStorage:
    bucket: bbotest
  credential:
    name: cloud-credentials
    key: cloud
  config:
    region: us-gov-west-1
    profile: "default"
```

</p>
</details>

The `name` should match the storage location in the source cluster. The `provider` should be the same, and the `bucket` should be the same bucket referenced by the source cluster. The `credential` section should reference the secret created in the destination cluster and the `key` containing the credential values. The `profile` should be the same profile referenced in the secret.

<details><summary>VolumeSnapshotLocation</summary>
<p>

```yaml
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: myvolumesnapshotlocation
  namespace: velero
spec:
  provider: aws
  config:
    region: us-gov-west-1
    profile: "default"
```

</p>
</details>

The volume snapshot object references a `provider`, `region`, and `profile`, which should match what is in the source cluster.
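
Once the two manifests are saved to files, they can be applied to the destination cluster. The sketch below assumes hypothetical file names (`backupstoragelocation.yaml`, `volumesnapshotlocation.yaml`) and a hypothetical helper name; it simply runs `kubectl apply` for each file against the given context.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and manifest file names are illustrative.

# Apply manifest files to the destination cluster, stopping at the first failure.
# Usage: apply_locations <kubectl-context> <manifest> [<manifest> ...]
apply_locations() {
  local context="$1"; shift
  local manifest
  for manifest in "$@"; do
    kubectl --context "$context" apply -f "$manifest" || return 1
  done
}
```

For example: `apply_locations dr-dogfood-admin@dr-dogfood backupstoragelocation.yaml volumesnapshotlocation.yaml`.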
Confirm that the destination cluster can see the backup and snapshot locations configured in the source.

```
ubuntu@ip-172-31-32-130:~$ velero get backup-location
NAME                      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
mybackupstoragelocation   aws        bbotest         Available   2021-10-01 17:31:44 +0000 UTC   ReadWrite
ubuntu@ip-172-31-32-130:~$ velero get snapshot-location
NAME                       PROVIDER
myvolumesnapshotlocation   aws
ubuntu@ip-172-31-32-130:~$ ktx
Switched to context "dr-dogfood-admin@dr-dogfood".
ubuntu@ip-172-31-32-130:~$ velero get snapshot-location
NAME                       PROVIDER
default                    aws
myvolumesnapshotlocation   aws
ubuntu@ip-172-31-32-130:~$ velero get backup-location
NAME                      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
default                   aws        bbotest         Available   2021-10-01 17:32:57 +0000 UTC   ReadWrite     true
mybackupstoragelocation   aws        bbotest         Available   2021-10-01 17:32:57 +0000 UTC   ReadWrite
ubuntu@ip-172-31-32-130:~$
```

From the output, we can see that the destination cluster can see the backup location and the snapshot location. One final check is to query for backups from the destination cluster.

```
ubuntu@ip-172-31-32-130:~$ ktx dr-dogfood-admin@dr-dogfood
Switched to context "dr-dogfood-admin@dr-dogfood".
ubuntu@ip-172-31-32-130:~$ velero backup get
NAME                     STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION          SELECTOR
flux-exclude-namespace   Completed   0        0          2021-09-28 15:08:19 +0000 UTC   26d       default                   <none>
flux-system              Completed   0        0          2021-09-28 13:35:07 +0000 UTC   26d       default                   <none>
gitbackup                Completed   0        0          2021-09-29 23:44:07 +0000 UTC   28d       default                   application=gitlab
mybackup                 Completed   0        0          2021-09-29 23:22:40 +0000 UTC   28d       mybackupstoragelocation   <none>
stest-20210926225855     Completed   0        0          2021-09-26 22:58:55 +0000 UTC   25d       default                   <none>
```

From the output, you can see that the destination cluster has access to all the backups taken by the source cluster and can therefore perform a restore using any of them.
## Initiate a restore
The next step is to initiate a restore on the destination cluster.

```
velero restore create flux-test --from-backup flux-exclude-namespace
```

This will initiate a restore called `flux-test` using the `flux-exclude-namespace` backup created from the source cluster.
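
To follow the restore through to completion, the command can be combined with `--wait` and a subsequent `velero restore describe`. The wrapper below is a sketch with a hypothetical function name; it is not part of the original procedure.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is illustrative.

# Start a restore, block until it finishes, then print its details.
# Usage: restore_from_backup <restore-name> <backup-name>
restore_from_backup() {
  local restore_name="$1" backup_name="$2"
  velero restore create "$restore_name" --from-backup "$backup_name" --wait \
    && velero restore describe "$restore_name"
}
```

For the restore in this guide, the call would be `restore_from_backup flux-test flux-exclude-namespace`.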
## Validate the installation
Run some basic checks to validate the installation.
### Pod validation
List the pods that report a `Ready` condition:

```
kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(.status.phase == "Ready" or ([ .status.conditions[] | select(.type == "Ready") ] | length ) == 1 ) | .metadata.namespace + "/" + .metadata.name'
anchore/anchore-anchore-engine-analyzer-948bf69c5-j4p9b
anchore/anchore-anchore-engine-analyzer-948bf69c5-stbj5
anchore/anchore-anchore-engine-api-846ff78b8d-ft42h
anchore/anchore-anchore-engine-catalog-85f7f56d84-t5qbg
anchore/anchore-anchore-engine-policy-776fcf87d6-mf2pk
anchore/anchore-anchore-engine-simplequeue-546f96dc9f-wspdp
anchore/anchore-engine-upgrade-56gck
anchore/anchore-postgresql-6cc688ff54-mv9xt
argocd/argocd-argocd-application-controller-55649bc89b-v8srw
argocd/argocd-argocd-dex-server-d5888f96f-2hc5j
argocd/argocd-argocd-redis-bb-master-0
argocd/argocd-argocd-redis-bb-replicas-0
argocd/argocd-argocd-redis-bb-replicas-1
argocd/argocd-argocd-repo-server-668b778d94-2x6g2
argocd/argocd-argocd-server-84db64f469-ksmzb
argocd/redis-clean-upgrade-xnn55
ebscsiprovisioner/ebs-csi-controller-5757494575-62jjb
ebscsiprovisioner/ebs-csi-controller-5757494575-sknld
ebscsiprovisioner/ebs-csi-node-552hx
ebscsiprovisioner/ebs-csi-node-mxnr4
ebscsiprovisioner/ebs-csi-node-pkz7c
ebscsiprovisioner/ebs-csi-node-qnb2g
eck-operator/elastic-operator-0
flux-system/helm-controller-66cd66c8c5-7mxwr
flux-system/kustomize-controller-7b87fdd54f-lprqb
flux-system/notification-controller-585cd4cd84-xpn6z
flux-system/source-controller-5995bc4d45-5k2gh
gatekeeper-system/gatekeeper-audit-544674965b-4bfph
gatekeeper-system/gatekeeper-controller-manager-767b76448f-ckbvv
gatekeeper-system/gatekeeper-controller-manager-767b76448f-hxh9t
gatekeeper-system/gatekeeper-controller-manager-767b76448f-rzm4k
gitlab/gitlab-gitaly-0
gitlab/gitlab-gitlab-exporter-9945f54d7-cqsvp
gitlab/gitlab-gitlab-shell-67cc5789bd-55zml
gitlab/gitlab-gitlab-shell-67cc5789bd-jh2vc
gitlab/gitlab-migrations-2-kb5qn
gitlab/gitlab-minio-57d656bcc6-qphlf
gitlab/gitlab-minio-create-buckets-2-fx6cc
gitlab/gitlab-postgresql-0
gitlab/gitlab-redis-master-0
gitlab/gitlab-registry-548454b7c-4dmwm
gitlab/gitlab-registry-548454b7c-5nsml
gitlab/gitlab-runner-gitlab-runner-db7bbb6d4-5llml
gitlab/gitlab-sidekiq-all-in-1-v1-d85c5c557-vzbss
gitlab/gitlab-task-runner-688f85db85-qzhck
gitlab/gitlab-webservice-default-6598cfd455-7wkh8
gitlab/gitlab-webservice-default-6598cfd455-bxrn8
istio-operator/istio-operator-86b75869d7-74rpj
istio-system/istiod-754799c557-bm9xb
istio-system/public-ingressgateway-5784bc5f9c-p5vpl
jaeger/jaeger-69799db98-vz56s
jaeger/jaeger-jaeger-jaeger-operator-76f99ff6f4-nwdsv
kiali/bb-kiali-kiali-svc-patch-qkrqb
kiali/kiali-f888478b6-mvw72
kiali/kiali-kiali-kiali-operator-85d9cd8df8-nmpqq
konvoy/auto-provisioning-cm-6d477ccd99-4vbqq
konvoy/auto-provisioning-tfcb-694f68b69d-fpld5
konvoy/auto-provisioning-webhook-fc4c69798-nk5fb
kube-system/calico-kube-controllers-5c4bc597f-rdjqp
kube-system/calico-node-4xmv4
kube-system/calico-node-98hzr
kube-system/calico-node-ct6zh
kube-system/calico-node-f8w9n
kube-system/calico-node-sc8z7
kube-system/coredns-74ff55c5b-b85zt
kube-system/coredns-74ff55c5b-stpth
kube-system/etcd-ip-10-0-194-151.us-gov-west-1.compute.internal
kube-system/kube-apiserver-ip-10-0-194-151.us-gov-west-1.compute.internal
kube-system/kube-controller-manager-ip-10-0-194-151.us-gov-west-1.compute.internal
kube-system/kube-proxy-482lq
kube-system/kube-proxy-4ndhq
kube-system/kube-proxy-m6q4c
kube-system/kube-proxy-mtwc5
kube-system/kube-proxy-wncgh
kube-system/kube-scheduler-ip-10-0-194-151.us-gov-west-1.compute.internal
kubeaddons/kubeaddons-controller-manager-558b96466c-h89wn
logging/bb-logging-ek-upgrade-mb6ct
logging/logging-ek-es-data-0
logging/logging-ek-es-master-0
logging/logging-ek-kb-7dd8f7d79-6hcqp
logging/logging-ek-kb-7dd8f7d79-95rww
logging/logging-ek-kb-7dd8f7d79-ht4ql
logging/logging-fluent-bit-8cwh2
logging/logging-fluent-bit-nrfxx
logging/logging-fluent-bit-rhss5
logging/logging-fluent-bit-ts448
logging/opa-collector-565754766c-84jm9
mattermost-operator/mattermost-operator-59d8b4c8d-nx4qt
minio-operator/minio-operator-566597bcff-zbwjs
monitoring/alertmanager-monitoring-monitoring-kube-alertmanager-0
monitoring/monitoring-monitoring-grafana-68458d8f46-9476r
monitoring/monitoring-monitoring-kube-operator-857cbfd4c-jh5sv
monitoring/monitoring-monitoring-kube-state-metrics-75954d876b-jdxh6
monitoring/monitoring-monitoring-prometheus-node-exporter-fp5fl
monitoring/monitoring-monitoring-prometheus-node-exporter-j9s8b
monitoring/monitoring-monitoring-prometheus-node-exporter-q62bt
monitoring/monitoring-monitoring-prometheus-node-exporter-r565d
monitoring/monitoring-monitoring-prometheus-node-exporter-vxrj5
monitoring/prometheus-monitoring-monitoring-kube-prometheus-0
nexus-repository-manager/nexus-repository-manager-7c457c6c99-9fmbr
sonarqube/sonarqube-postgresql-0
sonarqube/sonarqube-sonarqube-785b5f5648-w9kq7
velero/velero-55ff8d446-c72tn
velero/velero-velero-7dd4999c99-djtzh
```

### Persistent volumes validation

```
kubectl get pvc -A
NAMESPACE                  NAME                                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
anchore                    anchore-postgresql                             Bound    pvc-3952ca08-8db7-4b2f-b61a-b4e8de3a97d0   20Gi       RWO            awsebssciprovisioner   2d7h
argocd                     redis-data-argocd-argocd-redis-bb-master-0     Bound    pvc-913bcd51-4217-47b7-b907-25fc42d66a43   8Gi        RWO            awsebssciprovisioner   2d7h
argocd                     redis-data-argocd-argocd-redis-bb-replicas-0   Bound    pvc-7c1b0157-c20a-481f-8d42-81c3970c05e1   8Gi        RWO            awsebssciprovisioner   2d7h
argocd                     redis-data-argocd-argocd-redis-bb-replicas-1   Bound    pvc-7b2ed397-6053-4d68-919d-f533e0b74556   8Gi        RWO            awsebssciprovisioner   2d7h
gitlab                     data-gitlab-postgresql-0                       Bound    pvc-07284c49-be0b-4b17-8b18-0b9b328163e0   8Gi        RWO            awsebssciprovisioner   2d7h
gitlab                     gitlab-minio                                   Bound    pvc-8bb6e963-cade-4759-a331-5dad02893c2e   10Gi       RWO            awsebssciprovisioner   2d7h
gitlab                     redis-data-gitlab-redis-master-0               Bound    pvc-7d1e23cc-0293-4352-b6d0-f2c6609de238   8Gi        RWO            awsebssciprovisioner   2d7h
gitlab                     repo-data-gitlab-gitaly-0                      Bound    pvc-7b03e72d-693c-4bf2-8dfa-6cf94cdae97c   50Gi       RWO            awsebssciprovisioner   2d7h
logging                    elasticsearch-data-logging-ek-es-data-0        Bound    pvc-4bdb7163-016f-453f-832f-e7500313b8fa   5Gi        RWO            awsebssciprovisioner   2d7h
logging                    elasticsearch-data-logging-ek-es-master-0      Bound    pvc-93e41e33-266d-48fe-80a2-ce526c13e9b8   5Gi        RWO            awsebssciprovisioner   2d7h
nexus-repository-manager   nexus-repository-manager-data                  Bound    pvc-6709e19a-ab15-420e-97a5-e8701cf81b1b   8Gi        RWO            awsebssciprovisioner   2d7h
sonarqube                  data-sonarqube-postgresql-0                    Bound    pvc-b0b80e0a-558a-477b-a125-79b591ffb40a   20Gi       RWO            awsebssciprovisioner   2d7h
```
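
On a larger cluster it can be easier to list only the claims that are not `Bound`. The following is a small sketch with a hypothetical function name; it relies on `STATUS` being the third column of `kubectl get pvc --all-namespaces --no-headers`.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is illustrative.

# Print "namespace/name status" for every claim that is not Bound.
# Returns 1 if any such claim exists, 0 otherwise.
unbound_pvcs() {
  kubectl get pvc --all-namespaces --no-headers \
    | awk '$3 != "Bound" { print $1 "/" $2, $3; found = 1 } END { exit found }'
}
```

An empty result with exit status 0 means every persistent volume claim was restored and bound.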