feat: disaster recovery docs #99

Open
wants to merge 14 commits into
base: main
from
18 changes: 7 additions & 11 deletions .cspell.json
Original file line number Diff line number Diff line change
@@ -1,16 +1,8 @@
{
"version": "0.1",
"allowCompoundWords": true,
"enabledLanguageIds": [
"json",
"jsonc",
"markdown",
"yaml",
"yml"
],
"ignoreRegExpList": [
"/'s\\b/"
],
"enabledLanguageIds": ["json", "jsonc", "markdown", "yaml", "yml"],
"ignoreRegExpList": ["/'s\\b/"],
"ignoreWords": [
"AGE-SECRET-KEY-1KTYK6RVLN5TAPE7VF6FQQSKZ9HWWCDSKUGXXNUQDWZ7XXT5YK5LSF3UTKQ",
"FPpLvZyAdAmuzc3N",
@@ -112,7 +104,10 @@
"favourite",
"WPUE",
"wsbtpg",
"uxqf"
"uxqf",
"xvjf",
"initdb",
"creds"
],
"language": "en",
"words": [
@@ -179,6 +174,7 @@
"prio",
"rabbitmq",
"rbac",
"rclone",
"redkubes",
"rego",
"repos",
78 changes: 78 additions & 0 deletions docs/for-ops/disaster-recovery/gitea.md
@@ -0,0 +1,78 @@
---
slug: gitea
title: Gitea repositories and database
sidebar_label: Gitea
---
## Introduction

Gitea stores the platform configuration (value repository), the workload catalog, and user-created repositories.

The recovery described here uses the application-level backup of Gitea, created with the `gitea dump` command. This backup includes a current SQL dump of the database as well as the data of all repositories. However, the [Gitea documentation](https://docs.gitea.com/administration/backup-and-restore) recommends different methods for restoring the database, due to potential compatibility issues.

A restore from this backup is advised if only Gitea has been affected by a severe operational event that led to data corruption or loss. It is also possible to restore only the database or individual repositories. After such a partial restore, however, the repository information and the database may be out of sync.

## Retrieving backups

Backups are uploaded to and kept on the configured object storage. In addition, each backup is retained on a local volume for one day. After the local retention has expired, archives can be retrieved from the remote storage.

Note that `rclone` is installed when a Gitea backup is uploaded for the first time. If it is not present, it can be obtained from the releases page at https://github.com/rclone/rclone/releases/. Variables used below, such as `$BUCKET_NAME`, and the storage authentication are pre-configured in the container, so they do not need to be changed.

```sh
##
## In the local terminal
##
kubectl exec -it -n gitea gitea-0 -- /bin/bash

##
## The following to be run in the remote container
##

## If needed, obtain and install Rclone
mkdir -p /backup/.bin
cd /backup/.bin
curl -fsSL -o rclone.zip https://github.com/rclone/rclone/releases/download/v1.69.0/rclone-v1.69.0-linux-amd64.zip
unzip -oj rclone.zip
cd /backup

## Optional, not required if backup is available locally
.bin/rclone lsf gitea:/$BUCKET_NAME # List files
.bin/rclone copy gitea:/$BUCKET_NAME/<backup-name>.tar.bz2 /backup/ # Retrieve file from remote

## Extract the backup
mkdir restore
tar xvjf <backup-name>.tar.bz2 -C restore
cd restore
```
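
Before extracting, it can be worth checking that the retrieved archive is complete, since an interrupted transfer may leave a truncated file behind. The following is a minimal sketch; `verify_archive` is a hypothetical helper and not part of the platform. It assumes the `tar` in the container can read bzip2 archives, as the extraction command above already requires.

```sh
## Hypothetical helper: check that an archive is readable before extracting it
verify_archive() {
  archive="$1"

  ## Reject missing or zero-length files early
  if [ ! -s "$archive" ]; then
    echo "missing or empty: $archive" >&2
    return 1
  fi

  ## Listing the contents reads the whole archive without writing to disk;
  ## a truncated or corrupt file makes tar exit with a non-zero status
  if tar -tjf "$archive" > /dev/null 2>&1; then
    echo "ok: $archive"
  else
    echo "corrupt: $archive" >&2
    return 1
  fi
}
```

It could be called as `verify_archive <backup-name>.tar.bz2` right after the `rclone copy` step; a non-zero exit status suggests retrieving the file again.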

## Restoring a single repository

Repositories are stored in the mounted container path `/data/git/gitea-repositories`, with one subdirectory per owning user or organization. To restore a single repository, locate it in the `repos/<owner>` directory of the extracted backup and copy it to `/data/git/gitea-repositories/<owner>`.

Note that restoring the `otomi/values` repository with this method is not recommended after a full cluster restore.

```sh
## ... commands above to obtain and extract the backup
cp -R repos/otomi/charts.git /data/git/gitea-repositories/otomi/
```
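
If the repository still exists in the target path, `cp -R` merges the backup into the existing directory, so files that are not part of the backup stay in place. The following sketch moves the current copy aside before copying. `restore_repo`, `REPO_ROOT` and `ASIDE_DIR` are hypothetical names introduced for illustration, and `/backup/replaced` is an assumed location for the replaced copy.

```sh
## Hypothetical helper, to be run from the extracted "restore" directory.
## REPO_ROOT and ASIDE_DIR are illustrative overrides, not platform settings.
restore_repo() {
  owner="$1"
  repo="$2" # directory name including the .git suffix, e.g. charts.git
  src="repos/$owner/$repo"
  dest_dir="${REPO_ROOT:-/data/git/gitea-repositories}/$owner"
  aside_dir="${ASIDE_DIR:-/backup/replaced}"

  if [ ! -d "$src" ]; then
    echo "not found in backup: $src" >&2
    return 1
  fi

  ## Move the current copy out of the way so that no stale files remain
  if [ -e "$dest_dir/$repo" ]; then
    mkdir -p "$aside_dir"
    mv "$dest_dir/$repo" "$aside_dir/$owner-$repo.$(date +%Y%m%d%H%M%S)"
  fi

  mkdir -p "$dest_dir"
  cp -R "$src" "$dest_dir/"
}
```

For the example above, the call would be `restore_repo otomi charts.git`. The replaced copy is not removed automatically and can be deleted once the restored repository has been checked.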

## Other assets

Gitea file assets, such as avatar images, are located in the `data` directory of the backup. Like repositories, they can be copied to the `/data/` directory of the container as needed, for example:

```sh
## ... commands above to obtain and extract the backup
cp -R data/avatars /data/
```

## Restoring the database

To restore the Gitea database, refer to the [platform database instructions](platform-databases.md).

## Cleaning up

Remove any extracted files from the local backup directory to free up space; they are not removed automatically. Only compressed backups with the `.tar.bz2` extension are cleaned up after one day.

```sh
cd /backup
rm -R restore
```
29 changes: 29 additions & 0 deletions docs/for-ops/disaster-recovery/overview.md
@@ -0,0 +1,29 @@
---
slug: overview
title: Disaster Recovery Overview
sidebar_label: Overview
---

## Prerequisites

This section covers scenarios in which a complete or partial restore of the platform is required.

Note that this guide has the following prerequisites and limitations, which should be reviewed regularly:

* The following items should be backed up regularly by the platform administrator:
  - The Kubernetes secret ending in "-wildcard-cert" in namespace "istio-system" (if installed via the Linode cloud console, or using your own certificate).
  - The Kubernetes secret "otomi-sops-secrets" in namespace "otomi-pipelines".
  - A download of the complete values in Platform -> Maintenance. Depending on whether these are downloaded with or without secrets, some passwords might have to be reset after recovery.
  - Optionally, manual backups of databases, as covered in this guide for the CloudNative PostgreSQL Operator.
* Object storage needs to be set up for all backup types referred to. Credentials should be added to Platform Settings -> Object Storage.
* All backup types referred to should be activated in the Platform Settings -> Backup.
* This guide does not cover the partial or complete loss of attached object storage. For production environments, it is advised to set up additional object storage in a different region, to which all contents of the platform object storage are mirrored and from which they can be retrieved in the event of accidental deletion, data center availability issues, etc. The transfer to and from these remote storage locations is not covered in this guide.
* Your workloads may store data in local storage, object storage, different types of databases, message queues, etc. Because these storage types are highly workload-specific, their backup and recovery strategy cannot be covered here.
* Restoring a cluster in place is currently not supported if it has been provisioned directly using the Linode API or Console. Such an LKE cluster can be reprovisioned with the application platform through a Helm install. However, since the cluster ID changes, and the domain changes with it, the values file needs to be adjusted before the restore. You will also need a domain name supported by externalDNS, with its credentials added to the values file.
* All instructions assume you are generally familiar with essential Kubernetes tools such as `kubectl` and have access to the Kubernetes API. Usage of TUI applications such as `k9s` from the administration terminal is strongly recommended.
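
The two Kubernetes secrets listed above can be exported with `kubectl`. The following is a minimal sketch, assuming access to the Kubernetes API from the local terminal. `backup_platform_secrets` and the output file names are hypothetical and not part of the platform tooling.

```sh
## Hypothetical helper: export the secrets named above as YAML files
backup_platform_secrets() {
  out_dir="${1:-.}"
  mkdir -p "$out_dir"

  kubectl get secret otomi-sops-secrets -n otomi-pipelines -o yaml \
    > "$out_dir/otomi-sops-secrets.yaml" || return 1

  ## The certificate secret is identified by its "-wildcard-cert" suffix;
  ## "-o name" prints entries in the form secret/<name>
  for name in $(kubectl get secrets -n istio-system -o name | grep -- '-wildcard-cert$'); do
    kubectl get "$name" -n istio-system -o yaml \
      > "$out_dir/${name#secret/}.yaml" || return 1
  done
}
```

A call such as `backup_platform_secrets ./platform-secrets` writes one file per secret. The exported files contain the secret data in base64-encoded, unencrypted form, so they should be kept in a protected location.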

## Guides

* [Gitea](gitea.md): Restoring the platform's Gitea database and repositories from the application backup
* [Databases](platform-databases.md): Backup and restore of the CNPG databases
* [Reinstall](platform-reinstall.md): Restoring the complete platform, including settings and data