feat: disaster recovery docs #99

Merged (20 commits) on Mar 13, 2025

18 changes: 7 additions & 11 deletions .cspell.json
@@ -1,16 +1,8 @@
{
"version": "0.1",
"allowCompoundWords": true,
-"enabledLanguageIds": [
-"json",
-"jsonc",
-"markdown",
-"yaml",
-"yml"
-],
-"ignoreRegExpList": [
-"/'s\\b/"
-],
+"enabledLanguageIds": ["json", "jsonc", "markdown", "yaml", "yml"],
+"ignoreRegExpList": ["/'s\\b/"],
"ignoreWords": [
"AGE-SECRET-KEY-1KTYK6RVLN5TAPE7VF6FQQSKZ9HWWCDSKUGXXNUQDWZ7XXT5YK5LSF3UTKQ",
"FPpLvZyAdAmuzc3N",
@@ -112,7 +104,10 @@
"favourite",
"WPUE",
"wsbtpg",
-"uxqf"
+"uxqf",
+"xvjf",
+"initdb",
+"creds"
],
"language": "en",
"words": [
@@ -179,6 +174,7 @@
"prio",
"rabbitmq",
"rbac",
+"rclone",
"redkubes",
"rego",
"repos",
78 changes: 78 additions & 0 deletions docs/for-ops/disaster-recovery/gitea.md
@@ -0,0 +1,78 @@
---
slug: gitea
title: Gitea repositories and database
sidebar_label: Gitea
---
## Introduction

Gitea stores the platform configuration (in the `values` repository), the workload catalog (in the `charts` repository), and user-created repositories.

The recovery procedure described here uses the application-level backup of Gitea, i.e. the archive created by the `gitea dump` command. It contains a current SQL dump of the database as well as all repositories and data. However, the [Gitea documentation](https://docs.gitea.com/administration/backup-and-restore) recommends other methods for restoring the database, due to potential compatibility issues.

A restore from this backup is advised if only Gitea has been affected by a severe operational event that led to data corruption or loss. It is also possible to restore only the database or individual repositories. Be aware that after a partial restore, the repository information may no longer match the database.

## Retrieving backups

Backups are uploaded to and stored in the configured object storage bucket, and are additionally retained on a local volume for one day. After the local retention has expired, archives have to be retrieved from the remote storage.

Note that `rclone` is installed during the first upload of a Gitea backup. If it is not present, it can be obtained from the releases page at https://github.com/rclone/rclone/releases/. Variables such as `$BUCKET_NAME` and the storage credentials are pre-configured in the container, so they do not need to be set.

```sh
##
## In the local terminal
##
kubectl exec -it -n gitea gitea-0 -- /bin/bash

##
## The following to be run in the remote container
##

## If needed, obtain and install Rclone
mkdir -p /backup/.bin
cd /backup/.bin
curl -fsSL -o rclone.zip https://github.com/rclone/rclone/releases/download/v1.69.0/rclone-v1.69.0-linux-amd64.zip
unzip -oj rclone.zip
cd /backup

## Optional, not required if backup is available locally
.bin/rclone lsf gitea:/$BUCKET_NAME # List files
.bin/rclone copy gitea:/$BUCKET_NAME/<backup-name>.tar.bz2 /backup/ # Retrieve file from remote

## Extract the backup
mkdir restore
tar xvjf <backup-name>.tar.bz2 -C restore
cd restore
```
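When the bucket holds several archives, the newest one can be picked from an `rclone lsf` listing that includes modification times. This is a minimal sketch, assuming the Gitea backups are the `.tar.bz2` files at the top level of the bucket; `latest_backup` is a hypothetical helper, not part of the platform. With `--format tp`, `rclone lsf` prints the modification time (`YYYY-MM-DD HH:MM:SS`) and the path separated by `;`, so a plain text sort orders the lines chronologically.

```sh
# Hypothetical helper: print the name of the newest .tar.bz2 archive from
# `rclone lsf --format tp` output read on stdin ("YYYY-MM-DD HH:MM:SS;name").
latest_backup() {
  # Keep only archives, sort by the leading timestamp, take the last line,
  # then drop the timestamp and the ';' separator.
  grep '\.tar\.bz2$' | sort | tail -n 1 | cut -d ';' -f 2-
}

## Usage inside the Gitea container, from /backup (BUCKET_NAME is pre-configured):
# name=$(.bin/rclone lsf --format tp "gitea:/$BUCKET_NAME" | latest_backup)
# .bin/rclone copy "gitea:/$BUCKET_NAME/$name" /backup/
```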

## Restoring a single repository

Repositories are stored in the mounted container path `/data/git/gitea-repositories`, in one subdirectory per owning user or organization. To restore a single repository, find it in the backup's `repos/<owner>` directory and copy it to `/data/git/gitea-repositories/<owner>`.

Note that restoring the `otomi/values` repository with this method is not recommended after a full cluster restore.

```sh
## ... commands above to obtain and extract the backup
cp -R repos/otomi/charts.git /data/git/gitea-repositories/otomi/
```
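The single-repository copy can be wrapped in a small function that fails loudly when the repository is missing from the backup. This is a sketch, assuming the current directory is the extracted `restore` directory; `restore_repo` and `GITEA_REPO_ROOT` are hypothetical names introduced here, with `GITEA_REPO_ROOT` defaulting to the container path above.

```sh
# Hypothetical helper: copy one repository from the extracted backup
# (current directory = the `restore` directory) into Gitea's repository root.
# GITEA_REPO_ROOT is an assumed override; it defaults to the container path.
restore_repo() {
  owner=$1
  repo=$2
  src="repos/$owner/$repo.git"
  dest="${GITEA_REPO_ROOT:-/data/git/gitea-repositories}/$owner"

  if [ ! -d "$src" ]; then
    echo "not found in backup: $src" >&2
    return 1
  fi
  # Create the owner directory in case it was lost as well.
  mkdir -p "$dest" || return 1
  cp -R "$src" "$dest/"
}

## Usage, equivalent to the cp command above:
# restore_repo otomi charts
```

If a damaged copy of the repository still exists at the destination, `cp -R` merges into it rather than replacing it.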

## Other assets

Gitea file assets such as avatar images are located in the `data` directory of the backup. They can be copied to the `/data/` directory in the same way as needed, for example:

```sh
## ... commands above to obtain and extract the backup
cp -R data/avatars /data/
```
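To restore several asset directories in one step, the copy can be repeated per directory. This is a sketch, assuming the current directory is the extracted `restore` directory; `restore_assets` and `GITEA_DATA_DIR` are hypothetical names, and the directory names in the usage comment are examples that should be checked against the actual contents of the backup's `data` directory.

```sh
# Hypothetical helper: copy selected asset directories from the backup's
# `data` directory into Gitea's data path (assumed default: /data).
restore_assets() {
  dest="${GITEA_DATA_DIR:-/data}"
  status=0
  for asset in "$@"; do
    if [ -d "data/$asset" ]; then
      cp -R "data/$asset" "$dest/" || status=1
    else
      echo "skipped, not in backup: data/$asset" >&2
      status=1
    fi
  done
  # Non-zero if any requested directory was missing or failed to copy.
  return "$status"
}

## Usage (example names; check `ls data` first):
# restore_assets avatars repo-avatars attachments
```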

## Restoring the database

For restoring the database of Gitea, please refer to the [platform database instructions](platform-databases.md).

## Cleaning up

Remove any extracted files from the local backup directory to free up space, since they are not removed automatically. Only the compressed `.tar.bz2` backups are cleaned up after one day.

```sh
cd /backup
## Git object files are read-only; -f avoids a confirmation prompt for each of them
rm -rf restore
```
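The cleanup can also be wrapped so that it works when the shell is still inside the extracted directory, and reports the remaining archives and free space afterwards. This is a sketch; `cleanup_restore` and `BACKUP_DIR` (defaulting to `/backup`) are hypothetical names. It uses `rm -rf` because Git object files in the extracted repositories are read-only, which makes a plain `rm -R` ask for confirmation in an interactive shell.

```sh
# Hypothetical helper: remove the extracted `restore` directory below the
# backup directory (assumed default: /backup) and show what is left.
cleanup_restore() {
  backup_dir="${BACKUP_DIR:-/backup}"
  # Leave the extracted tree first, in case the shell is still inside it.
  cd "$backup_dir" || return 1
  if [ -d "$backup_dir/restore" ]; then
    rm -rf "$backup_dir/restore"
  fi
  # Remaining compressed archives and free space on the backup volume.
  ls -l "$backup_dir"
  df -h "$backup_dir"
}
```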
35 changes: 35 additions & 0 deletions docs/for-ops/disaster-recovery/overview.md
@@ -0,0 +1,35 @@
---
slug: overview
title: Disaster Recovery Overview
sidebar_label: Overview
---

## Prerequisites

This section covers scenarios in which a complete or partial restore of the platform is required.

This guide has the following prerequisites and limitations that should be checked regularly:

1. The following items should be backed up regularly by the platform administrator:
   - The Kubernetes secret ending in "-wildcard-cert" in namespace "istio-system" (if the platform was installed via the Linode cloud console or uses your own certificate).
   - The Kubernetes secret "otomi-sops-secrets" in namespace "otomi-pipelines".
   - A download of the complete values from Platform -> Maintenance. Depending on whether these are downloaded with or without secrets, some passwords may have to be reset after recovery.
   - Optionally, manual backups of databases, as covered in this guide for the CloudNative PostgreSQL Operator.

2. Object storage needs to be set up for all backup types referred to in this guide. Credentials should be added to Platform Settings -> Object Storage.

3. All backup types should be enabled in Platform Settings -> Backup.

4. This guide does not cover the partial or complete loss of attached object storage. For production environments, it is advised to set up additional object storage in a different region, to which all contents of the platform object storage are mirrored, so that they can be retrieved in the event of accidental deletion, data center availability issues, etc. The transfer to and from these remote storage locations is not covered in this guide.

5. Workloads may store data in local storage, object storage, different types of databases, message queues, etc. The backup and recovery strategy for workload storage is not covered here.

6. Reinstalling in place a cluster that was provisioned directly using the Linode API or Console is currently not supported. Such an LKE cluster must instead be reprovisioned with the application platform through a Helm install. However, since the cluster ID changes, the domain suffix also changes, so the values file must be adjusted before the restore. You will also need a domain name set up with a DNS provider supported by App Platform, and its credentials must be added to the values file.

7. All instructions assume you are familiar with essential Kubernetes tools such as `kubectl` and have access to the Kubernetes API. Using TUI applications such as `k9s` from the administration terminal is strongly recommended.
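The two Kubernetes secrets named in item 1 can be exported with `kubectl`. This is a minimal sketch, assuming the current `kubectl` context points at the platform cluster; `backup_platform_secrets`, the `KUBECTL` override, and the output file names are hypothetical. The values download from Platform -> Maintenance and the database backups are not covered by it. The exported files contain the secret data (only base64-encoded) and should be stored as securely as the cluster credentials.

```sh
# Hypothetical helper: export the secrets from prerequisite 1 as YAML files
# into the directory given as first argument. KUBECTL defaults to kubectl.
backup_platform_secrets() {
  out_dir=$1
  k="${KUBECTL:-kubectl}"
  mkdir -p "$out_dir" || return 1

  "$k" get secret otomi-sops-secrets -n otomi-pipelines -o yaml \
    > "$out_dir/otomi-pipelines_otomi-sops-secrets.yaml" || return 1

  # `-o name` prints "secret/<name>"; keep only names ending in -wildcard-cert.
  for s in $("$k" get secrets -n istio-system -o name \
      | sed -n 's#^secret/\(.*-wildcard-cert\)$#\1#p'); do
    "$k" get secret "$s" -n istio-system -o yaml \
      > "$out_dir/istio-system_$s.yaml" || return 1
  done
}

## Usage:
# backup_platform_secrets "./platform-backup-$(date +%Y%m%d)"
```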

## Guides

* [Gitea](gitea.md): Restoring the platform's Gitea database and repositories from the application backup
* [Databases](platform-databases.md): Backup and restore of the CNPG databases
* [Reinstall](platform-reinstall.md): Restoring the complete platform, including settings and data