Pbckp 690 update readme #38
Open
demonolock wants to merge 10 commits into master from PBCKP-690-update-readme
Changes from 4 commits
Commits
7bd6fc3
Base commit
Burus 40a60c8
update ptrack to the uppercase style
Burus 66ede22
update README
Burus 22d6948
update README
Burus 913e0fc
Update README.md
demonolock abc1cfc
update README
Burus e928c82
update README
Burus cd78a08
update readme and Authors
Burus 33be898
Update README.md
Burus f7ecdd0
Update AUTHORS.md
Burus
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,21 @@ | ||
[](https://travis-ci.com/postgrespro/ptrack) | ||
[](https://codecov.io/gh/postgrespro/ptrack) | ||
[](https://github.com/postgrespro/ptrack/releases/latest) | ||
|
||
# ptrack | ||
## PTRACK allows speed up incremental backups for the huge PostgreSQL databases. | ||
|
||
## Overview | ||
## Overview | ||
|
||
Ptrack is a block-level incremental backup engine for PostgreSQL. You can [effectively use](https://postgrespro.github.io/pg_probackup/#pbk-setting-up-ptrack-backups) `ptrack` engine for taking incremental backups with [pg_probackup](https://github.com/postgrespro/pg_probackup) backup and recovery manager for PostgreSQL. | ||
PTRACK saves changes of physical blocks in the memory. You can [effectively use](https://postgrespro.github.io/pg_probackup/#pbk-setting-up-ptrack-backups) `PTRACK` engine for taking incremental backups by [pg_probackup](https://github.com/postgrespro/pg_probackup). | ||
|
||
It is designed to allow false positives (i.e. block/page is marked in the `ptrack` map, but actually has not been changed), but to never allow false negatives (i.e. loosing any `PGDATA` changes, excepting hint-bits). | ||
Current patch are available for [11](https://github.com/postgrespro/ptrack/blob/master/patches/REL_11_STABLE-ptrack-core.diff), [12](https://github.com/postgrespro/ptrack/blob/master/patches/REL_12_STABLE-ptrack-core.diff), [13](https://github.com/postgrespro/ptrack/blob/master/patches/REL_13_STABLE-ptrack-core.diff), [14](https://github.com/postgrespro/ptrack/blob/master/patches/REL_14_STABLE-ptrack-core.diff), [15](https://github.com/postgrespro/ptrack/blob/master/patches/REL_15_STABLE-ptrack-core.diff) | ||
|
||
Currently, `ptrack` codebase is split between small PostgreSQL core patch and extension. All public SQL API methods and main engine are placed in the `ptrack` extension, while the core patch contains only certain hooks and modifies binary utilities to ignore `ptrack.map.*` files. | ||
## Enterprise edition | ||
|
||
This extension is compatible with PostgreSQL [11](https://github.com/postgrespro/ptrack/blob/master/patches/REL_11_STABLE-ptrack-core.diff), [12](https://github.com/postgrespro/ptrack/blob/master/patches/REL_12_STABLE-ptrack-core.diff), [13](https://github.com/postgrespro/ptrack/blob/master/patches/REL_13_STABLE-ptrack-core.diff), [14](https://github.com/postgrespro/ptrack/blob/master/patches/REL_14_STABLE-ptrack-core.diff). | ||
Enterprise PTRACK are part of [Postgres Pro Backup Enterprise](https://postgrespro.ru/products/postgrespro/enterprise) and share posibility to track more than 100 000 tables and indexes per time without speed degradation with [CFS (compressed file system)](https://postgrespro.ru/docs/enterprise/15/cfs). | ||
Review comment: and shares |
||
Benchmarks are x5 time faster and useful for ERP and DWH with huge amounth of tables and relations between them. | ||
|
||
## Installation | ||
|
||
1) Get latest `ptrack` sources: | ||
1) Get latest `PTRACK` sources: | ||
|
||
```shell | ||
git clone https://github.com/postgrespro/ptrack.git | ||
|
@@ -43,23 +42,23 @@ echo "shared_preload_libraries = 'ptrack'" >> postgres_data/postgresql.conf | |
echo "ptrack.map_size = 64" >> postgres_data/postgresql.conf | ||
``` | ||
|
||
6) Compile and install `ptrack` extension | ||
6) Compile and install `PTRACK` extension | ||
|
||
```shell | ||
USE_PGXS=1 make -C /path/to/ptrack/ install | ||
``` | ||
|
||
7) Run PostgreSQL and create `ptrack` extension | ||
7) Run PostgreSQL and create `PTRACK` extension | ||
|
||
```sql | ||
postgres=# CREATE EXTENSION ptrack; | ||
``` | ||
|
||
## Configuration | ||
|
||
The only one configurable option is `ptrack.map_size` (in MB). Default is `0`, which means `ptrack` is turned off. In order to reduce number of false positives it is recommended to set `ptrack.map_size` to `1 / 1000` of expected `PGDATA` size (i.e. `1000` for a 1 TB database). | ||
The only one configurable option is `ptrack.map_size` (in MB). Default is `0`, which means `PTRACK` is turned off. In order to reduce number of false positives it is recommended to set `ptrack.map_size` to `1 / 1000` of expected `PGDATA` size (i.e. `1000` for a 1 TB database). | ||
|
||
To disable `ptrack` and clean up all remaining service files set `ptrack.map_size` to `0`. | ||
To disable `PTRACK` and clean up all remaining service files set `ptrack.map_size` to `0`. | ||
|
||
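As a rough illustration of the sizing rule in the Configuration section (set `ptrack.map_size` to about 1/1000 of the expected `PGDATA` size), here is a minimal sketch. The helper name `recommended_map_size_mb` is hypothetical and not part of `ptrack`; rounding up and treating 1 TB as 1,000,000 MB are assumptions made for the example.

```shell
# Hypothetical helper, not part of ptrack: suggest a ptrack.map_size value (in MB)
# as roughly 1/1000 of the data directory size given in MB.
recommended_map_size_mb() {
    pgdata_mb=$1
    # Round up so that small databases still get a non-zero map size.
    echo $(( (pgdata_mb + 999) / 1000 ))
}

# 1 TB, taken here as 1,000,000 MB, gives the README's example value.
recommended_map_size_mb 1000000   # prints 1000
```

On a live system the input could come from something like `du -sm "$PGDATA" | cut -f1`, assuming a `du` that supports the `-m` flag.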
## Public SQL API | ||
|
||
|
@@ -102,7 +101,7 @@ postgres=# SELECT * FROM ptrack_get_change_stat('0/285C8C8'); | |
|
||
## Upgrading | ||
|
||
Usually, you have to only install new version of `ptrack` and do `ALTER EXTENSION 'ptrack' UPDATE;`. However, some specific actions may be required as well: | ||
Usually, you have to only install new version of `PTRACK` and do `ALTER EXTENSION 'ptrack' UPDATE;`. However, some specific actions may be required as well: | ||
|
||
#### Upgrading from 2.0.0 to 2.1.*: | ||
|
||
|
@@ -113,7 +112,7 @@ Usually, you have to only install new version of `ptrack` and do `ALTER EXTENSIO | |
|
||
#### Upgrading from 2.1.* to 2.2.*: | ||
|
||
Since version 2.2 we use a different algorithm for tracking changed pages. Thus, data recorded in the `ptrack.map` using pre 2.2 versions of `ptrack` is incompatible with newer versions. After extension upgrade and server restart old `ptrack.map` will be discarded with `WARNING` and initialized from the scratch. | ||
Since version 2.2 we use a different algorithm for tracking changed pages. Thus, data recorded in the `ptrack.map` using pre 2.2 versions of `PTRACK` is incompatible with newer versions. After extension upgrade and server restart old `ptrack.map` will be discarded with `WARNING` and initialized from the scratch. | ||
|
||
#### Upgrading from 2.2.* to 2.3.*: | ||
|
||
|
@@ -126,29 +125,33 @@ Since version 2.2 we use a different algorithm for tracking changed pages. Thus, | |
#### Upgrading from 2.3.* to 2.4.*: | ||
|
||
* Stop your server | ||
* Update ptrack binaries | ||
* Update `PTRACK` binaries | ||
* Start server | ||
* Do `ALTER EXTENSION 'ptrack' UPDATE;`. | ||
|
||
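The 2.3.* to 2.4.* steps above could be scripted roughly as follows. This is a sketch under assumptions, not an official procedure: the `run` and `upgrade_ptrack` helpers, both paths, and the `postgres` database name are hypothetical. The extension name is written unquoted because `ALTER EXTENSION` takes an identifier, not a string literal. Commands are only printed unless `DRY_RUN=0` is set.

```shell
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper following the four steps listed above.
upgrade_ptrack() {
    pgdata=$1       # data directory (assumed path)
    ptrack_src=$2   # checkout of the new ptrack sources (assumed path)

    run pg_ctl -D "$pgdata" stop &&                        # stop the server
    run env USE_PGXS=1 make -C "$ptrack_src" install &&    # update the binaries
    run pg_ctl -D "$pgdata" start &&                       # start the server
    run psql -d postgres -c "ALTER EXTENSION ptrack UPDATE;"
}

upgrade_ptrack /path/to/pgdata /path/to/ptrack
```

The `ALTER EXTENSION` step would need to be repeated in every database where the extension has been created.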
## Limitations | ||
|
||
1. You can only use `ptrack` safely with `wal_level >= 'replica'`. Otherwise, you can lose tracking of some changes if crash-recovery occurs, since [certain commands are designed not to write WAL at all if wal_level is minimal](https://www.postgresql.org/docs/12/populate.html#POPULATE-PITR), but we only durably flush `ptrack` map at checkpoint time. | ||
1. You can only use `PTRACK` safely with `wal_level >= 'replica'`. Otherwise, you can lose tracking of some changes if crash-recovery occurs, since [certain commands are designed not to write WAL at all if wal_level is minimal](https://www.postgresql.org/docs/12/populate.html#POPULATE-PITR), but we only durably flush `PTRACK` map at checkpoint time. | ||
|
||
2. The only one production-ready backup utility, that fully supports `ptrack` is [pg_probackup](https://github.com/postgrespro/pg_probackup). | ||
2. The only one production-ready backup utility, that fully supports `PTRACK` is [pg_probackup](https://github.com/postgrespro/pg_probackup). | ||
|
||
3. You cannot resize `ptrack` map in runtime, only on postmaster start. Also, you will loose all tracked changes, so it is recommended to do so in the maintainance window and accompany this operation with full backup. | ||
3. You cannot resize `PTRACK` map in runtime, only on postmaster start. Also, you will loose all tracked changes, so it is recommended to do so in the maintainance window and accompany this operation with full backup. | ||
Review comment: loose -> lose |
||
|
||
4. You will need up to `ptrack.map_size * 2` of additional disk space, since `ptrack` uses additional temporary file for durability purpose. See [Architecture section](#Architecture) for details. | ||
4. You will need up to `ptrack.map_size * 2` of additional disk space, since `PTRACK` uses additional temporary file for durability purpose. See [Architecture section](#Architecture) for details. | ||
|
||
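Limitation 1 above (`wal_level >= 'replica'`) can be checked before relying on tracked changes. A minimal sketch follows; the function name `wal_level_is_safe` is hypothetical, and it assumes the three `wal_level` values available in PostgreSQL 11 to 15 (`minimal`, `replica`, `logical`).

```shell
# Hypothetical helper: succeed only for wal_level values at or above 'replica'.
wal_level_is_safe() {
    case "$1" in
        replica|logical) return 0 ;;
        *) return 1 ;;   # 'minimal' or anything unexpected
    esac
}

# Example with literal values; a live value could be read with:
#   psql -Atc 'SHOW wal_level'
if wal_level_is_safe minimal; then
    echo "wal_level is fine for tracking"
else
    echo "wal_level is too low: tracked changes may be lost after crash recovery"
fi
```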
## Benchmarks | ||
|
||
Briefly, an overhead of using `ptrack` on TPS usually does not exceed a couple of percent (~1-3%) for a database of dozens to hundreds of gigabytes in size, while the backup time scales down linearly with backup size with a coefficient ~1. It means that an incremental `ptrack` backup of a database with only 20% of changed pages will be 5 times faster than a full backup. More details [here](benchmarks). | ||
Briefly, an overhead of using `PTRACK` on TPS usually does not exceed a couple of percent (~1-3%) for a database of dozens to hundreds of gigabytes in size, while the backup time scales down linearly with backup size with a coefficient ~1. It means that an incremental `PTRACK` backup of a database with only 20% of changed pages will be 5 times faster than a full backup. More details [here](benchmarks). | ||
|
||
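The arithmetic behind the "5 times faster" figure in the Benchmarks paragraph can be sketched as below, assuming backup time is proportional to the amount of data copied (the coefficient of about 1 mentioned there). The function name `estimated_speedup` is hypothetical.

```shell
# Hypothetical helper: estimated speedup of an incremental backup over a full one,
# given the percentage of changed pages and assuming time scales linearly with size.
estimated_speedup() {
    LC_ALL=C awk -v pct="$1" 'BEGIN { printf "%.1f\n", 100 / pct }'
}

estimated_speedup 20   # prints 5.0
```

This ignores fixed costs such as scanning the map itself, so real numbers will differ.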
## Architecture | ||
|
||
We use a single shared hash table in `ptrack`. Due to the fixed size of the map there may be false positives (when some block is marked as changed without being actually modified), but not false negative results. However, these false postives may be completely eliminated by setting a high enough `ptrack.map_size`. | ||
It is designed to allow false positives (i.e. block/page is marked in the `PTRACK` map, but actually has not been changed), but to never allow false negatives (i.e. loosing any `PGDATA` changes, excepting hint-bits). | ||
|
||
Currently, `PTRACK` codebase is split between small PostgreSQL core patch and extension. All public SQL API methods and main engine are placed in the `PTRACK` extension, while the core patch contains only certain hooks and modifies binary utilities to ignore `ptrack.map.*` files. | ||
|
||
We use a single shared hash table in `PTRACK`. Due to the fixed size of the map there may be false positives (when some block is marked as changed without being actually modified), but not false negative results. However, these false postives may be completely eliminated by setting a high enough `ptrack.map_size`. | ||
|
||
All reads/writes are made using atomic operations on `uint64` entries, so the map is completely lockless during the normal PostgreSQL operation. Because we do not use locks for read/write access, `ptrack` keeps a map (`ptrack.map`) since the last checkpoint intact and uses up to 1 additional temporary file: | ||
All reads/writes are made using atomic operations on `uint64` entries, so the map is completely lockless during the normal PostgreSQL operation. Because we do not use locks for read/write access, `PTRACK` keeps a map (`ptrack.map`) since the last checkpoint intact and uses up to 1 additional temporary file: | ||
|
||
* temporary file `ptrack.map.tmp` to durably replace `ptrack.map` during checkpoint. | ||
|
||
|
@@ -158,7 +161,7 @@ To gather the whole changeset of modified blocks in `ptrack_get_pagemapset()` we | |
|
||
## Contribution | ||
|
||
Feel free to [send pull requests](https://github.com/postgrespro/ptrack/compare), [fill up issues](https://github.com/postgrespro/ptrack/issues/new), or just reach one of us directly (e.g. <[Alexey Kondratov](mailto:[email protected]?subject=[GitHub]%20Ptrack), [@ololobus](https://github.com/ololobus)>) if you are interested in `ptrack`. | ||
Feel free to [send pull requests](https://github.com/postgrespro/ptrack/compare), [fill up issues](https://github.com/postgrespro/ptrack/issues/new), or just reach one of us directly (e.g. <[Alexey Kondratov](mailto:[email protected]?subject=[GitHub]%20Ptrack), [@ololobus](https://github.com/ololobus)>) if you are interested in `PTRACK`. | ||
|
||
### Tests | ||
|
||
|
@@ -176,9 +179,3 @@ docker-compose run tests | |
``` | ||
|
||
Available test modes (`MODE`) are `basic` (default) and `paranoia` (per-block checksum comparison of `PGDATA` content before and after backup-restore process). Available test cases (`TEST_CASE`) are `tap` (minimalistic PostgreSQL [tap test](https://github.com/postgrespro/ptrack/blob/master/t/001_basic.pl)), `all` or any specific [pg_probackup test](https://github.com/postgrespro/pg_probackup/blob/master/tests/ptrack.py), e.g. `test_ptrack_simple`. | ||
|
||
### TODO | ||
|
||
* Should we introduce `ptrack.map_path` to allow `ptrack` service files storage outside of `PGDATA`? Doing that we will avoid patching PostgreSQL binary utilities to ignore `ptrack.map.*` files. | ||
* Can we resize `ptrack` map on restart but keep the previously tracked changes? | ||
* Can we write a formal proof, that we never loose any modified page with `ptrack`? With TLA+? |
Review comment: for PostgreSQL