Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automate k8s Postgresql Upgrades after 3.0.0 release #1748

Closed
artntek opened this issue Dec 1, 2023 · 5 comments
Closed

Automate k8s Postgresql Upgrades after 3.0.0 release #1748

artntek opened this issue Dec 1, 2023 · 5 comments
Assignees
Milestone

Comments

@artntek
Copy link
Contributor

artntek commented Dec 1, 2023

(discussed with @mbjones in ESS-DIVE Meeting)

We will release the 3.0.0 metacat helm chart with postgres v14 as a subchart.

The Metacat helm chart appears to work ok with postgres 14, 15, 16, when starting up with a new database.

However, upgrading between these major versions currently requires manual intervention.

It is up to individual operators to ensure the mounted data directory has been manually upgraded to be compatible with postgresql v14 before installing the helm chart.

After the v3.0.0 release, however, we will find a way to automate postgres upgrades (but only from the previous version to the new version), so that if operators stay up to date with metacat releases, their upgrades should be seamless

Some examples of container-specific solutions:

Official postgres documentation; see in particular:

@artntek
Copy link
Contributor Author

artntek commented Mar 6, 2025

Jing's manual upgrade notes for upgrade from pg10 to pg14

https://hpad.dataone.org/hh344k6yTuCEdhMH2IllYA

  1. Write down the current version of the PostgreSQL active cluster: pg_lsclusters
  2. Upgrade postgres( original 10 - port 5432; 14 - port 5435):
  • screen (if needed)

  • pg_lsclusters (make sure it has the cluster 14. Otherwise, please install it)

  • (make sure it is NOT the original metacat cluster):

    sudo pg_dropcluster 12 main --stop
    sudo pg_dropcluster 9.3 main --stop
  • sudo pg_dropcluster 14 main --stop

  • sudo pg_upgradecluster --method=upgrade -v 14 10 main (In the most cases, 10 is the original metacat cluster, but it may change. ) or at the slow way:
    sudo pg_upgradecluster -v 14 10 main
    now, 14 uses port 5432

  • Edit the file /etc/postgresql/10/main/start.conf to change its value from manual to disabled

  • Edit /etc/postgresql/14/main/pg_hba.conf
    comment out the sha lines in the pg_hba.conf file
    and add:

    host metacat metacat 127.0.0.1 255.255.255.255 password
  • sudo /etc/init.d/postgresql stop

  • sudo /etc/init.d/postgresql start

  • pg_lsclusters to make sure only Postgresql 14 is running

@artntek
Copy link
Contributor Author

artntek commented Mar 14, 2025

Notes

This process works when done manually. Need to adapt to k8s initContainer flow

  1. With old MC chart running (PG14), dump the data

    # Note `postgresqlDataDir: /bitnami/postgresql/14/main`
    
    kubectl exec ${RELEASE_NAME}-postgresql-0 
    
    rm -rf /bitnami/postgresql/14-dump
    
    time pg_dump -U metacat --format=directory --file=/bitnami/postgresql/14-dump \
    --jobs=20 metacat"
  2. helm upgrade to new MC chart

    • has PG 17 subchart
    • postgresqlDataDir: /bitnami/postgresql/17/main).
    • Ensure /bitnami/postgresql/17 does NOT exist before deploying (will be created on startup, with clean metacat DB)
  3. When PG 17 is running: restore the data

    • MC SHOULD WAIT FOR THIS TO BE DONE BEFORE STARTING (initContainer)
    # CHECK: May not actually be necessary provided metacat hasn't 
    # yet initialized the DB?
    dropdb -U postgres metacat
    
    # CHECK: May not actually be necessary provided metacat hasn't 
    # yet initialized the DB?
    createdb -U postgres metacat
    
    # CHECK: May not actually be necessary if dropdb/createdb not needed?
    psql -U postgres -d metacat -c "GRANT ALL ON SCHEMA public TO metacat;"
    
    time pg_restore -U metacat --data-only --dbname=metacat --format=directory \
         --jobs=20 /bitnami/postgresql/14-dump

TODO: Figure out Logic for startup scenarios

  1. /bitnami/postgresql/17 contains MC tables? Leave it alone and exit
  2. /bitnami/postgresql/17 doesn't contain MC tables? The...
  3. Look for /bitnami/postgresql/n-dump, where n < 17
  4. If found, pg_restore. Else exit (pg_dump needed first)

initcontainer Challenges

  • initcontainer needs access to the running postgres 17 database in the container deployed by the bitnami chart
  • if using psql, this means initcontainer needs psql installed (eg use another postrgres:17 image), and
  • if dropdb/createdb and/or GRANT ALL ON SCHEMA public steps are needed, then bitnami postgres will need to be configured to allow the postgres user (in addition to metacat user) to access it remotely (and they'll need to be done using psql instead of using dropdb/createdb)

@mbjones
Copy link
Member

mbjones commented Mar 15, 2025

Nice, glad to hear our discussed process works. I'd note that I think this could also work for the migration from non-k8s to k8s Metacat installs too, if we were to standardize the idea that the presence of a dump file in a particular location is an indication that k8s MC should restore that DB before continuing. Then, the process of migrating to k8s is 1) dump your current DB, and 2) arrange for that dump to be on the right place on the PVC to be mounted in the MC postgres container (plus much of the other housekeeping you listed for the 2.19 to 3.* upgrades).

One thing to note is that the dump file should probably be moved or removed after a successful restore so that it is not re-restored on the next startup (possibly deleting data along the way). Of course, that removel would need to be done carefully so we don't lose an important backup.

@artntek
Copy link
Contributor Author

artntek commented Mar 17, 2025

One thing to note is that the dump file should probably be moved or removed after a successful restore so that it is not re-restored on the next startup (possibly deleting data along the way). Of course, that removel would need to be done carefully so we don't lose an important backup.

The solution currently gets around this without the need for renaming/deleting dump files: it finds and restores a dump file ONLY if:

  • it can connect to the existing metacat database that the current chart is using
  • it can use psql to verify that the metacat tables are NOT present in that DB (i.e. the DB has never been initialized by metacat in the past).

@artntek
Copy link
Contributor Author

artntek commented Mar 18, 2025

Successfully tested scenarios using code in PR #2101:

  • Does $PGDATA (e.g. /bitnami/postgres/17) directory exist and contain PG_VERSION file?
    • no => exit & start metacat without changing DB
    • yes:
      • Do metacat tables exist in DB?
        • yes (or unable to connect) => exit & start metacat without changing DB
        • no:
          • Does NN-pg_dump directory exist?
            • no => exit & start metacat without changing DB
            • yes => Do a pg_restore
              • invalid dump? => exit & start metacat without changing DB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

3 participants