Skip to content

Server not properly shut down after initializing empty data directory on first start #940

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jjakob opened this issue Feb 27, 2022 · 4 comments

Comments

@jjakob
Copy link

jjakob commented Feb 27, 2022

When first starting the container, after the entrypoint script completes an empty datadir initialization, the server process prints that it was not properly shut down and does an automatic recovery.

I see the docker-entrypoint.sh script runs pg_ctl -D "$PGDATA" -m fast -w stop, which should safely shut it down

pg_ctl -D "$PGDATA" -m fast -w stop

but after initialization is complete, the first time the container starts, it still does a recovery:

postgres_1  | 2022-02-27 10:38:59.778 UTC [22] LOG:  database system was interrupted; last known up at 2022-02-27 10:38:58 UTC
postgres_1  | 2022-02-27 10:38:59.907 UTC [22] LOG:  database system was not properly shut down; automatic recovery in progress
postgres_1  | 2022-02-27 10:38:59.913 UTC [22] LOG:  redo starts at 0/16FC688
postgres_1  | 2022-02-27 10:38:59.913 UTC [22] LOG:  invalid record length at 0/16FC790: wanted 24, got 0
postgres_1  | 2022-02-27 10:38:59.913 UTC [22] LOG:  redo done at 0/16FC738 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
postgres_1  | 2022-02-27 10:38:59.951 UTC [1] LOG:  database system is ready to accept connections
@tianon
Copy link
Member

tianon commented Mar 1, 2022

Can you provide a little bit more detail that would allow us to reproduce? This sounds to me like maybe you had the initialization fail and a restart policy on the container brought it back up?

(See #159 for a script adjustment that might help make that specific case more obvious if that's indeed what you're running into.)

@jjakob
Copy link
Author

jjakob commented Mar 31, 2022

I reproduced this now with the 14-alpine image ID 114818c12d10 (latest). It doesn't matter how the initdb.d scripts exit, even if they don't encounter any errors the recovery still takes place.

I don't think there's anything from my configuration that could affect that. Other than the secrets and create-non-superuser-db shell script, which I don't think is causing it either, since it doesn't exit with an error.

docker-compose.yml:

version: '3.8'

services:
  postgres:
    image: postgres:14-alpine
    restart: unless-stopped
    stop_grace_period: 1m
    environment:
      - POSTGRES_NON_ROOT_USER: "myuser"
      - POSTGRES_NON_ROOT_DB: "mydb"
      - POSTGRES_PASSWORD_FILE: "/run/secrets/POSTGRES_PASSWORD"
      - POSTGRES_NON_ROOT_PASSWORD_FILE: "/run/secrets/POSTGRES_NON_ROOT_PASSWORD"
    volumes:
      - ./database_exports:/docker-entrypoint-initdb.d
      - postgres_data:/var/lib/postgresql/data
    secrets:
      - POSTGRES_PASSWORD
      - POSTGRES_NON_ROOT_PASSWORD

volumes:
  postgres_data:

secrets:
  POSTGRES_PASSWORD:
    file: "./secrets/live/POSTGRES_PASSWORD"
  POSTGRES_NON_ROOT_PASSWORD:
    file: "./secrets/live/POSTGRES_NON_ROOT_PASSWORD"

./database_exports/000-create-non-superuser-db.sh

#!/bin/bash -e

file_env 'POSTGRES_NON_ROOT_PASSWORD'

if [ "$POSTGRES_NON_ROOT_USER" ] && [ "$POSTGRES_NON_ROOT_PASSWORD" ]; then
        docker_process_sql -v non_root_user="$POSTGRES_NON_ROOT_USER" -v non_root_password="$POSTGRES_NON_ROOT_PASSWORD" <<-'EOSQL'
                CREATE ROLE :"non_root_user" WITH LOGIN PASSWORD :'non_root_password';
        EOSQL
fi

if [ "$POSTGRES_NON_ROOT_DB" ] && [ "$POSTGRES_NON_ROOT_USER" ]; then
        docker_process_sql -v non_root_db="$POSTGRES_NON_ROOT_DB" -v non_root_user="$POSTGRES_NON_ROOT_USER" <<-'EOSQL'
                CREATE DATABASE :"non_root_db" WITH OWNER :"non_root_user" TEMPLATE template0;
        EOSQL
fi

exit 0

Full startup log:

postgres_1  | The files belonging to this database system will be owned by user "postgres".
postgres_1  | This user must also own the server process.
postgres_1  |
postgres_1  | The database cluster will be initialized with locale "en_US.utf8".
postgres_1  | The default database encoding has accordingly been set to "UTF8".
postgres_1  | The default text search configuration will be set to "english".
postgres_1  |
postgres_1  | Data page checksums are disabled.
postgres_1  |
postgres_1  | fixing permissions on existing directory /var/lib/postgresql/data ... ok
postgres_1  | creating subdirectories ... ok
postgres_1  | selecting dynamic shared memory implementation ... posix
postgres_1  | selecting default max_connections ... 100
postgres_1  | selecting default shared_buffers ... 128MB
postgres_1  | selecting default time zone ... UTC
postgres_1  | creating configuration files ... ok
postgres_1  | running bootstrap script ... ok
postgres_1  | sh: locale: not found
postgres_1  | 2022-03-31 19:45:35.847 UTC [23] WARNING:  no usable system locales were found
postgres_1  | performing post-bootstrap initialization ... ok
postgres_1  | initdb: warning: enabling "trust" authentication for local connections
postgres_1  | You can change this by editing pg_hba.conf or using the option -A, or
postgres_1  | --auth-local and --auth-host, the next time you run initdb.
postgres_1  | syncing data to disk ... ok
postgres_1  |
postgres_1  |
postgres_1  | Success. You can now start the database server using:
postgres_1  |
postgres_1  |     pg_ctl -D /var/lib/postgresql/data -l logfile start
postgres_1  |
postgres_1  | waiting for server to start....2022-03-31 19:45:37.187 UTC [29] LOG:  starting PostgreSQL 14.2 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
postgres_1  | 2022-03-31 19:45:37.189 UTC [29] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres_1  | 2022-03-31 19:45:37.194 UTC [30] LOG:  database system was shut down at 2022-03-31 19:45:36 UTC
postgres_1  | 2022-03-31 19:45:37.199 UTC [29] LOG:  database system is ready to accept connections
postgres_1  |  done
postgres_1  | server started
postgres_1  |
postgres_1  | /usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/000-create-non-superuser-db.sh
postgres_1  | CREATE ROLE
postgres_1  | CREATE DATABASE
postgres_1  |
postgres_1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
postgres_1  |
postgres_1  | 2022-03-31 19:45:38.163 UTC [1] LOG:  starting PostgreSQL 14.2 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
postgres_1  | 2022-03-31 19:45:38.163 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
postgres_1  | 2022-03-31 19:45:38.163 UTC [1] LOG:  listening on IPv6 address "::", port 5432
postgres_1  | 2022-03-31 19:45:38.166 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres_1  | 2022-03-31 19:45:38.171 UTC [19] LOG:  database system was interrupted; last known up at 2022-03-31 19:45:37 UTC
postgres_1  | 2022-03-31 19:45:38.261 UTC [19] LOG:  database system was not properly shut down; automatic recovery in progress
postgres_1  | 2022-03-31 19:45:38.264 UTC [19] LOG:  redo starts at 0/16FC688
postgres_1  | 2022-03-31 19:45:38.264 UTC [19] LOG:  invalid record length at 0/16FC790: wanted 24, got 0
postgres_1  | 2022-03-31 19:45:38.264 UTC [19] LOG:  redo done at 0/16FC738 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
postgres_1  | 2022-03-31 19:45:38.278 UTC [1] LOG:  database system is ready to accept connections

@yosifkit
Copy link
Member

yosifkit commented Apr 2, 2022

Since your log has these lines, it is definitely crashing/exiting right after or within your script.

postgres_1  | /usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/000-create-non-superuser-db.sh
postgres_1  | CREATE ROLE
postgres_1  | CREATE DATABASE
postgres_1  |
postgres_1  | PostgreSQL Database directory appears to contain a database; Skipping initialization

Ah, remove the exit 0 from your script. Since it is being sourced, your exit is exiting the whole entrypoint script and thus the whole container without stopping the backgrouded postgres (so it is killed when the container is destroyed) and then compose happily starts a new container which is where the "skipping initialization" is coming from.

@jjakob
Copy link
Author

jjakob commented Apr 2, 2022

Yes, that is obviously it. I removed the exit 0 and the error is gone. Thanks.

@jjakob jjakob closed this as completed Apr 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants