Sync current Greengage 6 to ADB #2708

Draft
Stolb27 wants to merge 16 commits into adb-6.x from sync2adb

Conversation

@Stolb27 Stolb27 commented Jul 3, 2026

Collaborator

No description provided.

VoidZeroNull0 and others added 16 commits June 1, 2026 13:05
What happened?
Local GUCs (SET LOCAL) get lost inside PL/pgSQL functions (see the test case
for more details), so subsequent write commands may produce unexpected results
or fail.

Why does it happen?
SET LOCAL is executed in a separate transaction because no DTX (2PC) is set
up, so its effect is popped when that transaction ends.

How do we fix this?
Let's turn all SET LOCAL commands on segments into plain SET, so that they
last until the master synchronizes them using the previously saved value
(gp_guc_restore_list). Synchronization happens on the master's next
transaction or when a transaction control statement (COMMIT, ROLLBACK) is
encountered. Specifically, syncing is handled inside AtEOXact_SPI(). Note: on
the master, SET LOCAL is still executed as SET LOCAL.
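
The difference between the two behaviours can be illustrated with a toy Python model. This is only a sketch of the reasoning above, not Greengage code: the `SegmentSession` class and the `run_autocommit` helper are hypothetical names, and the model assumes each dispatched command ends its own transaction when no DTX is set up.

```python
class SegmentSession:
    """Toy model of GUC state on one segment backend (hypothetical)."""

    def __init__(self, defaults):
        self.session_gucs = dict(defaults)  # survive transaction end
        self.local_gucs = {}                # popped at transaction end

    def set_guc(self, name, value, local=False):
        target = self.local_gucs if local else self.session_gucs
        target[name] = value

    def show(self, name):
        return self.local_gucs.get(name, self.session_gucs[name])

    def end_transaction(self):
        # SET LOCAL values do not outlive the transaction that set them.
        self.local_gucs.clear()


def run_autocommit(session, name, value, local):
    """Apply a SET in its own transaction, then read the GUC afterwards."""
    session.set_guc(name, value, local=local)
    session.end_transaction()
    return session.show(name)


seg = SegmentSession({"search_path": "public"})
lost = run_autocommit(seg, "search_path", "s1", local=True)   # SET LOCAL
kept = run_autocommit(seg, "search_path", "s1", local=False)  # plain SET
```

In this model the SET LOCAL value is gone by the time the next command runs, while the plain SET value stays until something (here, nothing; in the patch, the master) resets it.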

Alternate solutions?
A DTX transaction context could be set up at the SET step, so that all
subsequent write commands run inside this new transaction and thus see the
required GUC.

Tests?
A test case was added to check that local GUCs are passed consistently and
correctly inside DO.

Changes from original commit?
Since 6.x does not support control statements inside DO, the in-between SPI
GUC synchronization code was removed from AtEOXact_SPI(), along with the
related test. GUC syncing happens on the master's next transaction using the
mechanism inside PostgresMain().

(cherry picked from commit 762fb85)

Ticket: GG-479

---------

Co-authored-by: Georgy Shelkovy <g.shelkovy@arenadata.io>
Co-authored-by: Viktor Kurilko <v.kurilko@arenadata.io>
Bump CI behave tests from v28 to v35. Changes:
- Detect `@skip` tag in Behave CI matrix generation
  - Replace `ls | grep` with glob to handle non-alphanumeric filenames
  - Split matrix into `run_matrix` / `skip_matrix` based on `@skip` tag
    detection on the `Feature:` line
  - Skipped features excluded from the run matrix and listed in the
    Job Summary of `generate-matrix` step
  - `continue-on-error: true` on SQL dump fetch step for `gpexpand` -
    failure surfaces in the test run itself
  - Add `set -e` to Allure report generation step
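
The matrix split described above could look roughly like the following Python sketch. The actual workflow step is not shown in this PR, so everything here is an assumption: the function names are hypothetical, and "`@skip` on the `Feature:` line" is interpreted as a `@skip` tag attached to the feature (on the tag lines directly preceding `Feature:`).

```python
import tempfile
from pathlib import Path


def feature_is_skipped(text):
    """Return True if a @skip tag is attached to the Feature: line."""
    tags = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # ignore blank lines and comments before the feature
        if stripped.startswith("@"):
            tags.extend(stripped.split())
            continue
        # First non-tag line: only feature-level tags count.
        return stripped.startswith("Feature:") and "@skip" in tags
    return False


def split_matrix(feature_dir):
    """Split feature names into (run_matrix, skip_matrix) using a glob."""
    run_matrix, skip_matrix = [], []
    for path in sorted(Path(feature_dir).glob("*.feature")):
        skipped = feature_is_skipped(path.read_text())
        (skip_matrix if skipped else run_matrix).append(path.stem)
    return run_matrix, skip_matrix


# Demo with throwaway files, including a name with a space in it.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "a.feature").write_text("@skip\nFeature: a\n")
(demo_dir / "b c.feature").write_text("Feature: b\n  @skip\n  Scenario: x\n")
run_matrix, skip_matrix = split_matrix(demo_dir)
```

A glob yields whole paths, so file names with spaces or other non-alphanumeric characters are not split the way the output of `ls | grep` can be.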

Task: CI-5599

- Look for ubuntu24.04 SQL dump artifact on 6.x.
  The `gpexpand` behave test was looking for `sqldump_ggdb6_ubuntu`
  but the SQL dump workflow now generates `sqldump_ggdb6_ubuntu24.04`
  after regression tests were fully switched to `ubuntu24`.

- Fix `artifact_name` and `artifact_archive_name` to include `ubuntu24.04`
  suffix for `6.x`, keeping legacy empty suffix for `7.x`.

Task: CI-5678

---------

Co-authored-by: Vladislav Pavlov <v.pavlov@arenadata.io>
This test checks that the coordinator gets exception info even in case
of trouble with Python serialization. But the check relies on another
bug, caused by PyGreSQL being wrapped in a sub-directory. Replace the
exception with an explicitly unpicklable one so that PyGreSQL can be
reworked later.
Greengage has custom installation scripts for PyGreSQL that place the
module in a dedicated directory. There was already an issue with moving
the module's shared library to the proper directory level during the
migration to Python 3. Also, this approach makes it impossible to use a
common installation with pip. Furthermore, there is a 10-year-old issue
with pickling PyGreSQL exceptions:

```
>>> import pickle
>>> from pygresql import pg
>>> pickle.dumps(pg.DatabaseError())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <class 'pg.DatabaseError'>: import
 of module 'pg' failed
```
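
The same failure can be reproduced without PyGreSQL. The sketch below is an illustration under stated assumptions: `pg_demo` is a hypothetical module name, and the stand-in exception class only mimics a class whose `__module__` points at a module that cannot be imported under that name.

```python
import pickle
import sys
import types

# Stand-in for pg.DatabaseError: the class claims to live in "pg_demo",
# which is not importable at this point (hypothetical name).
DatabaseError = type("DatabaseError", (Exception,), {"__module__": "pg_demo"})


def can_pickle(obj):
    """Return True if pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except pickle.PicklingError:
        return False


before = can_pickle(DatabaseError("boom"))

# Make the advertised module importable under exactly that name.
module = types.ModuleType("pg_demo")
module.DatabaseError = DatabaseError
sys.modules["pg_demo"] = module

after = can_pickle(DatabaseError("boom"))
restored = pickle.loads(pickle.dumps(DatabaseError("boom")))
```

Pickle stores classes by reference (module name plus class name), so serialization only works when the module named in `__module__` can be imported under that name.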

This patch removes the additional directory layer for PyGreSQL and
implements a small Python shim module to preserve compatibility with the
existing Greengage codebase, so that `from pygresql.pg import DB` works
as expected.
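
A shim of this kind might be as small as one re-export line. The patch's actual shim is not shown here, so the following is only a sketch with hypothetical stand-in names (`pg_top_demo` for the top-level module, `pygresql_demo` for the legacy package), written to a temporary directory so that it can run without PyGreSQL installed.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Stand-in for the real top-level module (hypothetical name).
(root / "pg_top_demo.py").write_text(
    "class DB:\n    pass\n\n\nclass DatabaseError(Exception):\n    pass\n"
)

# Shim package: the legacy import path re-exports the top-level module.
shim = root / "pygresql_demo"
shim.mkdir()
(shim / "__init__.py").write_text("")
(shim / "pg.py").write_text("from pg_top_demo import *\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import pg_top_demo
from pygresql_demo.pg import DB, DatabaseError

same_class = DB is pg_top_demo.DB
round_trip = pickle.loads(pickle.dumps(DatabaseError("boom")))
```

Because the classes keep the top-level module as their `__module__`, the legacy import path keeps working and the exceptions stay picklable in this sketch.
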
At least this tool is used in greengage_path.sh. Add it to the Docker
containers and as a deb package dependency.
8f9aa03 switches all regression and isolation2 tests to use plpython3.
But some tests require additional Python modules. Moreover, some of them
require PyGreSQL, which depends on Greengage libpq. Currently, this
module is built only for the default Python interpreter as part of the
Greengage build process. 8f9aa03 tries to pass the Greengage PYTHONPATH to
plpython3. As a result, tests work correctly only on Python 3 builds
(e.g. Ubuntu 24). Otherwise, the plpython3 environment is polluted by
Python 2 modules. To run isolation2 tests in a Python 2 environment, we
should additionally provide the test dependencies for Python 3.

The biggest problem is PyGreSQL, which depends on Greengage artifacts. To
solve this problem I suggest:
1. build PyGreSQL wheels at the build Dockerfile stage if needed;
2. install them at the test stage.

Also, source the Greengage Python environment explicitly to run Python 2
cluster utilities in the plpython3 environment. Keep it as local as
possible to avoid leaking. os.system has an issue with sourcing
greengage_path.sh when called inside a plpython3u function, so I've
replaced it with an appropriate subprocess call.
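
One way such a subprocess call could look is sketched below. The commit does not show the replacement, so this is an assumption: the helper name `run_in_sourced_env` is hypothetical, and a throwaway script stands in for greengage_path.sh. The environment script is sourced inside a child bash process, so nothing leaks into the calling interpreter.

```python
import os
import shlex
import subprocess
import tempfile


def run_in_sourced_env(env_script, command):
    """Run command in a bash child process that first sources env_script."""
    shell_line = "source {} && {}".format(shlex.quote(env_script), command)
    result = subprocess.run(
        ["bash", "-c", shell_line],
        check=True,                 # raise if sourcing or the command fails
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


# Demo: a throwaway environment script instead of greengage_path.sh.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as script:
    script.write("export DEMO_HOME=/opt/demo\n")

output = run_in_sourced_env(script.name, 'echo "$DEMO_HOME"')
leaked = "DEMO_HOME" in os.environ
os.unlink(script.name)
```

Calling bash explicitly also avoids depending on whichever shell `/bin/sh` points to, since `source` is not available in every `sh` implementation.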
This commit fixes minor faults in the coverage configs and collection process.
Changes in the current commit:

1. Update behave tests coverage collection with the GPHOME variable, so
   remappings are applied to each test.
2. Update the regression tests script so that it collects and reports coverage
   (otherwise we are forced to do it in CI).
3. Add new remappings to the configs for the gpperfmon test and the
   gp_replica_check.py file.
4. Add the new (moved) pg.py and pgdb.py files to the exceptions.
5. Update the hardcoded $GPHOME in the coveragerc_unit file.
Event triggers are expected to work only on the master. However, gpexpand
copies the content of the pg_event_trigger table to the newly created
segment(s).

This commit adds pg_event_trigger to the list of master-only tables and adds
a behave test to verify it.

GG-526

Co-authored-by: Viktor Kurilko <v.kurilko@arenadata.io>
Previously, the gpstart utility couldn't start a standby whose directory was
specified with a trailing slash, like this: /path/to/standby/.
Adding path normalization was enough to fix the issue.
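
A minimal sketch of the idea, assuming the fix relies on something like `os.path.normpath` (the commit does not name the function it uses, and `same_directory` is a hypothetical helper):

```python
import os.path


def same_directory(left, right):
    """Compare two directory paths after normalization."""
    return os.path.normpath(left) == os.path.normpath(right)


normalized = os.path.normpath("/path/to/standby/")
matches = same_directory("/path/to/standby/", "/path/to/standby")
```

Normalization strips the trailing slash, so both spellings of the standby directory compare equal.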
Adds a Docker-based build environment for Rocky Linux 8 and 9,
on par with the existing Ubuntu support.

- add `ci/Dockerfile.rockylinux` - multi-stage build (base/build/code/test)
- update `README.Rhel-Rocky.bash` - OS-version-aware dependency installation:
  - Python 2 + 3 for RHEL/Rocky 8, Python 3.11 for Rocky 9
  - Perl packages extended for RHEL/Rocky 9 (Opcode, Test-Simple,
    Thread-Queue)
  - zstd built from source (unavailable as a package on RHEL/Rocky)
- `concourse/scripts/common.bash` - fix `os_id()` to correctly detect
  Rocky Linux; previously any RHEL-like system was returned as `centos`,
  making `rocky8`/`rocky9` CONFIGFLAGS unreachable dead code
- `ci/readme.md` - add Rocky Linux 8/9 build image instructions
- `README.linux.md` - add Version 9 to RHEL/Rocky description
- CI matrix - Rocky Linux 8/9 added for `build` workflow
- Bump build workflow to `v42`; changes:
  Add Rocky Linux support to reusable build workflow
  - Expose `TARGET_OS` as a job-level env var for use in shell scripts
  - Replace unconditional Ubuntu mirror resolution with a `case`:
    Ubuntu keeps the existing Azure mirror optimization, Rocky Linux has
    a stub for a future mirror when available
  - Use `$TARGET_OS` instead of expression interpolation in `docker build`
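
The `os_id()` fix above concerns a bash function whose body is not shown in this PR. The Python sketch below only illustrates the general idea of keying on the distribution's own `ID` from os-release data rather than on its RHEL-family membership. The function names and the sample os-release text are illustrative assumptions, not taken from `common.bash`.

```python
def parse_os_release(text):
    """Parse KEY=VALUE lines in os-release format into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def os_tag(text):
    """Build a tag such as 'rocky9' from ID and the major VERSION_ID."""
    fields = parse_os_release(text)
    distro = fields.get("ID", "unknown")
    major = fields.get("VERSION_ID", "").split(".")[0]
    return distro + major


# Illustrative os-release content for a Rocky Linux 9 host.
SAMPLE = (
    'NAME="Rocky Linux"\n'
    'VERSION_ID="9.3"\n'
    'ID="rocky"\n'
    'ID_LIKE="rhel centos fedora"\n'
)
tag = os_tag(SAMPLE)
```

Treating every `ID_LIKE` match as `centos` would collapse this host into the CentOS branch, which is the kind of misdetection the commit describes.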

Task: CI-5657
Some behave tests had a problem with host key verification. They failed with
"Host key verification failed" when attempting to ssh to some host.

This happened because host keys were missing from the known_hosts file, which
is populated in init_containers.sh.

ssh-keyscan was responsible for this task. It successfully opened sockets on
the hosts for each of the 5 default key authentication algorithms (see the
ssh-keyscan code for details; get_keytypes holds these default algorithms),
but still failed to gather the required information from these sockets: out
of the 5 opened sockets, only 3 could provide keys for the chosen algorithm
(KT_RSA, KT_ECDSA, KT_ED25519). And these 3 sockets could be closed after the
SSH protocol exchange began, because the server always had the default sshd
configuration, which includes the default MaxStartups setting (10:30:100,
meaning that once the number of unauthenticated connections reaches 10, sshd
may drop a new connection with a 30% probability).

To fix this issue, the patch changes the default MaxStartups setting.
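
To make the `start:rate:full` numbers concrete, here is a small sketch of the drop probability as the sshd_config manual describes it: `rate` percent once `start` unauthenticated connections exist, rising linearly to 100% at `full`. The function name is hypothetical and the integer rounding is an assumption.

```python
def drop_percent(unauthenticated, start=10, rate=30, full=100):
    """Approximate chance (in percent) that sshd drops a new connection."""
    if unauthenticated < start:
        return 0
    if unauthenticated >= full or rate == 100:
        return 100
    # Linear growth from `rate` at `start` up to 100 at `full`.
    extra = (100 - rate) * (unauthenticated - start) // (full - start)
    return rate + extra


samples = {n: drop_percent(n) for n in (9, 10, 55, 100)}
```

With the defaults, a burst of parallel ssh-keyscan probes against one host only has to reach 10 pending connections before some of them start being dropped.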

Considering that this is an infrastructure-only issue that doesn't affect
anything in core, tests are not provided.
- updated the behave tests workflow to upload coverage artifacts;
- updated the regression tests workflow to upload coverage artifacts;
- added a coverage job that runs on PRs and depends on successful
  completion of both behave and regression tests.

Task: CI-5692
Replace inline test-docker bash job with composite action tests/install/deb:

- OS version auto-detection for Greengage apt repository
- per-package `dpkg -s` report written to GitHub Actions job summary
- full installation report uploaded as workflow artifact
- remove unused `test_lima` input

Task: CI-5835