
Conversation


LuiggiTenorioK (Member) commented Mar 6, 2025

Closes #1285
Supersedes #1914

PR to enable Postgres support in Autosubmit.

The database was not normalized. We kept the tables as close to what they were as possible, to minimize the risk of bugs. For SQLite everything behaves the same as before. For Postgres, we adopted the approach of one Postgres DB schema per experiment.

Note, too, that this change does not move the pickle file. That file (the job list) will remain on disk for now, but in a future change it will be moved to the DB in a simplified manner (i.e. we will not dump the whole object as a blob; it will probably be a structured graph for the job list).
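To illustrate the idea (a sketch only, not the PR's actual code; the function name and quoting are hypothetical), the schema-per-experiment approach boils down to qualifying table names differently per backend:

```python
def table_ref(backend: str, expid: str, table: str) -> str:
    """Return a qualified table reference for the given backend.

    SQLite keeps one database file per experiment, so the bare table
    name is enough; Postgres instead prefixes a per-experiment schema.
    """
    if backend == "postgres":
        # one Postgres DB schema per experiment, named after the expid
        return f'"{expid}".{table}'
    return table  # SQLite: everything behaves the same as before

print(table_ref("postgres", "a000", "experiment_structure"))  # "a000".experiment_structure
print(table_ref("sqlite", "a000", "experiment_structure"))    # experiment_structure
```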

Tasks

  • Enable CI job to test using a Postgres instance
  • Migrate db_structure
  • Check if there are any columns or tables missing (e.g. workflow_commit)
    • add workflow_commit in db_structure
    • database/db_common.py
      • get_experiment_id
      • _save_experiment (maybe not needed if only called from sqlite functions... it uses sqlite exception check, checking)
      • _check_experiment_exists (ditto)
      • _update_experiment_descrip_version (ditto)
      • _get_autosubmit_version (ditto)
      • _last_name_used (ditto)
      • _delete_experiment (ditto)
      • _update_database (ditto)
      • _get_experiment_id (ditto)
    • database/db_manager.py (select_all_where had been deleted, probably clean-up with vulture)
    • database/db_structure.py
    • experiment/detail_updater.py
    • experiment/experiment_common.py
    • history/database_managers/database_manager.py
    • history/database_managers/database_models.py
    • history/database_managers/experiment_history_db_manager.py
    • history/database_managers/experiment_status_db_manager.py
    • migrate/migrate.py (it's raising an exception, and it's not supposed to work in PG for now)
    • provenance/rocrate.py (check what will happen with the pkl files, if these are replaced)
    • e.g. tests from the other branch Postgres Support #1914
    • code (e.g. history module) Postgres Support #1914
    • NOTE: do not move the blob pkl from the old branch as that's being reworked by Dani in job_list pickle -> DB/yaml Migration #2211
    • NOTE: If Job metrics retrieval #2031 is merged, we have to add its support to Postgres
    • Need to port this change from @dbeltrankyl in the experiment manager Fix database issues #2345
  • Add flags to configure db backend in autosubmit configure
  • Check test coverage
  • Manual tests (see section below)
  • Documentation
  • Changelog

Manual tests

As this change touches important parts of the system, we are performing a few additional manual tests, in addition to the unit and integration tests already added.

To test locally, the following Docker container is used. Note that it intentionally uses arguments different from those used in tests and from the default values. That's to validate that we do not have anything hardcoded that just works by accident.

$ docker run --rm \
    -e POSTGRES_PASSWORD=lavanda \
    -e POSTGRES_USER=sound \
    -e POSTGRES_DB=testingsomething \
    -p 5432:5432 \
    postgres 

My ~/.autosubmitrc:

[database]
backend = postgres
connection_url = postgresql://sound:lavanda@localhost:5432/testingsomething
path = /home/bdepaula/autosubmit
filename = autosubmit.db

[local]
path = /home/bdepaula/autosubmit

[globallogs]
path = /home/bdepaula/autosubmit/logs

[structures]
path = /home/bdepaula/autosubmit/metadata/structures

[historicdb]
path = /home/bdepaula/autosubmit/metadata/data

[historiclog]
path = /home/bdepaula/autosubmit/metadata/logs

[autosubmitapi]
url = http://192.168.11.91:8081 # Replace me?
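
As a sanity check on the connection_url format above, the URL can be pulled apart with the standard library (a hypothetical helper, not part of Autosubmit):

```python
from urllib.parse import urlparse

def split_connection_url(url: str) -> dict:
    """Break a SQLAlchemy-style Postgres URL into its components."""
    parts = urlparse(url)
    if not parts.scheme.startswith("postgresql"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

info = split_connection_url("postgresql://sound:lavanda@localhost:5432/testingsomething")
print(info["user"], info["port"], info["database"])  # sound 5432 testingsomething
```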

Note

@LuiggiTenorioK updated the configure subcommand, and now one can simply run (thanks BTW!)

$ autosubmit configure --database-backend postgres --database-conn-url "postgresql://sound:lavanda@localhost:5432/testingsomething"
Writing configuration file...
Configuration file written successfully: 
 /home/kinow/.autosubmitrc
Directories configured successfully: 
 /home/kinow/autosubmit 
 /home/kinow/autosubmit 
 /home/kinow/autosubmit/logs 
 /home/kinow/autosubmit/metadata/structures 
 /home/kinow/autosubmit/metadata/data 
 /home/kinow/autosubmit/metadata/logs

That should give you the same .autosubmitrc as above, but with less typing :)

Every test is being performed with two experiments, so that we can verify database integrity. One test is performed, and then the database values are dumped. Then the next test is performed, and another database dump is done. The dumps are diffed, and the database values are compared with the database client.
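
The dump-and-diff step can be sketched with the standard library (illustration only; in practice the dump tool's output is diffed and the values are also checked with the database client):

```python
import difflib

def diff_dumps(before: str, after: str) -> list[str]:
    """Return unified-diff lines between two database dumps."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="dump_before.sql", tofile="dump_after.sql", lineterm="",
    ))

# Any line starting with '-' or '+' marks a row that changed between tests.
for line in diff_dumps("a\nb", "a\nc"):
    print(line)
```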

These are the test workflows being used: a dummy workflow, the auto-mhm workflow from RO-Crate, a few examples from our documentation, and the ClimateDT workflow.

The following operations are performed for every workflow during the tests: creating the experiment (expid), creating the graph (create), refreshing the experiment project sources (refresh), running the workflow (run), recovering jobs (recovery), monitoring (monitor), generating commands (inspect), generating variables (report), producing the stats report (stats), setting the status of tasks (setstatus), stopping the workflow (stop), and deleting the experiment (delete).

expid is tested twice, for the first workflow and then a second time for its copy (i.e. we are testing both expid and expid -y).

Dummy is a local project, but auto-mhm and ClimateDT are Git projects. ClimateDT uses Git hooks.

The deletion of experiments is important because autosubmit.py has logic to delete job_data.sql, and we need to verify what will happen when using Postgres. One experiment, the auto-mhm, contains RO-Crate configuration, so that will validate the provenance gathering in the workflow too.

Further testing will be performed in EDITO later.

  • Postgres locally
    • configure
    • install
    • Dummy
      • expid + copy
      • create
      • refresh
      • inspect
      • run
      • monitor
      • report
      • stats
      • setstatus
      • recovery
      • stop
      • delete
    • auto-mhm
      • expid + copy
      • create
      • refresh
      • inspect
      • run
      • monitor
      • report
      • stats
      • setstatus
      • recovery
      • stop
      • delete
    • ClimateDT workflow
      • expid + copy
      • create
      • refresh
      • inspect
      • run
      • monitor
      • report
      • stats
      • setstatus
      • recovery
      • stop
      • delete
    • Examples from docs
  • Postgres in EDITO
    • TBD

After @dbeltrankyl 's joblist -> DB change, we need to test:

  • autosubmit create puts the job data into the correct tables
  • autosubmit delete removes it; the current code leaves job_pkl polluted, but that table is expected to be deleted when Dani's work is merged. Still, we need to test this.

See @LuiggiTenorioK 's note on how to build the container configured for Postgres: #2406 (comment)

@LuiggiTenorioK force-pushed the as-postgres branch 3 times, most recently from 85b14a4 to 430758d on March 6, 2025 15:17

codecov-commenter commented Mar 6, 2025

Codecov Report

❌ Patch coverage is 85.40066% with 133 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.55%. Comparing base (926b321) to head (9bcf9ec).

Files with missing lines Patch % Lines
autosubmit/database/db_common.py 59.44% 76 Missing and 12 partials ⚠️
autosubmit/autosubmit.py 71.21% 28 Missing and 10 partials ⚠️
...database_managers/experiment_history_db_manager.py 98.26% 3 Missing ⚠️
autosubmit/job/metrics_processor.py 89.65% 3 Missing ⚠️
autosubmit/job/job_list_persistence.py 97.29% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2187      +/-   ##
==========================================
+ Coverage   63.73%   65.55%   +1.81%     
==========================================
  Files          82       84       +2     
  Lines       19332    19697     +365     
  Branches     3758     3814      +56     
==========================================
+ Hits        12321    12912     +591     
+ Misses       6077     5840     -237     
- Partials      934      945      +11     
Flag Coverage Δ
fast-tests 65.55% <85.40%> (+1.81%) ⬆️


@kinow self-assigned this Mar 24, 2025

kinow commented Mar 26, 2025

Fixed one test, then the integration test run command started failing locally, so I started reviewing what was going on and merging the autosubmit_config fixture with the fixture/setup code in that integration test. Work in progress here: #2250

@kinow force-pushed the as-postgres branch 2 times, most recently from 35f091b to 5275f2b on April 17, 2025 05:52

@kinow force-pushed the as-postgres branch 2 times, most recently from 7f031d6 to 66f05e1 on April 17, 2025 05:58

kinow commented Apr 17, 2025

The tests were passing locally when executed individually. I think @LuiggiTenorioK mentioned some time ago that there was a possibility of some race condition between the two PG tests we have at the moment. So I bit the bullet and implemented part of #2250 here, using TestContainers' Postgres context-manager in a fixture, launching a container with Postgres for each test, using a random port, updating the connection string URL, and running in parallel.

The tests in test_db_common are a bit slow, so after fixing the pipeline I'll investigate if we can speed that up in #2250 .
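
Grabbing a random free port for each test's container can be done with a throwaway socket, a common trick (a sketch under stated assumptions; the actual fixture relies on TestContainers' Postgres context manager):

```python
import socket

def free_port() -> int:
    """Ask the OS for an ephemeral port, then release it for the container to bind."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 = let the OS pick one
        return s.getsockname()[1]

port = free_port()
print(0 < port < 65536)  # True
```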

@kinow force-pushed the as-postgres branch 2 times, most recently from fe8c689 to 85a35f5 on April 17, 2025 07:55

@LuiggiTenorioK
Member Author

Bug found when trying a workflow with a vertical wrapper 🐛 :

Traceback (most recent call last):
  File "/home/ltenorio/projects/autosubmit/autosubmit/autosubmit.py", line 2302, in run_experiment
    job_list, submitter , exp_history, host , as_conf, platforms_to_test, packages_persistence, _ = Autosubmit.prepare_run(expid, notransitive, start_time, start_after, run_only_members)
  File "/home/ltenorio/projects/autosubmit/autosubmit/autosubmit.py", line 2175, in prepare_run
    os.chmod(os.path.join(BasicConfig.LOCAL_ROOT_DIR,
FileNotFoundError: [Errno 2] No such file or directory: '/home/ltenorio/autosubmit_pg/a000/pkl/job_packages_a000.db'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ltenorio/projects/autosubmit/autosubmit/scripts/autosubmit.py", line 105, in main
    return_value = Autosubmit.run_command(args)
  File "/home/ltenorio/projects/autosubmit/autosubmit/autosubmit.py", line 747, in run_command
    return Autosubmit.run_experiment(args.expid, args.notransitive,args.start_time,args.start_after, args.run_only_members, args.profile)
  File "/home/ltenorio/projects/autosubmit/autosubmit/autosubmit.py", line 2307, in run_experiment
    raise AutosubmitCritical("Error in run initialization", 7014, str(e))  # Changing default to 7014
autosubmit.log.log.AutosubmitCritical:  

Trace: [Errno 2] No such file or directory: '/home/ltenorio/autosubmit_pg/a000/pkl/job_packages_a000.db'
 [CRITICAL] Error in run initialization [eCode=7014]

It is in prepare_run, and it seems it could be solved with an if. It is here:

# Check if the user wants to continue using wrappers and loads the appropriate info.
if as_conf.experiment_data.get("WRAPPERS", None) is not None:
    os.chmod(os.path.join(BasicConfig.LOCAL_ROOT_DIR,
                          expid, "pkl", "job_packages_" + expid + ".db"), 0o644)


dbeltrankyl commented Oct 3, 2025

Thanks @kinow !

#2187 (comment)

Following this comment:

I won't be there in person on Monday, so I would prefer to perform the merge into the master on Tuesday to also check the rebase of my branch (I foresee issues mainly with the test configuration-related changes).


kinow commented Oct 3, 2025

> Bug found when trying a workflow with a vertical wrapper 🐛 :
>
> Traceback (most recent call last):
>   File "/home/ltenorio/projects/autosubmit/autosubmit/autosubmit.py", line 2302, in run_experiment
>     job_list, submitter , exp_history, host , as_conf, platforms_to_test, packages_persistence, _ = Autosubmit.prepare_run(expid, notransitive, start_time, start_after, run_only_members)
>   File "/home/ltenorio/projects/autosubmit/autosubmit/autosubmit.py", line 2175, in prepare_run
>     os.chmod(os.path.join(BasicConfig.LOCAL_ROOT_DIR,
> FileNotFoundError: [Errno 2] No such file or directory: '/home/ltenorio/autosubmit_pg/a000/pkl/job_packages_a000.db'
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/home/ltenorio/projects/autosubmit/autosubmit/scripts/autosubmit.py", line 105, in main
>     return_value = Autosubmit.run_command(args)
>   File "/home/ltenorio/projects/autosubmit/autosubmit/autosubmit.py", line 747, in run_command
>     return Autosubmit.run_experiment(args.expid, args.notransitive,args.start_time,args.start_after, args.run_only_members, args.profile)
>   File "/home/ltenorio/projects/autosubmit/autosubmit/autosubmit.py", line 2307, in run_experiment
>     raise AutosubmitCritical("Error in run initialization", 7014, str(e))  # Changing default to 7014
> autosubmit.log.log.AutosubmitCritical:
>
> Trace: [Errno 2] No such file or directory: '/home/ltenorio/autosubmit_pg/a000/pkl/job_packages_a000.db'
>  [CRITICAL] Error in run initialization [eCode=7014]
>
> It is in prepare_run, and it seems it could be solved with an if. It is here:
>
> # Check if the user wants to continue using wrappers and loads the appropriate info.
> if as_conf.experiment_data.get("WRAPPERS", None) is not None:
>     os.chmod(os.path.join(BasicConfig.LOCAL_ROOT_DIR,
>                           expid, "pkl", "job_packages_" + expid + ".db"), 0o644)

Ah, first 🐛 of this review 🙂 I'll check it on Monday. Thanks @LuiggiTenorioK !


kinow commented Oct 3, 2025

> Thanks @kinow !
>
> #2187 (comment)
>
> Following this comment:
>
> I won't be there in person on Monday, so I would prefer to perform the merge into the master on Tuesday to also check the rebase of my branch (I foresee issues mainly with the test configuration-related changes).

Sounds good to me, @dbeltrankyl !

@LuiggiTenorioK
Member Author

Checking for similar cases like the one mentioned in #2187 (comment)

The same will happen in autosubmit monitor and autosubmit setstatus:

# Visualization stuff that should be in a function common to monitor, create, -cw flag, inspect and so on
if not noplot:
    from .monitor.monitor import Monitor
    if as_conf.get_wrapper_type() != 'none' and check_wrapper:
        packages_persistence = JobPackagePersistence(expid)
        os.chmod(os.path.join(BasicConfig.LOCAL_ROOT_DIR,
                              expid, "pkl", "job_packages_" + expid + ".db"), 0o775)
        packages_persistence.reset_table(True)

if len(as_conf.experiment_data.get("WRAPPERS", {})) > 0 and check_wrapper:
    # Class constructor creates table if it does not exist
    packages_persistence = JobPackagePersistence(expid)
    # Permissions
    os.chmod(os.path.join(BasicConfig.LOCAL_ROOT_DIR, expid, "pkl", "job_packages_" + expid + ".db"), 0o644)
    # Database modification
    packages_persistence.reset_table(True)

@dbeltrankyl mentioned this pull request Oct 6, 2025
@dbeltrankyl
Collaborator

Added as autosubmit/4.1.16-dev-postgres-1a2fcf4 into the climate-dt

(the VERSION file points to 4.1.15, though)


dbeltrankyl commented Oct 6, 2025

I see this warning

/appl/AS/4.1.16-dev-postgres-1a2fcf4/lib64/python3.9/site-packages/networkx/utils/backends.py:135: RuntimeWarning: networkx backend defined more than once: nx-loopback
backends.update(_get_backends("networkx.backends"))

Aside from that, the SQLite database run seems to work in the climate-dt.


kinow commented Oct 6, 2025

Huh, hopefully that warning isn't anything serious. I've fixed the issues reported by @LuiggiTenorioK. Luiggi was right about the if, I think. We had another chmod in the same file, but that one was already protected with an `if BasicConfig... == 'sqlite':` check before the `chmod`. I've replicated it to prevent the call when using Postgres.
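
The replicated guard looks roughly like this (a sketch: the helper name is made up, and the exact BasicConfig attribute checked in the real code is elided above):

```python
import os

def chmod_job_packages(local_root_dir: str, expid: str, backend: str = "sqlite") -> None:
    """Adjust permissions of the job packages file only for the SQLite backend.

    Under Postgres there is no job_packages_<expid>.db file on disk, so an
    unconditional chmod raises FileNotFoundError (the bug reported above).
    """
    if backend == "sqlite":
        path = os.path.join(local_root_dir, expid, "pkl", f"job_packages_{expid}.db")
        os.chmod(path, 0o644)

# With Postgres the call is a no-op, even if the path does not exist:
chmod_job_packages("/nonexistent", "a000", backend="postgres")
```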

@LuiggiTenorioK
Member Author

From #2627 (comment), I added the details table to be created when using autosubmit install.


kinow commented Oct 7, 2025

I will rebase and resolve the conflicts later today. I think they might be coming from the ruff merge we did recently (@VindeeR we'll see if the linter is working!)


kinow commented Oct 7, 2025

> From #2627 (comment), I added the details table to be created when using autosubmit install.

Thanks!!

@kinow force-pushed the as-postgres branch 2 times, most recently from 8166c62 to b1518bc on October 7, 2025 14:33

kinow commented Oct 7, 2025

Linter failed, so I executed `ruff check $(git diff --name-only --cached -- '*.py')` on my machine. Fixed each issue, and I think it should pass now (BTW, some test-integration builds are finishing in under 3 minutes :D)

@kinow moved this from In Progress to Review in the Autosubmit project Oct 7, 2025

LuiggiTenorioK commented Oct 7, 2025

Additional issue: the experiment_structure table is not created; it seems to happen when the metadata/structure dir doesn't exist.


kinow commented Oct 8, 2025

I'm adding the ExperimentStructureTable to the list of tables to be created when the database is created, same as @LuiggiTenorioK did with the DetailsTable. I'm also removing the calls that create the table from save_structure and get_structure in the db_structure.py file, as that (a CREATE TABLE IF NOT EXISTS) won't have to be performed anymore.
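
The move from per-accessor CREATE TABLE IF NOT EXISTS to install-time creation can be sketched like this (SQLite in-memory for brevity; the column definitions here are invented for illustration, not Autosubmit's real schema):

```python
import sqlite3

# Tables known up front; creating them all at install time means accessors
# like save_structure/get_structure no longer need CREATE TABLE IF NOT EXISTS.
TABLES = {
    "details": "CREATE TABLE details (exp_id INTEGER, user TEXT, created TEXT)",
    "experiment_structure": "CREATE TABLE experiment_structure (e_from TEXT, e_to TEXT)",
}

def install_database(conn: sqlite3.Connection) -> None:
    """Create every known table once, when the database itself is created."""
    for ddl in TABLES.values():
        conn.execute(ddl)

conn = sqlite3.connect(":memory:")
install_database(conn)
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
).fetchall()
print([r[0] for r in rows])  # ['details', 'experiment_structure']
```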

@kinow force-pushed the as-postgres branch 4 times, most recently from 59232fe to 1f5c059 on October 8, 2025 10:31
Add Postgres support to Autosubmit

Implemented with SQLAlchemy, abstract/OOP/protocols, a new configuration key DATABASE_BACKEND.

Tested with Docker and TestContainers. More tests added (unit and integration).
@kinow merged commit 9368e2a into master Oct 8, 2025
21 checks passed
github-project-automation bot moved this from Review to Done in the Autosubmit project Oct 8, 2025
@kinow deleted the as-postgres branch October 8, 2025 13:05

Labels

destine (DestinE related), edito (EDITO related), enhancement (New feature or request), new feature (Use this label to plan and request new features)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Postgres layer option

6 participants