lotss hr

Leah edited this page Aug 30, 2024 · 16 revisions

Group members: Leah, Neal

Monday

Goal for the day: get split-directions working with toil.

Progress:

  • Neal has got toil-cwl-runner working on galahad. This is quite cluster-specific and depends on the Python setup/environment (toil-cwl-runner is pip-installed). Current kludges needed on galahad:

    Use python3.12 in its own environment; downgrade urllib3 to 1.26.6 with pip3; install nodejs with conda (conda install nodejs)

    Needs the _ddsel directories deleted manually between runs

    Needs vlbi-cwl.sif and node_alpine.sif copied back from the ../pull directory into the main singularity directory (after the first run, which creates them and then crashes because they already exist)

    Needs the software directories to be real directories; soft links do not work

  • Leah has been using Jurjen's alternative workflow split-directions-toil.cwl after resolving a couple of issues:

    apptainer appears to have a bug where it accepts APPTAINERENV_PREPEND_PATH but does not apply it correctly; the environment variable SINGULARITYENV_PREPEND_PATH does work with apptainer and should be used instead.

    --preserve-entire-environment doesn't work; use --preserve-environment VAR1 VAR2 VAR3 etc. for all of the APPTAINERENV variables

    see https://git.astron.nl/RD/VLBI-cwl/-/issues/27 for minor changes that are needed until VLBI-cwl is updated

    also needed to include the lofar_helpers path in the create_ms_list.py run

  • There is now a Python script, which can be incorporated into the monitor script, that splits the image_catalogue.csv file into chunks of 10 and submits one split-directions-toil job per chunk; this is done as an array job and is being tested with a limit of 2 array jobs running at a time.
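The galahad kludges and apptainer workarounds listed above could be gathered into one pre-run helper. What follows is a minimal sketch under stated assumptions, not the code used this week: the `*_ddsel` glob, the `<singularity_dir>/../pull` layout and all function names are hypothetical, and the only toil-cwl-runner options it uses are `--preserve-environment` (as in the notes) and `--jobStore`.

```python
import shutil
from pathlib import Path

# Images named in the galahad notes; assumed to be regenerated in a sibling "pull" directory.
SIF_FILES = ("vlbi-cwl.sif", "node_alpine.sif")


def clean_ddsel_dirs(run_dir: Path) -> list[Path]:
    """Delete leftover *_ddsel directories (assumed naming) before a new run."""
    removed = []
    for d in sorted(run_dir.glob("*_ddsel")):
        if d.is_dir():
            shutil.rmtree(d)
            removed.append(d)
    return removed


def restore_sif_files(singularity_dir: Path) -> list[Path]:
    """Copy missing .sif images back from <singularity_dir>/../pull (assumed layout)."""
    pull_dir = singularity_dir.parent / "pull"
    restored = []
    for name in SIF_FILES:
        src, dst = pull_dir / name, singularity_dir / name
        if src.is_file() and not dst.exists():
            shutil.copy2(src, dst)
            restored.append(dst)
    return restored


def find_symlinks(software_dirs: list[Path]) -> list[Path]:
    """Return software directories that are soft links, which do not work on galahad."""
    return [d for d in software_dirs if d.is_symlink()]


def fix_apptainer_env(env: dict[str, str]) -> dict[str, str]:
    """Work around apptainer ignoring APPTAINERENV_PREPEND_PATH by using the SINGULARITYENV_ name."""
    env = dict(env)  # copy so the caller's mapping (e.g. os.environ) is not modified
    prepend = env.pop("APPTAINERENV_PREPEND_PATH", None)
    if prepend is not None:
        env["SINGULARITYENV_PREPEND_PATH"] = prepend
    return env


def build_toil_command(workflow: str, job: str, job_store: str, env: dict[str, str]) -> list[str]:
    """Name each container variable explicitly instead of using --preserve-entire-environment."""
    keep = sorted(k for k in env if k.startswith(("APPTAINERENV_", "SINGULARITYENV_")))
    cmd = ["toil-cwl-runner"]
    if keep:
        # Follow the variable list with another option (not the workflow path) so a
        # multi-value --preserve-environment cannot swallow the positional arguments.
        cmd += ["--preserve-environment", *keep]
    cmd += ["--jobStore", job_store, workflow, job]
    return cmd
```

In use, one would run the three housekeeping helpers, then build `env = fix_apptainer_env(os.environ)` and pass the same `env` both to `build_toil_command` and to `subprocess.run(..., env=env, check=True)`, so the renamed variable is actually exported to the containers.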

Tuesday

  • Progress on getting chunked split-directions to run with toil. Still troubleshooting, but both Leah and Neal had runs started by the end of the day.

  • Testing of the Python chunking to create array jobs was successful on both cosma and galahad.

  • RuntimeError checking implemented for collect_solutions in the automated processing script.
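The chunking step (split image_catalogue.csv into chunks of 10, one array task per chunk, at most 2 running at once) can be sketched as follows. This is an illustration only: it assumes the catalogue has a single header row and that the cluster uses Slurm, and the chunk file naming and the job-script name are hypothetical; the script written this week may differ.

```python
import csv
from pathlib import Path


def split_catalogue(catalogue: Path, out_dir: Path, chunk_size: int = 10) -> list[Path]:
    """Split a CSV catalogue into files of at most chunk_size rows, repeating the header."""
    with catalogue.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    for start in range(0, len(rows), chunk_size):
        index = start // chunk_size
        path = out_dir / f"{catalogue.stem}_chunk{index:03d}.csv"  # hypothetical naming
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows[start:start + chunk_size])
        chunks.append(path)
    return chunks


def array_submit_command(n_chunks: int, job_script: str, max_parallel: int = 2) -> list[str]:
    """Build a Slurm sbatch call with one array task per chunk, max_parallel running at once."""
    if n_chunks < 1:
        raise ValueError("nothing to submit")
    # Task i is expected to process chunk i, e.g. via $SLURM_ARRAY_TASK_ID in the job script.
    return ["sbatch", f"--array=0-{n_chunks - 1}%{max_parallel}", job_script]
```

Keeping the chunk size small is what sidesteps DP3 explode being handed too many directions at once, and the `%2` array limit keeps the load on the shared file system bounded while testing.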

Wednesday

Goal for the day: have some directions for a field ...

  • testing of 2 directions on cosma (to make sure no more toil or path surprises)

  • converting automated processing to use toil all the way through

Thursday

  • continued testing on galahad and cosma; started testing on spider

  • everything in automated processing converted to toil (will need to test)
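The RuntimeError checking for collect_solutions and the move to toil throughout suggest a common pattern: each step runs as a subprocess, a non-zero exit is turned into a RuntimeError, and the monitor catches it instead of stopping. A hypothetical sketch; `run_step` and `try_step` are illustrative names, not functions from lotss-hba-survey.

```python
import subprocess
import sys


def run_step(cmd: list[str], name: str) -> None:
    """Run one pipeline step (e.g. a toil-cwl-runner call); raise RuntimeError if it fails."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Keep the tail of stderr so the monitor log shows why the step failed.
        tail = "\n".join(result.stderr.splitlines()[-20:])
        raise RuntimeError(f"{name} failed with exit code {result.returncode}:\n{tail}")


def try_step(cmd: list[str], name: str) -> bool:
    """Monitor-side wrapper: report the failure and carry on instead of crashing the loop."""
    try:
        run_step(cmd, name)
    except RuntimeError as err:
        print(err, file=sys.stderr)
        return False
    return True
```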

Wrap up

The goal was to help advance the automated post-processing of LoTSS. This is done using scripts in https://github.com/LOFAR-VLBI/lotss-hba-survey and is mostly a massive bookkeeping exercise, for which it's hard to demonstrate progress with pretty images.

At the beginning of the week, we were just starting to test toil for running split-directions on chunks of the catalogue instead of the entire catalogue, because DP3 explode still has issues when given too many directions at once. During the week, we accomplished:

  • split-directions with toil on galahad (Manchester): not working --> successfully working!

  • split-directions with toil on cosma (Durham): working up to concatenate --> successfully working!

We think the 50% success rate on cosma comes from the cosma file system rather than toil: some of the chunks do finish successfully, and the unsuccessful ones show odd I/O errors. We may need to move to using scratch space.

  • toil[cwl] installed on spider to start processing there (but waiting for TGSS to come back)

  • initial testing of check_verify_delaycalibrator.py, the manual-intervention step that tells the automated processing the delay calibrator is OK

  • conversion of all automated scripts to use toil

  • added error catching for downloading solutions

  • added functionality to download lotss inputs (for clusters with no internet access from worker nodes)
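Error catching for downloads and pre-fetching inputs on a node that does have internet access (for example a login node) can be sketched with the standard library. The function names, retry count and wait time are illustrative assumptions; the lotss-hba-survey scripts may use a different download mechanism.

```python
import shutil
import time
import urllib.request
from pathlib import Path


def download_with_retries(url: str, dest: Path, attempts: int = 3, wait_s: float = 5.0) -> bool:
    """Fetch url to dest, retrying on network/I-O errors; return False rather than crash."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    partial = dest.with_name(dest.name + ".part")
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=60) as resp, partial.open("wb") as out:
                shutil.copyfileobj(resp, out)
            partial.replace(dest)  # only a complete download appears under the final name
            return True
        except OSError as err:  # URLError, HTTPError and timeouts are all OSError subclasses
            print(f"download attempt {attempt}/{attempts} failed for {url}: {err}")
            if attempt < attempts:
                time.sleep(wait_s)
    partial.unlink(missing_ok=True)
    return False


def prefetch_inputs(urls: dict[str, str], input_dir: Path, wait_s: float = 5.0) -> list[str]:
    """Run on a node with internet access; return the names that could not be fetched."""
    return [name for name, url in urls.items()
            if not download_with_retries(url, input_dir / name, wait_s=wait_s)]
```

Worker-node jobs then read only from `input_dir`, and any names returned by `prefetch_inputs` can be flagged for the monitor rather than failing mid-pipeline.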

Next steps

  • test all steps on a new field

  • test lotss-subtract step on P210+37

  • self-calibrate the targets for P210+37

  • check astrometry and flux calibration for P210+37

  • test using scratch space instead of shared file system

Clone this wiki locally