Commit a66805d

Merge pull request #95 from UCL-ARC/heatherkellyucl-patch-1
Added last email about Myriad filesystem
2 parents 6fe9b3e + f58f1aa commit a66805d

File tree

1 file changed

+72
-11
lines changed

mkdocs-project-dir/docs/Status_page.md

Lines changed: 72 additions & 11 deletions
@@ -779,15 +779,76 @@ This page outlines that status of each of the machines managed by the Research C
779779

780780
**Removal of old filesystem**
781781

782-
`/old_lustre` will be available for three months, until 9am on Monday 7 July. It will then be unmounted and you will not be able to access it any longer.
782+
`/old_lustre` will be available for three months, until 9am on Monday 7 July. It will then be
783+
unmounted and you will not be able to access it any longer.
783784

784785
**Myriad at risk for first week**
785786

786-
Myriad should be considered exposed to potential issues for the first week of running a full workload with the new filesystem, and so there might be interruptions to service if anything goes wrong or needs tuning differently.
787+
Myriad should be considered exposed to potential issues for the first week of running a full
788+
workload with the new filesystem, and so there might be interruptions to service if anything goes
789+
wrong or needs tuning differently.
787790

788791
The new filesystem is GPFS (IBM Storage Scale) and not Lustre, for those who are interested.
789792

790-
Additional FAQs will be added here based on questions we receive.
793+
Additional FAQs will be added here based on questions we receive.
794+
795+
- 2025-04-14 - **Myriad filesystem update and issues with symlinks**
796+
797+
This is a quick rundown of what else happened on Myriad last week and then some tips for problems
798+
people have been having.
799+
800+
After the new filesystem went live, we had a few issues on Wednesday and Thursday where some jobs
801+
were causing nodes to crash which was in turn causing the gpfs client to hang - which you will have
802+
seen as timeouts or very slow access on the login nodes. The hangs also meant that a few people had
803+
their new home directories only half-created, so didn't have a home directory that belonged to them
804+
when they logged in. We changed some configuration on the compute nodes to fix the issue (the jobs
805+
causing the problem were running out of virtual memory). People who had the home directory issue
806+
should have been sorted out on Thursday and Friday - let us know if anyone else still gets an error
807+
about their home directory not existing.
808+
809+
We were running more smoothly by Friday. Issues like these are why we said the rest of that week was
810+
at risk, as there was likely to be something that needed adjusting once real jobs started.
811+
812+
**Symbolic links and Scratch**
813+
814+
You start out with an empty normal directory called Scratch in your home. What I had not considered
815+
is if you rsync the whole of your oldhome back in, then it will rsync the old Scratch symlink
816+
(shortcut) from oldhome and replace the empty Scratch directory with it. This only happens because
817+
that directory is empty.
818+
819+
We have had tickets from some of you about finding that files are read-only that you think you have
820+
copied - it is because they are still really on the old filesystem.
821+
822+
If you run `ls -al` in your home you will be able to see if you have ended up with something similar
823+
to this:
824+
825+
```
826+
lrwxrwxrwx 1 cceahke staff 24 Sep 10 2024 Scratch -> /lustre/scratch/scratch/cceahke
827+
```
828+
829+
That shows you that Scratch is a symlink and is pointing to a location on the old filesystem.
830+
831+
To fix, delete the symlink and recreate Scratch as a directory:
832+
833+
```
834+
rm Scratch
835+
mkdir Scratch
836+
```
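
As a rough sketch, the check and the fix can be combined so that Scratch is only replaced when it really is a symlink. The function name `fix_scratch` and its optional directory argument are hypothetical and used here only for illustration; it is not a tool installed on Myriad.

```shell
# Hypothetical helper (not a Myriad tool): replace a Scratch symlink
# with a real directory. Takes the home directory to check as $1,
# defaulting to $HOME.
fix_scratch() {
    home_dir="${1:-$HOME}"
    target="$home_dir/Scratch"
    if [ -L "$target" ]; then
        # Scratch is a symlink: show where it points, then replace it.
        echo "Scratch is a symlink to $(readlink "$target"); replacing it"
        rm "$target"
        mkdir "$target"
    elif [ -d "$target" ]; then
        echo "Scratch is already a normal directory; nothing to do"
    else
        echo "No Scratch found in $home_dir"
    fi
}
```

Running `fix_scratch` with no arguments checks your own home. Because `rm` is given the link name without a trailing slash, it removes only the symlink and never touches the files it points to.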
837+
838+
You can then go ahead and rsync the contents of oldscratch into Scratch so they are copied onto the
839+
new filesystem correctly. You cannot accidentally delete the contents of oldscratch since it is
840+
read-only.
841+
842+
If you have not rsynced your home yet, you could add the `--safe-links` option to rsync, which tells
843+
it to ignore any symbolic links that point outside the copied tree and any symlinks that are
844+
absolute paths. So when copying home, the symlink to `/lustre/scratch/scratch` should then be
845+
ignored:
846+
847+
```
848+
rsync --safe-links -a ~/oldhome/ ~/
849+
```
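
Before copying, it may help to see which symlinks in the old home use absolute paths, since those are among the ones `--safe-links` ignores. The sketch below assumes a `find` that supports `-lname` (GNU and BSD find both do). The function name `list_absolute_links` is hypothetical. It does not list relative symlinks that point outside the tree, which `--safe-links` also skips.

```shell
# Hypothetical helper: list symlinks under a directory whose target is
# an absolute path (starts with "/").
list_absolute_links() {
    # The trailing slash makes find descend into the directory even if
    # the argument is itself a symlink to it.
    find "$1/" -type l -lname '/*'
}
```

For example, `list_absolute_links ~/oldhome` should include the old Scratch link if it is still present there.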
850+
851+
We are catching up on the quota and shared space requests we have received.
791852
792853
793854
### Kathleen
@@ -1021,10 +1082,10 @@ This page outlines that status of each of the machines managed by the Research C
10211082
10221083
To use:
10231084
1024-
```
1025-
module load beta-modules
1026-
module load test-stack/2025-02
1027-
```
1085+
```
1086+
module load beta-modules
1087+
module load test-stack/2025-02
1088+
```
10281089
10291090
After that, when you type `module avail` there will be several sections of additional modules at
10301091
the top of the output.
@@ -1033,11 +1094,11 @@ module load test-stack/2025-02
10331094
we expect people to use directly visible and lots of their dependencies are hidden. These will
10341095
show up if you search for that package specifically, for example:
10351096
1036-
```
1037-
module avail libpng
1097+
```
1098+
module avail libpng
10381099
-------------------------- /shared/ucl/apps/spack/0.23/deploy/2025-02/modules/applications/linux-rhel7-cascadelake --------------------------
1039-
libpng/1.6.39/gcc-12.3.0-iopfrab
1040-
```
1100+
libpng/1.6.39/gcc-12.3.0-iopfrab
1101+
```
10411102
10421103
This module does not show up in the full list but is still installed. It has a hash at the end
10431104
of its name `-iopfrab` and this will change over time with different builds.
