lib: Make random number generation thread safe #6480

marisn · 2025-10-10T09:19:30Z

As random file generation keeps internal state in a static variable, it is prone to failures as multiple threads would attempt to update sate simultaneously. This PR adds a mutex lock around internal state update calls thus preventing clashes when called from multiple threads.

Contains code to be merged as PR #6479
Also contains necessary changes in Make/CMake to compile gis lib with pthread support.

Should fix issues observed in r.mapcalc after thread support was merged #5742

wenzeslaus · 2025-10-10T12:41:39Z

The code changes look straightforward. Do you have an idea about performance? I agree we need the change in any case, but at least knowing whether we need a better solution in the future would be good.

wenzeslaus · 2025-10-10T13:27:35Z

Good news. This makes the failing rand function tests in #6476 pass at least on Linux.

As random number generator tracks its internal state as a static variable, it is necessary to guard its updates with mutex to prevent misbehavour when multiple threads change values simultaneously

marisn · 2025-10-17T15:45:59Z

The code changes look straightforward. Do you have an idea about performance? I agree we need the change in any case, but at least knowing whether we need a better solution in the future would be good.

I haven't tested but locked part is small and should be fast (just update, no IO), thus I would expect impact on speed to be negligible.

wenzeslaus

I think the locking needs to expand to reading of the shared state, it is both write and read which need to be isolated.

More generally, I'm afraid we didn't finalize the r.mapcalc's rand behavior with multiple threads (and seeds). (Or did we?) If r.mapcalc is using these functions directly as before parallelization, the random numbers placed in the raster will depend on the order of threads, i.e., what rows will come first, but I did not review the relevant code now. (An ultimate fix there would be different "context" objects for the random numbers, per row (for maximum reproducibility) or per thread (for at least seed+nprocs reproducibility).

wenzeslaus · 2025-10-17T20:04:17Z

lib/gis/lrand48.c

 *
 * \param[in] seedval 32-bit integer used to seed the PRNG
+ *
+ * This function is thread-safe.


Something like the following should indicate that this applies only for certain builds. I don't know if you can actually have a combination when the thread safety matters while not having pthreds.

Suggested change

* This function is thread-safe.

* This function is thread-safe (if threading is enabled, i.e., `HAVE_PTHREAD` is defined).

or: ...if pthreads is available,...

Disagree. This makes no sense. Of course there is no thread safety when code is compiled without threads! This is a programmer documentation that is meant to signal other developers that it is safe to call this function from threaded code. Attempts to compile core GRASS without pthreads and calling from a module with pthreads should not be considered in documentation of each function.

wenzeslaus · 2025-10-17T20:23:04Z

lib/gis/lrand48.c

 void G_srand48(long seedval)
 {
    uint32 x = (uint32) * (unsigned long *)&seedval;

+#ifdef HAVE_PTHREAD
+    pthread_mutex_lock(&lrand48_mutex);
+#endif
+
    x2 = (uint16)HI(x);
    x1 = (uint16)LO(x);
    x0 = (uint16)0x330E;
    seeded = 1;
+
+#ifdef HAVE_PTHREAD
+    pthread_mutex_unlock(&lrand48_mutex);
+#endif
 }


Calling this from multiple threads will reset the global state. So, while the locking will prevent creation of a broken global state and it will prevent from reading a half-baked state, it will not work as expected. If you call this from multiple threads, they will replace the previous thread random state. Only the main thread should call this function and typically would call it once unless resetting the seed - that should be documented.

You are correct. I added a note to the documentation. Mutex guards should remain in place, but seed initialization always should be called only from main process/single thread. I called multiple times, it would result in "last wins" state.

wenzeslaus · 2025-10-17T20:45:32Z

lib/gis/lrand48.c

 static void G__next(void)
 {
+#ifdef HAVE_PTHREAD
+    pthread_mutex_lock(&lrand48_mutex);
+#endif
+


This approach only locks the advancing, so there will be no broken state. However, consider this scenario:

in Thread 1, mrand48 calls next

in Thread 1, next locks and does the advancing

in Thread 1, next unlocks and returns

in Thread 2, mrand48 calls next

in Thread 2, next locks and does the advancing

in Thread 2, mrand48 reads the static variable values

in Thread 1, mrand48 reads the static variable values which are the same ones as in Thread 2, so the Thread 1 values were never used

Alternatively, Thread 1 can be reading the half-baked state which is being overwritten by Thread 2.

Either the locking needs to happen at the level of mrand48-type functions, i.e., include both reading and writing (or lock separately from writing), or G__next (or its wrapper) needs to return the values rather than rely on the global state.

I prefer the solution which limits the global state and uses explicit return values (or even passes a state around), so my choice would be to return (pass) values from G__next. (In a more substantial rewrite scenario, an explicit "context" object would make the sharing of state explicit.)

You are spot-on about too narrow scope of mutex. I changed code to include also consumption of state inside the mutex.

The new implementation makes RNG serial – independently from number of threads calling those functions, number sequence always will be the same.

If G__next would return a state object, each thread would have its own state and, with an identical seed, result in an identical stream of random numbers – not good at all. If state is passed around between calls (e.g. G_lrand48(struct state)) we are even worse as then locking/sharing of state would have to be implemented by each caller itself and we still are facing thread synchronization problem. Did I miss something?

The new implementation makes RNG serial – independently from number of threads calling those functions, number sequence always will be the same.

Awesome, thanks. That's what I had in mind. Least changes in terms of code structure, although most mutex additions.

If G__next would return a state object, each thread would have its own state and, with an identical seed, result in an identical stream of random numbers – not good at all.

This requires starting the different states with different seeds. This works well for different stochastic replicates of the same simulation. It would require additional decisions on how to behave within one raster.

If state is passed around between calls (e.g. G_lrand48(struct state)) we are even worse as then locking/sharing of state would have to be implemented by each caller itself and we still are facing thread synchronization problem.

The mutex would have to be part of that state and access would have to be only through functions, basically an OOP encapsulation. In other words, every variable from the library would have to go to the state.

Edit: That is if we want the locking for the states, but my original idea was that independent states don't need locking because there is nothing shared between them, and they are simply not suitable to be used across threads. In other words, a not thread-safe API meant to be used with different instances in different threads which would be then a thread-safe usage pattern.

lib/gis/lrand48.c

wenzeslaus · 2025-10-17T20:59:16Z

I haven't tested but locked part is small and should be fast (just update, no IO), thus I would expect impact on speed to be negligible.

No IO, but it is called for every cell in r.mapcalc, no? Still better slow with overall faster code than a segfault, and rand() is already much slower than let's say a constant (I did not test recently). If it is slow, we could use a solution with an explicit state (context) which may allow us to avoid locking, while having a separate locking version of the API.

multiple threads As random number generation is guarded by a mutex, it makes generation a serial operation – always leading to the same sequence of numbers

marisn · 2025-10-20T07:31:47Z

This is the worst case scenario – code calls random number generation and discards produced numbers (thread creation + locking overhead):

Total random calls per run: 100,000,000
Runs per thread count: 5

Threads	Avg. Time (s)	Overhead vs. 1 Thread
1	0.8202	+0.0000s (+0.00%)
2	3.3390	+2.5187s (+307.07%)
4	3.2877	+2.4674s (+300.81%)
6	3.5505	+2.7303s (+332.86%)
8	4.3752	+3.5549s (+433.40%)
10	4.9465	+4.1263s (+503.05%)
12	5.7690	+4.9488s (+603.32%)
16	6.2315	+5.4113s (+659.71%)

marisn · 2025-10-20T07:43:43Z

No IO, but it is called for every cell in r.mapcalc, no? Still better slow with overall faster code than a segfault, and rand() is already much slower than let's say a constant (I did not test recently). If it is slow, we could use a solution with an explicit state (context) which may allow us to avoid locking, while having a separate locking version of the API.

RN generation now is always serial – outcome order is guaranteed. Consumption is another topic. I lean towards just stating that for fully reproducible outcome one should set parameters for single thread execution.
Generally it is out of scope of this PR and should be discussed separately with more people involved.

wenzeslaus · 2025-10-20T13:54:31Z

A random sequence can contain duplicates, so your test is failing, but it is because you don't have to (can't) have that requirement.

AssertionError: 10000 != 9996 : Duplicate values found in multi-threaded run, indicating a race condition.

I think only the comparison of sorted lists gives you a true idea of what happened.

wenzeslaus · 2025-10-20T14:23:05Z

...outcome order is guaranteed.

From library perspective, but never in the actual usage.

Consumption is another topic.

So I would say just don't overstate the outcome order guarantees in the doc.

I lean towards just stating that for fully reproducible outcome one should set parameters for single thread execution.

That would be my choice at this point as well.

Generally it is out of scope of this PR and should be discussed separately with more people involved.

I agree.

wenzeslaus · 2025-10-20T14:12:32Z

lib/gis/testsuite/test_lrand48.py

+"""Test of gis library lrand48 PRNG thread-safety
+
+@author Maris Nartiss
+@author Gemini


No need. Human author is, or at least should be, the owner of the code in ways which matter to us. Approach to copyright differs in between countries and I would not even venture into it.

If there is a important provenance info, it should go to the commit message (and/or PR description).

We will not be inviting Gemini to a code sprint or to join the development team, nor expecting more insights into a code it generated comparing to any other code.

wenzeslaus · 2025-10-20T14:12:59Z

lib/gis/testsuite/test_lrand48.py

+        # --- Define ctypes function signatures ---
+        G_srand48.argtypes = [ctypes.c_long]
+        G_srand48.restype = None
+
+        G_lrand48.argtypes = []
+        G_lrand48.restype = ctypes.c_long
+


Isn't this ctypesgen job? Isn't that already done?

wenzeslaus · 2025-10-20T14:19:50Z

lib/gis/testsuite/test_lrand48.py

+        G_lrand48.argtypes = []
+        G_lrand48.restype = ctypes.c_long
+
+        # --- 1. Single-threaded execution ---


While often seen in generated code, I think our experience is that these ASCII comment graphic is hard to maintain and inherently inconsistent (thinking about old code we cleaned in GRASS).

nilason · 2025-10-20T15:48:20Z

lib/gis/CMakeLists.txt

+find_package(Threads)
+
 set(grass_gis_DEFS "-DGRASS_VERSION_DATE=\"${GRASS_VERSION_DATE}\"")
+if(Threads_FOUND)
+  list(APPEND grass_gis_DEFS "HAVE_PTHREAD")
+endif()
+


All this should be replaced by the addition of check_target(Threads::Threads HAVE_PTHREAD) to this file:

grass/cmake/modules/CheckDependentLibraries.cmake

Line 222 in 76ee406

check_target(PROJ::proj HAVE_PROJ_H)

marisn requested review from echoix and nilason October 10, 2025 09:19

github-actions bot added C Related code is in C libraries CMake labels Oct 10, 2025

wenzeslaus changed the title ~~threads: make random number generation thread safe~~ lib: Make random number generation thread safe Oct 10, 2025

This was referenced Oct 10, 2025

tests: Add tests for error states in r.mapcalc #6476

Merged

build: define more typical HAVE_PTHREAD if pthreads library can be used #6479

Merged

lib: correctly guard random number generation when run threaded

5ccafac

As random number generator tracks its internal state as a static variable, it is necessary to guard its updates with mutex to prevent misbehavour when multiple threads change values simultaneously

marisn force-pushed the rand_thread branch from 7b85a2e to 5ccafac Compare October 17, 2025 15:33

wenzeslaus requested changes Oct 17, 2025

View reviewed changes

lib: ensure reproducible stream of random numbers when called from

2c4592d

multiple threads As random number generation is guarded by a mutex, it makes generation a serial operation – always leading to the same sequence of numbers

github-actions bot added Python Related code is in Python tests Related to Test Suite labels Oct 20, 2025

wenzeslaus reviewed Oct 20, 2025

View reviewed changes

nilason requested changes Oct 20, 2025

View reviewed changes

	* This function is thread-safe.
	* This function is thread-safe (if threading is enabled, i.e., `HAVE_PTHREAD` is defined).

Uh oh!

lib: Make random number generation thread safe #6480

Are you sure you want to change the base?

lib: Make random number generation thread safe #6480

Conversation

marisn commented Oct 10, 2025

Uh oh!

wenzeslaus commented Oct 10, 2025

Uh oh!

wenzeslaus commented Oct 10, 2025

Uh oh!

marisn commented Oct 17, 2025

Uh oh!

wenzeslaus left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wenzeslaus Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

wenzeslaus commented Oct 17, 2025

Uh oh!

marisn commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marisn commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wenzeslaus commented Oct 20, 2025

Uh oh!

wenzeslaus commented Oct 20, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

wenzeslaus Oct 20, 2025 •

edited

Loading

marisn commented Oct 20, 2025 •

edited

Loading

marisn commented Oct 20, 2025 •

edited

Loading