Conversation

cyliang368
Contributor

@cyliang368 commented Jul 26, 2025

This PR should solve #5776.

If we want to avoid conflicting writes to the same destination (stderr), the writes must be serialized.
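A minimal sketch of the idea, assuming the change wraps the stderr write in a named OpenMP critical section (the function and names below are illustrative, not the actual patch):

```c
#include <stdio.h>

/* Serialize concurrent progress writes: only one thread at a time may
 * enter the named critical section, so output cannot interleave. */
static void print_percent(int percent)
{
#ifdef _OPENMP
#pragma omp critical(G_percent_print)
#endif
    {
        fprintf(stderr, "%4d%%\b\b\b\b\b", percent);
        fflush(stderr);
    }
}
```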

@github-actions bot added the C (Related code is in C) and libraries labels Jul 26, 2025
@petrasovaa
Contributor

Thanks! There is probably no better solution, but I think this still needs a benchmark to check whether it has any impact on speed.

@marisn
Contributor

marisn commented Jul 27, 2025

This should not be merged, as:

  • it has a serious speed penalty

@cyliang368
Contributor Author

Thanks! There is probably no better solution, but I think this still needs a benchmark to check whether it has any impact on speed.

This should not be merged, as:

  • it has a serious speed penalty

The critical section covers only a very small part of the work, so I believe its impact should be negligible.

I simply ran v.surf.rst and r.univar a few times manually, and I could not tell whether the difference came from this change or from OS-level noise.

I would like to get some benchmarks on r.mapcalc after #5742 is merged.


@marisn Can you elaborate on why you think it does not solve #5776?

@cyliang368
Contributor Author

I ran the r.mapcalc benchmark on main and on this branch. The modification does not dramatically slow down the computation; it sometimes yields slightly less speedup, but more often yields slightly more. I attached some examples and simple scripts below, and the IPython notebook in the zip file stores all the comparison figures.

benchmark.zip

@petrasovaa could you test this on your end to double-check?

[Four benchmark comparison figures attached.]

@HuidaeCho
Member

I agree with @marisn because it can significantly slow down parallel performance. The benchmark above is just one of many possible cases, and the higher speedup is, I believe, just noise: a speedup from introducing critical sections is not theoretically possible, or at least cannot easily be explained.

@HuidaeCho
Member

@cyliang368 Could you test it again without calling G_percent() at all and compare the three?

@wenzeslaus
Member

I have doubts about the caller code (tool-level code) being correct even with this change. If you simply show the percentage from each thread, it will not reflect the overall progress; that can only be shown when the threads sync their work. If they don't, it is better not to show anything, or to resort to some heuristic like showing progress from the first thread only.


@marisn
Contributor

marisn commented Aug 28, 2025

You are correct. When multiple threads call G_percent, the output will not be correct, as each thread pushes the percent counter to its own work position within the whole work queue instead of to the overall progress (the aggregate of work done).
Removing G_percent from multithreaded code sounds like a radical idea, as we would then lose all progress reporting as more code is ported to use threads. A better approach would be to create a thread-safe G_percent_mt() that can be called instead (a rough sketch of what that could look like is below).
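A rough sketch of what such a function could look like, under the assumption that each call reports one finished work unit and aggregates into a shared counter (G_percent_mt() is the proposed name; the body below is an assumption, not an existing GRASS API):

```c
#include <stdio.h>

/* Hypothetical thread-safe progress reporter: each call records one
 * finished work unit, so the printed value is the aggregate progress,
 * independent of which thread finishes which iteration. */
void G_percent_mt(long total)
{
    static long done = 0; /* aggregate of finished work units */
    long now;

#ifdef _OPENMP
#pragma omp atomic capture
#endif
    now = ++done;

#ifdef _OPENMP
#pragma omp critical(G_percent_mt_print)
#endif
    {
        fprintf(stderr, "%4ld%%\b\b\b\b\b", now * 100 / total);
        fflush(stderr);
    }
}
```

A real implementation would also throttle the output (e.g., print only when the percentage actually changes) rather than writing on every call.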

@wenzeslaus
Member

Better, but an if condition checking for the first thread would be easier for 8.5, no?
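A minimal sketch of that heuristic, assuming a typical parallelized loop (process_row() is a placeholder for the real work):

```c
#include <grass/gis.h>
#ifdef _OPENMP
#include <omp.h>
#endif

void process_row(int row); /* the real per-row work, defined elsewhere */

void run(int nrows)
{
    int row;

#pragma omp parallel for schedule(static)
    for (row = 0; row < nrows; row++) {
        process_row(row);
#ifdef _OPENMP
        /* Only thread 0 reports: no locking needed, and the progress
         * shown is approximate by design. */
        if (omp_get_thread_num() == 0)
            G_percent(row, nrows, 2);
#else
        G_percent(row, nrows, 2);
#endif
    }
}
```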

@cyliang368
Contributor Author

cyliang368 commented Aug 28, 2025

I have doubts about the caller code (tool-level code) being correct even with this change. If you simply show the percentage from each thread, it will not reflect the overall progress; that can only be shown when the threads sync their work.

@wenzeslaus you are right. However, I would say that reporting from only one thread can make things worse. Say we specify nprocs=50 and the loop runs over i=0:100. Neither the OS nor OpenMP guarantees which thread takes which i, or which thread finishes first.

In the first batch (i=0:50), suppose the first thread is assigned i=24 but is actually the 37th thread to finish its iteration. It will still print 24%, which is not the correct progress (37%).

In the second batch (i=50:100), the first thread is assigned i=72. Regardless of which thread finishes last, we only ever see 72% printed by the first thread, which is incorrect.

Unless @marisn has a better idea, I think reporting from one thread makes things worse.


This problem does not come from G_percent itself. It is caused by how we feed a counter to G_percent. Currently, most modules pass the loop counter (e.g., i or j in for statements) to G_percent. Again, neither the OS nor OpenMP guarantees which thread takes which i, or which thread finishes first. If we want to fix this, every parallelized module needs a new counter in its loop, updated when a thread finishes a loop block, and G_percent should take that value, which reflects the correct overall progress (see the sketch below).
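A sketch of that fix, keeping the existing G_percent(n, total, incr) call but feeding it the captured value of a shared counter instead of the loop index (process_row() is a placeholder for the real work):

```c
#include <grass/gis.h>

void process_row(int row); /* the real per-row work, defined elsewhere */

void run(int nrows)
{
    int done = 0; /* overall progress, shared by all threads */
    int row;

#pragma omp parallel for schedule(dynamic)
    for (row = 0; row < nrows; row++) {
        int my_done;

        process_row(row);

        /* Count this iteration as finished and capture the aggregate. */
#pragma omp atomic capture
        my_done = ++done;

        /* Serialized as in this PR, but the reported value is now the
         * true overall progress, not this thread's loop index. */
#pragma omp critical(progress)
        G_percent(my_done, nrows, 2);
    }
}
```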


If we really care about computational performance and believe locks are a bottleneck, it might be better not to print anything at all. In fact, fprintf itself already takes a lock internally, so a lock was there before this PR.

However, I still believe fprintf and locking won't significantly affect the runtime, as the computational part occupies most of it. Further profiling may be needed to verify this.

@nilason
Contributor

nilason commented Aug 28, 2025

Technically, since we target C11, we can make use of _Atomic for the counter, if that could be part of a solution (sketched below).
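A sketch of that variant: an _Atomic aggregate counter makes the increment itself thread safe without any OpenMP pragma (the helper name is illustrative):

```c
#include <stdatomic.h>

static atomic_long done; /* aggregate of finished work units */

/* Call when a thread finishes one work unit; returns the aggregate
 * including this unit, suitable for passing on to G_percent(). */
long finish_one_unit(void)
{
    return atomic_fetch_add(&done, 1) + 1;
}
```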

@wenzeslaus
Member

If you are running it as part of a Python script or in a notebook, you will not see any of that carefully computed percentage. What about just showing a rotating thingy from the first thread to show the process is still alive?
