Skip to content

Commit 4f01199

Browse files
committed
Add grammar pass
1 parent 16174ae commit 4f01199

File tree

2 files changed

+71
-73
lines changed

2 files changed

+71
-73
lines changed

_data/authors.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -386,4 +386,4 @@ stevenoonan:
386386
linkedin: https://www.linkedin.com/in/embeddeddev/
387387
image: /img/author/stevenoonan.png
388388
blurb:
389-
is Principal Embedded Systems Engineer at Nomo International, Inc.
389+
is a Principal Embedded Systems Engineer at Nomo International, Inc.

_posts/2025-02-20-why-sleep-for-is-broken-on-esp32.md

Lines changed: 70 additions & 72 deletions
Original file line numberDiff line numberDiff line change
@@ -32,8 +32,8 @@ at version v4.4 of Espressif's
3232
(IDF). After about two days of work, the project compiled and ran in IDF v5.3.
3333
Except, something was not quite right. Things were sluggish. FreeRTOS's
3434
[system state API](https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos_idf.html#_CPPv420uxTaskGetSystemStatePC12TaskStatus_tK11UBaseType_tPC27configRUN_TIME_COUNTER_TYPE)
35-
indicated that one the two Xtensa LX6 processors on the ESP32 was pinned at 100%
36-
usage. The culprits were two high priority threads, both of which had a
35+
indicated that one of the two Xtensa LX6 processors on the ESP32 was pinned at
36+
100% usage. The culprits were two high-priority threads, both of which had a
3737
relatively small loop with a 10 millisecond sleep between each iteration.
3838

3939
Sleeping for 10 milliseconds should leave plenty of time to do other things.
@@ -42,7 +42,7 @@ Plus, this code worked fine in IDF v4 without an issue. What did IDF v5 do?
4242
## FreeRTOS on the ESP32
4343

4444
Let's go into some background first. Experienced embedded engineers will likely
45-
find parts of this section as review, but we need to be solid in our
45+
find parts of this section as a review, but we need to be solid in our
4646
understanding of the fundamentals before moving on.
4747

4848
### System Tick Period
@@ -104,14 +104,14 @@ period of 10 milliseconds. Exactly how much time the thread will be sleep is
104104
unknown to this code.
105105
106106
So to be clear: sleeping _"1 tick"_ does not mean it will sleep for one tick
107-
_period of time_. In fact, it will basically always sleep some amount less than
108-
one tick period, and sometimes it will basically not sleep at all.
107+
_period of time_. In fact, it will always sleep some amount less than one tick
108+
period, and sometimes it will not sleep at all.
109109
110110
So then, is calling `vTaskDelay(1)` in a small loop a problem? Not
111111
intrinsically. Like most things, it's fine if you know what you are doing. If
112112
you want to ensure a thread (technically called a
113113
[Task](https://www.freertos.org/Documentation/02-Kernel/02-Kernel-features/01-Tasks-and-co-routines/00-Tasks-and-co-routines)
114-
in FreeRTOS) exectues its logic is performed on every tick, this a way to do it.
114+
in FreeRTOS) executes its logic is performed on every tick, this a way to do it.
115115
There are no problems so long as the logic is not sensitive to tick jitter and
116116
its processing only requires a small fraction of the tick period.
117117

@@ -141,9 +141,9 @@ what we want, because passing `0` to `vTaskDelay()` does nothing at all.
141141

142142
Okay, but what if we _really_ need to wait for less than a tick period? We can
143143
[busy wait](https://en.wikipedia.org/wiki/Busy_waiting). The IDF has
144-
`esp_rom_delay_us()` which does just that. But this is an
144+
`esp_rom_delay_us()`, which does just that. But this is an
145145
[Internal API](https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/system/internal-unstable.html)
146-
which we don't have source code for, and it's functionality may change or be
146+
which we don't have source code for, and its functionality may change or be
147147
removed in the future. ESP does give us `usleep()` though, and that allows
148148
precise delays. The caveat here is that the FreeRTOS scheduler is still free to
149149
context switch out of the busy loop due to the important bit I said above about
@@ -152,7 +152,7 @@ the FreeRTOS scheduler (and perhaps all interrupts) for that period.
152152

153153
The other option is to use a free running timer and trigger an interrupt when it
154154
expires. The interrupt can signal FreeRTOS to wake the waiting task. This is
155-
great so long as your context switch overheads allow it, and you have a free
155+
great as long as your context switch overheads allow it and you have a free
156156
hardware timer to do it.
157157

158158
All of this is to say the following:
@@ -165,7 +165,7 @@ All of this is to say the following:
165165

166166
Alright then, so what does all this have to do with our issue related to a 10
167167
millisecond delay? Well, I am not actually using `vTaskDelay(1)` directly.
168-
Instead I am using C++ thread APIs, and therefore the actual code causing all
168+
Instead, I am using C++ thread APIs, and therefore, the actual code causing all
169169
the CPU usage is `std::this_thread::sleep_for(10ms)`.
170170

171171
Espressif provides
@@ -177,7 +177,7 @@ So how does something like an ESP32 MCU support C++ threads anyway?
177177

178178
### `std::thread`, POSIX Threads, and FreeRTOS Tasks, oh my!
179179

180-
Espressif's IDF compiles with GCC, and therefore uses
180+
Espressif's IDF compiles with GCC and therefore uses
181181
[The GNU C++ Library](https://gcc.gnu.org/onlinedocs/libstdc++/) (`libstdc++`).
182182
`libstdc++`'s implementation of `std::thread` is built on top of
183183
[`pthreads`](https://en.wikipedia.org/wiki/Pthreads), but it is up to the
@@ -195,7 +195,7 @@ If that's not enough, we also need to consider the
195195
[C standard library](https://en.wikipedia.org/wiki/C_standard_library). IDF uses
196196
[newlib](https://en.wikipedia.org/wiki/Newlib) and has custom platform
197197
implementations of its sleeping functions. Thankfully, `newlib` is not
198-
precompiled and you can easily find it's source code in the `components/newlib`
198+
precompiled and you can easily find its source code in the `components/newlib`
199199
directory of an IDF install.
200200

201201
My problem appears in IDF v5 because it upgraded GCC from 8.4 to 13.2, and
@@ -236,7 +236,7 @@ If you are following along in your own IDF project, you can find the
236236
`xtensa-esp32-elf/esp-2021r2-patch5-8.4.0/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/8.4.0/thread`
237237
(or similar, depending on the version).
238238
239-
IDF does not define `_GLIBCXX_USE_NANOSLEEP`, so, no `nanosleep()`. That
239+
IDF does not define `_GLIBCXX_USE_NANOSLEEP`, so no `nanosleep()`. That
240240
shouldn't be too surprising for an MCU implementation. Therefore, we go straight
241241
to `__sleep_for(__s, __ns)`.
242242
@@ -304,11 +304,11 @@ Let's clean it up by taking out the parts not compiled for the ESP32:
304304
305305
Pretty simple: get the number of seconds and call `sleep()`, then get the number
306306
of microseconds and call `usleep()`. Both of these are calls into `newlib`. We
307-
will get to those in a moment. First let's look at the IDF v5's `sleep_for()`.
307+
will get to those in a moment. First, let's look at the IDF v5's `sleep_for()`.
308308
309309
### IDF 5.x: `std::this_thread::sleep_for()`
310310
311-
Once again `sleep_for()` is a wrapper for `__sleep_for()`. Here is the newer
311+
Once again, `sleep_for()` is a wrapper for `__sleep_for()`. Here is the newer
312312
code
313313
([see online](https://github.com/gcc-mirror/gcc/blob/1c0ecf06c0c9f3268ba52d4027e9b78aae4acb51/libstdc%2B%2B-v3/src/c%2B%2B11/thread.cc#L235)):
314314
@@ -341,22 +341,21 @@ __sleep_for(chrono::seconds __s, chrono::nanoseconds __ns)
341341
}
342342
```
343343

344-
Now we have a `while` loop. In it `usleep()` is called, but afterwards an
345-
explicit check is made to determine if the minimal amount of time has actually
346-
gone by. The only way out of the loop is if `now` is at or after the `target`
347-
time.
344+
Now we have a `while` loop. In it, `usleep()` is called, but afterward, there is
345+
an explicit check to determine if the minimal amount of time has actually gone
346+
by. The only way out of the loop is if `now` is at or after the `target` time.
348347

349348
Why the change? This is because `nanosleep()` may return early if a signal is
350-
delivered to the thread during its sleep. The change is found on
351-
[commit cfef4c3](https://github.com/gcc-mirror/gcc/commit/cfef4c324ac300c0ad120f0fcee376159de84a0c)
352-
The relevant commit comment is the following:
349+
delivered to the thread during its sleep. The change is in
350+
[commit cfef4c3](https://github.com/gcc-mirror/gcc/commit/cfef4c324ac300c0ad120f0fcee376159de84a0c),
351+
with the relevant commit comment as the following:
353352

354353
```git
355354
Loop while sleep call is interrupted and until steady_clock
356-
shows requested duration has elapsed.
355+
shows requested duration has elapsed.
357356
```
358357

359-
ESP IDF doesn't support signals so the change should have no impact on the
358+
ESP IDF doesn't support signals, so the change should have no impact on the
360359
ESP32. Oh, but it does!
361360

362361
## IDF Newlib
@@ -403,7 +402,7 @@ int clock_gettime(clockid_t clock_id, struct timespec *tp)
403402
}
404403
```
405404
406-
Calling `esp_time_impl_get_time()` gets a value based on a free running hardware
405+
Calling `esp_time_impl_get_time()` gets a value based on a free-running hardware
407406
timer with at least 1 us accuracy. So that is how the monotonic clock is
408407
implemented on the ESP32. Note that this clock is completely independent of the
409408
timer that is controlling the FreeRTOS system tick interrupt. No problems there.
@@ -442,15 +441,15 @@ int usleep(useconds_t us)
442441
```
443442
444443
Remember when I said that `esp_rom_delay_us()` is an internal API and we should
445-
use `usleep()` instead? Well `usleep()` is just a wrapper to
444+
use `usleep()` instead? Well, `usleep()` is just a wrapper to
446445
`esp_rom_delay_us()` for short periods (e.g. less than one tick period). We can
447446
consider `usleep()` the "public" exposure of `esp_rom_delay_us()`, but only when
448447
the specified time is less than a system tick period. As mentioned above, this
449448
is a busy wait, and since it does not disable the scheduler, it still allows
450-
other threads _of equal or higher priority_ to run. So the timing represents a
449+
other threads _of equal or higher priority_ to run. So, the timing represents a
451450
guaranteed _minimum_ only. More importantly, if there are other threads of lower
452-
priority it will _not_ context switch during this busy time. It will just sit in
453-
the thread until the wait is over.
451+
priority, it will _not_ context switch during this busy time. It will just sit
452+
in the thread until the wait is over.
454453
455454
This is all good. A guaranteed minimum is how I expect `usleep()` to work.
456455
@@ -478,25 +477,24 @@ calculations would give us `vTaskDelay(2)`:
478477
4. But since we are using integers, it is just `2`
479478
480479
But what does `2` mean? It means two occurrences of the tick interrupt. Once
481-
again we can ask: when is the next tick interrupt from the moment that
480+
again, we can ask: when is the next tick interrupt from the moment that
482481
`vTaskDelay(2)` is called? The tick interrupt could be in 1 nanosecond all the
483482
way up to 10 milliseconds from now. After that, the second tick will be the
484483
system tick period of 10 milliseconds. So for a tick count of `2`, we will
485484
actually sleep between between 10 and 20 milliseconds.
486485
487-
Despite the fact that the comment says it is rounding up to compensate for the
488-
first tick potentially not blocking at all, the compensation does not account
489-
for the worse case minimal timing. In the example I gave, a 15 millisecond
490-
request will sometimes only sleep for 10 milliseconds. Likewise, a 10
491-
millisecond `usleep()` will sometimes sleep about 0 milliseconds. The greatest
492-
_potential_ differential comes with calling `usleep()` with a multiple of the
493-
tick period. In that case, the actual time slept may be short by an entire tick
494-
period.
495-
496-
Whether this "shorter than specified" time matters depends on requirements of
497-
course. In my experience a lot of application level logic would probably be fine
498-
with a "10 to 20" millisecond wait. Hardware driver logic and audio pipelines on
499-
the other have specific realtime characteristics where this would absolutely be
486+
Even though the comment says it is rounding up to compensate for the first tick
487+
potentially not blocking at all, the compensation does not account for the
488+
worse-case minimal timing. In the example I gave, a 15-millisecond request will
489+
sometimes only sleep for 10 milliseconds. Likewise, a 10 millisecond `usleep()`
490+
will sometimes sleep about 0 milliseconds. The greatest _potential_ differential
491+
comes with calling `usleep()` with a multiple of the tick period. In that case,
492+
the time spent may be short by an entire tick period.
493+
494+
Whether this "shorter than specified" time matters depends on requirements. In
495+
my experience, a lot of application-level logic would probably be fine with a
496+
"10 to 20" millisecond wait. On the other hand, hardware driver logic and audio
497+
pipelines have specific realtime characteristics where this would absolutely be
500498
a problem.
501499
502500
According to `man 3 sleep` and
@@ -512,19 +510,19 @@ sleep more if needed.
512510
> scheduling of other activity by the system.
513511
514512
The fact that ESP32's IDF `usleep()` can be short by up to one system tick
515-
period means doesn't follow the specification.
513+
period means it doesn't follow the specifications.
516514
517515
## Would You Like Precision or Efficiency?
518516
519517
Let's recap.
520518
521519
In IDF v4, calling `std::this_thread::sleep_for()` calls `usleep()` once. When
522-
`std::this_thread::sleep_for(10ms)` is executed, it calls `vTaskDelay(1)` and
520+
`std::this_thread::sleep_for(10ms)` is executed, it calls `vTaskDelay(1),` and
523521
the thread will sleep between 0 and 10 milliseconds. It will usually sleep for
524522
less than the time specified.
525523
526524
In IDF v5, calling `std::this_thread::sleep_for(10ms)` almost always calls
527-
`usleep()` _twice_. The first time will use `vTaskDelay(1)` and it will usually
525+
`usleep()` _twice_. The first time will use `vTaskDelay(1)`, and it will usually
528526
sleep for less than the time specified. Then, back in `libstdc++`
529527
`__sleep_for()`, the monotonic clock will be checked and it will be seen that
530528
some fractional component of 10 milliseconds remains, causing a second call to
@@ -539,48 +537,48 @@ Let's break it down further. For a 10 millisecond delay:
539537
540538
But once again, saying `vTaskDelay(1)` means "wait for 1 tick interrupt." That
541539
will almost always be less than 10 milliseconds from now. When `vTaskDelay(1)`
542-
returns, somewhere between 0.000001 to 10.000000 milliseconds will have actually
543-
transpired. The newer version of `__sleep_for()` in IDF v5 / GCC 13 will double
544-
check that the correct time has actually elapsed according to the monotonic
545-
clock. That check will fail, so it calls `usleep()` again, this time with the
546-
remainder of the 10 milliseconds. The specified time is less than a system tick
547-
period, so the blocking `esp_rom_delay_us()` is now called.
540+
returns, somewhere between 0.000001 and 10.000000 milliseconds will have
541+
actually transpired. The newer version of `__sleep_for()` in IDF v5 / GCC 13
542+
will double-check that the correct time has actually elapsed according to the
543+
monotonic clock. That check will fail, so it calls `usleep()` again, this time
544+
with the remainder of the 10 milliseconds. The specified time is less than a
545+
system tick period, so the blocking `esp_rom_delay_us()` is now called.
548546
549547
So what about time slicing? Even if `esp_rom_delay_us()` blocks, the FreeRTOS
550548
scheduler can switch to another task. Firstly, if this thread is of a higher
551549
priority, _no lower priorities will ever run_. But even if everything is of the
552550
same priority, the CPU will just switch back to the blocking call on the next
553-
round robin, continuing the blocking wait. Which in our current scenario is
551+
round robin, continuing the blocking wait. In our current scenario, this is
554552
horribly inefficient, unnecessary, and unexpected.
555553
556554
Any call to `sleep_for()` greater than the tick period has this problem because
557-
the tick interrupt is completely asynchronous to the `sleep_for()` call. This
558-
means when the scheduler returns from `vTaskDelay()` some random remainder of
559-
time will be done with `esp_rom_delay_us()` in order to sleep for the _precise_
560-
amount of time requested.
555+
the tick interrupt is asynchronous to the `sleep_for()` call. This means when
556+
the scheduler returns from `vTaskDelay()` some random remainder of time will be
557+
done with `esp_rom_delay_us()` in order to sleep for the _precise_ amount of
558+
time requested.
561559
562560
The new version of `sleep_for()` is much more precise, but it is at the cost of
563-
computing efficiency on the ESP32, because some fraction of the tick period will
561+
computing efficiency on the ESP32 because some fraction of the tick period will
564562
be _busy waited_ instead of yielded. That is very bad to do on an MCU.
565563
566564
Of course, none of this is transparent to the application code, and I doubt it
567565
was something intentional from Espressif. It is just a consequence of upgrading
568-
to a new version of GCC combined with the way IDF implements `usleep()`.
566+
to a new version of GCC combined with how IDF implements `usleep()`.
569567
570568
## Is `usleep()` broken in IDF v5?
571569
572570
Did Espressif actually implement `usleep()` wrong? Yes. It needs to be fixed.
573571
574-
For periods at or longer than system tick, `usleep()` can return before the
572+
For periods at or longer than the system tick, `usleep()` can return before the
575573
specified time. It shouldn't do that. It must error on the side of sleeping too
576574
long to ensure it _never_ sleeps to little. So yes, it is broken in my view.
577575
`stdlibc++` isn't to blame.
578576
579577
Since `usleep()` is sometimes short by 1 system tick period, we could just add
580-
another tick count to our calculations. That would cause `usleep()` to sometimes
581-
take longer, but never shorter than the time specified. This would prevent
582-
`__sleep_for()` from calling `usleep()` a second time with a faction of a system
583-
tick period, so no more blocking busy wait.
578+
another tick count to our calculations. That would sometimes cause `usleep()` to
579+
take longer, but never shorter than the specified time. This would prevent
580+
`__sleep_for()` from calling `usleep()` a second time with a fraction of a
581+
system tick period, so no more blocking busy wait.
584582
585583
We can be a little more sophisticated by taking a cue from the newer
586584
`__sleep_for()` and keep checking the monotonic clock. Something like this:
@@ -610,18 +608,18 @@ int usleep(useconds_t us)
610608
}
611609
```
612610

613-
With this approach, small sleep times are precise (as they always were), and
611+
With this approach, short sleep times are precise (as they always were), and
614612
longer sleep times are non-blocking and always as long or longer than the
615613
requested time. By testing the monotonic clock, the extra tick is only applied
616614
when needed.
617615

618616
I opened a [PR](https://github.com/espressif/esp-idf/pull/15132) with this
619617
change to IDF. Hopefully it gets approved.
620618

621-
Since I didn't want to work with a custom, patched IDF, I ended up replacing all
622-
calls to `std::this_thread::sleep_for()` with our own function. It has the same
623-
default signature, with the optional ability to specify a "sleep strategy." We
624-
can now force the custom `sleep_for()` to busy wait or to yield:
619+
Since I didn't want to work with a custom, patched IDF, I replaced all calls to
620+
`std::this_thread::sleep_for()` with our own function. It has the same default
621+
signature, with the option to specify a "sleep strategy." We can now force the
622+
custom `sleep_for()` to busy wait or to yield:
625623

626624
```c++
627625
enum class SleepStrategy {
@@ -675,12 +673,12 @@ while allowing more control over how to perform the sleep when using C++.
675673
I cut my teeth on bare metal C code where _everything_ was statically allocated.
676674
No `malloc()`. No floating point math because there was no FPU. Custom linker
677675
scripts. Debugging using GPIO pins and an oscilloscope. Using precalculated
678-
value tables to save a few microseconds in an ISR. We ran at 24 Mhz. At that
679-
time, C was luxurious because the "old" stuff was running at 1 Mhz usually in
676+
value tables to save a few microseconds in an ISR. We ran at 24 MHz. At that
677+
time, C was luxurious because the "old" stuff was running at 1 MHz, usually in
680678
assembler.
681679

682680
Back then common wisdom was that C++ was just not appropriate for
683-
microcontrollers. Of course that was a long time ago, before "modern C++" and
681+
microcontrollers. Of course, that was a long time ago, before "modern C++" and
684682
before MCUs were clocked at hundreds of MHz with L1 caching and multistage
685683
instruction pipelines.
686684

0 commit comments

Comments
 (0)