@@ -32,8 +32,8 @@ at version v4.4 of Espressif's
(IDF). After about two days of work, the project compiled and ran in IDF v5.3.
Except, something was not quite right. Things were sluggish. FreeRTOS's
[system state API](https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos_idf.html#_CPPv420uxTaskGetSystemStatePC12TaskStatus_tK11UBaseType_tPC27configRUN_TIME_COUNTER_TYPE)
- indicated that one the two Xtensa LX6 processors on the ESP32 was pinned at 100%
- usage. The culprits were two high priority threads, both of which had a
+ indicated that one of the two Xtensa LX6 processors on the ESP32 was pinned at
+ 100% usage. The culprits were two high-priority threads, both of which had a
relatively small loop with a 10 millisecond sleep between each iteration.
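
For reference, reading those per-task numbers looks roughly like the following
sketch. It is generic illustration code rather than the project's source, and it
assumes `configUSE_TRACE_FACILITY` and run-time stats are enabled in the FreeRTOS
configuration:

```c++
#include <cstdio>
#include <vector>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

// Dump the accumulated run time of every task. Comparing ulRunTimeCounter
// against the returned total gives a rough per-task CPU usage figure.
void dump_task_runtime()
{
    std::vector<TaskStatus_t> status(uxTaskGetNumberOfTasks());
    configRUN_TIME_COUNTER_TYPE total_runtime = 0;
    const UBaseType_t count = uxTaskGetSystemState(
        status.data(), static_cast<UBaseType_t>(status.size()), &total_runtime);

    for (UBaseType_t i = 0; i < count; ++i) {
        std::printf("%-16s %10lu\n", status[i].pcTaskName,
                    static_cast<unsigned long>(status[i].ulRunTimeCounter));
    }
}
```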

Sleeping for 10 milliseconds should leave plenty of time to do other things.
@@ -42,7 +42,7 @@ Plus, this code worked fine in IDF v4 without an issue. What did IDF v5 do?
## FreeRTOS on the ESP32

Let's go into some background first. Experienced embedded engineers will likely
- find parts of this section as review, but we need to be solid in our
+ find parts of this section to be review, but we need to be solid in our
understanding of the fundamentals before moving on.

### System Tick Period
@@ -104,14 +104,14 @@ period of 10 milliseconds. Exactly how much time the thread will be sleep is
unknown to this code.

So to be clear: sleeping _"1 tick"_ does not mean it will sleep for one tick
- _period of time_. In fact, it will basically always sleep some amount less than
- one tick period, and sometimes it will basically not sleep at all.
+ _period of time_. In fact, it will always sleep some amount less than one tick
+ period, and sometimes it will not sleep at all.
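
To make that concrete, here is a small illustrative sketch (my own, assuming the
default 100 Hz tick) that times `vTaskDelay(1)` against the IDF's microsecond
timer:

```c++
#include <cstdint>
#include <cstdio>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_timer.h"

// Time a single "1 tick" delay. The elapsed wall time is usually well under the
// 10 ms tick period, because the delay simply ends at the next tick interrupt.
void measure_one_tick_delay()
{
    const int64_t start_us = esp_timer_get_time();
    vTaskDelay(1);
    const int64_t elapsed_us = esp_timer_get_time() - start_us;
    std::printf("vTaskDelay(1) slept for %lld us\n",
                static_cast<long long>(elapsed_us));
}
```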

So then, is calling `vTaskDelay(1)` in a small loop a problem? Not
intrinsically. Like most things, it's fine if you know what you are doing. If
you want to ensure a thread (technically called a
[Task](https://www.freertos.org/Documentation/02-Kernel/02-Kernel-features/01-Tasks-and-co-routines/00-Tasks-and-co-routines)
- in FreeRTOS) exectues its logic is performed on every tick, this a way to do it.
+ in FreeRTOS) executes its logic on every tick, this is a way to do it.
There are no problems so long as the logic is not sensitive to tick jitter and
its processing only requires a small fraction of the tick period.
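
For example, a per-tick worker in that style might look like this generic sketch
(the work function is hypothetical):

```c++
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

static void do_work() { /* hypothetical application logic */ }

// A task that runs its logic on every tick. This is fine as long as do_work()
// needs only a small fraction of the tick period and does not care about jitter
// relative to the tick boundary.
void per_tick_task(void *)
{
    for (;;) {
        do_work();
        vTaskDelay(1);  // wake again on the next tick interrupt
    }
}
```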
@@ -141,9 +141,9 @@ what we want, because passing `0` to `vTaskDelay()` does nothing at all.

Okay, but what if we _really_ need to wait for less than a tick period? We can
[busy wait](https://en.wikipedia.org/wiki/Busy_waiting). The IDF has
- `esp_rom_delay_us()` which does just that. But this is an
+ `esp_rom_delay_us()`, which does just that. But this is an
[Internal API](https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/system/internal-unstable.html)
- which we don't have source code for, and it's functionality may change or be
+ which we don't have source code for, and its functionality may change or be
removed in the future. ESP does give us `usleep()` though, and that allows
precise delays. The caveat here is that the FreeRTOS scheduler is still free to
context switch out of the busy loop due to the important bit I said above about
@@ -152,7 +152,7 @@ the FreeRTOS scheduler (and perhaps all interrupts) for that period.

The other option is to use a free running timer and trigger an interrupt when it
expires. The interrupt can signal FreeRTOS to wake the waiting task. This is
- great so long as your context switch overheads allow it, and you have a free
+ great as long as your context switch overheads allow it and you have a free
hardware timer to do it.
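
On the ESP32, one way to sketch that idea is with the `esp_timer` one-shot API
(which itself multiplexes a hardware timer) plus a task notification. This is
illustrative only, with made-up names, and in real code the timer would be
created once and reused:

```c++
#include <cstdint>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_err.h"
#include "esp_timer.h"

// Timer callback: runs outside the sleeping task and simply wakes it up.
static void wake_task_cb(void *arg)
{
    xTaskNotifyGive(static_cast<TaskHandle_t>(arg));
}

// Block the calling task for `us` microseconds without busy waiting.
void delay_us_with_timer(uint64_t us)
{
    esp_timer_create_args_t args = {};
    args.callback = wake_task_cb;
    args.arg = xTaskGetCurrentTaskHandle();
    args.name = "sleep_timer";

    esp_timer_handle_t timer = nullptr;
    ESP_ERROR_CHECK(esp_timer_create(&args, &timer));
    ESP_ERROR_CHECK(esp_timer_start_once(timer, us));
    ulTaskNotifyTake(pdTRUE, portMAX_DELAY);  // sleep until the callback fires
    ESP_ERROR_CHECK(esp_timer_delete(timer));
}
```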

All of this is to say the following:
@@ -165,7 +165,7 @@ All of this is to say the following:

Alright then, so what does all this have to do with our issue related to a 10
millisecond delay? Well, I am not actually using `vTaskDelay(1)` directly.
- Instead I am using C++ thread APIs, and therefore the actual code causing all
+ Instead, I am using C++ thread APIs, and therefore, the actual code causing all
the CPU usage is `std::this_thread::sleep_for(10ms)`.
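
The shape of the offending code is roughly the following, a simplified stand-in
for the real threads rather than the actual project source:

```c++
#include <chrono>
#include <thread>

using namespace std::chrono_literals;

// Each busy thread boils down to a small amount of work followed by a 10 ms
// sleep. Under IDF v4 this loop idles nicely; under IDF v5 it can pin a core,
// for the reasons explored below.
void worker_loop()
{
    while (true) {
        // ... a small amount of real work goes here ...
        std::this_thread::sleep_for(10ms);
    }
}
```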

Espressif provides
@@ -177,7 +177,7 @@ So how does something like an ESP32 MCU support C++ threads anyway?

### `std::thread`, POSIX Threads, and FreeRTOS Tasks, oh my!

- Espressif's IDF compiles with GCC, and therefore uses
+ Espressif's IDF compiles with GCC and therefore uses
[The GNU C++ Library](https://gcc.gnu.org/onlinedocs/libstdc++/) (`libstdc++`).
`libstdc++`'s implementation of `std::thread` is built on top of
[`pthreads`](https://en.wikipedia.org/wiki/Pthreads), but it is up to the
@@ -195,7 +195,7 @@ If that's not enough, we also need to consider the
[C standard library](https://en.wikipedia.org/wiki/C_standard_library). IDF uses
[newlib](https://en.wikipedia.org/wiki/Newlib) and has custom platform
implementations of its sleeping functions. Thankfully, `newlib` is not
- precompiled and you can easily find it's source code in the `components/newlib`
+ precompiled and you can easily find its source code in the `components/newlib`
directory of an IDF install.

My problem appears in IDF v5 because it upgraded GCC from 8.4 to 13.2, and
@@ -236,7 +236,7 @@ If you are following along in your own IDF project, you can find the
`xtensa-esp32-elf/esp-2021r2-patch5-8.4.0/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/8.4.0/thread`
(or similar, depending on the version).

- IDF does not define `_GLIBCXX_USE_NANOSLEEP`, so, no `nanosleep()`. That
+ IDF does not define `_GLIBCXX_USE_NANOSLEEP`, so no `nanosleep()`. That
shouldn't be too surprising for an MCU implementation. Therefore, we go straight
to `__sleep_for(__s, __ns)`.
@@ -304,11 +304,11 @@ Let's clean it up by taking out the parts not compiled for the ESP32:

Pretty simple: get the number of seconds and call `sleep()`, then get the number
of microseconds and call `usleep()`. Both of these are calls into `newlib`. We
- will get to those in a moment. First let's look at the IDF v5's `sleep_for()`.
+ will get to those in a moment. First, let's look at IDF v5's `sleep_for()`.

### IDF 5.x: `std::this_thread::sleep_for()`

- Once again `sleep_for()` is a wrapper for `__sleep_for()`. Here is the newer
+ Once again, `sleep_for()` is a wrapper for `__sleep_for()`. Here is the newer
code
([see online](https://github.com/gcc-mirror/gcc/blob/1c0ecf06c0c9f3268ba52d4027e9b78aae4acb51/libstdc%2B%2B-v3/src/c%2B%2B11/thread.cc#L235)):
@@ -341,22 +341,21 @@ __sleep_for(chrono::seconds __s, chrono::nanoseconds __ns)
}
```

- Now we have a `while` loop. In it `usleep()` is called, but afterwards an
- explicit check is made to determine if the minimal amount of time has actually
- gone by. The only way out of the loop is if `now` is at or after the `target`
- time.
+ Now we have a `while` loop. In it, `usleep()` is called, but afterward, there is
+ an explicit check to determine if the minimal amount of time has actually gone
+ by. The only way out of the loop is if `now` is at or after the `target` time.

Why the change? This is because `nanosleep()` may return early if a signal is
- delivered to the thread during its sleep. The change is found on
- [commit cfef4c3](https://github.com/gcc-mirror/gcc/commit/cfef4c324ac300c0ad120f0fcee376159de84a0c)
- The relevant commit comment is the following:
+ delivered to the thread during its sleep. The change is in
+ [commit cfef4c3](https://github.com/gcc-mirror/gcc/commit/cfef4c324ac300c0ad120f0fcee376159de84a0c),
+ with the relevant commit comment being the following:

```git
Loop while sleep call is interrupted and until steady_clock
- shows requested duration has elapsed.
+ shows requested duration has elapsed.
```

- ESP IDF doesn't support signals so the change should have no impact on the
+ ESP IDF doesn't support signals, so the change should have no impact on the
ESP32. Oh, but it does!

## IDF Newlib
@@ -403,7 +402,7 @@ int clock_gettime(clockid_t clock_id, struct timespec *tp)
}
```

- Calling `esp_time_impl_get_time()` gets a value based on a free running hardware
+ Calling `esp_time_impl_get_time()` gets a value based on a free-running hardware
timer with at least 1 us accuracy. So that is how the monotonic clock is
implemented on the ESP32. Note that this clock is completely independent of the
timer that is controlling the FreeRTOS system tick interrupt. No problems there.
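
As a small illustration (my own sketch), the two time sources can be sampled
independently. `CLOCK_MONOTONIC` comes from the free-running timer just
described, while the tick count is advanced by the FreeRTOS tick interrupt:

```c++
#include <cstdio>
#include <time.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

// Print both clocks side by side; they advance independently of each other.
void print_clocks()
{
    timespec ts = {};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    const TickType_t ticks = xTaskGetTickCount();

    std::printf("monotonic: %lld.%09ld s, tick count: %lu\n",
                static_cast<long long>(ts.tv_sec), ts.tv_nsec,
                static_cast<unsigned long>(ticks));
}
```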
@@ -442,15 +441,15 @@ int usleep(useconds_t us)
```

Remember when I said that `esp_rom_delay_us()` is an internal API and we should
- use `usleep()` instead? Well `usleep()` is just a wrapper to
+ use `usleep()` instead? Well, `usleep()` is just a wrapper to
`esp_rom_delay_us()` for short periods (e.g. less than one tick period). We can
consider `usleep()` the "public" exposure of `esp_rom_delay_us()`, but only when
the specified time is less than a system tick period. As mentioned above, this
is a busy wait, and since it does not disable the scheduler, it still allows
- other threads _of equal or higher priority_ to run. So the timing represents a
+ other threads _of equal or higher priority_ to run. So, the timing represents a
guaranteed _minimum_ only. More importantly, if there are other threads of lower
- priority it will _not_ context switch during this busy time. It will just sit in
- the thread until the wait is over.
+ priority, it will _not_ context switch during this busy time. It will just sit
+ in the thread until the wait is over.

This is all good. A guaranteed minimum is how I expect `usleep()` to work.
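
A quick way to convince yourself of that minimum (an illustrative check, not code
from the project) is to time a sub-tick `usleep()` against the monotonic clock:

```c++
#include <cstdio>
#include <time.h>
#include <unistd.h>

// For requests below one tick period, usleep() is the esp_rom_delay_us() busy
// wait, so the measured time should never be less than the 500 us requested.
void check_short_usleep()
{
    timespec start = {}, end = {};
    clock_gettime(CLOCK_MONOTONIC, &start);
    usleep(500);  // well below a 10 ms tick period
    clock_gettime(CLOCK_MONOTONIC, &end);

    const long long elapsed_us =
        (end.tv_sec - start.tv_sec) * 1000000LL +
        (end.tv_nsec - start.tv_nsec) / 1000;
    std::printf("usleep(500) took %lld us\n", elapsed_us);
}
```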
@@ -478,25 +477,24 @@ calculations would give us `vTaskDelay(2)`:
4. But since we are using integers, it is just `2`

But what does `2` mean? It means two occurrences of the tick interrupt. Once
- again we can ask: when is the next tick interrupt from the moment that
+ again, we can ask: when is the next tick interrupt from the moment that
`vTaskDelay(2)` is called? The tick interrupt could be in 1 nanosecond all the
way up to 10 milliseconds from now. After that, the second tick will be the
system tick period of 10 milliseconds. So for a tick count of `2`, we will
actually sleep between 10 and 20 milliseconds.

- Despite the fact that the comment says it is rounding up to compensate for the
- first tick potentially not blocking at all, the compensation does not account
- for the worse case minimal timing. In the example I gave, a 15 millisecond
- request will sometimes only sleep for 10 milliseconds. Likewise, a 10
- millisecond `usleep()` will sometimes sleep about 0 milliseconds. The greatest
- _potential_ differential comes with calling `usleep()` with a multiple of the
- tick period. In that case, the actual time slept may be short by an entire tick
- period.
-
- Whether this "shorter than specified" time matters depends on requirements of
- course. In my experience a lot of application level logic would probably be fine
- with a "10 to 20" millisecond wait. Hardware driver logic and audio pipelines on
- the other have specific realtime characteristics where this would absolutely be
+ Even though the comment says it is rounding up to compensate for the first tick
+ potentially not blocking at all, the compensation does not account for the
+ worst-case minimal timing. In the example I gave, a 15-millisecond request will
+ sometimes only sleep for 10 milliseconds. Likewise, a 10-millisecond `usleep()`
+ will sometimes sleep about 0 milliseconds. The greatest _potential_ differential
+ comes with calling `usleep()` with a multiple of the tick period. In that case,
+ the time actually slept may be short by an entire tick period.
+
+ Whether this "shorter than specified" time matters depends on requirements. In
+ my experience, a lot of application-level logic would probably be fine with a
+ "10 to 20" millisecond wait. On the other hand, hardware driver logic and audio
+ pipelines have specific realtime characteristics where this would absolutely be
a problem.

According to `man 3 sleep` and
@@ -512,19 +510,19 @@ sleep more if needed.
> scheduling of other activity by the system.

The fact that ESP32's IDF `usleep()` can be short by up to one system tick
- period means doesn't follow the specification.
+ period means it doesn't follow the specification.

## Would You Like Precision or Efficiency?

Let's recap.

In IDF v4, calling `std::this_thread::sleep_for()` calls `usleep()` once. When
- `std::this_thread::sleep_for(10ms)` is executed, it calls `vTaskDelay(1)` and
+ `std::this_thread::sleep_for(10ms)` is executed, it calls `vTaskDelay(1)`, and
the thread will sleep between 0 and 10 milliseconds. It will usually sleep for
less than the time specified.

In IDF v5, calling `std::this_thread::sleep_for(10ms)` almost always calls
- `usleep()` _twice_. The first time will use `vTaskDelay(1)` and it will usually
+ `usleep()` _twice_. The first time will use `vTaskDelay(1)`, and it will usually
sleep for less than the time specified. Then, back in `libstdc++`
`__sleep_for()`, the monotonic clock will be checked and it will be seen that
some fractional component of 10 milliseconds remains, causing a second call to
@@ -539,48 +537,48 @@ Let's break it down further. For a 10 millisecond delay:

But once again, saying `vTaskDelay(1)` means "wait for 1 tick interrupt." That
will almost always be less than 10 milliseconds from now. When `vTaskDelay(1)`
- returns, somewhere between 0.000001 to 10.000000 milliseconds will have actually
- transpired. The newer version of `__sleep_for()` in IDF v5 / GCC 13 will double
- check that the correct time has actually elapsed according to the monotonic
- clock. That check will fail, so it calls `usleep()` again, this time with the
- remainder of the 10 milliseconds. The specified time is less than a system tick
- period, so the blocking `esp_rom_delay_us()` is now called.
+ returns, somewhere between 0.000001 and 10.000000 milliseconds will have
+ actually transpired. The newer version of `__sleep_for()` in IDF v5 / GCC 13
+ will double-check that the correct time has actually elapsed according to the
+ monotonic clock. That check will fail, so it calls `usleep()` again, this time
+ with the remainder of the 10 milliseconds. The specified time is less than a
+ system tick period, so the blocking `esp_rom_delay_us()` is now called.

So what about time slicing? Even if `esp_rom_delay_us()` blocks, the FreeRTOS
scheduler can switch to another task. Firstly, if this thread is of a higher
priority, _no lower priorities will ever run_. But even if everything is of the
same priority, the CPU will just switch back to the blocking call on the next
- round robin, continuing the blocking wait. Which in our current scenario is
+ round robin, continuing the blocking wait. In our current scenario, this is
horribly inefficient, unnecessary, and unexpected.

Any call to `sleep_for()` greater than the tick period has this problem because
- the tick interrupt is completely asynchronous to the `sleep_for()` call. This
- means when the scheduler returns from `vTaskDelay()` some random remainder of
- time will be done with `esp_rom_delay_us()` in order to sleep for the _precise_
- amount of time requested.
+ the tick interrupt is asynchronous to the `sleep_for()` call. This means when
+ the scheduler returns from `vTaskDelay()`, some random remainder of time will be
+ done with `esp_rom_delay_us()` in order to sleep for the _precise_ amount of
+ time requested.

The new version of `sleep_for()` is much more precise, but it is at the cost of
- computing efficiency on the ESP32, because some fraction of the tick period will
+ computing efficiency on the ESP32 because some fraction of the tick period will
be _busy waited_ instead of yielded. That is very bad to do on an MCU.

Of course, none of this is transparent to the application code, and I doubt it
was something intentional from Espressif. It is just a consequence of upgrading
- to a new version of GCC combined with the way IDF implements `usleep()`.
+ to a new version of GCC combined with how IDF implements `usleep()`.

## Is `usleep()` broken in IDF v5?

Did Espressif actually implement `usleep()` wrong? Yes. It needs to be fixed.

- For periods at or longer than system tick, `usleep()` can return before the
+ For periods at or longer than the system tick, `usleep()` can return before the
specified time. It shouldn't do that. It must err on the side of sleeping too
long to ensure it _never_ sleeps too little. So yes, it is broken in my view.
`libstdc++` isn't to blame.

Since `usleep()` is sometimes short by 1 system tick period, we could just add
- another tick count to our calculations. That would cause `usleep()` to sometimes
- take longer, but never shorter than the time specified. This would prevent
- `__sleep_for()` from calling `usleep()` a second time with a faction of a system
- tick period, so no more blocking busy wait.
+ another tick count to our calculations. That would sometimes cause `usleep()` to
+ take longer, but never shorter than the specified time. This would prevent
+ `__sleep_for()` from calling `usleep()` a second time with a fraction of a
+ system tick period, so no more blocking busy wait.
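
As a rough sketch of that simpler idea (illustrative only; the function name is
made up, and the real `usleep()` lives in IDF's newlib):

```c++
#include <cstdint>
#include <unistd.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_rom_sys.h"

// Round up by one extra tick for delays of a tick period or more, so the sleep
// may overshoot by up to one tick but can never come up short.
int usleep_never_short(useconds_t us)
{
    const useconds_t us_per_tick = portTICK_PERIOD_MS * 1000;
    if (us < us_per_tick) {
        esp_rom_delay_us(static_cast<uint32_t>(us));  // sub-tick: busy wait as before
    } else {
        vTaskDelay((us + us_per_tick - 1) / us_per_tick + 1);
    }
    return 0;
}
```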

We can be a little more sophisticated by taking a cue from the newer
`__sleep_for()` and keep checking the monotonic clock. Something like this:
@@ -610,18 +608,18 @@ int usleep(useconds_t us)
}
```

- With this approach, small sleep times are precise (as they always were), and
+ With this approach, short sleep times are precise (as they always were), and
longer sleep times are non-blocking and always as long or longer than the
requested time. By testing the monotonic clock, the extra tick is only applied
when needed.

I opened a [PR](https://github.com/espressif/esp-idf/pull/15132) with this
change to IDF. Hopefully it gets approved.

- Since I didn't want to work with a custom, patched IDF, I ended up replacing all
- calls to `std::this_thread::sleep_for()` with our own function. It has the same
- default signature, with the optional ability to specify a "sleep strategy." We
- can now force the custom `sleep_for()` to busy wait or to yield:
+ Since I didn't want to work with a custom, patched IDF, I replaced all calls to
+ `std::this_thread::sleep_for()` with our own function. It has the same default
+ signature, with the option to specify a "sleep strategy." We can now force the
+ custom `sleep_for()` to busy wait or to yield:

```c++
enum class SleepStrategy {
@@ -675,12 +673,12 @@ while allowing more control over how to perform the sleep when using C++.
I cut my teeth on bare metal C code where _everything_ was statically allocated.
No `malloc()`. No floating point math because there was no FPU. Custom linker
scripts. Debugging using GPIO pins and an oscilloscope. Using precalculated
- value tables to save a few microseconds in an ISR. We ran at 24 Mhz. At that
- time, C was luxurious because the "old" stuff was running at 1 Mhz usually in
+ value tables to save a few microseconds in an ISR. We ran at 24 MHz. At that
+ time, C was luxurious because the "old" stuff was running at 1 MHz, usually in
assembler.

Back then, common wisdom was that C++ was just not appropriate for
- microcontrollers. Of course that was a long time ago, before "modern C++" and
+ microcontrollers. Of course, that was a long time ago, before "modern C++" and
before MCUs were clocked at hundreds of MHz with L1 caching and multistage
instruction pipelines.