FreeRTOS vs Bare-metal comparison STM32

Hello. I am trying to compare the efficiency of a program written as bare metal with one based on FreeRTOS. While testing, I noticed that a function called from a task executes faster than the same function called outside a task (before the scheduler starts).

I use the STM32F429I-DISCO1 board with an STM32F429ZI MCU. The FreeRTOS configuration is generated by CubeMX. I use the arm-atollic-eabi-gcc compiler that ships with Atollic TrueSTUDIO. All optimization is turned off (-O0 flag). The FreeRTOS version is 9.0.0. I measure the function's execution time with the processor cycle counter, DWT->CYCCNT.

Here is my test code. I removed the unnecessary code generated by CubeMX.

~~~
#include <stdio.h>
#include "stm32f4xx.h"  /* CMSIS device header: DWT, CoreDebug */
#include "FreeRTOS.h"
#include "task.h"

void foo(void)
{
    DWT->CYCCNT = 0;  /* restart the cycle counter */
    for (uint32_t i = 0; i < 60000; i++) {
        asm("NOP"); asm("NOP"); asm("NOP"); asm("NOP");
        asm("NOP"); asm("NOP"); asm("NOP"); asm("NOP");
        asm("NOP"); asm("NOP"); asm("NOP");
    }
    /* CYCCNT is a 32-bit unsigned register, so print it as unsigned long */
    printf("%lu\r\n", (unsigned long)DWT->CYCCNT);
}

void task(void *param)
{
    foo();
    while (1) {}
}

int main(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  /* enable the DWT unit */
    DWT->CTRL &= ~0x00000001;  /* CYCCNTENA off */
    DWT->CTRL |= 0x00000001;   /* CYCCNTENA on */
    DWT->CYCCNT = 0;

    foo();
    xTaskCreate(task, "task", 100, NULL, 3, NULL);
    vTaskStartScheduler();
    while (1) {}
}
~~~

For foo() called from main() I get an execution time of 1440909 CPU cycles. For foo() called from the task I get 1333743 CPU cycles. Do you have any idea where that difference comes from?

PS. Sorry for my English.


I can't say why you see that difference, but what you are measuring is the raw performance of the MCU. It in no way shows how efficient a bare-metal program is compared with the same program running under an RTOS.

For example, a bare-metal program will probably execute state machines that waste time checking whether a state has changed, or poll inputs that have not changed. With an RTOS (any RTOS) you can create a completely event-driven system in which the scheduler allocates CPU time to a task only when the task actually has useful work to do (no state machines, no polling, etc.). That lets you get a LOT more work done in the same number of CPU cycles and squeeze a lot more functionality onto the same sized MCU.


At a guess, your printf does a malloc of its internal buffer the first time it is called. In other words, it has nothing to do with FreeRTOS or the tasks. Try two foo() calls one after the other at the start:

~~~
foo();
foo();
~~~

I am going to guess the first one is slow and the second matches the one called from the task 🙂


Thank you for the reply. The result is the same: the function called from the task executes faster, no matter how many times I call foo() outside the task ;/ To be sure that printf is not the source of the additional CPU cycles, I saved DWT->CYCCNT to a variable before calling printf.
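The approach described here, taking the counter snapshot before printf runs, can be sketched as follows. This is a minimal sketch and not the poster's actual code: `measure_cycles`, `cycles_elapsed`, `cycle_source_t` and the `fake_*` helpers are hypothetical names, and the counter is read through a function pointer so that the snippet also builds on a host PC. On the target, the counter source would simply return `DWT->CYCCNT`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical abstraction: a function that returns the current cycle count.
 * On the STM32 this would be a one-liner returning DWT->CYCCNT. */
typedef uint32_t (*cycle_source_t)(void);

/* Unsigned subtraction stays correct even if the 32-bit counter wraps
 * once between the two snapshots. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Snapshot the counter immediately before and after fn(), so that a later
 * printf (and anything it allocates) falls outside the measured window. */
static uint32_t measure_cycles(cycle_source_t now, void (*fn)(void))
{
    assert(now != NULL && fn != NULL);
    uint32_t start = now();
    fn();
    uint32_t end = now();
    return cycles_elapsed(start, end);
}

/* Host-only stand-ins (assumptions for illustration): a fake counter and a
 * fake workload that "costs" 100 cycles. */
static uint32_t fake_counter;
static uint32_t fake_now(void) { return fake_counter; }
static void fake_work(void) { fake_counter += 100u; }
```

The result would then be printed only after the measurement has finished, for example by storing the return value of `measure_cycles` in a variable and passing that variable to printf.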


Thank you for the quick reply 🙂 Of course, when I use polling the CPU utilization will be much higher. My bare-metal program is event and interrupt driven, so it should be more efficient than the FreeRTOS version, shouldn't it? With an RTOS the CPU has more work to do, e.g. context switches. The code above is a simple example of the problem. Of course, the results are real for this example.


As to why the function runs faster in the task than in main(), I can think of a couple of reasons. One is that the FreeRTOS start-scheduler function does some hardware initialization, and that might change something that affects processor speed; it might have turned on a hardware cache. Another is that the task stack might be in faster RAM than the stack used by main(), if your processor has external RAM.

As to which system will be more efficient, as with many things the answer is "It Depends". A home-grown bare-metal system may be slightly more efficient at switching between operations, but a context switch is actually fairly cheap, not much more than the cost of an ISR entry. The biggest difference is that a system like FreeRTOS will tend to give better response times, because you put less code in the ISRs and the tasks can be preemptive rather than having to wait for the current operation to reach a decision point. The need for the bare-metal system to have frequent stop points may even make the RTOS-based code more efficient.
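One way to test the "faster RAM" idea is to print the address of a local variable in main() and in the task and see which memory region each stack lives in. The sketch below is a host-buildable helper for that. The address ranges are assumptions based on the STM32F429 memory map (CCM RAM at 0x10000000, internal SRAM at 0x20000000, FMC SDRAM banks at 0xC0000000 and 0xD0000000) and should be verified against the reference manual; `classify_addr`, `region_name` and `ram_region_t` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    REGION_CCM,    /* core-coupled data RAM */
    REGION_SRAM,   /* internal SRAM1..SRAM3 */
    REGION_SDRAM,  /* external SDRAM behind the FMC */
    REGION_OTHER   /* flash, peripherals, anything else */
} ram_region_t;

/* Map an address to a RAM region. The ranges are assumed values for the
 * STM32F429; check them against the reference manual before relying on them. */
static ram_region_t classify_addr(uint32_t addr)
{
    if (addr >= 0x10000000u && addr <= 0x1000FFFFu) return REGION_CCM;
    if (addr >= 0x20000000u && addr <= 0x2002FFFFu) return REGION_SRAM;
    if (addr >= 0xC0000000u && addr <= 0xDFFFFFFFu) return REGION_SDRAM;
    return REGION_OTHER;
}

/* Human-readable name for logging. */
static const char *region_name(ram_region_t r)
{
    static const char *const names[] = { "CCM", "SRAM", "SDRAM", "other" };
    assert((unsigned)r < sizeof names / sizeof names[0]);
    return names[r];
}
```

On the target, one could declare a local variable in main() and another in the task, cast each address to `uint32_t`, and print `region_name(classify_addr(...))` for both. If the two stacks land in the same region, the RAM-speed explanation becomes less likely and alignment or bus effects would be the next thing to look at.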


Thank you. You are probably right, because when I test a single operation like 'i++' inside and outside the task, the difference is about 1 CPU cycle, so it looks like the processor needs more time to fetch/store the variable.
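As a cross-check, the two cycle counts reported at the top of the thread can be turned into a per-iteration figure. The sketch below only does that arithmetic; `extra_millicycles_per_iter` is a hypothetical helper name, and the inputs are the numbers quoted in the original post.

```c
#include <assert.h>
#include <stdint.h>

/* Extra cycles per loop iteration, scaled by 1000 so the calculation stays
 * in integer arithmetic (no floating point needed on the target). */
static uint32_t extra_millicycles_per_iter(uint32_t slow, uint32_t fast,
                                           uint32_t iterations)
{
    assert(iterations != 0u && slow >= fast);
    return (uint32_t)(((uint64_t)(slow - fast) * 1000u) / iterations);
}
```

With the posted numbers, (1440909 - 1333743) / 60000 is about 1.79 extra cycles per iteration, or roughly 24.0 versus 22.2 cycles per pass through the loop. That is in the same range as the roughly one-cycle difference seen for a single 'i++', which fits the idea that each access to the loop variable on the stack costs slightly more outside the task.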