STM32F4 with FPU
Hi!
I just got my Discovery board and would like to try out the FPU. Has anyone written a port yet? Or is there a time estimate for when it will be officially supported?
I just had a quick look at the architecture manual. It seems that FreeRTOS would have to store the entire state of the FPU, adding at least 32×4 bytes (are all 32 FPU registers in use by compilers? Seems to be an awful lot!). Perhaps I’ll give it a try myself.
STM32F4 with FPU
A lot of thought and work has already gone into supporting the Cortex-M4F, but support is not yet officially available. Note that if you have the FPU turned off then the standard Cortex-M3 port will work fine, but having the FPU turned on is much more complex than you might imagine.
The easy option, if you wish to do it yourself, is to set the FPU related registers to save and restore the FPU context automatically on each interrupt. This is horrendously inefficient with the VFP architecture of the M4F, especially when you consider that only a few tasks will ever use the FPU. Only half the context can be saved automatically, so the other half has to be done manually.
Another option is to allow tasks to register themselves as FPU context users, then manually save the FPU context for just those tasks. That is a little more efficient, but will still result in FPU contexts being saved unnecessarily sometimes.
Another extreme is to attempt to use the lazy save mechanism of the FPU (note lazy save is turned on by default). If you do that, then you have an extremely complex problem to implement, and if interrupts use the FPU too (they might if they are doing something like motor control) then there are a dozen corner cases to take care of once interrupts start nesting that are near impossible to test.
Yet another option is to perform a software lazy save.
Etc. Etc.
Also a word of warning: take extreme care to set up your compiler such that it does not randomly use FPU registers as temporary registers in tasks that are not themselves using the FPU. Some do that, unless special non-default command line options are used.
Have fun.
Regards.
STM32F4 with FPU
Hi again!
I think I got my port up and running.. please find it here:
https://github.com/thomask77/FreeRTOS_ARM_CM4F
Before I started, I did some performance measurements. As you said, the time for a full FPU state save/restore is quite long. A pair of vpush {s0-s31}/vpop {s0-s31} takes around 400 ns on my STM32F407 @ 168 MHz.
On the other hand, that translates to just ~68 cycles, which is not that bad at all if you consider the overall performance gain of the FPU vs. software emulation.
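The conversion from time to cycles can be checked with a small helper. This is only a sketch: the function name ns_to_cycles is made up for illustration, and the 400 ns and 168 MHz figures are the ones quoted in the post. With rounding to nearest, 400 ns comes out at about 67 cycles, in line with the rough figure given above.

```c
#include <stdint.h>

/* Convert a duration in nanoseconds to core clock cycles, rounded to the
   nearest cycle. The name and signature are illustrative only. */
uint32_t ns_to_cycles(uint32_t ns, uint32_t core_hz)
{
    /* cycles = ns * Hz / 1e9; a 64-bit intermediate avoids overflow. */
    uint64_t scaled = (uint64_t)ns * (uint64_t)core_hz;
    return (uint32_t)((scaled + 500000000ULL) / 1000000000ULL);
}
```

For example, ns_to_cycles(400u, 168000000u) gives the cost of the full save/restore pair, and ns_to_cycles(200u, 168000000u) the lazy-save penalty mentioned later in the post.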
Still, I don’t want to have the performance hit for things like serial-port or motor-control interrupts. So I’ll leave the hardware lazy-save mode enabled.
Without an OS switching tasks, the CPU will just do the right thing anyways:
The AAPCS says that s0-s15 are used as scratch registers, so they’re automatically (lazy)-saved on exception entry. s16-s31 are saved by the compiler. There is a performance hit of ~200ns for entry/exit if the lazy save is actually triggered. For interrupts without FPU instructions there is no additional overhead.
The only time when all registers must be saved and restored is for a task switch. This will take about 400ns longer than without FPU.
I added the extended stack frame registers to pxPortInitialiseStack, vPortSVCHandler and xPortPendSVHandler. Additionally, vPortSVCHandler marks the stack frame as an extended frame (Bit 4, LR/EXC_RETURN value).
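To give a feel for what the extended frame costs in stack space, here is a rough word count per saved context. It is a sketch, not code from the port: the enum and function names are invented for illustration, and the register groups follow the ARMv7-M exception model (the hardware stacks r0-r3, r12, lr, pc and xPSR, plus s0-s15, FPSCR and one reserved word when the frame is extended; the switch code saves r4-r11 and s16-s31). Any extra word the port itself stores is not counted.

```c
#include <stdint.h>

enum {
    BASIC_HW_WORDS = 8,   /* r0-r3, r12, lr, pc, xPSR (stacked by hardware) */
    FPU_HW_WORDS   = 18,  /* s0-s15, FPSCR, one reserved word (stacked by hardware) */
    CORE_SW_WORDS  = 8,   /* r4-r11 (saved by the context switch code) */
    FPU_SW_WORDS   = 16   /* s16-s31 (saved by the context switch code) */
};

/* Bytes of stack needed to hold one saved task context.
   Hypothetical helper, for illustration only. */
uint32_t context_bytes(int uses_fpu)
{
    uint32_t words = BASIC_HW_WORDS + CORE_SW_WORDS;
    if (uses_fpu) {
        words += FPU_HW_WORDS + FPU_SW_WORDS;
    }
    return words * (uint32_t)sizeof(uint32_t);
}
```

Under these assumptions a task with an FPU context needs 200 bytes per saved context instead of 64, which is worth keeping in mind when sizing task stacks.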
I must warn that the code is _not_ yet fully tested! Use at your own risk!
Have fun,
Thomas Kindler <mail_cm4@t-kindler.de>
STM32F4 with FPU
Hi!
In the meantime, I’ve improved my port. Actually, it was simpler than I thought.. compared to the normal Cortex-M3 port, very few additions were required.
https://github.com/thomask77/FreeRTOS_ARM_CM4F
Here’s the README:
This is the second version of my FreeRTOS port for ARM Cortex M4 cores with FPU support.
It does now support both FPU and non-FPU tasks, and tries to only save the necessary registers.
To achieve this, the EXC_RETURN value of a task (stored in the LR register during exceptions, esp. in the PendSV handler) is saved on its stack. The FPU registers are saved or restored only if bit 4 of the EXC_RETURN value indicates an extended stack frame.
See the ARM architecture manual, B1-653 for more details.
If a task uses the FPU, it will automatically set the CONTROL.FPCA bit. No special user interaction or task registration is required.
This port is also fully compatible with the FPU lazy-save feature (which is enabled by default).
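The decision described in the README boils down to a single bit test. The sketch below shows it in C for readability; the real port makes this check in the assembly handlers, and the macro and function names here are invented for illustration. On ARMv7-M, bit 4 of EXC_RETURN is 0 when the stacked frame is an extended (FPU) frame and 1 for a basic frame.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 4 of EXC_RETURN: 1 = basic frame, 0 = extended frame with FPU state.
   The macro name is hypothetical. */
#define EXC_RETURN_BASIC_FRAME_BIT (1UL << 4)

/* Returns true if the task that was interrupted has an FPU context,
   i.e. s16-s31 must be saved and restored along with r4-r11. */
bool frame_has_fpu_context(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_BASIC_FRAME_BIT) == 0u;
}
```

As an example, 0xFFFFFFFD (return to thread mode on the process stack, basic frame) yields false, while 0xFFFFFFED (same, but extended frame) yields true.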
Have fun,
Thomas Kindler <mail_cm4@t-kindler.de>
STM32F4 with FPU
Hi and thanks for your work Thomas!
When I try to use your port.c in an Eclipse/Yagarto environment, I run into problems:
'Building file: ../FreeRTOS/portable/port.c'
'Invoking: ARM Yagarto Windows GCC C Compiler'
arm-none-eabi-gcc -DUSE_STDPERIPH_DRIVER -DUSE_STM32F4_DISCOVERY -DSTM32F4XX -I"E:INDIGOYAG-FreeRTOS-123FreeRTOSinclude" -I"E:INDIGOYAG-FreeRTOS-123LibrariesSTM32F4xx_StdPeriph_Driversrc" -I"E:INDIGOYAG-FreeRTOS-123FreeRTOSportable" -I"E:INDIGOYAG-FreeRTOS-123LibrariesCMSISInclude" -I"E:INDIGOYAG-FreeRTOS-123LibrariesDeviceSTM32F4xxInclude" -I"E:INDIGOYAG-FreeRTOS-123LibrariesSTM32F4xx_StdPeriph_Driverinc" -I"E:INDIGOYAG-FreeRTOS-123src" -I"E:INDIGOYAG-FreeRTOS-123Utilities" -O0 -Wall -Wa,-adhlns="FreeRTOS/portable/port.o.lst" -c -fmessage-length=0 -MMD -MP -MF"FreeRTOS/portable/port.d" -MT"FreeRTOS/portable/port.d" -mcpu=cortex-m4 -mthumb -g3 -gdwarf-2 -o "FreeRTOS/portable/port.o" "../FreeRTOS/portable/port.c"
C:UsersGLAppDataLocalTempcczJmjjA.s: Assembler messages:
C:UsersGLAppDataLocalTempcczJmjjA.s:389: Error: selected processor does not support Thumb mode `vstmdbeq r0!,{s16-s31}'
C:UsersGLAppDataLocalTempcczJmjjA.s:390: Error: instruction not allowed in IT block -- `stmdb r0!,{r14}'
C:UsersGLAppDataLocalTempcczJmjjA.s:406: Error: selected processor does not support Thumb mode `vldmiaeq r0!,{s16-s31}'
C:UsersGLAppDataLocalTempcczJmjjA.s:407: Error: instruction not allowed in IT block -- `ldmia r0!,{r4-r11}'
make: *** [FreeRTOS/portable/port.o] Error 1
What compiler version are you using? Which options do you pass to avoid problems like that?
Gregor
STM32F4 with FPU
Note that FreeRTOS V7.1.0 has two basic Cortex-M4F ports now, one for IAR and one for Keil. GCC is the next on the hit list.
The errors seem to be telling you that GCC is not expecting floating point instructions to be present. I have not tried using GCC with an M4F yet, but looking at your command line, and your output I would suggest that either you need to define the CPU as Cortex-M4F rather than just Cortex-M4 (not all Cortex-M4s have a floating point unit), or that you need to manually tell GCC that a hardware floating point unit is being used via a separate command line option.
That assumes the version of GCC you are using supports an M4F, of course.
Regards.
STM32F4 with FPU
Hi!
I’m using the codesourcery toolchain with the following options:
-mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=softfp
Keep in mind that the FPU is single precision only. So you should use sqrtf() instead of sqrt() to prevent double precision emulation calls.
You should also try
-fsingle-precision-constant
to treat floating-point literals as single precision. Otherwise, a term like x = x * 0.123 will call a double precision library function (alternatively, write 0.123f, which I find quite awkward).
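The literal issue is easy to see in plain C, independent of any compiler flag. In this sketch (function names invented for illustration) the first function multiplies by a double literal, so the float argument is promoted and the multiply is done in double precision; the second stays in single precision throughout. The two size helpers just make the type of the intermediate result visible.

```c
#include <stddef.h>

/* 0.123 is a double literal: x is promoted and the product is a double,
   which on a single-precision FPU means a software library call. */
float scale_with_double_literal(float x)
{
    return (float)(x * 0.123);
}

/* 0.123f is a float literal: the product stays in single precision. */
float scale_with_float_literal(float x)
{
    return x * 0.123f;
}

/* Size of each intermediate product, to show the promotion. */
size_t size_of_double_product(void) { return sizeof(1.0f * 0.123); }
size_t size_of_float_product(void)  { return sizeof(1.0f * 0.123f); }
```

Both functions return nearly the same value; the difference is in the code the compiler has to generate for the multiply.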
have fun!
STM32F4 with FPU
Hi!
I have tried out your Cortex-M4F port ver 0.2. with the STM32F4-Discovery board.
I use Mentor CodeSourcery Lite GCC compiler (2011.09-69-arm-none-eabi). I can compile your code, with the compiler flags:
-mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=softfp
But I have a problem with the xTaskCreate function: my program hangs in it. Here is the relevant part of my program:
portBASE_TYPE task_create_LED;
task_create_LED = xTaskCreate( prvLEDTask, ( signed char * ) "Led", configMINIMAL_STACK_SIZE, NULL, mainLED_TASK_PRIORITY, NULL );
if (task_create_LED == pdPASS) printf(" LED Task Created!\r\n");
else printf(" LED Task Create FAILED! Err. Code: %u!\r\n",task_create_LED);
configMINIMAL_STACK_SIZE is 256
mainLED_TASK_PRIORITY is ( tskIDLE_PRIORITY + 1 )
If I look deeper with an SWD debugger, the program hangs in the xTaskGenericCreate function in task.c, at this line:
/* Check the alignment of the initialised stack. */
portALIGNMENT_ASSERT_pxCurrentTCB( ( ( ( unsigned long ) pxNewTCB->pxTopOfStack & ( unsigned long ) portBYTE_ALIGNMENT_MASK ) == 0UL ) );
Can you look deeper into your code? With the official Cortex-M3 port (without FPU) it works well.
Best Regards!
cd334
STM32F4 with FPU
Hi!
I have forgotten:
If I can help (further setup, makefile, code, or something else important), please write to me.
I will send you the details. Thank you! Best Regards!
cd334
STM32F4 with FPU
I know the port you are using is not the official FreeRTOS port, but I think if you update to the FreeRTOS V7.1.0 code (and use the same contributed port layer as you are now), then you might find the problem doesn’t exist.
To know if there really is a problem, set a break point on entry to a task (before the task function prologue assembly code manipulates the stack pointer to create a stack frame for the task function), then check to see if the stack pointer is 8 byte aligned.
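When reading the stack pointer off the debugger, the alignment test is just a mask on the low three bits of the address. The helpers below are a sketch with invented names; the mask value 0x7 corresponds to an 8 byte alignment requirement, which is what the assert in tasks.c checks via portBYTE_ALIGNMENT_MASK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits that must be zero for 8 byte alignment. Hypothetical name. */
#define STACK_ALIGNMENT_MASK ((uintptr_t)0x7u)

/* True if the address is a multiple of 8. */
bool is_8_byte_aligned(uintptr_t addr)
{
    return (addr & STACK_ALIGNMENT_MASK) == 0u;
}

/* Round an address down to the previous 8 byte boundary, as is usually
   done for the top of a stack that grows downwards. */
uintptr_t align_down_8(uintptr_t addr)
{
    return addr & ~STACK_ALIGNMENT_MASK;
}
```

A stack pointer such as 0x20001004 fails the test, and align_down_8 would move it to 0x20001000.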
Regards.
STM32F4 with FPU
Hi!
Thank you for your help!
I know that is an unofficial Cortex-M4F port.
I am waiting for the official Cortex-M4F GCC port. When will you release it? :)
I had some free time, so I thought I would try out the FPU with FreeRTOS.
I use the latest V7.1.0 version of FreeRTOS. I have checked what you said, and yes, the stack pointer is not 8 byte aligned.
When I set, in the new portmacro.h,
#define portBYTE_ALIGNMENT 4
the unofficial port works with FPU.
What is the significance of the alignment setting? What happens if I leave it at 4? Or must it be 8 on the STM32F407?
Best Regards!
cd334
STM32F4 with FPU
What is the significance of the alignment setting? What happens if I leave it at 4? Or must it be 8 on the STM32F407?
You probably won’t notice any problems with it at four until you use 64 bit numbers, or use a library function that makes assumptions about how 64 bit numbers are stored. The most common symptom is getting an incorrect value for a printf() with a floating point modifier.
Regards.
STM32F4 with FPU
Hi!
I just uploaded a (really) minimal demo project for my port:
https://github.com/thomask77/STM32F4_demo
have fun,
Thomas Kindler
STM32F4 with FPU
Is there any word on when/if this unofficial port will be made official? Or if there will be an official port sometime in the nearish future?
Cheers,
Sasha
STM32F4 with FPU
If I run with a single task, the assert in vTaskSwitchContext() fails (the configASSERT line in the excerpt below, originally shown in bold). More specifically, if I remove the comments from the task create below, everything will work.
void DebugUART::Start()
{
    // Init the Debug UART then start the task.
    SerialDebugUARTInit();
    // xTaskCreate( vDebugUARTOutputTask, (signed char *) "DebugUART", configMINIMAL_STACK_SIZE,
    //              NULL, mainDEBUG_UART_TASK_PRIORITY, &hDebugOutputTask );
}
My solution was simply to run with two tasks. I don’t know if this is a bug or something I am doing wrong. By the way, thank you for the port. The STM32F4 series seems very nice in many respects. It is nice to have a FreeRTOS port for it.
void vTaskSwitchContext( void )
{
    .
    .
    .
        while( listLIST_IS_EMPTY( &( pxReadyTasksLists[ uxTopReadyPriority ] ) ) )
        {
            configASSERT( uxTopReadyPriority );
            --uxTopReadyPriority;
        }
        /* listGET_OWNER_OF_NEXT_ENTRY walks through the list, so the tasks of the
        same priority get an equal share of the processor time. */
        listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB, &( pxReadyTasksLists[ uxTopReadyPriority ] ) );
        traceTASK_SWITCHED_IN();
    }
}
John