ST STM32H745 Dual Core AMP Demo
Core to core communication using Message Buffers

STM32H745I Discovery Board

This page documents a simple Asymmetric Multi Processing (AMP) core to core communication demo implemented using FreeRTOS message buffers. It is accompanied by a separate article that describes some of the internal implementation details.

The demo is preconfigured to run on the STM32H745I Discovery Board and to build with the IAR compiler and Embedded Workbench IDE. The STM32H745 has one ARM Cortex-M4 core and one ARM Cortex-M7 core. Both cores run the same ARMv7-M FreeRTOS port.

Embedded Workbench provides a productive, feature-rich development environment; it ships with a fully thread aware FreeRTOS kernel plug-in and enables both MCU cores to be debugged simultaneously.

IMPORTANT! Notes on using the FreeRTOS ARMv7-M (Cortex-M4 and M7) port

Please read all the following points before using this RTOS port.

Source code organisation
The demo application functionality
Building and running the RTOS demo application
Debugging the demo application - STLink
Debugging the demo application - I-jet
RTOS configuration and usage details

Also see the FAQ "My application does not run, what could be wrong?".

Source Code Organization

The FreeRTOS zip file download contains the source code for all the FreeRTOS ports, and every demo application. That means it contains many more files than are required to use the FreeRTOS STM32H745I dual core AMP demo. See the Source Code Organization page for information on the zip file's directory structure. The IAR Embedded Workbench workspace for this demo is located in the FreeRTOS/Demo/CORTEX_M7_M4_AMP_STM32H745I_Discovery_IAR directory. The project in the workspace contains two configurations, one for the Cortex-M4 core and one for the Cortex-M7 core.

The STM32H745 Dual Core Demo Application

Functionality

Message buffers are used to pass an ASCII representation of an incrementing number (so "0", followed by "1", followed by "2", etc.) from a single 'sending' RTOS task (or 'thread') that runs on the Arm Cortex-M7 core to two "receiving" RTOS tasks running on the Arm Cortex-M4 core. The string changes length as the number of digits increases. There are two data message buffers, one for each receiving task. To distinguish between the receiving tasks one is assigned the task number 0, and the other task number 1.

Hardware topology: the RTOS running on both cores in an AMP multicore configuration.

The Cortex-M7 task sits in a loop sending the ASCII strings to each Cortex-M4 task. If a receiving task receives the next expected value in the sequence it prints its task number to the UART. If a receiving task receives anything else, or its attempt to receive data times out, then it hits an assert() that prints an error message to the UART before stopping all further processing on the Cortex-M4 core. The following pseudocode snippets demonstrate the structure of the sending and receiving tasks respectively.

SendingTask()
{
    for ever
    {
        Generate the next string in the sequence

        /* The message buffers become full, so a block time is used
        on each send. */
        Send the generated string to the first message buffer
        Send the generated string to the second message buffer
    }
}

Simplified pseudocode showing the structure of the sending task
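
The "generate the next string" step of the sending task can be sketched in plain C as follows. This is a minimal illustration rather than the demo's own source: the function name xGenerateNextString and the buffer size demoMAX_STRING_LEN are assumptions made for this example, and the call that writes the string to a message buffer is left out so the snippet builds without the FreeRTOS headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical size: room for a 32-bit value in decimal (10 digits) plus
the terminating NUL. */
#define demoMAX_STRING_LEN    ( 11 )

/* Writes the decimal ASCII form of *pulNextValue into pcBuffer, increments
the counter, and returns the string length excluding the NUL.  The returned
length is the number of bytes a sender would write to the message buffer. */
size_t xGenerateNextString( char * pcBuffer,
                            size_t xBufferSize,
                            uint32_t * pulNextValue )
{
int iLength;

    iLength = snprintf( pcBuffer, xBufferSize, "%lu", ( unsigned long ) *pulNextValue );

    /* snprintf() returns the length it wanted to write, so a value at or
    above xBufferSize would mean the output was truncated. */
    assert( ( iLength > 0 ) && ( ( size_t ) iLength < xBufferSize ) );

    ( *pulNextValue )++;

    return ( size_t ) iLength;
}
```

Starting from zero, successive calls produce "0", "1", "2" and so on. The returned length grows from 1 to 2 when the value reaches 10, which is why the messages vary in length.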

ReceivingTask()
{
    for ever
    {
        Read next message from the message buffer

        /* Failed asserts print an error and stop execution. */
        configASSERT( Received message is next expected in sequence );
        Write task number (0 or 1) to the UART
    }
}

Simplified pseudocode showing the structure of the receiving tasks
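
The check made inside configASSERT() by a receiving task could look something like the sketch below. It is an assumption about one reasonable way to do the comparison, not the demo's actual code: the function name xIsNextExpectedMessage and the macro demoMAX_STRING_LEN are hypothetical, and the received bytes are passed in directly so the snippet builds without the FreeRTOS headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical size: room for a 32-bit value in decimal (10 digits) plus
the terminating NUL. */
#define demoMAX_STRING_LEN    ( 11 )

/* Returns 1 if the xReceivedLength bytes at pcReceived are the decimal ASCII
form of ulExpectedValue, otherwise 0.  A length of zero, which is what a
receive that timed out would report, never matches. */
int xIsNextExpectedMessage( const char * pcReceived,
                            size_t xReceivedLength,
                            uint32_t ulExpectedValue )
{
char cExpected[ demoMAX_STRING_LEN ];
int iExpectedLength;

    iExpectedLength = snprintf( cExpected, sizeof( cExpected ), "%lu", ( unsigned long ) ulExpectedValue );

    /* The received bytes are not assumed to be NUL terminated, so compare
    the lengths first and then the raw bytes. */
    if( ( iExpectedLength <= 0 ) || ( ( size_t ) iExpectedLength != xReceivedLength ) )
    {
        return 0;
    }

    return memcmp( pcReceived, cExpected, xReceivedLength ) == 0;
}
```

A receiving task would keep its own expected counter, pass the result of this check to configASSERT(), and increment the counter after each successful match.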

When executing correctly, a stream of "0"s will be written to the UART by the receiving task assigned task number 0, with one 0 written each time that receiving task receives the next expected message in the sequence. Likewise, a stream of "1"s will be written to the UART by the receiving task assigned task number 1. The demo runs as fast as it can, with the time taken to output characters to the UART being the limiting factor. The control message buffer (see the Implementation Details section below) becomes full because the sending task runs on the Cortex-M7 core, which is clocked at twice the speed of the Cortex-M4 core, and because the sending task is not throttled by the speed of the UART.

UART output when executing the demo

Implementation Details

The separate article that accompanies this demo provides a detailed explanation.

Building and running the RTOS demo application

Important note: The project will not build if the directory structure is different to the directory structure used in official FreeRTOS zip file releases.

To build and run the demo application:

Ensure the parts database included with your IAR Embedded Workbench for ARM installation includes the STM32H745. At the time of writing it was necessary to manually download the STM32Cube Package for STM32H7 to obtain a patch that updates the IAR installation. This will no longer be necessary if you are using the latest IAR tools.
The demo outputs UART data via USB connector CN14 (marked STLink) on the STM32H745I Discovery Board. Connect a USB cable between port CN14 on the STM32H745I Discovery Board and the host computer (the computer that will be used to view the UART output) and power the board up in order for the USB (virtual) COM port to enumerate on the host. There are several power options, set by the JP8 bank of jumpers.
Using a dumb serial terminal such as Tera Term on the host computer, connect to whichever COM port was enumerated when the Discovery Board was connected, and set the port to 115200 baud, no parity bits, 8 data bits, and 1 stop bit. An easy way to find the COM port number is to see which port numbers are provided as options in the dumb terminal both with and without the STM32H745I Discovery Board powered up.
Open FreeRTOS/Demo/CORTEX_M7_M4_AMP_STM32H745I_Discovery_IAR/Project.eww from within the IAR Embedded Workbench IDE (or simply double click the file to open it in Embedded Workbench).
Use the drop down at the top of the Workspace window to select the configuration for the Cortex-M4 core.
Select "Make" from the "Project" menu to build the project (or just press F7).
Open the project options dialog by right clicking the project in the Workspace window and selecting "Options" from the pop up menu.
Select your debug interface from the "Debugger" category in the options dialog box. I tested using both the built in STLink and an external I-jet.
Still in the "Debugger" category in the options dialog, select the category specific to your debug interface and ensure the options are set to "connect under reset" and use the SWD (as opposed to the JTAG) interface.
Finally in the "Debugger" category, and in preparation for debugging in Embedded Workbench (see below), select the "Plugins" tab and ensure the fully thread aware FreeRTOS kernel plug-in is selected. Also select the WITTENSTEIN StateViewer plug-ins if you have installed them separately.
Select "Download->Download Active Project" from the "Project" menu to program the Cortex-M4 core.
Repeat the steps above, but this time for the Cortex-M7 core (so start by using the drop down at the top of the Workspace window to select the configuration for the Cortex-M7 core).
Press the reset button on the STM32H745I Discovery Board and view the output in the dumb terminal. If all is well you will see a rapid stream of 1's and 0's scrolling up the terminal window.

To debug the demo application using the built in STLink debug interface:

Follow the instructions above to build and run the application, ensuring STLink is selected as the debug interface, and that the STLink USB connector CN14 on the STM32H745 Discovery Board is connected to the host computer.
ST provide an application note describing how to configure the STLink debug settings in the project options to enable dual core debugging. At the time of writing, the project options only enable a single core to be debugged at a time. Dual core debugging should be possible if you use EWARM V8.40.1 or higher and configure the debug options as shown in the images below:

STLink settings required for dual core debugging in the Cortex-M4 project

STLink settings required for dual core debugging in the Cortex-M7 project
With the Cortex-M7 project selected as the active project, select "Download and Debug" from the "Project" menu. A message printed to the UART by the Cortex-M4 core should show the Cortex-M4 core was also reset. The debugger should break at the start of the application running on the Cortex-M7 core, from where you can step through the code, set breakpoints, inspect variables, etc. as normal.
Note that if the application is set running and the Cortex-M7 core is then stopped in the debugger while the Cortex-M4 core is left running, the receiving tasks on the Cortex-M4 core will recognise that messages have stopped arriving from the Cortex-M7 core and hit an assert(). To prevent that, set the value of the xShortBlockTime variable in prvM4CoreTasks() to portMAX_DELAY so it does not time out. prvM4CoreTasks() is implemented in the Cortex-M4 main.c file.
Select "Task List" from the "FreeRTOS" menu to open the fully thread aware FreeRTOS plug-in windows.
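
One way to make the xShortBlockTime change easy to toggle is a compile time switch, sketched below. This is only an illustration under stated assumptions: demoDEBUG_FRIENDLY_BLOCK_TIME, xGetReceiveBlockTime() and the 100 ms timeout are hypothetical, and the TickType_t, portMAX_DELAY, configTICK_RATE_HZ and pdMS_TO_TICKS() definitions are simplified stand-ins so the snippet builds without the FreeRTOS headers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins so the sketch builds on its own.  In a real project these come
from the FreeRTOS headers and FreeRTOSConfig.h. */
typedef uint32_t TickType_t;
#define portMAX_DELAY                 ( ( TickType_t ) 0xffffffffUL )
#define configTICK_RATE_HZ            ( ( TickType_t ) 1000 )
#define pdMS_TO_TICKS( xTimeInMs )    ( ( TickType_t ) ( ( ( uint64_t ) ( xTimeInMs ) * configTICK_RATE_HZ ) / 1000U ) )

/* Hypothetical switch: set to 1 while halting cores in the debugger. */
#ifndef demoDEBUG_FRIENDLY_BLOCK_TIME
    #define demoDEBUG_FRIENDLY_BLOCK_TIME    1
#endif

/* Returns the block time a receiving task would use when reading from its
message buffer. */
TickType_t xGetReceiveBlockTime( void )
{
    #if ( demoDEBUG_FRIENDLY_BLOCK_TIME == 1 )
        /* Wait without a timeout, so a halted Cortex-M7 core cannot cause
        a receive timeout on the Cortex-M4 core. */
        return portMAX_DELAY;
    #else
        /* Hypothetical short timeout of 100 ms. */
        return pdMS_TO_TICKS( 100 );
    #endif
}
```

Note that in FreeRTOS a block time of portMAX_DELAY only blocks indefinitely when INCLUDE_vTaskSuspend is set to 1 in FreeRTOSConfig.h.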

To debug the demo application using an I-jet (both cores together):

Follow the instructions above to build and run the application, ensuring I-jet is selected as the debug interface.
With the Cortex-M7 project selected as the active project, open the project options again.
Select the "Multicore" tab in the project options "Debugger" category, then configure the tab as shown below, using FreeRTOS/Demo/CORTEX_M7_M4_AMP_STM32H745I_Discovery_IAR/Project.eww as the slave workspace and FreeRTOS/Demo/CORTEX_M7_M4_AMP_STM32H745I_Discovery_IAR/Project.ewp as the slave project (the same workspace and project as used for the Cortex-M7 core, just a different configuration).
After closing the project options dialog select "Download and Debug" from the "Project" menu. The [master] instance of Embedded Workbench will program and then start a debug session for the Cortex-M7 core. A second [slave] instance of Embedded Workbench will open automatically and do the same for the Cortex-M4 core. The master and slave instances of Embedded Workbench are synchronized so you can now start, stop and debug each core individually or start, stop and debug both cores at the same time. See IAR Multicore Debugging on the IAR website for more details.
Note that if the application is set running and the Cortex-M7 core is then stopped in the debugger while the Cortex-M4 core is left running, the receiving tasks on the Cortex-M4 core will recognise that messages have stopped arriving from the Cortex-M7 core and hit an assert(). To prevent that, set the value of the xShortBlockTime variable in prvM4CoreTasks() to portMAX_DELAY so it does not time out. prvM4CoreTasks() is implemented in the Cortex-M4 main.c file.
Select "Task List" from the "FreeRTOS" menu to open the fully thread aware FreeRTOS plug-in windows.

Configuration and Usage Details

RTOS port specific configuration

Configuration items specific to the Cortex-M4 build are contained in FreeRTOS/Demo/CORTEX_M7_M4_AMP_STM32H745I_Discovery_IAR/CM4/include/FreeRTOSConfig.h, and configuration items specific to the Cortex-M7 build are contained in FreeRTOS/Demo/CORTEX_M7_M4_AMP_STM32H745I_Discovery_IAR/CM7/include/FreeRTOSConfig.h. The constants defined in these files can be edited to suit your application. In particular:

configTICK_RATE_HZ
This sets the frequency of the RTOS tick interrupt. The supplied value of 1000Hz is useful for testing the RTOS kernel functionality but is faster than most applications need. Lowering the frequency will improve efficiency.
configKERNEL_INTERRUPT_PRIORITY and configMAX_SYSCALL_INTERRUPT_PRIORITY
See the RTOS kernel configuration documentation for full information on these configuration constants.
configLIBRARY_LOWEST_INTERRUPT_PRIORITY and configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY
Whereas configKERNEL_INTERRUPT_PRIORITY and configMAX_SYSCALL_INTERRUPT_PRIORITY are full eight bit un-shifted values, defined to be used as raw numbers directly in the ARM Cortex-M NVIC registers, configLIBRARY_LOWEST_INTERRUPT_PRIORITY and configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY are equivalents that are defined using just the 4 priority bits implemented in the STM32H7 NVIC. These values are provided because the CMSIS library function NVIC_SetPriority() requires the un-shifted 4 bit format.
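
The relationship between the two formats can be sketched as a shift by the number of unimplemented priority bits. This is a minimal illustration that assumes the 4 implemented priority bits stated above; the function name ucLibraryToRawPriority and the macro demoPRIO_BITS are made up for this example and are not taken from the demo.

```c
#include <assert.h>
#include <stdint.h>

/* Number of priority bits implemented in the NVIC (4 on the STM32H7, as
stated in the text above). */
#define demoPRIO_BITS    ( 4U )

/* Converts an un-shifted library priority (0 to 15 when 4 bits are
implemented) into the full eight bit value, in which the implemented bits
occupy the most significant end of the byte. */
uint8_t ucLibraryToRawPriority( uint8_t ucLibraryPriority )
{
    return ( uint8_t ) ( ( ucLibraryPriority << ( 8U - demoPRIO_BITS ) ) & 0xffU );
}
```

Under these assumptions a library priority of 15 corresponds to the raw value 0xF0, and a library priority of 5 (a value chosen purely for illustration) corresponds to 0x50.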

Attention please!: See the page dedicated to setting interrupt priorities on ARM Cortex-M devices. It is also recommended to assign all priority bits as preemption priority bits, and none as sub-priority bits, as is done in the demo project by the function call:

HAL_NVIC_SetPriorityGrouping( NVIC_PRIORITYGROUP_4 );

Each port #defines 'BaseType_t' to equal the most efficient data type for that processor. This port defines BaseType_t to be of type long.

Interrupt service routines

Unlike many FreeRTOS ports, interrupt service routines that cause a context switch have no special requirements, and can be written as per the compiler documentation. The macro portYIELD_FROM_ISR() can be used to request a context switch from within an interrupt service routine.

Note that portYIELD_FROM_ISR() will leave interrupts enabled.

The following source code snippet is provided as an example. The interrupt uses a direct to task notification to synchronise with a task (not shown), and calls portYIELD_FROM_ISR to ensure the interrupt returns directly to the task.

#include "FreeRTOS.h"
#include "task.h"

/* Handle of the task to notify.  The task itself is not shown, and the
handle is assumed to be defined elsewhere in the application. */
extern TaskHandle_t xTaskToNotify;

void Dummy_IRQHandler( void )
{
BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    /* Clear the interrupt if necessary. */
    Dummy_ClearITPendingBit();

    /* This interrupt does nothing more than demonstrate how to synchronise a
    task with an interrupt.  A task notification is used for this purpose.  Note
    xHigherPriorityTaskWoken is initialised to pdFALSE. */
    vTaskNotifyGiveFromISR( xTaskToNotify, &xHigherPriorityTaskWoken );

    /* If the task with handle xTaskToNotify was blocked waiting for the notification
    then sending the notification will have removed the task from the Blocked
    state.  If the task left the Blocked state, and if the priority of the task
    is higher than the current Running state task (the task that this interrupt
    interrupted), then xHigherPriorityTaskWoken will have been set to pdTRUE
    internally within vTaskNotifyGiveFromISR().  Passing pdTRUE into the
    portYIELD_FROM_ISR() macro will result in a context switch being pended to
    ensure this interrupt returns directly to the unblocked, higher priority,
    task.  Passing pdFALSE into portYIELD_FROM_ISR() has no effect. */
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}

Only FreeRTOS API functions that end in "FromISR" can be called from an interrupt service routine - and then only if the priority of the interrupt is less than or equal to that set by the configMAX_SYSCALL_INTERRUPT_PRIORITY configuration constant (or configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY).
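
On ARM Cortex-M cores a numerically lower priority value represents a logically higher priority, so the rule above can be confusing in practice. The sketch below expresses it as a numeric comparison on raw (shifted) eight bit values. It is an illustration only: xCanCallFromISRApi() is a hypothetical helper, and 0x50 is an example limit rather than the value used by the demo.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example limit, in the raw (shifted) eight bit format. */
#define demoMAX_SYSCALL_INTERRUPT_PRIORITY    ( 0x50U )

/* Returns 1 if an interrupt with the given raw priority may call API
functions that end in "FromISR", otherwise 0.  A logically lower or equal
priority means a numerically greater or equal value on Cortex-M. */
int xCanCallFromISRApi( uint8_t ucRawInterruptPriority )
{
    return ucRawInterruptPriority >= demoMAX_SYSCALL_INTERRUPT_PRIORITY;
}
```

With the example limit of 0x50, interrupts at raw priorities 0x50 through 0xF0 may use the "FromISR" API, whereas an interrupt at 0x40 or 0x00 is logically above the limit and must not call any FreeRTOS API function.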

Resources used by FreeRTOS

FreeRTOS requires exclusive use of the SysTick and PendSV interrupts. SVC number #0 is also used.

Switching between the pre-emptive and co-operative RTOS kernels

Set the definition configUSE_PREEMPTION within FreeRTOSConfig.h to 1 to use pre-emption or 0 to use co-operative. The full demo application may not execute correctly when the co-operative RTOS scheduler is selected.

Compiler options

As with all the ports, it is essential that the correct compiler options are used. The best way to ensure this is to base your application on the provided demo application files.

Memory allocation

Source/Portable/MemMang/heap_4.c is included in the ARM Cortex-M7 and ARM Cortex-M4 configurations to provide the memory allocation required by the RTOS kernel. Please refer to the Memory Management section of the API documentation for full information.

Miscellaneous

Note that vPortEndScheduler() has not been implemented.
