FreeRTOS+TCP on STM32F7

Greetings,

I’ve evaluated the TCP stack from FreeRTOS+TCP version 160919 on the STM32F746 MCU together with FreeRTOS 9.0.0. Using the STM32F4 demo as a base, I’ve ported the driver layer to the STM32F7. Unfortunately this wasn’t all that straightforward, partly because it took some time to understand how the ST-supplied HAL Ethernet driver had been altered in the F4 demo, and because I wanted to work on the F7 I couldn’t just use the FreeRTOS-modified F4 HAL drivers as is. I ended up modifying the glue layer between +TCP and HAL quite a bit so that no HAL changes would be required.

At the end of this process I was stuck with a silly error that causes intermittent TX frame drops. As far as I can tell this is an F7-specific issue which seems to have hit other users as well. My findings are documented towards the end of this thread: https://community.st.com/thread/31587. The conclusion is that there appears to be a race condition inside the HAL driver, so I ended up having to apply a small patch to the ST HAL layer.

With my HAL patch in place I have enabled the D-cache on the F7. This cache requires some fairly specific MPU setup to ensure cache coherency. Atmel has a nice presentation on this topic for Cortex-M7; even though it is a different MCU, their reasoning behind the GMAC buffer coherency holds true for the STM32F7 as well, as far as I can tell: http://atmel.force.com/support/servlet/fileField?id=0BEG000000002X7. What it comes down to is setting up the linker control file and the MPU for a non-cacheable memory area to store the MAC buffers. I’m using the BufferAllocation_1.c allocator, but haven’t spent the time on a true zero-copy scheme yet.

To test the stack I ended up using the iperf client/server from this thread: https://sourceforge.net/p/freertos/discussion/382005/thread/211fa5fa/3f27/attachment/iperfplustcp.zip.
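As an illustration of the non-cacheable MAC buffer area mentioned above, here is a minimal sketch of how the attribute word for such an MPU region could be assembled. It only computes the ARMv7-M `MPU_RASR` value on paper; the function names are made up for this example, and the choice of full access, shareable and execute-never is an assumption, not the setup actually used in this port.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR bit positions. */
#define RASR_ENABLE       ( 1UL << 0 )
#define RASR_SIZE_POS     1U
#define RASR_S            ( 1UL << 18 )  /* shareable */
#define RASR_TEX_POS      19U
#define RASR_AP_POS       24U
#define RASR_XN           ( 1UL << 28 )  /* execute never */

#define AP_FULL_ACCESS    3UL            /* read/write, privileged and unprivileged */

/* Hypothetical helper: SIZE field encoding, region size = 2^(SIZE+1) bytes. */
uint32_t ulRegionSizeField( uint32_t ulSizeBytes )
{
    uint32_t ulLog2 = 0;

    assert( ulSizeBytes >= 32U );                            /* smallest MPU region */
    assert( ( ulSizeBytes & ( ulSizeBytes - 1U ) ) == 0U );  /* must be a power of two */

    while( ( 1UL << ulLog2 ) < ulSizeBytes )
    {
        ulLog2++;
    }

    return ulLog2 - 1U;
}

/* Hypothetical helper: normal memory, non-cacheable (TEX=0b001, C=0, B=0). */
uint32_t ulNonCacheableRASR( uint32_t ulSizeBytes )
{
    return ( uint32_t ) ( RASR_XN
                          | ( AP_FULL_ACCESS << RASR_AP_POS )
                          | ( 1UL << RASR_TEX_POS )
                          | RASR_S
                          | ( ( unsigned long ) ulRegionSizeField( ulSizeBytes ) << RASR_SIZE_POS )
                          | RASR_ENABLE );
}
```

On the target, a value like `ulNonCacheableRASR( 16384U )` would go into the MPU region attribute register, with the region base address aligned to the region size and the linker script placing the MAC buffers and descriptors inside that region.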
This iperf code integrates nicely, but I’m not convinced about the results yet. The throughput is very asymmetrical, as in this iperf output (192.168.2.142 is the target, 192.168.2.26 is the host):

iperf -c 192.168.2.142 --dualtest -p 5001 -i 2 -n 128M
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
Client connecting to 192.168.2.142, TCP port 5001
TCP window size: 144 KByte (default)
[ 5] local 192.168.2.26 port 42152 connected with 192.168.2.142 port 5001
[ 4] local 192.168.2.26 port 5001 connected with 192.168.2.142 port 59766
[ ID] Interval Transfer Bandwidth
[ 5] 0.0- 2.0 sec 22.2 MBytes 93.3 Mbits/sec
[ 4] 0.0- 2.0 sec 161 KBytes 660 Kbits/sec
[ 5] 2.0- 4.0 sec 22.2 MBytes 93.3 Mbits/sec
[ 4] 2.0- 4.0 sec 71.3 KBytes 292 Kbits/sec
[ 4] 4.0- 6.0 sec 71.3 KBytes 292 Kbits/sec
[ 5] 4.0- 6.0 sec 22.2 MBytes 93.3 Mbits/sec
[ 5] 6.0- 8.0 sec 22.1 MBytes 92.8 Mbits/sec
[ 4] 6.0- 8.0 sec 71.3 KBytes 292 Kbits/sec
[ 5] 8.0-10.0 sec 22.1 MBytes 92.8 Mbits/sec
[ 4] 8.0-10.0 sec 71.3 KBytes 292 Kbits/sec
[ 5] 0.0-11.5 sec 128 MBytes 93.1 Mbits/sec
[ 4] 0.0-11.6 sec 502 KBytes 355 Kbits/sec

While running these tests, there are no errors from the TCP stack. The reason so little data seems to flow from the target to the host is that the call to FreeRTOS_send() around line 363 inside vIPerfTCPWork() returns -pdFREERTOS_ERRNO_ENOSPC, so the data to the host is not actually sent. I’m not sure how this iperf server is intended to work and haven’t gone into the details yet. But it seems to me that the TCP stack is very busy ACKing data on the socket in the direction towards the host, and no resources are left to transmit data on the socket in the other direction. Maybe someone can shed some light on how the stack is intended to work in this scenario?

Apart from that, the F7 is running at 192 MHz, the compiler is gcc 5.4 with newlib, ART and I+D caches are enabled, and the STM32F7xx HAL drivers are version 1.1.2.
All in all, I think the integration of FreeRTOS+TCP is showing promise, though the integration towards the platform layer (CMSIS and ST HAL) is quite complex at the moment.
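To make the ENOSPC behaviour described above concrete, here is a small host-side sketch of a send loop that treats "no space in the TX buffer" as a reason to wait and retry rather than to drop the data. Everything in it is hypothetical: the callback types, the fake socket and the value of `SKETCH_ERRNO_ENOSPC` (assumed to mirror `pdFREERTOS_ERRNO_ENOSPC`) are stand-ins, and this is not how the iperf task in the thread is actually written.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: same value as pdFREERTOS_ERRNO_ENOSPC; check your FreeRTOS headers. */
#define SKETCH_ERRNO_ENOSPC    28

/* Hypothetical callbacks standing in for the socket send call and for a
 * "wait until TX space is available" primitive. */
typedef long ( * SendFn_t )( void * pvCtx, const unsigned char * pucData, size_t uxLen );
typedef void ( * WaitFn_t )( void * pvCtx );

/* Keep sending until all bytes are accepted, waiting whenever the buffer is full.
 * Returns the number of bytes sent, or a negative error code other than ENOSPC. */
long lSendAll( SendFn_t pxSend, WaitFn_t pxWait, void * pvCtx,
               const unsigned char * pucData, size_t uxLen, unsigned uxMaxRetries )
{
    size_t uxSent = 0;
    unsigned uxRetries = 0;

    assert( ( pxSend != NULL ) && ( pxWait != NULL ) );

    while( uxSent < uxLen )
    {
        long lRc = pxSend( pvCtx, pucData + uxSent, uxLen - uxSent );

        if( lRc > 0 )
        {
            uxSent += ( size_t ) lRc;   /* partial sends are normal */
            uxRetries = 0;
        }
        else if( ( lRc == 0 ) || ( lRc == -SKETCH_ERRNO_ENOSPC ) )
        {
            if( ++uxRetries > uxMaxRetries )
            {
                break;                  /* give up, report what was sent */
            }

            pxWait( pvCtx );            /* buffer full: wait, then retry */
        }
        else
        {
            return lRc;                 /* genuine error */
        }
    }

    return ( long ) uxSent;
}

/* Fake socket for host testing: accepts at most uxFree bytes, refills on wait. */
typedef struct
{
    size_t uxFree;
    size_t uxCapacity;
    unsigned uxWaits;
} FakeSocket_t;

long lFakeSend( void * pvCtx, const unsigned char * pucData, size_t uxLen )
{
    FakeSocket_t * pxSock = ( FakeSocket_t * ) pvCtx;
    size_t uxTake;

    ( void ) pucData;

    if( pxSock->uxFree == 0U )
    {
        return -SKETCH_ERRNO_ENOSPC;
    }

    uxTake = ( uxLen < pxSock->uxFree ) ? uxLen : pxSock->uxFree;
    pxSock->uxFree -= uxTake;
    return ( long ) uxTake;
}

void vFakeWait( void * pvCtx )
{
    FakeSocket_t * pxSock = ( FakeSocket_t * ) pvCtx;

    pxSock->uxFree = pxSock->uxCapacity;   /* pretend the peer ACKed everything */
    pxSock->uxWaits++;
}
```

On the target the wait step would presumably be a send timeout on the socket or a select-style wait for write space, rather than a busy retry.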

FreeRTOS+TCP on STM32F7

Hi Daniel,

Thanks a lot for reporting all this. We also experienced that it is difficult to write a good Ethernet driver, especially in combination with memory caching and zero-copy, although it sounds very simple: send packets and pass received packets to the IP stack.

I struggled with cached memory in combination with DMA. DMA has direct access to the physical memory, so before and after the DMA actions, forced flushing and refreshing has to be done. In the case of the Xilinx +TCP demo, just like you, we decided to reserve uncached RAM, both for +TCP and +FAT.

Indeed we made some changes to the Cube Ethernet driver stm32f4xx_hal_eth.c. Most of them were optimisations, but I forgot exactly what the other changes were. I will have a new look at that.

There was one bug that went unnoticed in the NetworkInterface.c for STM32F4. The following mask was used:

~~~~
#define PHY_LINK_STATUS    ( ( uint16_t )0x2000 )  /* PHY link status interrupt mask */
~~~~

I was misled by its name PHY_LINK_STATUS, where in fact it is an interrupt bit. The driver should have tested bit 0 instead, which is the live Link Status:

~~~~
+ #define PHY_SR_LINK_STATUS    ( ( uint16_t )0x0001 )  /* The actual PHY link status */
+ #define PHY_REG_10_PHY_SR     0x10                    /* PHY status register Offset */

  /* Read the result of the auto-negotiation. */
- HAL_ETH_ReadPHYRegister( &xETH, PHY_SR, &ulRegValue);
- if( ( ulRegValue & PHY_LINK_STATUS ) != 0 )
+ HAL_ETH_ReadPHYRegister( &xETH, PHY_REG_10_PHY_SR, &ulRegValue);
+ if( ( ulRegValue & PHY_SR_LINK_STATUS ) != 0 )
  {
      ulPHYLinkStatus |= BMSR_LINK_STATUS;
  }
  else
  {
      ulPHYLinkStatus &= ~( BMSR_LINK_STATUS );
  }
~~~~

iperf: the --dualtest option wasn’t worked out well enough, due to a lack of time. In the meantime I developed an iperf v3.0 server, which implements the reverse option (--reverse), among other things.

~~~~
iperf3 -c 192.168.2.106 --port 5001 --bytes 10M     /* Normal way: client is sending. */
iperf3 -c 192.168.2.106 --port 5001 --bytes 10M -R  /* reverse option: client is receiving. */
~~~~

So if you like, use the attached iperf_task_v3_0c.c source file and run iperf3 on your host.

As for transmission by +TCP: in the Zynq project, I found that zero-copy transmission is profitable, and that it is worth reserving at least 10 full-sized (1536-byte) DMA buffers. Once all transmission buffers are full (that can happen on a 100 Mbit LAN), the driver will block and wait for a transmission-complete interrupt. This method can be found in the current ‘160919’ release (see FreeRTOS-Plus-TCP/portable/NetworkInterface/Zynq/NetworkInterface.c).
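The idea of a fixed set of full-sized TX buffers that the driver hands out until none are left can be sketched as below. This is only a host-side illustration with invented names; it is not taken from the Zynq driver, it is not safe against concurrent use by a task and an interrupt, and where it returns NULL a real driver would block until the transmission-complete interrupt releases a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_BUFFER_COUNT    10U     /* "at least 10" full-sized buffers */
#define TX_BUFFER_SIZE     1536U   /* bytes per buffer */

static uint8_t ucTxBuffers[ TX_BUFFER_COUNT ][ TX_BUFFER_SIZE ];
static uint8_t ucInUse[ TX_BUFFER_COUNT ];

/* Hand out a free buffer, or NULL when all of them are queued for DMA. */
uint8_t * pucTxAcquire( void )
{
    size_t i;

    for( i = 0; i < TX_BUFFER_COUNT; i++ )
    {
        if( ucInUse[ i ] == 0U )
        {
            ucInUse[ i ] = 1U;
            return ucTxBuffers[ i ];
        }
    }

    return NULL;
}

/* Return a buffer to the pool, as a TX-complete handler would. */
void vTxRelease( const uint8_t * pucBuffer )
{
    size_t i = ( size_t ) ( ( ( uintptr_t ) pucBuffer - ( uintptr_t ) ucTxBuffers ) / TX_BUFFER_SIZE );

    assert( ( i < TX_BUFFER_COUNT ) && ( ucInUse[ i ] != 0U ) );
    ucInUse[ i ] = 0U;
}

/* RAM cost of the pool: 10 * 1536 bytes. */
size_t uxTxPoolBytes( void )
{
    return sizeof( ucTxBuffers );
}
```

With these numbers the pool costs 15360 bytes, so it fits in a 16 KB uncached region with a little room left for descriptors.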
> the integration towards the platform layer (CMSIS and ST HAL) is quite complex at the moment.

Yes, you are right, it is complex. I would (have) like(d) to separate all PHY stuff from the rest. This could have become a separate module. What is your next step? You could try out iperf3; it’s going to show much better results, I’m sure 🙂 You might consider uploading your driver here, so other people can test and use it. If you like, you can send me your driver as it is now, and I’ll have a detailed look. You can attach it to a post, or if you want, send it directly to my email (h point tibosch at freertos point org). Thanks again. Hein

FreeRTOS+TCP on STM32F7

Hi,

Thanks for the feedback and the updated iperf server code. I just found a little bit of time tonight to test the iperf3 server, but wasn’t successful yet. I’ll look in more detail at what is going on, but here is what I see currently…

On the host:

iperf3 -c 192.168.2.142 --port 5001 --bytes 10M
Connecting to host 192.168.2.142, port 5001
[ 4] local 192.168.2.1 port 52668 connected to 192.168.2.142 port 5001
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.33 MBytes 11.2 Mbits/sec 0 17.1 KBytes
[ 4] 1.00-2.00 sec 1.25 MBytes 10.5 Mbits/sec 0 17.1 KBytes
[ 4] 2.00-3.00 sec 1.28 MBytes 10.7 Mbits/sec 0 17.1 KBytes
[ 4] 3.00-4.00 sec 1.25 MBytes 10.5 Mbits/sec 0 17.1 KBytes
[ 4] 4.00-5.00 sec 1.28 MBytes 10.7 Mbits/sec 0 17.1 KBytes
[ 4] 5.00-6.00 sec 1.25 MBytes 10.5 Mbits/sec 0 17.1 KBytes
[ 4] 6.00-7.00 sec 1.28 MBytes 10.7 Mbits/sec 0 17.1 KBytes

On the device side:

Info ../src/iperfserver.c:846 vIPerfTask: created TCP server socket 0x20003d88 bind port 5001: 0 listen 0
Info ../src/iperfserver.c:861 vIPerfTask: created UDP server socket 0x20004830 bind port 5001: 0
Info ../system/src/freertos-tcp/portable/NetworkInterface/NetworkInterface.c:593 Network buffers: 19 lowest 19
Info ../system/src/freertos-tcp/FreeRTOSDHCP.c:809 vDHCPProcess: offer c0a8028eip
Info ../system/src/freertos-tcp/FreeRTOSDHCP.c:809 vDHCPProcess: offer c0a8028eip
Info ../src/tcpipsupport.c:130 vApplicationIPNetworkEventHook: event 0
Info ../src/tcpipsupport.c:189 IP Address: 192.168.2.142
Info ../src/tcpipsupport.c:192 Subnet Mask: 255.255.255.0
Info ../src/tcpipsupport.c:195 Gateway Address: 192.168.2.1
Info ../src/tcpipsupport.c:198 DNS Server Address: 192.168.2.1
Info ../system/src/freertos-tcp/portable/NetworkInterface/NetworkInterface.c:593 Network buffers: 18 lowest 18
Info ../src/iperfserver.c:235 vIPerfTask: Received a connection from 192.168.2.1:52667
Info ../src/iperfserver.c:488 TCP[ port 52667 ] recv[ 0 ] 37
Info ../src/iperfserver.c:523 Got Control Socket: rc -1: 'oden.1485296058.471817.5d84ec1c6fa4d'
Info ../src/iperfserver.c:488 TCP[ port 52667 ] recv[ 1 ] 4
Info ../src/iperfserver.c:562 TCP skipcount 87 xRecvResult 4
Info ../src/iperfserver.c:488 TCP[ port 52667 ] recv[ 2 ] 87
Info ../src/iperfserver.c:578 Control string: {"tcp":true,"omit":0,"num":10485760,"parallel":1,"len":131072,"client_version":"3.1.5"}
Info ../src/iperfserver.c:235 vIPerfTask: Received a connection from 192.168.2.1:52668
Info ../src/iperfserver.c:488 TCP[ port 52668 ] recv[ 0 ] 37
Info ../src/iperfserver.c:527 Got expected client: rc 0: 'oden.1485296058.471817.5d84ec1c6fa4d'
Info ../system/src/freertos-tcp/portable/NetworkInterface/NetworkInterface.c:593 Network buffers: 17 lowest 17
Info ../src/iperfserver.c:488 TCP[ port 52667 ] recv[ 3 ] 1
Info ../src/iperfserver.c:633 TCP[ port 52667 ] recv 1 bytes: 0x04
Info ../src/iperfserver.c:488 TCP[ port 52667 ] recv[ 4 ] 4
Info ../src/iperfserver.c:650 TCP skipcount 4294967260 xRecvResult 4
Info ../src/iperfserver.c:488 TCP[ port 52667 ] recv[ 5 ] 220

Once this “skipcount 4294967260” shows up the test hangs. Depending on which host I test from, this either happens right away or, as in the example above, after a few seconds. I’ll look into this in more detail later this week.

I modified the source slightly to remove some compiler warnings and to enable the “ipconfigIPERF_VERSION 3” define; I’m attaching my copy as a reference. The plan now is to test the +TCP stack and the driver layer, so I’m trying to get something like iperf running. This will give me enough confidence in the stack and the integration to move on to application development.

Thanks
Daniel

FreeRTOS+TCP on STM32F7

Hi Daniel, I just used the following settings:

~~~~
#define ipconfigIPERF_VERSION                   3
#define ipconfigIPERF_STACK_SIZE_IPERF_TASK     680
#define ipconfigIPERF_TX_BUFSIZE                ( 4 * ipconfigTCP_MSS )
#define ipconfigIPERF_TX_WINSIZE                ( 2 )
#define ipconfigIPERF_RX_BUFSIZE                ( 8 * ipconfigTCP_MSS )
#define ipconfigIPERF_RX_WINSIZE                ( 4 )
~~~~

and tested iperf3 on a smaller STM32F4. It showed good performance. The BUFSIZE/WINSIZE parameters determine the TCP buffer sizes and TCP window sizes.
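To put numbers on those settings, the sketch below works out what they come to in bytes. It assumes an MSS of 1460 bytes (a 1500-byte MTU minus 40 bytes of IPv4 and TCP headers) and assumes that the BUFSIZE values are in bytes while the WINSIZE values are counted in MSS units, which is my understanding of the +TCP window properties; the struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ipconfigTCP_MSS for a standard 1500-byte Ethernet MTU. */
#define SKETCH_TCP_MSS    1460

/* Hypothetical mirror of the four settings: buffer sizes in bytes,
 * window sizes in units of MSS. */
typedef struct
{
    int32_t lTxBufSize;
    int32_t lTxWinSize;
    int32_t lRxBufSize;
    int32_t lRxWinSize;
} SketchWinProps_t;

/* Build the settings from "N * MSS" buffer sizes and window counts. */
SketchWinProps_t xMakeWinProps( int32_t lTxBufMss, int32_t lTxWin,
                                int32_t lRxBufMss, int32_t lRxWin )
{
    SketchWinProps_t xProps;

    /* A window larger than its stream buffer could never be filled. */
    assert( ( lTxWin <= lTxBufMss ) && ( lRxWin <= lRxBufMss ) );

    xProps.lTxBufSize = lTxBufMss * SKETCH_TCP_MSS;
    xProps.lTxWinSize = lTxWin;
    xProps.lRxBufSize = lRxBufMss * SKETCH_TCP_MSS;
    xProps.lRxWinSize = lRxWin;
    return xProps;
}

/* Convert a window given in MSS units to bytes in flight. */
int32_t lWindowBytes( int32_t lWinSize )
{
    return lWinSize * SKETCH_TCP_MSS;
}
```

With the values above this gives a 5840-byte TX buffer with a 2920-byte window and an 11680-byte RX buffer with a 5840-byte window, per socket.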

FreeRTOS+TCP on STM32F7

Hi, Thanks for the update. I’ve adjusted them slightly and ended up with these settings:
#define ipconfigIPERF_VERSION                   3
#define ipconfigIPERF_STACK_SIZE_IPERF_TASK     680
#define ipconfigIPERF_TX_BUFSIZE                ( 6 * ipconfigTCP_MSS )
#define ipconfigIPERF_TX_WINSIZE                ( 4 )
#define ipconfigIPERF_RX_BUFSIZE                ( 16 * ipconfigTCP_MSS )
#define ipconfigIPERF_RX_WINSIZE                ( 4 )
I’m now getting consistent high transfer rates and stable performance based on the testing so far. Looks good at this point, so next phase will be to add my own application on top of this stack. Thanks for the support.

FreeRTOS+TCP on STM32F7

Hi Sasha, I have an STM32F746 Discovery board now, so I can finally help with testing. Would you mind sharing your +TCP driver for STM32F7xx? The same goes for Daniel Nilsson ( if you read this post ) or other people using the STM32F7 along with FreeRTOS+TCP. You can either attach your source code to a post or send it by email to “h [ point ] tibosch [ at ] freertos [ point ] org”. Thanks, Hein

FreeRTOS+TCP on STM32F7

Hi Daniel, I would stay well clear of the HAL code. It is buggy as hell, and you can spend more time figuring out its problems than you would writing your own driver (and learning from scratch). Regards, Glen

FreeRTOS+TCP on STM32F7

I don’t know all the details about the HAL code. I read that in the HAL library some choices have been made that not everyone agrees with. I did check the SDMMC and ETH drivers of both the earlier ST standard library and HAL, and I must admit that those two drivers have gotten better over time. The latest SDMMC driver supports more types of memory cards. For both drivers, we’ve added support to make them interrupt-driven, and we also looked at how DMA works together with the caching of the STM32F7. We’ve added ‘Clean’ ( flush ) and ‘Invalidate’ statements. The SDMMC driver: for some small requests DMA is not used, and in those cases underflow could occur. We optimised this by reading from the internal FIFO with a simple sequence of statements:
/* Read data from SDIO Rx FIFO */
tempbuff[0] = *( pulFIFO );
tempbuff[1] = *( pulFIFO );
tempbuff[2] = *( pulFIFO );
tempbuff[3] = *( pulFIFO );
tempbuff[4] = *( pulFIFO );
tempbuff[5] = *( pulFIFO );
tempbuff[6] = *( pulFIFO );
tempbuff[7] = *( pulFIFO );
These are small reads, much less than 512 bytes. Within a short time, I will post a new driver that should work for both the STM32F4 and the STM32F7.
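As a side note on the ‘Clean’ and ‘Invalidate’ statements mentioned above: cache maintenance on the Cortex-M7 works on whole 32-byte D-cache lines, so the address range handed to a by-address clean or invalidate normally has to be widened to line boundaries. The helper below is a hypothetical host-side sketch of that arithmetic only; it is not code from the drivers discussed here.

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE    32U   /* Cortex-M7 D-cache line size in bytes */

typedef struct
{
    uintptr_t uxStart;    /* rounded down to a cache line boundary */
    uint32_t ulLength;    /* rounded up to whole cache lines */
} CacheRange_t;

/* Widen [uxAddress, uxAddress + ulLength) to whole cache lines. On the target,
 * the result is what one would pass to a by-address clean before DMA reads a
 * buffer, or to a by-address invalidate after DMA has written it. */
CacheRange_t xCacheAlignedRange( uintptr_t uxAddress, uint32_t ulLength )
{
    CacheRange_t xRange;
    const uintptr_t uxMask = ( uintptr_t ) DCACHE_LINE_SIZE - 1U;
    const uintptr_t uxEnd = ( uxAddress + ulLength + uxMask ) & ~uxMask;

    xRange.uxStart = uxAddress & ~uxMask;
    xRange.ulLength = ( uint32_t ) ( uxEnd - xRange.uxStart );
    return xRange;
}

/* A DMA buffer is only safe to invalidate if it does not share a cache line
 * with unrelated data, i.e. it starts and ends on line boundaries. */
int xIsLineAligned( uintptr_t uxAddress, uint32_t ulLength )
{
    CacheRange_t xRange = xCacheAlignedRange( uxAddress, ulLength );

    return ( xRange.uxStart == uxAddress ) && ( xRange.ulLength == ulLength );
}
```

A 1536-byte buffer placed on a 32-byte boundary is exactly 48 cache lines, which is one reason that size is convenient for Ethernet DMA buffers.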

FreeRTOS+TCP on STM32F7

Hi Daniel, The new +TCP and +FAT drivers for the STM32F7 are ready to be tested. I just posted them here. They are zero-copy, interrupt-driven, and aware of caching (D-cache). Please give them a try and comment if you like.