TCP/IP stack performance and reliability on K63F

Hello! We’ve got a custom PCB with a Freescale Kinetis K63F @96MHz (Cortex M4). It’s running FreeRTOS-8.0.0 as ported by Freescale; the sources are taken from KSDK-1.2.0. I’ve added the FreeRTOS Labs TCP/IP stack, version 150825. The PHY is a LAN8720A on an RMII interface.
In order to test the stability and reliability of the K63F Ethernet HW/driver and my NetworkInterface.c implementation, I’ve made a small test suite consisting of a UDP and a TCP “flood” test. The only (apparent) difference between them is the protocol being used (TCP/UDP). The working mechanism is the same:
  1. the sender (PC / Linux) generates a packet with a random length (not greater than 1500 bytes);
  2. the sender transmits the packet;
  3. the receiver (K63F / FreeRTOS) receives the packet, checks the CRC, updates internal stats etc.;
  4. the receiver sends a packet back to the sender (PC) with the same format but with different contents (and size);
  5. the PC checks the validity of the incoming message (CRC etc.), updates stats etc.;
  6. and this repeats in an endless loop.
I’ve got two questions regarding the network stack:
  1. The transfer speed can’t be increased over ~380kB/s in each direction (TX/RX). If the payload size is smaller (e.g. ~800 bytes), it can RX/TX up to 460-470 packets per second. If the payload is greater (e.g. ~1450 bytes), it can RX/TX fewer packets, but the final transfer speed will still be somewhere around ~380kB/s. I believe there may be a HW bottleneck (memory bandwidth in memcpy tests is around ~100MB/s), or my NetworkInterface.c implementation is probably not optimal. What I’m curious about: in your experience, what is the highest RX/TX speed one can expect on a machine like mine? What transfer speeds have you seen?
  2. My UDP stress test appears to be able to run forever (>23 GB RX and >23 GB TX data, I stopped it manually), but the TCP test goes wrong. After ~12 hours of running (>14 GB RX and >14 GB TX data) at the mentioned ~330kB/s speed (in each direction) something happens: the transfer speed falls back to ~30-40kB/s and some transfer errors occur (the CRC check in my test program detects corrupt packets). With this reduced speed and some occasional badly received packets it can still run for ~12 hours, after which it crashes completely and the transfer speed goes down to 0-1 kB/s (0-1 packets/s). My first guess is that my NetworkInterface.c still needs some tuning, and the buffers for the Ethernet HW may also need to be tuned. However, what is strange is that if I stop the test when the TCP speed is this slow (the MCU is not restarted) and start a UDP test, it runs at ~330kB/s as it should. If I then restart the TCP test, the TCP transfer speed is still too low. This is the only sign that makes me ask whether this can be a TCP/IP stack related issue. Do you have any idea? If needed, I can provide you with the source code of my UDP/TCP tests. Of course I haven’t given up examining my own code.
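The per-packet check in steps 3 and 5 above can be sketched as follows. This is only an illustration, not the actual test program: the packet layout (payload followed by a 2-byte big-endian CRC), the CRC-16/CCITT-FALSE parameters and the names `crc16_ccitt`, `packet_seal` and `packet_verify` are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection.
 * The variant is an assumption; the thread only says "CRC16". */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical layout: payload bytes followed by the CRC, big-endian.
 * buf must have room for payload_len + 2 bytes. Returns the total length. */
size_t packet_seal(uint8_t *buf, size_t payload_len)
{
    assert(buf != NULL);
    uint16_t crc = crc16_ccitt(buf, payload_len);
    buf[payload_len] = (uint8_t)(crc >> 8);
    buf[payload_len + 1] = (uint8_t)(crc & 0xFFu);
    return payload_len + 2;
}

/* Returns 1 if the trailing CRC matches the payload, 0 otherwise. */
int packet_verify(const uint8_t *buf, size_t total_len)
{
    if (buf == NULL || total_len < 2) {
        return 0;
    }
    size_t payload_len = total_len - 2;
    uint16_t expected = (uint16_t)(((uint16_t)buf[payload_len] << 8)
                                   | buf[payload_len + 1]);
    return crc16_ccitt(buf, payload_len) == expected;
}
```

Both sides would call `packet_seal` before sending and `packet_verify` after receiving, counting every failed verification as a corrupt packet.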
Thanks in advance, tselmeci

TCP/IP stack performance and reliability on K63F

Hi Tamas,
the final transfer speed will be somewhere around ~380kB/s.
That sounds very slow to me. I’m sure that it can do much better.
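For scale, the reported figures can be put next to the link rate. A back-of-the-envelope sketch (the helper names are made up for illustration, and kB is taken as 1000 bytes):

```c
#include <assert.h>
#include <stdint.h>

/* Payload throughput in bytes per second for a given packet rate and size. */
uint32_t payload_rate(uint32_t packets_per_s, uint32_t bytes_per_packet)
{
    return packets_per_s * bytes_per_packet;
}

/* Share of the link that is used, in tenths of a percent.
 * Header overhead is ignored. */
uint32_t link_share_permille(uint32_t bytes_per_s, uint32_t link_bits_per_s)
{
    assert(link_bits_per_s != 0);
    return (uint32_t)(((uint64_t)bytes_per_s * 8u * 1000u) / link_bits_per_s);
}
```

With these helpers, 465 packets/s of 800 bytes give 372000 bytes/s, which matches the reported ~380kB/s, and 380kB/s comes out at about 3% of a 100 Mbit/s link.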
What transfer speeds have you faced?
Varying between 1 and 10 MByte per second on a 100 Mbit LAN, depending on the platform. On a Freescale Kinetis K63F @96MHz (Cortex M4), I would expect at least 3 MByte per second, provided that DMA is used, there is enough SRAM, etc. If you want, can you attach the following files?
  • FreeRTOSConfig.h
  • FreeRTOSIPConfig.h
  • NetworkInterface.c
  • The actual MAC / ethernet / DMA / PHY driver(s)
  • Your testing / demo code
Oh yes, please also tell us which choices you made here:
~~~~~
Source/portable/MemMang/heap???.c
FreeRTOS-Plus-TCP/portable/BufferManagement/BufferAllocation???.c
~~~~~
After ~12 hours of running (..) something happens and the transfer speed falls back
Do you have logging enabled? See FreeRTOSIPConfig.h:
~~~~~
#define ipconfigHAS_PRINTF    1
#if( ipconfigHAS_PRINTF != 0 )
    #define FreeRTOS_printf( MSG )    vLoggingPrintf MSG
#endif
~~~~~
Both before and after the speed drops, it would be useful to call the following function, declared in FreeRTOS_Sockets.h:
~~~~~
void FreeRTOS_netstat( void );
~~~~~
This call logs how many sockets and buffers are being used. Here is a real example of the netstat output:
~~~~~
netstat
Prot Port IP-Remote : Port R/T Status Alive tmout Child
TCP 8001 0.0.0.0 : 0 0/0 eTCPLISTEN 999999 0 0/12
TCP 8000 0.0.0.0 : 0 0/0 eTCPLISTEN 999999 0 0/12
TCP 21 0.0.0.0 : 0 0/0 eTCPLISTEN 999999 0 1/12
TCP 8021 0.0.0.0 : 0 0/0 eTCPLISTEN 999999 0 0/12
TCP 2402 192.168.2.3 : 4281 1/1 eESTABLISHED 3 20000
TCP 23 0.0.0.0 : 0 0/0 eTCPLISTEN 999999 0 0/3
TCP 80 0.0.0.0 : 0 0/0 eTCPLISTEN 999999 0 12/16
TCP 8080 0.0.0.0 : 0 0/0 eTCPLISTEN 999999 0 0/16
TCP 10145 0.0.0.0 : 0 0/0 eTCPLISTEN 999999 0 0/10
TCP 80 192.168.2.3 : 4331 1/1 eESTABLISHED 7464 12430
TCP 80 192.168.2.3 : 4562 1/1 eESTABLISHED 7458 12440
TCP 80 192.168.2.3 : 4563 1/1 eESTABLISHED 7782 12112
TCP 80 192.168.2.3 : 4564 1/1 eESTABLISHED 7782 12112
TCP 80 192.168.2.3 : 4565 1/1 eESTABLISHED 7768 12128
TCP 80 192.168.2.3 : 4566 1/1 eESTABLISHED 7767 12131
TCP 80 192.168.2.3 : 4584 1/1 eESTABLISHED 10233 12152
TCP 80 192.168.2.3 : 4585 1/1 eESTABLISHED 10310 12074
TCP 80 192.168.2.3 : 4586 1/1 eESTABLISHED 9674 12713
TCP 80 192.168.2.3 : 4587 1/1 eESTABLISHED 10337 12048
TCP 80 192.168.2.3 : 4588 1/1 eESTABLISHED 10338 12047
TCP 80 192.168.2.3 : 4589 1/1 eESTABLISHED 10311 12076
TCP 21 192.168.2.3 : 4596 1/1 eESTABLISHED 7828 12214
TCP 57832 192.168.2.3 : 4597 1/0 eESTABLISHED 6 18
UDP Port 30718
UDP Port 2000
UDP Port 30717
FreeRTOS_netstat: 26 sockets 23 < 24 < 36 buffers free
~~~~~
There are 26 sockets at this moment, mostly opened by browsers (port 80). The buffers are interesting: a minimum of 23, currently 24 and a maximum of 36 buffers free.
Regards.

TCP/IP stack performance and reliability on K63F

Hello Hein! First of all, thanks for your response.
K63F has 256KB SRAM, but I’ve allocated much less for Ethernet buffers. The Ethernet driver is based on the KSDK’s Ethernet hal/drv, and is using IRQs but not using DMA. I’ve made a quick test and the KSDK driver can transmit ~30000 Ethernet frames per second (46 bytes in size), which equals ~1.3MB/s. Increasing the packet size is expected to boost the transfer speed too. Unfortunately KSDK’s Ethernet driver can produce strange things, so I have to solve these too. I believe I’m doing something wrong in my NetworkInterface.c, and my TCP/IP settings also need some tuning.
I’ve added FreeRTOS_netstat(). It displays the following when running at full speed:
Prot Port IP-Remote : Port R/T Status Alive tmout Child
TCP 2222 0 ip: 0 0/0 eTCPLISTEN 0 0 1/8
TCP 2222 c0a8c9de ip:47817 1/1 eESTABLISHED 0 0
FreeRTOS_netstat: 2 sockets 19 < 20 < 20 buffers free
It displays the following just after the speed drops to 30kB/s (but still no RX errors):
Prot Port IP-Remote : Port R/T Status Alive tmout Child
TCP 2222 0 ip: 0 0/0 eTCPLISTEN 0 0 1/8
TCP 2222 c0a8c9de ip:47817 1/1 eESTABLISHED 0 0
FreeRTOS_netstat: 2 sockets 18 < 20 < 20 buffers free
Using src/heap_3.c and BufferAllocation_1.c.
Several files you requested are attached. None of them is compilable, since I had to remove some parts, but I hope they help you.
I’m working on another (more important) project for 80% of my time, so that’s the reason for the slow reply. Nevertheless I want to solve all my issues by the end of the year, so I’m going to allocate an adequate amount of time to them.
Regards,

TCP/IP stack performance and reliability on K63F

Hello Tamas, Thanks for attaching the sources. I’ll give a quick response now and another response after studying the sources in detail.
The Ethernet driver is based on the KSDK’s Ethernet hal/drv, and is using IRQs but not using DMA.
I’ve downloaded the drivers and had a look at them. I would definitely define:
~~~~~
#define ENET_RECEIVE_ALL_INTERRUPT 0
~~~~~
If not, ENET_DRV_ReceiveData() will be called from a hardware interrupt. Instead of fetching the data in the ISR, OSA_EventSet() can wake up the task in NetworkInterface.c (using xTaskNotify() or xSemaphoreGive()). You can look into any of the existing NetworkInterface.c files to see how this was done with xTaskNotify() / xTaskNotifyGive(). After being woken up, your MAC task will call ENET_DRV_ReceiveData() to check if there is new RX data.
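The split between a short ISR and a deferred task can be modelled on a host without FreeRTOS. The sketch below is only a single-threaded model of the idea, not driver code: `rx_isr` stands in for the ENET receive interrupt, the event counter plays the role that a task notification or a semaphore would play on the target, and `driver_receive` is a hypothetical stand-in for ENET_DRV_ReceiveData().

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_uint rx_events;      /* set by the "ISR", consumed by the task */
static unsigned frames_in_ring;    /* frames waiting in the modelled RX ring */
static unsigned frames_processed;  /* frames handed on by the deferred task */

/* Model of the interrupt handler: record the event and return at once. */
void rx_isr(void)
{
    frames_in_ring++;                  /* hardware filled one descriptor */
    atomic_fetch_add(&rx_events, 1u);  /* "wake" the deferred task */
}

/* Hypothetical stand-in for the driver's receive call.
 * Returns 1 if a frame was taken from the ring, 0 if the ring was empty. */
static int driver_receive(void)
{
    if (frames_in_ring == 0) {
        return 0;
    }
    frames_in_ring--;
    return 1;
}

/* One wake-up of the deferred task: clear the event count, then drain
 * every frame that is available. Returns the number of frames handled. */
unsigned rx_task_step(void)
{
    unsigned handled = 0;

    if (atomic_exchange(&rx_events, 0u) == 0) {
        return 0;                      /* nothing was signalled */
    }
    while (driver_receive()) {
        frames_processed++;            /* here the frame would go to the stack */
        handled++;
    }
    return handled;
}
```

The point of the pattern is that several interrupts may collapse into one wake-up, so the task drains the ring until it is empty rather than handling exactly one frame per event.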
but not using DMA…
I think it is using DMA although it is not mentioned as such. The MAC peripheral has direct access to SRAM to read and write packet data. That is also why it should be well aligned.
Unfortunately KSDK’s Ethernet driver can produce strange things, so I have to solve these too.
Maybe it will already help to stop using ENET_RECEIVE_ALL_INTERRUPT = 1.
From your test-pc.c :
close(sock); shutdown(sock, SHUT_RDWR);
Oops : once a socket is closed, it may not be accessed any more. A socket is like a pointer to memory, once freed it may not be accessed. After issuing a shutdown(), the program should wait until the shutdown has been executed. After starting a graceful shutdown, you would see the following:
  • select(socket) => succeeds
  • recv(socket) => 0 bytes
  • Under Windows, you’d receive an event from WinSock: “FD_CLOSE”.
The above sequence is a possible sign that the shutdown sequence has completed. Under +TCP: keep on reading from the socket until you get a negative return value other than -pdFREERTOS_ERRNO_EWOULDBLOCK:
~~~~~
/* For +TCP the second parameter of shutdown() is still ignored. */
if( FreeRTOS_shutdown( xSocket, FREERTOS_SHUT_RDWR ) >= 0 )
{
    for( ;; )
    {
        /* You may also use FreeRTOS_select() when working with several sockets. */
        xRc = FreeRTOS_recv( xSocket, xxx );
        if( xRc > 0 )
        {
            /* Use xRc data bytes. */
        }
        else if( ( xRc < 0 ) && ( xRc != -pdFREERTOS_ERRNO_EWOULDBLOCK ) )
        {
            /* The connection was shutdown gracefully. */
            break;
        }
    }
}
else
{
    /* Shutdown failed, close the socket. */
}
FreeRTOS_closesocket( xSocket );
~~~~~
This is important: “a negative return value other than -pdFREERTOS_ERRNO_EWOULDBLOCK” always means that the socket has become unusable and may be closed.
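On the PC side (test-pc.c runs on Linux), the same rule with POSIX sockets could look like the sketch below. `graceful_close` is a hypothetical helper, not code from the thread, and it assumes a blocking, connected stream socket.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: half-close, drain until the peer finishes, then close.
 * Returns the number of bytes that were still in flight, or -1 on error. */
long graceful_close(int sock)
{
    char buf[256];
    long drained = 0;

    assert(sock >= 0);

    /* Announce that we will send no more data; receiving stays possible. */
    if (shutdown(sock, SHUT_WR) != 0) {
        close(sock);
        return -1;
    }

    for (;;) {
        ssize_t n = recv(sock, buf, sizeof buf, 0);
        if (n > 0) {
            drained += n;      /* late data from the peer */
        } else if (n == 0) {
            break;             /* peer has closed its side too */
        } else if (errno != EINTR) {
            drained = -1;      /* hard error: the socket is unusable */
            break;
        }
    }

    /* Only now release the descriptor; it must not be used afterwards. */
    close(sock);
    return drained;
}
```

The order is the reverse of the quoted test code: shutdown() first, then wait for recv() to return 0, and close() last.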
I’ve added FreeRTOS_netstat(). Displays the following when running at full speed: FreeRTOS_netstat: 2 sockets 19 < 20 < 20 buffers free
That looks good, you only used at most 1 buffer (officially called a “network buffer descriptor”).
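The three numbers on the last netstat line are read in this thread as the lowest number of free buffers seen so far, the current number of free buffers, and the maximum. A small host-side sketch for post-processing such log lines, assuming the line format shown here (`parse_buffer_line`, `peak_buffers_used` and `buffer_stats_t` are hypothetical names):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned sockets;
    unsigned min_free;   /* lowest number of free buffers seen so far */
    unsigned cur_free;   /* free buffers right now */
    unsigned total;      /* maximum number of free buffers */
} buffer_stats_t;

/* Parses e.g. "FreeRTOS_netstat: 2 sockets 19 < 20 < 20 buffers free".
 * Returns 1 on success, 0 if the line does not match. */
int parse_buffer_line(const char *line, buffer_stats_t *out)
{
    assert(line != NULL && out != NULL);

    const char *p = strstr(line, "netstat:");
    if (p == NULL) {
        return 0;
    }
    return sscanf(p, "netstat: %u sockets %u < %u < %u buffers free",
                  &out->sockets, &out->min_free,
                  &out->cur_free, &out->total) == 4;
}

/* Peak number of buffers that were in use at the same time. */
unsigned peak_buffers_used(const buffer_stats_t *s)
{
    return s->total - s->min_free;
}
```

For "19 < 20 < 20" this gives a peak of 1 buffer in use, which is the reading given above; a low-water mark that keeps falling over a long run would point at a buffer leak.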
Displays the following just after the speed drops to 30kB/s (but still no RX errors):
Prot Port IP-Remote : Port R/T Status Alive tmout Child
TCP 2222 0 ip: 0 0/0 eTCPLISTEN 0 0 1/8
TCP 2222 c0a8c9de ip:47817 1/1 eESTABLISHED 0 0
FreeRTOS_netstat: 2 sockets 18 < 20 < 20 buffers free
Hm, that also looks good.
Using src/heap_3.c and BufferAllocation_1.c.
Very well, BufferAllocation_1 is the module that does not use pvPortMalloc(). It is faster than BufferAllocation_2 and it guarantees that all ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS buffers will be available (independent of the availability of heap memory).
Several files you requested are attached. None of them is compilable, since I had to remove some parts, but I hope they help you.
Thanks, no problem if it’s not compilable.
I’m working on another (more important) project in 80% of my time, so that’s the reason for the slow reply. Nevertheless I want to solve all my issues by the end of the year, so I’m going to allocate the adequate amount of time to them.
That should be doable. As soon as I have my Freescale board I can help to boost the development. For now I can only help with the theory 🙂 Regards.

TCP/IP stack performance and reliability on K63F

Thanks. This sounds reasonable and I’ll give it a try:
~~~~~
#define ENET_RECEIVE_ALL_INTERRUPT 0
~~~~~
About shutdown and close: I didn’t pay too much attention to using them correctly, since that wasn’t the point of the test apps (they are meant to test the reliability of UDP/TCP transmissions). Regards,

TCP/IP stack performance and reliability on K63F

I’ve refactored the code to compile with #define ENET_RECEIVE_ALL_INTERRUPT 0, but the max RX/TX transfer speed is still around ~330kB/s. I’m still thinking about what could be going wrong. As for stability, there’s no relevant information yet, since the stress test will run overnight…

TCP/IP stack performance and reliability on K63F

The good news is that the stability has improved. The bad news is that I’ve found some other issues…

TCP/IP stack performance and reliability on K63F

What changes did you have to make to the source to make it work with the ENET_RECEIVE_ALL_INTERRUPT 0 flag?

TCP/IP stack performance and reliability on K63F

Setting the ENET_RECEIVE_ALL_INTERRUPT flag to 0 results in a hard fault as soon as I try an iperf test, which works properly with the flag set to 1 (and with the runtime assertion in the generic receive function disabled).

TCP/IP stack performance and reliability on K63F

I assume you are also working with a Freescale Kinetis K63F? This is a new port. Tamás (who started this forum thread) and I are currently developing and testing a NetworkInterface.c. I suggested setting the ENET_RECEIVE_ALL_INTERRUPT flag to zero so that the ISR (ENET_DRV_RxIRQHandler) does not execute a lot of code. It looks like Tamás has his driver almost stable. As soon as it runs perfectly, you’ll hear about it in this thread. Thanks.

TCP/IP stack performance and reliability on K63F

Not exactly. I have it running on a TWR-K65F180 + TWR-SER combination. Really great to hear someone is working on it! Besides: I get ~3.6 MB/s performance with the old driver on my board.

TCP/IP stack performance and reliability on K63F

As Hein wrote earlier, I’ve successfully managed to make it work with #define ENET_RECEIVE_ALL_INTERRUPT 0.
enet_dev_if_t has a signal that is set by the driver once an Ethernet frame is received. I made a task (deferred RX task) that watches this signal and takes the RX frames out (ENET_DRV_ReceiveData) when the signal is set. Since the Ethernet driver is set up with preallocated buffers, no memory allocation is needed in this step, so the received frame is simply passed to the +TCP stack for further processing. +TCP also has preallocated buffers, so this step is very fast.
+TCP calls xNetworkInterfaceOutput when it wants to send an Ethernet frame out. It can happen that the enet driver doesn’t have a free buffer at that very moment. Even if xNetworkInterfaceOutput returns an error code, the stack won’t attempt to send that frame again, so it will be lost. So my xNetworkInterfaceOutput waits until at least one enet buffer becomes available.
On my machine the +TCP stack’s task priority must be higher than the priority of the deferred RX task (which is woken up by the mentioned signal). This is a bit weird, since Hein had suggested the opposite setup; with that configuration my enet driver was non-functional.
About the transfer speed, I’ve created four tests:
  1. PC –> K63F Ethernet frame flood;
  2. PC <--> K63F RX/TX Ethernet frame flood (ping-pong);
  3. PC –> K63F TCP packet flood (size up to 1500 bytes);
  4. PC <--> K63F RX/TX TCP packet flood.
The results:
  1. PC sends to K63F as many raw Ethernet frames with max size (1514) as possible. My K63F at 96MHz can receive them at almost 100Mbps;
  2. PC sends a raw Ethernet frame, K63F receives it and sends it back. This is surprisingly slow, ~15Mbps in each direction (RX/TX);
  3. PC sends to K63F as many TCP packets (size up to 1500 bytes) as possible. My K63F performs at ~30Mbps;
  4. PC sends a TCP packet, K63F receives it and sends it back. This is the slowest of all, ~6-7Mbps in each direction (RX/TX).
It turned out that the CRC16 algo I used on my K63F is quite slow and was a bottleneck. After replacing it with a much simpler checksum algo, the transfer speed made a big jump. Setting the correct task priorities, using vTaskDelay, taskYIELD etc. can also boost the performance, since other tasks don’t get starved. There are still options to further increase the transfer speed (e.g. zero-copy, very thoroughly refactoring all memcpy calls, using the MAC’s HW capabilities etc.). Or by running the MCU at its theoretical highest speed (120 MHz) 😉
I want to say “thank you” for Hein’s exhaustive help 😉 he has given me some really good ideas.
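The thread does not say which “much simpler checksum” replaced the CRC16. As one possible example (an assumption, not necessarily the author’s choice), Fletcher-16 processes each byte with two additions and two reductions, whereas a bit-by-bit CRC16 loops eight times per byte:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16 over a byte buffer: two running sums modulo 255.
 * sum1 is the sum of the bytes, sum2 is the sum of the sum1 values. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint32_t sum1 = 0;
    uint32_t sum2 = 0;

    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        sum1 = (sum1 + data[i]) % 255u;
        sum2 = (sum2 + sum1) % 255u;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

The trade-off is weaker error detection than a CRC, which is acceptable here because the data is already covered by the TCP checksum and the Ethernet frame CRC.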

TCP/IP stack performance and reliability on K63F

May I ask why you don’t use the hardware CRC unit in your processor? Do you plan to release your work?

TCP/IP stack performance and reliability on K63F

This CRC (CRC16) is on top of the received TCP packet, which is already protected by the TCP/IP checksums and the Ethernet frame’s CRC. I was just using it during development until I was sure that my TCP packets really get transferred without any corruption. In other words, I’d say it was a kind of paranoid check for my sake only. The MAC uses some HW accelerators, but I can’t tell you right now exactly which ones. I’ve got a scheduled ticket to turn on more MAC HW accelerators to boost speed, but it’s of low priority now. I don’t plan on releasing my final NetworkInterface.c, since it’s the property of the company I’m working for and I’m not authorized to do so. I hope this isn’t against the licensing conditions of FreeRTOS/+TCP. By the way, if you read through this discussion carefully, you’ll find the skeleton of my original driver, and by following all the available descriptions and hints you can create your own version too.