Zynq – TCP: Improve speed

Hello, herewith I'd like to share my experience with improving TCP communication. In my case I could get more than a 20 % gain in TCP speed at 1000 Mbps. The following things are necessary:

1st: Re-map the OCM (on-chip memory) from the bottom to the top of the address space.
2nd: Force the linker to place the ucNetworkPackets buffer into the OCM space.

The remapping is done by a macro calling some assembler code directly after entering main(). My code is:

~~~
int main( void )
{
    xil_printf( "Hello from FreeRTOS main\r\n" );
    configASSERT( configUSE_TASK_FPU_SUPPORT == 2 );
    xil_printf( "configUSE_TASK_FPU_SUPPORT (FreeRTOS.h) is set to %d\r\n", configUSE_TASK_FPU_SUPPORT );

    // Remap all 4 64KB blocks of OCM to the top of memory and enable DDR address filtering
    MY_REMAP();

    ...
    ...
~~~

The configUSE_TASK_FPU_SUPPORT part can of course be omitted if it is not used. The code for the MY_REMAP() define (found somewhere in the Xilinx forums) is:

~~~
#define MY_REMAP() asm volatile(                                               \
    "mov  r5, #0x03                                                 \n"        \
    "mov  r6, #0                                                    \n"        \
    "LDR  r7, =0xF8000000   /* SLCR base address   */               \n"        \
    "LDR  r8, =0xF8F00000   /* MPCORE base address */               \n"        \
    "LDR  r9, =0x0000767B   /* SLCR lock key       */               \n"        \
    "mov  r10, #0x1F                                                \n"        \
    "LDR  r11, =0x0000DF0D  /* SLCR unlock key     */               \n"        \
    "dsb                                                            \n"        \
    "isb                    /* make sure it completes */            \n"        \
    "pli  do_remap          /* preload the instruction cache */     \n"        \
    "pli  do_remap+32                                               \n"        \
    "pli  do_remap+64                                               \n"        \
    "pli  do_remap+96                                               \n"        \
    "pli  do_remap+128                                              \n"        \
    "pli  do_remap+160                                              \n"        \
    "pli  do_remap+192                                              \n"        \
    "isb                    /* make sure it completes */            \n"        \
    "b    do_remap                                                  \n"        \
    ".align 5, 0xFF         /* force the next block to a cache-line alignment */ \n" \
    "do_remap:                                                          \n"    \
    "str  r11, [r7, #0x8]   /* Unlock SLCR                         */   \n"    \
    "str  r10, [r7, #0x910] /* Configure OCM remap value           */   \n"    \
    "str  r9,  [r7, #0x4]   /* Lock SLCR                           */   \n"    \
    "str  r6,  [r8, #0x0]   /* Disable SCU & address filtering     */   \n"    \
    "str  r6,  [r8, #0x40]  /* Set filter start addr to 0x00000000 */   \n"    \
    "str  r5,  [r8, #0x0]   /* Enable SCU & address filtering      */   \n"    \
    "dmb                                                                \n"    \
    )
~~~
The next step is to create a memory section ".ocm" by changing the linker description file. The following changes are needed.

In the MEMORY block add:

~~~
ps7_ocm : ORIGIN = 0xfffc0000, LENGTH = 0x3fe00
~~~

In the SECTIONS description add:

~~~
.ocm (NOLOAD) :
{
    __ocmstart = .;
    *(.ocm)
    __ocmend = .;
} > ps7_ocm
~~~

The final step is to tell the buffer definition that the buffers should be placed into the OCM. In the file NetworkInterface.c add the .ocm section attribute, so that the definition reads:

~~~
static uint8_t ucNetworkPackets[ ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS * niBUFFER_1_PACKET_SIZE ]
    __attribute__( ( aligned( 32 ) ) )
    __attribute__( ( section( ".ocm" ) ) );
~~~

After compiling and linking one can inspect the map file to see whether the .ocm section was successfully generated and populated. It looks like:

~~~
.ocm            0x00000000fffc0000      0x30000
                0x00000000fffc0000      __ocmstart = .
 *(.ocm)
 .ocm           0x00000000fffc0000      0x30000 ./src/Ethernet/FreeRTOS-Plus-TCP/portable/NetworkInterface/Zynq/NetworkInterface.o
                0x00000000ffff0000      __ocmend = .
~~~

All hints and changes are of course without my responsibility and warranty. If anybody has other or additional changes or hints to improve TCP speed, please let me know. In particular, it would be interesting to hear whether it makes sense to push other buffers or variables into the OCM as well. Greetings to all.
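As an extra check next to the map file, one could also print the section boundaries at runtime, using the __ocmstart / __ocmend symbols defined in the linker script. A minimal sketch:

~~~
#include "xil_printf.h"

/* Symbols defined in the linker script ( see the .ocm section above ). */
extern uint8_t __ocmstart[], __ocmend[];

void vCheckOcmPlacement( void )
{
    xil_printf( ".ocm section: 0x%08X .. 0x%08X ( %u bytes )\r\n",
                ( unsigned ) __ocmstart,
                ( unsigned ) __ocmend,
                ( unsigned ) ( __ocmend - __ocmstart ) );
}
~~~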

Zynq – TCP: Improve speed

Really appreciate you taking the time to write this up.

Zynq – TCP: Improve speed

Hi Johannes, thanks a lot for sharing this. I'm afraid I cannot comment on it, as I don't know enough about the Zynq memory handling. But I would be curious to see the results of a test with iperf3. I'll attach the latest version ( v3.0d ) of the iperf server to this message. To activate the server, wait for +TCP to be ready and call:

~~~
void vIPerfInstall( void );
~~~

You can start a test on the host with this command:

~~~
iperf3 -c 192.168.2.114 --port 5001 --bytes 100M [ -R ]
~~~

The reverse flag ( -R ) causes the Zynq to send data instead of receiving data.
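One way to hook this up, assuming ipconfigUSE_NETWORK_EVENT_HOOK is set to 1 in FreeRTOSIPConfig.h, is to start the server from the +TCP network event hook once the network is up. A minimal sketch, not taken from the attachment:

~~~
#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"

extern void vIPerfInstall( void );

void vApplicationIPNetworkEventHook( eIPCallbackEvent_t eNetworkEvent )
{
    static BaseType_t xStarted = pdFALSE;

    /* Start the iperf server once, as soon as the network comes up. */
    if( ( eNetworkEvent == eNetworkUp ) && ( xStarted == pdFALSE ) )
    {
        xStarted = pdTRUE;
        vIPerfInstall();
    }
}
~~~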

Zynq – TCP: Improve speed

I wrote:

I would be curious to see the results of a test with iperf3.

It would be great if you can start two sessions simultaneously, one in each direction:

~~~
iperf3 -c 192.168.2.114 --port 5001 --bytes 1G
iperf3 -c 192.168.2.114 --port 5001 --bytes 1G -R
~~~

We recently saw a problem with Zynq: incoming packets were dropped under heavy traffic, causing a very slow transfer speed for the first session ( the one without -R ). I am curious to see whether faster memory access will prevent these problems.

Zynq – TCP: Improve speed

Hello Hein, here are the results of the iperf3 tests I've done on my Zynq-7000, running at 666 MHz. First I had to change the IP address and port number according to the needs of our firewall. Then I got a stack fault; maybe some of the other tasks which are running on my system caused this. These other tasks don't use much CPU time, I believe, so I didn't change the software for the iperf tests. After setting the stack size to 1000 everything was fine. Next I changed the window and buffer settings to the values I used in my HTTP server work. The code after the change is:

~~~
#ifndef ipconfigIPERF_TX_BUFSIZE

    #define mySETTINGS

    #ifdef mySETTINGS
        #define ipconfigIPERF_TX_BUFSIZE    ( 128 * 1024 )
        #define ipconfigIPERF_TX_WINSIZE    ( 48 )
        #define ipconfigIPERF_RX_BUFSIZE    ( ( 80 * 1024 ) - 1 )
        #define ipconfigIPERF_RX_WINSIZE    ( 24 )
    #else
        #define ipconfigIPERF_TX_BUFSIZE    ( 65 * 1024 )           /* Units of bytes. */
        #define ipconfigIPERF_TX_WINSIZE    ( 4 )                   /* Size in units of MSS */
        #define ipconfigIPERF_RX_BUFSIZE    ( ( 65 * 1024 ) - 1 )   /* Units of bytes. */
        #define ipconfigIPERF_RX_WINSIZE    ( 8 )                   /* Size in units of MSS */
    #endif

#endif
~~~
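For reference: the BUFSIZE values are in bytes and the WINSIZE values are in units of the MSS. Values like these normally reach a FreeRTOS+TCP socket through the FREERTOS_SO_WIN_PROPERTIES option; the helper below is only an illustrative sketch ( prvApplyWindowSettings is not code from the iperf server ):

~~~
#include "FreeRTOS.h"
#include "FreeRTOS_Sockets.h"

static void prvApplyWindowSettings( Socket_t xSocket )
{
    WinProperties_t xWinProps;

    xWinProps.lTxBufSize = ipconfigIPERF_TX_BUFSIZE;   /* Units of bytes. */
    xWinProps.lTxWinSize = ipconfigIPERF_TX_WINSIZE;   /* Units of MSS.   */
    xWinProps.lRxBufSize = ipconfigIPERF_RX_BUFSIZE;   /* Units of bytes. */
    xWinProps.lRxWinSize = ipconfigIPERF_RX_WINSIZE;   /* Units of MSS.   */

    FreeRTOS_setsockopt( xSocket, 0, FREERTOS_SO_WIN_PROPERTIES,
                         ( void * ) &xWinProps, sizeof( xWinProps ) );
}
~~~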

By the way: why should or must the RX_BUFSIZE be an odd number? I've done tests with the original buffer sizes with and without -R, and also tests with my settings, again in both variants. Finally I ran the concurrent test as you suggested and saw that indeed the performance of the session without -R really dropped. On my debug UART I got lots of messages like:

~~~
SACK[4503,34508]: optlen 12 sending 14583407 - 14584867
~~~

The test results are:

Original bufsize:

~~~
[ 4]   0.00-19.59 sec  1.00 GBytes   438 Mbits/sec                sender
[ 4]   0.00-19.59 sec  1024 MBytes   438 Mbits/sec                receiver
~~~

and for the reverse mode:

~~~
[ 4]   0.00-27.12 sec  37.0 Bytes    10.9 bits/sec   4294967295   sender
[ 4]   0.00-27.12 sec  1.00 GBytes   317 Mbits/sec                receiver
~~~

mySETTINGS bufsize:

~~~
[ 4]   0.00-16.43 sec  1.00 GBytes   523 Mbits/sec                sender
[ 4]   0.00-16.43 sec  1024 MBytes   523 Mbits/sec                receiver
~~~

and for the reverse mode:

~~~
[ 4]   0.00-13.81 sec  37.0 Bytes    21.4 bits/sec   4294967295   sender
[ 4]   0.00-13.81 sec  1.00 GBytes   622 Mbits/sec                receiver
~~~

For completeness I attached the full results in a file. Greetings

Zynq – TCP: Improve speed

Thanks Johannes, for these detailed and systematic measurements. It looks like using your memory settings makes the Ethernet communication faster, by at least 20 %. But unfortunately, it does not help against the packet loss in this case:

~~~
My settings bufsize parallel:
Connecting to host 169.254.79.19, port 4503
[  4] local 169.254.214.213 port 35510 connected to 169.254.79.19 port 4503
[ ID] Interval           Transfer     Bandwidth
[  4]   3.00-4.00   sec   512 KBytes  4.19 Mbits/sec
[  4]   4.00-5.00   sec   896 KBytes  7.34 Mbits/sec
~~~

In this case, with heavy two-way traffic, incoming packets are being dropped. Earlier I found that it helps to decrease the packet size of TCP packets ( the MSS ).
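For example, the MSS can be lowered globally in FreeRTOSIPConfig.h; 1160 is only an example value, not a recommendation from this thread:

~~~
/* Example only: by default the MSS derives from ipconfigNETWORK_MTU ( 1500 - 40 = 1460 ). */
#define ipconfigTCP_MSS    1160
~~~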
You wrote:

and saw that indeed the performance of the one without -R really dropped. On my debug UART I got lots of messages like: SACK[4503,34508]: optlen 12 sending 14583407 - 14584867

That is indeed a sign of packets being dropped.

Zynq – TCP: Improve speed

Hi Johannes, I finally solved the problem of the lost packets during concurrent transmissions. See [this post](https://sourceforge.net/p/freertos/feature-requests/126/). In x_emacpsif_dma.c, in the function emacps_send_message(), it is essential to read back the register that was just set:

~~~
  XEmacPs_WriteReg( ulBaseAddress, XEMACPS_NWCTRL_OFFSET, xxx );
+ /* Reading it back is important when the compiler is optimising. */
+ XEmacPs_ReadReg( ulBaseAddress, XEMACPS_NWCTRL_OFFSET );
~~~

Now I started two concurrent sessions with iperf3, and both transported an equal amount of data. The new Zynq driver is attached.