Project to improve the performance of LwIP for Kinghall

Description

LwIP is too slow: the customer is seeing 3 MB/s on an FTP get from the target. Linux on the same configuration gets ~125 MB/s, and the customer wants the same on Deos.

Current results are 38.5 MB/s (approx. 13x faster than the original), still ~3x slower than the theoretical max.

This is being worked partly as a BOT. See Richard/Kelly for time reporting.

Tasks

Task Description	Priority	Assignee	Status	Effort (Hours)	Comments
Unrelease updated ANSI	1	AL	Done-ish	24	Many TODOs. No optimization for arch != ARM
Unrelease lwip (see issues below)	1	MV	Done-ish
Unrelease gem driver (see issues below)	3	MV	Pending		must compile from head
Unrelease kernel	3	RLR	Done-ish		unreleased "from the desk of Ryan"
Unrelease ftp	3	CP	Done-ish
Generate customer letter	1	TBD	Pending

ANSI: memcpy() and memset() are very slow for misaligned or uncached memory.
Debug variants of lwip and the gem driver are compiled with -O0.
gem: descriptor resource cache mode set to writeThru
gem: turn off tx interrupts
kernel: debug variant is very slow (presumably due to DataMemberTemplate)
kernel event signaling
lwip
1. lwipopts window size set to 44 times MSS.
2. TCP send buffer is 32 times MSS; may want at least 44 to match the window size.
FTP send buffer output cache
On ARM, reads of device memory are slower than reads of uncached normal memory (using writeThru as a surrogate, since the current customer's ARM processors don't implement writeThru and fall back to uncached normal memory semantics).
PCR:16193: Kernel was being conservative when computing slack, causing idle time at the end of a period.
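
The lwip window and send-buffer settings noted above could be expressed in lwipopts.h roughly as follows. This is a sketch, not the project's actual file: TCP_MSS, TCP_WND, and TCP_SND_BUF are standard lwIP options, but the MSS value of 1460 is an assumption (1500-byte Ethernet MTU minus 40 bytes of IPv4 and TCP headers). With that MSS, 44 segments is the largest whole number that fits in the 16-bit TCP window field without window scaling (45 * 1460 = 65700 exceeds 65535).

```c
/* lwipopts.h fragment (sketch; the MSS value is an assumption) */
#define TCP_MSS      1460             /* assumed: 1500-byte MTU - 40 bytes of IPv4 + TCP headers */
#define TCP_WND      (44 * TCP_MSS)   /* receive window: 64240 bytes */
#define TCP_SND_BUF  (44 * TCP_MSS)   /* raised from 32 * TCP_MSS to match the window */

/* Without window scaling (LWIP_WND_SCALE), the advertised window must fit in 16 bits. */
_Static_assert(TCP_WND <= 65535, "TCP_WND above 65535 needs LWIP_WND_SCALE");
```

Raising TCP_SND_BUF will likely also require revisiting TCP_SND_QUEUELEN and MEMP_NUM_TCP_SEG, since lwIP's sanity checks relate those to the send buffer size.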

Possible things to investigate

Checksum offload. Determine how much time is spent there now.
1. Ref: https://lists.nongnu.org/archive/html/lwip-users/2008-02/msg00022.html
2. Ref: https://docs.amd.com/r/en-US/ug1087-zynq-ultrascale-registers/dma_config-GEM-Register
Minimize semaphore overhead:
1. Alloc multiple pbufs per semaphore lock?
2. Use fast path via atomics?
3. Modify alloc algorithm to be lockless?
Get gprof working
Continue to add logSystemEvent() calls.
1. Receive path?
Try to modify gem driver to use cached descriptor memory.
Turn off lwip software receive checksum verification: CHECKSUM_CHECK_TCP
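
To get a rough feel for what software checksumming costs before investing in offload, the standard Internet checksum (RFC 1071) can be timed over a payload-sized buffer. The sketch below is a plain portable implementation, not lwIP's own (lwIP's checksum routines are optimized and selectable via LWIP_CHKSUM_ALGORITHM), so it gives only an approximate, probably pessimistic, figure. The function name inet_checksum16 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes (hypothetical helper, not lwIP code).
 * Bytes are summed as big-endian 16-bit words; an odd trailing byte is
 * padded with a zero low byte. Returns the one's complement of the folded sum.
 * The 32-bit accumulator cannot overflow for buffers up to 128 KiB. */
static uint16_t inet_checksum16(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1) {
        sum += (uint32_t)data[0] << 8;
    }
    /* Fold carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

Calling this in a loop over an MSS-sized buffer, bracketed by whatever timing facility is available on the target, gives a per-segment cost to compare against the per-segment time budget at the desired throughput. With CHECKSUM_CHECK_TCP set to 0 in lwipopts.h, lwIP skips the equivalent work on the TCP receive path, which is only safe if the hardware or link layer is trusted to catch corruption.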

Put libansi.so from this chat.
Put release variants of lwip, kernel, and gem driver.
Change cacheMode:
lwip.pia.xml: NetworkBuffers stays off (this is not a change, but it is curious that off works better than writeThru)

xilinx-gem.pia.xml: GigabitEthDescriptorMemory0 (and 1, 2, and 3) to writeThru
Change Scheduling Priority of ISR thread to zero.

The Wireshark graph should show ~230 Mbps.