Kinghall lwip performance
From DDCIDeos
Project to improve the performance of LwIP for Kinghall
Description
lwIP is too slow: the customer is seeing 3 MB/s on an ftp get from the target. Linux on the same configuration gets ~125 MB/s, and the customer wants the same on Deos.
Current results are 38.5 MB/s (approx. 13x faster than the original 3 MB/s), still ~3x slower than the theoretical max.
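The ratios quoted above can be checked with a few lines of C. This is only a sketch: it assumes the original, current, and target figures are all expressed in the same unit (MB/s), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* How many times larger 'a' is than 'b'; both must use the same unit. */
double throughput_ratio(double a, double b)
{
    assert(b > 0.0);
    return a / b;
}

/* Report progress against the original result and the target. */
void print_progress(double orig, double current, double target)
{
    printf("speedup over original: %.1fx\n", throughput_ratio(current, orig));
    printf("remaining gap to target: %.1fx\n", throughput_ratio(target, current));
}
```

Calling `print_progress(3.0, 38.5, 125.0)` reports a speedup of 12.8x and a remaining gap of 3.2x, which matches the "approx 13x" and "~3x" figures.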
This is being worked partly as a BOT. See Richard/Kelly for time reporting.
There is an extensive chat history: https://teams.microsoft.com/l/message/19:fccc132d3593491598a081eaae7a506e@thread.v2/1735937346991?context=%7B%22contextType%22%3A%22chat%22%7D
Tasks
| Task Description | Priority | Assignee | Status | Effort (Hours) | Comments |
|---|---|---|---|---|---|
| Unelease updated ANSI | 1 | AL | Done-ish | 24 | Many TODOs. No optimization for arch != ARM |
| Unelease lwip (see issues below) | 1 | MV | Done-ish | | |
| Unelease gem driver (see issues below) | 3 | MV | Pending | | must compile from head |
| Unelease kernel | 3 | RLR | Done-ish | | unreleased "from the desk of Ryan" |
| Unelease ftp | 3 | CP | Done-ish | | |
| Generate customer letter | 1 | TBD | Pending | | |
Issues Identified
- ANSI: memcpy() and memset() very slow for misaligned or uncached memory.
- Debug variants of lwip and the gem driver are compiled with -O0.
- gem: set the descriptor resource cache mode to writeThru.
- gem: turn off tx interrupts.
- kernel debug variant very slow (presumably due to DataMemberTemplate)
- kernel event signaling
- lwip
  - lwipopts: window size set to 44 times MSS.
  - TCP send buffer is 32 times MSS; may want at least 44 to match the window size.
- FTP send buffer output cache
- On ARM, reads of device memory are slower than reads of uncached normal memory (writeThru is used as a surrogate, since the current customer's ARM processors don't implement writeThru and fall back to uncached normal memory semantics).
- PCR:16193: the kernel was being conservative when computing slack, causing idle time at the end of a period.
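The lwip window and send-buffer items above might look like this in lwipopts.h. This is a sketch, not the project's actual file: the TCP_MSS value of 1460 is an assumption (standard 1500-byte Ethernet MTU), and the send buffer is shown already raised to 44 times MSS as suggested. With that MSS, 44 is the largest multiplier that still fits the 16-bit TCP window field (44 * 1460 = 64240, while 45 * 1460 = 65700), which may be why 44 was chosen.

```c
/* lwipopts.h fragment (sketch; multipliers taken from the notes above) */
#define TCP_MSS       1460                /* assumption: 1500-byte Ethernet MTU */
#define TCP_WND       (44 * TCP_MSS)      /* 64240 bytes, fits a 16-bit window */
#define TCP_SND_BUF   (44 * TCP_MSS)      /* raised from 32 * TCP_MSS to match TCP_WND */

/* A larger multiplier would overflow the 16-bit window field unless
 * window scaling (LWIP_WND_SCALE) is enabled. */
#if TCP_WND > 0xFFFF
#error "TCP_WND exceeds 65535; enable LWIP_WND_SCALE or reduce the multiplier"
#endif
```

Raising these values may also require larger memory pools; lwIP's startup sanity checks compare options such as MEMP_NUM_TCP_SEG against TCP_SND_QUEUELEN, so expect to adjust those too.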
Possible things to investigate
- Checksum offload. Determine how much time spent there now.
- Minimize semaphore overhead:
  - Alloc multiple pbufs per semaphore lock?
  - Use fast path via atomics?
  - Modify alloc algorithm to be lockless?
- Get gprof working
- Continue to add logSystemEvent() calls.
- Receive path?
- Try to modify gem driver to use cached descriptor memory.
- Turn off lwip software receive checksum verification: CHECKSUM_CHECK_TCP
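For the "fast path via atomics" and "lockless alloc" ideas above, one possible shape is a fixed pool with a lock-free free list built on C11 atomics. The sketch below is not lwIP's pbuf or memp code; every name in it (pool_init, pool_alloc, pool_free, POOL_SIZE) is hypothetical. It hands out buffer indices rather than pointers and packs a change counter next to the head index so that a stale head cannot be reinstalled (the ABA problem).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE 8u            /* hypothetical pool depth */
#define POOL_NIL  0xFFFFFFFFu   /* marks the end of the free list */

/* Free-list head: low 32 bits hold a buffer index, high 32 bits a tag
 * that changes on every update so a stale head cannot be reinstalled. */
static _Atomic uint64_t pool_head;
static _Atomic uint32_t pool_next[POOL_SIZE];  /* next-free link per buffer */

static uint64_t pack(uint32_t index, uint32_t tag)
{
    return ((uint64_t)tag << 32) | index;
}

/* Chain every buffer onto the free list; call once before any alloc. */
void pool_init(void)
{
    for (uint32_t i = 0; i < POOL_SIZE; i++) {
        atomic_store(&pool_next[i], (i + 1 < POOL_SIZE) ? i + 1 : POOL_NIL);
    }
    atomic_store(&pool_head, pack(0, 0));
}

/* Pop one buffer index without taking a lock; POOL_NIL when exhausted. */
uint32_t pool_alloc(void)
{
    uint64_t old = atomic_load(&pool_head);
    for (;;) {
        uint32_t index = (uint32_t)old;
        if (index == POOL_NIL) {
            return POOL_NIL;
        }
        uint32_t next = atomic_load(&pool_next[index]);
        uint64_t new_head = pack(next, (uint32_t)(old >> 32) + 1);
        if (atomic_compare_exchange_weak(&pool_head, &old, new_head)) {
            return index;
        }
        /* A failed CAS reloaded 'old'; retry with the fresh head. */
    }
}

/* Push a buffer index back onto the free list. */
void pool_free(uint32_t index)
{
    assert(index < POOL_SIZE);
    uint64_t old = atomic_load(&pool_head);
    for (;;) {
        atomic_store(&pool_next[index], (uint32_t)old);
        uint64_t new_head = pack(index, (uint32_t)(old >> 32) + 1);
        if (atomic_compare_exchange_weak(&pool_head, &old, new_head)) {
            return;
        }
    }
}
```

Whether this beats the semaphore path depends on how the target implements 64-bit atomics, so it would need measuring (for example with the logSystemEvent() instrumentation) before being adopted.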
To reproduce current best case results
Configuration
- Install the libansi.so posted in the chat linked in the Description.
- Install the release variants of lwip, kernel, and gem driver.
- Change cacheMode:
  - lwip.pia.xml: NetworkBuffers stays off (this is not a change, but it is curious that off works better than writeThru).
  - xilinx-gem.pia.xml: GigabitEthDescriptorMemory0 (and 1, 2, and 3) to writeThru.
- Change the scheduling priority of the ISR thread to zero.
To Run The Experiment
- On the tfhost associated with the target:
  - ftp to target and:
    - get /dev/zero /dev/null
  - Start Wireshark
    - Set a display filter of ip.addr to the target's IP.
    - Statistics / (I don't recall the steps here; presumably the I/O Graphs view, since a throughput graph is what gets read.)
  - ftp to target and:
The Wireshark graph should show ~230 Mbps.
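Since this page quotes both MB/s (ftp) and Mbps (Wireshark), a small conversion helps when comparing runs. The sketch below assumes decimal units (1 MB = 10^6 bytes) and a 1000 Mbit/s link; the function names are illustrative only.

```c
#include <assert.h>

/* Convert a bit rate in Mbit/s (as read off the graph) to MB/s. */
double mbits_to_mbytes(double mbits_per_s)
{
    assert(mbits_per_s >= 0.0);
    return mbits_per_s / 8.0;
}

/* Fraction of the raw link rate in use; both arguments in Mbit/s. */
double link_utilization(double observed_mbits, double link_mbits)
{
    assert(link_mbits > 0.0);
    return observed_mbits / link_mbits;
}
```

With these, `mbits_to_mbytes(230.0)` gives 28.75 MB/s, and `link_utilization(230.0, 1000.0)` gives 0.23, i.e. about 23% of a gigabit link.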