How to scale UDP read throughput?

Setup: Two Linux (CentOS 6) servers connected back-to-back over a dedicated GigE link. Each server has 24 cores and 32 GB RAM.

Client: A simulator shooting UDP packets as fast as it can in one thread. Each packet is 256 bytes. I see a maximum throughput of ~200,000 packets/sec.

Server: Receives packets on a UDP socket in one thread and performs light-weight parsing. I see a maximum throughput of ~200,000 packets/sec, with one core at about 85% CPU utilization during processing. There is no packet loss; the receive buffer is set to 128 MB just in case.

Now I have 23 additional cores I would like to use, but as soon as I add one more thread for receiving data on the server side and a client thread for sending data on the client side over a dedicated socket, I see a lot of packet loss on the server side.

The server threads are completely independent of each other and do not block except for I/O. They are not contending on a socket either, as each of them pulls packets off its own UDP socket.
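
For context, a rough sketch of what each receiver thread does, assuming each thread binds its own port (names and error handling here are illustrative only, not the actual code):

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

// One of these loops runs per receiver thread; each binds its own UDP port.
static void rx_loop(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    char buf[2048];
    for (;;) {
        ssize_t len = recv(fd, buf, sizeof(buf), 0);
        if (len > 0) {
            // light-weight parsing of the 256-byte payload goes here
        }
    }
}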

I see the following behavior:

  1. The CPU is at about 85% on each core, and a process called ksoftirqd/xx is using about 90% of an additional core.
  2. I am losing packets; the throughput drops on each thread to about 120,000 packets/sec.

Google says that ksoftirqd uses CPU to handle soft interrupts, and heavy system calls (UDP reads) can be one of the reasons for high CPU usage by this kernel thread.

I repeated the experiment, pinning my process to a set of cores on the same physical socket, and saw performance improve to 150,000 packets/sec, still with considerable packet loss.
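
For reference, the pinning can be done with something along these lines (the core list and program name below are only placeholders; sched_setaffinity() from inside the program works as well):

# pin the server process to cores 0-5, assumed here to be on one physical socket
taskset -c 0-5 ./udp_server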

Is there a way I can improve the scalability of my program and minimize packet loss?


First off, at 200,000 packets per second with 256 (data?) bytes per packet, once you consider UDP, IP, and Ethernet overhead, you're running at nearly half the capacity of your gigabit link on bandwidth alone. At the packet-per-second rates you're pushing, many switches (for example) would fall over.
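
Rough back-of-the-envelope math: 256 bytes of payload plus roughly 8 (UDP) + 20 (IP) + 18 (Ethernet header and FCS) + 20 (preamble and inter-frame gap) comes to about 322 bytes on the wire per packet, so 200,000 packets/sec is roughly 200,000 × 322 × 8 ≈ 515 Mbit/s, i.e. about half of a 1 Gbit/s link.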

Second, you're probably being killed by IRQs. Better network cards have tunables to let you trade off fewer IRQs for increased latency (they only interrupt once every N packets). Make sure you have that enabled. (One place to check is the module parameters for your ethernet card.)
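
On Linux, one way to inspect and adjust this is through ethtool's interrupt-coalescing options (the interface name and values below are examples only, and which knobs are available depends on your driver):

# show the current coalescing settings for eth0
ethtool -c eth0

# example: interrupt at most every 100 microseconds or every 64 received frames
ethtool -C eth0 rx-usecs 100 rx-frames 64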


You can set the kernel receive buffer, but be aware that it won't help if the IRQs eat all your CPU.

int n = 1024 * 512; // receive buffer size in bytes -- experiment with it
if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &n, sizeof(n)) == -1) {
  // setsockopt failed -- check errno (e.g. with perror) and handle it
}

You should also set the maximum receive buffer to a large value so the kernel doesn't limit your setting:

sysctl -w net.core.rmem_max=<value>
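
If you want the setting to survive a reboot, the same line can also go into /etc/sysctl.conf, which is applied at boot on CentOS 6:

net.core.rmem_max = <value>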


If your application requires 200,000 packets per second, either your protocol is not well designed, or you're doing something else wrong.

Where do the 200k packets come from? Different hosts across the internet? Maybe you should load-balance it or something?

As derobert pointed out, 200k packets per second represents a lot of bandwidth, which would be very costly over the internet; you should probably consider a protocol redesign.
