Large multithreaded server applications are a staple of backend infrastructure, but they present many development and performance challenges. Tuning epoll correctly to reduce tail latencies can be tricky. Achieving maximum performance requires minimizing excessive or expensive system calls, such as madvise. Securing all TCP traffic has also exposed several areas for optimization.
We will look at techniques Facebook has developed, or is working on, to solve these problems in our large backend services. The Kernel Connection Multiplexer (https://lwn.net/Articles/657999/) was developed to help tackle tail latencies through better load balancing in epoll. EPOLLEXCLUSIVE can be used to avoid the thundering herd problem when multiple threads use epoll. A TLS kernel module (https://lwn.net/Articles/666509/) was developed to enable splice and sendfile support for files sent directly from disk. Restartable Sequences (https://lwn.net/Articles/650333/) are being used to reduce memory caching overhead without additional locking.
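As a rough illustration of the exclusive-wakeup idea, here is a minimal sketch in C. It is not Facebook's code, and it makes several assumptions: each worker owns its own epoll instance, every instance watches the same shared descriptor, and the flag is spelled EPOLLEXCLUSIVE as in <sys/epoll.h> on Linux 4.5 and later. The helper names add_exclusive and demo_exclusive_wakeup are hypothetical, and an eventfd stands in for a shared listening socket so the sketch needs no network setup.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Older C library headers may lack the flag; this is the kernel's value. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/* Hypothetical helper: register fd on epfd for read readiness, asking
 * the kernel to wake only one or more (not necessarily all) of the epoll
 * instances that registered fd with EPOLLEXCLUSIVE. Returns 0 or -1. */
int add_exclusive(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = fd;
    /* The flag is only valid with EPOLL_CTL_ADD, not EPOLL_CTL_MOD. */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical demo: two epoll instances (one per imagined worker) share
 * one eventfd. Returns how many instances report readiness after a single
 * write, or -1 on setup failure. */
int demo_exclusive_wakeup(void)
{
    int result = -1;
    int efd = eventfd(0, EFD_NONBLOCK);
    int ep1 = epoll_create1(0);
    int ep2 = epoll_create1(0);

    if (efd >= 0 && ep1 >= 0 && ep2 >= 0 &&
        add_exclusive(ep1, efd) == 0 && add_exclusive(ep2, efd) == 0) {
        uint64_t one = 1;
        if (write(efd, &one, sizeof one) == (ssize_t)sizeof one) {
            struct epoll_event out;
            /* Zero timeout: just check what each instance has queued. */
            int r1 = epoll_wait(ep1, &out, 1, 0);
            int r2 = epoll_wait(ep2, &out, 1, 0);
            if (r1 >= 0 && r2 >= 0)
                result = r1 + r2;
        }
    }
    if (ep2 >= 0) close(ep2);
    if (ep1 >= 0) close(ep1);
    if (efd >= 0) close(efd);
    return result;
}
```

In a real server, each worker thread would block in epoll_wait on its own instance and call accept on the shared listening socket when woken. The kernel guarantees that at least one exclusive waiter is notified, not that exactly one is, so workers should still tolerate an accept that finds nothing to do.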