A note about general concepts of high performance: both throughput and response time.

Background

The number of concurrent connections matters because a better user experience requires a long-lived connection to each user, typically over a WebSocket.

The Context Switching and Overheads blog reports the following approximate costs:

  • Pinned context switch: 1.2 to 1.5 microseconds
  • Unpinned context switch: 2.2 microseconds
  • Thread launch: 5 microseconds
  • Process launch: 22 microseconds
  • Goroutine switch: 170 nanoseconds
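To put these figures in perspective, the short sketch below converts each cost into an upper bound on operations per second on one core. It uses only the numbers listed above; the function name and the dictionary are illustrative, and the bound ignores indirect costs such as cache misses after a switch.

```python
# Costs in nanoseconds, taken from the list above (the pinned switch uses
# the upper end of its 1.2 to 1.5 microsecond range).
COSTS_NS = {
    "pinned context switch": 1_500,
    "unpinned context switch": 2_200,
    "thread launch": 5_000,
    "process launch": 22_000,
    "goroutine switch": 170,
}


def ops_per_second(cost_ns: int) -> int:
    """Upper bound on operations per second if one core did nothing else."""
    return 1_000_000_000 // cost_ns


if __name__ == "__main__":
    for name, cost_ns in COSTS_NS.items():
        print(f"{name}: at most {ops_per_second(cost_ns):,} per second")
```

By this rough measure a goroutine switch is about an order of magnitude cheaper than an OS thread context switch.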

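The default per-thread stack size on Linux is 8 MB of virtual memory, and it can be adjusted. Because the stack is virtual, only the pages a thread actually touches consume physical memory, so one CPU with enough memory can run 10,000 threads.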
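As a rough illustration of the stack arithmetic, the sketch below reads the process stack limit, computes the virtual address space that 10,000 default-sized stacks would reserve, and parks a small number of threads with a reduced stack. The helper names, the 256 KiB stack size, and the thread count of 200 are assumptions chosen for the example, not recommendations.

```python
import resource
import threading

MIB = 1024 * 1024


def virtual_stack_bytes(num_threads: int, stack_bytes: int = 8 * MIB) -> int:
    """Virtual address space reserved for stacks (not resident memory)."""
    return num_threads * stack_bytes


def run_parked_threads(num_threads: int, stack_bytes: int = 256 * 1024) -> int:
    """Start threads with a smaller stack, park them, and count the live ones."""
    previous = threading.stack_size(stack_bytes)  # affects threads created later
    release = threading.Event()
    ready = threading.Semaphore(0)

    def worker() -> None:
        ready.release()   # signal that this thread is running
        release.wait()    # sleep until the main thread lets everyone go

    started = []
    try:
        for _ in range(num_threads):
            thread = threading.Thread(target=worker)
            thread.start()
            started.append(thread)
        for _ in started:
            ready.acquire()
        return sum(thread.is_alive() for thread in started)
    finally:
        release.set()
        for thread in started:
            thread.join()
        threading.stack_size(previous)  # restore the earlier setting


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        print("stack soft limit: unlimited")
    else:
        print(f"stack soft limit: {soft // MIB} MiB")
    gib = virtual_stack_bytes(10_000) / (1024 * MIB)
    print(f"10,000 threads x 8 MiB = {gib:.1f} GiB of virtual address space")
    print(f"parked threads alive: {run_parked_threads(200)}")
```

The 10,000-thread figure is address space, not resident memory, which is why it is feasible on an ordinary 64-bit machine.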

Two blog posts, Linux async programming and io-uring revolutionizes Linux programming, summarize the async features in Linux.

Linux I/O evolved through the following phases:

  • Blocking read and write. The calling thread sleeps until the operation completes.
  • Non-blocking I/O with select, poll and epoll. These only work for sockets and pipes, not for regular files, so storage I/O is usually handed to a thread pool. Modern storage devices have single-digit microsecond latency, the same order of magnitude as a context switch, so the hand-off to another thread is a significant overhead.
  • The later Linux AIO interface allows asynchronous file I/O, but it only works with un-cached (O_DIRECT) file access, and it is known for quirks such as blocking unexpectedly.
  • io_uring, introduced in Linux 5.1, implements one async interface for all kinds of I/O. For fast devices it can poll for completions instead of relying on interrupts.
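The readiness-based phase (select, poll, epoll) can be sketched with Python's standard `selectors` module, which uses epoll on Linux. This is a minimal illustration on a local socket pair, not a server design; the function name `read_when_ready` and the one-second timeout are assumptions made for the example.

```python
import selectors
import socket


def read_when_ready(message: bytes, timeout: float = 1.0) -> bytes:
    """Send bytes over a socket pair and read them only once the
    selector reports that the receiving socket is readable."""
    selector = selectors.DefaultSelector()  # epoll-backed on Linux
    sender, receiver = socket.socketpair()
    sender.setblocking(False)
    receiver.setblocking(False)
    selector.register(receiver, selectors.EVENT_READ)
    try:
        sender.send(message)
        received = b""
        while len(received) < len(message):
            events = selector.select(timeout=timeout)
            if not events:
                break  # nothing became readable in time; give up
            for key, _mask in events:
                # The socket is ready, so this recv does not block.
                received += key.fileobj.recv(4096)
        return received
    finally:
        selector.unregister(receiver)
        selector.close()
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(read_when_ready(b"ping"))
```

The same pattern does not help with regular files: they always report as ready, so a read can still block on the disk.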

With io_uring, an application becomes an event loop that keeps adding requests to a shared ring buffer, handles the earlier entries that have completed, and repeats.
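The shape of such a loop can be sketched in plain Python. This is a toy simulation: `FakeRing` and its methods are hypothetical stand-ins, not the io_uring API, and the "kernel" here simply runs each operation synchronously. It only illustrates the submit, complete, repeat cycle with a submission queue and a completion queue.

```python
from collections import deque


class FakeRing:
    """Toy stand-in for the shared rings; NOT the real io_uring interface."""

    def __init__(self) -> None:
        self.submission = deque()   # requests waiting for the "kernel"
        self.completion = deque()   # finished requests waiting for the app

    def submit(self, user_data, operation) -> None:
        self.submission.append((user_data, operation))

    def kernel_step(self) -> None:
        # Pretend the kernel finishes everything that was submitted.
        while self.submission:
            user_data, operation = self.submission.popleft()
            self.completion.append((user_data, operation()))

    def reap(self):
        while self.completion:
            yield self.completion.popleft()


def event_loop(requests, batch_size: int = 2) -> dict:
    """Submit requests in batches and collect results keyed by user_data."""
    ring = FakeRing()
    pending = deque(requests)
    results = {}
    while pending or ring.submission or ring.completion:
        # 1. Add new work to the submission queue.
        for _ in range(min(batch_size, len(pending))):
            user_data, operation = pending.popleft()
            ring.submit(user_data, operation)
        # 2. Let the simulated kernel make progress.
        ring.kernel_step()
        # 3. Handle the entries that completed, then repeat.
        for user_data, result in ring.reap():
            results[user_data] = result
    return results


if __name__ == "__main__":
    jobs = [("a", lambda: 1), ("b", lambda: 2), ("c", lambda: 3)]
    print(event_loop(jobs))
```

In real io_uring the two queues live in memory shared with the kernel, so submitting and reaping can often be done without a system call per request.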