A note about general concepts of high performance: both throughput and response time.
The number of concurrent connections matters because better user experience requires long-lived connection to each user via web socket.
- Pinned context switch: 1.2 to 1.5 microsecond
- Unpinned context switch: 2.2 microsecond
- Thread launch: 5 microsecond
- Process launch: 22 microsecond
- Go routine: 170ns
The default per-thread stack size on Linux is 8MB virtual memory and can be adjusted. One cpu with enough memory can run 10,000 threads.
The Linux I/O evloved in the following phases:
write. The calling threads sleep till an operation is completed.
epoll. But they only work for socket and pipe, not for files. Storage I/O are handled by a thread pool. Modern I/O devices have a latency of single-digit microsecond - the same order of magnitude of a context switch.
AIOinterface allows file IO. It works with un-cached file access. It is buggy.
io_uringin Linux 5.0 implements the async interface for all kinds of I/O. It uses poll-based, not interruption-based, for fast devices.
io_uring an application becomes an event-loop that constantly add things to a shared buffer, deals with the previous entries that completed, rinse, repeat.