The music media streamer serves audio bytes to the numerous users of https://ok.ru/music. Peak traffic reaches 100 Gbps across hundreds of thousands of connections, and the first byte of any response is sent in less than 100 ms. The previous generation of the streamer was based on file storage plus Apache Tomcat. It was deployed on a huge cluster of nodes and could not fully utilize modern hardware. In developing the next generation of the streamer, we aimed to shrink the machine cluster while providing strong scalability and fault tolerance.
We will explain how the architecture of the service provides scalability and fault tolerance through audio track distribution and replication. Then we will look into the design of a service node, which is based on the reactive streams approach, and into its network and storage subsystems. We will explain in detail several gotchas we faced and the solutions that helped us improve system performance and simplify debugging and maintenance.
The talk targets software developers who want to explore approaches and tools for developing distributed and/or high-load, I/O-intensive systems.