Chat Scale
February 11, 2021•372 words
Let's say you were building a chat app like Teams. What would the architecture initially look like?
First Iteration / MVP
Most likely, your architecture will have a client talking to a middle-tier API with a backing database. Given the nature of chat data, it would be reasonable to go for a NoSQL database. Unlike SQL, NoSQL lends itself to being easier to horizontally scale which will come in handy later.
The client will need some sort of 2 way channel with the server in order to send and receive messages in real-time. WebSockets or XMPP would work there.
Second Iteration / More Teams Join
Now that more teams have started using your app the load on the database is too much. The next obvious thing to do would be to shard the database. The question is how?
We could try sharding by team. The benefit would be great isolation from other teams, you would only need to make one database call to get all the data for the team, easier to reason with and clients remain simple and mostly unchanged.
Third Iteration / Hot Spots
The second iteration works great for typical scenarios but now it's time to deal with anomalies. You've onboarded a team of 10000+ users. It's getting to the point where there's too much data for a single database to handle. The next thing you could try is to shard by channels. The finer grained sharding unit allows load to be better distributed.
Now the client can expect to make many different database calls. This adds a lot of complexity as with more database calls, the more likely failure is going to occur, tail latencies may affect overall performance and multiple results have to be stitched together.
Anyways
Distributed systems are always about trade-offs, every system has different requirements and there can't be solutions to solve scale without discussing the trade-offs.
Even among chat apps, an app like Teams is much different from an app like Slack. Teams is all about vertical integration, where the users within are typically from the same company / team. Slack on the other hand is all about horizontal integration, where individuals are part of unique combinations of channels.