Server Side IO and Bandwidth

Last November, I wrote a little about the scalability of server-side services that have to interact with mobile phones. I mentioned cloud services briefly and how they can impose restrictive design practices on your application.

There’s another issue with using cloud services for mobile. The problem is well described in a non-mobile article that was published yesterday:

"Second fundamental rule of cloud deployments: All machines will be affected by the same networking and IO constraints

Amazon lets people get pretty big virtual boxes in terms of processor or memory power. However, IO and networking are a completely different issue. For in-house data grid deployments, getting a separate set of network cards and putting them on a dedicated VLAN or even their own switch is a really good idea, because of the broadcast traffic between the nodes. You can’t do that on the cloud. Putting a card with hardware TCP offloading is not an option (and broadcast is also not an option at least on Amazon, but that’s another story). So the architecture has to work around this. Bottlenecks can’t be just solved by getting better hardware. Beware of this while designing an architecture that depends on all traffic going through a single file server, database machine or load balancer. If all the traffic goes through a single point, the entire capacity of the cluster will be limited to that machine’s IO or network constraints (which is probably shared with who knows how many other virtual machines on the same physical box)."

With most mobile applications, where many phones regularly report status to a server, IO and network constraints can be even more of an issue. The problem with the cloud is that you are sharing the server’s IO and network capacity with someone else. When you start asking how much IO and bandwidth you will actually get, what happens when that limit is exceeded and how to detect that it has been exceeded, the answers tend to get a bit vague. The bottom line is that you can’t assume, just because it’s the cloud, that you have unlimited server IO and network bandwidth.
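To get a feel for the numbers before asking a provider those questions, a back-of-the-envelope estimate can help. The sketch below is only that: the function name and every figure in it (number of phones, reporting interval, payload size) are made up for illustration, and it assumes reports are spread evenly over time, which real traffic rarely is.

```python
def required_bandwidth_bps(phones: int, report_interval_s: float, payload_bytes: int) -> float:
    """Average inbound bits per second, assuming evenly spread status reports."""
    reports_per_second = phones / report_interval_s
    return reports_per_second * payload_bytes * 8  # 8 bits per byte


# Hypothetical workload: 100,000 phones, one 2,000-byte report per minute each.
avg_bps = required_bandwidth_bps(phones=100_000, report_interval_s=60, payload_bytes=2_000)
print(f"{avg_bps / 1e6:.1f} Mbit/s")  # prints "26.7 Mbit/s"
```

This is an average only. Bursts, protocol overhead and outbound responses all push the real figure higher, so treat the result as a floor rather than a capacity plan.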

For very busy mobile services, you get more control and more certainty if you can control the network cards, switches and bandwidth yourself. However, whether you use the cloud or not, you may eventually run out of IO and bandwidth. The article I referenced mentions the solution: "Partition, partition, partition". You need to design for scalability. What does this mean? Think about your service and split it by type of user, category, geography or whatever makes the most sense. While neither mobile-specific nor cloud-specific, there’s a great PDF about eBay describing how they have partitioned their system in various ways over the years.
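As a rough illustration of what partitioning might look like in code, here is a minimal sketch that first splits users by geography and then spreads them across several servers within each region by hashing the user id. The region names, hostnames and the `shard_for` function are all hypothetical, and this is not how eBay or the quoted article does it; it is just one simple way to avoid sending all traffic through a single machine.

```python
import hashlib

# Hypothetical partition map: region -> servers that handle that region's phones.
PARTITIONS = {
    "eu": ["eu-0.example.internal", "eu-1.example.internal"],
    "us": ["us-0.example.internal", "us-1.example.internal", "us-2.example.internal"],
}


def shard_for(region: str, user_id: str) -> str:
    """Pick a server for a user: partition by region, then hash within the region."""
    shards = PARTITIONS[region]
    # Use a stable hash so the same user maps to the same server across processes.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


print(shard_for("eu", "alice"))
print(shard_for("us", "bob"))
```

One trade-off to be aware of: with simple modulo hashing, adding or removing a server in a region remaps many users to a different server, so a real system would likely want a lookup table or consistent hashing instead.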