Go talk to Amazon, Microsoft, CNN, HP, IBM, Oracle, the United States Armed Forces, etc., or just pick up some books on "Best Practices".
This whole event demonstrates that you have a massive Single Point of Failure: the login server farm. It needs to be distributed so that if one data center goes down, it doesn't wipe you out at an enterprise level.
If your authentication servers (login servers) were redundant, then everyone serviced out of Seattle would've experienced no problems. Those of us serviced out of San Diego would've still been unable to reach the game servers, but you wouldn't have ANY talk of some Seattle players staying online while others on the same data center couldn't sign on. The only customers you'd have to worry about are those of us serviced out of San Diego.
That is just for this event. If only the authentication had gone down, and not the entire network connection to San Diego, distributed authentication would have meant NO interruption in service: the game servers in San Diego would've still been operational, and authentication would've been handled by the redundant system in Seattle.
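
To make the idea concrete, here is a rough sketch in Python of login failover between two sites. It is only an illustration under my own assumptions, not how the actual login farm works: the AuthSite class, the login function, and the shared accounts table are all hypothetical names, and a real setup would also need the account data replicated between the data centers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AuthSite:
    """Hypothetical stand-in for the login server farm in one data center."""
    name: str
    reachable: bool = True

    def authenticate(self, user: str, password: str,
                     accounts: Dict[str, str]) -> bool:
        # Simulate a data center outage: the site cannot answer at all.
        if not self.reachable:
            raise ConnectionError(f"{self.name} login farm unreachable")
        # Assumption: every site sees the same (replicated) account table.
        return accounts.get(user) == password


def login(user: str, password: str, sites: List[AuthSite],
          accounts: Dict[str, str]) -> Tuple[bool, str]:
    """Try each site in order and return (accepted, name of answering site).

    Only an unreachable site triggers failover; a rejected password is a
    real answer and is returned as-is. If no site answers, raise.
    """
    for site in sites:
        try:
            return site.authenticate(user, password, accounts), site.name
        except ConnectionError:
            continue  # this site is down, fall through to the next one
    raise ConnectionError("no login farm reachable")
```

With San Diego marked unreachable and Seattle still up, login("player1", "hunter2", [san_diego, seattle], accounts) gets answered by Seattle instead of failing. Only when every site is down does the player see an error.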

