Cloud availability -- is 99.999% uptime a myth or reality?
Varying response times of major cloud computing services in the US
As "five nines" (99.999%) availability becomes a de facto tagline of cloud computing services, CloudSleuth, a free web-based cloud performance monitoring platform, suggests that the five nines do not come automatically.
At 16:58 (GMT+8) on 14 October 2010, CloudSleuth showed the following response times for five major cloud computing services over the previous 30 days:
Microsoft Windows Azure (6.731 sec),
Google App Engine (6.843 sec),
OpSource (7.022 sec),
Amazon EC2 (US East Coast - North Virginia) (7.176 sec), and
Rackspace (7.484 sec) (see Figure 1 below).
Figure 1. Response times of five cloud computing services within the last 30 days
Figure 2. Availability of five cloud computing services within the last 24 hours
These availability and response-time figures do not, however, hold uniformly across all geographies. Take Amazon EC2 (APAC - Singapore) as an example: the availability and response time recorded at the same point in time were 99.82% and 21.712 seconds respectively.
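To put these percentages in perspective, the following minimal Python sketch converts an availability figure into the downtime it permits over a measurement window. It is an illustration, not CloudSleuth's methodology: the helper name `allowed_downtime_minutes` is hypothetical, a 365-day year is assumed, and treating the Singapore 99.82% reading as a 24-hour figure (the window used in Figure 2) is also an assumption.

```python
# Minutes in the measurement windows used below.
MINUTES_PER_DAY = 24 * 60                  # 1,440
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY   # 525,600 (assumption: ignores leap years)


def allowed_downtime_minutes(availability_pct: float, window_minutes: float) -> float:
    """Return the downtime, in minutes, implied by an availability percentage over a window.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1.0 - availability_pct / 100.0) * window_minutes


if __name__ == "__main__":
    # "Five nines" over a full year, then over a single day (shown in seconds).
    print(f"99.999% per year: {allowed_downtime_minutes(99.999, MINUTES_PER_YEAR):.2f} min")
    print(f"99.999% per day:  {allowed_downtime_minutes(99.999, MINUTES_PER_DAY) * 60:.2f} s")
    # Amazon EC2 (APAC - Singapore), assuming the 99.82% reading covers 24 hours.
    print(f"99.82% per day:   {allowed_downtime_minutes(99.82, MINUTES_PER_DAY):.2f} min")
```

Under these assumptions, five nines allows only about 5.26 minutes of downtime a year (under a second a day), whereas 99.82% over a single day already corresponds to roughly 2.59 minutes.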
How do these numbers measure up against your expectations of your cloud services? Given the "five nines" written into your cloud computing service level agreement, how else would you benchmark the performance of your subscribed cloud services against alternative solutions?
Location matters
Availability "anytime and anywhere" has become a standard promise of every cloud computing service. But if you are a multinational enterprise, does your subscribed cloud service perform equally well in every geography?
In other words, are your staff in the Hong Kong or Singapore office getting the same availability and response times from Windows Azure, Amazon EC2 or Rackspace as their counterparts in the US?
Mark Hillman, currently Compuware's vice president of strategy and product line management, leads the strategy and implementation teams for the company's products and services.
He said, "From the screen grab (refer to Fig. 1), you can see quite a difference in the response time that our monitoring stations have picked up. The fastest response times are recorded along the East Coast of the US. The rest of the world, even some West Coast monitoring stations, registered response times that are below the six second threshold."
"The main reason for this is likely to be the physical location of service providers' servers. While the services they offer may be global, many of these companies have built up and maintain a large majority of their infrastructure along the East Coast of the US," he said.
"You can [also] see that across the world, the availability of the major cloud providers remains fairly consistent (refer to Fig. 2), indicating that at least over the last 24 hours, there have not been any major downtime issues recorded," Hillman added.
Launched on 19 April 2010, the current beta version of CloudSleuth is a product of Compuware, a US-based application performance solutions provider. Positioned as "a thought leadership community" for cloud computing, CloudSleuth aims to develop "viable strategies on cloud availability, responsiveness and security -- through collective intelligence and advanced cloud performance visualization."
Impact on banks
"Most companies today talk about 99.9 something availability. When I was a boy, availability at 99.89% wouldn't be acceptable," said Hillman.
Today, an average banking application uses about 15 to 20 validation and content services to complete a single transaction. Whether a customer is checking an account balance, transferring money or withdrawing cash, a great deal of validation takes place in the back end.
"And so, potentially for an online banking customer, either [99.98% availability] would be so slow that it could appear to the customer that the service was unavailable, or the customer would wait, wait, wait and then leave the website. So the impact is directly on the customers, spoiling customer retention efforts.
"The banking market is more regional -- they tend to spread across different geographies, and so they can probably think about such problems a little more simplistically. Obviously, some very large banks like HSBC and Citigroup really need to have a global footprint. Thus they would have to pay much more attention to where their customers are and what their cloud computing strategies should be," Hillman said.
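Hillman's point about banking applications that depend on 15 to 20 back-end services can be made concrete with a simple serial-availability model: if every service must respond for a transaction to complete, and failures are independent, end-to-end availability is the product of the individual availabilities. This is a textbook approximation, not a description of how any bank's systems actually behave; the helper `serial_availability` is hypothetical, and the 99.98% per-service figure simply reuses the number quoted in this section.

```python
from math import prod
from typing import Sequence


def serial_availability(component_availabilities: Sequence[float]) -> float:
    """End-to-end availability when every component must be up for a request to succeed.

    Hypothetical helper. Assumes failures are independent, so the
    per-component probabilities simply multiply.
    """
    return prod(component_availabilities)


if __name__ == "__main__":
    per_service = 0.9998  # assumption: each back-end service is available 99.98% of the time
    for n in (15, 20):
        overall = serial_availability([per_service] * n)
        print(f"{n} services at 99.98% each -> {overall * 100:.2f}% end to end")
```

Under these assumptions, 15 services at 99.98% each give roughly 99.70% end to end and 20 services roughly 99.60%, well short of five nines even before slow response times are taken into account.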