TCO study has lessons for Unix and Linux

Downtime, lost time, spare time
Sun Feb 22 2004, 09:18
IN EARLY DECEMBER 2003 we reported on a Techwise Research study into the total cost of ownership (TCO) of various configurations of clusters. Unlike many other reports about TCO, this report took into account the cost of downtime and examined the major reasons for that lost time.

The earlier report to which we referred was somewhat dated, but just a few weeks ago Techwise Research updated it after interviews with 94 US companies that used either HP's OpenVMS on AlphaServers, IBM's AIX on p-series machines or Sun's Solaris on Sun Fire clusters. The study looked at clusters of machines with up to 16 processors that had been in production use for more than six months, and the three manufacturers were almost equally represented.

The new report is quite detailed in its discussion of the TCO of clusters of different processing powers but our interest here lies only with general findings and with the average downtime for each of the three types of cluster that are reported on.

One thing that the report makes very clear is that the vast majority of the TCO is due to costs associated with management and downtime, with the cost of downtime alone for more powerful clusters being more than 50% of the TCO. The costs associated with the purchase, installation and configuration of these clusters and any training of support staff are relatively minor by comparison.

The report goes on to say that the Sun systems provide the lowest three-year TCO when downtime costs are negligible, principally because they have a lower list price. Unfortunately for Sun, those Solaris systems have a greater average annual downtime, and at a downtime cost of just $1,585 per hour the Solaris systems lose their TCO advantage and the HP OpenVMS systems take over.

At a downtime cost of $10,000 per hour the three-year TCO of the HP system had average savings of $269,000 over the IBM systems and $645,000 over the Sun systems.

At a downtime cost of $25,000 per hour the additional average costs of the Sun system had increased significantly. The three-year TCO of the HP systems averaged out at $838,000 better than the IBM systems and $1.5 million better than the Sun systems. Of all the clusters in the report 71% had downtime costs of less than $25,000 per hour but 10% had downtime costs of $250,000 or more and total downtime costs could be very significant.

Earlier reports from Techwise on this subject have been found contentious by some IT professionals who questioned the equality of the systems being compared and the downtime costs being adopted.

Even when we ignore the specific hardware and the costs of downtime there are still some valuable lessons to be learned and some interesting implications for the use of Unix, and by extension, of Linux.

In particular the report breaks the causes of downtime into several categories - planned and unscheduled downtime due to hardware issues, storage array failure, operating system or cluster software failure, software virus or worm, end-user application causing the downtime and system management application causing the downtime - and presents the average downtime in each category for each set of clusters.

The Sun systems averaged 5.14 hours of downtime due to unscheduled hardware outage over 12 months compared to IBM's 2.73 hours. The Sun systems also averaged 3.98 hours of scheduled hardware outage compared to 0.45 hours for HP. The total average hardware-related downtime was 3.37 hours, 3.68 hours and 9.12 hours respectively for the HP, IBM and Sun systems.

To what extent the downtime from these hardware related causes is a reflection of the age of the hardware is open to some question but one would assume that even if it was just a few years old the hardware would not be greatly inferior to recent releases of comparable configurations.

Leaving that point aside with the faintest of question marks against it, let us look at the other causes of downtime and what they might tell us. Firstly there was downtime due to problems with the associated storage array and in that category the IBM/AIX clusters lost an average of just 0.3 hours per year compared to HP/OpenVMS clusters which averaged 0.8 hours and Sun/Solaris at 1.4 hours.

The four remaining categories of the cause of downtime are related to software and the vulnerability of the machine to viruses and worms.

Excluding all hardware related matters, the Sun Solaris systems averaged a total of 19.1 hours downtime per year compared to the IBM AIX systems at 13.2 hours and the HP OpenVMS systems at 4.0 hours. In other words, the average downtime for the IBM systems was 3.3 times that of the HP systems, and the average downtime for the Sun systems was almost 4.8 times that of the HP systems.
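To put those hours into money, the sketch below multiplies each platform's average annual downtime by an hourly downtime cost over three years. It uses only the hardware, storage and software averages quoted in this article; the function names are ours, and it covers the downtime component alone, not the purchase, management or training costs in the Techwise TCO model, which is not reproduced here.

```python
# Average annual downtime in hours, as quoted in this article.
HARDWARE = {"HP": 3.37, "IBM": 3.68, "Sun": 9.12}   # scheduled + unscheduled
STORAGE = {"HP": 0.8, "IBM": 0.3, "Sun": 1.4}       # storage array failures
SOFTWARE = {"HP": 4.0, "IBM": 13.2, "Sun": 19.1}    # the four software categories

YEARS = 3  # assumption: same three-year window as the TCO figures


def annual_hours(vendor):
    """Total average downtime per year across all categories."""
    return HARDWARE[vendor] + STORAGE[vendor] + SOFTWARE[vendor]


def downtime_cost(vendor, cost_per_hour, years=YEARS):
    """Downtime cost only; not a full TCO figure."""
    return annual_hours(vendor) * years * cost_per_hour


if __name__ == "__main__":
    for rate in (1_585, 10_000, 25_000):
        for vendor in ("HP", "IBM", "Sun"):
            hours = annual_hours(vendor)
            cost = downtime_cost(vendor, rate)
            print(f"${rate:>6,}/h  {vendor:<3}  {hours:5.2f} h/yr  ${cost:>12,.0f}")
```

Because the cost scales linearly with the hourly rate, any gap in annual hours between two platforms widens in direct proportion as downtime becomes more expensive.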

In particular the Solaris systems averaged 2.45 hours downtime due to operating system or cluster software problems compared to the OpenVMS systems with 0.92 hours, 4.32 hours downtime due to software viruses or worms compared to 0.88 hours, 3.16 hours due to problems with system management applications compared to 0.78 and a massive 9.20 hours due to end-user applications compared to 1.39 hours.
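For readers who want to check the arithmetic, the per-category figures above can be summed and compared directly. This is a minimal sketch using only the numbers quoted in this article; the category keys and function names are our own labels, not terms from the Techwise report.

```python
# Average annual downtime in hours per software-related category,
# as quoted in this article.
SOLARIS = {
    "os_or_cluster": 2.45,
    "virus_or_worm": 4.32,
    "system_mgmt_app": 3.16,
    "end_user_app": 9.20,
}
OPENVMS = {
    "os_or_cluster": 0.92,
    "virus_or_worm": 0.88,
    "system_mgmt_app": 0.78,
    "end_user_app": 1.39,
}


def total(hours):
    """Sum of downtime hours across all categories."""
    return sum(hours.values())


def ratios(a, b):
    """How many times larger each category in a is than in b."""
    return {category: a[category] / b[category] for category in a}


if __name__ == "__main__":
    print(f"Solaris total: {total(SOLARIS):.2f} h, OpenVMS total: {total(OPENVMS):.2f} h")
    for category, ratio in ratios(SOLARIS, OPENVMS).items():
        print(f"{category:<16} Solaris/OpenVMS = {ratio:.2f}")
```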

The average downtime lost by the IBM systems fell somewhere between those for the HP systems and the Sun systems for three of these four categories but on average they suffered almost 6 hours due to viruses and worms.

Sun Microsystems often assert that Solaris is regarded as the most secure Unix operating system but if that statement is true then the findings of this report should be a cause for concern in the broader Unix and Linux communities.

The loss of 2.45 hours due to operating system and cluster software problems on the Sun clusters was more than 2.5 times the figure for OpenVMS. System management applications caused about four times as much downtime on Solaris as on OpenVMS. End-user applications accounted for more than six times the downtime attributed to this cause on OpenVMS systems and, what's more, this category accounted for almost half of the software-related downtime on Solaris.

The Solaris systems also lost almost five times as many hours as the OpenVMS systems when it came to viruses and worms. Some people would argue that HP's OpenVMS system is less susceptible to attack because fewer people are familiar with it but with more than 400,000 OpenVMS systems around the world at various times there is no doubt that there are sufficient numbers of knowledgeable users who might be tempted to try to break its security.

Sun's assertions about the security of Solaris should be quite worrying if they are true and not marketing-speak. If Solaris is indeed a highly secure operating system then what of those similar systems - Unix and Linux - that are less secure than Solaris?

The exact nature of these failings with the Sun/Solaris systems is not clear but there should be some concern as to whether similar problems exist in other forms of Unix or in Linux since they all share similar system architectures.

It is hard to escape the notion that there must be fundamental flaws in Solaris that increase its vulnerability to software problems. By extension then, we need to question if similar flaws exist in other variants of Unix.

It has been argued previously in The INQUIRER that an operating system should not assume that software developers can be trusted to write secure, bug-free and non-malicious code and that the architecture of an OS should properly trap software problems. Surely it is time that Sun took a hard look at Solaris with these thoughts in mind and attempted to reduce that 19.1 hours of software related downtime, particularly the vulnerability to end-user application problems, to a more competitive figure.

Many commercial organisations have been wary about the adoption of Linux for critical business operations and with these kinds of figures for unscheduled downtime on supposedly one of the more secure forms of Unix it is not difficult to see why.

This Techwise Research report has made clear that the cost of downtime caused by underlying hardware and software vulnerabilities can very easily negate any price advantage that one operating system may have over another.

Linux developers really need to address these kinds of vulnerability issues if they want their operating system to be accepted at the top end of business and from this report it would appear that Solaris is not a role model they should be choosing.

L'INQ
Techwise Report
