THE STUNNING IT systems debacle at the Royal Bank of Scotland (RBS) and Natwest this past week suggests that serious IT management problems go all the way to the top at those banks.
The datacentre blunder and ensuing financial disruptions throughout the UK reportedly were caused by a failed software upgrade to the CA-7 batch job scheduling subsystem at the Natwest datacentre, and RBS has said that it is considering suing the software vendor, CA Technologies.
We can't know at this point, of course, but that threat of litigation by RBS likely will turn out to have been just arse-covering bluster by RBS executives desperate to say anything to deflect bad publicity. A major IBM mainframe systems software subsystem like CA-7 isn't a shrinkwrapped PC application, and no IT director in his right mind would ever consider just throwing it into production use without testing it first in a parallel test environment.
No, even if the CA-7 update somehow interacted with the complex RBS and Natwest IT systems configurations and requirements to cause this cascading disaster, IT management certainly had the responsibility to test the software and ensure that they had a well-defined, methodical - and completely recoverable - upgrade procedure in place before installing it into the banks' production processing environments.
In a former life long ago I was an IBM mainframe systems programmer, part of a priesthood that IBM fostered to manage its customers' large datacentres. Subsequently I had a career as an IT management consultant and auditor in which I advised IBM mainframe datacentre directors and the CIOs of Fortune 500 companies both in the US and internationally, including several banks. Therefore I have some opinions about this IT systems 'glitch' based on experience.
First, it is absolutely inexcusable that this happened. Even assuming that the CA-7 software was flawed, RBS and Natwest IT management should have thoroughly tested both the software and the upgrade procedure before scheduling the change. That procedure should have included a full backup of the affected mainframe systems' software libraries as well as a fully detailed fallback process, so that the banks' production systems could be restored in the unfortunate event that the CA-7 update failed for any reason. Of that much one can be absolutely certain.
However, for whatever reason, what should have been done wasn't done, and millions of people and businesses across the UK have suffered the financial consequences of a complete breakdown in posting payments to their bank accounts at Natwest. The damage has been truly horrifying, and that makes this IT 'glitch' not merely a technical problem but an IT governance issue.
It's a truism that disasters are seldom the result of just a single failure, but usually stem from the cumulative effects of multiple lapses, oversights, mistakes and breakdowns that built up over time. That's likely what happened at Natwest, but without specific information it would be irresponsible to speculate at this point about exactly what occurred and why.
It's up to the banks' senior IT and executive management to investigate how this problem occurred and unfolded, understand all of the underlying systemic IT management problems and particular root causes of this information systems disaster, and make the necessary IT management changes.
One imagines that Parliament might provide RBS and Natwest executives with some motivation to do so in the coming months. µ