SOFTWARE HOUSE Microsoft has revisited the outage that affected its Live services earlier this month and explained that it was a Domain Name System (DNS) issue that left users locked out.
The outage on 8 September left users unable to access glorious Microsoft services like Hotmail and SkyDrive. Although it was doubtless annoying for users, no data was lost, and the problems came down to a software update that went wrong and a corrupted DNS file.
"A tool that helps balance network traffic was being updated and the update did not work correctly. As a result, configuration settings were corrupted, which caused a service disruption," wrote Arthur de Haan, VP of Windows Live Test and Service Engineering. "We determined the cause to be a corrupted file in Microsoft's DNS service."
This file corruption was the result of two conditions, both of which de Haan described as rare. The first was a malformed input string, and the second was a problem with synchronising configurations across the DNS service.
"Each of these conditions was tracked to the networking device firmware used in the Microsoft DNS service," he explained.
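Microsoft has not published the tooling or firmware involved, so the following is only an illustrative sketch of the two failure conditions de Haan describes, not a reconstruction of its system. It assumes a simplified record format of "name A address", and the function names and the dictionary of replicas are hypothetical. The idea is to reject malformed input before anything is applied, then confirm that every replica ends up holding an identical configuration.

```python
import hashlib


def is_valid_record(line: str) -> bool:
    """Accept only 'name A a.b.c.d' lines with a well-formed IPv4 address.

    The record format is an assumption made for this sketch.
    """
    parts = line.split()
    if len(parts) != 3 or parts[1] != "A":
        return False
    octets = parts[2].split(".")
    return len(octets) == 4 and all(
        o.isascii() and o.isdigit() and 0 <= int(o) <= 255 for o in octets
    )


def config_digest(lines) -> str:
    """Return a SHA-256 digest used to compare configurations across replicas."""
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def push_config(lines, replicas) -> bool:
    """Validate first, then apply to every replica and check they agree.

    'replicas' is a hypothetical dict mapping a server name to its record list.
    Raises ValueError, leaving all replicas untouched, if any line is malformed.
    Returns True when all replicas hold the same configuration afterwards.
    """
    bad = [line for line in lines if not is_valid_record(line)]
    if bad:
        raise ValueError(f"malformed records rejected: {bad}")
    for name in replicas:
        replicas[name] = list(lines)
    digests = {config_digest(cfg) for cfg in replicas.values()}
    return len(digests) == 1
```

In this sketch a single bad line stops the whole update, which trades availability of the new configuration for the guarantee that a corrupted file never reaches a live server.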
"After restoring service, we have identified two streams of work to drive specific service improvements around monitoring, problem identification, and recovery. Along with these service improvements, Microsoft is focused on further hardening the DNS service to improve its overall redundancy and fail-over capability." µ