BSD UNIX finally fixes a 25-year-old software bug

12 May 2008 | 08:44 BST

By Egan Orion

Lurking in there forever

A DEVELOPER of OpenBSD has squashed a software bug that's been lurking in all flavours of Berkeley UNIX systems for nigh on all of these last 25 years.

Marc Balmer got an email complaining that Samba crashed while serving an MS-DOS filesystem. Since he's a heavy Samba user, he investigated the issue.

Samba developers told him that the code to read file directories was flawed in all versions of BSD. He didn't believe them initially, but comments on a snippet of workaround code in Samba convinced him to look further.

"This is needed because the existing directory handling in FreeBSD and OpenBSD (and possibly NetBSD) doesn't correctly handle unlink() on files in a directory where telldir() has been used. On a block boundary it will occasionally miss a file when seekdir() is used to return to a position previously recorded with telldir()."

Convinced, Balmer coded a diagnostic program to confirm the problem. After playing with test parameters for a while, he found that he could reproduce the bug consistently:

"Suddenly, I had a case that shows the problem on every run, no more randomness: Create 28 files, delete file 25 and seekdir to file 26: You end up at file 27!"
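A test along those lines can be sketched in C. This is not Balmer's diagnostic program, only a minimal sketch of the same idea using the standard opendir(), readdir(), telldir(), seekdir() and unlink() calls; the function name seekdir_after_unlink_ok(), the /tmp scratch directory and the fileNN names are assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NFILES 28

/*
 * Hypothetical helper: create NFILES files in a scratch directory, record
 * the telldir() position in front of every entry, unlink the entry at index
 * `victim` (0-based, in readdir order), seekdir() to the recorded position
 * of the following entry and check which entry readdir() then returns.
 * Returns 1 if it is the expected entry, 0 if not, -1 on bad arguments or
 * if the scratch directory could not be set up.
 */
int seekdir_after_unlink_ok(int victim)
{
    char dir[] = "/tmp/seekdir-XXXXXX";   /* assumed scratch location */
    char path[64];
    char names[NFILES][32];
    long pos[NFILES];
    int n = 0, result = -1;

    if (victim < 0 || victim + 1 >= NFILES)
        return -1;
    if (mkdtemp(dir) == NULL)
        return -1;

    for (int i = 0; i < NFILES; i++) {
        snprintf(path, sizeof path, "%s/file%02d", dir, i);
        int fd = open(path, O_CREAT | O_WRONLY, 0600);
        if (fd >= 0)
            close(fd);
    }

    DIR *d = opendir(dir);
    if (d != NULL) {
        for (;;) {
            long here = telldir(d);        /* position in front of the next entry */
            struct dirent *e = readdir(d);
            if (e == NULL)
                break;
            if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
                continue;
            if (n < NFILES) {
                pos[n] = here;
                snprintf(names[n], sizeof names[n], "%s", e->d_name);
                n++;
            }
        }
        if (n == NFILES) {
            snprintf(path, sizeof path, "%s/%s", dir, names[victim]);
            unlink(path);                  /* delete one entry ...            */
            seekdir(d, pos[victim + 1]);   /* ... and return to the next one  */
            struct dirent *e = readdir(d);
            result = (e != NULL && strcmp(e->d_name, names[victim + 1]) == 0);
        }
        closedir(d);
    }

    /* Clean up; unlinking the already deleted file fails harmlessly. */
    for (int i = 0; i < NFILES; i++) {
        snprintf(path, sizeof path, "%s/file%02d", dir, i);
        unlink(path);
    }
    rmdir(dir);
    return result;
}
```

Calling seekdir_after_unlink_ok(24) deletes the 25th entry in readdir order and seeks to the 26th. Many filesystems do not return entries in creation order, and whether a given entry starts a directory block depends on the on-disk layout, so a return value of 1 on one system says nothing about another.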

He immediately saw the cause of the problem:

"Creating the directory with 28 files had created a directory that spans more than one block on the disk (2 in this case). File 25 was the first entry of the second block. Obviously the problem occurred when you delete the first entry in a block of a directory and then return to the recorded position of the second entry in the same block. This would actually get you one entry [too] far."

The problem occurred because the code that deleted a directory entry set its inode number to zero, while the code that read the directory skipped any entry with an inode number of zero. Balmer explained:

"This code will not work as expected when seeking to the second entry of a block where the first has been deleted: seekdir() calls readdir() which happily skips the first entry (it has inode set to zero), and advance to the second entry. When the user now calls readdir() to read the directory entry to which he just seekdir()ed, he does not get the second entry but the third."

The solution was to stop the readdir() code from skipping directory entries that have an inode of zero when it is called on behalf of seekdir(). Ordinary readdir() calls still skip them. As Balmer put it:

"The fix is surprisingly simple, not to say trivial: _readdir_unlocked() must not skip directory entries with inode set to zero when it is called from __seekdir()."
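The off-by-one and its fix can be illustrated with a toy model in C. This is a simplified sketch, not the BSD libc source: it treats a directory block as an array of fixed-size slots, whereas real entries are variable-length records, and the names model_readdir(), model_seekdir() and ino_after_seek() are made up for the example.

```c
#include <stddef.h>

/* One slot in a directory block; ino == 0 marks a deleted entry. */
struct entry {
    unsigned ino;
    const char *name;
};

/* A directory stream over a single block; loc is the current slot index. */
struct dir {
    const struct entry *ents;
    size_t count;
    size_t loc;
};

/* Return the next entry, or NULL at the end of the block.
 * If skip_deleted is nonzero, slots with ino == 0 are passed over. */
static const struct entry *model_readdir(struct dir *d, int skip_deleted)
{
    while (d->loc < d->count) {
        const struct entry *e = &d->ents[d->loc++];
        if (e->ino == 0 && skip_deleted)
            continue;
        return e;
    }
    return NULL;
}

/* Reposition by rereading from the start of the block until the target
 * slot is reached. With fixed == 0 the internal reads skip deleted slots
 * (the old behaviour); with fixed != 0 they do not. */
static void model_seekdir(struct dir *d, size_t target, int fixed)
{
    d->loc = 0;
    while (d->loc < target) {
        if (model_readdir(d, !fixed) == NULL)
            break;
    }
}

/* Block whose first entry (file 25) has been deleted. Seek to the second
 * slot (file 26) and report the inode of the entry the caller reads next. */
unsigned ino_after_seek(int fixed)
{
    static const struct entry block[] = {
        { 0,  "file25" },   /* deleted: inode zeroed */
        { 26, "file26" },
        { 27, "file27" },
        { 28, "file28" },
    };
    struct dir d = { block, sizeof block / sizeof block[0], 0 };

    model_seekdir(&d, 1, fixed);
    const struct entry *e = model_readdir(&d, 1);
    return e != NULL ? e->ino : 0;
}
```

In this model ino_after_seek(0) returns 27: the seek skips the deleted slot, consumes file 26 and leaves the stream one entry too far. ino_after_seek(1) returns 26, because the seek counts the deleted slot and stops in front of file 26.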

He found this bug in every other version of BSD and every BSD derivative that he checked, including Mac OS X. He found it in 4.4BSD Lite 2, and a helpful Samba developer also found it in 4.2BSD, which suggests the bug is about 25 years old.

Kirk McKusick, one of the original cadre of Berkeley UNIX developers, emailed Balmer privately: "As the original author of the *dir() library, you probably fixed one of my bugs. :-)" µ

L'Inq
Marc Balmer's blog

© 2007 Incisive Media Investments Ltd. 2007
