The Inquirer-Home

BSD UNIX finally fixes a 25-year-old software bug

Lurking in there forever
Mon May 12 2008, 09:44

A DEVELOPER of OpenBSD has squashed a software bug that's been lurking in all flavours of Berkeley UNIX systems for nigh on all of these last 25 years.

Marc Balmer got an email complaining that Samba crashed while serving an MS-DOS filesystem. Since he's a heavy Samba user, he investigated the issue.

Samba developers told him that the code to read file directories was flawed in all versions of BSD. He didn't believe them initially, but comments on a snippet of workaround code in Samba convinced him to look further.

"This is needed because the existing directory handling in FreeBSD and OpenBSD (and possibly NetBSD) doesn't correctly handle unlink() on files in a directory where telldir() has been used. On a block boundary it will occasionally miss a file when seekdir() is used to return to a position previously recorded with telldir()."

Convinced, Balmer coded a diagnostic program to confirm the problem. After playing with test parameters for a while, he found that he could reproduce the bug consistently:

"Suddenly, I had a case that shows the problem on every run, no more randomness: Create 28 files, delete file 25 and seekdir to file 26: You end up at file 27!"

He immediately saw the cause of the problem:

"Creating the directory with 28 files had created a directory that spans more than one block on the disk (2 in this case). File 25 was the first entry of the second block. Obviously the problem occured when you delete the first entry in a block of a directory and then return to the recorded position of the second entry in the same block. This would actually get you one entry [too] far."

The problem occurred because the code that deleted a directory entry set its inode number to zero, but the code that read the directory skipped any entry having an inode number of zero. Balmer explained:

"This code will not work as expected when seeking to the second entry of a block where the first has been deleted: seekdir() calls readdir() which happily skips the first entry (it has inode set to zero), and advance to the second entry. When the user now calls readdir() to read the directory entry to which he just seekdir()ed, he does not get the second entry but the third."

The solution was to stop the readdir() code from skipping directory entries that have an inode number of zero when it is called on behalf of seekdir(). As Balmer put it:

"The fix is surprisingly simple, not to say trivial: _readdir_unlocked() must not skip directory entries with inode set to zero when it is called from __seekdir()."

He found this bug in every other version of BSD and every BSD derivative that he checked, including Mac OS X. He found it in 4.4BSD Lite 2, and a helpful Samba developer also found it in 4.2BSD, which suggests the bug was about 25 years old.

Kirk McKusick, one of the original cadre of Berkeley UNIX developers, emailed Balmer privately: "As the original author of the *dir() library, you probably fixed one of my bugs. :-)"

L'Inq
Marc Balmer's blog

Comments
freaks eh

--[I think anyone who understood that explanation on the first read is a total freak,]--

You might be on the wrong website, did you mean to type aol.com?

(Does an "average IQ of 120" mean that you think 120 is the average, or that your IQ fluctuates depending on the day of the week or something? If the latter, just try reading it again later.)

posted by : Stephen Brooks, 19 May 2008
This seems very familiar

That explanation reminds me of the section of the newspaper that summarizes soap opera plotlines.

I think anyone who understood that explanation on the first read is a total freak, I have an average IQ of 120, and it looked like a total clusterf*** to me.

posted by : Jason Goatcher, 18 May 2008
Place Blame

Yeah, yeah, you hear that it isn't productive to place blame. Well, this one clearly is an issue where if the programmer had worked the bug when it was first reported he would have solved it 25 years ago. Clearly this is indicative of the problems with programmers. They only want to code for glory and are willing to give up on some bugs to add new features.

Fix your damn bugs up front so we don't have 25 years of problems propagated to every sort of variation of your code.

I've been saying it for years and years. Not only that if this guy could figure it out 25 years later, if he could develop the tools to test it out in order to reproduce it, then 25 years ago the original author could have done the same thing.

Pure laziness, just pure laziness.

posted by : Jim B., 13 May 2008
From the horse's mouth...

Marc's comment:
http://undeadly.org/cgi?action=article&sid=20080508193255&pid=11
and this reply:
http://undeadly.org/cgi?action=article&sid=20080508193255&pid=13
outline what really went on. seekdir() has presumably been little-used, or little-trusted, by developers after being deprecated as such in POSIX.

posted by : A. Peon, 12 May 2008
Wow

That was so interesting that I almost woke up from falling asleep while reading it!

posted by : Sleepy, 12 May 2008
:D

heh that is so cool. :D

posted by : ishwa, 12 May 2008
More?

Let's hope that this creates a fierce competition in which MS also starts fixing 25 year old bugs.

posted by : W.-, 12 May 2008