Re: [BUG] corruption when reopening mailbox
> > The bug seems to be in mutt_reopen_mailbox() in mbox.c. That
> > function does not actually reopen the mailbox (in version
> > 1.5.6i), but only does an fseek() to the beginning of the file.
>
> Two questions here: Is NFS involved, and are you sure that locking
> works on your machines?

NFS is not involved here. I'm not sure about locking; we're
using Debian Sarge and running an almost stock Debian 2.6.8-2
kernel.

The problem came to light with two people editing the same
mailbox "at the same time", but we've put together a set of
steps that recreate the problem in a serial fashion -- the
mailbox is not accessed on disk at the same time by the two
processes. I think the locking of the mailbox is only done
when the mailbox is actually read/written, right?
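
On the question of whether locking works on the machine, a quick check can be sketched as below. It assumes fcntl()-style locks, which is only one of the mechanisms mutt may have been built with; it says nothing about flock() or dotlocking. The name locking_works() is made up for this sketch and is not part of mutt.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to take a non-blocking exclusive fcntl() lock on the whole file.
 * Returns 0 on success, -1 on failure with errno set by fcntl(). */
static int try_write_lock(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "up to end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: returns 1 if a second process is refused the lock
 * while this process holds it, 0 if the second process is granted the
 * lock anyway, and -1 if the check itself could not be carried out. */
int locking_works(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (try_write_lock(fd) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: fcntl() locks are not inherited across fork(), so this
         * attempt should conflict with the lock held by the parent. */
        int cfd = open(path, O_RDWR);
        if (cfd < 0)
            _exit(2);
        if (try_write_lock(cfd) == 0)
            _exit(0);           /* lock granted: locking is not effective */
        _exit((errno == EACCES || errno == EAGAIN) ? 1 : 2);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    close(fd);                  /* closing the descriptor drops our lock */
    if (waited < 0 || !WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 1;
}
```

Calling locking_works() on a scratch file in the same directory as the mailbox should return 1 if fcntl() locks are honoured there.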

Here is a rough sequence of steps we've used to recreate
the problem. (I say "rough" because the steps may be missing
some detail: I couldn't recreate the problem from the written
steps alone until I was given a demonstration, though I don't
think I did anything differently afterwards.)

1. Send 8 empty emails to an account, call it "muttbug".
2. Log directly into the muttbug account. Call this shell "window 1".
3. Log into a different account on the same machine.
   Then "su - root" followed by "su - muttbug".
   Call this shell "window 2".
4. Start mutt in window 1.
5. Start mutt in window 2.
6. Go to the very last message in window 1, then start to reply to
   the message. When the editor comes up, do nothing.
7. In window 2, delete the second-to-last message, then quit mutt.
8. In window 1, complete the reply and send the message. Then quit
   mutt.

At this point, you can examine the mailbox and find that
the file has been corrupted.
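
One coarse way to examine the mailbox is to count its message separators before and after the steps above. The sketch below assumes a plain mbox file in which every message starts with a line beginning "From "; it ignores quoting variants and does not parse headers, so it can only flag gross damage. count_from_lines() is a made-up name, not something from mutt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines in an mbox file that begin with
 * "From ". Returns the count, or -1 if the file cannot be opened. */
long count_from_lines(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char chunk[1024];
    long count = 0;
    int at_line_start = 1;

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        if (at_line_start && strncmp(chunk, "From ", 5) == 0)
            count++;
        /* A chunk without a newline means the same line continues
         * in the next fgets() call. */
        at_line_start = (strchr(chunk, '\n') != NULL);
    }

    fclose(fp);
    return count;
}
```

A count that does not match the number of messages mutt shows is a hint that the separators or offsets have been damaged.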

I don't remember all the details because I worked on this last December,
but as I recall, the mutt code that gets the file position of each
message was "reading" the old data from the file rather than the
updated contents on disk. That is weird, unless the window 1 process
is caching the old data somehow. That's why I concluded that the
function needs freopen() rather than fseek(). (I checked that the
file on disk had been updated after step 7 by cat'ing it.)
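
To make the freopen() idea concrete, here is a minimal sketch. It is not mutt's code and it does not reproduce the bug; it only shows the reopen step. It assumes the stale data comes from the stream that stays open in window 1, which I have not confirmed. The helper names (rewrite_file, first_line, reopen_for_read) are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Overwrite the file through a separate stream, standing in for the
 * second mutt process that rewrites the mailbox. Returns 0 on success. */
int rewrite_file(const char *path, const char *text)
{
    FILE *out = fopen(path, "w");
    if (out == NULL)
        return -1;
    int rc = (fputs(text, out) == EOF) ? -1 : 0;
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}

/* Seek to the start of the stream and read its first line into buf.
 * Returns 0 on success, -1 on error or empty file. */
int first_line(FILE *fp, char *buf, size_t len)
{
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    return (fgets(buf, (int)len, fp) != NULL) ? 0 : -1;
}

/* Reopen the file on the existing stream. freopen() closes the old
 * descriptor and opens the path again, so nothing buffered from the
 * earlier contents is carried over. Returns the stream, or NULL on
 * failure (the original stream is closed in that case). */
FILE *reopen_for_read(const char *path, FILE *fp)
{
    assert(path != NULL && fp != NULL);
    return freopen(path, "r", fp);
}
```

Whether a bare fseek() also drops data already held in the stdio buffer is up to the C library, so the sketch does not rely on it either way. Note that freopen() also picks up a file that was replaced by rename, which fseek() on the old stream cannot do.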

Thanks for looking at this!

-Bo Adler
thumper@xxxxxxxxxxxxxxxx