What if a user claims he hasn't received any new mail for five days, even though everything seems to be working according to his email client, and nobody's getting any bounced messages?
What if you go to investigate the problem on your OS X 10.4 Tiger mail server, and discover an error like this in the mail access log:
May 10 15:12:29 mail pop3: IOERROR: user.andrew zero index record 22313/23301
What does that mean? It means that you've got a seriously corrupt database in the Cyrus email program for one of your users.
How corrupt are we talking here?
So corrupt that issuing a "rebuild mailbox" command from the GUI Server Admin Tool claims success, but doesn't actually work.
So corrupt that logging in to the "cyradm" command-line admin tool on the server itself, and issuing a rebuild command, results in this error:
localhost> reconstruct -r 'Other Users/andrew'
reconstruct: Operating System Error
That's bad. It could possibly mean a bad block on the hard drive: OS X uses a journaled file system, so even after a power outage the filesystem should theoretically still be able to rename an old corrupt email database (which is the step where my particular instance of this problem occurred).
So things are obviously bad.
What if things are so badly corrupted that you can't even get access to the user's last five days of email? You want to delete the corrupted mailbox permanently, and recreate it, but first you need to get access to that email somehow. What do you do?
Well, first off, you can't just delete the user entirely using the admin tool. As of the time of this writing, OS X Server 10.4.6 won't actually delete the corrupt mailbox files; it will just stop using them for as long as the username is missing. They'll pop right back up again and prevent Cyrus from making a proper mailbox when you attempt to recreate the user with the same "short name" as before.
Instead, you should leave the user intact, and only delete the corrupted mailbox, through the Cyrus command-line admin tool.
But what to do about those emails?
Here's what you need to do:
Open your mail client and create a dummy POP account pointing at the user's mailbox. Get their password from them, enter it, and set the account to "leave messages on the server indefinitely". Disable all your automatic mail-routing rules. Then manually check email for that account, and watch as (probably) a great many messages stream onto your machine. These are all messages that the poor user already has locally, but due to the corruption, they were never removed from the server like they should have been.
Eventually there will be a beep. Observe that the list of emails stops about five days short of today. There's five days of email rendered inaccessible by evil, evil corruption.
How are you gonna get that email?
Open up a terminal and connect to the mail server through "ssh". Become root, and go to the Cyrus mailbox folder for the troubled user, like so:
Then, since there are probably 23,000 individual files in that folder, get a complete listing and redirect it to a text file, so you can look at the text file instead of paging through sluggish re-issuances of the listing command.
ls -al > list.txt
Look at the beginning of the list with "more list.txt", and the end of the list with "tail -n 500 list.txt" where 500 can be any number of lines you want. You'll see a whole lot of numbered files like "31923." and "1000." The thing you're looking for is datestamps. What you want to do is locate the portion of this fileset that was created after the date of the last available email in your dummy POP inbox. The beginning of that fileset, the first file in that list, represents the first email you need to preserve.
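If eyeballing datestamps in a listing that long sounds tedious, here's a minimal sketch of letting "find" do the comparison instead. The function name newer_messages and the idea of a reference file are my own inventions for illustration, not anything Cyrus provides, and the sketch assumes your "find" understands -maxdepth and -newer.

```shell
# Hypothetical helper: print the numbered message files in the current
# folder that were modified more recently than the reference file "$1",
# sorted by message number.
newer_messages() {
    ref="$1"
    # Message files are named like "39201." (digits followed by a dot).
    find . -maxdepth 1 -type f -name '[0-9]*.' -newer "$ref" \
        | sed 's|^\./||' \
        | sort -n
}
```

To use it, stamp an empty reference file with the date of the last email in your dummy POP inbox, using something like "touch -t 200605051200 /tmp/cutoff" (that date is made up; use your own), and then run "newer_messages /tmp/cutoff" from inside the mailbox folder. The first and last lines of the output should be the first and last files you need to preserve.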
Let's say for the sake of argument that the first file is named "39201.", and the last file in the listing is "39650." The Cyrus database may be corrupt, but these emails in their individual files probably aren't. Since we're deleting the whole mailbox anyway, how about if we futz with these data files in order to get around the corruption in the database?
Here's what we're gonna do: We're gonna find the first email file in the list (probably something like "10000."). We're gonna delete that file, make a copy of one of our inaccessible files, and give this copy the name of the file we just deleted. Then we'll do the same with the next file in the list and the next inaccessible file, and so on, until every inaccessible email has taken over the name of an old one. That way, when we connect to Cyrus with POP and ask for all the emails, the ones we want to recover will be sent along with the ones our troubled user has already received back on his own machine. We'll be able to get those emails off the server, keeping them safe before we nuke the account.
But ... issuing around a thousand "delete" and "copy" commands in the shell is a horrible, tedious process. We need to automate it. How about if we create a shell script that will make the move for us? Sounds groovy.
I'm a Perl fiend at heart, so I wrote a script in Perl that, when run, will generate the appropriate shell script:
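use strict; use warnings;
# Name of the shell script this Perl script will generate.
my $move_script = "process_script.sh";
# First and last inaccessible message files to rescue (inclusive).
my $first_email_to_save = 39201;
my $last_email_to_save = 39650;
# First already-downloaded message file to sacrifice and overwrite.
my $first_email_to_overwrite = 1000;
my $count = $first_email_to_save;
my $base_count = $first_email_to_overwrite;
my $out_file = "";
while ($count <= $last_email_to_save) {
    # Delete an old message file, then put a rescued message in its place.
    $out_file .= "rm " . $base_count . ".\n";
    $out_file .= "cp " . $count . ". " . $base_count . ".\n";
    $count++;
    $base_count++;
}
open(OUTFILE, ">" . $move_script) or die "Cannot write $move_script: $!";
print OUTFILE $out_file;
close(OUTFILE);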
If you replace the relevant variables, and save that script to a file like "make_script.pl" and run it with "perl make_script.pl", you'll get the script you need to make your move. Get the script onto the server, if it isn't there already, and issue a command to make it executable:
chmod u+x process_script.sh
Then go to your hapless user's email repository, with a
and run the script from there with a
Wait a while, watching the command line output. If you get messages saying that the "cp" command failed on a particular filename (which is unlikely, but possible), it means that there will be a gap in your moved messages. Find that source filename in your generated shell script, note the file it was meant to replace (which has already been deleted at this point), and manually issue a "cp" command to copy some other email message (like the previous one in the script, for example) onto that replacement filename. Now you've eliminated the gap. You might need to do this once for every thousand or so emails.
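If you'd rather know about those gaps in advance, here's a rough sketch of a check that walks the range of message numbers you're rescuing and reports any that have no file behind them. The function name missing_sources is hypothetical, and the sketch assumes you run it from inside the mailbox folder with the same first and last numbers you gave the generator script.

```shell
# Hypothetical check: print every message file name between "$1" and "$2"
# (inclusive) that does not exist in the current folder. Each name printed
# is a source that "cp" would fail on.
missing_sources() {
    first="$1"
    last="$2"
    n="$first"
    while [ "$n" -le "$last" ]; do
        # Message files are named with a trailing dot, e.g. "39201."
        [ -f "$n." ] || echo "$n."
        n=$((n + 1))
    done
}
```

With the example numbers from earlier, that would be "missing_sources 39201 39650". No output means every source file in the range is present.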
Your script may have created these files as root, and possibly with bad permissions. To fix this, issue the following:
chown cyrusimap *
chmod u+rw *
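To double-check that the ownership change actually took, something like the following sketch may help. The function name not_owned_by is made up for illustration; it just asks "find" for any regular file in the current folder that belongs to someone other than the user you name.

```shell
# Hypothetical check: list regular files in the current folder that are
# NOT owned by the user (name or numeric ID) given as "$1".
not_owned_by() {
    owner="$1"
    find . -maxdepth 1 -type f ! -user "$owner"
}
```

Run from the mailbox folder as "not_owned_by cyrusimap", it should print nothing if every file now belongs to the Cyrus user.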
Now, go back to your GUI email client, go to your account preferences, and DELETE that dummy account. RECREATE it, so it's never contacted the Cyrus server, and tell it to fetch new mail.
The missing emails should spill into the inbox, along with many others you don't need. Sort by date, delete the chaff, and you have the missing emails, safe and sound. Note that if any of them were flagged as spam, they were probably moved to your junk mail folder, even if you disabled all your automatic routing rules. This probably doesn't matter.
Go to the preferences for this account, and type garbage all over the login credentials for the user. You have just isolated that dummy POP account from the corrupt mailbox on the server. Well done!
Now how about we delete this evil mailbox!
First, you need to locate the admin user through which you'll be doing all your admin-ing. In our case, that's user "lim". You need to edit the Cyrus config file so that your admin user is added to the list of Cyrus admins. Become root, and open the config file in your favorite editor:
Then look for a line starting with "admins:", and add your admin shortname to that list, preceded by a space to separate it from any other names there may be (usually name "cyrus" is already present).
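With the example names used here, the finished line would read "admins: cyrus lim". If you want to confirm the edit before restarting anything, here's a small sketch that checks whether a name appears on that line. The function name is_cyrus_admin is my own, and you pass it whatever path your config file actually lives at.

```shell
# Hypothetical check: succeed if user "$2" is listed on the "admins:" line
# of the Cyrus config file "$1", fail otherwise.
is_cyrus_admin() {
    conf="$1"
    user="$2"
    # Split the admins line into one word per line, then look for an
    # exact match so that "lim" does not match "limbo".
    grep '^admins:' "$conf" | tr ' \t' '\n\n' | grep -qx "$user"
}
```

Called as "is_cyrus_admin path-to-your-config lim && echo yes", it should print "yes" once the name is in place.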
Write out that file, and then go to your Workgroup Manager GUI admin tool. Change the mailbox status of that admin user so that mail is enabled, and serving IMAP at least, if you haven't enabled it already. Then go to your Server Admin GUI tool and reset the mail service on the machine.
Back at your prompt, connect to the cyradm tool like so:
/usr/bin/cyrus/admin/cyradm -user lim localhost
Then issue a series of commands like this one to first delete, then recreate the corrupted mailbox. (The first command grants lots of rights in that mailbox for the admin user, which is necessary to nuke the mailbox. Just 'c' would probably suffice, but it doesn't matter ... the mailbox is going away immediately after all.)
setaclmailbox 'Other Users/andrew' lim lrswipcda
deletemailbox 'Other Users/andrew'
createmailbox 'Other Users/andrew'
Check the privileges on your user like so:
listacl 'Other Users/andrew'
Then exit the cyradm tool with an 'exit' command. At this point, you can end your ssh session on the server.
Go to your Workgroup Manager GUI tool, and verify that the troubled user has both IMAP and POP enabled for their email service. If it isn't set for both, set it now. Then reset the email service in the Server Admin tool (you should be a pro at that by now).
Here's another tricky part. Ready?
Go back to the same GUI email program that you have the dummy POP inbox stored in. Create another dummy mailbox, with valid login credentials for the user, but make it an IMAP mailbox. Save your settings and check the inbox. It should be nice and empty now, but you may see some fresh spam has already accumulated in it since you issued the 'createmailbox' command just a few minutes ago. Lawks!
Now go to the old POP dummy, select all the mail in the inbox, and drag it into the inbox for the IMAP account. Watch as your email client puts a thousand emails back up on the server.
Once you're sure it's finished, go into the preferences and type garbage over the login credentials to make sure the connection's broken. Quit and relaunch your email client just to be safe, then delete both dummy accounts (and their messages) from your email client, cleaning up the transitory mess.
Congratulations, you have successfully restored a horribly mangled email box, and not lost a single email in the process. Your next step is to REPLACE THE DRIVE IN THAT MACHINE, because it's probably got something HIDEOUSLY WRONG WITH IT.