launchd Vagaries

|

A few evenings ago, I was the most frustrated person on the face of the planet. I can tell you this with some assurance because I have a meter on my Dashboard that tells me so. It measures my typing pace and the frequency of command line errors and... I digress.

The root cause of my frustrations was a file named #org.clamav.freshclam.plist# (whose origins I'm still unclear on, though I assume it's a leftover emacs temporary file or something). Every time the system booted, launchd tried to load that plist file, located in the directory /System/Library/LaunchDaemons/, and the net result was that postfix wouldn't start because amavisd wouldn't start. Why amavisd wouldn't start, I still don't know, because it seems to me it shouldn't depend on freshclam to run. But I couldn't tell what was going on by looking at any of the logs because there is not enough information shown.

And therein lies the heart of the matter: launchd, though it doesn't operate quite silently, doesn't have a debug mode. Furthermore, even in normal operation, it doesn't quite give enough information to make for productive bug hunting. I thought of these problems after the frustration level quieted down the following morning when I figured out what was going on.

First, What was going on?

What I knew was that postfix wasn't launching and neither was amavisd. I also knew from these entries in the system log

Oct 8 19:26:03 shr-g5 org.clamav.freshclam[3087]: ERROR: Incorrect argument format for option <strike>-checks (</strike>c) Oct 8 19:26:03 shr-g5 org.clamav.freshclam[3087]: ERROR: Can't parse command line options Oct 8 19:26:03 shr-g5 com.apple.launchd[1] (org.clamav.freshclam[3087]): Exited with exit code: 1 Otc 8 19:26:03 shr-g5 com.apple.launchd[1] (org.clamav.freshclam): Throttling respawn: Will start in 10 seconds that there was a problem with the file /System/Library/LaunchDaemons/org.clamav.freshclam.plist (or so I thought). I have mentioned this problem before, as well as the fix.

Great! I know how to fix that! So, armed with the confidence similar to that elicited by carrying a howitzer into a rock/paper/scissors match, I edited the file at fault because I had indeed messed around in Server Admin and knew that SA rewrites that file and screws it up again.

So I was perplexed when I kept seeing these errors over and over and over again. Stranger still, I could do a

sudo launchctl unload org.clamav.freshclam.plist followed by a sudo launchctl load org.clamav.freshclam.plist and freshclam was happy! I could then manually launch amavisd and postfix and life was good, until I rebooted and the same frickin' problem reappeared.

And that left me extremely frustrated, even more so than usual because I had a rented Bobcat-style digger sitting out in the driveway consuming my money without being productive for me at all. Eventually, I threw in the towel and started digging, which turns out to be very therapeutic.

[The paragraph that belongs here was removed because you really don't want to hear me gripe about restoring a MacOS X Server box from a Time Machine backup and all of the problems that that process entails. I've mentioned that before. Suffice it to say, I did so, it worked, but the problem remained.]

The following morning, again, with the digger outside consuming money at an alarming rate, I gave myself until 9am to figure out the problem. After that, my users, a.k.a. family, would have to live without E-mail until the digger was returned around 5pm. Fortunately, I did a ls without my usual -la and saw the aforementioned #'d file. I guessed that it was the cause of all my troubles, moved it out of the way, and the machine booted happily with no launchd errors.

So that leads me to the second part of my treatise, namely, What would have made this experience much less frustrating?

I have three suggestions. The first is that launchd should report the path to and name of the file it's parsing so that I would have seen immediately where the suspect file is. This information is especially important as there are several locations that launchd looks for plist files.

The second is that lauchd should tell me what command it tried that caused the problem. That's just good debugging practice.

Resulting logs would look like this in Bill's World:

Oct 8 19:26:03 shr-g5 org.clamav.freshclam[3087]: ERROR: While processing '/System/Library/LaunchDaemons/#org.clamav.freshclam.plist#' Oct 8 19:26:03 shr-g5 org.clamav.freshclam[3087]: ERROR: Issued command '/usr/bin/freshclam (rest of options from plist file here)' Oct 8 19:26:03 shr-g5 org.clamav.freshclam[3087]: ERROR: Incorrect argument format for option <strike>-checks (</strike>c) Oct 8 19:26:03 shr-g5 org.clamav.freshclam[3087]: ERROR: Can't parse command line options Oct 8 19:26:03 shr-g5 com.apple.launchd[1] (org.clamav.freshclam[3087]): Exited with exit code: 1

The third thing I'd like to see is more info output by launchctl list. As it is, it doesn't show any info about what's loaded except what the base name of the plist file is. No information about the issued command, no information about the path to the plist file, names of plist files which were tried and found wanting, nothing else is available that would have made this debugging process any easier. And that would be helpful, too.

launchd is a really cool new thing, but it lacks some of the polish and usability features that the more mature things have in place already. These are my suggestions which, I think, will help improve launchd and make it a cool, mature thing.

Recent Comments