The site was inaccessible from March 22nd at 11pm until March 23rd at 11am, at which point I had the server power cycled. Here is what appears to have happened.
Yesterday it appeared to me (falsely) that there was something wrong with Apache/Cyrus: whenever I tried to access my email over the web, the httpd process would go to 100% utilization. What was actually going on is that SquirrelMail was seeing my browser’s cookie and trying to complete the task I’d been working on when everything got hung up (an email sort in a folder with thousands of emails). Jack told me about this phenomenon. Before talking to Jack, and seeing things only from my perspective with the poisoned cookie, I’d restarted the server for fear that nobody could get to their email.
This was the first restart in about 2 months (a level of stability that’s making me happy). During those 2 months I’d updated the Linux kernel from 2.6.10-1.10_FC2smp to 2.6.10-1.12_FC2smp. The new kernel didn’t take effect until the restart, because a kernel is only loaded at boot. I’m assuming this kernel change (since that is the only thing that changed yesterday) is what caused the outage last night. I’ve since rolled back to the previous kernel. This rollback caused two more restarts this morning, at Mar 23 11:22:11 and 11:37:29. The second reboot was because I’m a little dumb sometimes.
So, to summarize: the cause of the outage was an unneeded reboot, which enabled a new, unneeded kernel. Both were my fault, and I won’t let it happen again.
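One way to catch this kind of surprise before a restart is to compare the running kernel against the installed ones. What follows is only a rough sketch, not something I ran on the server. The function names are made up for illustration, the list of installed releases is typed in by hand, and it assumes the boot loader defaults to the newest installed kernel, which may not match a given setup.

```python
import platform
import re


def version_key(release):
    """Turn a release string like '2.6.10-1.12_FC2smp' into a sortable key.

    Numeric pieces compare as integers and sort before non-numeric pieces.
    """
    parts = re.split(r"[.\-_]", release)
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts]


def kernel_after_reboot(installed):
    """Return the newest installed release.

    Assumption: the boot loader is configured to boot the newest kernel.
    """
    return max(installed, key=version_key)


def reboot_changes_kernel(running, installed):
    """True if a reboot would switch to a different kernel than the running one."""
    return kernel_after_reboot(installed) != running


if __name__ == "__main__":
    # Hand-entered list using the two releases mentioned in this post.
    installed = ["2.6.10-1.10_FC2smp", "2.6.10-1.12_FC2smp"]
    running = platform.release()  # same string that `uname -r` reports
    if reboot_changes_kernel(running, installed):
        print("A reboot would switch kernels: %s -> %s"
              % (running, kernel_after_reboot(installed)))
    else:
        print("A reboot would keep the running kernel: %s" % running)
```

With the two releases from this post and 2.6.10-1.10_FC2smp running, the check would have flagged that a reboot would bring up 2.6.10-1.12_FC2smp.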