Thursday, March 3, 2011

And here I was getting all happy about the uptime...

Only to have FRED go down hard for 6 hours this morning! What happened: every night logrotate rotates FRED's web server logs (among others) and restarts the web server process. Last night the web server didn't come back from the restart for some reason. All it took to get FRED back up was a simple (re)start of the web server process, not of the whole server. It's hard to tell why it died, and I'm looking at its logs to see if I can tell, but ultimately the more important question is "how do we prevent this in the future?" Answer: I have installed SIM (http://www.rfxn.com/projects/system-integrity-monitor/), a cron-initiated script that periodically checks that certain services are running and healthy and (re)starts them automatically if they are not.
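For anyone curious what a "check it and (re)start it" watchdog boils down to, here is a minimal sketch in Python. To be clear, this is not SIM's code and not what is running on FRED; the function names are made up for illustration, and it assumes a Linux box with pgrep available and some restart command you supply yourself.

```python
import subprocess
from typing import Callable, Sequence


def is_process_running(name: str) -> bool:
    """Return True if pgrep finds a process whose name matches exactly."""
    # pgrep exits with status 0 when at least one process matches.
    result = subprocess.run(
        ["pgrep", "-x", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_restart_command(command: Sequence[str]) -> bool:
    """Run a restart command (assumed to be supplied by the admin)."""
    return subprocess.run(list(command)).returncode == 0


def ensure_running(
    name: str,
    check: Callable[[str], bool],
    restart: Callable[[str], bool],
) -> str:
    """Check a service once; restart it if it is down.

    Returns "ok" if it was already up, "restarted" if the restart
    brought it back, and "failed" if it is still down afterwards.
    """
    if check(name):
        return "ok"
    # Service is down: try one restart, then check again.
    if restart(name) and check(name):
        return "restarted"
    return "failed"
```

Run from cron every few minutes, a script like this would call ensure_running with is_process_running as the check and a small wrapper around run_restart_command as the restart. Passing the check and restart steps in as functions also makes the logic easy to try out without touching a real service.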

So hopefully this won't happen again.
