Thursday, February 3, 2011

Does one usually monitor the Event Log on Windows servers?

Hello there. I'm a programmer, not a sysadmin, but since we have had a lot of trouble with our servers, I thought I would be proactive and help our overworked (and still learning) sysadmins.

We have 20-25 or so Windows servers (2003 and 2008). They range from SQL Servers and web servers to machines doing batch processing or hosting internal applications. We use WhatsUp as monitoring software to track memory, processor activity, website status, etc.

But at the moment it seems like we are not monitoring the Event Log at all. I've noticed a lot of Errors and Warnings popping up in the Event Log, and while I don't understand the impact of all of them, some seem potentially bad.

What is standard practice in this scenario? Do sysadmins usually go through the Event Log on each server manually (monthly, weekly, or daily) during some service window? Do you have aggregator software that collects events from all servers so you can check them in one place? Or some software that raises an alarm or sends an email as soon as an Error/Warning shows up in an Event Log?

I've seen that WhatsUp has a plugin (that costs money) that can do this, and I've also seen OSSEC suggested here, for example. Is this something I should suggest, and if so, how important is it?

  • You may use Splunk to gather and index Windows Events.

    From Maxwell
    • Windows Server 2008 has an integrated aggregator; just configure it to forward events to a central server.
    • Most professional setups use a system that manages the servers. Microsoft has one, too: SCCM is pretty good once configured properly, and compares well to products like Tivoli. WhatsUp is, hm, rather unprofessional in this area. "Being up" is not "being fast".
    From TomTom
  • We're using Nagios as our monitoring solution, and with NSClient++ we can monitor the Windows logs.
    Normally we use this policy for Windows logs:

    • Warning = 1 to 3 errors in the logs (System and Application) within a 1-hour timeframe
    • Critical = more than 3 errors in the logs (System and Application) within a 1-hour timeframe

    In the Nagios service description we show the total number of errors and a brief description.
    If an error seems important (disk failure, NTFS failure, installation failure, and so on), we log into the server and check.
    A normal server can show some errors if printers are defined and shared, but a healthy server normally doesn't have many errors in its logs.

    From Pier
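The Nagios threshold policy above (Warning for 1 to 3 errors, Critical for more than 3, counted over one hour) can be sketched in a few lines. This is a minimal Python illustration, not NSClient++'s own event log check: it assumes you already have the timestamps of Error-level events from the System and Application logs, and the helpers `classify_error_count` and `status_line` are hypothetical names. The return values follow the usual Nagios plugin exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
from datetime import datetime, timedelta

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify_error_count(error_times, now, window=timedelta(hours=1)):
    """Map the number of errors seen in the last `window` to a Nagios state.

    error_times: datetimes of Error-level events from the System and
    Application logs (assumed to be collected elsewhere; hypothetical input).
    """
    # Count only errors that fall inside the sliding window ending at `now`.
    count = sum(1 for t in error_times if now - window <= t <= now)
    if count == 0:
        return OK, count
    if count <= 3:          # 1 to 3 errors -> Warning
        return WARNING, count
    return CRITICAL, count  # more than 3 errors -> Critical


def status_line(state, count):
    """Build the one-line text Nagios shows next to the service state."""
    return f"{STATE_NAMES[state]} - {count} error(s) in the last hour"


if __name__ == "__main__":
    now = datetime(2011, 2, 3, 12, 0)
    # Two errors within the hour, plus one older error that is ignored.
    errors = [now - timedelta(minutes=m) for m in (5, 20, 90)]
    state, count = classify_error_count(errors, now)
    print(status_line(state, count))  # WARNING - 2 error(s) in the last hour
```

In a real check script you would print the status line and then exit with `state` (for example via `sys.exit(state)`), because Nagios decides the service state from the plugin's exit code.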
  • An admin who doesn't monitor the event logs (or the equivalent on non-Windows systems) is not much of an admin. However, there are many different ways of monitoring the logs, and because they are cryptic at best, the monitoring is best done programmatically. That doesn't remove the need for periodic, random manual checks, but it certainly makes a large, complex job manageable.

    The key to this is a program (or suite of programs) that will parse the logs and extract the "interesting" bits. For example, why would we normally care that Betty sent a 50-page document to the Accounts HP printer? Yet the logs are chock-full of such stuff. The vast majority of event log entries are of no real concern to day-to-day operation, but they can be very helpful when trying to isolate or debug problems.

    Use the filter to extract the errors and warnings, and then possibly even drop those that are normal and expected on a given system. Once you get this filtered down properly, you should end up with a reasonably small number of events that require further investigation. Or at least one would hope that is the result.

    cc81 : So a normal setup would be some aggregator software that gathers/monitors Events from all servers and that has rules/filters for alerting administrators/programmers via mail (or issues)?
    John Gardeniers : An admin's time is better spent doing more useful work than manually reading logs. So yes, that would (should?) be the norm. As a programmer you might be able to help out there. ;)
    cc81 : I could help, but I think they want to go with some software that is truly tested; it seems like everyone must have this problem. I will check out some of the alternatives posted here, see what they have now, and then propose what software and workflow could help them. Thank you for your help (and thanks to everyone else who has responded, too).
    John Gardeniers : @cc81, I find Perl to be particularly easy to use for this kind of thing, plus you'll find all the hard work already done. Check out the modules on CPAN.
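The parse, filter and alert workflow discussed in this answer and its comments (keep Errors and Warnings, drop the ones that are normal on a given server, mail a short summary) can be sketched as follows. The answer suggests Perl and CPAN; this is only an illustration in Python, not any product's rule engine. It assumes events have already been collected into simple records (the `Event` type is hypothetical), and the `EXPECTED` allow-list entry and the email addresses are placeholders you would replace with your own.

```python
from collections import Counter
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal record for one collected event log entry."""
    log: str       # "System", "Application", ...
    level: str     # "Error", "Warning", "Information", ...
    source: str    # the event source name
    event_id: int


# Only these levels are candidates for review.
INTERESTING_LEVELS = {"Error", "Warning"}

# Placeholder allow-list of (source, event_id) pairs known to be normal and
# expected on a given server; replace with real entries as you triage.
EXPECTED = {("ExampleNoisySource", 1234)}


def interesting(events, expected=EXPECTED):
    """Keep Errors/Warnings, then drop the known, expected ones."""
    return [e for e in events
            if e.level in INTERESTING_LEVELS
            and (e.source, e.event_id) not in expected]


def summarize(events):
    """One line per distinct event, most frequent first."""
    counts = Counter((e.log, e.level, e.source, e.event_id) for e in events)
    return "\n".join(f"{n}x {log}/{level} {source} {event_id}"
                     for (log, level, source, event_id), n in counts.most_common())


def build_alert(server, events, sender, recipient):
    """Wrap the summary in an email message (sending is left to smtplib)."""
    msg = EmailMessage()
    msg["Subject"] = f"[{server}] {len(events)} event(s) need review"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(summarize(events) or "No events need review.")
    return msg
```

The message can then be handed to your mail relay, for example with `smtplib.SMTP(host).send_message(msg)`. Keeping a separate allow-list per server lets each machine's routine noise be filtered out without hiding the same event on a server where it would be unusual.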
  • Zenoss does Windows Event Log monitoring in addition to WMI, SNMP and SNMP traps, Syslog, SSH and lots of community-added protocols. Plus, it's open source.

    From mray
