Hmmm. I’d like to clarify my comments a bit here and respond to yours. First, SIEMs, if used correctly, have every capability required to make that guarantee. What’s lacking is the correlation architect’s (and even the vendor’s) clear understanding of what it is they’re building.

When you create SELECT rules without at least a simple ontology, you can’t guarantee that your SELECT is complete. This is a lot like IDS signatures looking for a list of known bad things: the list will never be remotely complete.

However, if you define your inclusive as well as your exclusive processing, you can in fact have something smart done to every event. The way I’ve built it in the past has been to create multiple correlation paths: Known Behavior, Statistical Core Processors, and Behavior-Based Auto-Classing.

Known Behavior rules are self-evident.

In Statistical Core Processors, you create rule series, each of which performs one single statistical transformation (kurtosis, increase over average, etc.). Remember, though, to derive your measurements and environment boundaries from known variables, so you’ll need some sort of automated system to re-baseline those numbers as part of preprocessing and feed them back into the system.
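As a rough illustration of what one such rule series might look like, here is a minimal sketch of a single "increase over average" transformation whose baseline is re-derived automatically from recent history instead of being hard-coded. The class name, window size, and threshold are all illustrative assumptions, not any particular SIEM's API:

```python
from collections import deque
from statistics import mean, stdev

class IncreaseOverAverage:
    """One rule series: a single statistical transformation
    (increase over rolling average). The baseline is re-derived
    automatically from recent intervals, never hard-coded."""

    def __init__(self, window=100, threshold_sigmas=3.0):
        self.history = deque(maxlen=window)   # rolling baseline data
        self.threshold_sigmas = threshold_sigmas

    def process(self, count):
        """Return True if this interval's event count is anomalous
        relative to the automatically maintained baseline."""
        anomalous = False
        if len(self.history) >= 10:           # need enough data to baseline
            mu = mean(self.history)
            sigma = stdev(self.history) or 1.0  # avoid divide-by-zero
            anomalous = (count - mu) / sigma > self.threshold_sigmas
        self.history.append(count)            # re-baseline every interval
        return anomalous
```

Each series stays this simple on purpose; the combining into more complex measures happens later in the stream.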

The statistical stream should be split in two by source: events not processed by any other correlation stream, and events processed by at least one of the other two streams. The split can take the form of tuple tagging (/correlated/knownbadtype/statisticalchange, for example).
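The tuple-tagging split could be sketched roughly like this, assuming a hypothetical event dictionary whose field names ("streams", "tags") are illustrative rather than any particular SIEM's schema:

```python
def tag_for_statistics(event):
    """Split the statistical stream by tagging each event with a
    hierarchical tuple path recording which correlation streams,
    if any, have already processed it."""
    processed = event.get("streams", [])  # streams that already handled it
    if processed:
        # e.g. /correlated/knownbadtype/statisticalchange
        path = "/correlated/" + "/".join(processed) + "/statisticalchange"
    else:
        path = "/uncorrelated/statisticalchange"
    event.setdefault("tags", []).append(path)
    return event
```

Downstream rules can then select on the tag prefix to treat the two sub-streams differently.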

When I say Statistical Core Processors here, I mean that only a basic, simple transformation is made, so that later in the stream, once all events have been accounted for, you can combine the outputs of these simple transformations into more complex ones.

The third stream, Behavior-Based Auto-Classing, deals with the failure of vendors to properly label information (like what’s SPAM, in this example). Classing (labeling) information is best done based on its behavior, utilizing any known properties possible, rather than via some hacked-on system (not all scenarios have known properties; in those cases, the events will at least fall into the statistical processors). Example: email that ends up in an inbox that has never been used is spam, by definition. IDS events triggered by this mail can be classed as “SPAM” events whether or not the vendor labeled them as such: they’re either directly related to spam or else generic enough not to be useful in differentiating one actual activity from another. So an automatically generated list of events associated with SPAM is sent to ArcSight or another SIEM, and the SIEM then classes/groups/tags those events as SPAM. That tagging can then be criteria for other rules, reprioritized, displayed, or filtered out.
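A minimal sketch of that feedback loop, assuming hypothetical event fields ("signature_id", "recipient", "classes") and a set of never-used trap mailboxes, might look like this:

```python
def derive_spam_signatures(events, trap_mailboxes):
    """Return the set of IDS signature IDs seen on mail delivered to
    never-used trap mailboxes. Such mail is spam by definition, so
    these signatures become the auto-generated SPAM class list fed
    back into the SIEM, regardless of the vendor's own labels."""
    return {
        e["signature_id"]
        for e in events
        if e.get("recipient") in trap_mailboxes
    }

def auto_class(event, spam_signatures):
    """Tag an event as SPAM if its signature was learned from trap traffic."""
    if event.get("signature_id") in spam_signatures:
        event.setdefault("classes", []).append("SPAM")
    return event
```

The key design point is that the class list is derived from observed behavior and regenerated automatically, not maintained by hand.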

Together, these three correlation streams ensure that no events fall through, that you have definable, repeatable criteria for every event, and that you have known properties and facts about every single event that goes through your SIEM.

Interestingly, this also means you have created a system that automatically places objects into predefined ontological classes. Huh. That’s really cool. Ontologies allow us to automatically create knowledge out of data.

And I guess that’s my comment on your comment: SIEMs can be used for more than selective real-time analytics. I’ve used complex time-of-day/day-of-week algorithms to bucketize my events, creating more accurate pictures of the state of the network over time within the SIEM, which is useful for in-depth analysis. I used a number of tools (visualizations included) to find times of day and days of the week when traffic patterns were typically the same, and most statistical measurements were automatically derived by comparing only similar temporal buckets to each other. That helped get rid of the start-of-business-day and it’s-the-weekend effects on the values.
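The bucketing idea can be sketched simply; the exact bucket boundaries below (business hours, evening, off-hours) are illustrative assumptions, since in practice they were derived from the observed traffic patterns themselves:

```python
from datetime import datetime

def temporal_bucket(ts: datetime) -> tuple:
    """Bucketize events by (weekday kind, hour band) so statistical
    baselines compare only similar periods: Monday 09:00 is measured
    against other weekday business hours, not against Sunday 03:00."""
    kind = "weekend" if ts.weekday() >= 5 else "weekday"
    if 9 <= ts.hour < 17:
        band = "business"
    elif 17 <= ts.hour < 23:
        band = "evening"
    else:
        band = "offhours"
    return (kind, band)
```

Baselines (like the rolling averages in the statistical processors) are then kept per bucket, so each measurement is only ever compared against its own temporal peers.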

In closing, this need for better classification is universal: it is a problem facing us, in the modern world, in any system designed to help humans make decisions or requiring them to.