DTD/Data Model changes
Subject: DTD/Data Model changes
From: David A. Curry (davy@iss.net)
Date: Wed Jan 03 2001 - 19:59:00 CET
The following is a list of changes that are being made to the IDMEF Data Model and/or the IDMEF XML DTD, or to the Internet-Draft that describes them, following the San Diego IETF/IDWG meetings.

1. The merged Data Model/XML DTD Internet-Draft is being rearranged and parts of it rewritten so that it reads in a more logical way. The tentative Table of Contents for the document is now:

      Status of This Memo
      1. Abstract
      2. Conventions Used in This Document
      3. Introduction
         3.1 The IDMEF Data Model
            3.1.1 Problems Addressed by the Data Model
            3.1.2 Data Model Design Goals
               3.1.2.1 Representing Events
               3.1.2.2 Content-Driven
               3.1.2.3 Relationship Between Alerts
         3.2 Implementing the IDMEF in XML
            3.2.1 The Extensible Markup Language
            3.2.2 Rationale for Implementing IDMEF in XML
      4. Notational Conventions and Formatting Requirements
         4.1 Unified Modeling Language
            4.1.1 Relationships
               4.1.1.1 Inheritance Relationship
               4.1.1.2 Aggregation Relationship
            4.1.2 Multiplicity Indicator
         4.2 Extensible Markup Language
            4.2.1 The IDMEF Document Prolog
               4.2.1.1 XML Declaration
               4.2.1.2 IDMEF DTD Formal Public Identifier
               4.2.1.3 IDMEF DTD Document Type Definition
            4.2.2 Character Data Processing in XML and IDMEF
               4.2.2.1 Character Entity References
               4.2.2.2 Character Code References
               4.2.2.3 White Space Processing
            4.2.3 Languages in XML and IDMEF
         4.3 IDMEF Data Types
      5. The IDMEF Data Model and DTD
         5.1 Data Model Overview
         5.2 The Core of the Data Model
            5.2.1 The IDMEF-MESSAGE Class
            5.2.2 The ALERT Class
            5.2.3 The ANALYZER Class
            5.2.4 The CREATETIME Class
            5.2.5 The DETECTTIME Class
            5.2.6 The ANALYZERTIME Class
            5.2.7 The CLASSIFICATION Class
            5.2.8 The SOURCE Class
            5.2.9 The TARGET Class
            5.2.10 The ADDITIONALDATA Class
            5.2.11 The TOOLALERT Class
            5.2.12 The CORRELATIONALERT Class
            5.2.13 The OVERFLOWALERT Class
            5.2.14 The Support Classes
               5.2.14.1 The IDENT Class
               5.2.14.2 The ADDRESS Class
               5.2.14.3 The USER Class
               5.2.14.4 The NODE Class
               5.2.14.5 The PROCESS Class
               5.2.14.6 The SERVICE Class
               5.2.14.7 The WEBSERVICE Class
               5.2.14.8 The SNMPSERVICE Class
      6. Extending the IDMEF
         6.1 Extending the Data Model
            6.1.1 Extension by Aggregation
            6.1.2 Extension by Subclassing
         6.2 Extending the DTD
      7. Special Considerations
         7.1 XML Validity and Well-Formedness
         7.2 Analyzer-Manager Time Synchronization
         7.3 NTP Timestamp Wrap-Around
         7.4 Unrecognized XML Tags
         7.5 Digital Signatures
      8. Examples
      9. The IDMEF XML Document Type Definition
      10. Security Considerations
      11. References
      12. Acknowledgements
      13. Author's Addresses
      Full Copyright Statement

2. The existing <User> element and its sub-elements are being replaced with the new model defined by Glenn Mansfield and me, with modifications as suggested by Herve Debar, as proposed and agreed to on the idwg-public list in Oct/Nov 2000. This results in the following general format:

      <User category="unknown|application|os-device">
        <Id type="original-user|current-user|target-user|user-privs|
                  current-group|group-privs">
          <name>user name</name>
          <number>user id</number>
        </Id>
        ...
      </User>

3. The representation of time is being simplified as proposed by Paul Sangree and agreed to in San Diego:

   a. Eliminate the <Time> element, and create a new <CreateTime> element
   b. Eliminate the <time> and <date> elements
   c. Change the date and time format to the ISO 8601:2000 standard
   d. Change <ntpstamp> from an element to an attribute

   This results in the following general format:

      <Alert>
        <AnalyzerTime ntpstamp="0xBDFA4701.0x32C6">
          2000-12-25T01:00:01.15+0000
        </AnalyzerTime>
        <DetectTime ntpstamp="0xBDFA4701.0x32C6">
          2000-12-25T01:00:01.15+0000
        </DetectTime>
        <CreateTime ntpstamp="0xBDFA4701.0x32C6">
          2000-12-25T01:00:01.15+0000
        </CreateTime>
      </Alert>

   Language will be added to the Internet-Draft to cover formatting the date/time strings according to ISO 8601:2000, and to cover some subtleties of using the NTP timestamp.

4. Support for isolated networks and sensors with multiple interfaces is being added, as proposed by Paul Sangree and agreed to in San Diego:

   a. Add an optional "interface" attribute to <Source> and <Target> which can be used to identify the interface on which a network sensor saw the traffic.
   b. Add optional attributes "vlan-num" and "vlan-name" to the <Address> element.

5. The <Environment> and <Argument> elements are being removed, moving <env> and <arg> up one level. Proposed by Paul Sangree and agreed to in San Diego.

6. The value "unknown" is being removed from the list of possible values for the "type" attribute on <AdditionalData>, as it makes no sense. Proposed by Paul Sangree and agreed to in San Diego.

7. The content model for <AdditionalData> is being changed from "#PCDATA" to "ANY", and "xml" is being added as a possible value for the "type" attribute. The purpose of this is to fix the extensibility problems discussed in San Diego. By making this change, we can allow people to include additional DTDs in their IDMEF markup (e.g., one for packet headers), and to put all the new markup underneath <AdditionalData type="xml">. This change will also allow us to make better (i.e., correct) use of XML Namespaces.

   The text in previous Internet-Drafts tells people how to add or change elements in their IDMEF Messages.
   All of that language is being removed and replaced with new requirements that basically say that the only way you can do extensions is by including new DTDs, and anything you include gets put under <AdditionalData type="xml">. Furthermore, any extensions you do add must use a different XML Namespace (i.e., they can't use "idmef" or the default namespace) to avoid conflicts with existing IDMEF elements and attributes.

   I believe this change also addresses the issues raised on the list by Tara Whalen (including data that uses a radically different data model, such as anomaly data) and Joe McAlerney (including packet header data). To do this, just write a DTD for the data you want to include, and put the data (with all your new tags) under <AdditionalData>. You get the data in the alert, and any managers that don't know what to do with it can just throw it away; the rest of the IDMEF format does not change.

   *** NOTE: I was tasked with investigating this problem in San Diego; the above is my solution. We are limited in how we can solve the problem, mostly because of the limitations imposed by DTDs. XML Schemas will, I think, allow us a more "elegant" solution, but they are currently only in Candidate Recommendation status within the W3C, which in effect means we'll be unable to use them until "Version 2" of IDMEF, given our current timetable.

8. The caveats on <AnalyzerTime> and time synchronization made in Paul Sangree's presentation are being included in the appropriate section(s) of the Internet-Draft, as agreed to in San Diego. Also, some recommendations on how implementers should handle the time synchronization problem are being added.

9. The <portlist> element is being removed from the <Service> element. To achieve lists of ports, specify multiple <Target> elements.

==========================================

The following questions have come up during the process of revising the data model and DTD, and need to be discussed as well.

1. The <CorrelationAlert> and <ToolAlert> elements each express relationships between multiple alerts. But neither of them seems to "fit" well into the model where they are now (for example, do <Classification>, <Source>, and <Target> have any meaning in the context of a <CorrelationAlert> or a <ToolAlert>?). Is there a common abstraction to replace these two elements? For example:

      <AlertAssociation>
        <relationship> "attack tool," "recon sweep," etc. </relationship>
        <data> tool name, command, etc. </data>
        <alertid> id of a related alert </alertid>
        <alertid> id of another related alert </alertid>
        ...
      </AlertAssociation>

   Furthermore, should <Alert> be defined as an "either-or" sort of thing in which it's either a "plain" alert with <Classification>, <Source>, and <Target> or an "association" alert with <AlertAssociation>, like this:

      <!ELEMENT Alert (
          Analyzer, CreateTime, DetectTime?, AnalyzerTime?,
          (AlertAssociation | (Classification+, Source*, Target*)),
          AdditionalData*
      )>

2. <OverflowAlert> doesn't seem to fit very well either, in that its very presence indicates the attack method. Further, the attack method and the value of the <program> element can usually be determined indirectly through <Classification>, and are somewhat redundant. The <size> and <buffer> elements provide more detail than is perhaps warranted, and since many analyzers have no way to obtain their values, they may make more sense relegated to <AdditionalData>.

3. <WebService> and <SNMPService> seem kind of "special-case-ish" as well. Is there some kind of generalized representation (like the above for alert associations) we can use for this kind of thing?

==============================================================

The following items were proposed by Paul Sangree, but not resolved at the San Diego meeting.

1. Add <AdditionalData> to <Classification> to allow the provision of other information besides a URL and a name.

   *** NOTE: This proposal was not discussed in San Diego.
   *** COMMENT: I'm not sure I'd do this; it would make processing the classification much harder for managers. You can use the existing <AdditionalData> to do this (especially once we allow it to contain arbitrary tags), and that keeps all the "weird" stuff in one place.

2. Add a <Context> element for representing alert context.

   *** NOTE: It was agreed in San Diego that this proposal needs more work.

3. Add summarized lists (count attributes).

   *** NOTE: It was agreed in San Diego that this is generally desirable, but that the issue of which particular elements should have count attributes added to them needs further work.

4. Add information about automated actions taken by an analyzer.

   *** NOTE: This proposal was not discussed in San Diego.

   *** COMMENT: We used to have these, and they were removed (I believe at the 2/2000 interim meeting) because we couldn't come up with a set of standard actions, and free-form information didn't seem appropriate. This information can always be included in <AdditionalData>.

5. Remove <sport> and <dport> from <Service> and add <port> to <Source> and <Target>, to allow specifying multiple source ports.

   *** NOTE: This proposal was not discussed in San Diego.

6. Add "class", "manufacturer", "model", and "version" attributes to the <Analyzer> element to allow analyzers to be better identified, and to give hints on what stuff in <AdditionalData> might mean.

   *** NOTE: This proposal was not discussed in San Diego.

7. Fix the definition of the "impact" attribute by breaking it into three smaller attributes, "severity", "completion", and "type".

   *** NOTE: It was agreed in San Diego that the idea of breaking this up is desirable, but that the specifics still need to be worked out. Most people liked a "severity" with three (hi/med/low) to eight (use the ones from syslog: emerg/alert/crit/err/warning/notice/info/debug) values.

=====================================

The following items were proposed to the list by Andy Walther on 13-Dec-00.
There has been no follow-up discussion on the list as of yet.

1. Add a <File> element (plus several sub-elements) to support host-based systems that want to provide information on files.

   *** COMMENT: If we were to add this, it surely would not belong under the top-level <Alert> as proposed. I would argue for putting it under the <Target> element; it seems to fit there, and (I assume) most host-based systems that report file information are going to report it in the context of "somebody just did something to this file."

2. Add a <Connection> element (plus sub-elements) to consolidate information that's "buried in the hierarchy."

   *** COMMENT: I'm not against rearranging things if they need to be moved, but we should avoid putting the same information in more than one place.
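
As a rough illustration of change 7 in the first list (extensions live under <AdditionalData type="xml"> and must use their own XML Namespace), here is a minimal Python sketch using the standard xml.etree.ElementTree module. The namespace URI, the "pkt" prefix, and the <Header>/<ttl> element names are hypothetical and exist only for this example; the <Alert> shown is an abbreviated fragment, not a DTD-valid message. The sketch only checks that extension elements carry some namespace; it does not model the reserved "idmef" namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element names for a packet-header
# extension; nothing here is defined by IDMEF itself.
PKT_NS = "http://example.com/ns/packet-header"

# Abbreviated fragment: a real <Alert> carries more children.
MESSAGE = """\
<Alert>
  <AdditionalData type="xml">
    <pkt:Header xmlns:pkt="http://example.com/ns/packet-header">
      <pkt:ttl>64</pkt:ttl>
    </pkt:Header>
  </AdditionalData>
</Alert>
"""


def namespace_of(elem):
    """Return the namespace URI of an element, or None if it has none."""
    # ElementTree spells a namespaced tag as "{uri}localname".
    if elem.tag.startswith("{"):
        return elem.tag[1:].split("}", 1)[0]
    return None


def extension_namespaces(alert):
    """Collect the namespaces of elements placed directly under
    <AdditionalData type="xml">; un-namespaced children are rejected."""
    found = set()
    for data in alert.iter("AdditionalData"):
        if data.get("type") != "xml":
            continue
        for child in data:
            ns = namespace_of(child)
            if ns is None:
                raise ValueError("extension element lacks a namespace: " + child.tag)
            found.add(ns)
    return found


def drop_unknown_extensions(alert, known):
    """Manager-side handling: discard extension subtrees whose namespace
    is not in the set the manager knows how to interpret."""
    # Copy both sequences first so removal does not disturb iteration.
    for data in list(alert.iter("AdditionalData")):
        for child in list(data):
            if namespace_of(child) not in known:
                data.remove(child)


alert = ET.fromstring(MESSAGE)
print(extension_namespaces(alert))
drop_unknown_extensions(alert, known=set())
print(len(alert.find("AdditionalData")))  # number of extension subtrees left
```

A manager that does understand the extension would pass known={PKT_NS} and keep the subtree; one that does not simply ends up with an empty <AdditionalData>, and the rest of the alert is untouched.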
Prepared by Robin Cover for The XML Cover Pages archive. See: "Intrusion Detection Message Exchange Format."