DTD/Data Model changes

Subject: DTD/Data Model changes
From: David A. Curry (davy@iss.net)
Date: Wed Jan 03 2001 - 19:59:00 CET

The following is a list of changes that are being made to the IDMEF Data Model 
and/or the IDMEF XML DTD, or to the Internet-Draft that describes them, 
following the San Diego IETF/IDWG meetings. 

 1. The merged Data Model/XML DTD Internet-Draft is being rearranged and parts 
    of it rewritten so that it reads in a more logical way. The tentative 
    Table of Contents for the document is now: 

    Status of This Memo 
    1. Abstract 
    2. Conventions Used in This Document 
    3. Introduction 
       3.1 The IDMEF Data Model 
          3.1.1 Problems Addressed by the Data Model 
          3.1.2 Data Model Design Goals 
             3.1.2.1 Representing Events 
             3.1.2.2 Content-Driven 
             3.1.2.3 Relationship Between Alerts 
       3.2 Implementing the IDMEF in XML 
          3.2.1 The Extensible Markup Language 
          3.2.2 Rationale for Implementing IDMEF in XML 
    4. Notational Conventions and Formatting Requirements 
       4.1 Unified Modeling Language 
          4.1.1 Relationships 
             4.1.1.1 Inheritance Relationship 
             4.1.1.2 Aggregation Relationship 
          4.1.2 Multiplicity Indicator 
       4.2 Extensible Markup Language 
          4.2.1 The IDMEF Document Prolog 
             4.2.1.1 XML Declaration 
             4.2.1.2 IDMEF DTD Formal Public Identifier 
             4.2.1.3 IDMEF DTD Document Type Definition 
          4.2.2 Character Data Processing in XML and IDMEF 
             4.2.2.1 Character Entity References 
             4.2.2.2 Character Code References 
             4.2.2.3 White Space Processing 
          4.2.3 Languages in XML and IDMEF 
       4.3 IDMEF Data Types 
    5. The IDMEF Data Model and DTD 
       5.1 Data Model Overview 
       5.2 The Core of the Data Model 
          5.2.1 The IDMEF-MESSAGE Class 
          5.2.2 The ALERT Class 
          5.2.3 The ANALYZER Class 
          5.2.4 The CREATETIME Class 
          5.2.5 The DETECTTIME Class 
          5.2.6 The ANALYZERTIME Class 
          5.2.7 The CLASSIFICATION Class 
          5.2.8 The SOURCE Class 
          5.2.9 The TARGET Class 
          5.2.10 The ADDITIONALDATA Class 
          5.2.11 The TOOLALERT Class 
          5.2.12 The CORRELATIONALERT Class 
          5.2.13 The OVERFLOWALERT Class 
          5.2.14 The Support Classes 
             5.2.14.1 The IDENT Class 
             5.2.14.2 The ADDRESS Class 
             5.2.14.3 The USER Class 
             5.2.14.4 The NODE Class 
             5.2.14.5 The PROCESS Class 
             5.2.14.6 The SERVICE Class 
             5.2.14.7 The WEBSERVICE Class 
             5.2.14.8 The SNMPSERVICE Class 
    6. Extending the IDMEF 
       6.1 Extending the Data Model 
          6.1.1 Extension by Aggregation 
          6.1.2 Extension by Subclassing 
       6.2 Extending the DTD 
    7. Special Considerations 
       7.1 XML Validity and Well-Formedness 
       7.2 Analyzer-Manager Time Synchronization 
       7.3 NTP Timestamp Wrap-Around 
       7.4 Unrecognized XML Tags 
       7.5 Digital Signatures 
    8. Examples 
    9. The IDMEF XML Document Type Definition 
    10. Security Considerations 
    11. References 
    12. Acknowledgements 
    13. Author's Addresses 
    Full Copyright Statement 

 2. The existing <User> element and its sub-elements are being replaced with 
    the new model defined by Glenn Mansfield and me, with modifications as 
    suggested by Herve Debar, as proposed and agreed to on the idwg-public 
    list in Oct/Nov 2000. 

    This results in the following general format: 

    <User category="unknown|application|os-device"> 
      <Id type="original-user|current-user|target-user|user-privs| 
                    current-group|group-privs"> 
        <name>user name</name> 
        <number>user id</number> 
      </Id> 
      ... 
    </User> 
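
    As a rough illustration of how a consumer might read this format, here
    is a minimal sketch using Python's standard xml.etree.ElementTree. The
    sample values ("alice", "501", and so on) and the helper name
    read_user_ids are made up for the example; only the element and
    attribute names come from the format above.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the format above; the values are invented.
USER_XML = """
<User category="os-device">
  <Id type="original-user">
    <name>alice</name>
    <number>501</number>
  </Id>
  <Id type="current-user">
    <name>root</name>
    <number>0</number>
  </Id>
</User>
"""

def read_user_ids(xml_text):
    """Return (category, [(type, name, number), ...]) for one <User> element."""
    user = ET.fromstring(xml_text)
    ids = [
        (node.get("type"), node.findtext("name"), node.findtext("number"))
        for node in user.findall("Id")
    ]
    return user.get("category"), ids

category, ids = read_user_ids(USER_XML)
print(category, ids)
```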

 3. The representation of time is being simplified as proposed by Paul Sangree 
    and agreed to in San Diego: 

    a. Eliminate the <Time> element, and create a new <CreateTime> element 
    b. Eliminate the <time> and <date> elements 
    c. Change the date and time format to the ISO 8601:2000 standard 
    d. Change <ntpstamp> from an element to an attribute 

    This results in the following general format: 

    <Alert> 
      <AnalyzerTime ntpstamp="0xBDFA4701.0x32C6"> 
        2000-12-25T01:00:01.15+0000 
      </AnalyzerTime> 
      <DetectTime ntpstamp="0xBDFA4701.0x32C6"> 
        2000-12-25T01:00:01.15+0000 
      </DetectTime> 
      <CreateTime ntpstamp="0xBDFA4701.0x32C6"> 
        2000-12-25T01:00:01.15+0000 
      </CreateTime> 
    </Alert> 
    
    Language will be added to the Internet-Draft to cover formatting the 
    date/time strings according to ISO 8601:2000, and to cover some subtleties 
    of using the NTP timestamp. 
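
    The two representations can be derived from the same clock reading. The
    sketch below assumes that the "ntpstamp" attribute holds the 32-bit NTP
    seconds field and a 32-bit fraction field, each written in hexadecimal,
    and that the element content uses two fractional digits with a "+0000"
    offset as in the sample above. The field widths and the function names
    are assumptions for illustration, not text from the Internet-Draft.

```python
from datetime import datetime, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800

def ntpstamp(unix_seconds, microseconds=0):
    """Format a time as '0x<seconds>.0x<fraction>' (assumed 32-bit fields)."""
    # Masking to 32 bits means the seconds field wraps around in 2036.
    seconds = (unix_seconds + NTP_UNIX_OFFSET) & 0xFFFFFFFF
    # Scale the sub-second part to units of 1/2**32 second.
    fraction = (microseconds << 32) // 1_000_000
    return "0x%08X.0x%08X" % (seconds, fraction)

def iso8601(unix_seconds, microseconds=0):
    """Format the same instant as a UTC date/time string with 1/100 s precision."""
    dt = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    dt = dt.replace(microsecond=microseconds)
    # %f yields six digits; drop the last four to keep hundredths of a second.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-4] + "+0000"

print(ntpstamp(978307201, 150000))
print(iso8601(978307201, 150000))
```

    Under these assumptions, the seconds field 0xBDFA4701 corresponds to
    2001-01-01T00:00:01 UTC, since NTP counts seconds from 1900-01-01.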

 4. Support for isolated networks and sensors with multiple interfaces is 
    being added, as proposed by Paul Sangree and agreed to in San Diego: 

    a. Add an optional "interface" attribute to <Source> and <Target> which 
       can be used to identify the interface on which a network sensor saw the 
       traffic. 
    b. Add optional attributes "vlan-num" and "vlan-name" to the <Address> 
       element. 

 5. The <Environment> and <Argument> elements are being removed, moving <env> 
    and <arg> up one level. Proposed by Paul Sangree and agreed to in San 
    Diego. 

 6. The value "unknown" is being removed from the list of possible values for 
    the "type" attribute on <AdditionalData>, as it makes no sense. Proposed 
    by Paul Sangree and agreed to in San Diego. 

 7. The content model for <AdditionalData> is being changed from "#PCDATA" to 
    "ANY", and "xml" is being added as a possible value for the "type" 
    attribute. 

    The purpose of this is to fix the extensibility problems discussed in San 
    Diego. By making this change, we can allow people to include additional 
    DTDs in their IDMEF markup (e.g., one for packet headers), and to put all 
    the new markup underneath <AdditionalData type="xml">. This change will 
    also allow us to make better (i.e., correct) use of XML Namespaces. 

    The text in previous Internet-Drafts tells people how to add or change 
    elements in their IDMEF Messages. All of that language is being removed, 
    and replaced with new requirements that basically say that the only way 
    you can do extensions is by including new DTDs, and anything you include 
    gets put under <AdditionalData type="xml">. Furthermore, any extensions 
    you do add must use a different XML Namespace (i.e., they can't use 
    "idmef" or the default namespace) to avoid conflicts with existing IDMEF 
    elements and attributes. 

    I believe this change also addresses the issues raised on the list by Tara 
    Whalen (including data that uses a radically different data model, such as 
    anomaly data) and Joe McAlerney (including packet header data). To do this, 
    just write a DTD for the data you want to include, and put the data (with 
    all your new tags) under <AdditionalData>. You get the data in the alert, 
    and any managers that don't know what to do with it can just throw it 
    away; the rest of the IDMEF format does not change. 

    *** NOTE: I was tasked with investigating this problem in San Diego; the 
        above is my solution. We are limited in how we can solve the problem, 
        mostly because of the limitations imposed by DTDs. XML Schemas will, 
        I think, allow us a more "elegant" solution, but they are currently 
        only in Candidate Recommendation status within the W3C, which in 
        effect means we'll be unable to use them until "Version 2" of IDMEF, 
        given our current timetable. 
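
    To make the extension mechanism in this item concrete, here is a minimal
    sketch of a manager that sorts the contents of <AdditionalData
    type="xml"> by namespace and skips anything it does not recognize. The
    "pkt" prefix, its namespace URI, the <pkt:header> and <pkt:ttl>
    elements, and the function name split_extensions are all hypothetical;
    no such extension DTD is defined here.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: the "pkt" prefix, its namespace URI, and the
# elements inside <AdditionalData> are invented for illustration.
ALERT_XML = """
<Alert>
  <AdditionalData type="xml">
    <pkt:header xmlns:pkt="http://example.com/packet-header">
      <pkt:ttl>64</pkt:ttl>
    </pkt:header>
  </AdditionalData>
</Alert>
"""

KNOWN_NAMESPACES = {"http://example.com/packet-header"}

def split_extensions(alert_text, known=KNOWN_NAMESPACES):
    """Return (understood, skipped) tags found under <AdditionalData type="xml">."""
    alert = ET.fromstring(alert_text)
    understood, skipped = [], []
    for data in alert.findall("AdditionalData"):
        if data.get("type") != "xml":
            continue
        for child in data:
            # ElementTree spells namespaced tags as "{uri}local".
            uri = child.tag[1:].split("}")[0] if child.tag.startswith("{") else ""
            (understood if uri in known else skipped).append(child.tag)
    return understood, skipped

print(split_extensions(ALERT_XML))
print(split_extensions(ALERT_XML, known=set()))
```

    A manager that does not know the extension's namespace ends up with the
    element in the "skipped" list and can discard it, while the rest of the
    alert is processed as usual.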

 8. The caveats on <AnalyzerTime> and time synchronization made in Paul 
    Sangree's presentation are being included in the appropriate section(s) 
    of the Internet-Draft, as agreed to in San Diego. 

    Also, some recommendations on how implementers should handle the 
    time synchronization problem are being added. 

 9. The <portlist> element is being removed from the <Service> element. To 
    achieve lists of ports, specify multiple <Target> elements. 
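
    A producer that used to emit a port list would instead emit one <Target>
    per port. The sketch below assumes each <Target> carries its port as
    <Service><dport>; that nesting and the helper name targets_for_ports
    are assumptions made for the example, and the port numbers are arbitrary.

```python
import xml.etree.ElementTree as ET

def targets_for_ports(ports):
    """Build one <Target> element per destination port (assumed nesting)."""
    targets = []
    for port in ports:
        target = ET.Element("Target")
        service = ET.SubElement(target, "Service")
        ET.SubElement(service, "dport").text = str(port)
        targets.append(target)
    return targets

alert = ET.Element("Alert")
alert.extend(targets_for_ports([21, 23, 80]))
print(ET.tostring(alert, encoding="unicode"))
```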

==========================================

The following questions have come up during the process of revising the data 
model and DTD, and need to be discussed as well. 

1. The <CorrelationAlert> and <ToolAlert> elements each express relationships 
   between multiple alerts. But neither of them seems to "fit" well into the 
   model where they are now (for example, do <Classification>, <Source>, and 
   <Target> have any meaning in the context of a <CorrelationAlert> or a 
   <ToolAlert>?). 

   Is there a common abstraction to replace these two elements? For example: 

        <AlertAssociation> 
           <relationship> 
              "attack tool," "recon sweep," etc. 
           </relationship> 
           <data> 
              tool name, command, etc. 
           </data> 
           <alertid> 
              id of a related alert 
           </alertid> 
           <alertid> 
              id of another related alert 
           </alertid> 
           ... 
        </AlertAssociation> 

   Furthermore, should <Alert> be defined as an "either-or" sort of thing in 
   which it's either a "plain" alert with <Classification>, <Source>, and 
   <Target> or an "association" alert with <AlertAssociation>, like this: 

          <!ELEMENT Alert ( 
            Analyzer, CreateTime, DetectTime?, AnalyzerTime?, 
            (AlertAssociation | (Classification+, Source*, Target*)), 
            AdditionalData* 
          )> 
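
   To see what this "either-or" content model would accept, the sketch
   below rewrites it as a regular expression over the names of the direct
   children of <Alert>. It only checks child order, not full DTD validity
   (xml.etree.ElementTree does not validate), and the function name
   alert_matches_model is invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# The proposed content model as a regex over child tag names, where each
# name is followed by a single space.
ALERT_MODEL = re.compile(
    r"Analyzer CreateTime (DetectTime )?(AnalyzerTime )?"
    r"(AlertAssociation |(Classification )+(Source )*(Target )*)"
    r"(AdditionalData )*"
)

def alert_matches_model(alert_text):
    """Return True if the children of <Alert> follow the proposed order."""
    alert = ET.fromstring(alert_text)
    children = "".join(child.tag + " " for child in alert)
    return ALERT_MODEL.fullmatch(children) is not None

plain = "<Alert><Analyzer/><CreateTime/><Classification/><Source/><Target/></Alert>"
assoc = "<Alert><Analyzer/><CreateTime/><AlertAssociation/></Alert>"
mixed = "<Alert><Analyzer/><CreateTime/><AlertAssociation/><Classification/></Alert>"
print(alert_matches_model(plain), alert_matches_model(assoc), alert_matches_model(mixed))
```

   A "plain" alert and an "association" alert both match, while an alert
   that mixes <AlertAssociation> with <Classification> does not.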

2. <OverflowAlert> doesn't seem to fit very well either, in that its very 
   presence indicates the attack method. Further, the attack method and 
   the value of the <program> element can usually be determined indirectly 
   through <Classification>, and are somewhat redundant. The <size> and 
   <buffer> elements provide more detail than is perhaps warranted, and 
   since many analyzers have no way to obtain their values, they may make 
   more sense relegated to <AdditionalData>. 

3. <WebService> and <SNMPService> seem kind of "special-case-ish" as well. 
   Is there some kind of generalized representation (like the above for 
   alert associations) we can use for this kind of thing? 

==============================================================

The following items were proposed by Paul Sangree, but not resolved at the San 
Diego meeting. 

 1. Add <AdditionalData> to <Classification> to allow the provision of other 
    information besides a URL and a name. 

    *** NOTE: This proposal was not discussed in San Diego. 

    *** COMMENT: I'm not sure I'd do this; it would make processing the 
        classification much harder for managers. You can use the existing 
        <AdditionalData> to do this (especially once we allow it to contain 
        arbitrary tags), and that keeps all the "weird" stuff in one place. 

 2. Add a <Context> element for representing alert context. 

    *** NOTE: It was agreed in San Diego that this proposal needs more work. 

 3. Add summarized lists (count attributes). 

    *** NOTE: It was agreed in San Diego that this is generally desirable, but 
        that the issue of which particular elements should have count 
        attributes added to them needs further work. 

 4. Add information about automated actions taken by an analyzer. 

    *** NOTE: This proposal was not discussed in San Diego. 

    *** COMMENT: We used to have these, and they were removed (I believe at 
        the 2/2000 interim meeting) because we couldn't come up with a set of 
        standard actions, and free-form information didn't seem appropriate. 
        This information can always be included in <AdditionalData>. 

 5. Remove <sport> and <dport> from <Service> and add <port> to <Source> and 
    <Target>, to allow specifying multiple source ports. 

    *** NOTE: This proposal was not discussed in San Diego. 

 6. Add "class", "manufacturer", "model", and "version" attributes to the 
    <Analyzer> element to allow analyzers to be better identified, and to 
    give hints on what stuff in <AdditionalData> might mean. 

    *** NOTE: This proposal was not discussed in San Diego. 

 7. Fix the definition of the "impact" attribute by breaking it into three 
    smaller attributes, "severity", "completion", and "type". 

    *** NOTE: It was agreed in San Diego that the idea of breaking this up is 
        desirable, but that the specifics still need to be worked out. Most 
        people liked a "severity" with three (hi/med/low) to eight (use the 
        ones from syslog - emerg/alert/crit/err/warning/notice/info/debug) 
        values. 

=====================================

The following items were proposed to the list by Andy Walther on 13-Dec-00. 
There has been no followup discussion on the list as of yet. 

 1. Add a <File> element (plus several sub-elements) to support host-based 
    systems that want to provide information on files. 

    *** COMMENT: If we were to add this, it surely would not belong under the 
        top-level <Alert> as proposed. I would argue for putting it under the 
        <Target> element -- it seems to fit there, and (I assume) most host- 
        based systems that report file information are going to report it in 
        the context of "somebody just did something to this file." 

 2. Add a <Connection> element (plus sub-elements) to consolidate information 
    that's "buried in the hierarchy." 

    *** COMMENT: I'm not against rearranging things if they need to be moved, 
        but we should avoid putting the same information in more than one 
        place.

Prepared by Robin Cover for The XML Cover Pages archive. See: "Intrusion Detection Message Exchange Format."