Classifications

A classification is a partition of a given collection of items into mutually exclusive and collectively exhaustive sub-collections. A classification depends upon a pre-existing specification of a hierarchy of values, names, and codes called a classification scheme. Registry items in a Registry may be classified by as many classification schemes as deemed appropriate by the Submitting Organization.

A classification scheme can be one of several different styles, or some combination of those styles. The following paragraphs describe four separate styles of classification schemes.

Simple 1-level classification scheme

A simple 1-level classification scheme is a list of distinct values that can be used to partition a collection of objects. The name of the classification scheme can be viewed as the root of the hierarchy, with each of the distinct values considered as a node at the first level. An example of such a simple classification scheme is the StudentStatus attribute of a student population, which can be used to partition the students into Freshmen, Sophomores, Juniors, Seniors, and Special students.

Each node of a classification scheme distinguishes between an ItemValue and an ItemName, and an ItemValue is always treated as a reference that can be used in place of the ItemName. With this capability, we could refine the StudentStatus classification scheme to specify the item values FR, SO, JR, SR, and SP as references to, and replacements for, the longer item names. A classification scheme defines only one item value for each item name.
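A minimal Python sketch of a simple 1-level scheme, assuming the scheme is held as a mapping from each ItemValue to its ItemName. The names `STUDENT_STATUS` and `partition` are hypothetical, chosen only for illustration. Because every student carries exactly one status value, grouping by that value yields sub-collections that are mutually exclusive and collectively exhaustive.

```python
# Hypothetical 1-level scheme: each ItemValue (short code) maps to one ItemName.
STUDENT_STATUS = {
    "FR": "Freshmen",
    "SO": "Sophomores",
    "JR": "Juniors",
    "SR": "Seniors",
    "SP": "Special students",
}

def partition(population: dict[str, str]) -> dict[str, list[str]]:
    """Group students (student -> ItemValue) into one sub-collection per ItemValue."""
    groups: dict[str, list[str]] = {value: [] for value in STUDENT_STATUS}
    for student, value in population.items():
        if value not in STUDENT_STATUS:
            # Every member must fall into some node, or the partition is not exhaustive.
            raise ValueError(f"{value!r} is not an item value of StudentStatus")
        groups[value].append(student)
    return groups

groups = partition({"Ana": "FR", "Ben": "SR", "Chen": "FR"})
```

Empty groups (such as `groups["SO"]`) are kept so that every node of the scheme is represented, even when no member of the population falls into it.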

Multi-level naming classification scheme

A multi-level naming classification scheme uses scoped names to identify the nodes in a classification hierarchy. Each name is meaningful only if the name of its parent node is known. Names need be unique only among the children of a given parent node, so to uniquely identify a node it is necessary to know the name of each node on the path from the root to that node.

An example of a multi-level naming classification scheme is the scheme biologists use to classify all living things. The scheme consists of seven levels (Kingdom, Phylum, Class, Order, Family, Genus, and Species), each with a list of recognized values. Names need not be unique within a level: a tree could have the same species name as some animal, because trees and animals are in different kingdoms and thus in different branches of the hierarchy. A complete classification of all living things would therefore require a value for each of the seven levels.
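To illustrate why a scoped name only identifies a node together with its path, here is a small Python sketch using a nested dictionary as the tree. The species epithet "vulgaris" occurs both in Sciurus vulgaris (the red squirrel) and Beta vulgaris (beet); intermediate levels between Kingdom and Genus are omitted for brevity. The `TREE` layout and the helper names `resolve` and `nodes_named` are assumptions made for this example.

```python
# Hypothetical nested tree: each node maps a child name to its own subtree.
# Intermediate levels (Phylum through Family) are omitted for brevity.
TREE = {
    "Animalia": {"Sciurus": {"vulgaris": {}}},
    "Plantae": {"Beta": {"vulgaris": {}}},
}

def resolve(tree: dict, path: tuple[str, ...]) -> dict:
    """Follow scoped names from the root; each name is looked up only
    among the children of the node reached so far."""
    node = tree
    for name in path:
        node = node[name]  # KeyError if the name is not a child of this node
    return node

def nodes_named(tree: dict, name: str, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return the full path of every node that carries the given (non-unique) name."""
    found = []
    for child, subtree in tree.items():
        path = prefix + (child,)
        if child == name:
            found.append(path)
        found.extend(nodes_named(subtree, name, path))
    return found
```

The bare name "vulgaris" matches two different nodes, while each full path resolves to exactly one.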

A classification scheme can also be used to classify a population drawn from just one of its nodes. For example, since Primates is a value at the Order level, a classification of all primates could consist of just three values, one for each of the levels Family, Genus, and Species. Similarly, a classification of all modern-day trees could consist of values for just Genus and Species.

When a naming classification scheme is used to classify a given population, it is usually necessary to identify the following items: 1) a globally unique name for the classification scheme, 2) a unique identifier for each level within the scheme, and 3) a value for each level. This is easily accomplished by defining a classification as a set of ordered triples, {(SchemeURN, LevelCode, ItemValue)}, where SchemeURN is the unique name of a classification scheme, LevelCode is a short name used to reference a longer and possibly more descriptive LevelName, and ItemValue is a short name or value used to represent a longer ItemName. Alternatively, to eliminate the repetition of SchemeURN and to allow optional inclusion of the full names, one could define a classification with the following XML element declarations:

     <!ELEMENT classification  (levelValuePair+)>
     <!ATTLIST classification
          schemeURN   CDATA  #REQUIRED
          schemeName  CDATA  #IMPLIED >
 
     <!ELEMENT levelValuePair (comment?)>
     <!ATTLIST levelValuePair
          levelCode  CDATA  #REQUIRED
          itemValue  CDATA  #REQUIRED
          levelName  CDATA  #IMPLIED
          levelNbr   CDATA  #IMPLIED
          itemName   CDATA  #IMPLIED >
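As a sketch of how the triple form maps onto these declarations, the Python function below folds a list of (SchemeURN, LevelCode, ItemValue) triples that share one scheme into a single `<classification>` element, using the standard-library `xml.etree.ElementTree`. The URN `urn:example:biology`, the level codes `F`, `G`, and `S`, and the function name are hypothetical; the primate classification uses only the Family, Genus, and Species levels, as described above.

```python
import xml.etree.ElementTree as ET

def to_classification_xml(triples, scheme_name=None):
    """Fold (SchemeURN, LevelCode, ItemValue) triples sharing one SchemeURN
    into a <classification> element with one <levelValuePair> per triple."""
    urns = {urn for urn, _, _ in triples}
    if len(urns) != 1:
        raise ValueError("all triples must use the same SchemeURN")
    root = ET.Element("classification", schemeURN=urns.pop())
    if scheme_name is not None:
        root.set("schemeName", scheme_name)  # optional, as in the ATTLIST
    for _, level_code, item_value in triples:
        ET.SubElement(root, "levelValuePair", levelCode=level_code, itemValue=item_value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical scheme URN and level codes; a primate needs only three levels.
triples = [
    ("urn:example:biology", "F", "Hominidae"),
    ("urn:example:biology", "G", "Homo"),
    ("urn:example:biology", "S", "sapiens"),
]
xml_text = to_classification_xml(triples, "Living Things")
```

The SchemeURN appears once as an attribute instead of once per triple, which is the repetition the XML form is meant to remove.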

Multi-level coded classification scheme

A multi-level coded classification scheme uses a string of codes to represent a path down the classification scheme hierarchy. Each node has a code that is unique among the children of its parent, so in an N-level hierarchy a sequence of node codes from Level 1 to Level N uniquely determines a path through the tree. As above, each code is considered to be an ItemValue that represents an ItemName, and, as in the multi-level naming classification scheme, item names do not need to be unique. However, the codes are chosen so that the path from the root to a given node can be conveniently represented as a short string of codes.

In a coded classification scheme, the ItemValue of a node is defined to be the sequence of item codes from the root to that node. A classification using this scheme then need only supply a value for a single node: the code of every item on the path can be inferred from the sequence, which in turn identifies the name of each item on the path.

An example of a coded classification scheme is one for newspaper articles that uses a 3-level scheme with 2 digits to identify Level-1 and three digits each to identify Level-2 and Level-3. The coded value "15052003" thus represents a named classification path as Sport (15), followed by Ski Jumping (052), followed by K180 Flying Jump (003).

In coded classifications, if the structure of the code path is known (here, 2:3:3 digits for the three levels), then the level name is unimportant. An application receiving the itemValue "15052003" would know to break it into the three codes 15, 052, and 003 and to retrieve the three item names for the item values "15", "15052", and "15052003". If the code structure is not known, it could be made explicit by using a separator between the codes in the path, e.g., "15:052:003", or one could begin with the leaf node and successively find each parent node until reaching the root. In any classification scheme, we will always assume that if any node of the scheme is known, then the sequence of its parent nodes, and their associated level names, can always be determined.
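The first two strategies can be sketched in Python as follows, assuming a lookup table keyed by full-path item values. The table `NEWS_SCHEME` (holding only the three entries from the newspaper example) and the helper names are hypothetical. The value is split on a separator if one is present, and otherwise by the known 2:3:3 digit widths; each ancestor's item value is then rebuilt as a running prefix of the codes.

```python
from itertools import accumulate

# Hypothetical lookup table for the newspaper example: full-path item value -> item name.
NEWS_SCHEME = {
    "15": "Sport",
    "15052": "Ski Jumping",
    "15052003": "K180 Flying Jump",
}

def split_codes(item_value: str, widths=(2, 3, 3), sep: str = ":") -> list[str]:
    """Split a coded item value into per-level codes, using an explicit
    separator if present, otherwise the known digit widths."""
    if sep in item_value:
        return item_value.split(sep)
    ends = list(accumulate(widths))  # (2, 3, 3) -> [2, 5, 8]
    if len(item_value) not in ends:
        raise ValueError(f"{item_value!r} does not fit the {widths} code structure")
    starts = [0] + ends[:-1]
    return [item_value[s:e] for s, e in zip(starts, ends) if e <= len(item_value)]

def path_names(item_value: str, scheme=NEWS_SCHEME, **kwargs) -> list[str]:
    """Rebuild each ancestor's item value as a prefix of the codes and look up its name."""
    prefixes = accumulate(split_codes(item_value, **kwargs))  # "15", "15052", "15052003"
    return [scheme[p] for p in prefixes]
```

For example, `path_names("15052003")` and `path_names("15:052:003")` both recover the path Sport, Ski Jumping, K180 Flying Jump.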

In coded classifications where the code structure is not known, it is convenient to have a default code for each level, e.g. Level1, Level2, etc., so that the person or application receiving the coded item value knows how many itemNames to look for.
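When the code structure is unknown, the leaf-to-root strategy can be combined with default level codes. The Python sketch below assumes the scheme exposes a parent map from each item value to its parent's item value; `PARENT` and `ancestry` are hypothetical names. It walks up from the leaf, then labels the nodes Level1, Level2, and so on from the root down, so the receiver knows how many item names to look up.

```python
# Hypothetical parent map for the newspaper example: item value -> parent item value.
PARENT = {"15": None, "15052": "15", "15052003": "15052"}

def ancestry(leaf: str, parent: dict = PARENT) -> list[tuple[str, str]]:
    """Walk from a leaf up to the root, then pair each node, root first,
    with a default level code (Level1, Level2, ...)."""
    path, seen = [], set()
    node = leaf
    while node is not None:
        if node in seen:
            raise ValueError(f"cycle detected at {node!r}")
        seen.add(node)
        path.append(node)
        node = parent[node]  # KeyError if the node is not part of the scheme
    path.reverse()  # root first
    return [(f"Level{i}", value) for i, value in enumerate(path, start=1)]
```

Calling `ancestry("15052003")` yields three (level code, item value) pairs, one per level.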

Subset classification scheme

A subset classification scheme presents a list of options to pick from, and then classifies a member of a population based on the specific subset of those options assigned to that member. For example, a classification scheme may list 5 hardware/software options as prerequisites for being able to use an application program. Each potential user is classified by the specific subset of the prerequisites they satisfy, anywhere from none to all 5. Since there are 2**5 = 32 possible subsets of the 5 options, the 5 options yield a collection of 32 different classification items.
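A short Python sketch of the prerequisite example, representing each classification item as a `frozenset` of satisfied options. The five option names are hypothetical, since the text does not name them, and so are the helper names. Enumerating subsets of every size from 0 to 5 produces the 32 classification items.

```python
from itertools import combinations

# Hypothetical prerequisites; the source does not name the five options.
OPTIONS = ("os_version", "memory", "display", "network", "printer")

def all_subsets(options) -> list[frozenset]:
    """Every classification item: each subset of the options, from empty to all."""
    return [frozenset(c) for r in range(len(options) + 1) for c in combinations(options, r)]

def classify(satisfied, options=OPTIONS) -> frozenset:
    """Classify a user by exactly the subset of prerequisites they satisfy."""
    unknown = set(satisfied) - set(options)
    if unknown:
        raise ValueError(f"not options of this scheme: {sorted(unknown)}")
    return frozenset(satisfied)

subsets = all_subsets(OPTIONS)
```

A `frozenset` makes the order in which prerequisites are listed irrelevant, so the same subset always maps to the same classification item.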

The subset classification scheme generalizes to any previously defined classification scheme provided that it is possible to assign multiple classification item values to any member of the population being classified. If the initial classification scheme has N possible classification items, then the derived subset classification scheme has 2**N possible classification values. We support subset classification schemes in our information model by allowing a classification to consist of multiple item values at each level of the classification scheme hierarchy.

As a more complex example, consider a classification scheme that specifies Age Groups of people. A software product is then classified by the different age groups that might be able to use the product. Suppose the initial Age Group classification scheme is as follows:

            Adults (A)
                        Older Adults (AO)
                                    Advanced Age 85+ (AOA)
                                    Post Retired 70-85 (AOP)
                                    Retired 55-75 (AOR)
                                    Near Retirement 50-65  (AON)
                        Middle Adults (AM)
                        Young Adults (AY)
                                    Post 25 (AYO)
                                    Pre 25 (AYY)
            Teens (T)
                        Upper Teens (TU)
                        Lower Teens (TL)
            Children (C)
                        Schoolage Children (CS)
                        Preschool Children (CP)
                                    Upper Preschool Ages (CPU)
                                    Lower Preschool Ages (CPL)
                        Infant Children (CI)
                                    6-24 months (CIU)
                                    < 6 months (CIL)

This Age Group classification scheme consists of 21 items. A variety of subset classification schemes could then be derived from it by allowing a choice of one or more of these items. An arbitrary subset would yield 2**21 different classifications, whereas a scheme allowing from one to three choices would yield 21 + 210 + 1330 = 1561 different classifications (the number of ways to choose one, two, or three of the 21 items). For example, if up to three choices are allowed, the set {AYY, T, CS} might be used to classify software that appeals to young people from approximately 6 to 25 years of age.
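The counts above can be checked with a few lines of Python using `math.comb` (available since Python 3.8). The list simply transcribes the 21 item values from the Age Group tree; the variable names are illustrative.

```python
from math import comb

# The 21 item values of the Age Group scheme above.
AGE_GROUP_CODES = [
    "A", "AO", "AOA", "AOP", "AOR", "AON", "AM", "AY", "AYO", "AYY",
    "T", "TU", "TL",
    "C", "CS", "CP", "CPU", "CPL", "CI", "CIU", "CIL",
]
N_ITEMS = len(AGE_GROUP_CODES)

# Any subset of the items; this count includes the empty subset.
arbitrary = 2 ** N_ITEMS
# Subsets of size 1, 2, or 3: C(21,1) + C(21,2) + C(21,3).
one_to_three = sum(comb(N_ITEMS, k) for k in range(1, 4))

print(N_ITEMS, arbitrary, one_to_three)  # 21 2097152 1561
```

Excluding the empty subset, which classifies nothing, would reduce the arbitrary-subset count by one.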

Using the XML element definitions above, the classification {AYY, T, CS} could be transmitted as follows:


     <classification
          schemeURN="oasis:example:AgeGroup3"
          schemeName="Age Group - One To Three Items">
          <levelValuePair levelCode="3" itemValue="AYY" itemName="Young Adults Pre 25"/>
          <levelValuePair levelCode="1" itemValue="T"   itemName="Teens"/>
          <levelValuePair levelCode="2" itemValue="CS"  itemName="Schoolage Children"/>
     </classification>
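On the receiving side, an application might parse such a message and check that it is a valid member of the one-to-three-item subset scheme. The Python sketch below uses the standard-library `xml.etree.ElementTree`; the function name `read_subset` is hypothetical. It also assumes, as happens to hold in this example, that each levelCode equals the length of its item value; that rule is an inference from the codes, not something the scheme states.

```python
import xml.etree.ElementTree as ET

# The instance document above, with the classification start tag closed.
DOC = """<classification
    schemeURN="oasis:example:AgeGroup3"
    schemeName="Age Group - One To Three Items">
  <levelValuePair levelCode="3" itemValue="AYY" itemName="Young Adults Pre 25"/>
  <levelValuePair levelCode="1" itemValue="T" itemName="Teens"/>
  <levelValuePair levelCode="2" itemValue="CS" itemName="Schoolage Children"/>
</classification>"""

AGE_GROUP_CODES = frozenset({
    "A", "AO", "AOA", "AOP", "AOR", "AON", "AM", "AY", "AYO", "AYY",
    "T", "TU", "TL", "C", "CS", "CP", "CPU", "CPL", "CI", "CIU", "CIL",
})

def read_subset(xml_text, allowed=AGE_GROUP_CODES, max_items=3):
    """Parse a <classification> element and check that it is a valid 1-to-3 item subset."""
    root = ET.fromstring(xml_text)
    pairs = root.findall("levelValuePair")
    values = frozenset(p.get("itemValue") for p in pairs)
    if not 1 <= len(values) <= max_items:
        raise ValueError(f"expected 1 to {max_items} items, got {len(values)}")
    if not values <= allowed:
        raise ValueError(f"unknown item values: {sorted(values - allowed)}")
    # Assumption: in this example the level number equals the length of the code.
    for p in pairs:
        if p.get("levelCode") != str(len(p.get("itemValue"))):
            raise ValueError(f"level mismatch for {p.get('itemValue')!r}")
    return root.get("schemeURN"), values

urn, values = read_subset(DOC)
```

Returning the item values as a `frozenset` reflects that the order of the levelValuePair elements carries no meaning in a subset classification.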