
Recombination of igML guidelines with base directories

igML guidelines are produced on the assumption that the base standard is available to fill in the missing parts of the guideline. igML is intended to be the standard import and export format for our products, and there is no need for us to carry along information that can be inferred from the base standards (X12, EDIFACT). This keeps the igML guideline export file to a manageable size; carrying along all the redundant information would cause the files to balloon to an unusable size. We expect an igML export file to run around 20 to 50 KB, depending on how many transaction sets or messages the guideline contains. By contrast, a full standard in igML format, such as the complete ASC X12 004010 standard, takes a gargantuan 6 megabytes!

Some vendors may choose to "fill in all the blanks" in an igML rendition of a guideline. Nothing in the DTD definition of igML prevents this. FORESIGHT has chosen not to, for the practical reasons stated. Most of the time, a guideline author changes very little in an EDI message or transaction set when building a guideline, apart from marking a few segments and groups as unused or mandatory and doing likewise with elements in segments. It is also not uncommon to mark many codes in an element context as "unused." That, plus adding a few notes of one's own, is what goes into 90% of all guidelines produced. With so few changes, many would feel it unreasonable for an exported, portable representation of that guideline to consume 700 KB, a megabyte or even more.

The igML guideline as produced by EDISIM will contain the minimum amount of information needed to reconstruct the guideline, assuming the presence of the necessary base standard at the target machine. This is also the approach taken by IMPDEF, in that guidelines are represented as a "delta" from a base standard. Take a tiny igML guideline from your trading partner, add water (i.e., the base standard), and voilà, you can resurrect the complete implementation guideline looking exactly the same as it did on the partner's machine! We feel our job is not to ferry all the information in the X12 standards just because two trading partners want to exchange EDI implementation guidelines.

So why isn't there any information about the ST segment in the segment dictionary, <SegmentDictionary>, in the sample guideline APP810.xml? Because no changes were made to it in the guideline and therefore, if we need to know anything about the ST, we can always go look for that information in the base standard, X12-3030.

The free sample C++ program that FORESIGHT is putting together is meant to illustrate the process. It will rely on having igML renditions of the full standards available for filling in the blanks within igML guidelines. (EDISIM itself doesn't need igML versions of the standards, since it already has the standards in its own database.) FORESIGHT can supply the igML versions of the EDIFACT standard directories; DISA would supply the igML renditions of the X12 standards (in addition to DISA table data and SEF). The IGML3030.xml on the igMLDEV Web page is an example of what DISA might be expected to distribute.

The inference process is not exactly trivial:

An igML document containing an implementation guideline may contain any number of transaction sets or messages, though most will contain only one or two. It will have a reference to a base standard (in the DerivedFrom attribute of the <Standard> element) from which all missing information can be inferred. We're making the assumption that any igML-capable software already has all of the ASC X12 and EDIFACT standards built into it and can find every piece of information that needs to be inferred for the guideline. The free sample code, by contrast, is given the full base standard as an igML document.

Each transaction set in the guideline is completely laid out in terms of <Table>, <SegmentRef> and <Group> elements. Unlike IMPDEF, we don't rely on position numbers to match up segment references in the guideline message with the message in the standard, since the position numbers might very well change (IMPDEF makes no provision for this possibility). But each <SegmentRef> carries only enough information to identify the segment reference: if you want to know the name of the segment, you have to go to the corresponding <DictSegment> in the segment dictionary (<SegmentDictionary>). A <Text> element belonging to the <SegmentRef> is provided only if the name was overridden for that particular segment location in the guideline. And if the guideline's <SegmentDictionary> has no <DictSegment> entry for the segment, you have to look at the segment dictionary in the base standard!
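The name lookup just described can be sketched in a few lines of C++. This is only an illustration, not the FORESIGHT sample program: the structs, the resolveSegmentName function and the idea of keying the dictionary by segment ID are all assumptions made for the sketch, standing in for whatever an XML parser would hand back.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical in-memory stand-ins for parsed igML data.
struct DictSegment {
    std::string id;                    // segment ID, e.g. "ST"
    std::optional<std::string> name;   // absent when the entry states no name
};

// Models a <SegmentDictionary>, keyed by segment ID (an assumption).
using SegmentDictionary = std::map<std::string, DictSegment>;

struct SegmentRef {
    std::string id;                           // which segment is referenced
    std::optional<std::string> textOverride;  // <Text> on the <SegmentRef>, if any
};

// Name stated by one dictionary; empty if there is no entry or no name.
std::optional<std::string> nameIn(const SegmentDictionary& dict,
                                  const std::string& id) {
    auto it = dict.find(id);
    if (it == dict.end()) return std::nullopt;
    return it->second.name;
}

// Three-step fallback: local override on the reference, then the
// guideline's segment dictionary, then the base standard's.
std::optional<std::string> resolveSegmentName(const SegmentRef& ref,
                                              const SegmentDictionary& guideline,
                                              const SegmentDictionary& base) {
    if (ref.textOverride) return ref.textOverride;
    if (auto n = nameIn(guideline, ref.id)) return n;
    return nameIn(base, ref.id);
}
```

For a <SegmentRef> to the ST segment in a guideline that never touches ST, the first two steps find nothing and the name comes from the base standard, which matches the APP810.xml situation described earlier.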

It goes on further like this. Even if there is a segment dictionary entry in the guideline, it may not list the elements that belong to that segment. That is, the <DictSegment> may be in the guideline simply to show that the name of the segment was overridden, while none of the constituent elements were changed! You'd have to rely on the base standard to give you the list of elements in <DataElementRef> entries. Only if some of the elements had changed (e.g., been marked as "unused" or had code subsets provided) would <DataElementRef> elements be provided in that particular context in the guideline message.
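A rough C++ sketch of that element-list step follows. Everything in it is assumed for illustration: the struct layouts, the resolveElements name, the free-text usage field, and in particular the choice to match context overrides to elements by their position within the segment, which the text above does not specify.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a <DataElementRef>.
struct DataElementRef {
    int position;            // 1-based position within the segment
    std::string elementId;   // data element ID
    std::string usage;       // e.g. "unused"; empty means "not stated"
};

// Hypothetical model of a <DictSegment>.
struct DictSegment {
    std::string name;
    std::vector<DataElementRef> elements;  // may be empty in a guideline
};

using SegmentDictionary = std::map<std::string, DictSegment>;

// Build the full element list for one segment location in a message.
std::vector<DataElementRef> resolveElements(
        const std::string& segmentId,
        const std::vector<DataElementRef>& contextOverrides,
        const SegmentDictionary& guideline,
        const SegmentDictionary& base) {
    std::vector<DataElementRef> result;

    // Use the guideline's list only if it actually lists elements;
    // an entry that merely renames the segment falls through to the base.
    auto g = guideline.find(segmentId);
    if (g != guideline.end() && !g->second.elements.empty()) {
        result = g->second.elements;
    } else {
        auto b = base.find(segmentId);
        if (b != base.end()) result = b->second.elements;
    }

    // Apply the overrides given at this segment location, matched by
    // position (an assumption of this sketch).
    for (const auto& o : contextOverrides) {
        for (auto& e : result) {
            if (e.position == o.position && !o.usage.empty()) {
                e.usage = o.usage;
            }
        }
    }
    return result;
}
```

The point of the design is that an empty element list in the guideline means "unchanged", not "no elements", so the base standard always supplies the skeleton and the guideline only paints over it.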

In summary, to find out information about a segment referred to by a message, you have to go to the segment dictionary in the guideline. If you still don't have enough information, then you have to go to the segment dictionary in the base standard. The same goes for elements: once you know that a segment uses particular elements, you have to look in the guideline for overrides in the <ElementDictionary>. If that doesn't provide all the information about that element, a side trip to the base standard's <ElementDictionary> is necessary. igML very much depends on a dictionary-driven process; in this it's not much different from IMPDEF and DIRDEF, the DISA table data, or even the Washington Publishing table data for ICs (as used for VICS, UCS and HIPAA).
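Because a guideline entry may supply only part of what is known about an element, the element lookup amounts to a field-by-field merge. The C++ sketch below shows one way to express that. The DictElement fields (name, type, minimum and maximum length) and the function names are assumptions chosen for illustration; the actual igML attributes may differ.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical model of one <ElementDictionary> entry. Every field is
// optional because a guideline may state only what it overrides.
struct DictElement {
    std::optional<std::string> name;
    std::optional<std::string> type;
    std::optional<int> minLength;
    std::optional<int> maxLength;
};

using ElementDictionary = std::map<std::string, DictElement>;

// Take the guideline's value when present, otherwise the base standard's.
template <typename T>
std::optional<T> prefer(const std::optional<T>& guideline,
                        const std::optional<T>& base) {
    return guideline ? guideline : base;
}

// Merge the guideline entry (if any) over the base entry (if any).
std::optional<DictElement> resolveElement(const std::string& id,
                                          const ElementDictionary& guideline,
                                          const ElementDictionary& base) {
    auto g = guideline.find(id);
    auto b = base.find(id);
    if (g == guideline.end() && b == base.end()) return std::nullopt;
    if (g == guideline.end()) return b->second;   // nothing overridden
    if (b == base.end()) return g->second;        // not in the base at all

    DictElement merged;
    merged.name      = prefer(g->second.name,      b->second.name);
    merged.type      = prefer(g->second.type,      b->second.type);
    merged.minLength = prefer(g->second.minLength, b->second.minLength);
    merged.maxLength = prefer(g->second.maxLength, b->second.maxLength);
    return merged;
}
```

A guideline that only renames an element would therefore come back with the new name alongside the type and lengths from the base standard.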

Again, all this logic does have some rhyme and reason, and will be illustrated in the free sample C++ code. The recombinant logic (i.e., re-combining an igML guideline with the base standard information) is already present in our EDISIM system, but we have to pull it out in a form suitable for the demonstration programs using an igML rendition of the full base standard. The logic is complex, which is why we're providing the C++ source code to illustrate it.

Again, any vendor is free to slap everything they want in their igML documents, even stuff that can be easily derived from the base standard. But after they try that on a few guidelines and end up with half a megabyte, or even a megabyte, of XML per guideline, they'll probably give up.

Last Updated Friday, March 10, 2000