eAI Journal - Is CWMI the Holy Grail of Meta Data Standards?

[From: http://www.eaijournal.com/DataIntegration/HolyGrail.asp; use this canonical version if possible.]

Is CWMI the Holy Grail of Meta Data Standards?
By Rich Seeley

The Object Management Group (OMG), headquartered in Needham, Mass., is close to adopting the Common Warehouse Metadata Interchange (CWMI) standard, which is touted as the Holy Grail sought by developers working on integration, data warehouse and e-business applications. OMG calls the specifications it has developed for the CWMI standard “a landmark submission that follows the creation of the Extensible Markup Language (XML) by the World Wide Web Consortium (W3C) and XML Metadata Interchange (XMI) by the OMG.”

Backed by industry leaders including Oracle, IBM, Unisys, NCR, and Sun Microsystems, CWMI is viewed as the next step toward establishing a standard for metadata interchange among all data warehousing, business intelligence, knowledge management and portal technologies. The proposed standard provides an object model with a set of Application Programming Interfaces (APIs), interchange formats and services for the wide range of metadata involved in the extraction, transformation, transportation, loading, integration and analysis phases of data warehouse projects. OMG also advocates CWMI as the standard that “resolves potential integration issues by enabling users to extend the model to meet their specific needs.” The proposed standard is viewed as a boon to developers of Web-based projects.

"CWMI lowers implementation costs for our customers by extending metadata interoperability into the world of Web-based data warehousing environments," said Sridhar Iyengar, Unisys Fellow, member of the OMG Architecture Board and architect of Unisys Universal Repository (UREP). eAI Journal recently spoke to Iyengar, who has been cast by OMG as the King Arthur in this quest for the Holy Grail of metadata. Iyengar discussed the potential for CWMI and how it will interact with XML, XMI, the Meta Object Facility (MOF), Common Object Request Broker Architecture (CORBA), JavaBeans and other existing standards.

eAIJ: What is driving this quest for the Holy Grail in metadata standards?

Iyengar: The biggest issue people are mentioning is, “How do you manage enterprise data?” You can’t manage enterprise data without effective data and metadata standards. XMI, MOF and CWM are the crux of what is needed for a distributed or federated database environment, because not all the data is going to go into one Oracle database or DB2 database or SQL Server database. It’s going to be in files, Web servers, XML documents and databases. Without having a distributed data and metadata architecture, you are not going to be able to manage it.

eAIJ: Can you give us a brief history of the OMG metadata standards?

Iyengar: The roots of CWMI go back to 1997 when OMG standardized the Unified Modeling Language (UML). Along with UML, the OMG also standardized the MOF. So basically, MOF is the core distributed metadata management architecture for OMG. UML was the first MOF-compliant information model (we call them meta models in OMG). Development tools and application server building tools for sharing software components have used that. CWM was the second major meta model that the OMG began working on, to address end-to-end management for databases, data warehouses, data marts and, to some degree, the emerging world of enterprise information portals.

eAIJ: Who has been working on the CWMI proposal, and how long has it taken to develop it?

Iyengar: The Request For Proposal (RFP) was issued in September 1998. The full design took approximately a year and two months. The CWM team, which now comprises IBM, Unisys, Oracle, NCR, Union Bank of Switzerland, Sun Microsystems and a few other companies, has been meeting for approximately five to seven days every month for the past year. Top industry experts in metadata management, databases and data warehouses designed the specifications. That’s why this standard is so solid.

eAIJ: Are you now close to the adoption and publication of CWMI?

Iyengar: For all practical purposes, the technical work on the core model is complete. At a February OMG meeting in Denver, both the Architecture Board and the Analysis and Design Task Force unanimously voted to adopt the CWM specification. Since then, it has been going through the final stages, where all the members vote. In June, in Oslo, the OMG board of directors will take it up. I fully expect that CWM will be an adopted specification in June.

eAIJ: How soon can we expect to see implementations of CWMI?

Iyengar: You can expect implementations of this to start coming out in the next few months. CWM will be used initially for managing and capturing all the data in databases, data warehouses and portals. OMG has chosen to work in the data warehousing area because that’s a pretty big market. The business intelligence and data warehousing market is expected to be somewhere between $80 billion and $100 billion over the next few years.

eAIJ: What about the business-to-business (B2B) arena?

Iyengar: With the growth of B2B, there needs to be integration between application servers, which serve up the data, and the back-end databases for the whole thing to be unified. The component part is already available using specifications such as Enterprise JavaBeans and the CORBA Component Model. And the data part that B2B integration requires will be available using CWM.

eAIJ: In the B2B arena, will everyone you connect to have to have accepted the CWMI standard for it to work?

Iyengar: Not necessarily. Basically, one of the reasons that the industry is rushing to produce all of these XML Document Type Definitions (DTDs) in various domains is so companies such as IBM, Oracle, Unisys and NCR, and also end-user companies such as Union Bank of Switzerland, can have common enterprise metadata. Now, there will be cases where vendors will say, “I want to do it my own way.” In that case, there’s a model in CWM called a transformation model, which can be used to automatically transform proprietary data formats into CWM and vice versa.
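
The round trip Iyengar describes can be pictured with a small Python sketch. It only illustrates the idea of mapping a proprietary format to a shared one and back; it is not the CWM transformation model itself. The field names (`TBL_NM`, `tableName` and so on) and the `transform` helper are hypothetical.

```python
# Hypothetical mapping from one vendor's field names to shared, CWM-style names.
VENDOR_TO_COMMON = {
    "TBL_NM": "tableName",
    "COL_NM": "columnName",
    "COL_TYP": "dataType",
}
# The reverse direction is derived from the same table, so the two stay in sync.
COMMON_TO_VENDOR = {common: vendor for vendor, common in VENDOR_TO_COMMON.items()}


def transform(record: dict, mapping: dict) -> dict:
    """Rename every key in record according to mapping; an unmapped key raises KeyError."""
    return {mapping[key]: value for key, value in record.items()}


vendor_record = {"TBL_NM": "CUSTOMER", "COL_NM": "CUST_ID", "COL_TYP": "INTEGER"}
common_record = transform(vendor_record, VENDOR_TO_COMMON)   # proprietary -> common
round_trip = transform(common_record, COMMON_TO_VENDOR)      # common -> proprietary
```

Because both directions come from one declarative mapping, a vendor that wants to "do it its own way" would only need to maintain that mapping, not two separate converters.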

eAIJ: Can you explain the architecture for B2B projects?

Iyengar: In projects for B2B, you will see in the common warehouse models, information for different types of data resources and systems management. So the core architecture for CWMI builds on MOF, which is used to define the CWM model itself. The design of the CWM model uses the UML.

eAIJ: How are existing standards, such as UML, then employed with development tools?

Iyengar: We used UML to design a common warehouse meta model and to precisely define it such that we can automate the interchange between databases. Between data warehouses using XML, we use XMI. So the relationship between UML, MOF and XMI is cooperative. Basically, you start with UML and you describe the objects, the relationships, the collaborations and the whole design aspect of your model. Then, you register this design in a MOF-compliant repository or tool. An example of a MOF-compliant tool is Rational Rose with the plug-ins Unisys has developed. An example of a MOF-compliant repository is the Unisys Universal Repository.

eAIJ: What are the basic components of CWMI?

Iyengar: CWM is broken into two major parts. One is called the core, which addresses the standard data that’s in relational, network, hierarchical or XML-based data sources. Then there are extensions for Information Management System (IMS), Virtual Storage Access Method (VSAM), COBOL, and DMS2. These show vendors how to take the information that’s already there in CWM and customize it for their specific database, file system or XML environment if they have any proprietary extension. Fundamentally, CWM lets you use standard metadata for enterprise databases and data warehouses. It also provides a framework for extending it to your native proprietary system. This completely conforms to OMG’s MOF architecture, which addresses how data and metadata can be made available in a distributed environment programmatically using Java, CORBA, Component Object Model (COM) or XML.

eAIJ: Does an example of how that will work come to mind?

Iyengar: An example of what you could do with CWM is automatically and dynamically share database schemas, data warehouse plans, transformation rules and the business process that governs how data moves from your operational environment into the data warehouse. All of this metadata is captured in the CWM meta model. And just like UML, this is an abstraction of all the vocabularies you need for data warehousing in UML terminology. The benefit of that is that UML is significantly richer in semantics and relationships than CORBA Interface Definition Language (IDL), XML DTD, or the COM IDL. It’s just a much higher level of abstraction.

eAIJ: So this is a higher level than XML?

Iyengar: Absolutely. XML does not understand the basic principles of inheritance, polymorphism, collaboration and different types of relationships. XML understands containment, which means you can embed documents and links and you can have access across content and links, but you cannot have different types of relationships. You can go beyond the basic XML, to use things such as XLink and XPointer, if you want to better define relationships. All of these are fundamental concepts in both the MOF and UML model. It’s much richer, but because it’s richer, it needs to be translated into something highly concrete, such as Java or XML DTDs, so programmers can deal with it.

eAIJ: Is that where XMI comes in?

Iyengar: What XMI does is automatically take these UML models and generate XML DTDs from them. So XMI — and many people don’t realize this — is actually an OMG standard that you can use to automate the process of going from a business model to an XML DTD, using the XMI specifications. The other part of the MOF, which is a close relative of XMI, is that from the same abstract UML model, we can generate CORBA IDL, C++, COM interfaces, Java interfaces, etc. So you can programmatically, in a distributed environment, use Java or COM to get at the metadata. Or, you could use XML and Hypertext Transfer Protocol (HTTP) to go after the metadata — depending on the type of access you need.
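
To make the model-to-DTD step concrete, here is a minimal Python sketch. It is a toy mapping written for illustration, not the actual XMI DTD production rules, which are considerably richer. The `ModelClass` type, the `to_dtd` function and the `Table`/`Column` model are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelClass:
    """A hypothetical, heavily simplified stand-in for a class in a UML model."""
    name: str
    attributes: List[str] = field(default_factory=list)  # simple text-valued attributes
    contains: List[str] = field(default_factory=list)    # names of contained classes


def to_dtd(classes: List[ModelClass]) -> str:
    """Emit one <!ELEMENT> declaration per class and one per distinct attribute."""
    lines = []
    declared = set()  # attribute elements already declared, to avoid duplicates
    for cls in classes:
        # Attributes appear once each; contained classes may repeat (zero or more).
        children = cls.attributes + [f"{name}*" for name in cls.contains]
        content = "(" + ", ".join(children) + ")" if children else "EMPTY"
        lines.append(f"<!ELEMENT {cls.name} {content}>")
        for attr in cls.attributes:
            if attr not in declared:
                lines.append(f"<!ELEMENT {attr} (#PCDATA)>")
                declared.add(attr)
    return "\n".join(lines)


model = [
    ModelClass("Table", attributes=["tableName"], contains=["Column"]),
    ModelClass("Column", attributes=["columnName", "dataType"]),
]
dtd_text = to_dtd(model)
print(dtd_text)
```

For this two-class model the sketch prints five declarations, starting with `<!ELEMENT Table (tableName, Column*)>`. The point is only that the DTD is derived mechanically from the model rather than written by hand.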

eAIJ: So developers are not locked into CORBA?

Iyengar: The OMG standards don’t force you to use only CORBA. You can use CORBA if it makes sense for getting secure transactional distributed metadata. If you just want to go and use the Web protocols, HTTP or XML, you can do that, too. The reason is that the same metadata is automatically rendered as XML using the XMI standard. So, basically, our master is the abstract model. So, for people who are designing data warehouse architecture, component architecture or XML-based architecture, if you start with UML, you can go to any of these middleware platforms. And your design is at a much higher level of abstraction.
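
The idea that one abstract model serves both programmatic and document-style access can be sketched in Python as follows. This is an assumption-laden illustration: the `Table` and `Column` classes and the XML element names are invented for the example and do not follow the XMI format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str
    data_type: str


@dataclass
class Table:
    name: str
    columns: List[Column]


def to_xml(table: Table) -> str:
    """Render table metadata as an XML string (element names are illustrative, not XMI)."""
    root = ET.Element("Table", {"name": table.name})
    for column in table.columns:
        ET.SubElement(root, "Column", {"name": column.name, "dataType": column.data_type})
    return ET.tostring(root, encoding="unicode")


customer = Table("CUSTOMER", [Column("CUST_ID", "INTEGER"), Column("NAME", "VARCHAR")])

# Programmatic access: walk the objects directly.
column_names = [column.name for column in customer.columns]

# Document access: the same metadata as XML text, suitable for sending over HTTP.
xml_text = to_xml(customer)
```

Both views are produced from the same `customer` object, so neither can drift from the other; that is the sense in which the abstract model is the master.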

eAIJ: Why is that higher level of abstraction important?

Iyengar: If you think about complex data and complex content, you realize that you need the relationships, the inheritance, the semantics and the business rules. All these can be represented using UML, which is why we made UML the baseline for defining all the other information models in OMG, including the CWM. In UML, we talk about use cases, collaborations, components, classes, and interfaces. In CWM, we talk about databases, tables, dimensions, queues, indexes, the business nomenclature, data warehouse plans and the warehouse process. So, we use UML to define the vocabulary, and whenever possible, we use the base definitions in UML.

eAIJ: So, how will programmers begin to use these standards?

Iyengar: For example, some people will start with a UML model and create a Java class, which they will programmatically use with an Enterprise JavaBeans (EJB) application server such as WebLogic or WebSphere. Others would start with the same model and define an MTS component in the Microsoft environment. Still others, especially those looking at B2B integration between two companies on the Internet, could use XML with HTTP, CORBA, or MQSeries to exchange B2B document information. The merit of using UML and CWM for capturing your database definitions and your component definitions is that you can now publish the content that is being stored in your back-end databases as XML documents.