- Ref: DataReferenceModel_09_2004 (2K5C)
- Comment: These comments on Volume I should be considered supplementary to the comments submitted by Michael Daconta, which were based on the January 2004 version of the DRM. He has raised many excellent points that need no duplication, although these remarks refer to some of his comments. The edition evaluated here is dated September 2004. (2K5D)
- Issue: The Scope of Commonality (2K5E)
- The idea of sharing data is of course not new. There have been many efforts to share data within work groups, companies, government agencies, whole industries, and some efforts to share data globally. Progress in sharing data has been uneven, often following the pattern of two steps forward and one step backward. (2K5F)
- One of the issues that has hindered progress is the difficulty of obtaining broad agreement on data definitions. As a sharing effort expands from a small local work group to a wider scope, the human, organizational, and technical barriers to agreement mount. (2K5G)
- This is not to say that we should not undertake sharing efforts in the federal government. However, based on past experience, it would be wise to distinguish the most modest and hence most achievable objectives, while laying the groundwork for more ambitious objectives. (2K5H)
- A Basic Separation of Concerns (2K5I)
- It is necessary to distinguish between three forms of sharing in the data world: (2K5J)
- Having a common means for defining data elements (2K5K)
- Having common data element definitions (2K5L)
- Having common data, i.e. common instances of data elements (2K5M)
- For example, two healthcare organizations might conceivably share the same means for defining data elements, but that does not necessarily mean that they share specific data element definitions. On the other hand, suppose they do in fact share the definitions of some data elements that describe an immunization schedule; that does not necessarily mean that they share actual immunization schedules that realize the data element definitions. (2K5N)
- It is important to clearly differentiate among these three levels of commonality in order to separate concerns properly in a data architecture, and in order to allow organizations to develop an organized approach to gradually progressing from relatively easy to more difficult tasks. In any of its discussions, the DRM must make it crystal clear which of these concerns it is talking about. (2K5O)
- A Common Means for Defining Data Elements (2K5P)
- The ability to share the means for defining data elements, while the most modest of the three objectives, would be no small achievement. ISO/IEC 11179 provides a common means for defining the structure of a data element and thus supports this objective. A means to define the business context of a data element, as the DRM envisions, also supports this objective. (2K5Q)
- Standardization across the federal government on the means for defining data elements would make it easier for one governmental organization to examine and assess the data element definitions of another organization. Common tooling could be used to support the following functions: (2K5R)
- Defining data elements (2K5S)
- Storing data element definitions in repositories (2K5T)
- Registering data element definitions (2K5U)
- Querying repositories of data element definitions (2K5V)
- Interchanging data element definitions (2K5W)
- Securing access to data element definitions (2K5X)
- Thus, even in the absence of common use of the same data element definitions across organizations, having a common means for defining data elements is useful. (2K5Y)
- For example, consider our immunization data scenario. Two organizations might have different data elements defined for immunization records, but if they share the means of definition, they can readily review each other’s definitions. (2K5Z)
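- A minimal sketch of this point, assuming a heavily simplified means of definition: the `DataElementDefinition` class, its field names, the `find_definitions` helper, and the sample definitions are all hypothetical and are not taken from ISO/IEC 11179 or the DRM. It shows two organizations with different immunization data elements whose repositories can still be searched by one common tool.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a shared means of definition.
# The field names are illustrative, not taken from ISO/IEC 11179.
@dataclass(frozen=True)
class DataElementDefinition:
    owner: str        # organization that registered the definition
    name: str         # name of the data element
    datatype: str     # structural (syntactic) description
    description: str  # informal statement of meaning


def find_definitions(repository, keyword):
    """Return the definitions whose name or description mentions the keyword."""
    needle = keyword.lower()
    return [d for d in repository
            if needle in d.name.lower() or needle in d.description.lower()]


# Two organizations define immunization data differently ...
org_a = [DataElementDefinition("Org A", "ImmunizationDate", "date",
                               "Date on which the vaccine was administered")]
org_b = [DataElementDefinition("Org B", "ShotGivenOn", "string (YYYYMMDD)",
                               "Day the immunization was given to the patient")]

# ... yet one query tool works on both repositories, because both
# use the same means of definition.
matches = find_definitions(org_a + org_b, "immunization")
for d in matches:
    print(f"{d.owner}: {d.name} [{d.datatype}]")
```

The two definitions remain different; only the way they are expressed is shared, which is what makes common review tooling possible.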
- Common Data Element Definitions (2K60)
- Sharing data element definitions across organizations is a more ambitious undertaking. Having a common means of definition is a necessary but not sufficient condition for sharing data element definitions. Additional conditions that have to be satisfied in order to share data element definitions across organizations include: (2K61)
- Agreement on the business context of the data element (2K62)
- Agreement on the structure (syntax) of the data element (2K63)
- Agreement on the meaning (semantics) of the data element (2K64)
- Agreement on all of the above is necessary to share the definition of a data element. This is a considerably higher hurdle than that posed by simply sharing the means of definition. In our immunization example, agreeing on how to structure the immunization data elements and on the meaning of each of the data elements depends on organizational and interpersonal factors, not to mention the technical challenges posed by disparate ways in which different organizations maintain and use immunization records. (2K65)
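- The three conditions listed above can be sketched as a simple comparison. This is an illustration only: the class, its field names, the `disagreements` and `can_share` helpers, and the sample values are assumptions, and meaning is represented as a plain string for lack of a formal semantic grounding.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataElementDefinition:
    business_context: str  # where the element is used
    structure: str         # syntax
    meaning: str           # semantics, here only informal text


def disagreements(a, b):
    """Return the names of the aspects on which two definitions differ."""
    return [f.name for f in fields(a)
            if getattr(a, f.name) != getattr(b, f.name)]


def can_share(a, b):
    """A definition can be shared only if all three aspects agree."""
    return not disagreements(a, b)


org_a_def = DataElementDefinition(
    "Childhood immunization tracking", "date",
    "Date on which the vaccine was administered")
org_b_def = DataElementDefinition(
    "Childhood immunization tracking", "date",
    "Date on which the vaccine was scheduled")

# Same context and structure, different meaning: not shareable as-is.
print(disagreements(org_a_def, org_b_def))
print(can_share(org_a_def, org_b_def))
```

Comparing meaning as text is deliberately naive; it shows why agreement on structure alone does not establish a common definition.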
- Commonality of data definitions, when achieved, makes it possible to share applications based on those definitions across organizations, even when the organizations do not share the actual data that the applications manipulate. (2K66)
- Note that the current version of the DRM does not have provisions for the semantic grounding of data element definitions. The DRM relegates the specification of the meaning of data elements to informal descriptions in English. Semantic Web technologies have something to offer here, as discussed later in this paper. (2K67)
- Common Data (2K68)
- Sharing actual data among organizations—i.e. sharing instances of the defined data elements—is even more difficult. The sharing of data element definitions is a necessary but not sufficient precondition for the sharing of data. Additionally, the organizations must: (2K69)
- Satisfy security concerns including confidentiality, authentication, and authorization (2K6A)
- Resolve potential conflicts over administrative control of the data, i.e. determine who is responsible for the data’s integrity under what conditions (2K6B)
- Resolve technical problems of sharing data across firewalls (API-based access across firewalls is usually not possible, so architects usually have to rely on HTTP-based bulk transfers, which have performance issues when transferring large amounts of data) (2K6C)
- OMG MDA® Modeling Levels (2K6D)
- In Michael Daconta’s comment #43 he recommends tying the DRM to the modeling levels that the OMG’s Model Driven Architecture® (MDA) defines. This section briefly establishes that linkage. (2K6E)
- The OMG modeling levels, called metalevels (Footnote 3), are related to the levels of sharing as follows: (2K6F)
- Metalevel M2: The specification of a means for defining data elements is at level M2. A formal M2 specification—i.e. a formal specification of a means for definition—is called a metamodel. ISO 11179 is an M2 thing. (2K6G)
- Metalevel M1: A particular data element definition is at level M1. Thus, a definition of an immunization date data element—defined via the means specified at M2—is an M1 thing. (2K6H)
- Metalevel M0: A data record that contains a specific immunization date—which is an instance of the immunization date data element—is an M0 thing. (2K6I)
- Thus, we can use MDA shorthand to refer to “sharing at the M2 level,” which would mean settling on a common means of definition such as ISO 11179, or “sharing at the M1 level,” or “sharing at the M0 level” (Footnote 4). (2K6J)
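- The three metalevels can be made concrete with a small sketch. It is not an implementation of ISO 11179 or of MOF: the `DataElementDefinition` class is a hypothetical stand-in for an M2 means of definition, and the sample date is invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


# M2: the means of definition. This class states what any data element
# definition must contain. Hypothetical stand-in, not ISO 11179.
@dataclass(frozen=True)
class DataElementDefinition:
    name: str
    value_type: type
    description: str

    def validate(self, value):
        """Check that an M0 value conforms to this M1 definition."""
        return isinstance(value, self.value_type)


# M1: one particular definition, expressed using the M2 means.
immunization_date = DataElementDefinition(
    name="ImmunizationDate",
    value_type=date,
    description="Date on which the vaccine was administered",
)

# M0: an actual record holding an instance of the defined element.
# The date is invented for illustration.
record = {"ImmunizationDate": date(2004, 9, 15)}

print(immunization_date.validate(record["ImmunizationDate"]))
```

Sharing at M2 corresponds to agreeing on the class, sharing at M1 to agreeing on `immunization_date`, and sharing at M0 to exchanging `record` itself.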
- MDA also has an M3 metalevel, which is the language in which M2 things are specified. This language is called MOF (Meta Object Facility). There is an M3 because there is more than one kind of definition. A data definition is one kind of definition. A service definition is another kind of definition. There are yet more kinds of definitions, including business process definitions, definitions of software deployments, and more. A full treatment of MOF is beyond the scope of this document. (2K6K)
- Organizational Scope (2K6L)
- We have seen that we can categorize sharing according to metalevels. We also can categorize it according to the organizational scope of the sharing. (2K6M)
- As described earlier, we might intend to share only within a work group, or among work groups within one department (department in the corporate sense) or among departments within one division, or among divisions within a company, or among companies within an industry. We can also describe a continuum of organizational scope using governmental rather than corporate terms, such as agency, department, and so on. (2K6N)
- The topology of sharing has at least two dimensions—metalevel and organizational scope. (2K88)
- The Diagram on a Topology of Sharing illustrates these two dimensions combining to form a sharing topology. (2K85)
- Value Chain Scope (2K6R)
- A particular way of scoping a sharing effort organizationally is by value chain. Continuing our example scenario, suppose that two health care organizations that maintain immunization records have a legacy whereby they have no commonality at any metalevel, yet they need to exchange data in the context of one or more e-government value chains. (2K6S)
- The value chain scenario requires sharing at the M0 level, the most challenging metalevel for sharing, which of course requires sharing at the M1 level which, in turn, requires sharing at the M2 level. However, by restricting the organizational scope of sharing to a particular value chain, sharing at all three metalevels becomes more tractable. In practice this means that the organizations would not share common operational data stores, but would exchange M0 data in the context of the value chain. (2K6T)
- Thus, within the context of the value chain, the organizations must establish commonality at all three metalevels. However, value chain scope does not require commonality within the internal scope of the individual organizations that play the roles of partners in the value chains. When one of the organizations receives the M0 data via a link in the value chain, that organization’s systems transform the data into its internal form as the data passes out of the context of the value chain and into the internal context of the organization. The transformation may result in changes at the M0 level only, or at the M0 and M1 levels only, or at all three metalevels. (2K6U)
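- A minimal sketch of such a boundary transformation, assuming hypothetical field names and formats on both sides (`ImmunizationDate` and `PatientId` in the value chain, `shot_given_on` and `patient_id` internally). Here the internal definition differs from the shared one, so the change touches both the M1 and M0 levels.

```python
from datetime import date


def to_internal(shared_record):
    """Transform a value-chain record into the organization's internal form.

    The internal form uses different field names and a different
    representation of the date than the form agreed for the value chain.
    """
    given = date.fromisoformat(shared_record["ImmunizationDate"])
    return {
        "patient_id": shared_record["PatientId"],
        "shot_given_on": given.strftime("%Y%m%d"),
    }


# Record as exchanged within the value chain (illustrative values).
received = {"PatientId": "P-001", "ImmunizationDate": "2004-09-15"}

# Record as stored once it enters the organization's internal context.
internal = to_internal(received)
print(internal)
```

Only the value-chain side of this function needs to be agreed among the partners; the internal side stays private to each organization.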
- It is more tractable and achievable to agree on a means (M2) of defining data elements and on a common set of definitions (M1) for the scope of a value chain than for the entire scope of an organization. In this approach, organizations leverage the common means of definition to publish data element definitions that they use to support various value chains, without attempting to publish all of their internal data element definitions. Over time, to varying degrees, organizations may find utility in publishing internal definitions, but they would not be used directly by e-government value chains. (2K6V)
- Issue: Leveraging Semantic Web and MDA Technology (2K83)
- An impressive amount of effort is going into building out infrastructure that supports the Semantic Web, and infrastructure based on MDA’s MOF is spreading as well. Reasoners based on RDF and OWL, the two primary Semantic Web specifications, are proliferating and becoming more sophisticated. At the same time, MOF technology is deeply wired into Eclipse, an open-source integrated development environment (IDE) that is gaining enormous traction in tooling for enterprise systems development. (2K6X)
- Michael Daconta has recommended that the DRM leverage the Semantic Web, and has also indicated in his comments and in other public remarks that leveraging MDA is worthwhile as well. (2K6Y)
- Leveraging the Semantic Web (2K6Z)
- As indicated earlier, grounding the DRM semantically using Semantic Web technology would be the key to leveraging Semantic Web reasoners that can help detect inconsistency in definitions and aid architects in finding definitions that meet certain criteria. Space does not permit going into this matter in detail, but suffice it to say that generating an OWL-based definition of the M2 data element definition mechanisms that DRM proposes would be a critical step along this path. This would entail producing an OWL-based ontology of the ISO 11179 definition mechanisms, and probably of 11179’s registration mechanisms as well. (2K70)
- Leveraging MDA (2K71)
- The build-out of MOF-based infrastructure is evolving to support enterprise-class repositories that manage UML models, entity-relationship models, relational database models, OLAP star schema models, workflow and business collaboration models, software deployment models, and additional models that are part of a model-driven enterprise architecture. Until now, these different kinds of models have been isolated from each other in silos within the enterprise, but MDA’s MOF-based common approach to managing disparate kinds of models in an integrated fashion is beginning to knit these islands together. (2K72)
- There are several approaches to leveraging MDA’s MOF technology and thereby positioning the DRM to take advantage of the growing Eclipse ecosystem and other MOF tooling. One approach involves creating MOF metamodels of the M2 data element definition mechanisms that DRM proposes. MOF metamodels are very much like UML class models, and MOF architects use UML tools to create them. (2K73)
- For example, the three diagrams below are parts of a prototype MOF metamodel of ISO 11179 Part 5 developed and released into the public domain by Visa International's Visa Data Authority. (2K87)
- It is not a complete metamodel of 11179. The blue boxes represent classes for which constraints expressed in UML’s Object Constraint Language (OCL) have been written. For example, the Equivalence class in Diagram 3 has a formal constraint that reads: (2K86)
- Eclipse-based and other MOF-based tooling uses such M2 metamodels to generate XML schemas and Java APIs for representing M1 instances of the metamodel. In this case, the generated XML schema and Java APIs for ISO 11179-5 would be used to render specific data element definitions in the form of XML documents and Java objects, respectively. Such tooling also generates code that parses conforming XML documents into Java objects; that is, it converts XML-based representations of the definitions into Java objects, where the XML documents conform to the generated schema and the Java objects expose the generated APIs. It also generates code that serializes definitions from Java form into XML document form. All of this generation follows patterns that have been codified in the MOF standards, and the consistent patterns facilitate the integration of different kinds of models. (2K7B)
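- The round trip described above can be illustrated with a hand-written sketch. It is written in Python rather than Java for brevity, it is not output of any MOF tool, and the element and attribute names (`dataElement`, `datatype`, `description`) are hypothetical rather than those a generated schema would use.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Stand-in for a generated API class representing an M1 definition.
@dataclass(frozen=True)
class DataElementDefinition:
    name: str
    datatype: str
    description: str


def to_xml(definition):
    """Serialize a definition object into an XML document string."""
    root = ET.Element("dataElement", {"name": definition.name})
    ET.SubElement(root, "datatype").text = definition.datatype
    ET.SubElement(root, "description").text = definition.description
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse an XML document string back into a definition object."""
    root = ET.fromstring(text)
    return DataElementDefinition(
        name=root.get("name"),
        datatype=root.findtext("datatype"),
        description=root.findtext("description"),
    )


original = DataElementDefinition(
    "ImmunizationDate", "date",
    "Date on which the vaccine was administered")

document = to_xml(original)
restored = from_xml(document)
print(document)
print(restored == original)
```

In MOF-based tooling the equivalent of `to_xml` and `from_xml` is generated from the metamodel, so every kind of model gets the same interchange pattern without hand coding.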
- Using the Semantic Web and MDA Synergistically (2K7C)
- There are a number of ways to use the Semantic Web and MDA synergistically. Michael Daconta has pointed out that bridging these two technologies is important (Footnote 5). A project is underway in the OMG to define a UML Profile for OWL and a more general UML-to-OWL mapping, which would include a MOF-to-OWL mapping. The project is also defining MOF metamodels of RDF and OWL. (2K7D)
- The basic value proposition for the DRM to leverage both the Semantic Web and MOF (Footnote 6) is to gain the reasoning capabilities of the Semantic Web and the model and metadata management facilities of MOF-based tools, and thereby to: (2K7E)
- Ease the integration of the Semantic Web ontologies into enterprises where MOF-based tooling is being applied to the management of UML models, entity-relationship models, and more (2K7F)
- Enable MDA to take advantage of the Semantic Web reasoners (2K7G)
- Furthermore, as Michael Daconta has pointed out, a UML Profile for OWL also makes it possible for UML-savvy architects to use UML tools to model OWL ontologies. (2K7H)
- Issue: Contexts and Business Collaborations (2K7I)
- The DRM proposes to use the BRM to establish the business context of a data element definition. Michael Daconta’s PowerPoint presentation entitled “Designing the FEA DRM for Information Sharing” expands the notion of context to include a Subject Context, a Service Context, and a Security Context. (2K7J)
- It may be advisable to also establish a business collaboration context. E-government value chains are collaborations in which a number of services are invoked among parties to the value chain. These collaborations constitute protocols that are an important part of the context of data element definitions, when such definitions are used by value chains. (2K7K)
- There are a number of ways to model collaborations, including UML interaction models and the component collaboration models of the UML Profile for EDOC. Business collaboration models are closely related to business process models, which also can help to define the business context of a data element. Several notations exist for defining business processes, including BPMN and UML. (2K7L)
- Conclusion (2K7M)
- The main points of this review of the DRM are: (2K7N)
- Michael Daconta’s proposals make eminent sense. (2K7O)
- The DRM must separate concerns among: (2K7P)
- Sharing the means for defining data elements (M2 commonality) (2K7Q)
- Sharing specific data element definitions (M1 commonality) (2K7R)
- Sharing instance data (M0 commonality). (2K7S)
- The DRM must guide users to carefully define their sharing topology along the dimensions of metalevel and organizational scope. (2K7T)
- The DRM must leverage the progress in building out infrastructure for the Semantic Web and the OMG’s MDA. (2K7U)
- Business collaborations and business processes are an essential part of the business context of a data element. (2K7V)