- Footnote 1: Section 515 of the Treasury and General Government Appropriations Act for Fiscal Year 2001 (2IYR)
- Footnote 2: The focus areas are adapted from The Three Pillars, an Adaptation of Information Management and Data Quality, by Bryan Aucoin, Panel 1, Proceedings of the Eighth International Conference on Information Quality (ICIQ-03) (2IYS)
======================================================================================= (2K7X)
Footnotes from Comments (2K7Y)
- Footnote 3: MDA did not invent the concept of metalevels. Metalevels have been a concept in the field of knowledge representation for some time. (2K7Z)
- Footnote 4: Note that the Federal Enterprise Architecture’s Business Reference Model (BRM) contains both M2 and M1 specifications. It specifies a means for defining business contexts in terms of Business Areas, Lines of Business, and Sub-Functions. This means is an M2 thing. It also contains particular business context definitions, including four Business Areas, 39 Lines of Business, and 153 Sub-Functions, which are M1 things. (2K80)
- Footnote 5: GCN Interview with Michael Daconta, February 12, 2004 (2KBV)
- Footnote 6: A paper written cooperatively by Semantic Web and MDA experts about this project explains the potential for synergy between the Semantic Web and MDA in more detail. See Frankel, Kendall, Hayes, and McGuinness, A Model-Driven Semantic Web, MDA Journal, July 2004 (2K89)
- Footnote 7: www.markle.org/downloadable_assets/nstf_report2_full_report.pdf (2KFG)
- Footnote 8: The DRM ontology shown is one of the FEA Ontology Models developed by TopQuadrant for the FEA Capabilities Manager Pilot Project in the eGOV SICoP group. (2KFH)
- Footnote 9: http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/ (2KFI)
- Footnote 10: http://www.w3.org/2001/sw/ (2KFJ)
- Footnote 11: For insight into functional data types, see, for example, the functional programming data types of ‘dessy’: CommutativeCollection, AssociativeCollection, OrderedMap - www.stud.tu-ilmenau.de/~robertw/dessy/fun/ (2KFK)
- Footnote 12: partners.adobe.com/asn/tech/xmp/pdf/xmpspecification.pdf (2KFL)
- Footnote 13: http://www.prismstandard.org/ (2KFM)
- Footnote 14: http://dmag.upf.es/ontologies/2003/12/ipronto.owl (2KFN)
- Footnote 15: Note that while ISO 15000-5 and the CCTS are ideally suited to XML, they confer an additional advantage on the FEA in that they are syntax-neutral and can be used with any exchange standard. (2LVU)
- Footnote 16: For example, in the purchasing process, a purchase order (PO) leads to a PO confirmation, a shipment notice, a receipt, and an invoice, all with potential modification transactions. This group of transactions contains an extremely high percentage of redundant data elements and groups of data elements (e.g., buying organization, selling organization, ship-to organization). Within our department, the requisition process exhibits all the same patterns. (2LVV)
============================================================================================================= (2K82)
The Three Pillars, an Adaptation of Information Management and Data Quality: Context, Issues, Concepts, Strategy and Lessons Learned (A Personal Perspective) (2IYU)
by Bryan Aucoin (2IYV)
The Problems with Data: (2IYW)
Many organizations in both the private and public sectors are increasing their focus on the need for quality data, and they have a variety of motivations. One of the most critical is that senior managers cannot get reliable, timely answers to seemingly simple questions - answers they need to make informed decisions. They find that there is no one place to go to get the "correct" answer, or worse, that there are competing "right" answers. They find that the lines of responsibility for obtaining the data are blurry, and that the data themselves are not current, consistent, or correct. From a Chief Information Officer's perspective, a large percentage of the resources available for development is spent building things to pass data around, since the data required to support any given application are more than likely stored in multiple repositories. The structure and the semantics of the data must then be reconciled to make the data useful. And once interfaces are built, they must be maintained. (2IYX)
I believe the "knee-jerk" reaction of most senior managers when faced with a data problem is to call in the CIO and tell him/her to "Fix it!". I think the first tendency of CIOs and their line managers is to seek technology-oriented solutions. This tendency is aided and abetted by tool vendors and by internal technologists who really believe that with the appropriate suite of products and enough cogitation, the enterprise's data integrity problems will be solved. I believe that in most cases, the insertion of technology can improve accessibility to available data, but this, in turn, leads the enterprise to a new set of issues rooted more fundamentally in questions about the business and the data that support it. (2IYY)
In this paper, I hope to lay out a perspective and strategy for organizations that have reached this state of maturity. (2IYZ)
The Starting Point: The Evolution of Data Repositories (2IZ0)
It would be an extremely interesting study to trace the evolution of the processes and technologies that organizations use to provision data. In the absence of such a body of work, I will make some assertions and leave it to others to prove or disprove them. (2IZ1)
I believe that in many medium-to-large enterprises, applications development is typically tightly bound to a line-of-business organization. Multiple applications development shops support specific lines of business, even when applications development has been consolidated into a single organization. Also, data are typically tightly bound to the applications that use them, even when an n-tiered application architecture is the endorsed approach. (2IZ2)
If we look inside each organization and its corresponding IT support organization, we will typically find heavy investments in "vertical" information technology. The line organizations will have one or more large "ERP-ish" systems that support the core business processes. Over time, new business requirements drive new information requirements, and "local" IT support staffs are often driven to respond quickly to their customers' requirements. Because updating large applications requires more time and resources, the IT organization creates additional small applications (and databases) in order to be responsive. Usually, data must be interchanged between the large application(s) and the small applications, or data are simply rekeyed into the smaller applications, resulting in a web of interfaces and redundant work. (2IZ3)
Usually, requirements for information access and interchange emerge as the enterprise attempts to better manage available resources, improve processes, make informed decisions and chart direction. Again, the IT organizations create a web of interfaces to transfer and translate data to support executive information systems, integrated business processes and so on. A variety of approaches are used: data warehouses/data buses, batch processes, and middleware. (2IZ4)
Factors and Outcomes (2IZ5)
The evolution above tends to result in a number of outcomes: (2IZ6)
- Data are not generally created to support enterprise needs. There are typically technical and political boundaries that inhibit this. To "line" applications development organizations, enterprise-level requirements for data are typically viewed as "external", because their direct customers (typically the sponsors of the application) are rewarded not for serving the greater good but for locally optimizing the performance of their own organizations. (2IZ7)
- Assuming the political and technical issues surrounding the sharing of data can be resolved, the differences in the data themselves constrain their usefulness. Data produced to support a particular organization may not meet enterprise-level requirements for currency and consistency. Further, definitions differ, and it may be that no amount of calculation will resolve the semantics. (There is a Nyquist-theorem analog for data: if you do not capture data at the appropriate specificity, it is computationally impossible to recover the desired information. See the sketch after this list.) (2IZ8)
- This brings us to Dr. Wang's 12 dimensions of data quality. All of these dimensions may be satisfactory from the standpoint of the people responsible for creating the data. In fact, one can argue that for any given system, a core set of data maintained within that system must be of sufficient quality; otherwise, the system could not be used. However, when the data are made available for broader use, the quality of the data as perceived by the information consumer is much lower. (2IZ9)
- From an information technology standpoint, the evolution I described above leads IT service providers into an "O&M box canyon". The service providers must maintain their core systems as well as the tactically focused small applications deployed to support urgent customer needs. (And the customers have come to rely on these applications as part of daily business.) Changes to applications affecting shared data require modification of the web of interfaces and are very expensive. The service provider has no funding to transition the small applications to a more robust platform. The customer will not fund a major new IT effort, demands ever more fast, tactical solutions, and wonders why it takes so long to make seemingly minor changes. (2IZA)
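To make the specificity point above concrete, here is a minimal sketch; the dates and figures are invented for illustration. Once data are captured only as a monthly aggregate, no amount of computation can recover the daily values a downstream consumer may later need:

```python
# Hypothetical illustration: data captured at monthly granularity
# cannot be recovered at daily granularity, no matter how it is processed.

daily_orders = {"2004-07-01": 12, "2004-07-02": 7, "2004-07-03": 20}  # what happened
monthly_total = sum(daily_orders.values())                            # what was stored: 39

# Any attempt to reconstruct daily values from the stored aggregate is a guess;
# infinitely many daily series sum to 39. For example, an even split:
days_in_sample = 3
guessed_daily = monthly_total / days_in_sample  # 13.0 for every day - wrong for all three

for day, actual in daily_orders.items():
    print(day, "actual:", actual, "reconstructed:", guessed_daily)
```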
Major Types of Problems in the Use of Data (2IZB)
In my work as a data architect, I have found that there are three major inhibitors to the effective use of shared data: (2IZC)
- Irreconcilable Semantics: The definitions of data obtained from disparate repositories are different enough that "answers" obtained by aggregating data from these repositories are not valid. As an example, EMPLOYEE in one database is defined in one fashion; EMPLOYEE in another database is defined differently. Reconciling the definitions using other ancillary data in the repositories may be possible, but not always (see the sketch after this list). (2J0H)
- Missing Relationships: There is no business process or system that captures the association of one entity to another. As an example, an organization's procurement division may be interested in who manages a contract for contract employees, but may not be interested in the specific internal organizations that receive these services or in the specific number of people. The facilities division may have a keen interest in exactly that information to support charge-backs and planning. (2J0I)
- Currency and Consistency of the Information: Aggregation of data from disparate repositories will not yield reliable information if shared data are not current, particularly those data that are used to establish keys. (2J0J)
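To make the first inhibitor concrete, here is a hypothetical sketch; the schemas and records are invented. Two repositories each hold an EMPLOYEE table, but one includes contractors and the other does not, so the "same" headcount question yields the competing "right" answers described earlier:

```python
# Hypothetical repositories with different definitions of EMPLOYEE.
# HR's table covers only salaried staff; the badge system's table covers
# everyone with building access, including contractors.

hr_employees = [
    {"id": "E1", "name": "Ada"},
    {"id": "E2", "name": "Grace"},
]
badge_employees = [
    {"id": "E1", "name": "Ada"},
    {"id": "E2", "name": "Grace"},
    {"id": "C9", "name": "Edsger"},  # contractor, absent from HR's table
]

print("Headcount per HR:   ", len(hr_employees))     # 2
print("Headcount per badge:", len(badge_employees))  # 3

# Naive aggregation (union of rows) silently mixes the two definitions:
combined = {row["id"] for row in hr_employees + badge_employees}
print("Combined 'answer':  ", len(combined))         # 3 - but 3 of what?
```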
The Three Pillars: (2IZG)
In our work on data architecture and data quality, I have found that there are three pillars to the development of a successful data management program: standards, policy and process, and technology. All three must be addressed concurrently to bring an enterprise forward. I'll discuss each in turn. (2IZH)
Data Architecture (2IZI)
Data architecture is an element of an enterprise architecture that comprises a data model and assigns accountability for data integrity (it is also referred to as information architecture). The data architecture reflects business-area entities and their attributes and establishes accountability for this information in business process improvements. The data model that describes the data architecture illustrates the interrelationships among the real-world objects and events integral to an enterprise's business. This data architecture is independent of hardware, software, or machine performance considerations. (2IZJ)
Our core premise in developing a data architecture is that it must be defined incrementally, driven by specific business requirements. While one should start with a high-level framework, specific business requirements for shared information should drive the definition of standards. The road to success in developing an architecture is to bind it to the business: if the architecture solves a specific business need, it will be accepted. (2IZK)
Data Governance (2IZL)
Data governance is the practice of making enterprise-wide decisions regarding an organization's information holdings. It includes determining data sources, assigning responsibility for integrity, defining requirements for business process development and change, and establishing mechanisms for arbitrating differences among stakeholders. In a nutshell, there has to be a way to make hard decisions within the enterprise. (2IZM)
We have found that formally assigned, empowered stewardship is a critical factor in governance. In our model, the stewards are line business managers fully empowered to mandate enterprise-wide standards and process changes, and we have established an arbitration process for when stewards disagree. The key aspects of stewardship are authority and accountability: the stewards are the place to go for definitive answers in their respective subject areas. In practice, most decisions are made within a line-level board composed of the stewards' representatives, but stewardship provides the framework for discussion. (2IZN)
Data Sharing Architecture (2IZO)
Data sharing is the practice of provisioning data from an information source to an information consumer in response to a business requirement. A data sharing architecture is a standard, repeatable technical pattern for sharing data. If an enterprise can enforce its architecture through a governance process as data are shared to support real business needs, then the enterprise has a good chance of creating quality data. (2IZP)
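As an illustrative sketch only (the canonical schema and adapter names below are assumptions, not from the paper), one common realization of such a repeatable pattern is a canonical record shape that every source maps into through a registered adapter, so each new sharing requirement reuses the pattern instead of adding another point-to-point interface:

```python
# Hypothetical sketch of a data sharing architecture: sources publish
# through adapters into a single canonical record shape that consumers use.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class CanonicalEmployee:  # enterprise-agreed definition
    employee_id: str
    full_name: str
    is_contractor: bool

# One adapter per source system, each owned by that system's steward.
def from_hr(row: dict) -> CanonicalEmployee:
    return CanonicalEmployee(row["id"], row["name"], is_contractor=False)

def from_badge(row: dict) -> CanonicalEmployee:
    return CanonicalEmployee(row["id"], row["name"], row["type"] == "contractor")

ADAPTERS: Dict[str, Callable[[dict], CanonicalEmployee]] = {
    "hr": from_hr,
    "badge": from_badge,
}

def share(source: str, rows: List[dict]) -> List[CanonicalEmployee]:
    """The repeatable pattern: every source goes through its registered adapter."""
    return [ADAPTERS[source](row) for row in rows]

print(share("badge", [{"id": "C9", "name": "Edsger", "type": "contractor"}]))
```

One design point worth noting: because each adapter would be owned by the source's steward, the governance process described above has a concrete artifact to review when definitions change.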
An Interesting Problem for the Future (2IZQ)
This paper has focused on data that an enterprise creates and manages for its own use. However, we are increasingly dependent upon data that we do not create but obtain from others. Characteristically, one may access such data but has no control over either the semantics or the syntax of the external sources. Further, data from external stores are aggregated with each other and with internal stores to create new information and insight. How do we assess the quality of information derived from aggregated data? One can begin to understand associations of people, locations, things, and events by aggregating data, but the question remains: "How good is the association?" (2IZR)