=======================================================================================    (2K7X)

Footnotes from Comments    (2K7Y)

=============================================================================================================    (2K82)

The Three Pillars, an Adaptation of Information Management and Data Quality: Context, Issues, Concepts, Strategy and Lessons Learned (A Personal Perspective)    (2IYU)

by Bryan Aucoin    (2IYV)

The Problems with Data:    (2IYW)

Many organizations in both the private and public sectors are increasing their focus on the need for quality data. They have a variety of motivations. One of the most critical is that senior managers cannot get reliable, timely answers to seemingly simple questions - answers that they need to make informed decisions. They find that there is no one place to go for the "correct" answer, or worse, that there are competing "right" answers. They find that the lines of responsibility for obtaining the data are blurry, and that the data themselves are not current, consistent or correct. From a Chief Information Officer's perspective, a large percentage of the resources available for development is spent building things to pass data around, since the data required to support any given application are more than likely stored in multiple repositories. The structure and the semantics of the data must then be reconciled to make the data useful. Once interfaces are built, they must be maintained.    (2IYX)

I believe the "knee jerk" reaction by most senior managers when faced with a problem of data is to call in the CIO and tell him/her to "Fix it!". I think the first tendency by CIOs and their line managers is to seek technology-oriented solutions. This tendency is aided and abetted by tool vendors and the internal technologists who really believe that with the appropriate suite of products and enough cogitation, the data integrity problems will be solved within the enterprise. I believe that in most cases, the insertion of technology can lead to improved accessibility to available data, but this, in turn, leads the enterprise to a new set of issues that are rooted more fundamentally in questions about the business and the data that supports it.    (2IYY)

In this paper, I hope to lay out a perspective and strategy for organizations that have reached this state of maturity.    (2IYZ)

The Starting Point: The Evolution of Data Repositories    (2IZ0)

It would be an extremely interesting study to trace the evolution of the processes and technologies that organizations use to provision data. In the absence of such a body of work, I will make some assertions and leave it to others to prove or disprove them.    (2IZ1)

I believe that in many medium-to-large enterprises, applications development is typically tightly bound to a line of business organization. Multiple applications development shops support specific lines of business, even when applications development has been consolidated into a single organization. Also, typically, data are tightly bound to the applications that use them, even when an n-tiered application architecture is the endorsed approach.    (2IZ2)

If we look inside each organization and its corresponding IT support organization, we will typically find heavy investments in "vertical" information technology. The line organizations will have one or more large "ERP-ish" systems that support the core business processes. Over time, new business requirements drive new information requirements. Many times, "local" IT support staffs are driven to respond quickly to their customers' requirements. Because updating large applications requires more time and resources, the IT organization creates additional small applications (and databases) in order to be responsive. Usually, data must be interchanged between the large application(s) and the small applications, or data are simply rekeyed into the smaller applications, resulting in a web of interfaces and redundant work.    (2IZ3)

Usually, requirements for information access and interchange emerge as the enterprise attempts to better manage available resources, improve processes, make informed decisions and chart direction. Again, the IT organizations create a web of interfaces to transfer and translate data to support executive information systems, integrated business processes and so on. A variety of approaches are used: data warehouses/data buses, batch processes, and middleware.    (2IZ4)

Factors and Outcomes    (2IZ5)

The evolution above tends to result in a number of outcomes:    (2IZ6)

Major Types of Problems in the Use of Data    (2IZB)

In my work as a data architect, I have found that there are three major inhibitors to the effective use of shared data:    (2IZC)

  1. Irreconcilable Semantics: The definitions of data obtained from disparate repositories are different enough so that "answers" obtained by aggregating data from these repositories are not valid. As an example: EMPLOYEE in one database is defined in one fashion. EMPLOYEE in another database is defined differently. Reconciling the definitions using other ancillary data in the repositories may be possible, but is not always.    (2J0H)
  2. Missing Relationships: There is no business process or system that captures the association of one entity to another. As an example, an organization's procurement division may be interested in who manages a contract for contract employees, but may not be interested in the specific internal organizations that receive these services or the specific number of people. The facilities division, by contrast, may have a keen interest in such information to support charge-backs and planning.    (2J0I)
  3. Currency and Consistency of the Information: Aggregation of data from disparate repositories will not yield reliable information if shared data are not current, particularly those data that are used to establish keys.    (2J0J)
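
As a minimal sketch of how these three inhibitors surface when data from two repositories are aggregated, consider the Python fragment below. Everything in it is an assumption made for illustration: the two hypothetical repositories (an HR system and a badge system), the record types HrEmployee and BadgeHolder, the reconcile function and the 30-day staleness threshold do not come from any particular system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class HrEmployee:
    """EMPLOYEE as the (hypothetical) HR system defines it: payroll staff only."""
    emp_id: str
    name: str


@dataclass(frozen=True)
class BadgeHolder:
    """EMPLOYEE as the (hypothetical) badge system defines it: anyone with a badge."""
    badge_id: str
    emp_id: Optional[str]  # None when no process captured the link to HR
    name: str
    last_synced: date      # when this record was last refreshed from its source


def reconcile(hr, badges, as_of, max_age_days=30):
    """Split badge records into matched, unlinked and stale groups."""
    hr_ids = {e.emp_id for e in hr}
    matched, unlinked, stale = [], [], []
    for b in badges:
        if (as_of - b.last_synced).days > max_age_days:
            stale.append(b)      # currency: key data too old to trust
        elif b.emp_id in hr_ids:
            matched.append(b)    # definitions reconcilable through the shared key
        else:
            unlinked.append(b)   # missing relationship: no association to HR
    return matched, unlinked, stale


hr = [HrEmployee("E1", "Ada"), HrEmployee("E2", "Grace")]
badges = [
    BadgeHolder("B1", "E1", "Ada", date(2024, 1, 10)),
    BadgeHolder("B2", None, "Contract staff", date(2024, 1, 12)),
    BadgeHolder("B3", "E2", "Grace", date(2023, 6, 1)),
]

# The two repositories disagree on the "simple" question of headcount.
print(len(hr), len(badges))

matched, unlinked, stale = reconcile(hr, badges, as_of=date(2024, 1, 15))
print(len(matched), len(unlinked), len(stale))
```

With this sample data, only one of the three badge records can be safely aggregated with the HR data. One record has no recorded relationship to HR, and one is too old for its key to be trusted.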

The Three Pillars:    (2IZG)

In our work on data architecture and data quality, we have found that there are three pillars to the development of a successful data management program: standards, policy and process, and technology. All of them must be addressed concurrently to bring an enterprise forward. I'll discuss each in turn.    (2IZH)

Data Architecture    (2IZI)

Data architecture is an element of an enterprise architecture that comprises a data model and assigns accountability for data integrity (this is also referred to as Information Architecture). The data architecture reflects business area entities with attributes and establishes accountability for this information in business process improvements. The data model that describes the data architecture illustrates the interrelationships among real-world objects and events integral to an enterprise's business. This data architecture is independent of hardware, software, or machine performance considerations.    (2IZJ)

Our core premise in developing data architecture is that it must be defined incrementally, driven by specific business requirements. While one should start with a high-level framework, specific business requirements for shared information should drive the definition of standards. The road to success in developing an architecture is to bind it to the business. If the architecture solves a specific business need, it will be accepted.    (2IZK)

Data Governance    (2IZL)

Data governance is the practice of making enterprise-wide decisions regarding an organization's information holdings. Data governance includes determining data sources, assigning responsibilities for integrity, defining requirements for business process development and change, and establishing mechanisms for arbitrating differences among stakeholders. In a nutshell, there has to be a way to make hard decisions within the enterprise.    (2IZM)

We have found that formally assigned, empowered stewardship is a critical factor in governance. In our model, the stewards are line business managers fully empowered to mandate enterprise-wide standards and process changes. We have established an arbitration process for when stewards disagree. The key aspects of stewardship are authority and accountability. The stewards are the place to go to get definitive answers for their respective subject areas. In practice, most of the decisions are made within a line-level board composed of the stewards' representatives, but stewardship provides the framework for discussion.    (2IZN)

Data Sharing Architecture    (2IZO)

Data sharing is the practice of provisioning data from an information source to an information consumer in response to a business requirement. A data sharing architecture is a standard, repeatable technical pattern for sharing data. If an enterprise can enforce its architecture through a governance process as data are shared to support real business needs, then the enterprise has a good chance of creating quality data.    (2IZP)
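
One way to picture how the three pillars meet at the point of sharing is the short Python sketch below. It is an illustration under stated assumptions, not a description of any actual implementation: the DataStandard record, the SharingViolation exception, the provision function, and the sample steward and source names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStandard:
    """A stewarded standard for one shared entity (hypothetical structure)."""
    entity: str
    steward: str                 # governance: who is accountable
    authoritative_source: str    # governance: where the data must come from
    required_fields: frozenset   # architecture: the agreed shared attributes


class SharingViolation(Exception):
    """Raised when a sharing request does not conform to the standard."""


def provision(record: dict, source: str, standard: DataStandard) -> dict:
    """Share a record with a consumer only if it conforms to the standard."""
    if source != standard.authoritative_source:
        raise SharingViolation(
            f"{source} is not the authoritative source for {standard.entity}; "
            f"contact steward {standard.steward}"
        )
    missing = standard.required_fields - set(record)
    if missing:
        raise SharingViolation(f"missing fields: {sorted(missing)}")
    # Share only the standardized attributes, in a predictable order.
    return {name: record[name] for name in sorted(standard.required_fields)}


employee_standard = DataStandard(
    entity="EMPLOYEE",
    steward="HR Director",
    authoritative_source="hr_system",
    required_fields=frozenset({"emp_id", "name"}),
)

shared = provision(
    {"emp_id": "E1", "name": "Ada", "shoe_size": 9},
    source="hr_system",
    standard=employee_standard,
)
print(shared)
```

The design choice here is that every sharing request passes through the same check, so the standard and the steward's decision about the authoritative source are enforced each time data move, rather than being documented and then ignored.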

An Interesting Problem for the Future    (2IZQ)

This paper has focused on data that an enterprise creates and manages for its own use. However, we are increasingly dependent upon data that we do not create, but obtain from others. Characteristically, one may access the data but has no control over either the semantics or the syntax of the external sources. Further, data from external stores are aggregated with each other and with internal stores to create new information and insight. How do we assess the quality of information derived from aggregated data? One can begin to understand the association of people, locations, things and events by aggregating data, but the question remains: "How good is the association?"    (2IZR)