Department of Justice Feedback - FEA DRM 2.0 Review

Kshemendra Paul, Chief Architect, USDOJ

1.1 SUMMARY Much has been accomplished in the development of the abstract model currently contained in the FEA DRM 2.0 document. The existing DRM may be a first step toward an implementable data architecture model. However, much remains to be done to create a model that supports an actionable, operational data and information strategy capable of addressing the mission of the Department of Justice (DOJ). Information sharing, data exchange, and collaboration among the various entities that make up the DOJ (and the Federal Government) all rely on such a strategy and on a DRM that supports it. The current model is not sufficient to address semantic interoperability, not specific enough to provide useful implementation planning guidance, and not realistic in the solutions it proposes for data issues.

1.2 SEMANTIC INTEROPERABILITY Broad semantic interoperability is far from certain to succeed, and the path chosen to achieve it is tied too closely to a specific technology stack (XML/RDF/OWL), without accounting for the limitations of that technology or considering alternatives. The DRM 2.0 asserts, mostly in the form of forward-looking statements, that implementing certain guidelines will lead to the emergence of a broad information sharing environment in which all data concepts are universally understood and can be made accessible and available for any mission need. At the same time, the document trivializes the complexities of information exchange: it identifies various types of systems and focuses on information transfer between those types of systems in the context of a single mission, but not across missions or across departments. The DOJ has a real and immediate need for a model, strategy, and solution for data discovery, understanding, and sharing across its various components and systems as well as across departments.

1.3 SPECIFICITY AND USEFULNESS OF THE MODEL FOR IMPLEMENTATION The DRM 2.0 states sound principles of data management and serves as a good reference on the subject (though not necessarily as a reference model), but it does not present specific implementation recommendations tailored to the maturity of a given department's data architecture and overall enterprise architecture. The model also does not account for the makeup of the departments: it provides the same recommendation for smaller, mostly homogeneous environments as it does for very complex, heterogeneous environments that consist of many large components, each with a separate mission and a robust data model developed specifically to support that mission. In such a complex environment, building a department-wide data model and accounting for similar or semantically equivalent data across all components may prove unproductive, and possibly detrimental to the mission of each department, by diverting development resources. The DRM’s definition of the community of interest (COI) is vague at best. Although the main value of information sharing should come from increased interoperability within COIs, the DRM does not yet suggest or define ways to create and identify these communities. Finally, the DRM alludes to registries and repositories of data and metadata, but it does not address how or when these mechanisms should or will materialize.

1.4 FEASIBILITY OF THE SUGGESTED DATA SOLUTION Where data description is concerned, the DRM categorizes data as structured, semi-structured, or unstructured. This explanation is somewhat simplistic and does not offer significant context for handling data that is not structured: the DRM suggests only a Dublin Core set of metadata attributes and the promise of the Semantic Web. While that may be a good starting point for the automatic processing of semi-structured and unstructured information, Semantic Web technologies are not yet sufficient, even with the addition of automatic classification and inferencing mechanisms. XML provides significant advantages for processing semi-structured data. However, a significant portion of existing information is not in XML format and may never be expressed in XML. In this case, XML introduces an additional level of complexity without providing a complete solution: there is no good way of expressing imperative logic over schema-flexible data, and there is no language to describe complex integrity constraints and assertions. The data context and discovery mechanisms do not take into account the dynamic nature of data (and of metadata, in the case of semi-structured and unstructured data). Attempts to classify all information collected by the Federal Government in a simple and consistent manner will inevitably fail, drowning in the massive volumes of new information generated in the process, with no clear owner or steward for the classification scheme.
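
The point about integrity constraints can be illustrated with a small sketch. The Python below is illustrative only: the record, its values, the "valid_until" field, and the check_record helper are all hypothetical and are not taken from the DRM. It describes an unstructured document with a few Dublin Core elements (title, creator, date, subject) and then checks a cross-field rule in ordinary imperative code, which is the kind of assertion that descriptive metadata alone does not carry.

```python
from datetime import date

# Hypothetical description of an unstructured document.
# title, creator, date, and subject are Dublin Core element names;
# "valid_until" is an assumed local extension used only for illustration.
record = {
    "title": "Case summary memo",
    "creator": "Example Component",
    "date": date(2005, 10, 1),
    "subject": ["information sharing"],
    "valid_until": date(2005, 9, 1),
}

# Elements this sketch assumes every record must carry.
REQUIRED = ("title", "creator", "date")


def check_record(rec):
    """Return a list of human-readable constraint violations."""
    problems = [f"missing element: {name}" for name in REQUIRED if name not in rec]
    # Cross-field rule: the expiry date must not precede the creation date.
    if "date" in rec and "valid_until" in rec and rec["valid_until"] < rec["date"]:
        problems.append("valid_until precedes date")
    return problems


print(check_record(record))  # ['valid_until precedes date']
```

The descriptive elements say what the document is, but the rule relating two of them lives in separate code that someone must write, own, and maintain, which is the ownership and stewardship burden this section raises.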