Gary, Pat & All:
Regarding the hub approach:
In our first meeting John Sowa pointed out that this has been tried before and that we had better have something new and different. The technical and philosophical principles of the World Wide Web answer John's question: the issue at hand is how ontology and taxonomy co-ordination informs ontology and taxonomy design, not how design restricts co-ordination. The more interesting of these principles include tolerance, decentralization, the test of independent invention, the principle of least power, free extension, language mixing, and partial understanding. See http://www.w3.org/DesignIssues/Principles.html and http://www.w3.org/DesignIssues/Evolution.html
Where these principles apply, information flows through channels based on observable regularities. A channel provides an infomorphism across classifications at its source and destination based on an interpretation that meets conditions that can preserve structure or semantics. The expressiveness of a language restricts the logic used in classifications.
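To make the infomorphism condition concrete, here is a minimal sketch in Python, assuming the usual Barwise-Seligman definition: a map on types going forward and a map on tokens going backward, such that a destination token satisfies the translated type exactly when its source image satisfies the original type. The classifications, type names ("Grant", "Procurement", and so on) and token names are hypothetical illustrations, not taken from any FEA-RMO or GSA model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """A classification: tokens, types, and the pairs (token, type) that hold."""
    tokens: frozenset
    types: frozenset
    holds: frozenset  # set of (token, type) pairs


def is_infomorphism(src, dst, type_map, token_map):
    """Check the infomorphism condition for a pair of maps.

    type_map  sends each type of src to a type of dst (forward).
    token_map sends each token of dst to a token of src (backward).
    Condition: token_map[c] is of type a in src  iff  c is of type type_map[a] in dst.
    """
    return all(
        ((token_map[c], a) in src.holds) == ((c, type_map[a]) in dst.holds)
        for c in dst.tokens
        for a in src.types
    )


# Hypothetical source classification (say, one agency's taxonomy).
source = Classification(
    tokens=frozenset({"doc1", "doc2"}),
    types=frozenset({"Grant", "Contract"}),
    holds=frozenset({("doc1", "Grant"), ("doc2", "Contract")}),
)

# Hypothetical destination classification (say, a shared reference model).
destination = Classification(
    tokens=frozenset({"rec1", "rec2"}),
    types=frozenset({"FinancialAssistance", "Procurement"}),
    holds=frozenset({("rec1", "FinancialAssistance"), ("rec2", "Procurement")}),
)

type_map = {"Grant": "FinancialAssistance", "Contract": "Procurement"}
good_token_map = {"rec1": "doc1", "rec2": "doc2"}
bad_token_map = {"rec1": "doc2", "rec2": "doc2"}  # rec1 is sent to the wrong document

structure_preserved = is_infomorphism(source, destination, type_map, good_token_map)
structure_broken = is_infomorphism(source, destination, type_map, bad_token_map)
```

With these toy inputs the first pair of maps satisfies the condition and the second does not, which is the sense in which a channel can be said to preserve or fail to preserve structure across two classifications.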
Our team at GSA is currently working through a use case based on this approach using MDA and Semantic Web technologies.
TopQuadrant's report calls out a design pattern called an Axiom Bridge which links the PRM and the BRM. So where TQ has already shown how to "connect" two FEA-RMO ontologies, what's different about other connections? Other options also include Marco Schorlemmer's IF-Maps methodology referenced here
So I think it is the context of information flow that will give us the new and different approach we need here ...
"Cassidy, Patrick J." <pcassidy@xxxxxxxxx>
Sent by: ontac-forum-bounces@xxxxxxxxxxxxxx
11/18/2005 04:09 AM
Please respond to "ONTAC-WG General Discussion"
To "ONTAC-WG General Discussion" <ontac-forum@xxxxxxxxxxxxxx>
bcc Richard C. Murphy/IAA/CO/GSA/GOV
Subject RE: [ontac-forum] Some thoughts on hub ontology and merging sources
Gary Berg-Cross has articulated a number of legitimate issues concerning the method for moving toward a Common Semantic Model. His points are diverse, and I think it best to respond to some of these points by describing my own views -- which may not be those ultimately chosen by the COSMO-WG as a whole.
(1) Hub? I think that ultimately one COSMO will best serve the purposes of interoperability, and for me the question is how to get there -- to start with one upper ontology and elaborate it by mapping to the multiple KCSs used by ONTACWG members, or to start with several competent ontologies and to merge them. My suggestion was to try both approaches simultaneously, i.e. in one part to investigate the potential for merger by formalizing the UMLS semantic network using each candidate upper ontology separately, and to compare the resulting fragments of each upper ontology to determine how closely those parts are related and just how easy a merger would be. In a second part, the FEA-RMO and DoD Upper taxonomy could also be formalized with respect to one of the upper ontologies, chosen by some criteria such as those suggested by Eric Peterson. That is, we would not prejudge the issue of whether or not to choose one existing hub from the start, but would gather more information to determine whether a merger would be so difficult that choosing one existing ontology is the only practical course. The question of using an existing ontology as a hub for formalization includes the question of what criteria would be used to choose the hub, and any thoughts on that issue would be welcome right now.
Concerning Gary's feeling that:
>> I don’t see how this could be done with merging some existing ontologies and various people have talked about using UMLS DOLCE/BFO, SUMO, OpenCyc, ISO 15926, FEA-RMO and the DoD Core Taxonomy.
It is important to distinguish the formal upper ontologies from the FEA-RMO and DoD Core taxonomy. The latter two are candidates for formalization with respect to one of the upper ontologies; they would not serve as upper ontologies on the same level as the others.
It is not clear a priori without careful investigation that any one of the potential hubs has everything that is in all of the others, and does not need supplementation. The formalization of the UMLS-SN is a non-trivial but tractable test case that should provide some objective information to help evaluate the candidates and estimate the difficulty of merger. I am hoping that we can begin discussing specifics soon.
(2) The FEA-RMO and DoD core taxonomy do indeed have problems if looked at as starting ontologies. The DoD Core doesn't pretend to be an ontology, and the FEA-RMO is, as Gary points out, closer to a Reference Model, though expressed in ontology format. There are nevertheless good reasons to try to connect these to the UMLS. (a) The FEA-RMO is an active project, which may well form part of any common ontology ultimately developed for use within the federal government. There are expectations that the SICoP will help in some way in the development of that model. The ONTACWG could be helpful in making recommendations on the best way to formalize the FEA-RMO and align it with other knowledge representation efforts within the government, even if much of the work will in fact be done by contractors. I think it would be unfortunate if we ignored it, since we can bring a broad perspective that can make ontologies developed from or related to the FEA-RMO more functional. (b) The DoD core taxonomy is still an active project, and its purpose is very similar to that of the COSMO: to create a hub with which domain taxonomies in DoD can be aligned and related. But it is not intended to function as an ontology for logical inferencing. By defining the terms of the DoD Core using concepts from the COSMO, we can make it more useful for logical inferencing, and the mapping of the DoD Core will have a multiplicative effect because of its already existing relations with other taxonomies. If there are hierarchical links in the DoD core that appear invalid from an inheritance point of view, those links would not appear in the formalized version, but the individual terms could still be defined by reference to the COSMO.
So the lack of formality of the DoD Core would make the precise formalization more time-consuming, but would not affect the formalized version, which would have only valid relations. Combined with the UMLS-SN, these two knowledge classifications provide a diverse set of topics to serve as test cases for building a COSMO and defining domain knowledge classifications by means of the COSMO. To be feasible for a volunteer group yet broad enough to provide a realistic test of the potential for semantic integration, I think that this set of knowledge classifications contains the proper balance of complexity and restricted subject matter, and they also relate directly to the purpose of the SICoP, the parent of the ONTACWG.
(3) The ANSI X3/T2 Ad hoc group on ontology standards:
I participated in the activity of this group until it faded away around 1998 or so. As I recall, there were some productive discussions of issues, but at least one proposal to obtain funding for the activity was rejected. In the absence of support, it was not possible for participants to spend enough time to resolve the very complex technical issues, and participants returned to work on the projects for which they could get funding. Neither OpenCyc nor SUMO nor DOLCE were available at that time.
Another smaller project was funded by the Klaus Tschira foundation, to bring together 20 or so ontologists for two weeks in 1998 to come to agreement on the structure of the topmost levels of an upper ontology. Most (not all) participants seemed to think that useful progress was made, but that a lot more additional work was needed. I do not think that a formal report was prepared with details of the agreements reached.
A later effort, the IEEE-SUO project (2001 - present), did get some financial support, and the SUMO ontology was created from that. The SUMO had some input from the IEEE group, but was mostly built in-house by Teknowledge. The absence of funding for anyone outside of Teknowledge left in place the same problems that bedeviled the ANSI group, namely that the concerns of multiple communities could not be addressed effectively, and the resulting ontology represented the views predominantly of a small group, who of necessity had to move forward quickly to provide the deliverables, regardless of whether external participants agreed with the results. This problem was not primarily a result of any lack of interest of the Teknowledge builders for input from the wider community, but was a structural problem resulting from the amount and distribution of funding. This project illustrated one major barrier to agreement -- that in the absence of applications that could test alternative suggested ontological representations, and without a mechanism for efficiently including alternative representations, an ontology, even built to some extent collaboratively, will inevitably have a lot of elements that are unsatisfactory to one or another potential user, or lack some they want.
The resources that were discussed by the ANSI group then are still valid, but have been supplemented by additional more developed ontologies and lexical resources. We have invited participation by all groups, including those that created those older resources, but as a volunteer group, it is only possible for us to make substantive efforts using the resources that are of interest to ONTACWG participants. The WordNet could be investigated directly if there were any members with a special interest in that classification. But it will be included in any case indirectly, because there are mappings of SUMO and Cyc to the WordNet.
There never has been adequate funding for a substantial collaborative project involving multiple groups, to find agreement on a common upper ontology, even though proposals have been made since the time of the ANSI efforts. We can do what is possible without funding, and hope to encourage agencies to provide funding at some point soon. The cBio project may help indirectly, if any people participating in that project include some work with upper ontologies -- or better yet, with the COSMO -- in their proposals.
The ONTACWG, without funding, can make progress only to a limited extent, but we have more experience and more examples now to work with than the earlier projects did. So our task will be easier than the earlier ones. A plausible goal would be, not to create a common upper ontology de novo, but to investigate within some limited compass how to choose or adopt one or more existing upper ontologies to serve as the hub to which multiple domain knowledge classifications can be related. As mentioned above, I think the ONTACWG can serve a valuable function by investigating the alternative approaches of choosing one upper ontology versus merging several. Even within the limited scope I suggested, this will take a lot of detailed work. To address the problem of accommodating the views of different users, my suggestion has been to structure the COSMO so that it will incorporate all views considered useful by participants. The result is likely to be a lattice of theories. The precise nature of the lattice that will be needed is not agreed on at present. Discovering the details of what is needed to accommodate multiple ontological views will be, I think, one of the important results of the attempt to formalize the UMLS-SN versus multiple upper ontologies.
(4) DOLCE and OntoClean
The DOLCE ontology is one of the upper ontologies that the COSMO-WG will be investigating as a base for the formalization of the UMLS-SN, and perhaps of the other KCSs that we are interested in. DOLCE and BFO are the base ontologies that Olivier Bodenreider and Lowell Vizenor are planning to use in their already ongoing effort to formalize the UMLS-SN. The other upper ontologies of interest to ONTACWG participants will be investigated if any of the participants decide to take the lead in that effort. So the OntoClean method will be considered as part of the COSMO-WG agenda. There are other formal methodologies that have been discussed in the literature, such as Michael Gruninger's "Semantic integration through invariants" in the Spring 2005 AI magazine. I hope that it will also be possible to use such ideas in the COSMO effort.
(5) Other approaches
The suggestions made thus far do not preclude alternatives. It is, I think, potentially useful for multiple approaches to be tried at the same time, because we are in a position to share intermediate results on a regular basis and take advantage of each other's experience as soon as anything useful is discovered or proposed, and posted to our Wiki. The ONTACWG is not a competition, but a collaboration. So the inefficiencies of trying alternative approaches in traditional research, where one learns of the results of other work a year or so after a project has been completed, should not occur in the work we are discussing. If any participants have a particular interest in trying a specific approach toward relating KCSs to each other, the ONTACWG environment should provide feedback and support that will make individual efforts more productive.
The paper by Doerr, Hunter, and Lagoze starts off expressing a sentiment that is also shared by myself and a lot of others, that some common top-level ontology will be needed for accurate semantic interoperability. And some of the specific observations they make as a result of their study are, I believe, valid. But as has been pointed out, their proposed merged ontology does have some problems, and falls far short of the best developed ontologies that are already available. A one-on-one alignment of that kind may be useful for applications that are restricted to two or a small group of communities, but our interest is much broader. I think we can do a lot better.
260 Industrial Way
Eatontown, NJ 07724
Mail Stop: MNJE
Email: pcassidy at mitre.org
From: ontac-forum-bounces@xxxxxxxxxxxxxx [mailto:ontac-forum-bounces@xxxxxxxxxxxxxx] On Behalf Of Gary Berg-Cross
Sent: Thursday, November 17, 2005 5:32 PM
Subject: [ontac-forum] Some thoughts on hub ontology and merging sources
I wanted to follow up on Eric P's earlier message about a hub approach to building our common ontology. I think that his questions and issues got sidetracked by the responses to Roy’s "general ontology".
Eric was curious about “how pervasive those anti-hub feelings really are.” I’m not of one feeling on this issue, since I think it is complex, and I would welcome some discussion of it. Eric had particular ideas on Dr. Sowa’s subsumption lattice idea, but I haven’t heard responses to that, and perhaps others can respond to it.
For myself, I could imagine going with a hub or modular approach depending on the quality of the hub. I’d have to be convinced that it was doable with our resources, and would want to know the “seed” for it and the process of development. I don’t see how this could be done with merging some existing ontologies and various people have talked about using UMLS DOLCE/BFO, SUMO, OpenCyc, ISO 15926, FEA-RMO and the DoD Core Taxonomy. Even assuming this as a start, I have some issues and an approach to discuss as a strawman.
The FEA-RMO is an Ontology of a Reference Model and not of an actual domain such as health. It seems quite hard to connect this to the others.
Also, the DoD taxonomy, in my opinion, has the degree of problems that Barry and John pointed out in the "general ontology" so it may not be easy to assimilate. We might start without trying to merge these in and also might start with the best 2 or 3 as candidates to seed an effort.
Another point or question concerns leveraging the experience of past efforts. Seven or more years ago there was an effort by the ANSI Ad Hoc group to construct a standard, called the Reference Ontology. They had the following five-step approach:
1. Upper levels (approx. 100,000 terms): Bring into correspondence (to align) the terms of a small number of selected large-scale ontologies (eventual size approx. 100,000 items). Do so inclusively; that is, create a result in which users can choose which of the component ontologies' terms they wish to see and use.
2. Domain models (under 2,000 terms each): Link into this Ontology selected domain-specific ontologies, developed to support reasoning about time, space, physics, geography, etc. Do so inclusively; allow the linkage of various different models of time, space, etc.
3. Access tools: Create easy-to-use tools for Ontology access and extension.
4. Dissemination: Place the resulting Reference Ontology on the Web, freely available.
5. Theoretical basis: In ongoing work, have a team of highly qualified individuals comb through the Ontology to find powerful generalizations, to weed out unnecessary and inconsistent items, and to create a maximal factoring of the upper levels of the Ontology.
Seems quite similar to what we are talking about. Whatever happened? Did it fail because it didn’t have an upper ontology?
They listed the following as candidate sources for terms to be included in their “merged Reference Ontology”, and a few of these (UMLS, Cyc) have been mentioned as a base for us too:
- USC/ISI: Pangloss Ontology SENSUS approx. 70,000 terms, general coverage, little detail, taxonomization supports Natural Language applications.
- Princeton: WordNet approx. 70,000 terms, general coverage, little detail, taxonomized on Naive Semantics / Cognitive Science principles.
- CYCorp: Upper portion of CYC ontology approx. 2,500 terms, general coverage, little detail, taxonomized on Naive Semantics / AI principles. Later additions may include more of the 40,000-odd terms currently in CYC.
- EDR: Upper portion of EDR concept ontology approx. 1,000 terms, general coverage, medium detail, taxonomized for Natural Language applications. Later additions may include more of the approx. 400,000 terms in the EDR concept lexicon.
- New Mexico State University: MIKROKOSMOS approx. 4,000 terms, general coverage, detailed, taxonomized for Natural Language applications.
- European Union: EuroWordNet -- under construction; probably approx. 50,000 terms, little detail, taxonomized on Naive Semantics / Cognitive Science principles.
- LXT Inc.: UMLS medical ontology exceeds 50,000 terms, medium detail, taxonomized for medical reasoning applications.
Perhaps some of these should also be on our list if they have “matured”.
A last point/issue concerns alignment between our starting sources and how to start on this. Martin Doerr and others did some work, reported in “Towards a Core Ontology for Information Integration”, that described the comparison and convergence of 2 ontologies using the OntoClean approach (Guarino, N. and Welty, C., “Evaluating ontological decisions with OntoClean,” Communications of the ACM, 45 (2), pp. 61-65, 2002).
This uses an analysis of top-level ontological distinctions related to:
1. instantiation versus membership
2. part-of and mereological axioms
5. location and extension
6. co-extension, co-connection
7. unity, singularity and plurality
The claim is that the OntoClean approach “enables: the detection of concept definitions that are lacking in clarity or rigidity; the justification of valid sub-sumption relations; and the detection of invalid sub-sumption declarations.”
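As a rough illustration of the last of those claims, here is a minimal Python sketch of one OntoClean check, assuming the standard constraint that an anti-rigid property cannot subsume a rigid one. It covers only the rigidity meta-property (OntoClean also uses identity, unity and dependence), and the class names and rigidity tags below are hypothetical, not taken from any of our candidate ontologies.

```python
# Rigidity tags as commonly written in the OntoClean literature.
RIGID = "+R"
NON_RIGID = "-R"
ANTI_RIGID = "~R"


def invalid_subsumptions(rigidity, is_a_links):
    """Return the (child, parent) links that break the rigidity constraint.

    rigidity   maps each class name to one of the tags above.
    is_a_links is an iterable of (child, parent) subsumption declarations.
    A link is flagged when an anti-rigid parent subsumes a rigid child.
    """
    return [
        (child, parent)
        for child, parent in is_a_links
        if rigidity[parent] == ANTI_RIGID and rigidity[child] == RIGID
    ]


# Hypothetical tagging: every person is necessarily a person (rigid),
# while being a student or an employee is contingent (anti-rigid).
rigidity = {
    "Person": RIGID,
    "Student": ANTI_RIGID,
    "Employee": ANTI_RIGID,
}

declared_links = [
    ("Student", "Person"),    # fine: rigid parent, anti-rigid child
    ("Person", "Employee"),   # suspect: anti-rigid parent over a rigid child
]

flagged = invalid_subsumptions(rigidity, declared_links)
```

Here only the ("Person", "Employee") link is flagged. Run over a pair of seed ontologies once their classes have been tagged by hand, a check of this kind would give a concrete list of links to discuss.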
Would it be useful to start looking at the matchup of some of our “seed” ontologies in this way (or a better way that might be proposed)? The Doerr paper discusses the process of finding common conceptualizations through equivalences such as that between the concept “Temporality” in Ont1 and “Temporal Entity” in Ont2, or between “Action” and “Activity”, and so on for some of the foundation concepts. It might be useful to make our discussions concrete in this way if we are planning on using a merger of UMLS, DOLCE/BFO, SUMO, OpenCyc, ISO 15926, etc.
Message Archives: http://colab.cim3.net/forum/ontac-forum/
To Post: mailto:ontac-forum@xxxxxxxxxxxxxx
Shared Files: http://colab.cim3.net/file/work/SICoP/ontac/
Community Wiki: http://colab.cim3.net/cgi-bin/wiki.pl?SICoP/OntologyTaxonomyCoordinatingWG