ontac-forum

RE: [ontac-forum] Some thoughts on hub ontology and merging sources

To: "ONTAC-WG General Discussion" <ontac-forum@xxxxxxxxxxxxxx>
From: "psp" <psp@xxxxxxxxxxxxxxxxxx>
Date: Fri, 18 Nov 2005 09:35:46 -0700
Message-id: <CBEELNOPAHIKDGBGICBGCEPAGNAA.psp@xxxxxxxxxxxxxxxxxx>

 

Patrick sent a note to the ONTAC-WG General Discussion, starting with:

 

1) Hub?  I think that ultimately one COSMO will best serve the purposes of interoperability, and for me the question is how to get there -- to start with one upper ontology and elaborate it by mapping to the multiple KCSs used by ONTACWG members, or to start with several competent ontologies, and to merge them.  My suggestion was to try both approaches simultaneously, i.e. in one part to investigate the potential for merger by formalizing the UMLS semantic network using each candidate upper ontology separately, and to compare the resulting fragments of each upper ontology to determine how closely those parts are related and just how easy a merger would be.  In a second part, the FEA-RMO and DoD Upper taxonomy could also be formalized with respect to one of the upper ontologies, chosen by some criteria such as those suggested by Eric Peterson.  That is, we would not prejudge the issue of whether or not to choose one existing hub from the start, but to gather more information to determine whether a merger would be so difficult that choosing one existing ontology is the only practical course.  The question of using an existing ontology as a hub for formalization includes the question of what criteria would be used to choose the hub, and any thoughts on that issue would be welcome right now.

 

 

Respectfully, he seems stuck on wanting to do something that seems right, but there are principled arguments against his persistent suggestions.  His voice is similar to others in the standards processes, but this voice fails to appreciate not only other viewpoints but also the social reality as expressed in several other statements made today to the ONTAC-WG General Discussion.  Several months ago, Patrick and I had phone conversations about this, and I represented some of the conversation at

 

http://www.ontologystream.com/beads/nationalDebate/188.htm 

where I conjecture about the error in wanting to have an ontology that supports general types of inference of the type observed only in human reasoning and not observed from AI. 

 

At

http://www.ontologystream.com/beads/nationalDebate/191.htm

I further expressed my view of his position and the error that is made in persisting with certain expectations. 

 

I made the mistake of mentioning where he worked (at a large government think-tank-consulting entity) and the discussion became about that (my mistake).  I have edited this to remove this mention.  I do not think that he is fully aware of the grounding that stratified ontology has in the natural sciences, and I think that he is focused on a specific viewpoint regarding what makes business and consulting activities work in the current IT consulting world. 

 

He persists in his arguments and does not seem to be persuadable.  As such, I feel that there cannot be a discussion of alternatives to the notion of a single large general-purpose ontological model. 

 

****

 

I am struck by the notion that the proposition of an alignment of two Protege type ontologies, such as FEA-RMO and (say) an OpenCyc upper ontology, if there is one actually available to the public, is a proposition that ignores the reality of interpretation and viewpoint. 

 

Accepting something like the Dublin Core as an available ontology is a different type of path towards interoperability.  Dublin Core is accepted by many because it is simple and specifies some really useful information standards, mostly related to the act of publishing information. 

 

But basic concepts to be included in his common large ontology are related to "the reality" of things like duration and composition (the new attack was experienced from Monday through Thursday).  To discuss the detailed nature of "this new attack" within a large community would require something beyond the non-functional properties proposed as a web services ontology model.  So suppose the attack has novel elements, or elements that are known but function in an entirely new (and surprising) way.  

 

The OASIS work on the OASIS Reference Model for Service Oriented Architectures,

as discussed briefly at

http://www.ontologystream.com/beads/nationalDebate/201.htm

creates a higher level abstraction (framework) in which web services can be made interoperable (either as a hard wired relationship between interacting systems or as a just in time aggregation activity). 

 

Human communication, at its best, iteratively spreads the knowledge of "a new attack" within a large community... but in the case of 9-11 and the American public the eventual knowledge came to reflect different individual and community viewpoints.

 

The alternative that is proposed by some is a stratified ontology where some "observed" set of invariants (semantic primes) is commonly used by individuals and other processes (including pattern recognition algorithms) to express (in real time) emergent ontological models.

 

Like in physical chemistry, the diversity of possible chemical compounds is sufficient to cause the chemistry at a moment (the ontology at that moment); and to adapt to the response degeneracy (Gerald Edelman's term for many to many mappings) seen as living systems deal with structure-function choices.

 

I wonder why this notion of stratified ontology is never discussed within any of the standards processes.  Stratified ontology can be grounded in physical science (chemistry), in speech production by humans (phonemes), and in language use and generation (Tom Adi's work):

 

http://www.bcngroup.org/beadgames/generativeMethodology/AdiStructuredOntology-PartI.htm

 

Adi's work has been around and in software form for two decades, but remains largely outside the Academy and outside of the consulting/standards processes.

 

But one can look at the Zachman Framework and Sowa's semantic primitives to see the stratified approach, even if not recognized as such.

 

This absence of discussion about stratified theory is why I am also bothered by the insistence on a large common "fixed" and controlling ontology. 

 

 

 

Paul Prueitt

703-981-2676

 
-----Original Message-----
From: ontac-forum-bounces@xxxxxxxxxxxxxx [mailto:ontac-forum-bounces@xxxxxxxxxxxxxx]On Behalf Of Cassidy, Patrick J.
Sent: Friday, November 18, 2005 2:10 AM
To: ONTAC-WG General Discussion
Subject: RE: [ontac-forum] Some thoughts on hub ontology and merging sources

Gary Berg-Cross has articulated a number of legitimate issues concerning the method for moving toward a Common Semantic Model.  His points are diverse, and I think it best to respond to some of these points by describing my own views -- which may not be those ultimately chosen by the COSMO-WG as a whole.
 
(1) Hub?  I think that ultimately one COSMO will best serve the purposes of interoperability, and for me the question is how to get there -- to start with one upper ontology and elaborate it by mapping to the multiple KCSs used by ONTACWG members, or to start with several competent ontologies, and to merge them.  My suggestion was to try both approaches simultaneously, i.e. in one part to investigate the potential for merger by formalizing the UMLS semantic network using each candidate upper ontology separately, and to compare the resulting fragments of each upper ontology to determine how closely those parts are related and just how easy a merger would be.  In a second part, the FEA-RMO and DoD Upper taxonomy could also be formalized with respect to one of the upper ontologies, chosen by some criteria such as those suggested by Eric Peterson.  That is, we would not prejudge the issue of whether or not to choose one existing hub from the start, but to gather more information to determine whether a merger would be so difficult that choosing one existing ontology is the only practical course.  The question of using an existing ontology as a hub for formalization includes the question of what criteria would be used to choose the hub, and any thoughts on that issue would be welcome right now.
 
Concerning Gary's feeling that:
>>  I don't see how this could be done with merging some existing ontologies and various people have talked about using UMLS DOLCE/BFO, SUMO, OpenCyc, ISO 15926,  FEA-RMO and the DoD Core Taxonomy. 
 
It is important to distinguish the formal upper ontologies from the FEA-RMO and DoD Core taxonomy.  The latter two are candidates for formalization with respect to one of the upper ontologies; they would not serve as upper ontologies on the same level as the others.
 
It is not clear a priori without careful investigation that any one of the potential hubs has everything that is in all of the others, and does not need supplementation.  The formalization of the UMLS-SN is a non-trivial but tractable test case that should provide some objective information to help evaluate the candidates and estimate the difficulty of merger.  I am hoping that we can begin discussing specifics soon.
 
 
(2) The FEA-RMO and DoD core taxonomy do indeed have problems if looked at as starting ontologies.  The DoD Core doesn't pretend to be an ontology, and the FEA-RMO is, as Gary points out, closer to a Reference Model, though expressed in ontology format.  There are nevertheless good reasons to try to connect these to the UMLS.  (a) The FEA-RMO is an active project, which may well form part of any common ontology ultimately developed for use within the federal government.  There are expectations that the SICoP will help in some way in the development of that model.  The ONTACWG could be helpful in making recommendations on the best way to formalize the FEA-RMO and align it with other knowledge representation efforts within the government, even if much of the work will in fact be done by contractors.   I think it would be unfortunate if we ignored it, since we can bring a broad perspective that can make ontologies developed from or related to the FEA-RMO more functional.  (b) The DoD core taxonomy is still an active project, and its purpose is very much similar to that of the COSMO, to create a hub with which domain taxonomies in DoD can be aligned and related.  But it is not intended to function as an ontology for logical inferencing.  By defining the terms of the DoD Core using concepts from the COSMO, we can make it more useful for logical inferencing, and the mapping of the DoD Core will have a multiplicative effect because of its already existing relations with other taxonomies.   If there are hierarchical links in the DoD core that appear invalid from an inheritance point of view, those links would not appear in the formalized version, but the individual terms could still be defined by reference to the COSMO.  
So the lack of formality of the DoD Core would make the precise formalization more time-consuming, but would not affect the formalized version, which would have only valid relations.  Combined with the UMLS-SN, these two knowledge classifications provide a diverse set of topics to serve as test cases for building a COSMO and defining domain knowledge classifications by means of the COSMO.  To be feasible for a volunteer group yet broad enough to provide a realistic test of the potential for semantic integration, I think that this set of knowledge classifications contains the proper balance of complexity and restricted subject matter, and they also relate directly to the purpose of the SICoP, the parent of the ONTACWG.
 
(3)  The ANSI X3/T2 Ad hoc group on ontology standards:
    I participated in the activity of this group until it faded away around 1998 or so.   As I recall, there were some productive discussions of issues, but at least one proposal to obtain funding for the activity was rejected.  In the absence of support, it was not possible for participants to spend enough time to resolve the very complex technical issues, and participants returned to work on the projects for which they could get funding.   Neither OpenCyc nor SUMO nor DOLCE were available at that time.
     Another smaller project was funded by the Klaus Tschira foundation, to bring together 20 or so ontologists for two weeks in 1998 to come to agreement on the structure of the topmost levels of an upper ontology.  Most (not all) participants seemed to think that useful progress was made, but that a lot more additional work was needed.  I do not think that a formal report was prepared with details of the agreements reached.
    A later effort, the IEEE-SUO project (2001 - present), did get some financial support, and the SUMO ontology was created from that.  The SUMO had some input from the IEEE group, but was mostly built in-house by Teknowledge.  The absence of funding for anyone outside of Teknowledge left in place the same problems that bedeviled the ANSI group, namely that the concerns of multiple communities could not be addressed  effectively, and the resulting ontology represented the views predominantly of a small group, who of necessity had to move forward quickly to provide the deliverables, regardless of whether external participants agreed with the results.  This problem was not primarily a result of any lack of interest of the Teknowledge builders for input from the wider community, but was a structural problem resulting from the amount and distribution of funding.  This project illustrated one major barrier to agreement -- that in the absence of applications that could test alternative suggested ontological representations, and without a mechanism for efficiently including alternative representations, an ontology, even built to some extent collaboratively, will inevitably have a lot of elements that are unsatisfactory to one or another potential user, or lack some they want.
    The resources that were discussed by the ANSI group then are still valid, but have been supplemented by additional more developed ontologies and lexical resources.  We have invited participation by all groups, including those that created those older resources, but as a volunteer group, it is only possible for us to make substantive efforts using the resources that are of interest to ONTACWG participants. The WordNet could be investigated directly if there were any members with a special interest in that classification.  But it will be included in any case indirectly, because there are mappings of SUMO and Cyc to the WordNet.
 
    There never has been adequate funding for a substantial collaborative project involving multiple groups, to find agreement on a common upper ontology, even though proposals have been made since the time of the ANSI efforts.  We can do what is possible without funding, and hope to encourage agencies to provide funding at some point soon.  The cBio project may help indirectly, if any people participating in that project include some work with upper ontologies -- or better yet, with the COSMO -- in their proposals.
 
    The ONTACWG, without funding, can make progress only to a limited extent, but we have more experience and more examples now to work with than the earlier projects did.  So our task will be easier than the earlier ones.  A plausible goal would be, not to create a common upper ontology de novo, but to investigate within some limited compass how to choose or adopt one or more existing upper ontologies to serve as the hub to which multiple domain knowledge classifications can be related.  As mentioned above, I think the ONTACWG can serve a valuable function by investigating the alternative approaches of choosing one upper ontology versus merging several.  Even within the limited scope I suggested, this will take a lot of detailed work.  To address the problem of accommodating the views of different users, my suggestion has been to structure the COSMO so that it will incorporate all views considered useful by participants.  The result is likely to be a lattice of theories.  The precise nature of the lattice that will be needed is not agreed on at present.  Discovering the details of what is needed to accommodate multiple ontological views will be, I think, one of the important results of the attempt to formalize the UMLS-SN versus multiple upper ontologies.
 
(4) DOLCE and OntoClean
   The DOLCE ontology is one of the upper ontologies that the COSMO-WG will be investigating as a base for the formalization of the UMLS-SN, and perhaps of the other KCSs that we are interested in.  DOLCE and BFO are the base ontologies that Olivier Bodenreider and Lowell Vizenor are planning to use in their already ongoing effort to formalize the UMLS-SN.  The other upper ontologies of interest to ONTACWG participants will be investigated if any of the participants decide to take the lead in that effort.  So the OntoClean method will be considered as part of the COSMO-WG agenda.  There are other formal methodologies that have been discussed in the literature, such as Michael Gruninger's "Semantic integration through invariants" in the Spring 2005 AI magazine.  I hope that it will also be possible to use such ideas in the COSMO effort.
 
 
(5) Other approaches
    The suggestions made thus far do not preclude alternatives.  It is, I think, potentially useful for multiple approaches to be tried at the same time, because we are in a position to share intermediate results on a regular basis and take advantage of each other's experience as soon as anything useful is discovered or proposed, and posted to our Wiki.  The ONTACWG is not a competition, but a collaboration.  So the inefficiencies of trying alternative approaches in traditional research, where one learns of the results of other work a year or so after a project has been completed, should not occur in the work we are discussing.  If any participants have a particular interest in trying a specific approach toward relating KCSs to each other, the ONTACWG environment should provide feedback and support that will make individual efforts more productive.
    The paper by Doerr, Hunter, and Lagoze starts off expressing a sentiment that is also shared by myself and a lot of others, that some common top-level ontology will be needed for accurate semantic interoperability.  And some of the specific observations they make as a result of their study are, I believe, valid.  But as has been pointed out, their proposed merged ontology does have some problems, and falls far short of the best developed ontologies that are already available.  A one-on-one alignment of that kind may be useful for applications that are restricted to two or a small group of communities, but our interest is much broader.   I think we can do a lot better.
 
Pat
   

Patrick Cassidy
MITRE Corporation
260 Industrial Way
Eatontown, NJ 07724
Mail Stop: MNJE
Phone: 732-578-6340
Cell: 908-565-4053
Fax: 732-578-6012
Email: pcassidy at mitre.org

 


From: ontac-forum-bounces@xxxxxxxxxxxxxx [mailto:ontac-forum-bounces@xxxxxxxxxxxxxx] On Behalf Of Gary Berg-Cross
Sent: Thursday, November 17, 2005 5:32 PM
To: ontac-forum@xxxxxxxxxxxxxx
Subject: [ontac-forum] Some thoughts on hub ontology and merging sources

I wanted to follow up Eric P's earlier message about a hub approach to building our common ontology.  I think that his questions and issues got sidetracked by the responses to Roy's "general ontology".

 

Eric was curious about "how pervasive those anti-hub feelings really are."  I'm not of one mind on this issue, since I think it is complex, and would welcome some discussion of it.  Eric had particular ideas on Dr. Sowa's subsumption lattice idea, but I haven't heard responses to that, and perhaps others can respond to it. 

 

For myself, I could imagine going a hub or modular approach depending on the quality of the hub.  I'd have to be convinced that it was doable with our resources, and would want to know the "seed" for it and the process of development. I don't see how this could be done with merging some existing ontologies, and various people have talked about using UMLS DOLCE/BFO, SUMO, OpenCyc, ISO 15926, FEA-RMO and the DoD Core Taxonomy.  Assuming even this as a start, I have some issues and an approach to discuss as a strawman. 

 

The FEA-RMO is an Ontology of a Reference Model and not of an actual domain such as health.  It seems quite hard to connect this to others. 

 

Also, the DoD taxonomy, in my opinion, has the degree of problems that Barry and John pointed out in the "general ontology" so it may not be easy to assimilate.  We might start without trying to merge these in and also might start with the best 2 or 3 as candidates to seed an effort.

Another point or question concerns leveraging the experience of past efforts.  Seven years or more ago there was an effort by the ANSI Ad Hoc group to construct a standard, called the Reference Ontology.  They had the following five-step approach:

  1. Upper levels (approx. 100,000 terms): Bring into correspondence (to align) the terms of a small number of selected large-scale ontologies (eventual size approx. 100,000 items). Do so inclusively; that is, create a result in which users can choose which of the component ontologies' terms they wish to see and use.
  2. Domain models (under 2,000 terms each): Link into this Ontology selected domain-specific ontologies, developed to support reasoning about time, space, physics, geography, etc. Do so inclusively; allow the linkage of various different models of time, space, etc.
  3. Access tools: Create easy-to-use tools for Ontology access and extension.
  4. Dissemination: Place the resulting Reference Ontology on the Web, freely available.
  5. Theoretical basis: In ongoing work, have a team of highly qualified individuals comb through the Ontology to find powerful generalizations, to weed out unnecessary and inconsistent items, and to create a maximal factoring of the upper levels of the Ontology.

Seems quite similar to what we are talking about.  Whatever happened?  Did it fail because it didn't have an upper ontology?

They listed the following as candidate sources for terms to be included in their "merged Reference Ontology", and a few of these (UMLS, Cyc) have been mentioned as a base for us too:

  • USC/ISI: Pangloss Ontology SENSUS approx. 70,000 terms, general coverage, little detail, taxonomization supports Natural Language applications.
  • Princeton: WordNet approx. 70,000 terms, general coverage, little detail, taxonomized on Naive Semantics / Cognitive Science principles.
  • CYCorp: Upper portion of CYC ontology approx. 2,500 terms, general coverage, little detail, taxonomized on Naive Semantics / AI principles. Later additions may include more of the 40,000-odd terms currently in CYC.
  • EDR: Upper portion of EDR concept ontology approx. 1,000 terms, general coverage, medium detail, taxonomized for Natural Language applications. Later additions may include more of the approx. 400,000 terms in the EDR concept lexicon.
  • New Mexico State University: MIKROKOSMOS approx. 4,000 terms, general coverage, detailed, taxonomized for Natural Language applications.
  • European Union: EuroWordNet (under construction); probably approx. 50,000 terms, little detail, taxonomized on Naive Semantics / Cognitive Science principles.
  • LXT Inc.: UMLS medical ontology exceeds 50,000 terms, medium detail, taxonomized for medical reasoning applications.

Perhaps some of these should also be on our list if they have "matured".

 

A last point/issue concerns alignment between our starting sources and how to start on this.  Martin Doerr and others did some work reported in "Towards a Core Ontology for Information Integration" and described the comparison and convergence of 2 ontologies using the OntoClean approach (Guarino, N. and Welty, C., "Evaluating ontological decisions with OntoClean," Communications of the ACM, 45 (2), pp. 61-65, 2002).

 

This uses an analysis of top-level ontological distinctions related to:

 

1. instantiation versus membership

2. part-of and mereological axioms

3. extensionality

4. connection

5. location and extension

6. co-extension, co-connection

7. unity, singularity and plurality

8. dependence/independence

 

 

The claim is that the OntoClean approach "enables: the detection of concept definitions that are lacking in clarity or rigidity; the justification of valid subsumption relations; and the detection of invalid subsumption declarations." 
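
To make the "invalid subsumption" part of that claim concrete, here is a minimal sketch of just one OntoClean constraint: an anti-rigid concept cannot subsume a rigid one.  The concept names, the rigidity tags assigned to them, and the helper function are illustrative assumptions of mine, not taken from the Doerr paper or from any of the candidate ontologies.

```python
from dataclasses import dataclass

# OntoClean rigidity meta-properties (only the two needed for this check).
RIGID = "+R"       # every instance necessarily stays an instance
ANTI_RIGID = "~R"  # every instance could cease to be an instance


@dataclass(frozen=True)
class Concept:
    name: str
    rigidity: str


def invalid_subsumptions(links):
    """Return the (child, parent) links that break the rigidity constraint.

    A link is flagged when an anti-rigid parent subsumes a rigid child.
    """
    return [
        (child, parent)
        for child, parent in links
        if parent.rigidity == ANTI_RIGID and child.rigidity == RIGID
    ]


# Hypothetical concepts with hand-assigned tags, for illustration only.
person = Concept("Person", RIGID)
student = Concept("Student", ANTI_RIGID)
agent = Concept("Agent", RIGID)

# Each pair reads "child is-a parent".
links = [(student, person), (person, student), (person, agent)]

flagged = invalid_subsumptions(links)
print([(c.name, p.name) for c, p in flagged])
```

Only the "Person is-a Student" link is flagged here.  A real OntoClean review would also check identity, unity and dependence, and the hard part is agreeing on the tags in the first place, which no script can do for us.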

 

Would it be useful to start looking at the matchup of some of our "seed" ontologies in this way (or a better way that might be proposed)? The Doerr paper discusses the process of finding common conceptualizations by equivalency between a concept of "Temporality" in Ont1 and "Temporal Entity" in Ont2, or "Action" and "Activity", etc., among some of the foundation concepts.  It might be useful to make our discussions concrete if we are planning on using a merger of UMLS DOLCE/BFO, SUMO, OpenCyc, ISO 15926 etc.

 

 

Regards,

Gary Berg-Cross

 

 

 

 


_________________________________________________________________
Message Archives: http://colab.cim3.net/forum/ontac-forum/
To Post: mailto:ontac-forum@xxxxxxxxxxxxxx
Subscribe/Unsubscribe/Config: 
http://colab.cim3.net/mailman/listinfo/ontac-forum/
Shared Files: http://colab.cim3.net/file/work/SICoP/ontac/
Community Wiki: 
http://colab.cim3.net/cgi-bin/wiki.pl?SICoP/OntologyTaxonomyCoordinatingWG