Denise,

    The "controlled defining vocabulary" used by LDOCE is quite different
from a typical community controlled vocabulary.  The purpose of my proposal
to adapt such a vocabulary for ONTACWG use is to make it easier for people
to write logically precise definitions and factual statements, by providing
a vocabulary that can be accurately and automatically translated into a
logical specification for inclusion in the COSMO ontology or some extension
of it.  The LDOCE dictionary itself is not otherwise being used as a
resource for this effort.

    There are several proposals as to what this group should do first, and
no conclusion has yet been reached.

    If you can provide a sample (or the whole thing??) of the topical
thesaurus, it would be helpful in deciding how it could be used for
whatever projects are undertaken.  Likewise, examples of the other types of
data structure that you discuss would help immensely in understanding them.
  
   Pat 
    
  
Patrick Cassidy
MITRE Corporation
260 Industrial Way
Eatontown, NJ 07724
Mail Stop: MNJE
Phone: 732-578-6340
Cell: 908-565-4053
Fax: 732-578-6012
Email: pcassidy@xxxxxxxxx
 
  
   
  
  
  All, 
    
  May I ask which domain you have chosen to begin working on?  We 
  likely have rich vocabularies to contribute - our topical thesaurus for 
  development work is 200,000+ terms in English.   This source is 
  different from a 'development operations' vocabulary - which describes how 
  people do development work.  
     
  Just a couple of additional thoughts.     
    
  The reason I ask is that there are some significant differences between
  'dictionaries' and more advanced controlled vocabulary sources such as
  thesauri and semantic networks.  Dictionary entries are single words in
  the grammatical singular.  For example, WordNet would have entries
  for 'girl' and 'education' but not for 'girls education'.
  Relationships among these individual words and their grammatical
  variations (girl and girls) have already been built.  Wouldn't you want to
  begin with more sophisticated sources and look for ways to resolve
  ambiguity through contextualization?
    
  I'd also like to share with you our experience in using free-form
  expansion of grammatical variations at the search stage.  It has been
  shown to increase the number of irrelevant search results.  There seems
  to be a mistaken belief that increasing the number of results through
  automated grammatical expansion will improve recall, but in fact that may
  not be the case either.  Dictionary-level synonyms don't necessarily
  retain the context of a concept, so relevance may be lost in applications
  which build on them.  It seems to us that only when the expansion is
  'managed' at the indexing stage is there a good result on relevance.
  Doesn't the relevance of the results have to be a major consideration,
  particularly if you intend to do reasoning on the base going forward?
  Best regards,
  Denise

  -----ontac-forum-bounces@xxxxxxxxxxxxxx wrote: -----
  
  To: "ONTAC-WG General Discussion" <ontac-forum@xxxxxxxxxxxxxx>
  From: "Cassidy, Patrick J." <pcassidy@xxxxxxxxx>
  Sent by: ontac-forum-bounces@xxxxxxxxxxxxxx
  Date: 10/15/2005 10:11PM
  Subject: RE: [ontac-forum] A potential defining vocabulary for definitions

  Peter,

  I did not anticipate that other candidates for a restricted English
  defining vocabulary would be proposed, but if anyone has a specific
  interest in such an English defining vocabulary, please do send it in.
  Any that exist could be used, but I would expect that as the ONTACWG adds
  and removes words to create a vocabulary for our purposes, the content of
  such a vocabulary would quickly differ from that used in the LDOCE or
  another normal dictionary.  So the LDOCE vocabulary is merely a
  convenient starting point and illustrative example of what can be done.
  I believe that there are other dictionaries that use a restricted
  vocabulary for definitions, such as the Macmillan Student's Dictionary
  and The Cambridge Dictionary of American English (which also uses about
  2000 words).  I have not attempted to compare them to that of the LDOCE;
  for our purposes I wouldn't expect any one to be markedly better than any
  other, and I wouldn't know how to measure their utility.  If we get files
  with other basic defining vocabularies, I would be inclined to simply
  merge them and use the merged version as our starting point.

  WordNet is of course of special interest, though not as a defining
  vocabulary -- it has over 100,000 words.  Although the "glosses"
  (definitions) in WordNet do not use, as far as I am aware, any
  restrictions on the words or senses of the words, there is a project, the
  "Extended WordNet", which has tagged those glosses so as to specify the
  actual sense of each word used in most of the definitions.  See:

    http://xwn.hlt.utdallas.edu/wsd.html

  Such disambiguated glosses could be useful as examples for defining
  concepts in the COSMO, though I expect that those glosses would not in
  general be parseable by an automatic controlled-language interpreter.
  There have been programs designed to interpret dictionary definitions
  (for example, see: http://www.clres.com/dict.html), but I haven't worked
  with them, and don't know whether they would be useful for this purpose.

  The existing upper ontologies have more or less precise mappings to a lot
  of WordNet synsets (word senses), and I expect those mappings to be
  helpful in analyzing the relations among the existing upper ontologies.

  The COSMO will be an ontology in which the meanings of the concepts are
  specified logically.  It will be grounded in reality by specifying some
  unambiguously identifiable real-world instances of things that are
  intended to be instances of the logically-specified concepts.  A
  controlled defining vocabulary would have to have its terms rigidly
  aligned with the concepts in the ontology, so a "controlled English" for
  ONTACWG purposes would probably differ significantly from any
  lexicographic controlled vocabulary.

  But the LDOCE example would, I think, be a good starting point to develop
  a vocabulary specifically designed for expressing logically precise
  statements with detailed semantics associated with the defining terms.  I
  provide it also for exploratory purposes.  Those who are maintaining
  community knowledge classifications, if they are curious, might try using
  that limited set of words to define the terms in their own knowledge
  classifications.  As I mentioned, if they find that they need additional
  terms to create good definitions, those additional terms could be
  accumulated as candidates to be defined logically themselves.

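  The exercise suggested above (writing definitions with only the limited
  word list and collecting any extra words that turn out to be needed)
  could be sketched roughly as follows.  This is only an illustration under
  stated assumptions: the function names and the tiny sample word list are
  hypothetical, words are matched exactly with no handling of inflected
  forms or word senses, and nothing here reflects the format of the actual
  LDOCE file.

```python
import re


def load_vocabulary(words):
    """Normalize an iterable of defining-vocabulary words to a lowercase set."""
    return {w.strip().lower() for w in words if w.strip()}


def out_of_vocabulary(definition, vocabulary):
    """Return words in `definition` that are not in `vocabulary`.

    Words are reported once each, in order of first appearance.
    Matching is exact: 'legs' is only accepted if 'legs' itself is listed.
    """
    missing = []
    for token in re.findall(r"[a-z]+(?:'[a-z]+)?", definition.lower()):
        if token not in vocabulary and token not in missing:
            missing.append(token)
    return missing


# Hypothetical toy word list standing in for a real defining vocabulary.
SAMPLE_VOCABULARY = load_vocabulary(
    ["a", "an", "animal", "that", "has", "four", "legs",
     "and", "is", "kept", "as", "pet"]
)

candidates = out_of_vocabulary(
    "An animal that has four legs and is kept as a pet or for hunting",
    SAMPLE_VOCABULARY,
)
print(candidates)  # ['or', 'for', 'hunting']
```

  The words reported this way would be the "additional terms" to accumulate
  as candidates for logical definition; a real check would also need to
  deal with inflections and with words used in more than one sense.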
  Pat
 
  Patrick Cassidy
  MITRE Corporation
  260 Industrial Way
  Eatontown, NJ 07724
  Mail Stop: MNJE
  Phone: 732-578-6340
  Cell: 908-565-4053
  Fax: 732-578-6012
  Email: pcassidy@xxxxxxxxx
 
  -----Original Message-----
  From: ontac-forum-bounces@xxxxxxxxxxxxxx [mailto:ontac-forum-bounces@xxxxxxxxxxxxxx] On Behalf Of Peter P. Yim
  Sent: Saturday, October 15, 2005 7:57 PM
  To: ONTAC-WG General Discussion
  Subject: Re: [ontac-forum] A potential defining vocabulary for definitions

  Thank you, Pat.

  Are we inviting the group to propose other candidates?

  In addition to your

  (a) defining vocabulary from the Longman Dictionary of Contemporary
      English (LDOCE) - suggested: PatCassidy/2005.10.15

  the immediate ones that come to mind would be:

  (b) the Oxford English Dictionary (OED), and

  (c) WordNet (http://wordnet.princeton.edu/)

  It would be nice for those in the community who have working knowledge of
  the above (plus whatever other candidates) to present the pros and cons
  of each, and debate which should be adopted. ... Please.

  Regards.  =ppy
  --
 
  Cassidy, Patrick J. wrote Sat, 15 Oct 2005 18:59:59 -0400:
  > ONTACWG members:
  >
  > I have placed a file containing the defining vocabulary from the Longman
  > Dictionary of Contemporary English (LDOCE) in the reference folder at:
  >
  > http://colab.cim3.net/file/work/SICoP/ontac/reference/LDOCE-definingVocabularyList.txt
  >
  > This is the set of words (about 2200) used by Longman to define all of
  > the 56,000 words and phrases in its dictionary.  That dictionary is
  > intended to be understandable by those learning English, and the editors
  > made a conscious effort to write clearly understandable definitions
  > using the minimum vocabulary.  Many of the words are used in more than
  > one sense, as with words that have multiple parts of speech; the actual
  > number of senses used may be more than 4000.  This is an example of the
  > practice of specifying the meanings of terms or concepts using a
  > relatively small set of defining concepts.  This is analogous to the
  > process by which we hope the Common Semantic Model (COSMO) will enable
  > semantic interoperability of knowledge-based systems built by different
  > groups, by providing a common conceptual defining vocabulary that will
  > be independent of the terms used in community knowledge classification
  > systems, but capable of precisely specifying the meanings of the
  > community terms.
  >
  > The Longman defining vocabulary could also serve as a starting point for
  > the development of an English defining vocabulary for ONTACWG, which
  > could be used to make it easier to create logically precise definitions,
  > and assertions of fact.  There are several "Controlled English"
  > programs that have been used to make logical statements in an
  > English-like grammar.  If we have a vocabulary of words with precisely
  > defined meanings, it should be possible to allow definitions to be
  > phrased in normal but moderately restricted English, and be interpreted
  > correctly by the translator program.  Some ambiguity in the defining
  > vocabulary should be resolvable by the lexical context, but it is
  > possible that the full range of meanings actually used in the LDOCE will
  > be too wide to be resolvable, and the "defining vocabulary" or the
  > grammar for defining terms in ONTACWG databases may need to be more
  > restricted than the language the editors of LDOCE use.
  >
  > As with the COSMO, an English "Defining vocabulary" would be open to
  > additions as required to accommodate the needs of the different
  > communities.  It will always be convenient for specialized communities
  > to use terms with specific meanings in their contexts of interest,
  > including very technical terms.  If those terms themselves could be
  > defined by both the logical specifications of the COSMO and the
  > restricted vocabulary of an ONTACWG "English defining vocabulary", they
  > would constitute specialized extensions of the COSMO and English
  > vocabularies.  Then natural English definitions even in those technical
  > areas could be created with accurately interpretable meanings.
  >
  > Attempting to create definitions of community-specific terms using such
  > a defining vocabulary could help to recognize when the logical concept
  > inventory of the COSMO is inadequate and needs supplementation, if it
  > becomes necessary to use English terms that have no associated concept
  > in the COSMO.  Prima facie cases like that could allow domain
  > specialists with only modest familiarity with the COSMO to help the
  > maintenance team to decide which extensions should have greatest
  > priority.  Simple tools like a spell-checker using only the defining
  > vocabulary as its dictionary would help in using that vocabulary for
  > creating precise definitions.
  >
  > It is likely that similar controlled natural language vocabularies and
  > grammars could be created for other languages, but I myself have no
  > acquaintance with such work.
  >
  > To use an existing controlled-language system to create definitions for
  > the ONTACWG would require adaptation of such a system to reference the
  > COSMO ontology.  This may take considerable effort, so it will probably
  > be necessary to find projects that are ongoing and for which someone who
  > is familiar with the system will be able to spend some time doing the
  > adaptation.  If any ONTACWG members are acquainted with such a project,
  > perhaps an inquiry to the developers would provide us with information
  > to determine the feasibility of adaptation in each case.  I will be
  > happy to participate in discussions of such a possibility.  Feel free to
  > send suggestions to me directly, or to the list.
  >
  > Pat
  >
  > Patrick Cassidy
  > MITRE Corporation
  > 260 Industrial Way
  > Eatontown, NJ 07724
  > Mail Stop: MNJE
  > Phone: 732-578-6340
  > Cell: 908-565-4053
  > Fax: 732-578-6012
  > Email: pcassidy at mitre.org
  >
  > ------------------------------------------------------------------------
  >
  > _________________________________________________________________
  > Message Archives: http://colab.cim3.net/forum/ontac-forum/
  > To Post: mailto:ontac-forum@xxxxxxxxxxxxxx
  > Subscribe/Unsubscribe/Config: http://colab.cim3.net/mailman/listinfo/ontac-forum/
  > Shared Files: http://colab.cim3.net/file/work/SICoP/ontac/
  > Community Wiki: http://colab.cim3.net/cgi-bin/wiki.pl?SICoP/OntologyTaxonomyCoordinatingWG
  
  
 