ontac-forum

RE: [ontac-forum] A potential defining vocabulary for definitions

To: ONTAC-WG General Discussion <ontac-forum@xxxxxxxxxxxxxx>
From: dbedford@xxxxxxxxxxxxx
Date: Sun, 16 Oct 2005 12:23:08 -0400
Message-id: <OFD0045331.D5B7FCAA-ON8525709C.005A0246-8525709C.005A0254@xxxxxxxxxxxxx>
All,
 
May I ask which domain you have chosen to begin working on?  We likely have rich vocabularies to contribute - our topical thesaurus for development work has 200,000+ terms in English.  This source is different from a 'development operations' vocabulary, which describes how people do development work.
 
Just a couple of additional thoughts.  
 
The reason I ask is that there are some significant differences between 'dictionaries' and more advanced controlled vocabulary sources such as thesauri and semantic networks.  Dictionaries consist of single-word, grammatically singular entries.  For example, WordNet would have entries for 'girl' and 'education' but not for 'girls education'.  Relationships among these individual words and their grammatical variations (girl and girls) have already been built.  Wouldn't you want to begin with more sophisticated sources and look for ways to resolve ambiguity through contextualization?
 
I'd also like to share with you our experience in using free-form expansion of grammatical variations at the search stage.  The results show that it increases irrelevance in search results.  There seems to be a false idea that increasing results through automated grammatical expansion will improve recall, but in fact that may not be the case either.  Dictionary-level synonyms don't necessarily retain the context of a concept, so relevance may be lost in applications which build on them.  It seems to us that only when the expansion is 'managed' at the indexing stage is there a good result on relevance.  Doesn't the relevance of the results have to be a major consideration, particularly if you intend to do reasoning on the base going forward?
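
To make the contrast concrete, here is a minimal sketch of the two approaches. It is not the system described above: the documents, the MANAGED_TERMS table, and all function names are hypothetical, and the "free-form" expansion is deliberately crude (it only strips a trailing "s" from each query word).

```python
from collections import defaultdict

# Hypothetical managed mapping from surface phrases to one controlled concept.
MANAGED_TERMS = {
    "girls education": "girls' education",
    "education of girls": "girls' education",
}

# Hypothetical document collection, keyed by document id.
DOCS = {
    1: "Funding for girls education in rural areas",
    2: "The girl scouts annual report",
    3: "Higher education finance reform",
    4: "Improving the education of girls",
}

def naive_search(documents, query):
    """Free-form expansion at search time: match any document that
    contains any crudely stemmed query word."""
    stems = {word.rstrip("s") for word in query.lower().split()}
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(stem in text.lower() for stem in stems)
    )

def build_index(documents):
    """Managed expansion at indexing time: tag each document with the
    controlled concept for every managed phrase it contains."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        lowered = text.lower()
        for phrase, concept in MANAGED_TERMS.items():
            if phrase in lowered:
                index[concept].add(doc_id)
    return index

def managed_search(index, query):
    """Look the query up as a managed phrase and return tagged documents."""
    concept = MANAGED_TERMS.get(query.lower())
    return sorted(index.get(concept, set()))

index = build_index(DOCS)
print(naive_search(DOCS, "girls education"))     # [1, 2, 3, 4]
print(managed_search(index, "girls education"))  # [1, 4]
```

In this toy collection the search-time expansion also returns the documents about girl scouts and higher education, while the index-time mapping returns only the two documents about the concept itself.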

Best regards,
Denise
-----ontac-forum-bounces@xxxxxxxxxxxxxx wrote: -----

To: "ONTAC-WG General Discussion" <ontac-forum@xxxxxxxxxxxxxx>
From: "Cassidy, Patrick J." <pcassidy@xxxxxxxxx>
Sent by: ontac-forum-bounces@xxxxxxxxxxxxxx
Date: 10/15/2005 10:11PM
Subject: RE: [ontac-forum] A potential defining vocabulary for definitions

Peter,
  I did not anticipate that other candidates for a restricted English
defining vocabulary would be proposed, but if anyone has a specific
interest in such an English defining vocabulary, please do send it in.
Any that exist could be used, but I would expect that as the ONTACWG
adds and subtracts to create a vocabulary for our purposes the content
of such a vocabulary would quickly differ from that used in the LDOCE
or another normal dictionary.  So the LDOCE vocabulary is merely a
convenient starting point and illustrative example of what can be done.
I believe that there are other dictionaries that use a restricted
vocabulary for definitions, such as the Macmillan Student's Dictionary
and The Cambridge Dictionary of American English (which also uses about
2000 words).  I have not attempted to compare them to that of the
LDOCE; for our purposes I wouldn't expect any one to be markedly better
than any other, and I wouldn't know how to measure their utility.  If
we get files with other basic defining vocabularies, I would be
inclined to simply merge them and use the merged version as our
starting point.
  WordNet is of course of special interest, though not as a defining
vocabulary -- it has over 100,000 words. Although the "glosses"
(definitions) in WordNet do not use, as far as I am aware, any
restrictions on the words or senses of the words, there is a project,
the "Extended WordNet", which has tagged those glosses so as to specify
the actual sense of each word used in most of the definitions. See:
   http://xwn.hlt.utdallas.edu/wsd.html

Such disambiguated glosses could be useful as examples for defining
concepts in the COSMO, though I expect that those glosses would not in
general be parseable by an automatic controlled-language interpreter.
There have been programs designed to interpret dictionary definitions
(for example, see: http://www.clres.com/dict.html), but I haven't
worked with them, and don't know whether they would be useful for this
purpose.  The existing upper ontologies have more or less precise
mappings to a lot of WordNet synsets (word senses), and I expect those
mappings to be helpful in analyzing the relations among the existing
upper ontologies.

The COSMO will be an ontology in which the meanings of the concepts are
specified logically.  It will be grounded in reality by specifying some
unambiguously identifiable real-world instances of things that are
intended to be instances of the logically-specified concepts.  A
controlled defining vocabulary would have to have its terms rigidly
aligned with the concepts in the ontology, so a "controlled English"
for ONTACWG purposes would probably differ significantly from any
lexicographic controlled vocabulary.  But the LDOCE example would, I
think, be a good starting point to develop a vocabulary specifically
designed for expressing logically precise statements with detailed
semantics associated with the defining terms.  I provide it also for
exploratory purposes.  Those who are maintaining community knowledge
classifications, if they are curious, might try using those limited
words to define the terms in their own knowledge classifications.  As I
mentioned, if they find that they need additional terms to create good
definitions, those additional terms could be accumulated as candidates
for themselves being defined logically.
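
The exercise described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the tiny sample vocabulary are hypothetical, the real list would be loaded from the LDOCE file, and no handling of inflected forms is attempted, so a plural or past tense counts as a new word unless it is listed.

```python
import re

def out_of_vocabulary(definition, defining_vocabulary):
    """Return the words of `definition` that are not in
    `defining_vocabulary`, in order of first appearance."""
    vocab = {word.lower() for word in defining_vocabulary}
    missing = []
    # Tokenize into lowercase alphabetic words, keeping internal apostrophes.
    for word in re.findall(r"[a-z]+(?:'[a-z]+)?", definition.lower()):
        if word not in vocab and word not in missing:
            missing.append(word)
    return missing

# Hypothetical miniature vocabulary, standing in for the full defining list.
sample_vocabulary = ["a", "that", "person", "who", "teaches",
                     "children", "in", "school"]

candidates = out_of_vocabulary("A person who teaches pupils in a school",
                               sample_vocabulary)
print(candidates)  # ['pupils']
```

Words returned this way are the ones that would be accumulated as candidates for logical definition.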

Pat


Patrick Cassidy
MITRE Corporation
260 Industrial Way
Eatontown, NJ 07724
Mail Stop: MNJE
Phone: 732-578-6340
Cell: 908-565-4053
Fax: 732-578-6012
Email: pcassidy@xxxxxxxxx


-----Original Message-----
From: ontac-forum-bounces@xxxxxxxxxxxxxx
[mailto:ontac-forum-bounces@xxxxxxxxxxxxxx] On Behalf Of Peter P. Yim
Sent: Saturday, October 15, 2005 7:57 PM
To: ONTAC-WG General Discussion
Subject: Re: [ontac-forum] A potential defining vocabulary for
definitions

Thank you, Pat.

Are we inviting the group to propose other candidates?

In addition to your

(a) defining vocabulary from the Longman Dictionary of
Contemporary English (LDOCE) - suggested: PatCassidy/2005.10.15

The immediate ones that come to mind would be:

(b) the Oxford English Dictionary (OED), and

(c) WordNet (http://wordnet.princeton.edu/)

It would be nice for those in the community who have working
knowledge of the above (plus whatever other candidates) to
present the pros and cons of each, and debate on which should be
adopted. ... Please.

Regards.  =ppy
--


Cassidy, Patrick J. wrote Sat, 15 Oct 2005 18:59:59 -0400:
> ONTACWG members:
>  
>
> I have placed a file containing the defining vocabulary from the Longman
> Dictionary of Contemporary English (LDOCE) in the reference folder at:
>
> http://colab.cim3.net/file/work/SICoP/ontac/reference/LDOCE-definingVocabularyList.txt
>  
>
> This is the set of words (about 2200) used by Longman to define all of
> the 56,000 words and phrases in its dictionary.  That dictionary is
> intended to be understandable by those learning English, and the editors
> made a conscious effort to write clearly understandable definitions
> using the minimum vocabulary.  Many of the words are used in more than
> one sense, as with words that have multiple parts of speech; the actual
> number of senses used may be more than 4000.  This is an example of the
> practice of specifying the meanings of terms or concepts using a
> relatively small set of defining concepts.  This is analogous to the
> process by which we hope the Common Semantic Model (COSMO) will enable
> semantic interoperability of knowledge-based systems built by different
> groups, by providing a common conceptual defining vocabulary that will
> be independent of the terms used in community knowledge classification
> systems, but capable of precisely specifying the meanings of the
> community terms.
>
>  
>
> The Longman defining vocabulary could also serve as a starting point for
> the development of an English defining vocabulary for ONTACWG, which
> could be used to make it easier to create logically precise definitions,
> and assertions of fact.  There are several "Controlled English"
> programs that have been used to make logical statements in an
> English-like grammar.  If we have a vocabulary of words with precisely
> defined meanings, it should be possible to allow definitions to be
> phrased in normal but moderately restricted English, and be interpreted
> correctly by the translator program.  Some ambiguity in the defining
> vocabulary should be resolvable by the lexical context, but it is
> possible that the full range of meanings actually used in the LDOCE will
> be too wide to be resolvable, and the "defining vocabulary" or the
> grammar for defining terms in ONTACWG databases may need to be more
> restricted than the language the editors of LDOCE use.
>
>  
>
> As with the COSMO, an English "Defining vocabulary" would be open to
> additions as required to accommodate the needs of the different
> communities.  It will always be convenient for specialized communities
> to use terms with specific meanings in their contexts of interest,
> including very technical terms.  If those terms themselves could be
> defined by both the logical specifications of the COSMO and the
> restricted vocabulary of an ONTACWG "English defining vocabulary", they
> would constitute specialized extensions of the COSMO and English
> vocabularies.  Then natural English definitions even in those technical
> areas could be created with accurately interpretable meanings.
>
>  
>
> Attempting to create definitions of community-specific terms using such
> a defining vocabulary could help to recognize when the logical concept
> inventory of the COSMO is inadequate and needs supplementation, if it
> becomes necessary to use English terms that have no associated concept
> in the COSMO.  Prima facie cases like that could allow domain
> specialists with only modest familiarity with the COSMO to help the
> maintenance team to decide which extensions should have greatest
> priority.  Simple tools like a spell-checker using only the defining
> vocabulary as its dictionary would help in using that vocabulary for
> creating precise definitions.
>
>  
>
> It is likely that similar controlled natural language vocabularies and
> grammars could be created for other languages, but I myself have no
> acquaintance with such work.
>
>  
>
> To use an existing controlled-language system to create definitions for
> the ONTACWG would require adaptation of such a system to reference the
> COSMO ontology.  This may take considerable effort, so it will probably
> be necessary to find projects that are ongoing and for which someone who
> is familiar with the system will be able to spend some time doing the
> adaptation.  If any ONTACWG members are acquainted with such a project,
> perhaps an inquiry to the developers would provide us with information
> to determine the feasibility of adaptation in each case.  I will be
> happy to participate in discussions of such a possibility.  Feel free to
> send suggestions to me directly, or to the list.
>
>  
>
> Pat
>
> Patrick Cassidy
> MITRE Corporation
> 260 Industrial Way
> Eatontown, NJ 07724
> Mail Stop: MNJE
> Phone: 732-578-6340
> Cell: 908-565-4053
> Fax: 732-578-6012
> Email: pcassidy at mitre.org
>
>  
>
>
>
> ------------------------------------------------------------------------
>
>  
> _________________________________________________________________
> Message Archives: http://colab.cim3.net/forum/ontac-forum/
> To Post: mailto:ontac-forum@xxxxxxxxxxxxxx
> Subscribe/Unsubscribe/Config: http://colab.cim3.net/mailman/listinfo/ontac-forum/
> Shared Files: http://colab.cim3.net/file/work/SICoP/ontac/
> Community Wiki: http://colab.cim3.net/cgi-bin/wiki.pl?SICoP/OntologyTaxonomyCoordinatingWG
