Denise,
The "controlled defining vocabulary"
used by LDOCE is quite different from a typical community controlled
vocabulary. My proposal to adapt such a vocabulary for
ONTACWG purposes is to make it easier for people to write logically precise
definitions and factual statements, by providing a vocabulary that can be
accurately and automatically translated into a logical specification for
inclusion in the COSMO ontology or some extension of it. The LDOCE
dictionary itself is not otherwise being used as a resource for this
effort.
There are several proposals as to what
this group should do first, and no conclusion has yet been
reached.
If you can provide a sample (or the
whole thing??) of the topical thesaurus, it would be helpful in deciding how
that could be used for whatever projects are undertaken. Likewise,
examples of the other types of data structure that you discuss would help
immensely in understanding them.
Pat
Patrick Cassidy
MITRE Corporation
260 Industrial Way
Eatontown, NJ 07724
Mail Stop: MNJE
Phone: 732-578-6340
Cell: 908-565-4053
Fax: 732-578-6012
Email: pcassidy@xxxxxxxxx
All,
May I ask which domain you have chosen to begin working on? We
likely have rich vocabularies to contribute - our topical thesaurus for
development work is 200,000+ terms in English. This source is
different from a 'development operations' vocabulary - which describes how
people do development work.
Just a couple of additional thoughts.
The reason I ask is that there are some significant differences between
'dictionaries' and more advanced controlled vocabulary sources such as
thesauri and semantic networks. Dictionaries are single word,
grammatically singular entries. For example, WordNet would have entries
for 'girl' and 'education' but not for 'girls education'.
Relationships among these individual words and their grammatical variations (girl
and girls) have already been built. Wouldn't you want to begin with more
sophisticated sources and look for ways to resolve ambiguity through
contextualization?
I'd also like to share with you our experiences in using free form
expansion of grammatical variations at the search stage. It has been
shown to increase irrelevance in search results. It seems that
there is a false idea that increasing results through automated grammatical
expansion will improve recall, but in fact that may not be the case
either. Dictionary level synonyms don't necessarily retain the context
of a concept so relevance may be lost in applications which build on
them. It seems to us that only when the expansion is 'managed' at the
indexing stage is there a good result on relevance. Doesn't the
relevance of the results have to be a major consideration, particularly if you
intend to do reasoning on the base going forward?
Best regards,
Denise

-----ontac-forum-bounces@xxxxxxxxxxxxxx wrote: -----
To: "ONTAC-WG General Discussion" <ontac-forum@xxxxxxxxxxxxxx>
From: "Cassidy, Patrick J." <pcassidy@xxxxxxxxx>
Sent by: ontac-forum-bounces@xxxxxxxxxxxxxx
Date: 10/15/2005 10:11PM
Subject: RE: [ontac-forum] A potential defining vocabulary for definitions
Peter,
I did not
anticipate that other candidates for a restricted English defining
vocabulary would be proposed, but if anyone has a specific interest in
such an English defining vocabulary, please do send it in. Any that exist
could be used, but I would expect that as the ONTACWG adds and subtracts
to create a vocabulary for our purposes the content of such a vocabulary
would quickly differ from that used in the LDOCE or another normal
dictionary. So the LDOCE vocabulary is merely a convenient starting
point and illustrative example of what can be done. I believe that there
are other dictionaries that use a restricted vocabulary for definitions,
such as the Macmillan Student's Dictionary and The Cambridge Dictionary
of American English (which also uses about 2000 words). I have not
attempted to compare them to that of the LDOCE; for our purposes I
wouldn't expect any one to be markedly better than any other, and I
wouldn't know how to measure their utility. If we get files with
other basic defining vocabularies, I would be inclined to simply merge
them and use the merged version as our starting point. WordNet
is of course of special interest, though not as a defining vocabulary --
it has over 100,000 words. Although the "glosses" (definitions) in
WordNet do not use, as far as I am aware, any restrictions on the words or
senses of the words, there is a project, the "Extended WordNet", which
has tagged those glosses so as to specify the actual sense of each word
used in most of the definitions. See: http://xwn.hlt.utdallas.edu/wsd.html
Such
disambiguated glosses could be useful as examples for defining concepts
in the COSMO, though I expect that those glosses would not in general be
parseable by an automatic controlled-language interpreter. There have
been programs designed to interpret dictionary definitions (for example,
see: http://www.clres.com/dict.html), but I haven't worked
with them, and don't know whether they would be useful for this purpose.
The existing upper ontologies have more or less precise mappings to
a lot of WordNet synsets (word senses), and I expect those mappings to be
helpful in analyzing the relations among the existing upper
ontologies.
The COSMO will be an ontology in which the meanings of
the concepts are specified logically. It will be grounded in
reality by specifying some unambiguously identifiable real-world
instances of things that are intended to be instances of the
logically-specified concepts. A controlled defining vocabulary
would have to have its terms rigidly aligned with the concepts in the
ontology, so a "controlled English" for ONTACWG purposes would probably
differ significantly from any lexicographic controlled vocabulary.
But the LDOCE example would, I think, be a good starting point to
develop a vocabulary specifically designed for expressing logically
precise statements with detailed semantics associated with the defining
terms. I provide it also for exploratory purposes. Those who
are maintaining community knowledge classifications, if they are curious,
might try using those limited words to define the terms in their own
knowledge classifications. As I mentioned, if they find that they
need additional terms to create good definitions, those additional terms
could be accumulated as candidates for themselves being defined
logically.
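The exercise suggested above (writing definitions with only the limited word list, and collecting any extra words that turn out to be needed) could be sketched roughly as follows. This is only a minimal illustration, not an ONTACWG tool: the function names, the tiny sample word lists, and the naive whole-word matching are all assumptions made for the example. A real check would load the full defining vocabulary file and would need to handle inflected forms and word senses.

```python
import re


def merge_vocabularies(*vocabularies):
    """Merge several defining vocabularies into one lowercase set."""
    merged = set()
    for vocab in vocabularies:
        merged.update(word.lower() for word in vocab)
    return merged


def words_outside_vocabulary(definition, vocabulary):
    """Return the words of a definition that are not in the vocabulary,
    in order of first use and without repeats."""
    tokens = re.findall(r"[a-z]+", definition.lower())
    missing = []
    for token in tokens:
        if token not in vocabulary and token not in missing:
            missing.append(token)
    return missing


# Hypothetical sample word lists standing in for two defining vocabularies.
vocab_a = ["a", "animal", "that", "has", "four", "legs"]
vocab_b = ["An", "and", "is", "kept", "as", "pet"]
vocabulary = merge_vocabularies(vocab_a, vocab_b)

definition = "An animal that has four legs and is kept as a domesticated pet"
candidates = words_outside_vocabulary(definition, vocabulary)
print(candidates)  # ['domesticated']
```

Words reported this way would be the candidates either for rewording the definition or for being added to the vocabulary and defined logically themselves.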
Pat
Patrick Cassidy
MITRE Corporation
260 Industrial Way
Eatontown, NJ 07724
Mail Stop: MNJE
Phone: 732-578-6340
Cell: 908-565-4053
Fax: 732-578-6012
Email: pcassidy@xxxxxxxxx
-----Original Message-----
From: ontac-forum-bounces@xxxxxxxxxxxxxx [mailto:ontac-forum-bounces@xxxxxxxxxxxxxx] On Behalf Of Peter P. Yim
Sent: Saturday, October 15, 2005 7:57 PM
To: ONTAC-WG General Discussion
Subject: Re: [ontac-forum] A potential defining vocabulary for definitions
Thank you, Pat.
Are we inviting the group to propose other candidates?

In addition to your
(a) defining vocabulary from the Longman Dictionary of Contemporary English
(LDOCE) - suggested: PatCassidy/2005.10.15

The immediate ones that come to mind would be:
(b) the Oxford English Dictionary (OED), and
(c) Wordnet (http://wordnet.princeton.edu/)

It would be nice for those in the community who have working knowledge of
the above (plus whatever other candidates) to present the pros and cons of
each, and debate which should be adopted. ... Please.
Regards.
=ppy --
Cassidy, Patrick J. wrote Sat, 15 Oct 2005 18:59:59 -0400:
> ONTACWG members:
>
> I have placed a file containing the defining vocabulary from the Longman
> Dictionary of Contemporary English (LDOCE) in the reference folder at:
>
> http://colab.cim3.net/file/work/SICoP/ontac/reference/LDOCE-definingVocabularyList.txt
>
> This is the set of words (about 2200) used by Longman to define all of
> the 56,000 words and phrases in its dictionary.  That dictionary is
> intended to be understandable by those learning English, and the editors
> made a conscious effort to write clearly understandable definitions
> using the minimum vocabulary.  Many of the words are used in more than
> one sense, as with words that have multiple parts of speech; the actual
> number of senses used may be more than 4000.  This is an example of the
> practice of specifying the meanings of terms or concepts using a
> relatively small set of defining concepts.  This is analogous to the
> process by which we hope the Common Semantic Model (COSMO) will enable
> semantic interoperability of knowledge-based systems built by different
> groups, by providing a common conceptual defining vocabulary that will
> be independent of the terms used in community knowledge classification
> systems, but capable of precisely specifying the meanings of the
> community terms.
>
> The Longman defining vocabulary could also serve as a starting point for
> the development of an English defining vocabulary for ONTACWG, which
> could be used to make it easier to create logically precise definitions,
> and assertions of fact.  There are several "Controlled English"
> programs that have been used to make logical statements in an
> English-like grammar.  If we have a vocabulary of words with precisely
> defined meanings, it should be possible to allow definitions to be
> phrased in normal but moderately restricted English, and be interpreted
> correctly by the translator program.  Some ambiguity in the defining
> vocabulary should be resolvable by the lexical context, but it is
> possible that the full range of meanings actually used in the LDOCE will
> be too wide to be resolvable, and the "defining vocabulary" or the
> grammar for defining terms in ONTACWG databases may need to be more
> restricted than the language the editors of LDOCE use.
>
> As with the COSMO, an English "Defining vocabulary" would be open to
> additions as required to accommodate the needs of the different
> communities.  It will always be convenient for specialized communities
> to use terms with specific meanings in their contexts of interest,
> including very technical terms.  If those terms themselves could be
> defined by both the logical specifications of the COSMO and the
> restricted vocabulary of an ONTACWG "English defining vocabulary", they
> would constitute specialized extensions of the COSMO and English
> vocabularies.  Then natural English definitions even in those technical
> areas could be created with accurately interpretable meanings.
>
> Attempting to create definitions of community-specific terms using such
> a defining vocabulary could help to recognize when the logical concept
> inventory of the COSMO is inadequate and needs supplementation, if it
> becomes necessary to use English terms that have no associated concept
> in the COSMO.  Prima facie cases like that could allow domain
> specialists with only modest familiarity with the COSMO to help the
> maintenance team to decide which extensions should have greatest
> priority.  Simple tools like a spell-checker using only the defining
> vocabulary as its dictionary would help in using that vocabulary for
> creating precise definitions.
>
> It is likely that similar controlled natural language vocabularies and
> grammars could be created for other languages, but I myself have no
> acquaintance with such work.
>
> To use an existing controlled-language system to create definitions for
> the ONTACWG would require adaptation of such a system to reference the
> COSMO ontology.  This may take considerable effort, so it will probably
> be necessary to find projects that are ongoing and for which someone who
> is familiar with the system will be able to spend some time doing the
> adaptation.  If any ONTACWG members are acquainted with such a project,
> perhaps an inquiry to the developers would provide us with information
> to determine the feasibility of adaptation in each case.  I will be
> happy to participate in discussions of such a possibility.  Feel free to
> send suggestions to me directly, or to the list.
>
> Pat
>
> Patrick Cassidy
> MITRE Corporation
> 260 Industrial Way
> Eatontown, NJ 07724
> Mail Stop: MNJE
> Phone: 732-578-6340
> Cell: 908-565-4053
> Fax: 732-578-6012
> Email: pcassidy at mitre.org
>
> ------------------------------------------------------------------------
>
> _________________________________________________________________
> Message Archives: http://colab.cim3.net/forum/ontac-forum/
> To Post: mailto:ontac-forum@xxxxxxxxxxxxxx
> Subscribe/Unsubscribe/Config: http://colab.cim3.net/mailman/listinfo/ontac-forum/
> Shared Files: http://colab.cim3.net/file/work/SICoP/ontac/
> Community Wiki: http://colab.cim3.net/cgi-bin/wiki.pl?SICoP/OntologyTaxonomyCoordinatingWG