ONTACWG members:

I have placed
a file containing the defining vocabulary from the Longman Dictionary of
Contemporary English (LDOCE) in the reference folder at:
http://colab.cim3.net/file/work/SICoP/ontac/reference/LDOCE-definingVocabularyList.txt
This is the
set of words (about 2200) used by Longman to define all of the 56,000 words and
phrases in its dictionary. That
dictionary is intended to be understandable by those learning English, and the
editors made a conscious effort to write clearly understandable definitions
using the minimum vocabulary. Many
of the words are used in more than one sense, as with words that have multiple
parts of speech; the actual number of senses used may be more than 4000. This is an example of the practice of
specifying the meanings of terms or concepts using a relatively small set of
defining concepts. This is
analogous to the process by which we hope the Common Semantic Model (COSMO) will
enable semantic interoperability of knowledge-based systems built by different
groups, by providing a common conceptual defining vocabulary that will be
independent of the terms used in community knowledge classification systems,
but capable of precisely specifying the meanings of the community terms.
The Longman
defining vocabulary could also serve as a starting point for the development of
an English defining vocabulary for ONTACWG, which could be used to make it
easier to create logically precise definitions and assertions of fact. There are several "Controlled English" programs that have
been used to make logical statements in an English-like grammar. If we have a vocabulary of words with
precisely defined meanings, it should be possible to allow definitions to be
phrased in normal but moderately restricted English, and to be interpreted
correctly by a translator program.
Some ambiguity in the defining vocabulary should be resolvable by the
lexical context, but it is possible that the full range of meanings actually
used in the LDOCE will be too wide to be resolvable, in which case the "defining vocabulary" or the grammar
for defining terms in ONTACWG databases may need to be more restricted than the
language the editors of LDOCE use.
As with the
COSMO, an English "defining vocabulary" would be open to additions as required
to accommodate the needs of the different communities. It will always be convenient for
specialized communities to use terms with specific meanings in their contexts of
interest, including very technical terms.
If those terms themselves could be defined by both the logical
specifications of the COSMO and the restricted vocabulary of an ONTACWG "English
defining vocabulary", they would constitute specialized extensions of the COSMO
and English vocabularies. Then
natural English definitions even in those technical areas could be created with accurately
interpretable meanings.
Attempting to
create definitions of community-specific terms using such a defining vocabulary
could help to recognize when the logical concept inventory of the COSMO is
inadequate and needs supplementation, if it becomes necessary to use English
terms that have no associated concept in the COSMO. Such prima facie cases could allow
domain specialists with only modest familiarity with the COSMO to help the
maintenance team decide which extensions should have the greatest priority. Simple tools like a spell-checker using
only the defining vocabulary as its dictionary would help in using that
vocabulary for creating precise definitions.
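
As a rough illustration of the spell-checker idea, here is a minimal Python sketch that reports which words in a draft definition fall outside a defining vocabulary. It is only a sketch under several assumptions: the function names are hypothetical, the vocabulary is assumed to be available as one entry per line (I have not checked that the LDOCE file is laid out that way), the small word list is a stand-in rather than the real LDOCE list, and no attempt is made to handle inflected forms or to distinguish word senses.

```python
import re


def load_vocabulary(lines):
    """Build a set of lowercase entries from an iterable of lines (one entry per line assumed)."""
    return {line.strip().lower() for line in lines if line.strip()}


def words_outside_vocabulary(definition, vocabulary):
    """Return the words in `definition` that are not in `vocabulary`, in order of first appearance."""
    seen = set()
    unknown = []
    # Treat runs of letters, optionally joined by an apostrophe or hyphen, as words.
    for word in re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", definition):
        key = word.lower()
        if key not in vocabulary and key not in seen:
            seen.add(key)
            unknown.append(key)
    return unknown


# Stand-in vocabulary for illustration only; not the actual LDOCE list.
sample_vocabulary = load_vocabulary(["a", "animal", "four", "has", "legs", "that", "with"])

print(words_outside_vocabulary("A quadruped is an animal that has four legs.", sample_vocabulary))
# prints ['quadruped', 'is', 'an']
```

A word flagged this way is either a term that still needs its own definition, or a sign that the defining vocabulary (and possibly the COSMO) needs an addition.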
It is likely
that similar controlled natural language vocabularies and grammars could be
created for other languages, but I am not acquainted with any such
work.
To use an
existing controlled-language system to create definitions for the ONTACWG would
require adaptation of such a system to reference the COSMO ontology. This may take considerable effort, so it
will probably be necessary to find projects that are ongoing and for which
someone who is familiar with the system will be able to spend some time doing
the adaptation. If any ONTACWG
members are acquainted with such a project, perhaps an inquiry to the developers
would provide us with information to determine the feasibility of adaptation in
each case. I will be happy to
participate in discussions of such a possibility. Feel free to send suggestions to me
directly, or to the list.
Pat