Extending Semantic Interoperability
To Legacy Systems
And an Unpredictable Future
John F. Sowa
VivoMind Intelligence, Inc.
15 August 2006
http://www.jfsowa.com/talks/extend.pdf
Interoperability and Ontology
Computer systems interoperate by passing messages.
Every message has a meaning (semantics) and a purpose (pragmatics).
The role of ontology is to make the semantics and pragmatics explicit
in terms of the people, places, things, events, and properties involved.
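To make the distinction concrete, here is a minimal Python sketch of a message envelope. It assumes a hypothetical format in which the content (the semantics) is a predicate applied to arguments and the purpose (the pragmatics) is a speech-act label, loosely in the style of agent communication languages. The field names, the labels "inform" and "query", and the `describe` helper are illustrative assumptions, not part of any standard.

```python
from dataclasses import dataclass

# Hypothetical message envelope; field names and speech-act labels are
# illustrative assumptions, not taken from any particular standard.
@dataclass(frozen=True)
class Message:
    speech_act: str     # pragmatics: the sender's purpose, e.g. "inform" or "query"
    sender: str
    receiver: str
    content: tuple      # semantics: (predicate, arg1, arg2, ...)

def describe(msg: Message) -> str:
    """Render a message as 'sender -> receiver [speech act] Predicate(args)'."""
    predicate, *args = msg.content
    return f"{msg.sender} -> {msg.receiver} [{msg.speech_act}] {predicate}({', '.join(args)})"

# Same proposition, different purposes: the semantics agree, the pragmatics differ.
told = Message("inform", "circulation", "catalog", ("Available", "copy42"))
asked = Message("query", "catalog", "circulation", ("Available", "copy42"))
print(describe(told))   # circulation -> catalog [inform] Available(copy42)
print(describe(asked))  # catalog -> circulation [query] Available(copy42)
```

Both messages carry the same proposition; only the speech act tells the receiver whether to record it or to answer it.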
Questions:
- How can ontology facilitate interoperability?
- What if the systems have different ontologies?
- Or no explicit ontology?
- Or a dynamically changing ontology?
- How can we support a smooth migration path
from legacy systems to future systems?
To Predict the Future — Look at the Past
Questions:
- How have formal, logic-based ontologies been used?
- Are there any success stories for formal ontologies?
- Are there any success stories for formal methods of any kind?
- How do they compare with informal methods?
- Can formal and informal systems coexist?
- Is it possible to migrate from informal systems to formal ones?
- Is it possible to derive formal structures from informal resources?
Early History
1940s:
- Vannevar Bush: Proposed the WWW in 1945, as the Memex in "As We May Think".
- Warren Weaver: Suggested machine translation
as an important application for "Giant Brains".
1950s:
- Formal grammars for natural and artificial languages.
- Semantic networks for machine translation.
- Knowledge representation languages for artificial intelligence.
- Hao Wang's theorem prover took a total of 7 minutes to prove
the first 378 theorems of Principia Mathematica on an IBM 704.
The New Field of Computer Science
1960s:
- First compiler-compilers based on formal methods.
- First Object-Oriented Programming Language (Simula 67).
- Vienna Definition Language (VDL) for formal specifications.
- Ted Codd proposes logic as a foundation for relational databases.
- Petri defines networks for specifying interacting events.
- Dijkstra designs a provably correct operating system.
In 5 years of operation, the only bugs were minor coding errors.
- Mac Hack chess program beats the philosopher Hubert Dreyfus,
who had claimed that a chess program could never beat an amateur.
- IBM implements the Generalized Markup Language (GML),
which later became SGML, HTML, and XML.
- Research terminated on the Georgetown Automatic Translator (GAT),
but under the name Systran it is very widely used today,
and under the name Babelfish it is available on the WWW.
Computer Science Reaches Maturity
1970s:
- Xerox PARC invents the WIMPy user interface
(Windows, icons, menus, pointing device).
- Algorithmic complexity theory.
- IBM designs SQL as a logic-based query language for relational DBs.
- A tiny company called Oracle implements SQL for the CIA.
- Rule-based expert systems.
- Prolog is the first logic-programming language.
Ted Codd said that he wished he had invented Prolog.
- Natural language query systems support full first-order logic.
Users love them, but defining the vocabulary is not easy.
- ANSI/SPARC Conceptual Schema for facilitating DB interoperability.
- Conceptual graphs as a schema language for databases
and NL interfaces.
ANSI/SPARC Conceptual Schema
Proposed as the basis for interoperable databases in 1978.
Revived as an ISO standards project, R.I.P. 1999.
The PC Revolution
A new generation of PC users is unaware of anything that went before.
Many great new "killer apps" — e.g., spreadsheets.
Most of the mistakes and some of the lessons of the past are repeated.
IBM mainframes begin a long, slow decline.
Old timers continue to do research on formal methods.
But anything that cannot be done by WIMPy tools is ignored.
The Unified Modeling Language (UML) has prettier diagrams than VDL.
Those diagrams could be defined in logic, but they're not.
Without a formal definition in logic, the diagrams
lack the precision and coherence of VDL.
World's Largest Ontology Project
Cyc project started in 1984 by Doug Lenat.
- Name comes from the stressed syllable of encyclopedia.
- Goal: implement the commonsense knowledge
of an average human being.
- After $70 million and 700 person-years of work: 600,000 categories
defined by 2,000,000 axioms and organized in 6,000 microtheories.
Project Halo
Project for evaluating methods of knowledge representation.
Goal: Build an intelligent tutor.
Test case: Encode knowledge from a chemistry textbook in order
to answer questions on a freshman chemistry exam.
Participants: Cycorp, OntoPrise, SRI International.
Results:
- Average score: about 40% to 47% correct.
- Cost to encode knowledge: average about $10,000 per page
from the textbook.
- Despite its large knowledge base, Cyc had the lowest score.
Lessons to be Learned
Why did UML succeed while VDL was ignored?
- UML is much easier to learn.
- UML provides useful features for current technologies
as incremental, evolutionary additions.
Why hasn't Cyc been used in commercial applications?
- Cyc introduces a radically new paradigm that has
no point of contact with what commercial systems do.
- Cyc ignores existing applications and does nothing
to support them or even make use of them.
- Cyc has focused on "pure" research — that's not bad, but...
Recommendations
Take advantage of what developers already know:
- SQL is the most widely used logic-based notation on earth.
- UML diagrams haven't been formally defined, but
they can be defined and used as precisely as VDL.
- Use controlled English as a supplement to UML.
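To illustrate the first point, here is a small sketch using Python's built-in sqlite3 module: the query is an ordinary SQL statement, and the comment gives its reading as a formula of first-order logic. The schema, table names, and data are hypothetical, chosen to echo the library rules used later in this talk.

```python
import sqlite3

# Hypothetical library schema and data (illustrative assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE copy (id TEXT PRIMARY KEY, book TEXT);
    CREATE TABLE checked_out (copy_id TEXT, borrower TEXT);
    INSERT INTO copy VALUES ('c1', 'Moby Dick'), ('c2', 'Moby Dick'), ('c3', 'Emma');
    INSERT INTO checked_out VALUES ('c2', 'alice');
""")

# Logical reading of the query:
#   { x | ∃b copy(x, b) ∧ ¬∃y checked_out(x, y) }
AVAILABLE = """
    SELECT id FROM copy
    WHERE NOT EXISTS (SELECT 1 FROM checked_out WHERE checked_out.copy_id = copy.id)
    ORDER BY id
"""
available = [row[0] for row in conn.execute(AVAILABLE)]
print(available)  # ['c1', 'c3']
```

The NOT EXISTS subquery is a negated existential quantifier, so a developer who writes such queries is already writing logic.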
Focus on bottom-up tasks, rather than top-down ontologies:
- Look at the kinds of messages generated for a given task.
- Two people (or computer systems) can agree on the semantics
of a specific task without realigning their global ontologies.
- Even legacy systems can perform useful tasks while passing
messages to a more sophisticated knowledge-based system.
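As a rough illustration of task-level agreement, the Python sketch below assumes two hypothetical systems, a legacy one and a newer one, that describe library copies with different internal vocabularies. Neither changes its own schema; each only supplies an adapter into one small message format agreed for a single task (reporting whether a copy is available). The record layouts, field names, and the "copy-available" message are invented for the example.

```python
# Task-level message agreed by both parties for one task only:
# ("copy-available", copy_id, is_available). Its shape is an assumption of this sketch.
TaskMessage = tuple[str, str, bool]

def from_legacy(record: dict) -> TaskMessage:
    # Hypothetical legacy record: flat, with terse field names and a status code.
    return ("copy-available", record["ITEM_NO"], record["STAT"] == "IN")

def from_modern(obj: dict) -> TaskMessage:
    # Hypothetical newer system: nested object with a boolean flag.
    entry = obj["copy"]
    return ("copy-available", entry["id"], entry["available"])

def handle(msg: TaskMessage) -> str:
    # A receiver that understands only the task vocabulary, not either schema.
    kind, copy_id, available = msg
    if kind != "copy-available":
        raise ValueError(f"unsupported task message: {kind}")
    return f"{copy_id}: {'on shelf' if available else 'checked out'}"

legacy_msg = from_legacy({"ITEM_NO": "c1", "STAT": "IN"})
modern_msg = from_modern({"copy": {"id": "c1", "available": True}})
print(legacy_msg == modern_msg)  # True
print(handle(legacy_msg))        # c1: on shelf
```

If a third system joins, it needs only one more adapter into the same task message; no global ontology has to be realigned.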
Use ISO Common Logic as the underlying formalism for the database
language, the UML diagrams, and controlled English.
ISO Common Logic
CL is a framework for a family of logic-based languages:
- Three general-purpose dialects: CLIF, CGIF, and XCL.
- With a semantics that is a superset of the semantics
of many other logic-based languages — including RDF, OWL, and SQL.
- With an abstract syntax that can be specialized to
the concrete syntaxes of other logic-based languages.
- Designed to preserve the semantics when information
is interchanged among heterogeneous systems.
Purpose: Guarantee that content exchanged between
CL-conformant languages has the same semantics in each language.
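The idea of one abstract syntax specialized to a concrete syntax can be sketched in Python. The sketch assumes a tiny fragment (atoms, conjunction, implication, universal quantification) with dataclasses standing in for abstract-syntax nodes; it is not the ISO 24707 abstract syntax, and only the printed form imitates the CLIF dialect's surface syntax. A formula printed to CLIF and read back yields the same abstract structure, which is a necessary (though not sufficient) condition for a receiver to assign it the same meaning.

```python
from dataclasses import dataclass
from typing import Union

# Illustrative abstract-syntax nodes for a small fragment (flat atoms only).
@dataclass(frozen=True)
class Atom:
    pred: str
    args: tuple[str, ...]

@dataclass(frozen=True)
class And:
    parts: tuple["Formula", ...]

@dataclass(frozen=True)
class If:
    antecedent: "Formula"
    consequent: "Formula"

@dataclass(frozen=True)
class Forall:
    variables: tuple[str, ...]
    body: "Formula"

Formula = Union[Atom, And, If, Forall]

def to_clif(f: Formula) -> str:
    """Print a formula in CLIF-style prefix notation."""
    if isinstance(f, Atom):
        return "(" + " ".join((f.pred,) + f.args) + ")"
    if isinstance(f, And):
        return "(and " + " ".join(to_clif(p) for p in f.parts) + ")"
    if isinstance(f, If):
        return f"(if {to_clif(f.antecedent)} {to_clif(f.consequent)})"
    if isinstance(f, Forall):
        return f"(forall ({' '.join(f.variables)}) {to_clif(f.body)})"
    raise TypeError(f"not a formula: {f!r}")

def read(tokens: list[str]):
    """Read one s-expression from a token list into nested Python lists."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok
    items = []
    while tokens[0] != ")":
        items.append(read(tokens))
    tokens.pop(0)  # discard ")"
    return items

def from_sexpr(x) -> Formula:
    head = x[0]
    if head == "and":
        return And(tuple(from_sexpr(p) for p in x[1:]))
    if head == "if":
        return If(from_sexpr(x[1]), from_sexpr(x[2]))
    if head == "forall":
        return Forall(tuple(x[1]), from_sexpr(x[2]))
    return Atom(head, tuple(x[1:]))

def from_clif(text: str) -> Formula:
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    return from_sexpr(read(tokens))

# Simplified version of the first library rule in the next section.
rule = Forall(("c", "b", "s"),
              If(And((Atom("CheckedOut", ("c", "b")), Atom("Returns", ("s", "c")))),
                 Atom("Available", ("c",))))
text = to_clif(rule)
print(text)  # (forall (c b s) (if (and (CheckedOut c b) (Returns s c)) (Available c)))
print(from_clif(text) == rule)  # True
```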
Representing Rules in Controlled English
Attempto Controlled English:
If a copy of a book is checked out to a borrower
and a staff member returns the copy
then the copy is available.
If a staff member adds a copy of a book to the library
and no catalog entry of the book exists
then the staff member creates a catalog entry
that contains the author name of the book
and the title of the book
and the subject area of the book
and the staff member enters the id of the copy
and the copy is available.
These statements can be automatically translated to or from CL.
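The direction from logic to controlled English can be illustrated with a small verbalizer. This Python sketch assumes the first rule above has been reduced to nested tuples whose variables are implicitly universally quantified, and it uses a hypothetical lexicon with one template per predicate; it drops "of a book" for brevity and is not the Attempto tool chain. The article heuristic (first mention gets "a", later mentions get "the") mimics how the rule introduces its noun phrases and then refers back to them.

```python
# Logic form of the first rule as nested tuples (an assumption of this sketch):
# ("if", antecedent, consequent), ("and", p, q, ...), and atoms (predicate, arg, ...).
RULE = ("if",
        ("and", ("checked_out_to", "copy", "borrower"),
                ("returns", "staff member", "copy")),
        ("available", "copy"))

# Hypothetical lexicon: one controlled-English template per predicate.
TEMPLATES = {
    "checked_out_to": "{0} is checked out to {1}",
    "returns": "{0} returns {1}",
    "available": "{0} is available",
}

def verbalize(form, seen=None) -> str:
    """Generate a controlled-English sentence from the tuple-based logic form."""
    seen = set() if seen is None else seen
    op = form[0]
    if op == "if":
        return f"If {verbalize(form[1], seen)} then {verbalize(form[2], seen)}."
    if op == "and":
        return " and ".join(verbalize(part, seen) for part in form[1:])
    noun_phrases = []
    for var in form[1:]:
        # First mention introduces the referent; later mentions refer back to it.
        noun_phrases.append(("the " if var in seen else "a ") + var)
        seen.add(var)
    return TEMPLATES[op].format(*noun_phrases)

print(verbalize(RULE))
# If a copy is checked out to a borrower and a staff member returns the copy then the copy is available.
```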
Use of Controlled English
- Controlled English is a formal language that can be read
by people who have never studied logic or programming languages.
- Controlled English is not true English, and it requires
tools that ensure the authors stay within the limited subset.
- Can be implemented in integrated development tools.
- Can be generated automatically for help and diagnostics.
- Can serve as an agent communication language —
which is readable by both humans and computers.
- But the real challenge is to deal with legacy systems.
Elephant 2000
I meant what I said, and I said what I meant.
An elephant's faithful, one hundred per cent.
Moreover,
An elephant never forgets.
Proposal by John McCarthy:
A language based on logic and speech acts.
Conclusions
Legacy systems survive for many decades —
and their ontologies are inherited by their successors.
Communications among people *and* computers are always
based on task-oriented ontologies.
Those ontologies are bottom-up, highly specialized, and usually de facto.
Example: Amazon.com ontology, which suppliers are forced to adopt.
Formal definitions are important for both upper and lower ontologies.
Upper-level ontologies are important as guidelines.
When conflicts occur, the lower level wins.
At every level, intentions, expressed in speech acts, are fundamental.
Challenge to AI for the Next Fifty Years
Position paper by Alan Bundy, pioneer in automated problem solving:
- A few minutes studying any particular representation rapidly reveals
deficiencies in expressivity or efficiency or both.
- The world is infinitely complex, so there is no end to the
qualifications, ramifications and richness of detail that one
could incorporate, and that you might need to incorporate
for a particular application.
- For a narrow application, it is often sufficient
to hand-craft a representation that hits the desired sweet spot.
- In general, the representation itself needs to be manipulated automatically.
- Such manipulation must be able to change the underlying syntax and
semantics of the ontology.
- We believe that automatic representation development, evolution and
repair must be a major goal of artificial intelligence research
over the next 50 years.