Some simple (but rather long) musings on the subject of interoperability, which seems to be a very complex subject indeed:
Individual applications use or create data. This data might reside in databases or files, be input directly by users, or be calculated via algorithms or heuristics. The sum of this data will, for simplicity, be referred to as the application’s information space. Some legacy applications may be designed to share common data sources, such as the CIA World Factbook. This sharing doesn’t make the systems interoperable, but it does increase the likelihood that similar results from each system will be consistent.

A potential problem occurs when applications that do not share data across their information spaces attempt to augment, confirm, or refute their results using data from a different application. A simple example might be a user who queries an application for the distance to Algol. The application has a local definition of the term Algol as the oceanographic ship named “Algol”. It properly processes the user request and returns the answer: 1285 nautical miles. In an attempt at interoperability, the user request may be broadcast outside of the application’s information space to see if some other application might be able to provide additional information.
Another application parses the term Algol and returns the unhelpful answer: 93 light years. Of course, the problem is that in that application’s information space the term Algol is the name of a star. So, a key requirement for interoperability within the same domain, or across domains, is an ability to disambiguate terms by providing some level of semantics.

An issue is just how much disambiguation is needed. In the example above, the use of an XML schema with a simple parent-child relationship (star – Algol, ship – Algol) might be adequate. In other instances, the more complete matching potentially provided by an expressive ontology language such as OWL may be needed. Again, what we really needed to know in the example was whether the term Algol represented a ship, regardless of whether the desired interoperability was within the same domain or cross-domain. A determination from some higher-level ontology that the term Algol in both systems matched the concept of a physical object would not seem to contribute much to interoperability.
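To make the parent-child idea concrete, here is a minimal sketch (the term catalogs, concept names, and XML layout are all my own invention for illustration, not from any standard) of how two applications might publish their terms with a parent concept attached, so that a match requires agreement on both the term and its parent:

    import xml.etree.ElementTree as ET

    # Hypothetical term catalogs that each application might publish.
    SHIP_APP_TERMS = """<terms>
      <term parent="ship">Algol</term>
    </terms>"""

    STAR_APP_TERMS = """<terms>
      <term parent="star">Algol</term>
    </terms>"""

    def term_map(xml_text):
        """Parse a catalog into {term: parent concept}."""
        root = ET.fromstring(xml_text)
        return {t.text: t.get("parent") for t in root.iter("term")}

    def same_sense(term, catalog_a, catalog_b):
        """Two applications use a term in the same sense only if it
        appears in both catalogs with the same parent concept."""
        a, b = term_map(catalog_a), term_map(catalog_b)
        return term in a and term in b and a[term] == b[term]

    print(same_sense("Algol", SHIP_APP_TERMS, STAR_APP_TERMS))  # False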
There are many other issues that might cause interoperability between different systems to fail. For example, the systems above might very well have the locations of their users (the systems’ physical locations) hard-wired into the system code. In that case the systems’ responses to the distance query would be inconsistent even if the term Algol were determined to be semantically identical. Semantic matching also doesn’t address such issues as systems being on different networks for security or classification reasons, or sitting behind local firewalls, either of which would seriously hamper interoperability.
So, exactly which data terms need to be disambiguated? It seems only the terms that have potential for confusion fit this category. To identify these, one might list all the terms in all the applications being considered for interoperability and then see which ones could give rise to confusion, like the two different meanings of Algol (actually there are at least three if you include the computer language Algol). Again, whether the systems to be integrated are in-domain or cross-domain doesn’t seem to matter. Useful semantic term matching might come from any system in any domain, and it would be very hard to predict in advance just which domains should be included and which should be excluded to ensure meaningful interoperability and helpful cross-system results for the users.
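As a rough illustration (the application names and term lists here are invented for the example), finding the candidates for disambiguation could be as simple as pooling each application’s term inventory and flagging any term claimed by more than one application:

    from collections import Counter

    # Hypothetical term inventories gathered from each application.
    inventories = {
        "nav_system":   {"Algol", "heading", "draft"},
        "astro_system": {"Algol", "magnitude", "parallax"},
        "archive":      {"Algol", "ALGOL-60", "compiler"},
    }

    # A term is a candidate for disambiguation if two or more
    # applications claim it, possibly with different meanings.
    counts = Counter(term for terms in inventories.values() for term in terms)
    confusable = sorted(term for term, n in counts.items() if n > 1)
    print(confusable)  # ['Algol']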
Once the terms with potential for confusion have been identified, they can be semantically represented in whatever common ontology-like language is adequate for the task. Just how much expressiveness is needed for semantic interoperability is something that still needs to be researched. However, I think it might have been Jim Hendler who said, “a little semantics goes a long way”.
Once the potentially confusing terms are adequately represented semantically, what do you do next? Most legacy applications don’t have any interface provisions for interacting with any sort of external semantic representation. One approach to developing such an interface might be to create something that might be called “interoperability agents”. Such an agent would accept a user query from an application, look at that application’s external semantic representation of the terms, search for other applications that use the same terms in the same way, query those applications with matching term use, and return the results to the original application.
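A minimal sketch of that agent loop might look like the following (the registry, the per-application query functions, and the term-matching rule are all assumptions of mine, not an existing API):

    # Hypothetical registry: each application exposes its term catalog
    # (term -> parent concept) and a callable that answers queries.
    registry = {
        "nav_system":   ({"Algol": "ship"}, lambda q: "1285 nautical miles"),
        "astro_system": ({"Algol": "star"}, lambda q: "93 light years"),
    }

    def interoperability_agent(source_app, term, query):
        """Forward a query only to applications that use `term`
        in the same sense as the originating application."""
        source_terms, _ = registry[source_app]
        sense = source_terms.get(term)
        results = {}
        for app, (terms, query_fn) in registry.items():
            if app != source_app and terms.get(term) == sense:
                results[app] = query_fn(query)
        return results

    # Empty: no other registered application uses Algol as a ship.
    print(interoperability_agent("nav_system", "Algol", "distance to Algol"))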
There is still the question of what the original application would do with the returned results, since the application almost surely wasn’t designed to accept additional results from external systems. One approach to this problem might be to keep the whole interoperability mechanism as physically separate as possible from the legacy applications by creating a “shadow” web browser interface. In this way the user of an application would get the results from his or her own application, with the results from any other applications presented in a separate web browser window. Each application would have to develop an interface to the shadow web interoperability system. This would provide only a limited form of interoperability: the separate applications wouldn’t actually be interoperating, but the potentially useful results from each could still be presented to the users.
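A sketch of such a shadow interface, using only Python’s standard library (the port, the query format, and the idea of wiring it to the agent above are all assumptions for illustration):

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    class ShadowHandler(BaseHTTPRequestHandler):
        """Serves cross-application results in a separate browser
        window, leaving the legacy applications untouched."""
        def do_GET(self):
            params = parse_qs(urlparse(self.path).query)
            term = params.get("term", [""])[0]
            # A fuller sketch would call the interoperability agent
            # here; this one just echoes the requested term.
            body = json.dumps({"term": term, "results": {}}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # Users would point a second browser window at this service.
        HTTPServer(("localhost", 8080), ShadowHandler).serve_forever()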
Of course, if you want to design interoperability for systems to be developed in the future that might natively use semantics in their own information spaces, a semantic architecture providing interoperability standards would be needed, and that presents a different set of problems from attempting to create interoperability for legacy systems. I suspect there is no silver-bullet approach that will reasonably encompass both legacy system interoperability and future system interoperability. However, it would be a shame if ten years from now we were still debating how to get all the systems developed in that ten-year time frame to interoperate. Some
hard decisions will be needed to ensure (force) future systems to be built to
interoperability standards. Unfortunately, that may simply be politically
impossible to achieve. Remember Ada?
John