Journal of Digital Information, Volume 1 Issue 8 Article No. 42, 2001-02-08
To cite this paper please include the details above in the full reference
Themes: Digital libraries, Information discovery
Peer reviewed paper
MetaNet - A Metadata Term Thesaurus to Enable
Semantic Interoperability Between Metadata Domains
Jane Hunter
DSTC Pty Ltd, University of Queensland, Qld, 4072, Australia
Email: jane@dstc.edu.au
Abstract
Metadata interoperability is a fundamental requirement for
access to information within networked knowledge organization systems.
The Harmony international digital library project [1]
has developed a common underlying data model (the ABC model) to enable
the scalable mapping of metadata descriptions across domains and media
types. The ABC model [2] provides a set of basic building
blocks for metadata modeling and recognizes the importance of 'events'
to describe unambiguously metadata for objects with a complex history.
To test and evaluate the interoperability capabilities of this model, we
applied it to some real multimedia examples and analysed the results of
mapping from the ABC model to various different metadata domains using
XSLT [3]. This work revealed serious limitations in the
ability of XSLT to support flexible dynamic semantic mapping. To overcome
this, we developed MetaNet [4], a metadata term thesaurus
which provides the additional semantic knowledge that is non-existent within
declarative XML-encoded metadata descriptions. This paper describes MetaNet,
its RDF Schema [5] representation and a hybrid mapping
approach which combines the structural and syntactic mapping capabilities
of XSLT with the semantic knowledge of MetaNet, to enable flexible and
dynamic mapping among metadata standards.
1 Introduction
Networked knowledge organisation systems typically contain objects of mixed
media types which are described using a multitude of diverse metadata schemas.
Hence machine understanding of metadata descriptions which conform to schemas
from different domains is a fundamental requirement for access to information
within networked knowledge organization systems. In particular, there are
three main scenarios in which interoperability among metadata descriptions
is required:
- To enable a single search interface across heterogeneous metadata descriptions;
- To enable the integration or merging of descriptions which are based on
complementary but possibly overlapping metadata schemas or standards;
- To enable different views of the one underlying and complete metadata description,
depending on the user's particular interest, perspective or requirements.
Metadata descriptions from different domains are not semantically distinct
but overlap and relate to each other in complex ways. Achieving interoperability
between such metadata descriptions via manually-generated one-to-one crosswalks
[6]
is useful, but this approach does not scale to the many metadata vocabularies
that will develop. A more scalable and cost-effective approach is to exploit
the fact that many entities and relationships - for example, people, places,
creations, organisations, events, etc. - occur across all of the domains.
The Harmony project [1] has been investigating this more
general approach towards metadata interoperability and in the process has
developed the ABC model and vocabulary [2].
The hypothesis is that such an approach will lead to more efficient,
scalable machine-translations between heterogeneous metadata descriptions.
To test this hypothesis and to evaluate the interoperability capabilities
of the ABC model, we applied it to some real multimedia examples and analysed
the results of mapping from the ABC model to various different metadata
domains using XSLT [3]. This work revealed serious limitations
in XSLT's ability to support flexible dynamic semantic mapping. To overcome
this, we developed MetaNet [4], a metadata term thesaurus
which provides the additional semantic knowledge that is non-existent within
declarative XML-encoded metadata descriptions.
This paper describes the optimum metadata mapping approach determined
from applying the ABC model to a small test set of multimedia examples.
This approach combines:
- the ABC event-aware metadata model, developed within the Harmony project,
as the underlying model for scalable generic mappings between domain-specific
vocabularies, with;
- XSLT for parsing XML descriptions and performing structural and syntactic
mapping, and;
- MetaNet, a metadata term thesaurus, to provide the semantic knowledge required
to enable semantic mapping between metadata terms from different domains
or standards.
2 Definitions of Terms
This section defines the key terms used throughout the remainder of the
paper:
- Metadata - data about data - or more commonly "descriptive information
about Web resources". The use of standardized descriptive metadata can
substantially improve the discovery and retrieval of relevant networked
resources. Different communities or domains have developed their own standardized
metadata vocabularies to meet their specific needs.
- Vocabularies - shared terminologies with commonly agreed-upon semantics
for a domain. Common vocabularies enable search engines, agents, authors
and users to communicate within a domain.
- Schemas - provide a standard way of defining domain-specific
vocabularies by defining a common set of elements, their semantics and
the relationships between the elements.
- Ontology - a formal description of the concepts, roles and relationships
that exist for an agent or community of agents. Ontologies provide a shared
and common understanding of a domain that can be communicated across people
and applications, and play a major role in supporting information exchange
and discovery.
- Thesaurus - the vocabulary of a controlled indexing language, formally
organized so that the a priori relationships between concepts (for
example "broader" and "narrower") are made explicit. [7]
- Metadata Thesaurus - a thesaurus (defined according to the ISO 2788
standard for monolingual thesauri [7]) which defines the
relationships between metadata terms from different domain vocabularies.
3 Related Work
Thesauri have been used to improve the precision and recall of information
retrieval systems for over 30 years. The introduction of automated information
retrieval has caused a dramatic increase in the demand for vocabulary control,
particularly in the last decade. Examples of well known thesauri used to
provide authority control over the terms used for indexing documents in
the bibliographic, medical and cultural domains respectively are: the Library
of Congress Subject Headings (LCSH) [9], the Medical Subject
Headings (MeSH) [10] and the Art and Architecture Thesaurus
(AAT). [11] In addition, thesauri have been used within
information retrieval systems to improve retrieval effectiveness by providing
semantic roadmaps. [12], [13], [14]
Since the emergence of the Internet, a great deal of effort has been
invested in the development of metadata vocabularies to enable the exchange
and discovery of information across different applications and domains.
Metadata vocabularies such as Dublin Core [15], USMARC
[16],
INDECS [17], MPEG-7 [18], FGDC [19],
IEEE LOM [20] and CIDOC CRM [21] provide
standardized sets of descriptive elements to enable the exchange of resources
for specific applications or domains. Although these standards enable interoperability
within domains, they introduce the problem of incompatibility between disparate
and heterogeneous metadata descriptions or schemas across domains.
A literature survey reveals many different proposals for improving interoperability
between domain-specific vocabularies, thesauri and ontologies in the context
of information retrieval and exchange. These range from database schema
integration [22], to the use of ontologies in organizing
and integrating networked information systems (e.g. OBSERVER [23],
InfoSleuth [24], OntoSeek [25]) to
the merging of monolingual [26] and multilingual thesauri.
[8]
Two of the major research issues have been categorizing the complex kinds
of interthesaurus semantic relationships which exist
[27]
and automating the detection of these relationships during the merging
process. [28]
More recently the approach to merging thesauri has been to represent
them formally using RDF Schemas [29] and to use inference
engines to automate the merging - such as has been proposed in the Ontology
Inference Layer (OIL). [30]
In this paper we are not so much concerned with the specific process
by which MetaNet is generated or with expressing the complete set of possible
term relations (as described in ISO 2788) in MetaNet. Our primary objective
is to generate a thesaurus which specifies an (albeit simplified) set of
semantic relationships between metadata terms from a number of different
domain schemas relative to the ABC underlying vocabulary (the preferred
terms) and hence also to each other. Our goal is then to demonstrate how
this semantic knowledge can be represented in a machine-readable format
(RDF Schema) and extracted and combined with the syntactic and structural
mapping capabilities of XSLT to enable the implementation of flexible dynamic
mappings between metadata descriptions from different domains.
4 Overview of the ABC Underlying Metadata Model
The Harmony Project [1] is investigating a generic approach
to metadata interoperability through the development of an event-aware
metadata model. The ABC model [2] defines a set of fundamental
classes which provide the building blocks for expression (through sub-classing)
of application-specific or domain-specific metadata vocabularies. The base
classes, shown below, were determined by analysing commonalities between
different communities' metadata models (including: Dublin Core [15];
INDECS [17]; MPEG-7 [18]; CIDOC CRM
[21];
IFLA [31].)
- Resources
- Events
- Inputs and Outputs
- Acts
- Context
- Event Relations
ABC adopts an event-aware view for modeling the relationship between the
various manifestations of a creation. This event-aware view provides semantically
clear attachment points for the association of properties among the various
manifestations, events and contributors (agents) involved in a resource's
lifecycle. In addition, ABC provides a multiple views philosophy for metadata
modeling and recipes for inter-conversion between those views. If life-cycle
information is required, the event model can be used. When single resource
metadata is needed, a resource-centric state model is used. Figure 1 shows
the UML representation for the ABC metadata model.
Figure 1. UML representation of the ABC metadata model
5 A Simple Example
To test the ABC model and evaluate XSLT for metadata mapping, we considered
the following simple illustrative example:
"A resource which is a 130 min audio (MP3) recording of a 'Live
at Lincoln Center' performance. The Orchestra is the New York Philharmonic.
The performance was on April 7, 1998 at 8 pm Eastern Time. The musical
score performed is 'Concerto for Violin'. Copyright for the entire performance
is held by Lincoln Center for the Performing Arts."
First we describe this resource using the ABC model. We then attempt to
map from the ABC description to Dublin Core, MPEG-7 and ID3 [32]
descriptions respectively, using XSLT. Figure 2 illustrates the two steps
involved in mapping from the ABC metadata model to resource-centric models
such as Dublin Core, MPEG-7 and ID3:
- The structural mapping step involves transferring event properties to the
output resource and creating a relationship between the output and input
resources associated with the event.
- The semantic mapping step involves mapping the properties attached to the
output resource to semantically-equivalent properties in the output domain.
Appendix A contains the corresponding ABC, Resource-centric,
Dublin Core, ID3 and MPEG-7 descriptions.
Figure 2. Transformation from the ABC event-aware model to three different
resource-centric models
5.1 Structural Mapping Rules
For events which generate an output resource from an input resource, the
transformation from an event-aware metadata model to a simple resource-centric
metadata model consists of the following steps:
- The Date, Time and Place properties within the Event's Context node can
be qualified using the Event Type and transferred to the target output
resource, e.g. Date.Performance, Time.Performance, Place.Performance;
- The Role property of each Act associated with an event becomes a qualifier
on the Agent property which is attached to the target output resource and
its value is the Act's Agent Name, e.g. the Agent.Orchestra property has
value "New York Philharmonic";
- A Relation property arc is generated from the event type (e.g. Performance
-> Relation.isPerformanceOf) and is attached to the target output resource.
The value of this property is the patient input resource of the event (e.g.
"comp523");
- All other existing properties of the input and output resource remain the
same.
Other inheritance and metadata derivation rules may be possible but these
require further investigation.
For example, a Description property for the output resource can be
generated from the Event Type and the input resource's Title e.g. "Performance
of 'Concerto for Violin'". Or in many cases, the Title property
can be inherited by the output resource directly from the Title property
of either the input resource or the event.
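The structural mapping rules of this section can be sketched in Java as follows. This is only a minimal illustration under stated assumptions, not the Harmony implementation: the classes AbcAct, AbcEvent and StructuralMapping, the flat property map, and the place value "Lincoln Center" are hypothetical and chosen purely to mirror the example above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for ABC constructs (not the Harmony project's classes).
class AbcAct {
    final String role;
    final String agentName;
    AbcAct(String role, String agentName) {
        this.role = role;
        this.agentName = agentName;
    }
}

class AbcEvent {
    final String type, date, time, place, inputId;
    final List<AbcAct> acts;
    AbcEvent(String type, String date, String time, String place,
             String inputId, List<AbcAct> acts) {
        this.type = type; this.date = date; this.time = time;
        this.place = place; this.inputId = inputId; this.acts = acts;
    }
}

class StructuralMapping {
    /** Flattens an event onto its output resource, giving a resource-centric property map. */
    static Map<String, String> flatten(AbcEvent e, Map<String, String> outputProps) {
        // Existing properties of the output resource are kept unchanged.
        Map<String, String> r = new LinkedHashMap<>(outputProps);
        // Context properties are qualified by the event type, e.g. Date.Performance.
        r.put("Date." + e.type, e.date);
        r.put("Time." + e.type, e.time);
        r.put("Place." + e.type, e.place);
        // Each Act's role qualifies an Agent property whose value is the agent name.
        for (AbcAct a : e.acts) {
            r.put("Agent." + a.role, a.agentName);
        }
        // A Relation arc derived from the event type points at the input resource.
        r.put("Relation.is" + e.type + "Of", e.inputId);
        return r;
    }

    public static void main(String[] args) {
        AbcEvent performance = new AbcEvent("Performance", "1998-04-07", "20:00",
                "Lincoln Center", "comp523",
                List.of(new AbcAct("Orchestra", "New York Philharmonic")));
        flatten(performance, Map.of("Format", "MP3"))
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The sketch covers only the four listed rules; the derivation rules mentioned afterwards (for example generating a Description from the Event Type and the input Title) are deliberately left out.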
6 An Evaluation of XSLT for Metadata Mappings
The Extensible Stylesheet Language (XSL) [3] consists of a
transformation language (XSLT) and a formatting language. The transformation
language XSLT (which acts independently of the formatting language) provides
elements that define rules for how one XML document is transformed into
another XML document. The transformed XML document may use the markup and
DTD of the original document or it may use a completely different set of
tags. The ability of XSLT to transform data from one XML representation
to another makes it appear to be ideal for metadata interchange applications.
An XSL document contains a list of templates and rules. A template rule
has a pattern specifying the trees it applies to and a template to be output
when the pattern is matched. When an XSL processor formats an XML document
using an XSL style sheet, it scans the XML document tree looking through
each sub-tree in turn. As each tree in the XML document is read, the processor
compares it with the pattern of each template rule in the style sheet.
When the processor finds a tree that matches a template rule's pattern,
it outputs the rule's template. This template generally includes some markup,
some new data and some data copied out of the tree from the original XML
document.
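The template-rule mechanism just described can be illustrated with a very small transformation driven from Java. This is a sketch under stated assumptions: it uses the XSLT processor bundled with the JDK through the standard javax.xml.transform API rather than Xalan invoked directly, and the element names Resource, Title, Format, record and title, as well as the class TemplateRuleDemo, are invented for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateRuleDemo {
    // Two template rules: each pairs a match pattern with the template to output.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/Resource'>"
        + "<record><xsl:apply-templates select='Title'/></record>"
        + "</xsl:template>"
        + "<xsl:template match='Title'>"
        + "<title><xsl:value-of select='.'/></title>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet above to the given XML string and returns the result. */
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("Transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<Resource><Title>Concerto for Violin</Title><Format>MP3</Format></Resource>"));
    }
}
```

Note that the mapping from Title to title is hardwired in the stylesheet, and the Format element is dropped because no rule selects it. This is the kind of fixed mapping that works only when the input is tightly constrained.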
Using XSLT and the Xalan [33] XSLT processor we developed
XSL programs for transforming the ABC description above to DC, ID3 and
MPEG-7 descriptions, respectively. Appendix B
shows the resulting XSL files.
The mapping implementations in Appendix B revealed
that although XSLT works well for the structural mapping from an event
model to a resource-centric model based on the set of rules described in
Section 5.1, it is inadequate for implementing flexible dynamic semantic
mappings between metadata vocabularies. This is due to:
- XSLT's limited capabilities for handling variable input descriptions based
on schemas which are not tightly constrained;
- The non-existence of machine-understandable semantic information in declarative
XML-encoded metadata descriptions;
- Processor-dependent handling of input parameters and procedural code extensions;
- Limited string manipulation and comparison functions, e.g. it is not possible
to perform case-insensitive string comparisons within XSLT.
The mappings revealed that if the input XML descriptions are relatively
fixed and tightly constrained, then the semantic mappings can be hardwired
and XSLT is adequate. But if the input descriptions are at all variable
or unpredictable (e.g. undefined domain-specific sub-classing and attributes)
then XSLT alone cannot cope. Cawsey investigated the use of XSLT for customizing
RDF descriptions, reaching similar conclusions.
[34]
A number of approaches to the semantic mapping problem are possible.
The approach chosen is a balance between simplicity on
the one hand, and flexibility or scalability on the other. The wider the
targeted scope of interoperability, the more difficult it is to achieve
accurate, precise mappings. The approaches are listed below in increasing
order of both scope and difficulty:
1. Hardwire crosswalks between metadata terms from specific metadata domains
(easy, but only works for fixed input);
2. Extract mappings from a pre-defined multiple-domain mapping matrix;
3. Determine the semantic mappings from a metadata term ontology;
4. Determine the semantic mappings from a generic ontology such as WordNet;
5. Determine the semantic mappings from a dynamically generated ontology created
by using inferencing to merge multiple domain-specific ontologies.
When the scope of the problem is reduced to interoperability between existing
metadata standards, the fully generic approaches (e.g., 4 and 5 above)
become unnecessarily complex. Hence in the remainder of this paper we investigate
the less complex but still moderately flexible approaches (2 and 3) based
on a mapping matrix and a metadata term ontology, respectively.
7 Semantic Mapping via a Mapping Matrix
The second approach in the list above involves linking a mapping matrix
to the XSLT processor. The mapping matrix explicitly defines the semantic
mappings between a fixed set of metadata vocabularies from a number of
different domains. Table 1 illustrates such a mapping matrix. If XPath
[35]
is used to specify the elements, then to some extent both the structural
and semantic mappings can be defined.
Table 1. Metadata mapping matrix
ABC Element | DC Element | ID3 Element | MPEG-7 Path
Resource/Title | Title | TIT2 | CreationMetaInformation/Creation/Title/TitleText (@TitleType="original")
Event/Act/Agent | Creator | TPE1 | CreationMetaInformation/Creation/Creator (@role="creator")
 | Publisher | TPUB | UsageMetaInformation/Publication/Publisher
 | Contributor | IPLS (Involved People List); TCOM (Composer); TENC (Encoder); TEXT (Lyricist); TOLY (Original Lyricist); TOPE (Original Artist); TPE2 (Band, Orchestra, Accompaniment); TPE3 (Conductor); TPE4 (Interpreter, Remixer, Modifier) | CreationMetaInformation/Creation/Creator (@role)
Resource/Subject | Subject | TIT1 | CreationMetaInformation/Creation/Classification/PackagedType
Resource/Description | Description | TIT3 | CreationMetaInformation/Creation/CreationDescription
Event/Context/Date | Date.Creation | - | CreationMetaInformation/Creation/CreationDate
 | Date.Publication | - | UsageMetaInformation/Publication/PublicationDate
 | Date.Recording | TRDA | -
Resource/Type | Type | TCON | CreationMetaInformation/Classification/Genre
Resource/Format | Format | TFLT | MediaInformation.MediaProfile/MediaFormat/FileFormat
 | Format.length | TLEN | -
 | Format.size | TSIZ | -
Resource/Identifier | Identifier | UFID | MediaInformation/MediaIdentification/Identifier
Event/Input | Source | TOAL (Title of original recording or source) | -
Event/Context/Place | Coverage.Place | - | -
This approach has serious limitations, however. A matrix
can only specify fairly simple one-to-one
mappings, and a two-dimensional matrix will only work if the mappings are
symmetrical in both directions across all the domains. If the mappings
are asymmetrical then the matrix becomes highly complex and multi-dimensional.
The primary limitation, however, is that the approach simply does
not scale: as the number of domains grows and the mappings become asymmetrical,
the matrix becomes excessively complex and unwieldy.
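As a rough illustration of the matrix approach, the mapping matrix can be held as a nested lookup table keyed first by ABC element and then by target vocabulary. The class MappingMatrix, its methods and the vocabulary keys "DC" and "ID3" are hypothetical names introduced for this sketch; only the few cell values shown are taken from Table 1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class MappingMatrix {
    // Rows are keyed by ABC element; each row maps a target vocabulary to its term.
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    void put(String abcElement, String vocabulary, String term) {
        rows.computeIfAbsent(abcElement, k -> new HashMap<>()).put(vocabulary, term);
    }

    /** Returns the term for the given vocabulary, or empty if the cell is blank. */
    Optional<String> lookup(String abcElement, String vocabulary) {
        Map<String, String> row = rows.getOrDefault(abcElement, Map.of());
        return Optional.ofNullable(row.get(vocabulary));
    }

    /** A handful of cells copied from Table 1. */
    static MappingMatrix sample() {
        MappingMatrix m = new MappingMatrix();
        m.put("Resource/Title", "DC", "Title");
        m.put("Resource/Title", "ID3", "TIT2");
        m.put("Resource/Identifier", "DC", "Identifier");
        m.put("Resource/Identifier", "ID3", "UFID");
        m.put("Event/Context/Place", "DC", "Coverage.Place"); // no ID3 equivalent in Table 1
        return m;
    }

    public static void main(String[] args) {
        MappingMatrix m = sample();
        System.out.println(m.lookup("Resource/Title", "ID3").orElse("(no mapping)"));
        System.out.println(m.lookup("Event/Context/Place", "ID3").orElse("(no mapping)"));
    }
}
```

Each cell holds exactly one term, so a one-to-many mapping such as Contributor to the various ID3 role frames cannot be expressed without changing the structure, and every additional vocabulary requires a new entry in every row.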
8 Development of MetaNet, a Metadata Term Thesaurus
Rather than limiting the semantic mapping to a fixed number of domains/vocabularies
(i.e. the number of columns in the mapping matrix), a more generic approach
is to extract the mapping dynamically from a thesaurus of metadata terms,
generated by formally defining relationships between metadata terms from
a number of different domains' standardized vocabularies.
8.1 Intrathesaurus and Interthesaurus Relations
The ISO2788 standard for the identification and documentation of monolingual
thesauri [7] identifies the following types of intrathesaurus
relations:
- hierarchical
- associative
- equivalence
The hierarchical relation occurs between concepts having "broader/narrower"
meanings. This can be further specialized into the generic (BTG/NTG), whole-part
(BTP/NTP) and instance (BT/NT) relations. For the sake of simplicity, we
have chosen only to model the BTG/NTG relation (a common practice among
thesauri developers) and the equivalence relation, and not to include associative
relations within MetaNet.
The ISO5964 standard for the documentation and establishment of multilingual
thesauri [8] identifies the following types of interthesaurus
relations:
- exact equivalence
- partial equivalence
- single to multiple equivalence
- inexact equivalence.
These relations indicate that the semantic relations between terms from
different metadata vocabularies are likely to be much more complex than
one-to-one exact equivalence and that even "exact equivalence" will be
an approximation. However, because the scope of our problem is limited
to relations between terms in a number of standardized English metadata
vocabularies, we can expect more complex mappings
to be less frequent than in general natural language thesauri. For the first draft
of MetaNet, we decided only to consider exact and partial equivalence relations
and to combine them in the ET relation which defines equivalent/overlapping
terms. If two different domains use two different metadata terms which
are ETs in our thesaurus then we make the assumption that the domains are
referring to semantically equivalent concepts.
Consequently the metadata term thesaurus which we have developed, MetaNet
[4],
contains only preferred terms (the ABC core vocabulary), equivalent/overlapping
terms (ET), narrower terms (NT) and broader terms (BT), and attempts to
encompass terms from the most significant and widely-used metadata vocabularies
(Dublin Core, IFLA, IEEE LOM, INDECS).
8.2 Description of MetaNet
The objective of the MetaNet thesaurus is to provide the semantic
knowledge required to enable machine understanding of equivalence and hierarchical
(subtyping) relationships between metadata terms from different domains.
The scope of this thesaurus is limited to the most significant metadata
models/vocabularies used for describing attributes and events associated
with resources and their life cycles. This encompasses metadata vocabularies
from the bibliographic, museum, archival, record keeping and rights management
communities. It has been developed by performing WordNet [36]
searches using the core terms from the ABC vocabulary, and extracting those
synonyms and hyponyms which could conceivably be used in a metadata scheme
to represent the original core term. In addition, the results have been
compared with the DC, INDECS, IFLA, IMS and CIDOC CRM
vocabularies to check that the majority of the terms used in these metadata
dictionaries have been incorporated into the thesaurus.
A machine-readable RDF Schema representation of this thesaurus has been
developed. [37] The RDF and RDF Schema elements, Class,
subClassOf,
property,
subPropertyOf are used to define the hierarchical/subtyping and
entity/attribute relationships between metadata elements. The RDFS
label
element is used to specify semantically equivalent terms which may be used.
The ABC core vocabulary is used as the top-level set of preferred terms.
Although this thesaurus has been generated manually, it could conceivably
be generated automatically by using inferencing mechanisms to merge RDF
Schemas from different domains, as has been proposed in the Ontology Inference
Layer (OIL). [30]
For example, consider "Agent", which is a core term of the ABC vocabulary
and hence a preferred term in the MetaNet thesaurus. Semantically equivalent
terms for "Agent", commonly used within other metadata vocabularies, include:
actor, contributor, creator, player, doer, worker, performer
Possible narrower terms or hyponyms for "Agent" include:
author, composer, artist, musician, . . etc.
Table 2 is an excerpt from the RDF Schema which illustrates the representation
for the "Agent" metadata term as well as its equivalent terms and a partial
hierarchy of its narrower terms.
Table 2. Excerpt from the RDF Schema
<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Agent">
    <rdfs:comment xml:lang="en">The resources which contribute to or act
    in an event. Typically agents are people, groups of people,
    organisations or instruments.</rdfs:comment>
    <rdfs:label xml:lang="en">Actor</rdfs:label>
    <rdfs:label xml:lang="en">Contributor</rdfs:label>
    <rdfs:label xml:lang="en">Creator</rdfs:label>
    <rdfs:label xml:lang="en">Player</rdfs:label>
    <rdfs:label xml:lang="en">Doer</rdfs:label>
    <rdfs:label xml:lang="en">Worker</rdfs:label>
    <rdfs:label xml:lang="en">Performer</rdfs:label>
    <rdfs:subClassOf
        rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Author">
    <rdfs:label xml:lang="en">Writer</rdfs:label>
    <rdfs:label xml:lang="en">Wordsmith</rdfs:label>
    <rdfs:subClassOf
        rdf:resource="#Agent"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Journalist">
    <rdfs:label xml:lang="en">Columnist</rdfs:label>
    <rdfs:label xml:lang="en">Reporter</rdfs:label>
    <rdfs:subClassOf
        rdf:resource="#Author"/>
  </rdfs:Class>
</rdf:RDF>
A Web search and browse interface to MetaNet has also been developed.
[4]
Users can search on any common metadata term and retrieve a list of equivalent
terms, broader terms and narrower terms. Figure 3 shows the results of
a search on the term "author".
Figure 3. Results of MetaNet search
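The equivalent and broader terms behind such a search can in principle be read straight from the RDF Schema file with a namespace-aware XML parser. The following Java sketch assumes the encoding shown in Table 2 (rdfs:label for equivalent terms, rdfs:subClassOf with a local "#" reference for the broader term). The class MetaNetReader, its fields and the shortened SAMPLE document are hypothetical and are not part of the MetaNet distribution.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class MetaNetReader {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RDFS = "http://www.w3.org/2000/01/rdf-schema#";

    // Shortened, illustrative excerpt in the style of Table 2.
    static final String SAMPLE =
          "<rdf:RDF xmlns:rdf='" + RDF + "' xmlns:rdfs='" + RDFS + "'>"
        + "<rdfs:Class rdf:ID='Agent'>"
        + "<rdfs:label>Creator</rdfs:label><rdfs:label>Contributor</rdfs:label>"
        + "</rdfs:Class>"
        + "<rdfs:Class rdf:ID='Author'>"
        + "<rdfs:label>Writer</rdfs:label>"
        + "<rdfs:subClassOf rdf:resource='#Agent'/>"
        + "</rdfs:Class>"
        + "</rdf:RDF>";

    final Map<String, List<String>> equivalents = new LinkedHashMap<>(); // term -> ETs
    final Map<String, String> broader = new LinkedHashMap<>();           // term -> BT

    static MetaNetReader parse(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
            MetaNetReader reader = new MetaNetReader();
            NodeList classes = doc.getElementsByTagNameNS(RDFS, "Class");
            for (int i = 0; i < classes.getLength(); i++) {
                Element cls = (Element) classes.item(i);
                String id = cls.getAttributeNS(RDF, "ID");
                // Every rdfs:label is treated as an equivalent term (ET).
                List<String> labels = new ArrayList<>();
                NodeList labelNodes = cls.getElementsByTagNameNS(RDFS, "label");
                for (int j = 0; j < labelNodes.getLength(); j++) {
                    labels.add(labelNodes.item(j).getTextContent().trim());
                }
                reader.equivalents.put(id, labels);
                // A local subClassOf reference such as "#Agent" names the broader term (BT).
                NodeList supers = cls.getElementsByTagNameNS(RDFS, "subClassOf");
                if (supers.getLength() > 0) {
                    String target = ((Element) supers.item(0)).getAttributeNS(RDF, "resource");
                    if (target.startsWith("#")) {
                        reader.broader.put(id, target.substring(1));
                    }
                }
            }
            return reader;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not read the RDF Schema", ex);
        }
    }

    public static void main(String[] args) {
        MetaNetReader r = parse(SAMPLE);
        System.out.println("ETs of Agent: " + r.equivalents.get("Agent"));
        System.out.println("BT of Author: " + r.broader.get("Author"));
    }
}
```

Narrower terms are not stored separately; they can be derived by inverting the broader-term map. A dedicated RDF toolkit would be a more robust choice than plain DOM parsing for anything beyond this simple RDF/XML layout.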
9 Linking MetaNet to XSLT
Using XSLT it is possible to parse an input XML description and for each
element encountered call a Java procedural code extension which determines
the equivalent term in the output domain from the semantic relationships
specified in the MetaNet thesaurus.
For example, suppose the Java program, Mapping.java, contains an extension
function readMetaNet. For each element encountered during parsing
of the input metadata description, the input element name (e.g. abc:Agent)
and the output domain schema definition (e.g. the Dublin Core schema) are
passed to the readMetaNet function. This function searches the MetaNet
RDF Schema file for an element in the output schema definition that is
equivalent to the input element name (e.g. dc:contributor), and returns
this value. XSL creates a new output element with this name in the output
description. Figure 4 illustrates the program flowchart.
Figure 4. Program flow for metadata description mappings
The XSL code in Table 3 illustrates how to call a Java extension function,
readMetaNet,
from the main XSL file.
Table 3. XSL code to call a Java extension function,
readMetaNet,
from the main XSL file
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:lxslt="http://xml.apache.org/xslt"
    xmlns:mapping="Mapping"
    extension-element-prefixes="mapping">
  <!-- Bind the "mapping" prefix to the Java class Mapping, which provides readMetaNet -->
  <lxslt:component prefix="mapping" elements="*" functions="readMetaNet">
    <lxslt:script lang="javaclass" src="Mapping"/>
  </lxslt:component>
  <xsl:template match="ABC">
    <xsl:apply-templates />
  </xsl:template>
  <!-- For each input element, name the output element after the term returned by readMetaNet -->
  <xsl:template match="*">
    <xsl:element name="{mapping:readMetaNet(name(), 'dc')}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
Below is a high-level, simplistic algorithm describing the mapping process
that is performed within the readMetaNet Java function in Figure 4.
Table 4. Algorithm describing the mapping process within the readMetaNet
Java function in Figure 4
For each element in the input description
{
    Search for the input element name in the output domain schema;
    if (found) {
        Map the input element to the equivalent output domain element;
    }
    else {
        Extract the Equivalent Terms (ETs) for the input element from MetaNet;
        Search the output domain schema for each of the ETs;
        if (an ET is found) {
            Map the input element to the equivalent output domain element;
        }
        else {
            Extract the broader terms (BTs) for the input element from MetaNet;
            Search the output domain schema for each of the BTs;
            if (a BT is found) {
                Map the input element to the broader output domain element;
            }
            else {
                Extract the narrower terms (NTs) for the input element from MetaNet;
                Search the output domain schema for each of the NTs;
                if (an NT is found) {
                    Map the input element to the narrower output domain element;
                }
            }
        }
    }
} endFor
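The lookup order of Table 4 (direct match, then equivalent, broader and narrower terms) might be realised along the following lines. This is a hedged sketch rather than the actual Mapping.java: the class name MappingSketch, the in-memory ET/BT/NT maps (a tiny hand-written excerpt standing in for the MetaNet RDF Schema file) and the representation of the output schema as a set of lower-case element names are all assumptions made for the illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class MappingSketch {
    // Hand-written excerpt of thesaurus relations; the real data lives in the RDF Schema.
    static final Map<String, List<String>> ET = Map.of(
            "agent", List.of("actor", "contributor", "creator", "performer"),
            "author", List.of("writer", "wordsmith"));
    static final Map<String, List<String>> BT = Map.of(
            "author", List.of("agent"),
            "journalist", List.of("author"));
    static final Map<String, List<String>> NT = Map.of(
            "agent", List.of("author", "composer", "artist", "musician"),
            "author", List.of("journalist"));

    /** Returns the output-schema element that best matches the input term, if any. */
    static Optional<String> readMetaNet(String inputTerm, Set<String> outputSchema) {
        String term = inputTerm.toLowerCase(Locale.ROOT);
        // Step 1: the input element name itself exists in the output schema.
        if (outputSchema.contains(term)) {
            return Optional.of(term);
        }
        // Steps 2 to 4: try equivalent, then broader, then narrower terms, in that order.
        for (Map<String, List<String>> relation : List.of(ET, BT, NT)) {
            for (String candidate : relation.getOrDefault(term, List.of())) {
                if (outputSchema.contains(candidate)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty(); // no mapping found
    }

    public static void main(String[] args) {
        Set<String> dublinCore = Set.of("title", "creator", "contributor", "publisher");
        System.out.println(readMetaNet("Agent", dublinCore).orElse("(no mapping)"));
    }
}
```

Like Table 4, the sketch looks only one relation away from the input term; it does not, for example, try the equivalent terms of a broader term, so "Author" finds no match in the Dublin Core excerpt above even though "Agent" does.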
10 Conclusions, Limitations and Future Work
10.1 Conclusions and Limitations
Our evaluation of XSLT for mapping between metadata descriptions from different
domains revealed that although XSLT is good for syntactical and structural
mapping, semantic mappings need to be hardwired into the code. Flexible
semantic mapping is only possible with the assistance of semantic knowledge
bases provided by ontologies or thesauri such as the MetaNet thesaurus
described above.
The MetaNet thesaurus described here is a first draft English version,
based on the vocabulary of the ABC model. Although it has only been applied
to a relatively small sample set, some of the limitations of this thesaurus
are already evident. These include its inability to support metadata vocabularies
which use:
- Tokens, e.g. ID3 tags such as TPE2, which are semantically meaningless.
This limitation can be overcome by either explicitly including such tags
in the thesaurus or searching the definitions (rather than element names)
in the output namespace for the input element name or its semantically
equivalent terms;
- Abbreviations, e.g. acc.no.;
- Qualifiers or hybrid words joined by a variety of connectors, e.g. UserClass,
Assistant Editor, Art_Director, Time-span. This problem can be solved to
some extent by including "associated terms" in the thesaurus and by ignoring
typical "connectors".
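Ignoring typical connectors could, for instance, be handled by normalising element names before they are looked up in the thesaurus. The sketch below is one possible approach, not part of MetaNet: the class TermNormalizer and the choice of connectors (case changes inside a word, underscores, hyphens and whitespace) are assumptions.

```java
import java.util.Locale;

class TermNormalizer {
    /** Reduces a metadata element name to a lower-case, space-separated form. */
    static String normalize(String term) {
        return term
                .replaceAll("([a-z])([A-Z])", "$1 $2") // split camelCase: UserClass -> User Class
                .replaceAll("[_\\-\\s]+", " ")         // treat _, - and whitespace as connectors
                .trim()
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        for (String t : new String[] {"UserClass", "Assistant Editor", "Art_Director", "Time-span"}) {
            System.out.println(t + " -> " + normalize(t));
        }
    }
}
```

Normalisation of this kind only makes hybrid names comparable; it does nothing for meaningless tokens such as TPE2 or for abbreviations, which still need explicit thesaurus entries.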
In addition, the inherently ambiguous nature of language leads to the following
problems:
- Metadata terms with multiple possible meanings, e.g. "condition" - this
could be the current state of an object or it could be a restriction on
the permissible use of a resource. This can be overcome by the use of unambiguous
metadata terms by schema designers.
- Multiple possible spellings for the same word, e.g. artefact/artifact,
colour/color.
- This thesaurus is based on nouns, e.g. "creator", "publisher", and does
not search for related verbs, adverbs, adjectives in various tenses which
could be used to express the same semantics, e.g. "created_by", "published_by".
This problem could, to some extent, be overcome through the use of stemming.
Currently only English is supported. However, we believe that this thesaurus
could be extended to provide equivalent or overlapping terms for the ABC
vocabulary in other languages by following the recommendations specified
in ISO5964. [8]
10.2 Future Work
So far the ABC model has only been tested on a relatively small sample
set. We intend carrying out more extensive evaluation of both the ABC model
and the hybrid mapping approach, by applying them to metadata transformations
between large sets of sample records provided by a number of different
CIMI [38] member organisations. The plan is to build
a testbed using multimedia museum resources and metadata descriptions provided
by CIMI members and to use this testbed to implement and evaluate metadata
interoperability between different museums' descriptions.
The Harmony ABC model exhibits many similarities with the CIDOC Conceptual
Reference Model (CRM) [21], a domain ontology developed
by the CIDOC Committee of the International Council of Museums. In the
near future we plan to investigate the possible merging of the Harmony
model and the CIDOC CRM model into a single ontology. We plan to use the
CIMI testbed described above to evaluate the "super" ontology resulting
from harmonization of these two models.
The mapping implementations above have all involved mapping from an
event-aware metadata model to a resource-centric metadata model. We are
also interested in the rules and mechanisms required for machine translations
between metadata descriptions based on different event-aware metadata models,
e.g. from ABC to INDECS or CIDOC CRM.
We would also like to investigate mapping between "application profiles"
or schemas which mix metadata elements imported from multiple different
namespaces. The test examples considered so far only present the problem
of mapping from a single domain's metadata description to another single
domain's metadata description, e.g. pure DC to pure MPEG-7. A situation
that will become increasingly common in the future is the need to map from
a schema which imports elements from multiple namespaces to another schema
which imports a different set of elements from multiple namespaces. In
addition, each schema may impose its own local
-
structural constraints, e.g. parent/child relationships
-
cardinality/occurrence constraints
-
datatyping, enumeration and formatting constraints on the element values.
We believe the approach proposed in this paper will support mapping between
mixed-domain "application profiles", but this needs to be tested through further
research involving machine translations between metadata descriptions which
conform to both complex local usage constraints (defined by XML Schemas
[39]) and namespace-specific semantic definitions (defined by RDF Schemas).
Acknowledgements
The author acknowledges the valuable contributions which discussions with
Dan Brickley, Carl Lagoze, Martin Doerr and Sigge Lundberg have made to
this work.
The work reported in this paper has been funded by the Cooperative Research
Centre for Enterprise Distributed Systems Technology (DSTC) through the
Australian Federal Government's CRC Programme (Department of Industry,
Science and Resources).
References
[1] The Harmony Project Home Page http://www.ilrt.bris.ac.uk/discovery/harmony/
[2] C. Lagoze, J. Hunter and D. Brickley (2000) "An
Event-Aware Model for Metadata Interoperability". ECDL 2000, Lisbon,
September
[3] XSL Transformations (XSLT) Version 1.0 (1999) W3C
Recommendation, 16 November http://www.w3.org/TR/xslt.html
[4] MetaNet Search Page http://sunspot.dstc.edu.au:8888/Metanet/Top.html
[5] RDF Schema Specification 1.0 (2000) W3C Candidate Recommendation, 27 March
http://www.w3.org/TR/rdf-schema/
[6] Dublin Core/MARC/GILS Crosswalk (1999) November
http://lcweb.loc.gov/marc/dccross.html
[7] ISO 2788 (1986) Documentation -- Guidelines for
the Development and Establishment of Monolingual Thesauri
[8] ISO 5964 (1985) Documentation -- Guidelines for
the Development and Establishment of Multilingual Thesauri
[9] Library of Congress Subject Headings, Cataloging
Distribution Service, Library of Congress http://lcweb.loc.gov/cds/lcsh.html
[10] Medical Subject Headings home page http://www.nlm.nih.gov/mesh/meshhome.html
[11] Art and Architecture Thesaurus Browser, Getty
Research Institute http://shiva.pub.getty.edu/aat_browser/
[12] C. Paice (1991) "A Thesaural Model of Information
Retrieval". Information Processing and Management, 27(5):433-447
[13] A.R. Aronson (1994) "Exploiting a Large Thesaurus
for Information Retrieval". RIAO 94, New York, October
[14] W. Bruce Croft and J. Yufeng (1994) "An Association
Thesaurus for Information Retrieval". RIAO 94, New York, October
[15] Dublin Core Metadata Initiative http://purl.org/dc/
[16] MARC Standards, Library of Congress Network Development
and MARC Standards Office http://lcweb.loc.gov/marc/marc.html
[17] G. Rust and M. Bide (1999) "The indecs Metadata
Schema Building Blocks". Indecs Metadata Model, November http://www.indecs.org/pdf/schema.pdf
[18] MPEG-7 Home Page http://www.darmstadt.gmd.de/mobile/MPEG7/index.html/
[19] Content Standard for Digital Geospatial Metadata
(CSDGM) http://www.fgdc.gov/metadata/contstan.html
[20] IEEE Learning Technology Standards Committee's
Learning Object Meta-data Working Group, Approved Working Draft WD5 Learning
Object Meta-data Scheme http://ltsc.ieee.org/wg12/
[21] ICOM/CIDOC Documentation Standards Group (1999)
Revised Definition of the CIDOC Conceptual Reference Model, September http://www.geneva-city.ch:80/musinfo/cidoc/oomodel
[22] C. Batini, M. Lenzerini and S.B. Navathe (1986)
"A comparative analysis of methodologies for database schema integration".
ACM Computing Surveys, 18(4):323-364, December
[23] E. Mena, V. Kashyap, A. Sheth and A. Illarramendi
(1996) "OBSERVER: An Approach for Query Processing in Global Information
Systems based on Interoperation across Pre-existing Ontologies". Proceedings
of the 1st IFCIS International Conference on Cooperative Information Systems
(CoopIS'96), Brussels, Belgium, June (IEEE Computer Society Press)
[24] R. Bayardo, et al. (1997) "InfoSleuth:
Agent-based Semantic Integration of Information in Open and Dynamic Environments".
Proceedings of ACM SIGMOD Conference on Management of Data, Tucson,
Arizona, May, pp. 195-206
[25] N. Guarino, C. Masolo and G. Vetere (1999) "Ontoseek:
Content-based Access to the Web". IEEE Intelligent Systems, Vol.
14, No. 3, May/June, 70-80
[26] H. Mili and R. Rada (1988) "Merging Thesauri:
Principles and Evaluation". IEEE Transactions on Pattern Analysis and
Machine Intelligence, 10(2):204-220
[27] M. Doerr and I. Fundulaki (1998) "A proposal on
extended interthesaurus links semantics". Technical Report TR-215, Institute
of Computer Science-FORTH, March
[28] M. Sintichakis and P. Constantopoulos (1997) "A
Method for Monolingual Thesauri Merging". Proceedings of the 20th ACM
International Conference on Research and Development in Information Retrieval
(ACM SIGIR), Philadelphia, PA, USA, July
[29] B. Amann and I. Fundulaki (1999) "Integrating
Ontologies and Thesauri to Build RDF Schemas". In ECDL'99: Research and
Advanced Technologies for Digital Libraries, Paris, France, September,
Lecture Notes in Computer Science (Springer-Verlag), pp. 234-253
[30] Ontology Inference Layer http://www.ontoknowledge.org/oil/
[31] International Federation of Library Associations
and Institutions (IFLA) (1998) Functional Requirements for Bibliographic
Records, March http://www.ifla.org/VII/s13/frbr/frbr.pdf
[32] ID3 Tag Version 2.3.0 http://www.id3.org/id3v2.3.0.html
[33] Xalan-Java Overview http://xml.apache.org/xalan/overview.html
[34] A. Cawsey (2000) "Presenting tailored resource
descriptions: Will XSLT do the job?". WWW9 conference, Amsterdam,
May http://www.cee.hw.ac.uk/~alison/www9/paper.html
[35] XML Path Language (XPath) Version 1.0 (1999) November
http://www.w3.org/TR/xpath
[36] WordNet - a Lexical Database for English http://www.cogsci.princeton.edu/~wn/online/
[37] RDF Schema Representation of the MetaNet Thesaurus
(2000) October http://archive.dstc.edu.au/maenad/metanet.rdf
[38] CIMI Consortium for Interchange of Museum Information
http://www.cimi.org/
[39] XML Schema Language http://www.w3.org/XML/Schema
Appendix A
A.1 ABC Description of Example Resource
<?xml version="1.0"?>
<ABC>
<Event id="E1" Type="Performance">
<Title>Live At the Lincoln Centre</Title>
<Context>
<Date>7/4/98</Date>
<Time>20:00</Time>
<Place>Lincoln Centre</Place>
</Context>
<Act id="Act1">
<Agent>New York Philharmonic</Agent>
<Role>Orchestra</Role>
</Act>
<Input id="comp523"/>
<Output id="audio821"/>
<Rights>
Lincoln Center for Performing Arts
</Rights>
</Event>
<Resource id="comp523">
<Type>Musical Score</Type>
<Title>Concerto for Violin</Title>
</Resource>
<Resource id="audio821">
<Type>audio</Type>
<Format>MP3</Format>
<Length units="mins">
130
</Length>
</Resource>
</ABC>
A.2 Simple Resource-centric Description of Example Resource
<?xml version="1.0"?>
<Resource id="audio821">
<Title>Live At Lincoln Center</Title>
<Date.Performance>1998-07-04</Date.Performance>
<Time.Performance>20:00</Time.Performance>
<Place.Performance>Lincoln Centre</Place.Performance>
<Agent.Orchestra>New York Philharmonic</Agent.Orchestra>
<Relation.isPerformanceOf>comp523</Relation.isPerformanceOf>
<Description>Performance of 'Concerto for Violin'</Description>
<Rights>
Lincoln Center for Performing Arts
</Rights>
<Type>audio</Type>
<Format>MP3</Format>
<Length units="mins">
130
</Length>
</Resource>
A.3 Dublin Core Description of Simple Example
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://purl.org/dc/schemas/dcmes-xml-20000714.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description about="audio821">
<dc:Title>Live At Lincoln Center</dc:Title>
<dc:Date.Performance>1998-07-04T20:00-05:00</dc:Date.Performance>
<dc:Coverage>Lincoln Centre</dc:Coverage>
<dc:Contributor.Orchestra>New York Philharmonic</dc:Contributor.Orchestra>
<dc:Relation.isPerformanceOf>comp523</dc:Relation.isPerformanceOf>
<dc:Description>Performance of 'Concerto for Violin'</dc:Description>
<dc:Rights>
Lincoln Center for Performing Arts
</dc:Rights>
<dc:Type>audio</dc:Type>
<dc:Format>MP3</dc:Format>
</rdf:Description>
</rdf:RDF>
A.4 MPEG-7 Description
<?xml version="1.0"?>
<MPEG-7 id="audio821">
<CreationMetaInformation>
<Creation>
<Title>Live at Lincoln Center</Title>
<Creator>
<Role>Orchestra</Role>
<Name>New York Philharmonic</Name>
</Creator>
<CreationDate>
<day>7</day>
<month>4</month>
<year>1998</year>
</CreationDate>
<Location>
<PlaceName>Lincoln Center</PlaceName>
</Location>
</Creation>
<Classification>
<Genre>Performance</Genre>
</Classification>
</CreationMetaInformation>
<MediaInformation>
<MediaProfile>
<MediaFormat>
<Medium>MP3</Medium>
<Length><m>130</m></Length>
</MediaFormat>
</MediaProfile>
</MediaInformation>
<UsageMetaInformation>
<Rights>
<RightsId IdOrganization='Lincoln Center'/>
</Rights>
</UsageMetaInformation>
</MPEG-7>
A.5 ID3 Description
<?xml version="1.0"?>
<ID3>
<!-- Unique Identifier -->
<UFID>audio821</UFID>
<!-- Title -->
<TIT2>Live At Lincoln Center</TIT2>
<!-- Orchestra -->
<TPE2>New York Philharmonic</TPE2>
<!-- Type or Genre -->
<TCON>Performance</TCON>
<!-- Media Type sound originated from-->
<TMED>Audio/MP3</TMED>
<!-- Date Recorded -->
<TDAT>7/4/98</TDAT>
<!-- Time Recorded -->
<TIME>2000</TIME>
<!-- Length in millisecs-->
<TLEN>7800000</TLEN>
<!-- Original recording or source -->
<TOAL>comp523</TOAL>
<!-- Copyright Message -->
<TCOP>Lincoln Center for Performing Arts</TCOP>
</ID3>
Appendix B
B.1 XSL for Transforming from ABC to DC
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dc ="http://purl.org/dc/elements/1.1/">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="ABC">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc ="http://purl.org/dc/elements/1.1/">
<rdf:Description>
<xsl:apply-templates select="Event"/>
<xsl:apply-templates select="Resource"/>
</rdf:Description>
</rdf:RDF>
</xsl:template>
<xsl:template match="Event">
<xsl:apply-templates select="Output"/>
<xsl:apply-templates select="Context"/>
<xsl:apply-templates select="Act"/>
<xsl:apply-templates select="Input"/>
<xsl:apply-templates select="Title"/>
<xsl:apply-templates select="Rights"/>
</xsl:template>
<xsl:template match="Output">
<xsl:attribute name="about">
<xsl:value-of select="@id"/>
</xsl:attribute>
<xsl:copy-of select="*"/>
</xsl:template>
<xsl:template match="Context">
<xsl:apply-templates select="Date"/>
<xsl:apply-templates select="Place"/>
</xsl:template>
<xsl:template match="Date">
<xsl:element name="dc:Date.{../../@Type}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="Place">
<xsl:element name="dc:Coverage">
<xsl:value-of select='.'/>
</xsl:element>
</xsl:template>
<xsl:template match="Act">
<xsl:element name="dc:Contributor.{Role}">
<xsl:value-of select='Agent'/>
</xsl:element>
</xsl:template>
<xsl:template match="Input">
<xsl:element name="dc:Relation.is{../@Type}Of">
<xsl:value-of select="@id"/>
</xsl:element>
</xsl:template>
<xsl:template match="Title">
<xsl:element name="dc:Title">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="Rights">
<xsl:element name="dc:Rights">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="Resource">
<xsl:if test="@id=../Event/Output/@id">
<xsl:apply-templates select="Type"/>
<xsl:apply-templates select="Format"/>
</xsl:if>
<xsl:if test="@id=../Event/Input/@id">
<xsl:element name="dc:Description">
<xsl:value-of select="../Event/@Type"/> of
<xsl:text>"</xsl:text>
<xsl:value-of select="Title"/>
<xsl:text>"</xsl:text>
</xsl:element>
</xsl:if>
</xsl:template>
<xsl:template match="Type">
<xsl:element name="dc:Type">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="Format">
<xsl:element name="dc:Format">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
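The flattening that the stylesheet above performs, from an event-aware description to qualified resource-centric element names, can also be sketched outside XSLT. The following Python example uses only the standard library and a trimmed copy of the A.1 record. The function name `flatten_event` and the returned data shape are assumptions made for illustration, and the sketch covers only the Event part of the mapping.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the ABC description in A.1.
ABC_SAMPLE = """<ABC>
  <Event id="E1" Type="Performance">
    <Title>Live At the Lincoln Centre</Title>
    <Context>
      <Date>7/4/98</Date>
      <Place>Lincoln Centre</Place>
    </Context>
    <Act id="Act1">
      <Agent>New York Philharmonic</Agent>
      <Role>Orchestra</Role>
    </Act>
    <Input id="comp523"/>
    <Output id="audio821"/>
  </Event>
  <Resource id="comp523"><Title>Concerto for Violin</Title></Resource>
  <Resource id="audio821"><Type>audio</Type><Format>MP3</Format></Resource>
</ABC>"""


def flatten_event(abc_xml):
    """Return (output resource id, [(qualified name, value), ...]) for the first Event."""
    root = ET.fromstring(abc_xml)
    event = root.find("Event")
    event_type = event.get("Type")
    pairs = [
        ("Title", event.findtext("Title")),
        # The event type becomes a qualifier, as in dc:Date.{../../@Type}.
        ("Date." + event_type, event.findtext("Context/Date")),
        ("Coverage", event.findtext("Context/Place")),
    ]
    for act in event.findall("Act"):
        pairs.append(("Contributor." + act.findtext("Role"), act.findtext("Agent")))
    for source in event.findall("Input"):
        pairs.append(("Relation.is%sOf" % event_type, source.get("id")))
    return event.find("Output").get("id"), pairs


resource_id, fields = flatten_event(ABC_SAMPLE)
print(resource_id)
for name, value in fields:
    print(name, "=", value)
```

As in the stylesheet, the qualifier names are built purely from strings found in the input, which is why a fixed transformation of this kind cannot recognise semantically equivalent terms on its own.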
B.2 XSL for Transforming from ABC to ID3
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:id3 ="http://www.id3.org/id3v2.3.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="ABC">
<id3:ID3 xmlns:id3="http://www.id3.org/id3v2.3.0">
<xsl:apply-templates select="Event"/>
<xsl:apply-templates select="Resource"/>
</id3:ID3>
</xsl:template>
<xsl:template match="Event">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Output">
<xsl:element name="id3:UFID">
<xsl:value-of select="@id"/>
</xsl:element>
</xsl:template>
<xsl:template match="Context">
<id3:TDAT>
<xsl:value-of select="Date"/>
</id3:TDAT>
<id3:TIME>
<xsl:value-of select="Time"/>
</id3:TIME>
</xsl:template>
<xsl:template match="Act">
<xsl:element name="id3:TPE2">
<xsl:value-of select='Agent'/>
</xsl:element>
</xsl:template>
<xsl:template match="Input">
<xsl:element name="id3:TOAL">
<xsl:value-of select="@id"/>
</xsl:element>
</xsl:template>
<xsl:template match="Title">
<xsl:element name="id3:TIT2">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="Rights">
<xsl:element name="id3:TCOP">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="Resource">
<xsl:if test="@id=../Event/Output/@id">
<xsl:apply-templates select="Format"/>
<xsl:apply-templates select="Length"/>
</xsl:if>
</xsl:template>
<xsl:template match="Format">
<xsl:element name="id3:TMED">
<xsl:value-of select="../Type"/>/<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="Length">
<xsl:element name="id3:TLEN">
<xsl:value-of select=". * 60 * 1000"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
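The value conversions involved in the ID3 mapping can be sketched in Python as follows. ID3v2.3.0 defines TLEN as a length in milliseconds, TDAT as a DDMM string and TIME as an HHMM string. The stylesheet above converts only the length, so the date and time formatting shown here is an optional extra step, not part of the original transformation. The function names and the `day_first` parameter are assumptions: the sample date "7/4/98" is ambiguous, being read as day/month in A.4 and as month/day in A.3.

```python
def minutes_to_tlen(minutes):
    """Convert a length in minutes (text or number) to an ID3 TLEN value in ms."""
    return str(int(round(float(minutes) * 60 * 1000)))


def to_id3_date_time(date, time, day_first=True):
    """Format 'd/m/yy' and 'hh:mm' strings as ID3v2.3 TDAT (DDMM) and TIME (HHMM).

    day_first is an assumption about the source convention; pass False
    if the input date is written month/day/year.
    """
    first, second, _year = date.strip().split("/")
    day, month = (first, second) if day_first else (second, first)
    tdat = "%02d%02d" % (int(day), int(month))
    hours, mins = time.strip().split(":")
    return tdat, "%02d%02d" % (int(hours), int(mins))


print(minutes_to_tlen(" 130 "))
print(to_id3_date_time("7/4/98", "20:00"))
```

The year is not carried in TDAT; in ID3v2.3.0 it belongs in a separate TYER frame, which this sketch does not produce.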
B.3 XSL for Transforming ABC to MPEG-7
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:mpeg7 ="http://www.mpeg7.org/2000/MPEG7_schema/">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="ABC">
<MPEG-7>
<xsl:apply-templates select="Event"/>
<xsl:apply-templates select="Resource"/>
</MPEG-7>
</xsl:template>
<xsl:template match="Event">
<xsl:apply-templates select="Output"/>
<CreationMetaInformation>
<Creation>
<xsl:apply-templates select="Title"/>
<xsl:apply-templates select="Act"/>
<xsl:apply-templates select="Context"/>
<xsl:apply-templates select="Input"/>
</Creation>
<Classification>
<xsl:element name="Genre">
<xsl:value-of select="@Type"/>
</xsl:element>
</Classification>
</CreationMetaInformation>
<xsl:apply-templates select="Rights"/>
</xsl:template>
<xsl:template match="Output">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:template>
<xsl:template match="Title">
<xsl:element name="Title">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="Act">
<Creator>
<xsl:element name="Role">
<xsl:value-of select='Role'/>
</xsl:element>
<xsl:element name="Name">
<xsl:value-of select='Agent'/>
</xsl:element>
</Creator>
</xsl:template>
<xsl:template match="Context">
<CreationDate>
<xsl:variable name="date" select='Date'/>
<xsl:variable name="my" select="substring-after($date,'/')"/>
<xsl:element name="day">
<xsl:value-of select="substring-before($date,'/')"/>
</xsl:element>
<xsl:element name="month">
<xsl:value-of select="substring-before($my,'/')"/>
</xsl:element>
<xsl:element name="year">
<xsl:value-of select="substring-after($my,'/')"/>
</xsl:element>
</CreationDate>
<Location>
<xsl:element name="PlaceName">
<xsl:value-of select='Place'/>
</xsl:element>
</Location>
</xsl:template>
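<xsl:template match="Input">
<!-- Empty template: Input elements produce no output, because the MPEG-7 description in A.4 has no element for the source work -->
</xsl:template>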
<xsl:template match="Rights">
<UsageMetaInformation>
<Rights>
<xsl:element name="RightsId">
<xsl:value-of select="."/>
</xsl:element>
</Rights>
</UsageMetaInformation>
</xsl:template>
<xsl:template match="Resource">
<xsl:if test="@id=../Event/Output/@id">
<MediaInformation>
<MediaProfile>
<MediaFormat>
<xsl:apply-templates select="Format"/>
<xsl:apply-templates select="Length"/>
</MediaFormat>
</MediaProfile>
</MediaInformation>
</xsl:if>
</xsl:template>
<xsl:template match="Format">
<xsl:element name="Medium">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="Length">
<Length>
<xsl:if test="@units='mins'">
<xsl:element name="m">
<xsl:value-of select="."/>
</xsl:element>
</xsl:if>
</Length>
</xsl:template>
</xsl:stylesheet>