Interwoven MetaTagger
As organizations seek ways
to efficiently leverage the ever-increasing amount of information
available on the Web, it is becoming crucial to develop methods
that make content "intelligent". By making content
intelligent, organizations can deliver content that is appropriate
and relevant to each potential audience. This ability
is key to obtaining maximum value from search, personalization,
syndication and portal applications.
Interwoven MetaTagger automates
the process of adding intelligence to content by creating
rich, descriptive metadata from articles, documents, or Web
pages. MetaTagger automatically categorizes these content
items by subject and then identifies them so that they can
be linked to other relevant information. This next-generation
content intelligence solution provides applications that enable
content creators to categorize their content interactively
and allows developers to create procedures for recognizing
and classifying content automatically.
Based on industry-standard
or custom controlled vocabularies, MetaTagger can suggest
appropriate metadata for content. Through either a semi-automated
or fully automated approach, MetaTagger is able to add intelligence
to content for use later in run-time search, personalization,
syndication, and portal applications.
Controlled Vocabularies
MetaTagger utilizes controlled
vocabularies to provide precision and consistency in tagging
metadata. The human expertise that goes into building a controlled
vocabulary ultimately enables more accurate metadata to be
automatically applied to assets. Because the vocabularies are
controlled, arbitrary metadata cannot be applied to content,
which keeps metadata consistent across all assets.
MetaTagger supports multiple
vocabularies, whether industry-standard or custom. Three
vocabularies are included with MetaTagger: Public Companies,
Geographic Locations, and Industrial Codes. MetaTagger vocabularies
are expressed in XML, which enables the import of any new or
existing custom vocabularies.
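Because vocabularies are expressed in XML, importing one amounts to parsing a document of terms. The following is a minimal sketch only: the element and attribute names (`vocabulary`, `term`, `label`, `synonym`, `id`, `name`) and the `load_vocabulary` helper are assumptions made for illustration, not the actual MetaTagger vocabulary schema, which this text does not describe.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary layout, invented for this example.
# The real MetaTagger schema is not shown in this document.
SAMPLE_XML = """
<vocabulary name="Geographic Locations">
  <term id="geo-001">
    <label>San Francisco</label>
    <synonym>SF</synonym>
  </term>
  <term id="geo-002">
    <label>New York</label>
    <synonym>NYC</synonym>
    <synonym>New York City</synonym>
  </term>
</vocabulary>
"""


def load_vocabulary(xml_text):
    """Return (vocabulary name, {term id: {"label": str, "synonyms": [str]}})."""
    root = ET.fromstring(xml_text.strip())
    terms = {}
    for term in root.findall("term"):
        terms[term.get("id")] = {
            "label": term.findtext("label"),
            "synonyms": [s.text for s in term.findall("synonym")],
        }
    return root.get("name"), terms


name, terms = load_vocabulary(SAMPLE_XML)
print(name, sorted(terms))
```

A custom vocabulary in this assumed layout would be imported the same way, by passing its XML text to `load_vocabulary`.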
Content Classification and Recognition
MetaTagger categorizes text
according to multiple schemes. It uses a training set of
pre-categorized texts to learn which words and phrases occur
in the various categories. After analyzing the content, MetaTagger
tags the asset with one or more appropriate subject categories.
MetaTagger recognizers automatically scan content and identify
words or phrases, such as products and services, company names,
persons, and locations, that match entries in the controlled
vocabulary. Assets are then automatically tagged with all
matches in the vocabulary.
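The two steps described above, learning categories from pre-categorized texts and recognizing vocabulary entries in content, can be illustrated with a small sketch. This text does not say which algorithms MetaTagger uses, so the example substitutes a simple word-count model with add-one smoothing for classification and whole-word phrase matching for recognition. All function names and sample data here are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """Count words per category from (text, category) training pairs."""
    counts = defaultdict(Counter)
    for text, category in examples:
        counts[category].update(tokenize(text))
    return counts


def classify(text, counts, top_n=1):
    """Rank categories by add-one-smoothed log likelihood of the words."""
    vocab = {word for words in counts.values() for word in words}
    scores = {}
    for category, words in counts.items():
        total = sum(words.values())
        scores[category] = sum(
            math.log((words[w] + 1) / (total + len(vocab)))
            for w in tokenize(text)
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def recognize(text, vocabulary):
    """Return vocabulary phrases that occur in the text as whole words."""
    found = []
    for phrase in vocabulary:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            found.append(phrase)
    return found


# Illustrative training set and vocabulary (invented for this example).
training = [
    ("The bank raised interest rates on loans", "Finance"),
    ("Shares fell as the market closed lower", "Finance"),
    ("The team won the match in extra time", "Sports"),
    ("The striker scored two goals in the final", "Sports"),
]
counts = train(training)
print(classify("Interest rates and the stock market", counts))
print(recognize("Acme Corp opened an office in New York.",
                ["Acme Corp", "New York", "Paris"]))
```

Raising `top_n` returns more than one subject category per asset, matching the idea that an asset may carry several categories.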
Vocabulary Search and Browse
MetaTagger enables users to
search or browse for terms in any given vocabulary. This is
critically important in a semi-automated model, where MetaTagger
automatically suggests metadata and an author or subject-matter
expert refines the metadata based on personal or organizational
knowledge. Users can add to, remove, or replace automatically
suggested metadata with the results of a search or browse.
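The semi-automated refinement step can be sketched as a search over vocabulary terms followed by edits to the suggested list. This is an illustration under stated assumptions: `search_terms` and `refine` are hypothetical helpers, not MetaTagger functions, and the substring match stands in for whatever search the product actually performs.

```python
def search_terms(vocabulary, query):
    """Case-insensitive substring search over vocabulary terms."""
    q = query.lower()
    return [term for term in vocabulary if q in term.lower()]


def refine(suggested, vocabulary, add=(), remove=()):
    """Apply a reviewer's edits, rejecting terms outside the vocabulary."""
    allowed = set(vocabulary)
    unknown = [term for term in add if term not in allowed]
    if unknown:
        raise ValueError(f"not in controlled vocabulary: {unknown}")
    dropped = set(remove)
    kept = [term for term in suggested if term not in dropped]
    return kept + [term for term in add if term not in kept]


# Illustrative vocabulary and suggestion (invented for this example).
vocabulary = ["United States", "United Kingdom", "France"]
hits = search_terms(vocabulary, "united")
final = refine(["France"], vocabulary, add=["United Kingdom"], remove=["France"])
print(hits, final)
```

Rejecting unknown terms in `refine` mirrors the point that a controlled vocabulary prevents arbitrary metadata from being applied, even when a person edits the suggestions.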
Automated Processing
MetaTagger can be configured
to apply metadata to assets automatically in either a semi-automated
or a fully automated manner. The fully automated mode provides
organizations with a robust, efficient, and rapid mechanism
for applying metadata to either current or legacy content,
incoming syndication feeds, or disparate corporate assets.
TeamSite itself leverages MetaTagger automation within workflows
to assign metadata to assets without necessarily involving
humans in the process.
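The difference between the two modes can be sketched as a single switch in a batch step: in fully automated mode suggested tags are applied directly, while in semi-automated mode they are held for review. The `Asset` class, the `tag_assets` function, the `mode` values, and the `suggest` callable are all assumptions made for illustration and do not represent the MetaTagger or TeamSite workflow API.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    path: str
    text: str
    metadata: list = field(default_factory=list)   # tags applied to the asset
    suggested: list = field(default_factory=list)  # tags awaiting human review


def tag_assets(assets, suggest, mode="semi"):
    """Run suggest(text) on each asset and apply or queue the result."""
    if mode not in ("semi", "full"):
        raise ValueError(f"unknown mode: {mode!r}")
    for asset in assets:
        tags = suggest(asset.text)
        if mode == "full":
            asset.metadata = tags    # applied without human involvement
        else:
            asset.suggested = tags   # left for an author or expert to refine
    return assets


# Hypothetical suggester: tags an asset "Finance" if it mentions "bank".
def suggest(text):
    return ["Finance"] if "bank" in text.lower() else []


batch = [Asset("news/a.txt", "The bank raised rates."),
         Asset("news/b.txt", "Local team wins.")]
tag_assets(batch, suggest, mode="full")
print([(a.path, a.metadata) for a in batch])
```

In a workflow, a step like `tag_assets` would sit between content submission and publication, with the mode chosen per content source.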