Searchable Product Archive and Distribution Engine
The Searchable Product Archive and Distribution Engine project (SPADE,
formerly UPDS) at the IRIS DMC provides a permanent, searchable archive for XML-encoded
data products. The products can be essentially any XML document,
and can be searched by any of their fields that have been identified
as searchable fields. SPADE provides a single uniform web services-based
tool to query and access all manner of scientific data products.
The system consists of a pair of servers, one for submission and one for querying.
Product documents are submitted using a web service client to the Submission
server, which extracts searchable metadata into a relational database
and archives the XML document in its entirety. The web services-based
API supports two ways to query the archive: a stand-alone Java GUI client
and a web browser interface.
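The submission step described above can be sketched roughly as follows. This is only an illustration, not SPADE's actual implementation: the table layout, the element paths, and the names `SEARCHABLE_FIELDS` and `submit` are assumptions, and an in-memory SQLite database stands in for the relational database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical searchable fields for a hypothetical "hypocenter" product:
# database column name -> element path inside the XML document.
SEARCHABLE_FIELDS = {
    "source_id": "sourceId",
    "product_id": "productId",
    "magnitude": "origin/magnitude",
}


def submit(conn, xml_text):
    """Extract the searchable metadata and archive the whole document."""
    root = ET.fromstring(xml_text)
    values = {col: root.findtext(path) for col, path in SEARCHABLE_FIELDS.items()}
    conn.execute(
        "INSERT INTO hypocenter (source_id, product_id, magnitude, document) "
        "VALUES (?, ?, ?, ?)",
        (
            values["source_id"],
            values["product_id"],
            float(values["magnitude"]),
            xml_text,  # the XML document is stored in its entirety
        ),
    )
    conn.commit()
    return values


# Stand-in for the relational database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hypocenter ("
    "source_id TEXT, product_id TEXT, magnitude REAL, document TEXT, "
    "PRIMARY KEY (source_id, product_id))"
)

doc = (
    "<hypocenter><sourceId>NEIC</sourceId><productId>ev001</productId>"
    "<origin><magnitude>5.4</magnitude></origin></hypocenter>"
)
submit(conn, doc)
```

Keeping the full document next to the extracted columns means a query can filter on the metadata and still return the original product unchanged.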

Figure 1: SPADE System.
To search the archive, the user first selects a product type, which determines
the searchable fields that are offered. The user can then enter
query constraints for that product type. Each product type has
its own set of searchable metadata fields.

Figure 2: Selecting a Product Type.

Figure 3: Entering queries.
The query will return a list of products that matched the specified filter
criteria, allowing the user to view them directly or have the products
packaged and downloaded as a group.
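One way to picture a per-product-type query is as a set of constraints on that type's searchable fields, turned into a parameterized database query. The sketch below is an assumption about how such a query could be built, not a description of the SPADE API; the `SEARCHABLE` registry, the field names, and `build_query` are hypothetical.

```python
# Hypothetical registry: product type -> its searchable metadata fields.
SEARCHABLE = {
    "hypocenter": {"magnitude", "depth_km", "origin_time"},
}

# Comparison operators a constraint is allowed to use.
ALLOWED_OPS = {"=", "<", "<=", ">", ">="}


def build_query(product_type, constraints):
    """Build a parameterized SELECT from (field, operator, value) constraints."""
    fields = SEARCHABLE[product_type]
    clauses, params = [], []
    for field, op, value in constraints:
        # Only known fields and operators reach the SQL text; the values
        # themselves are always passed as bound parameters.
        if field not in fields:
            raise ValueError(f"{field!r} is not searchable for {product_type!r}")
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator {op!r}")
        clauses.append(f"{field} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1=1"
    sql = f"SELECT source_id, product_id FROM {product_type} WHERE {where}"
    return sql, params


sql, params = build_query(
    "hypocenter", [("magnitude", ">=", 6.0), ("depth_km", "<", 70)]
)
```

The result is a list of matching source and product IDs, which is enough either to fetch individual documents for viewing or to package the matching products for download as a group.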
Currently, queries are available only by product type: you can
query for one type of product at a time, using the metadata that is available
for that product type. The next release will add an expanded set
of common metadata fields for searching across different product types. For
example, one might search for all products that relate to a certain geographic
region. Common metadata fields will include latitude and longitude extents,
geographic region information, time extents, keywords, Dublin Core, and others.
The system is structured so that new and as-yet unforeseen product types
can be added to the archive with minimal effort. To add a new product
type to the archive, a configuration document is created and registered
with the system. This configuration document describes the product’s
searchable fields and is used to create the database tables, guide the
metadata extraction, and build the query page. Currently, creating this
configuration document is a manual process, similar to creating an XML Schema
document. In the future we will provide
tools to simplify and at least partially automate the process. The products
can have essentially any XML structure, as long as they conform to a minimal
set of system requirements, particularly the inclusion of source and product
unique IDs.
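To illustrate how a single configuration document can drive both the database tables and the query page, here is a minimal sketch. The configuration format shown is invented for this example and is not SPADE's actual format; `CONFIG`, `create_table_sql`, and `query_form_fields` are hypothetical names. The source and product ID columns reflect the minimal system requirements mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document for a hypothetical product type.
# Each <field> names a searchable column, its type, and where to find it
# in the product document.
CONFIG = """
<productType name="hypocenter">
  <field name="magnitude" type="REAL" path="origin/magnitude"/>
  <field name="origin_time" type="TEXT" path="origin/time"/>
</productType>
"""


def create_table_sql(config_xml):
    """Derive a CREATE TABLE statement from the configuration document."""
    root = ET.fromstring(config_xml)
    # Every product carries source and product unique IDs.
    columns = ["source_id TEXT NOT NULL", "product_id TEXT NOT NULL"]
    for field in root.findall("field"):
        columns.append(f'{field.get("name")} {field.get("type")}')
    columns.append("document TEXT NOT NULL")  # the archived XML itself
    columns.append("PRIMARY KEY (source_id, product_id)")
    return f'CREATE TABLE {root.get("name")} ({", ".join(columns)})'


def query_form_fields(config_xml):
    """List the field names a query page would offer for this product type."""
    root = ET.fromstring(config_xml)
    return [field.get("name") for field in root.findall("field")]


ddl = create_table_sql(CONFIG)
form_fields = query_form_fields(CONFIG)
```

Because the table definition and the query fields come from the same document, adding a new product type in this sketch needs no code changes, only a new configuration.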
The archive is populated with over 280,000 products, including over 250,000
XML Hypocenters, over 21,000 Harvard CMTs going back to 1962, PBO strain
data, and XML FARM products. Some of these products are experimental and will likely
change before the final release. SPADE is currently in beta release, with
a 1.0 release expected in the first quarter of 2007.
More information can be found at http://www.iris.edu/spade or
by contacting Linus Kamb at the IRIS DMC (linus at iris.washington.edu).
Submitted by Linus Kamb, IRIS DMC