Servicing Data Requests at the DMC - A View From the Engine Room

By now, it is probably well known that the IRIS Data Management
Center manages a lot of data from approximately 60 different
network operators. About 10 Gigabytes is ingested into our
holdings daily. To date, we have nearly 12 Terabytes stored
either in our front-end RAID system, which is online, or in
our 50 Terabyte, near-line mass store. This near-line system
is tape-based and allows us to efficiently recall data from
our dual-sort archive, based on how a user requests data.
We store all data either in a time-sorted filesystem or a
station-sorted filesystem. The advantage of this dual storage
is that we can efficiently either recall entire continuous
days of data from one station or assemble all data from all
networks for a given time into a single SEED volume for
distribution.
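
To make the two access patterns concrete, here is a minimal in-memory sketch. It is not the DMC's actual storage layout: the class name, the method names, and the assumption that each record is indexed under both sort orders are hypothetical, chosen only to illustrate why a dual-sort archive serves both kinds of request cheaply.

```python
from collections import defaultdict
from datetime import date


class DualSortArchive:
    """Toy index (hypothetical) that keeps each record reachable by day or by station."""

    def __init__(self):
        # Time-sorted view: day -> {(network, station): record}
        self.by_time = defaultdict(dict)
        # Station-sorted view: (network, station) -> {day: record}
        self.by_station = defaultdict(dict)

    def store(self, network, station, day, record):
        """Index one day of data for one station under both sort orders."""
        key = (network, station)
        self.by_time[day][key] = record
        self.by_station[key][day] = record

    def station_days(self, network, station, first, last):
        """Recall continuous days for one station (station-sorted access)."""
        days = self.by_station.get((network, station), {})
        return [days[d] for d in sorted(days) if first <= d <= last]

    def all_for_day(self, day):
        """Gather every network and station for one day (time-sorted access)."""
        return dict(self.by_time.get(day, {}))


archive = DualSortArchive()
archive.store("IU", "ANMO", date(1999, 1, 1), "ANMO day 1")
archive.store("IU", "ANMO", date(1999, 1, 2), "ANMO day 2")
archive.store("II", "PFO", date(1999, 1, 1), "PFO day 1")
```

In this sketch, a request for several days from one station reads a single entry of the station-sorted view, while an event-style request for one day across all networks reads a single entry of the time-sorted view.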

Many of you reading this article have requested data and are
quite familiar with how to format a request. What I would like
to point out is that the way we receive data and the way you
can access these data are changing.

The IRIS DMC processes many "customized" requests each
day. These requests are considered customized because, through
a user-defined request format (BREQ_FAST), the data returned
to the user closely match what they need. We have a web
interface to our database so users can find out exactly what
data we have in our holdings. This interface is called
SeismiQuery (see the Data Access article in this issue for
details). We highly recommend that requesters use this
interface before submitting their request to facilitate
request processing.
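
For readers who have not seen one, the sketch below assembles the text of a small BREQ_FAST request. It assumes the layout described in the BREQ_FAST documentation: dot-prefixed header lines closed by `.END`, then one line per station giving station, network, start time, end time, channel count, and channel names. The helper names and every header value are hypothetical placeholders, and the BREQ_FAST manual remains the authority on the required fields.

```python
from datetime import datetime


def breq_fast_time(t):
    """Format a datetime as 'YYYY MM DD HH MM SS.T' (tenths of a second)."""
    return t.strftime("%Y %m %d %H %M %S") + ".%d" % (t.microsecond // 100000)


def breq_fast_request(header, windows):
    """Build request text from header pairs and per-station time windows.

    header  -- list of (field, value) pairs, e.g. ("NAME", "A. Requester")
    windows -- list of (station, network, start, end, channels) tuples
    """
    lines = [".%s %s" % (field, value) for field, value in header]
    lines.append(".END")
    for station, network, start, end, channels in windows:
        fields = [station, network, breq_fast_time(start), breq_fast_time(end),
                  str(len(channels))] + list(channels)
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Placeholder header values; substitute your own details.
header = [
    ("NAME", "A. Requester"),
    ("INST", "Example University"),
    ("EMAIL", "requester@example.edu"),
    ("MEDIA", "Electronic"),
    ("LABEL", "example_request"),
]
windows = [
    ("GRFO", "IU",
     datetime(1999, 1, 2, 0, 18, 10, 400000),
     datetime(1999, 1, 2, 0, 20, 10, 400000),
     ["SHZ"]),
]
request_text = breq_fast_request(header, windows)
```

Keeping each window to the span actually needed is what makes the resulting volume a close fit to the request.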

Currently, the best utility for generating a well-constructed
BREQ_FAST request is the WEED utility, an X Window-based
application that can be downloaded from
ftp.iris.washington.edu/pub/programs (see the WEED manual for
details). WEED allows a researcher to explicitly define the
parameters used in time-windowing data and contains the tau-p
software that calculates predicted travel times, thereby
minimizing the pre-event or post-event data that might come
back to the user. It is of major benefit for users to ask for
as small a subset of data as possible, because we have
limitations that prohibit the generation of very large SEED
volumes (such as the usual UNIX limitation that files cannot
be larger than 2 Gigabytes). If we get very large requests,
we may ask that the author of the request resubmit smaller
requests, or we may split the request somewhat arbitrarily.
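
The two ideas in this paragraph, trimming each window around a predicted arrival and keeping any one volume under a size ceiling, can be sketched as follows. This is not WEED's or the DMC's code: the function names, the default pre-event and post-event margins, and the greedy splitting rule are illustrative assumptions, and the travel time is taken as a given number of seconds rather than computed with tau-p.

```python
from datetime import datetime, timedelta

# Assumed ceiling: the classic 2 Gigabyte limit of signed 32-bit file offsets.
UNIX_FILE_LIMIT = 2**31 - 1


def arrival_window(origin, travel_time_s, pre_s=60.0, post_s=300.0):
    """Return (start, end) around a predicted arrival.

    origin        -- event origin time
    travel_time_s -- predicted travel time in seconds (e.g. from tau-p)
    pre_s, post_s -- hypothetical margins before and after the arrival
    """
    arrival = origin + timedelta(seconds=travel_time_s)
    return arrival - timedelta(seconds=pre_s), arrival + timedelta(seconds=post_s)


def split_by_size(items, limit=UNIX_FILE_LIMIT):
    """Greedily group (label, estimated_bytes) items into batches under limit.

    Order is preserved; an item larger than the limit gets a batch of its own.
    """
    batches, current, total = [], [], 0
    for label, size in items:
        if current and total + size > limit:
            batches.append(current)
            current, total = [], 0
        current.append(label)
        total += size
    if current:
        batches.append(current)
    return batches


start, end = arrival_window(datetime(1999, 1, 2, 0, 0, 0), 600.0)
batches = split_by_size([("a", 6), ("b", 5), ("c", 4)], limit=10)
```

A requester who splits a large request along meaningful lines, for example by event or by network, avoids the arbitrary split described above.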

We have controls built into our processing routines so that
no matter how many requests we receive at one time, we can
efficiently process each one. The system we currently use is,
for the most part, first-come, first-served, but we also take
the size of the requested data volume into consideration.
The smaller requests get processed more quickly and are fully
automated. We process over 90% of the customized requests in
a fully automated way, from receipt of the request to the
e-mail sent to advise the user that the SEED file is ready
to download. If the SEED file is very large, we transfer the
volume to tape on the requested media and mail it to the user.
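
One simple way to combine first-come, first-served ordering with a preference for small requests is a priority queue keyed on a size class and then on arrival order. The sketch below illustrates that idea only. It is not the DMC's scheduler, and the class name and the cutoff separating "small" from "large" are hypothetical.

```python
import heapq
import itertools

# Hypothetical cutoff between "small" and "large" requests, in bytes.
SMALL_REQUEST_BYTES = 100 * 1024 * 1024


class RequestQueue:
    """Serve small requests first; within a size class, first come, first served."""

    def __init__(self, small_cutoff=SMALL_REQUEST_BYTES):
        self._heap = []
        self._arrival = itertools.count()  # monotonically increasing arrival number
        self._small_cutoff = small_cutoff

    def submit(self, request_id, estimated_bytes):
        """Queue a request with its estimated output size."""
        size_class = 0 if estimated_bytes <= self._small_cutoff else 1
        heapq.heappush(self._heap, (size_class, next(self._arrival), request_id))

    def next_request(self):
        """Return the next request id to process, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = RequestQueue(small_cutoff=10)
queue.submit("big", 100)
queue.submit("small-1", 5)
queue.submit("small-2", 1)
order = [queue.next_request() for _ in range(4)]
```

A strict two-class rule like this could starve large requests under heavy load, so a production scheduler would need some form of aging or a quota for the large class.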

Over the last year, there has been a lot of effort to bring
data to the DMC closer to real time. We now have a Frame Relay
circuit installed at both of the IRIS Data Collection Centers
that contribute the GSN data to the DMC. These data are
forwarded to the DMC and are archived automatically. Because
data can now be brought into the DMC more efficiently, we will
soon be able to generate event-oriented SEED volumes in the
FARM holdings that are much closer to real time and can be
updated automatically when new data arrive. There will be more
on this subject as we progress with the implementation, but
users should be aware that currently, 12 different networks
provide data to the DMC either via ftp or frame relay
circuits. We believe that the efficiency of the disk transfers
should be very helpful in acquiring data for users in a more
timely manner.

The Operations group at the DMC consists of four employees:
- Rick Benson - Director of Operations
- Anh Ngo - Senior Data Technician (highlighted in this issue)
- Stacy Fournier - Data Technician
- Mary Edmunds - Data Technician

If you have any questions about operations or making a request,
please feel free to write to Rick Benson.

See also: Data Access Tutorial

submitted by Rick Benson