NetDC manual


9.0 PRODUCT SHIPMENT

NetDC has the important goal of ensuring that data processed at various sites finds its way back to the user in an automated fashion. In creating an interface where any NetDC site can be contacted to request data, NetDC has the responsibility to ensure that the user gets the requested data regardless of where the original request was directed. This necessitates effective request and product tracking and requires reliable data exchange mechanisms.

Networked Data Centers gives the user a choice between receiving data from each data center individually and receiving it as a single merged product from the data center originally contacted. The former simply permits the delegate data centers to forward their data products directly to the user. The latter requires automated information flow back to a single data center, along with request tracking and data holding procedures.

At a given data center processing a request, a directory is created that becomes the home for tracking and storing all data produced. Requests are forwarded from here to other sites and temporary files related to the request are kept here as well. This directory is referred to as the user's "request directory". A request directory is created at both the hub site and the delegate sites for a given request.


Fig 9.1 - Layout of user request directory tree

As data products are completed, they are returned to the user's request directory and given a predictable file name for easy organization. The file name pattern is:

<DATA_TYPE>.<HUB_ID>.<DC_NAME>

where DATA_TYPE is the kind of data in the file: inventory, response, or waveform data (the examples below show the tags DATA, INV, and RESP). HUB_ID is a tag that is present on every data product and remains constant throughout the life of the networked request. Every data center uses the same HUB_ID for a given request, and this unique tag allows NetDC to track the progress of the request through the network as processing takes place. DC_NAME is the code name of the data center where the data product was produced, which distinguishes data produced at one center from data produced at another: two products for the same request share a HUB_ID, but each carries its own DC_NAME. Examples:

DATA.GEOFON:Mar_09,00:19:14:5975.GEOSCOPE
INV.IRIS_DMC:Apr_04,10:45:56:1123.ORFEUS
RESP.ORFEUS:Jul_23,13:45:32:12922.ORFEUS
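Because HUB_ID itself contains colons and commas, a tool that reads a request directory has to split these names carefully. The following is a minimal sketch, not NetDC's own code; it assumes, as in the examples above, that DATA_TYPE and DC_NAME never contain a period, so everything between the first and last period is treated as the HUB_ID. The names ProductName and parse_product_name are hypothetical.

```python
from typing import NamedTuple


class ProductName(NamedTuple):
    data_type: str  # e.g. DATA, INV, RESP
    hub_id: str     # request tag shared by every data center
    dc_name: str    # data center that produced the product


def parse_product_name(filename: str) -> ProductName:
    """Split <DATA_TYPE>.<HUB_ID>.<DC_NAME> into its three fields.

    Assumption: DATA_TYPE and DC_NAME contain no periods, so the HUB_ID
    is everything between the first and the last period.
    """
    data_type, sep1, rest = filename.partition(".")
    hub_id, sep2, dc_name = rest.rpartition(".")
    if not (sep1 and sep2 and data_type and hub_id and dc_name):
        raise ValueError(f"not a NetDC product file name: {filename!r}")
    return ProductName(data_type, hub_id, dc_name)
```

For example, parse_product_name("INV.IRIS_DMC:Apr_04,10:45:56:1123.ORFEUS") yields the data type INV, the HUB_ID IRIS_DMC:Apr_04,10:45:56:1123, and the data center ORFEUS.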

Data files following this naming structure appear in the user's request directory as they arrive. The "check.list" file records the arrival of each file by changing that product's status entry.

If the user requested that the data be merged, the "check.list" entries start with status PENDING, meaning that NetDC is waiting for that data product to come back. Once the data has arrived, the status changes to COMPLETE. Normally, the role of the hub site is to monitor incoming data from delegate sites, marking arrivals as complete and waiting for all data to come in. When all delegate data products have been collected, they are combined into one file and shipped to the user.

In cases where the data is not to be merged, every entry in "check.list" instead has NOMERGE status, so that all delegate sites know immediately that they are to send their data directly to the user.

If too much data is produced for merging, the hub site may refuse further deliveries, instruct the delegate site to send its shipment directly to the user, and mark that product NOMERGE in "check.list". The hub site ignores NOMERGE products when deciding whether it is still waiting to send a merged shipment. If some products of a given data type are COMPLETE and others are NOMERGE, only the COMPLETE products are included in the merged shipment; the others are assumed to be shipped to the user by their delegate sites.
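The status rules above reduce to a small decision for each data type. The sketch below is illustrative only: the on-disk format of "check.list" is not described here, so it assumes the entries for one data type have already been read into a mapping from a hypothetical product key (such as the DC_NAME) to its status. The names Status and merge_plan are hypothetical.

```python
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    COMPLETE = "COMPLETE"
    NOMERGE = "NOMERGE"


def merge_plan(entries: dict[str, Status]) -> tuple[bool, list[str]]:
    """Decide whether the hub can ship a merged product for ONE data type.

    entries maps each expected product (here keyed by DC_NAME) to its
    check.list status. Returns (ready, products_to_merge):
      - ready is True once no entry is still PENDING, because NOMERGE
        entries are ignored when deciding whether to keep waiting;
      - only COMPLETE entries are merged; NOMERGE products are assumed
        to reach the user directly from their delegate sites.
    """
    ready = all(status is not Status.PENDING for status in entries.values())
    to_merge = sorted(dc for dc, status in entries.items() if status is Status.COMPLETE)
    return ready, to_merge
```

If every entry is NOMERGE, the function reports ready with an empty merge list, meaning the hub has nothing of that type to ship itself.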

When it comes to shipping data products, each data type is treated as a separate entity. Inventory shipments are shipped on a schedule independent from waveform shipments and response shipments. If the data for responses should become available before waveform data, the user can be assured that the response data will be shipped without delay. Also of note: Data of different types are never merged together. You won't find response data mixed with inventory data. They are kept separate and only data of the same type is merged in a shipment. Should a user request response and waveform data, they will receive two separate shipment volumes, not one.
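Since each data type is shipped on its own schedule and never mixed with another, a hub-side tool would first partition a request directory's product files by their DATA_TYPE prefix. The sketch below is a hypothetical helper, not part of NetDC; it assumes product files follow the three-part naming pattern and skips bookkeeping files such as "check.list" and "SHIPPED". The file names in the usage example are illustrative, not real requests.

```python
from collections import defaultdict


def group_by_data_type(filenames: list[str]) -> dict[str, list[str]]:
    """Group <DATA_TYPE>.<HUB_ID>.<DC_NAME> files so each type ships separately."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        parts = name.split(".")
        if len(parts) < 3:
            # Not a product file (e.g. check.list or the SHIPPED flag).
            continue
        groups[parts[0]].append(name)
    # Sort each group so the merge order is deterministic.
    return {data_type: sorted(names) for data_type, names in groups.items()}
```

Each resulting group would then be merged and shipped independently, so response data never waits on waveform data.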

Upon shipment to the user, the data file, whether merged or not, is renamed in the FTP directory to match the LABEL that the user specified in the request. The file name pattern in this case is:

<LABEL>.<DC_NAME>.<PID>

If the user did not provide a label, NetDC generates a random tag and uses it as the label for the shipment. The label is a convenience for the user receiving the shipment: it yields a less cryptic file name than the one NetDC uses internally and lets users organize the data they collect in their own way. Once the shipment file has been written to the FTP directory, its contents are either sent to the user by email or the user is notified that the data can be retrieved. If requested, NetDC can also push the data to the user's FTP site.
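The naming rule can be sketched as follows. This is not NetDC's implementation: the format of the random tag is not specified in this manual, so the "netdc_" prefix and eight hex characters are an assumption, and PID is simply passed in (presumably a process ID, for example from os.getpid()). The function name shipment_name is hypothetical.

```python
import secrets
from typing import Optional


def shipment_name(dc_name: str, pid: int, label: Optional[str] = None) -> str:
    """Build the user-facing <LABEL>.<DC_NAME>.<PID> shipment file name."""
    if not label:
        # Hypothetical random-tag format; the manual only says a random tag is used.
        label = "netdc_" + secrets.token_hex(4)
    return f"{label}.{dc_name}.{pid}"
```

With the values from the SHIPRDY example later in this chapter, shipment_name("ORFEUS", 3498, "joe_request_1") gives joe_request_1.ORFEUS.3498, matching that datagram's FILENAME field.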

Once all data products have been shipped from a user's request directory, a flag file "SHIPPED" is created so that the NetDC system will later remove the directory. As a result, request directories are normally cleaned up rather than accumulating on the file system.

Over time, shipment files will accumulate in the FTP shipment directory as NetDC fulfills requests. It is a good idea to set up a separate cleanup procedure for this directory that removes files older than a certain number of days. This issue is discussed further later in the manual.
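One way to implement such an age-based cleanup is sketched below. It is a hypothetical helper, not a NetDC routine: it assumes a file's modification time is a good enough proxy for its age and that every regular file in the directory is a shipment that may be deleted. It would typically be run daily from a scheduler such as cron.

```python
import os
import time
from typing import List, Optional


def purge_old_shipments(ftp_dir: str, max_age_days: float,
                        now: Optional[float] = None) -> List[str]:
    """Delete regular files in ftp_dir older than max_age_days; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    # Collect candidates first, then delete, so the directory is not
    # modified while it is being scanned.
    with os.scandir(ftp_dir) as entries:
        stale = [e for e in entries
                 if e.is_file(follow_symlinks=False) and e.stat().st_mtime < cutoff]
    for e in stale:
        os.remove(e.path)
    return sorted(e.name for e in stale)
```

Symbolic links and subdirectories are left alone; only plain files past the cutoff are removed.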

Datagrams make product merging between a delegate site and a hub site possible. The transaction follows a fixed protocol. First, when a shipment is ready to be sent to a hub site, the delegate site sends a datagram with the action word SHIPRDY, telling the hub site that the delegate wishes to send it data belonging to a particular user's request. A SHIPRDY datagram looks something like this:

%%ACTION DATA::SHIPRDY
.HUB_ID GEOSCOPE:Feb_23,22:54:23:1176
.NAME Joe Seismologist
.EMAIL joe@host.seismolab.edu
.DELEGATE ORFEUS
.SHIPTO netdc@ipgp.jussieu.fr
.REPLYTO netdc@knmi.nl
.SIZE 23 KB
.MEDIA FTP
.DISPOSITION_USER PULL
.DISPOSITION_HUB PUSH ftp.ipgp.jussieu.fr /pub/netdc
.FILENAME joe_request_1.ORFEUS.3498
.LABEL joe_request_1
.END
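A receiving site must turn such a datagram into an action word and a set of parameters. The parser below is a minimal sketch based only on the layout shown above: an "%%ACTION" line, one ".KEY value" line per parameter, and a closing ".END". It is not NetDC's own parser; the name parse_datagram is hypothetical, and anything after ".END" is ignored by assumption.

```python
def parse_datagram(text: str) -> tuple[str, dict[str, str]]:
    """Return (action, fields) from a NetDC-style datagram body."""
    action = None
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%%ACTION"):
            action = line[len("%%ACTION"):].strip()  # e.g. "DATA::SHIPRDY"
        elif line == ".END":
            break  # end of datagram; ignore anything that follows
        elif line.startswith("."):
            parts = line[1:].split(None, 1)  # key, then the rest of the line
            if not parts:
                continue
            fields[parts[0]] = parts[1] if len(parts) > 1 else ""
    if action is None:
        raise ValueError("missing %%ACTION line")
    return action, fields
```

Multi-word values such as ".NAME Joe Seismologist" or ".DISPOSITION_HUB PUSH ftp.ipgp.jussieu.fr /pub/netdc" are kept whole, so later code can split them as each parameter requires.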

The receiving site uses this information to decide whether to send back a RCVRDY datagram, which tells the delegate that it may send the data, or a NOMERGE datagram, which tells the delegate to send the data directly to the user. The NOMERGE override usually occurs because the product being sent by the delegate is too large for the hub site to accommodate. If the hub site sends the RCVRDY notice, that datagram contains the same parameters as the SHIPRDY datagram.

Once the delegate site receives the RCVRDY datagram, it initiates the file transfer by email or by FTP, depending on how the parameters were set; in both cases a SHIPMENT datagram is sent. Data sent by email is attached to the SHIPMENT datagram, and the hub site's NetDC routines extract it from the email message and write it to the appropriate request directory. Data sent by FTP is pushed by the delegate site, using an automated FTP client, to the hub site's anonymous FTP directory given in the DISPOSITION_HUB parameter. Once the data has been placed at the hub site, the delegate sends a SHIPMENT datagram telling the hub site to retrieve the data file.

Once the hub site has confirmed that it has received the data, it sends back a datagram with the message RCVOK, allowing the delegate site to release that request, run its cleanup routines, and so forth. If the data does not reach the hub site intact, the hub site sends a RESEND message so that the delegate site tries again. Only one RESEND is attempted: after a second failed attempt to merge, the hub site instructs the delegate to switch to NOMERGE and send the data directly to the user. This is intended as an emergency fallback should transfer of data for merging fail.
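The hub side of this exchange can be summarized as a small state machine. The sketch below models only the replies described above (RCVRDY or NOMERGE to a SHIPRDY; RCVOK, RESEND, or NOMERGE to a SHIPMENT). The class name HubMergeSession and the max_merge_bytes limit are hypothetical: the manual says only that the hub may refuse a product that is too large, not how that limit is set.

```python
class HubMergeSession:
    """Hub-side replies for one delegate's merge shipment (illustrative sketch)."""

    def __init__(self, max_merge_bytes: int):
        # Hypothetical site limit on the size of products accepted for merging.
        self.max_merge_bytes = max_merge_bytes
        self.failures = 0

    def on_shiprdy(self, size_bytes: int) -> str:
        """Reply to SHIPRDY: accept the data, or tell the delegate to ship directly."""
        return "RCVRDY" if size_bytes <= self.max_merge_bytes else "NOMERGE"

    def on_shipment(self, received_intact: bool) -> str:
        """Reply to SHIPMENT after checking the transferred file."""
        if received_intact:
            return "RCVOK"
        self.failures += 1
        # One RESEND is allowed; a second failed attempt falls back to NOMERGE.
        return "RESEND" if self.failures == 1 else "NOMERGE"
```

A delegate that receives NOMERGE at either stage then delivers its product directly to the user, as described above.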


Fig 9.2 - Diagram of merge shipment protocol between two sites.

The user receiving data can also take advantage of some automation in the way data is received. The DISPOSITION line in the NetDC request allows the user to ask that the data be pushed to his or her site through FTP, rather than retrieving it manually. Note that this applies only to FTP shipments, since email shipments are pushed to the user by default.

As indicated in an earlier chapter, the DISPOSITION line accepts one of two modes. The default mode, PULL, requires no additional parameters and states that the user will retrieve the data manually from the hub site. The other mode, PUSH, takes two additional parameters: the name of the FTP host to which the user wants the shipment transferred, and the anonymous FTP directory where the shipment should go. If either parameter is missing or proves unusable, the transfer fails and NetDC falls back to PULL mode, notifying the user by email of the shipment and of where to pick up the product at the hub site. An example PUSH directive looks like:

.DISPOSITION PUSH ftp.gfz-potsdam.de /pub/dropoff/netdc
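Reading this directive can be sketched as below. This is a hypothetical helper, not NetDC code, and it handles only the parse-time part of the fallback: a missing host or directory, or an unrecognized mode, yields PULL. A host that turns out to be unreachable would need a separate fallback when the transfer is attempted, which is not shown.

```python
from typing import NamedTuple, Optional


class Disposition(NamedTuple):
    mode: str                        # "PULL" or "PUSH"
    host: Optional[str] = None       # FTP host for PUSH
    directory: Optional[str] = None  # anonymous FTP directory for PUSH


def parse_disposition(line: str) -> Disposition:
    """Parse a '.DISPOSITION ...' request line, defaulting to PULL."""
    tokens = line.split()
    if tokens and tokens[0] == ".DISPOSITION":
        tokens = tokens[1:]
    if tokens and tokens[0].upper() == "PUSH" and len(tokens) == 3:
        return Disposition("PUSH", tokens[1], tokens[2])
    # PULL, an empty line, or an incomplete PUSH directive: fall back to PULL.
    return Disposition("PULL")
```

Applied to the example above, the result is PUSH mode with host ftp.gfz-potsdam.de and directory /pub/dropoff/netdc.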

Another NetDC feature lets the user declare a maximum duration for product merging. This is given as a number following the "YES" flag in the MERGE_DATA field of the request, and specifies the maximum number of days that NetDC will wait for incoming data before sending a data shipment to the user:

.MERGE_DATA YES 3

At times, certain delegate sites are unable to deliver their data for request merging in a timely fashion. By default, NetDC waits for all delegate data products to be accounted for, so if even one site cannot deliver its data, NetDC could wait indefinitely. With the timeout feature, however, the user receives as much data as possible within a reasonable amount of time. The maximum timeout is 90 days; the default is zero, which implies same-day delivery.
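The timeout can be sketched as two small functions: one reads the MERGE_DATA line and one decides whether the merge window has closed. The 90-day maximum and zero-day default come from the text above; everything else is an assumption. In particular, the manual does not say what happens to values outside 0 to 90, so this sketch clamps them, and it measures the window from a hypothetical request-received timestamp. Both function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple

MAX_MERGE_DAYS = 90     # maximum stated in the manual
DEFAULT_MERGE_DAYS = 0  # default stated in the manual


def parse_merge_data(line: str) -> Tuple[bool, int]:
    """Return (merge_requested, timeout_days) from a '.MERGE_DATA ...' line."""
    tokens = line.split()
    if tokens and tokens[0] == ".MERGE_DATA":
        tokens = tokens[1:]
    merge = bool(tokens) and tokens[0].upper() == "YES"
    days = DEFAULT_MERGE_DAYS
    if merge and len(tokens) > 1:
        # Assumption: out-of-range values are clamped rather than rejected.
        days = min(max(int(tokens[1]), 0), MAX_MERGE_DAYS)
    return merge, days


def merge_is_overdue(received: datetime, timeout_days: int, now: datetime) -> bool:
    """True once timeout_days have elapsed since the request was received."""
    return now >= received + timedelta(days=timeout_days)
```

With the default of zero days, a request is overdue as soon as it is checked, so the next daily run of the shipment handler ships whatever has arrived.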

The routine that carries out this timeout feature is called "shipment_handler" and is meant to be run on a daily basis. It runs through all of the current user request directories and checks to see if the request has expired. At the same time, it cleans up any directories that have been flagged as SHIPPED, which means that the request has been satisfied and the products shipped off to the user.

If "shipment_handler" finds that a request is overdue, it bundles up any products that are currently present and sends off merged data products to the user. The directory is then flagged with a "SHIPPED" empty file so that it will be cleaned up on the next pass of "shipment_handler". Delegate sites with overdue shipments that send a SHIPRDY datagram will be notified with a NOMERGE datagram in reply.
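One pass of a daily handler like "shipment_handler" can be sketched as below. This is not the actual routine: it assumes every subdirectory of a hypothetical requests_root is a request directory, and it leaves the overdue test and the merge-and-ship step to caller-supplied functions, since their details (and the reply of NOMERGE to late SHIPRDY datagrams) live elsewhere in NetDC.

```python
import os
import shutil
from typing import Callable


def shipment_handler_pass(requests_root: str,
                          is_overdue: Callable[[str], bool],
                          ship_merged: Callable[[str], None]) -> dict[str, list[str]]:
    """Remove SHIPPED request directories and ship overdue ones (sketch)."""
    removed: list[str] = []
    shipped: list[str] = []
    with os.scandir(requests_root) as entries:
        dirs = sorted(e.path for e in entries if e.is_dir(follow_symlinks=False))
    for path in dirs:
        name = os.path.basename(path)
        if os.path.exists(os.path.join(path, "SHIPPED")):
            # Request already satisfied on an earlier pass: clean it up.
            shutil.rmtree(path)
            removed.append(name)
        elif is_overdue(path):
            # Ship whatever products are present, then flag the directory
            # with an empty SHIPPED file so the next pass removes it.
            ship_merged(path)
            open(os.path.join(path, "SHIPPED"), "w").close()
            shipped.append(name)
    return {"removed": removed, "shipped": shipped}
```

A directory shipped on one pass is therefore deleted on the following pass, matching the two-step cleanup described above.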



Incorporated Research Institutions for Seismology
1200 New York Ave NW • Suite 800 • Washington DC 20005
Phone: 202.682.2220 | Fax: 202.682.2444

Data Management Center
1408 NE 45th St. Suite 201
Seattle, WA 98105
Phone: 206.547.0393 | Fax: 206.547.1093
PASSCAL Instrument Center
100 East Road • Tech Industrial Park
New Mexico Tech • Socorro, NM 87801
Phone: 505.835.5070 | Fax: 505.835.5079
