Data Tape Organization

Tape format

The data were originally written on 4 mm DAT tape using the UNIX tar command from a Sun workstation running Solaris 2.7.  The specific command used to generate the tape was

tar cf /dev/rmt/0 .

The layout of directories under the "." directory from which this tape was produced is as follows:

navy_report contains files used to create an unpublished report on the results of these experiments to NSWCC

data_report contains the electronic form of this report

css3.0_format contains database tables and waveform files in the css3.0 schema with some special extension tables.

logfiles contains Reftek logfiles produced when ref2segy was run on raw field tapes

pcf_files contains PASSCAL pcf_files used to make clock corrections on these data

segy contains segy disk images of these data importable into common reflection processing packages

CSS format data

The css3.0_format directory contains the most complete database of parameters from this experiment.  The data in this directory can be manipulated immediately by software in the Antelope package from Boulder Real Time Technologies (http://www.brtt.com) and by Datascope, the older, public domain version of many of the same programs, which I believe is still available from IRIS.

The data in this directory have all the standard CSS3.0 tables.  Note that all the 4.5 Hz sensor data use the SEED channel codes EHZ, EHN, and EHE.  The accelerometer data are tagged GHZ, GHN, and GHE, which DMC staff told me are the correct codes for strong motion accelerometer data.  The response directory and the associated CSS3.0 response tables are correctly built, so one could, in theory, build a SEED volume from these data.  I considered this an absurd format for these data, however, and chose not to do so.

The station names have a pattern that is also important to know.

  1. The "100" line stations (100, 101, 102, ..., 126) are stations on the linear profile running through array 1 shown in Figure 1.  This line of stations radiates roughly east from the shot point with the lowest number closest to the shot point.
  2. The "200" line stations (200, 201, ... , 216) are stations on the linear profile running roughly southward from the shot point.   Station 200 is again the closest to the shot and 216 is the most distant station on this line.
  3. The "300" line stations begin at a point approximately straight north of the shot point and run westward, across the lake, through array 3, and to the west outside of the mined area.  Again 300 is the closest station to the shot point.
  4. Arrays 1 and 2 used a common 24 element array geometry illustrated below.  The naming convention is nAm where n is the "array number" (1 or 2), A is the arm (i.e. A, B, C, ..., H), and m is the index position of that station relative to the center of the array.  For example, 1B2 is the station just below and to the left of the "B" in this figure for array 1.  Note that the actual geometries for arrays 1 and 2 are rotated relative to this figure.
  5. Array 3 was a "grid array" with points located on an approximately uniform grid with 10 m between stations.  Due to a surveying error the actual geometry is a skewed rectangle.  The naming convention is XnYm with n and m being index positions relative to a grid origin, X0Y0, in the southwest corner of this array.  Thus, for example, X1Y2 is 1 grid point east and 2 grid points north of X0Y0.
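
The naming patterns above are regular enough to parse mechanically.  The following is a minimal Python sketch, not part of the original distribution, that classifies a station name into one of the three families.  The function name classify_station and the returned field names are hypothetical, and the patterns are inferred only from the conventions described in this section.  Names that fit none of the patterns (for example the accelerometer site BARG) return None.

```python
import re

# Patterns inferred from the naming conventions described in this report.
_LINE = re.compile(r"^([123])(\d{2})$")           # e.g. 100, 216, 319
_ARM_ARRAY = re.compile(r"^([12])([A-H])(\d+)$")  # e.g. 1B2 (arrays 1 and 2)
_GRID_ARRAY = re.compile(r"^X(\d+)Y(\d+)$")       # e.g. X1Y2 (array 3)


def classify_station(name):
    """Return a dict describing a station name, or None if unrecognized."""
    m = _LINE.match(name)
    if m:
        # Line stations: the index counts outward from the shot point.
        return {"kind": "line", "line": int(m.group(1)) * 100,
                "index": int(m.group(2))}
    m = _ARM_ARRAY.match(name)
    if m:
        return {"kind": "arm_array", "array": int(m.group(1)),
                "arm": m.group(2), "position": int(m.group(3))}
    m = _GRID_ARRAY.match(name)
    if m:
        # Grid indices are relative to X0Y0 in the southwest corner.
        return {"kind": "grid_array", "array": 3,
                "east_index": int(m.group(1)),
                "north_index": int(m.group(2))}
    return None
```

For example, classify_station("1B2") reports array 1, arm B, position 2, and classify_station("X1Y2") reports a grid point 1 east and 2 north of the origin.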


I've included the arrival table for picks on most of the seismograms in this database to aid future users of these data.   I used some nonstandard tags on these phases that need to be explained:

  1. The largest number of phase picks are P.  P is the dominant P phase, but not necessarily the first arrival as would be the case by standard convention.  P here is an interpretation by me.  It is my best guess of a phase that propagates through the mined out area of Glendora as a head wave at the bedrock interface.  Outside the mined out area it continues as a similar entity traveling through the bedrock, but the upper layer thins to the natural weathered layer.
  2. In the mined area I pick a phase tagged "Pd" which is mnemonic for P direct.  It is an interpretation that is not certain to be correct.  In Poppeliers and Pavlis (2001) we argue this strong arrival is the direct wave traveling through the water saturated mining spoil.  We originally interpreted this phase as a strong P to S conversion, but we became convinced it was a direct wave on the basis of two lines of evidence:  (1) the polarization of this arrival is not consistent with a refracted S arrival, and (2) a noise test conducted independently (not included in this data set) found the velocity of water saturated mining spoil was consistent with that observed for this phase.
  3. There are a few picks on the far end of the 100 line stations labelled Pr.  These are what I interpret to be a secondary head wave generated at the contact between Pennsylvanian age sands and shales and Mississippian carbonates that conformably underlie the Pennsylvanian rocks in this region.
  4. There are a fair number of "A" picks for airwaves.  Most of these are on the G channels, picked when I got interested in the strong airwave signals recorded on some buildings.  (Look, for example, at BARG on evid 5 and 9, which were large explosions shot very shallow.  The Barge is a floating building in the middle of the lake that got hammered by the airwave from these shots.  For evid 9, the airwave generated accelerations were about twice the amplitude of the seismic waves.)
  5. There are a few "R" and "L" picks lying around the database.  These are mostly junk left from experimenting with pick schemes for portions of the surface wave train.  They are what I interpreted as Rayleigh and Love modes respectively, but they should not be taken too seriously.

This database also contains a nonstandard table that is not part of CSS3.0.  I defined an extension table to CSS3.0 that I called "shot" (contained in the file "glendora.shot") that contains basic parameters of the shots that were recorded here.  The schema descriptor for this table can be found in the same directory in the file named "shot_mods".  The shot table documents shot size, shot depth, origin time, and shot location for each of the explosions that were recorded here.  The user should note that the origin times are not accurate, as they were constructed as a fixed time offset from the pick at the closest station.  This table should be directly visible with the Datascope/Antelope program called dbe.  However, because it contains information not stored in the segy format files (see below), the contents are tabulated here for those unable to utilize the CSS3.0 files:

Note that dnorth and deast in this and the similar site table are GPS measured offsets from the GPS reference station located near array 3.  The absolute latitude and longitude of this origin are less reliable because they were not derived from a differential measurement.  They were, however, produced from an average estimated by the Trimble surveying software for the reference station.  The actual accuracy of this measurement is not known, but the shot and station locations are known to a precision smaller than the size of the sensors we were using.  The accelerometer locations are slightly less accurate in most cases because the GPS unit failed to lock near the metal buildings, presumably because of strong multipath induced by reflections from the buildings.  We located points as close as possible and corrected the final locations with a simple tape and compass measurement.  The nominal accuracy of this is probably about 20 cm, which is within the last significant figure of the tabulated data.  Finally, the reader should recognize that the shot size here is kilograms of C4 (the explosive used for these tests by the Navy) and the edepth is the water depth in meters.  Everything else is standard CSS3.0.
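
Because dnorth and deast are offsets from a common reference point, shot to station geometry can be computed from them directly without going through latitude and longitude.  The following Python sketch illustrates the arithmetic.  The function name is hypothetical, and the result is in whatever length unit the tables use (CSS3.0 normally defines dnorth and deast in kilometers, but this should be checked against the tables themselves).

```python
import math


def offset_distance_azimuth(dnorth_a, deast_a, dnorth_b, deast_b):
    """Horizontal distance and azimuth from point A to point B.

    Inputs are north and east offsets from the same reference point.
    The distance is returned in the input units; the azimuth is in
    degrees clockwise from north, in the range [0, 360).
    """
    dn = dnorth_b - dnorth_a
    de = deast_b - deast_a
    distance = math.hypot(dn, de)
    azimuth = math.degrees(math.atan2(de, dn)) % 360.0
    return distance, azimuth
```

With A set to a shot and B to a station, this gives the source to receiver offset and the direction from the shot to the station.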

A final point about this component of the data is that we accidentally recorded a nearby mining explosion as we were pulling stations out on the morning of day 255.  Data from this mining explosion are found only in the CSS database files.  I did not write this event to a segy file.

SEGY disk image files

This is an active source data set that some may find useful to work with in one of several standard seismic reflection processing packages.  For example, there is a fascinating problem here with surface wave propagation.  I experimented briefly with f-k filtering in ProMAX and found a clear reflection of surface waves from the lake shore.  There are probably a number of other similar bounce phases off the walls of the mined out region.

Because this data set is not very large by modern standards, I elected to also supply these data in what I will call segy disk image files.  That is, these are bitstream versions of a segy tape stored on disk rather than individual seismogram files as used, for example, in PASSCAL segy.  By "bitstream" I mean an image of a SEGY standard tape without the record marks.  This means the file starts with the EBCDIC reel header (actually filled with 3200 null bytes by the db2segy program used here) and the binary reel header, followed by fixed length segy trace records.  Because there are no record marks in a bitstream, position in the file can only be maintained by counting bytes from the beginning of the file.  In addition, we violate the original SEGY standard by using IEEE four-byte floats in place of the defunct IBM floating point format that defined the original standard.  This format is known to be readable in the Release 98 version of ProMAX using their SEG-Y Input module by selecting "Disk Image" and IEEE float in place of the defaults.  I expect other packages to have similar capabilities.
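
For readers without a reflection processing package, the byte counting described above can be sketched in Python.  This is an illustration under stated assumptions, not a description of how db2segy itself works.  It assumes the standard SEGY sizes (a 3200-byte EBCDIC header, a 400-byte binary reel header, and a 240-byte header on each trace), big-endian byte order as specified by the SEGY standard, and that the samples-per-trace field at bytes 3221-3222 of the reel headers is populated.  If that field is not filled in these files, pass nsamples explicitly.  The function names are hypothetical.

```python
import struct

TEXT_HEADER_BYTES = 3200    # EBCDIC reel header (null-filled in these files)
BINARY_HEADER_BYTES = 400   # binary reel header
TRACE_HEADER_BYTES = 240    # header preceding each trace
SAMPLE_BYTES = 4            # IEEE four-byte float


def trace_offset(trace_index, nsamples):
    """Byte offset of the start of trace `trace_index` (0-based)."""
    trace_bytes = TRACE_HEADER_BYTES + SAMPLE_BYTES * nsamples
    return TEXT_HEADER_BYTES + BINARY_HEADER_BYTES + trace_index * trace_bytes


def read_trace(f, trace_index, nsamples=None, byteorder=">"):
    """Return (header_bytes, samples) for one trace of an open binary file.

    Big-endian order is an assumption; pass byteorder="<" if the
    files turn out to be little-endian.
    """
    if nsamples is None:
        # Samples per trace, read as an unsigned 16-bit integer from
        # bytes 3221-3222 (assumed to be populated).
        f.seek(TEXT_HEADER_BYTES + 20)
        (nsamples,) = struct.unpack(byteorder + "H", f.read(2))
    f.seek(trace_offset(trace_index, nsamples))
    header = f.read(TRACE_HEADER_BYTES)
    data = f.read(SAMPLE_BYTES * nsamples)
    samples = struct.unpack(f"{byteorder}{nsamples}f", data)
    return header, samples
```

A file would be opened with open(path, "rb") and passed to read_trace.  Because every trace record has the same length, any trace can be reached with a single seek.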

Because of the geometry of this experiment the data don't really logically fit in a single multichannel framework.  That is, I couldn't think of any rational way to assemble all the data into a single segy image that made much sense.  Consequently, I created not one but five segy disk image files with the following contents:

  1. The file line100_300.sgy contains data from the two linear profiles I referred to as the 100 and 300 line above.  The order in the file is 319, 318, ..., 300, 100, 101, ..., 126.  This was done because these two "lines" actually make a continuous profile with a minor bend at the 300 to 100 transition.
  2. The file line200.sgy contains the 200 line data in order 200, 201, ..., 216.
  3. The files array1.sgy, array2.sgy, and array3.sgy contain data from arrays 1, 2, and 3 respectively.  Note that because these arrays did not record every shot, they contain only relevant data and are NOT filled with null shot records.

I chose not to put the accelerometer data into a segy format, as I assumed no one would want these data in a multichannel format since the sensors have a drastically different response from the L28 data and are widely separated from each other.

The five segy image files were produced by a program called db2segy, which is a C program I wrote for this purpose.  It is now being distributed by Boulder Real Time Technologies with their Antelope software as contributed software.  db2segy creates a segy disk image according to a recipe defined by its input "parameter file".  The parameter file for db2segy for each of the five segy image files is important because it defines the way css3.0 station:channel codes should be mapped onto channel numbers, which is all that segy understands.  In theory, one could get everything needed from the shot and receiver coordinates written in the trace headers by db2segy, but to make life a little easier for potential users I'm supplying two additional sets of files:

  1. The data/segy/pf directory contains the parameter files used to drive db2segy for each of the five segy image files.  They have names that have an obvious association (line200.pf, array1.pf, etc.)
  2. The data/segy/db2segy_output directory contains the standard output stream produced when db2segy was run.  The output of db2segy shows how each station, channel, and shot number are mapped onto the segy disk image file.

Note that for all these files I chose to effectively sort the data by channel first.  Specifically, in every case, if there are n stations in the given file then the east components are the first n channels, the north components are channels n+1 to 2n, and the vertical components are the last n channels.
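
The channel ordering just described can be written as a small Python sketch.  The function name is hypothetical, and the sketch assumes that channel numbers start at 1 and that every station in a file contributes all three components.  The db2segy output files should be treated as the authoritative record of the mapping.

```python
COMPONENT_ORDER = ("E", "N", "Z")  # east block, then north, then vertical


def channel_number(station_index, component, nstations):
    """Channel number (1-based, assumed) for a station and component.

    station_index is the 0-based position of the station in the file's
    station order; component is "E", "N", or "Z".
    """
    if not 0 <= station_index < nstations:
        raise ValueError("station_index out of range")
    block = COMPONENT_ORDER.index(component)
    return block * nstations + station_index + 1


# Station order given above for line100_300.sgy:
# 319 down to 300, then 100 up to 126.
line100_300_order = list(range(319, 299, -1)) + list(range(100, 127))
```

Under these assumptions, the east component of station 100 in line100_300.sgy would be found with channel_number(line100_300_order.index(100), "E", len(line100_300_order)).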