For my project, I'm working a lot with a Matlab program to perform cross correlation over huge amounts of continuous data. The technique implemented in the program is called template matching, and it entails cutting out small five to ten second slices of a waveform that shows a single event and trying to find sections in the continuous waveform that look similar enough to count as a match. Matches in the same places on different stations suggest the occurrence of a true seismic event, which is then noted and located and cataloged for some later analysis. To pick out the templates, I'm chopping up data using SAC, a program currently being maintained by IRIS. I'm still deep in the data manipulation (and even extraction?) phase, so I haven't had much mapping to do. However, I definitely see GMT in my future, and possibly other graphics tools as well when it comes time to start displaying my results.
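For the curious, here is a minimal sketch of the template matching idea in Python with NumPy. It is not the Matlab program itself, just an illustration: the function names, the 0.8 threshold, and the synthetic 40 Hz "event" are all assumptions made up for the example.

```python
import numpy as np


def normalized_cross_correlation(trace, template):
    """Slide the template along the trace and return the correlation
    coefficient (between -1 and 1) at every possible start sample."""
    trace = np.asarray(trace, dtype=float)
    template = np.asarray(template, dtype=float)
    n = template.size
    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t * t))
    cc = np.zeros(trace.size - n + 1)
    for i in range(cc.size):
        window = trace[i:i + n]
        window = window - window.mean()
        denom = t_norm * np.sqrt(np.sum(window * window))
        if denom > 0:  # skip flat windows to avoid dividing by zero
            cc[i] = np.sum(window * t) / denom
    return cc


def find_matches(trace, template, threshold=0.8):
    """Return the start samples whose correlation meets the threshold.
    The 0.8 default is an illustrative assumption, not a recommended value."""
    cc = normalized_cross_correlation(trace, template)
    return np.flatnonzero(cc >= threshold), cc


# Synthetic demo: a 5 second wavelet hidden in 60 seconds of noise at 40 Hz.
rng = np.random.default_rng(0)
fs = 40
template = np.sin(2 * np.pi * 2 * np.arange(5 * fs) / fs) * np.hanning(5 * fs)
trace = 0.02 * rng.standard_normal(60 * fs)
trace[800:800 + template.size] += template  # bury one "event" at sample 800

match_starts, cc = find_matches(trace, template)
best = int(np.argmax(cc))
print("best match starts at sample", best)
```

The loop is written for readability rather than speed. On a month of continuous data, an FFT-based correlation would likely be the more practical choice, and a real detection would still need the match to show up on more than one station.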
Big data = big problems...
Sorry for using such a ridiculous buzzword. I'm not working with exabytes or anything here, but my data set is huge and more awkward to handle than the three large suitcases I had to single-handedly drag from the baggage claim to the taxi line outside the San Jose airport. Maybe it's because the seismic signals I'm working with are in the .sac file format and thus require learning how to work with SAC in order to do anything useful with them, or maybe it's because I'm working with recordings from three channels of seven stations that sample at 40 samples per second for a little over one month. That means somewhere around 2,177,280,000 samples. Wait, that actually doesn't look that bad written out! It's not as much as I thought! And it's not like the raw data is really 2 gigs anyway; it's just clunky. I'm actually feeling much better about all this now. These blog posts really are quite helpful.
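The sample count above can be checked with a few lines of Python. The one assumption here is the 30-day span, which is inferred from the total rather than stated outright; the variable names are just for illustration.

```python
# Back-of-the-envelope sample count for the continuous data set.
channels_per_station = 3
stations = 7
sample_rate_hz = 40
seconds_per_day = 24 * 60 * 60
days = 30  # assumption: the quoted total works out to exactly 30 days

total_samples = (channels_per_station * stations * sample_rate_hz
                 * seconds_per_day * days)
print(f"{total_samples:,} samples")  # prints 2,177,280,000 samples
```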
My data set was collected using a "completely horrible mish-mash of things," as described by one of my officemates. A network of temporary seismic recording stations was deployed across southern California by some students in my research group about two years ago. The network consists of five types of broadband seismometers from three different companies, which, now that I think about it, helps explain why the appearance of the signals varies so greatly from station to station. Since this network was deployed by my group here at Stanford, the data collected from the stations in the network has neither been seen nor used by any other researchers.
I'm working with the raw, continuous signals picked up by these stations to find events that occurred in a seismic swarm about one year ago. Working with raw data gives me the confidence that I'm not missing any important information, but it also gives me a little bit of anxiety that anything that happens to this data set is definitely my fault. It's like babysitting a six-month-old child. It's not going to hurt itself, right? If I turn my back to pour a cup of tea and the dog sits on it, it's totally my fault! The raw data thing is kind of a major responsibility, but I'm excited to see what I can find.
Just for fun, let's try another number dump. Using pattern matching code, one template on one channel of one day of data gave me 600 potential matches, and another gave me 360. To make this fair, let's underestimate a bit and say the average pattern will find 300 matches per run. That means running my 290 templates over all the channels over the course of the month could potentially find 54,810,000 events, or more! That was really just for fun, though, because there's no way I'm letting that happen. As if there's even enough heap memory to allow for all those plots, am I right? Back to work on the code I go! Is it normal for all the wiggles to start moving after a while?
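Here is that number dump spelled out in Python. It assumes "all the channels" means 3 channels on each of 7 stations (21 in total) and that the month is 30 days, which is what the quoted total implies; the variable names are illustrative.

```python
# Rough upper bound on potential matches across the whole data set.
matches_per_run = 300  # one template, one channel, one day (deliberately low)
templates = 290
channels = 3 * 7       # assumption: 3 channels on each of 7 stations
days = 30              # assumption: a 30-day month

potential_matches = matches_per_run * templates * channels * days
print(f"{potential_matches:,} potential matches")  # prints 54,810,000 potential matches
```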
With my first week of frustrating network issues and slow data downloads under my belt, I've begun to finally understand what it is I'm supposed to be doing here. I still feel like a complete newbie, but I have an outline of a real project in my mind and the support of the amazing community here at Stanford Earth Sciences, and I'm ready to start making some progress. My project entails working with a month's worth of continuous seismic recordings from various stations in southern California in an area where seismic swarms are not uncommon. After just a few days of data collection, I was able to see that detecting events in this huge mess of information wasn't going to be an easy task (and it became much more daunting when my advisor told me he anticipated possibly thousands of events to show up in the data). I'm a math and computer science student, but that doesn't mean I won't get overwhelmed by 600+ plots of potential earthquakes blowing up my monitor just from one run of a pattern matching program. That, by the way, just happened.
Goals for the Summer:
William Shakespeare said, "The very substance of the ambitious is merely the shadow of a dream." I'm starting to see what he meant. Whether or not some of my goals are actually too lofty for one to truly achieve in ten weeks here in this beautiful corner of California (I'm looking at you, 7), I have positivity in my mind and an iMac at my fingertips (and kale chips in my lunchbox). I'm ready to get a move on with that list.
Have you ever played that game where one person will say a word or phrase and you have to say the first word that comes to your head? Like if you said David Bowie, I'd say glitter, and if you said Miami Vice, I'd say tan suit. Dirt would be my response if you said Socorro, New Mexico. As a born coastal North Carolinian, I'm no stranger to sand and dried sea salt covering me and my belongings after a day out on the water, but here in the dusty land of infinite tumbleweeds, the dirt is an entirely different beast. Shoes, backpacks, shorts, and even contact lenses became slightly more brown every day, and it became increasingly difficult to distinguish between a dirt line and a tan line after each hour of hiking or field training.
While the New Mexican devil-dust that somehow knew how to find its way up my nose was a whole new business, I think I managed to take what I'd learned the hard way from many windy days on the beach and twist it into a number of relevant solutions for dealing with the dirt. I remembered to not eat my apples facing the wind and to apply sunscreen early so it wouldn't be sticky when I went outside. The American Southwest is one of my favorite regions in the world, and while it presents me with a number of new elements that I wouldn't usually encounter at home, I wouldn't have it any other way. The dirt here in Socorro wasn't worse than east coast beach sand, it wasn't harder to deal with, and it wasn't any more uncomfortable (once I took my contacts out). It was just different, and amongst the sun-bleached cow skulls and twisty desert trees with knots that could make Nicki Minaj jealous, I learned to adapt.
Obviously, I can't just talk all about dirt here. Let me bring out the metaphor so things make a little more sense, but first, some context: I am a math and computer science double major at North Carolina State University. I understand basic earth processes, and I learned a lot about basalt and vesicular rocks in Geology 101. My geophysical knowledge is limited, but I know how to handle coding, modeling, and the other things that I do. In the same way the dirt in Socorro caught me off-guard, the material taught here required some rapid adjustment in order to work my way up the learning curve as quickly as possible, but I've really enjoyed drawing connections from what I know best to a completely new subject. I have a great California summer ahead of me to master all the challenges presented this week in Socorro, but I have my feet on the ground and my Terminal open and ready to rock. I'm ready to handle whatever the high-altitude winds blow my way, as long as it's not dirt onto my sandwich again.