I've finally gotten my hands on the tomography code I will be using to build my P-wave tomography model. It's written almost entirely in Fortran and is what my mentor describes as "research code," meaning, essentially, that it is far, far, far from bug-free and tends to need rough fixes and minor adjustments ALL the time. It's been an interesting experience these past couple of days working through the procedures, as I feel entirely ill-equipped to deal with Fortran/Unix error messages. I have found myself saying, "No, terminal, sorry. I don't know what caused a segmentation fault. Better go get some help with that." And since this code is essentially borrowed from a fellow seismologist, there isn't a wealth of information about its error messages. I am slowly learning why things break and where I should look to fix them, but it's still largely a huge puzzle with very few of the pieces put together.
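For what it's worth, I've since learned that one classic culprit behind a segmentation fault in Fortran is an out-of-bounds array access, which the compiler will happily accept without complaint. A toy illustration (my own sketch, not a line from the actual tomography code):

```fortran
program segfault_demo
   implicit none
   real :: slowness(100)
   integer :: i
   ! The loop bound overshoots the array; a standard build may
   ! silently corrupt memory or crash with a segmentation fault.
   do i = 1, 1000
      slowness(i) = 1.0 / 5.8
   end do
   print *, slowness(1)
end program segfault_demo
```

Compiling with runtime checks enabled (with gfortran, the -fcheck=bounds flag) turns the mystery crash into an error message that points at the offending line. Anyway, here are a few things I have learned/completed so far: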
1. Tomography geometry is complex.
Tomographic models rely on correctly specified geometry to run at all. There are a thousand ways to mess this up, and a thousand places you have to go in and update the model. The geometry of this model relies on both a coarse and a fine "grid". I calculated the size of these grids from the locations of our stations plus an estimate of how much extra room is needed to contain all the ray paths. Because seismic waves don't travel in straight lines, your model has to be a bit bigger than the area covered by your stations. This padding is really just a guess, and it can be adjusted later if something turns out to be wrong. The fine grid that nests inside the coarse grid can be as complex as you want to make it, although mine is pretty simple. It's also important to note that the more detailed your geometry becomes, the larger the output files get. And for a program that already takes over five hours to run, size matters if you want to run the code multiple times. My first models won't be perfect, but they should improve as I rerun the code.
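To make the geometry arithmetic concrete, here is a rough Fortran sketch of the kind of calculation involved. The coordinates, padding, and spacings are placeholders I made up for illustration, not our actual values:

```fortran
program grid_bounds
   implicit none
   ! Hypothetical station footprint in degrees (made-up numbers,
   ! not our network); real values come from the station list.
   real, parameter :: lat_min = 36.0, lat_max = 41.0
   real, parameter :: lon_min = -91.0, lon_max = -84.0
   ! The padding is the guess: rays arrive at an angle, so the
   ! model box must extend beyond the stations on every side.
   real, parameter :: pad       = 2.0    ! degrees of padding
   real, parameter :: coarse_dx = 1.0    ! coarse node spacing
   real, parameter :: fine_dx   = 0.25   ! fine node spacing
   integer :: nlat_c, nlon_c, nlat_f, nlon_f

   nlat_c = nint((lat_max - lat_min + 2.0*pad) / coarse_dx) + 1
   nlon_c = nint((lon_max - lon_min + 2.0*pad) / coarse_dx) + 1
   nlat_f = nint((lat_max - lat_min + 2.0*pad) / fine_dx) + 1
   nlon_f = nint((lon_max - lon_min + 2.0*pad) / fine_dx) + 1

   print *, 'coarse grid:', nlat_c, ' x', nlon_c, ' nodes'
   print *, 'fine grid:  ', nlat_f, ' x', nlon_f, ' nodes'
end program grid_bounds
```

Even in this toy case the fine grid has more than ten times as many nodes per depth slice as the coarse one, which is exactly where the output files (and runtimes) start to balloon.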
2. Workflow is so so important.
This seems self-explanatory, but it's remarkably easy to get lax about. The tomography code I'm running consists of roughly 6-8 steps, depending on what you count as an actual step. Each of these steps consists of multiple substeps, and each substep usually involves manipulating data or files. Needless to say, it's quite easy to get lost in the code. That is, if you can manage not to get lost in the Finder window first. Knowing which steps have to be completed in which order, and where the necessary files are for each step, is something I am currently trying to keep track of. As I worked through the process the first time with my mentor, we chugged along happily. Now that we have hit a significant snag, I am finally taking a breather and retracing my steps to ensure I can repeat them when I head to Purdue next week and won't have Dr. Pavlis down the hall from me. It's also tricky because there is a balance to be struck between fixing an inefficient workflow and leaving things be because they already work.
3. "Research Code" is nasty because it works.
After three days of consistent debugging, editing, and fixing problems, I finally asked my mentor why someone hadn't taken the time to update the code. Not that Fortran is completely outdated, but its rigid tendencies can be frustrating. I learned that there are three primary reasons for keeping the code as is. First, it's flexible. There are so many inputs and different ways to approach the problem that a "rough around the edges" code is in some ways better. It isn't pretty, but you can fanagle with it until it works for what you want to do. I don't really know if "fanagle" is a word, but it adequately describes the process. Second, the program is massive. The code I am using has been worked on for longer than I've existed, or close to it. It is tens of thousands of lines of code at least. Rewriting it in a different language would be a massive undertaking, and well beyond the scope of something you could stick an undergraduate intern on. Simply put: if it works, don't fix it. Finally, it's fast. The code has been tuned to some level of efficiency, and it isn't always feasible to update code if the efficiency goes down. In this case, it would plummet.
4. You will mess up, probably critically.
Obviously, as an undergraduate intern working with code way, way beyond my comprehension level, it was inevitable that problems would arise. But even simple mistakes crop up all the time and force you to backtrack and redo your work. It turns out all the data I analyzed in the first two weeks was missing all of the EarthScope TA stations because of a simple database error my mentor made. Neither of us caught it until I had finished analyzing the 2013 data, and as a result I have to go back and reanalyze roughly 90 events. Such is life working with data sets this large. For me, it's a great opportunity to check my work before it is input into the tomography model. Thankfully, dbxcor has a mode that lets me quickly sort through and pull up just the events that need fixing, which should cut the reanalysis time by more than half. It also works out because I can reanalyze the events while Dr. Pavlis tries to debug the tomography code.
Those are just four lessons I've learned over the past couple of days, along with many more I could spend hours typing out. My work went from something I understood fairly well to something I know nothing about in a matter of hours. It's been a blast working outside my standard comprehension level. Maybe I'll come out of this a Fortran debugging wizard, who knows! Probably not, but I'll at least be an apprentice.
Until next segmentation fault,
Bradley