Working with an Algorithm
Sometime over the past six months, I discovered in myself a love for how things are computed. Algorithms in general are some of the more interesting things to study, because their purpose is to take the data one has and create a useful model so that statements and predictions can be made.
When one can boil a lot of numbers down into a set of summary statistics, it can be interesting. Normally that just involves talking about the center and spread of the data, but those simple summaries don’t seem all that interesting. In one of the many statistics courses I have taken, a professor commented that the interesting part of the data isn’t the part that conforms to expectations, but rather the outliers. Outliers are the part of the data where discoveries are made.
So right now, I have been learning the algorithms used to generate the estimates of a model (least squares or maximum likelihood), which have pretty much been the standard of computational statistics for 60 or more years. Most of these algorithms are well studied and have many modifications to deal with more modern data problems, while a handful are not that well covered.
Dr. Scott at one point was talking to me about a few algorithms that deal with decomposing the data into a sufficient statistic and then updating the sufficient statistics with new information, instead of spending time recomputing the estimates from scratch. Originally this concept was important for a military application in the Second World War, where there was a need to create estimates for artillery without having to recompute complex and difficult formulas. The Givens algorithm was developed to update sufficient statistics with a new observation and then appropriately modify the estimates. With the invention and adoption of the computer, the algorithm was ignored (sort of). So the few papers written on it were not referenced often and are somewhat hard to get a copy of.
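To make the updating idea concrete, here is a minimal sketch in Python, assuming an ordinary linear least-squares model. It is not the code from my project; the names givens_update and solve_upper and the toy data are made up for illustration. The summary being kept is an upper-triangular matrix R, a vector z, and a running residual sum of squares. Each new observation is rotated into R and z one column at a time, so the earlier data never has to be revisited.

```python
import math

def givens_update(R, z, rss, x, y):
    """Fold one observation (x, y) into the running summary (R, z, rss).

    R is a p-by-p upper-triangular list of lists and z a length-p list;
    both are modified in place. Returns the updated residual sum of squares.
    All names here are hypothetical, chosen for this sketch.
    """
    x = list(x)  # work on a copy so the caller's row is untouched
    p = len(x)
    for i in range(p):
        if x[i] == 0.0:
            continue  # nothing to eliminate in this column
        # Rotation chosen so the i-th entry of the new row becomes zero.
        r = math.hypot(R[i][i], x[i])
        c, s = R[i][i] / r, x[i] / r
        for j in range(i, p):
            R[i][j], x[j] = c * R[i][j] + s * x[j], -s * R[i][j] + c * x[j]
        z[i], y = c * z[i] + s * y, -s * z[i] + c * y
    # Whatever is left of y after all rotations is pure residual.
    return rss + y * y

def solve_upper(R, z):
    """Back-substitution: solve R * beta = z for upper-triangular R."""
    p = len(z)
    beta = [0.0] * p
    for i in reversed(range(p)):
        acc = z[i] - sum(R[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = acc / R[i][i]
    return beta

# Toy example: fit y = b0 + b1 * x, one observation at a time.
p = 2
R = [[0.0] * p for _ in range(p)]
z = [0.0] * p
rss = 0.0
for x1, y in [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]:
    rss = givens_update(R, z, rss, [1.0, x1], y)
beta = solve_upper(R, z)
```

The estimates can be read off at any point by calling solve_upper, as long as enough observations have arrived for every diagonal entry of R to be nonzero. The cost of each update depends only on the number of parameters, not on how much data has already been absorbed.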
With the arrival of computing in the business world of the ’60s and ’70s, the Givens algorithm resurfaced for a while, and a few more papers were written in the mid-’70s that try to explain it but in effect only talk about what the authors were using it for. Again, the technology was so far ahead of the research and programs that the need for updating algorithms almost seemed ridiculous, as faster and faster machines took less and less time to just rerun the entire computation. Now we are talking about data sets that can run into the terabytes, and updating estimates on that much data when new data arrives does take time.
So I started off learning about the Givens algorithm for self-enrichment. Then along came a professor, Dr. Tolley, who had been working with mass-spectrometer data and had thought of some cool ways to look at it, one of which happened to involve the Givens algorithm. I was asked/tasked with getting it to work, and was told at some point that what I did with the data would be a lot of work and would qualify as the project for my master’s.
Over the last several months I have been coding up the algorithm and testing it. I found problems with the code, and each time I have tried to get better source material to see if I can find where my problem lies. Yesterday afternoon I asked Dr. Scott for help, as I was going to stop working on the computer, go to a whiteboard, work through the entire algorithm by hand, and then see if I could get my program to do the same. Needless to say, the problem in the computer code showed up in the attempt on the whiteboard too, and I was stuck. Dr. Scott came in and looked over what I had done, and couldn’t get it to work out right either. It was frustrating.
I looked over my calendar and my notebook, trying to find the notes I had made and the references I had come up with, and took the time to figure out how much time I have spent working on this problem in the past couple of months [best guess: close to 80 hours on the Givens algorithm alone]. Some of that has been while sitting at home with nothing better to do, and the rest has been at work while multitasking.
I made a comment a while back that I find myself more and more in my office when I should be other places. I have come to the realization that I am in my office more and more because it is easier to work here on what I am thinking about than it is to sit idly at home. I also think I am using it as an excuse: if I am busy with my academic/work life, I don’t have to do anything in my social life. Oh well, maybe soon I will figure out what is wrong and be done with this particular problem for a while.
~u