Thursday, March 28, 2013

Playing with earthquake data

This is a little example inspired by Jeff Leek's Data analysis class. Jeff pointed us towards some data from the US Geological Survey on earthquakes. I believe these are the earthquakes for the last 7 days.

According to Nate Silver, analysis of earthquake data has made fools of many. But, that won't stop us. Let's see what we can do with it.

fileUrl <- "http://earthquake.usgs.gov/earthquakes/catalogs/eqs7day-M1.txt"
download.file(fileUrl, destfile = "./data/earthquake_7day.csv", method = "curl")
eq <- read.csv("./data/earthquake_7day.csv")

Let's be good citizens and record the provinance of our data.

attributes(eq)["date.downloaded"] <- date()
attributes(eq)["url"] <- fileUrl
save(eq, file = "earthquakes_7day.RData")
attributes(eq)$date.downloaded
## [1] "Mon Feb  4 20:03:48 2013"

Let's see what they give us.

colnames(eq)
##  [1] "Src"       "Eqid"      "Version"   "Datetime"  "Lat"      
##  [6] "Lon"       "Magnitude" "Depth"     "NST"       "Region"

Something looks a little funny about the distribution of magnitudes. What's up with that bump at magnitude 5? I'm guessing that some detectors are more sensitive than others and the less sensitive ones bottom out at around 5.

hist(eq$Magnitude, main = "Histogram of Earthquake Magnitude", col = "#33669988", 
    border = "#000066")

plot of chunk unnamed-chunk-5

I'm guessing that the better detectors are on the west coast of the US.

westcoast <- c("Northern California", "Central California", "Southern California",  "Greater Los Angeles area, California", "Santa Monica Bay, California", "Long Valley area, California", "San Francisco Bay area, California", "Lassen Peak area, California", "offshore Northern California", "Washington", "Seattle-Tacoma urban area, Washington", "Puget Sound region, Washington", "Olympic Peninsula, Washington", "Mount St. Helens area, Washington", "San Pablo Bay, California", "Portland urban area, Oregon", "off the coast of Oregon")
hist(eq[eq$Region %in% westcoast, "Magnitude"], main = "Histogram of Earthquake Magnitude - West Coast US", 
    col = "#33669988", border = "#000066")

plot of chunk unnamed-chunk-6

Now that looks a little smoother. Let's try R's mapping ability. Check out the ring of fire.

library(maps)
map()
points(x = eq$Lon, y = eq$Lat, col = rgb(1, 0, 0, alpha = 0.3), cex = eq$Magnitude * 
    0.5, pch = 19)

plot of chunk unnamed-chunk-7

Zoom in on the west coast.

map("state", xlim = range(c(-130, -110)), ylim = range(c(25, 50)))
points(x = eq$Lon, y = eq$Lat, col = rgb(1, 0, 0, alpha = 0.3), cex = eq$Magnitude * 
    0.5, pch = 19)

plot of chunk unnamed-chunk-8

Lucky for us, almost all the recent quakes out here were tiny and the biggest was way off the coast.

summary(eq[eq$Region %in% westcoast, "Magnitude"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     1.2     1.4     1.5     1.7     5.3

There are supposed to be active fault lines here in Seattle and you can see a few little ones. Down by Mt. St. Helens, they get a few moderately bigger ones. The only quakes I've felt recently were the contractors doing demolition for remodeling my basement.

The knitr output can also be seen on RPubs: Playing with Earthquake Data and on GitHub cbare/earthquakes.rmd.