This is a little example inspired by Jeff Leek's Data analysis class. Jeff pointed us towards some data from the US Geological Survey on earthquakes. I believe these are the earthquakes for the last 7 days.
According to Nate Silver, analysis of earthquake data has made fools of many. But, that won't stop us. Let's see what we can do with it.
fileUrl <- "http://earthquake.usgs.gov/earthquakes/catalogs/eqs7day-M1.txt"
download.file(fileUrl, destfile = "./data/earthquake_7day.csv", method = "curl")
eq <- read.csv("./data/earthquake_7day.csv")
Let's be good citizens and record the provinance of our data.
attributes(eq)["date.downloaded"] <- date()
attributes(eq)["url"] <- fileUrl
save(eq, file = "earthquakes_7day.RData")
attributes(eq)$date.downloaded
## [1] "Mon Feb 4 20:03:48 2013"
Let's see what they give us.
colnames(eq)
## [1] "Src" "Eqid" "Version" "Datetime" "Lat"
## [6] "Lon" "Magnitude" "Depth" "NST" "Region"
Something looks a little funny about the distribution of magnitudes. What's up with that bump at magnitude 5? I'm guessing that some detectors are more sensitive than others and the less sensitive ones bottom out at around 5.
hist(eq$Magnitude, main = "Histogram of Earthquake Magnitude", col = "#33669988",
border = "#000066")
I'm guessing that the better detectors are on the west coast of the US.
westcoast <- c("Northern California", "Central California", "Southern California", "Greater Los Angeles area, California", "Santa Monica Bay, California", "Long Valley area, California", "San Francisco Bay area, California", "Lassen Peak area, California", "offshore Northern California", "Washington", "Seattle-Tacoma urban area, Washington", "Puget Sound region, Washington", "Olympic Peninsula, Washington", "Mount St. Helens area, Washington", "San Pablo Bay, California", "Portland urban area, Oregon", "off the coast of Oregon")
hist(eq[eq$Region %in% westcoast, "Magnitude"], main = "Histogram of Earthquake Magnitude - West Coast US",
col = "#33669988", border = "#000066")
Now that looks a little smoother. Let's try R's mapping ability. Check out the ring of fire.
library(maps)
map()
points(x = eq$Lon, y = eq$Lat, col = rgb(1, 0, 0, alpha = 0.3), cex = eq$Magnitude *
0.5, pch = 19)
Zoom in on the west coast.
map("state", xlim = range(c(-130, -110)), ylim = range(c(25, 50)))
points(x = eq$Lon, y = eq$Lat, col = rgb(1, 0, 0, alpha = 0.3), cex = eq$Magnitude *
0.5, pch = 19)
Lucky for us, almost all the recent quakes out here were tiny and the biggest was way off the coast.
summary(eq[eq$Region %in% westcoast, "Magnitude"])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 1.2 1.4 1.5 1.7 5.3
There are supposed to be active fault lines here in Seattle and you can see a few little ones. Down by Mt. St. Helens, they get a few moderately bigger ones. The only quakes I've felt recently were the contractors doing demolition for remodeling my basement.
The knitr output can also be seen on RPubs: Playing with Earthquake Data and on GitHub cbare/earthquakes.rmd.
Very nice analytics. A couple of other resources for earthquake data in R is on posted from a while back as well as a shiny app based on the original statistical-research.com post:
ReplyDeletehttp://statistical-research.com/earthquakes-over-the-past-7-days/
http://www.premiersoccerstats.com/wordpress/?p=1283
Nice, thanks for the links.
DeletePlot magnitude my geography. Almost all the quakes under 4.0 are in the US. It looks like there are differing cutoffs used by area. For the rest of the world quakes under 4.0 just are not in the file.
ReplyDelete