A while back, while reading chapter 4 of Using R for Introductory Statistics, I fooled around with the mtcars dataset giving mechanical and performance properties of cars from the early 70's. Let's plot this data as a hierarchically clustered heatmap.
# scale data to mean=0, sd=1 and convert to matrix mtscaled <- as.matrix(scale(mtcars)) # create heatmap and don't reorder columns heatmap(mtscaled, Colv=F, scale='none')
By default, heatmap clusters by both rows and columns. It then reorders the resulting dendrograms according to mean. Setting Colv to false tells it not to reorder the columns, which will come in handy later. Let's also turn off the default scaling across rows. We've already scaled across columns, which is the sensible thing to do in this case.
If our columns are already in some special order, say as a time-series or by increasing dosage, we might want to cluster only rows. We could do that by setting the Colv argument to NA. One thing that clustering the columns tells us in this case is that some information is highly correlated, bordering on redundant. For example, displacement, horsepower and number of cylinders are quit similar. And the idea that to get more power (hp) and go faster (qsec) we need to burn more gas (mpg) is pretty well supported.
Separating clusters
If we'd like to separate out the clusters, I'm not sure of the best approach. One way is to use hclust and cutree, which allows you to specify k, the number of clusters you want. Don't forget that hclust requires a distance matrix as input.
# cluster rows hc.rows <- hclust(dist(mtscaled)) plot(hc.rows) # transpose the matrix and cluster columns hc.cols <- hclust(dist(t(mtscaled))) # draw heatmap for first cluster heatmap(mtscaled[cutree(hc.rows,k=2)==1,], Colv=as.dendrogram(hc.cols), scale='none') # draw heatmap for second cluster heatmap(mtscaled[cutree(hc.rows,k=2)==2,], Colv=as.dendrogram(hc.cols), scale='none')
That works, but, I'd probably advise creating one heatmap and cutting it up in Illustrator, if need be. I have a nagging feeling that the color scale will end up being slightly different between the two clusters, since the range of values in each submatrix is different. Speaking of colors, if you don't like the default heat colors, try creating a new palette with color ramp.
palette <- colorRampPalette(c('#f0f3ff','#0033BB'))(256) heatmap(mtscaled, Colv=F, scale='none', col=palette)
Confusing things
Another way to separate the clusters is to get the dendrograms out of heatmap and work with those. But Cutree applies to objects of class hclust, returned by hclust, and returns a map assigning each row in the original data to a cluster. Cutree takes either a height to cut at (h) or the desired number of clusters (k), which is nice.
Cut applies to dendrograms, which can be returned by heatmap if the keep.dendro option is set. Cut takes only h, not k, and returns a list with members upper and lower. Lower is a list of subtrees below the cut point.
Doing graphics with R starts easy, but gets arcane quickly. There's also a heatmap.2 function in the gplot package that adds color keys among other sparsely documented features.
This all needs some serious straightening out, but the basics are easy enough. Here are a couple more resources to make your heatmaps extra-hot:
- Using R to draw a Heatmap from Microarray Data
- Flowing Data's tutorial How to Make a Heatmap
...more on R.
Wow, your post is really amazing and useful. Thank you for sharing such a great article!
ReplyDeletecan we get the read heatmap clusters instead? we only get rowInd and ColInd, but no idea how to use these to get the same clusters visible on the heatmap
ReplyDeleteIf I understand your question, cut or cutree should give you access to the clusters.
ReplyDeleteThanks for the very useful post! Love it totally!
ReplyDeleteI am very new to R, hence may I ask if there is away to insert a color key/legend into this heat map? I heard that only heatmap.2 can do it. May I know if this is true? I attempt to modify the code slightly and wonder if this is correct:
mtscaled <- as.matrix(scale(mtcars))
heatmap.2(mtscaled, scale='none', key=T, density.info="none", trace="none")
(Strangely Colv=F was not allowed here).
Could you advise if the above is correct? Thanks so much!