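A Detailed Guide to Drawing Heatmaps in R

Knowledge 06-18

Editor's note from AI 研習社: As one of the most common visualization techniques today, the heatmap is widely used in big-data analysis because its color variation conveys a large amount of information at a glance. R, a language built for statistical analysis, plotting, and visualization, offers a range of powerful and comprehensive libraries and packages for this purpose.

For practitioners, drawing heatmaps in R has therefore become a basic, broadly applicable skill. Using R, this article walks through the main ways of drawing heatmaps and the points to watch for along the way. The original author is taoyan; the article first appeared on the author's personal blog and is republished by AI 研習社 with permission.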

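Introduction

This article draws both static and interactive heatmaps, using the following R packages and functions:

heatmap(): a function for drawing simple heatmaps

heatmap.2(): a function for drawing enhanced heatmaps

d3heatmap: an R package for drawing interactive heatmaps

ComplexHeatmap: an R/Bioconductor package for drawing, annotating, and arranging complex heatmaps (well suited to genomic data analysis)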

Data preparation

We use the built-in R dataset mtcars.
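The plotting calls in the rest of the article pass a numeric matrix named df. A minimal sketch of the preparation step, assuming df is simply mtcars standardized column by column (an inference from the scale(mtcars) call and the -2 to 2 color ranges that appear later, not something this section states):

```r
# Assumption: df is the column-standardized mtcars data as a numeric matrix
df <- scale(mtcars)          # center each column to mean 0, scale to sd 1

is.matrix(df)                # scale() returns a matrix, which heatmap() needs
round(colMeans(df), 10)      # every column mean is numerically zero
apply(df, 2, sd)             # every column standard deviation is one
```

Standardizing first keeps variables with large units, such as disp and hp, from dominating the color scale.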

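Drawing a simple heatmap with base functions

The main function is heatmap(x, scale = "row")

x: a numeric matrix

scale: the direction in which values are centered and scaled; allowed values are "row", "column", and "none"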
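To make the scale argument concrete, the sketch below reproduces by hand the transformation that scale = "row" applies to the values: each row is centered on its mean and divided by its standard deviation. The helper name row_scale is hypothetical, and a small slice of mtcars stands in for the data.

```r
# Hypothetical helper: standardize each row of a numeric matrix
row_scale <- function(x) {
  x <- as.matrix(x)
  # scale() works on columns, so transpose, scale, and transpose back
  t(scale(t(x)))
}

m <- as.matrix(mtcars[1:5, c("mpg", "hp", "wt")])
m_row <- row_scale(m)

rowMeans(m_row)        # each row now has mean 0
apply(m_row, 1, sd)    # and standard deviation 1
```

With scale = "column" the same centering and scaling is applied down each column instead, and scale = "none" leaves the values as they are.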

# Default plot
heatmap(df, scale = "none")

# Use custom colors
col

# Use RColorBrewer color palette names
library(RColorBrewer)
col

dim(df)
## [1] 32 11

heatmap(df, scale = "none", col = col,
        RowSideColors = rep(c("blue", "pink"), each = 16),
        ColSideColors = c(rep("purple", 5), rep("orange", 6)))

# The RowSideColors and ColSideColors arguments annotate the rows and columns with colors; see help(heatmap) for details
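The heatmap() calls above expect col to hold a vector of colors. A minimal sketch of building one with base R's colorRampPalette(), where the red-white-blue gradient and the 256 steps are assumptions for illustration rather than the author's choices:

```r
# Assumption: a red-white-blue gradient with 256 steps; any colors would do
col <- colorRampPalette(c("red", "white", "blue"))(256)

length(col)    # one hex color string per step
head(col, 3)   # the palette starts at pure red

# The palette is passed to heatmap() through its col argument
df <- scale(mtcars)   # assumed data matrix, standardized by column
heatmap(df, scale = "none", col = col)
```

A longer palette gives smoother gradients; a short one makes the color bands easier to tell apart.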

Enhanced heatmaps

The heatmap.2() function

offers many extensions for drawing heatmaps. It is provided by the gplots package.

library(gplots)
heatmap.2(df, scale = "none", col = bluered(100),
          trace = "none", density.info = "none")
# For the other arguments, see help(heatmap.2)

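Interactive heatmaps

The d3heatmap package generates interactive heatmaps. It can be installed with the following code:

if (!require("devtools"))
  install.packages("devtools")
devtools::install_github("rstudio/d3heatmap")

The d3heatmap() function creates an interactive heatmap with the following features:

Hover over a cell of interest to see its row name, column name, and value

Select a region to zoom in

library(d3heatmap)
d3heatmap(df, colors = "RdBu", k_row = 4, k_col = 2)

k_row and k_col specify the number of groups used to color the branches of the row and column dendrograms, respectively. See help(d3heatmap) for more information.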

Enhancing heatmaps with the dendextend package

The dendextend package can be used to enhance the functionality of other packages.

library(dendextend)
# Order for rows
Rowv <- mtcars %>% scale %>% dist %>%
  hclust %>% as.dendrogram %>%
  set("branches_k_color", k = 3) %>%
  set("branches_lwd", 1.2) %>% ladderize
# Order for columns: we must transpose the data
Colv <- mtcars %>% scale %>% t %>% dist %>%
  hclust %>% as.dendrogram %>%
  set("branches_k_color", k = 2, value = c("orange", "blue")) %>%
  set("branches_lwd", 1.2) %>% ladderize

# Enhance heatmap()
heatmap(df, Rowv = Rowv, Colv = Colv, scale = "none")

# Enhance heatmap.2()
heatmap.2(df, scale = "none", col = bluered(100), Rowv = Rowv, Colv = Colv,
          trace = "none", density.info = "none")

# Enhance the interactive d3heatmap()
d3heatmap(scale(mtcars), colors = "RdBu", Rowv = Rowv, Colv = Colv)
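The Rowv and Colv arguments of heatmap() accept any dendrogram object, so the same ordering idea also works without dendextend. A minimal base R sketch, assuming the scaled mtcars matrix as input; it reproduces the clustering and ordering but not the branch coloring that dendextend adds:

```r
df <- scale(mtcars)   # assumed data matrix, standardized by column

# Row dendrogram: distances between cars, then hierarchical clustering
row_dend <- as.dendrogram(hclust(dist(df)))

# Column dendrogram: transpose so that the variables become the observations
col_dend <- as.dendrogram(hclust(dist(t(df))))

nobs(row_dend)   # number of leaves in the row tree, one per car
nobs(col_dend)   # number of leaves in the column tree, one per variable

heatmap(df, Rowv = row_dend, Colv = col_dend, scale = "none")
```

Building the dendrograms separately like this makes it possible to reuse one ordering across several plots.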

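Drawing complex heatmaps

ComplexHeatmap is a Bioconductor package for drawing complex heatmaps. It provides a flexible way to arrange and annotate multiple heatmaps, and it can also visualize associations between data from different sources. It can be installed with the following code:

if (!require("devtools")) install.packages("devtools")
devtools::install_github("jokergoo/ComplexHeatmap")

The main function of the ComplexHeatmap package is Heatmap(), with the form Heatmap(matrix, col, name)

matrix: a matrix

col: a vector of colors (for a discrete color mapping) or a color mapping function (if the matrix holds continuous values)

name: the name of the heatmap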
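A color mapping function takes numeric values and returns one color per value. The sketch below imitates that idea in base R. make_color_map is a hypothetical helper, it assumes evenly spaced breaks, and it interpolates in RGB space, so it illustrates the concept rather than reproducing how circlize::colorRamp2() computes its colors.

```r
# Hypothetical helper: map numbers to colors by linear interpolation.
# Assumes `breaks` are evenly spaced and match `colors` one to one.
make_color_map <- function(breaks, colors) {
  ramp <- grDevices::colorRamp(colors)   # maps [0, 1] to RGB rows
  function(x) {
    # Clamp values to the break range, then rescale them to [0, 1]
    x <- pmin(pmax(x, min(breaks)), max(breaks))
    pos <- (x - min(breaks)) / (max(breaks) - min(breaks))
    rgb_mat <- ramp(pos)
    grDevices::rgb(rgb_mat[, 1], rgb_mat[, 2], rgb_mat[, 3],
                   maxColorValue = 255)
  }
}

col_fun <- make_color_map(c(-2, 0, 2), c("green", "white", "red"))
col_fun(c(-2, 0, 2))   # the break values map to the three anchor colors
col_fun(5)             # values beyond the last break get the last color
```

Because the mapping is fixed by the breaks rather than by the data range, the same value receives the same color in every heatmap that uses the function.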

library(ComplexHeatmap)
Heatmap(df, name = "mtcars")

# Set custom colors
library(circlize)
Heatmap(df, name = "mtcars", col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")))

# Use a color palette
Heatmap(df, name = "mtcars", col = colorRamp2(c(-2, 0, 2), brewer.pal(n = 3, name = "RdBu")))

# Define a custom color mapping

mycol

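Heatmap and row/column titles

Heatmap(df, name = "mtcars", col = mycol, column_title = "Column title",
        row_title = "Row title")

Note that the default position of the row title is "left" and that of the column title is "top". These can be changed with the following options:

row_title_side: allowed values are "left" or "right" (for example, row_title_side = "right")

column_title_side: allowed values are "top" or "bottom" (for example, column_title_side = "bottom")

The font and size can also be changed with the following options:

row_title_gp: graphical parameters for drawing the row title text

column_title_gp: graphical parameters for drawing the column title text

Heatmap(df, name = "mtcars", col = mycol, column_title = "Column title",
        column_title_gp = gpar(fontsize = 14, fontface = "bold"),
        row_title = "Row title", row_title_gp = gpar(fontsize = 14, fontface = "bold"))

In the R code above, fontface can be an integer or a string: 1 = plain, 2 = bold, 3 = italic, 4 = bold italic. As a string, the valid values are "plain", "bold", "italic", "oblique", and "bold.italic".

Showing row/column names:

show_row_names: whether to show row names. The default is TRUE

show_column_names: whether to show column names. The default is TRUE

Heatmap(df, name = "mtcars", show_row_names = FALSE)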

Changing the clustering appearance

By default, both rows and columns are clustered. This can be changed with the following arguments:

cluster_rows = FALSE turns off row clustering. If TRUE, the rows are clustered

cluster_columns = FALSE turns off column clustering. If TRUE, the columns are clustered

# Turn off clustering on rows
Heatmap(df, name = "mtcars", col = mycol, cluster_rows = FALSE)

To change the height of the column dendrogram or the width of the row dendrogram, use the options column_dend_height and row_dend_width:

Heatmap(df, name = "mtcars", col = mycol, column_dend_height = unit(2, "cm"),
        row_dend_width = unit(2, "cm"))

We can also use color_branches() to customize the appearance of the dendrograms

library(dendextend)
row_dend = hclust(dist(df)) # row clustering
col_dend = hclust(dist(t(df))) # column clustering
Heatmap(df, name = "mtcars", col = mycol,
        cluster_rows = color_branches(row_dend, k = 4),
        cluster_columns = color_branches(col_dend, k = 2))

Different clustering distance measures

The arguments clustering_distance_rows and clustering_distance_columns specify the metric used for row and column clustering, respectively. Allowed values are "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", "pearson", "spearman", and "kendall".

Heatmap(df, name = "mtcars", clustering_distance_rows = "pearson",
        clustering_distance_columns = "pearson")

# A custom distance function can also be supplied
Heatmap(df, name = "mtcars", clustering_distance_rows = function(m) dist(m))
Heatmap(df, name = "mtcars", clustering_distance_rows = function(x, y) 1 - cor(x, y))

Note that the R code above mostly shows examples for clustering_distance_rows, the metric for row clustering. It is advisable to use the same metric for clustering_distance_columns, the metric for column clustering.
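For the correlation-based options, the usual construction is a dissimilarity of one minus the correlation between rows. A minimal base R sketch of that idea on the scaled mtcars matrix (an assumed input); the text above does not confirm that the package computes it in exactly this way:

```r
df <- scale(mtcars)   # assumed data matrix, standardized by column

# cor() correlates columns, so transpose to correlate rows (cars)
row_cor <- cor(t(df), method = "pearson")

# Turn similarity into dissimilarity: 0 = perfectly correlated, 2 = opposite
pearson_dist <- as.dist(1 - row_cor)

# The result feeds hierarchical clustering like any other distance
row_clust <- hclust(pearson_dist)
head(row_clust$labels[row_clust$order])
```

Unlike Euclidean distance, this groups rows by the shape of their profile across variables rather than by their magnitude.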

# Clustering metric function
robust_dist = function(x, y) {
  qx = quantile(x, c(0.1, 0.9))
  qy = quantile(y, c(0.1, 0.9))
  # Keep only positions where both values lie between their 10% and 90% quantiles
  l = x > qx[1] & x < qx[2] & y > qy[1] & y < qy[2]
  x = x[l]
  y = y[l]
  sqrt(sum((x - y)^2))
}

# Heatmap
Heatmap(df, name = "mtcars", clustering_distance_rows = robust_dist,
        clustering_distance_columns = robust_dist,
        col = colorRamp2(c(-2, 0, 2), c("purple", "white", "orange")))

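Clustering methods

The arguments clustering_method_rows and clustering_method_columns specify the method used for hierarchical clustering. Allowed values are those supported by the hclust() function, including "ward.D", "ward.D2", "single", "complete", and "average".

Heatmap(df, name = "mtcars", clustering_method_rows = "ward.D",
        clustering_method_columns = "ward.D")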
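Since the method names are passed through to hclust(), their effect can be explored directly in base R before drawing anything. A minimal sketch, assuming the scaled mtcars matrix and an arbitrary choice of 3 groups, that compares two linkage methods on the same distances:

```r
df <- scale(mtcars)   # assumed data matrix, standardized by column
d <- dist(df)         # Euclidean distances between rows

# Cluster the same distances with two different linkage methods
hc_ward     <- hclust(d, method = "ward.D")
hc_complete <- hclust(d, method = "complete")

# Cut both trees into 3 groups and cross-tabulate the assignments
groups_ward     <- cutree(hc_ward, k = 3)
groups_complete <- cutree(hc_complete, k = 3)
table(ward = groups_ward, complete = groups_complete)
```

Off-diagonal counts in the table mark rows that the two methods place in different groups, which is a quick way to see how much the choice matters for a given dataset.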

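Splitting heatmaps

There are many ways to split a heatmap. One option is to apply k-means through the km argument.

When running k-means it is important to call set.seed() first, so that the result can be reproduced exactly later.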
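The reason the seed matters is that k-means starts from randomly chosen centers. A minimal base R sketch of the reproducibility point, assuming the scaled mtcars matrix and using stats::kmeans() directly rather than the km argument:

```r
df <- scale(mtcars)   # assumed data matrix, standardized by column

# k-means starts from random centers, so fix the seed before each run
set.seed(1122)
km_a <- kmeans(df, centers = 2)

set.seed(1122)
km_b <- kmeans(df, centers = 2)

identical(km_a$cluster, km_b$cluster)   # same seed, same partition
table(km_a$cluster)                     # sizes of the two row groups
```

Without the set.seed() calls, two runs may label or even partition the rows differently, and a split heatmap built on them would change from one run to the next.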

set.seed(1122)
# Split into 2 groups
Heatmap(df, name = "mtcars", col = mycol, km = 2)

# Split by a vector specifying row classes, somewhat like faceting in ggplot2
Heatmap(df, name = "mtcars", col = mycol, split = mtcars$cyl)

# split can also be a data frame, in which case the combinations of levels split the rows of the heatmap
# Split by combining multiple variables
Heatmap(df, name = "mtcars", col = mycol, split = data.frame(cyl = mtcars$cyl, am = mtcars$am))

# Combine km and split
Heatmap(df, name = "mtcars", col = mycol, km = 2, split = mtcars$cyl)

# A custom partition can also be used
library("cluster")
set.seed(1122)
pa = pam(df, k = 3)
Heatmap(df, name = "mtcars", col = mycol, split = paste0("pam", pa$clustering))

A user-defined dendrogram can also be combined with a split. In that case, split can be given as a single number:

row_dend = hclust(dist(df)) # row clustering
row_dend = color_branches(row_dend, k = 4)
Heatmap(df, name = "mtcars", col = mycol, cluster_rows = row_dend, split = 2)

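Heatmap annotations

Rows or columns are annotated with HeatmapAnnotation(), which has the form HeatmapAnnotation(df, name, col, show_legend)

df: a data.frame with column names

name: the name of the heatmap annotation

col: a list of colors mapped to the columns of df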

# Transpose
df <- t(df)

# Heatmap of the transposed data
Heatmap(df, name = "mtcars", col = mycol)

# Annotation data frame
annot_df <- data.frame(cyl = mtcars$cyl, am = mtcars$am, mpg = mtcars$mpg)

# Define colors for each level of the qualitative variables
# Define a gradient color for the continuous variable (mpg)
col = list(cyl = c("4" = "green", "6" = "gray", "8" = "darkred"),
           am = c("0" = "yellow", "1" = "orange"),
           mpg = colorRamp2(c(17, 25), c("lightblue", "purple")))

# Create the heatmap annotation
ha <- HeatmapAnnotation(df = annot_df, col = col)

# Combine the heatmap and the annotation
Heatmap(df, name = "mtcars", col = mycol, top_annotation = ha)

# The annotation legend can be hidden with the argument show_legend = FALSE
ha <- HeatmapAnnotation(df = annot_df, col = col, show_legend = FALSE)
Heatmap(df, name = "mtcars", col = mycol, top_annotation = ha)

# Annotation names can be added with the R code below
library("GetoptLong")

# Combine Heatmap and annotation
Heatmap(df, name = "mtcars", col = mycol, top_annotation = ha)

# Add annotation names on the right
for(an in colnames(annot_df)) {
  seekViewport(qq("annotation_@{an}"))
  grid.text(an, unit(1, "npc") + unit(2, "mm"), 0.5, default.units = "npc", just = "left")
}

# To add the annotation names on the left, use the following code
# Annotation names on the left
for(an in colnames(annot_df)) {
  seekViewport(qq("annotation_@{an}"))
  grid.text(an, unit(0, "npc") - unit(2, "mm"), 0.5, default.units = "npc", just = "right")
}

Complex annotations

Heatmaps can be combined with basic graphics for annotation, using anno_points(), anno_barplot(), anno_boxplot(), anno_density(), and anno_histogram().

# Define some graphics to display the distribution of columns
.hist = anno_histogram(df, gp = gpar(fill = "lightblue"))
.density = anno_density(df, type = "line", gp = gpar(col = "blue"))
ha_mix_top = HeatmapAnnotation(hist = .hist, density = .density)

# Define some graphics to display the distribution of rows
.violin = anno_density(df, type = "violin", gp = gpar(fill = "lightblue"), which = "row")
.boxplot = anno_boxplot(df, which = "row")
ha_mix_right = HeatmapAnnotation(violin = .violin, bxplt = .boxplot, which = "row",
                                 width = unit(4, "cm"))

# Combine annotation with heatmap
Heatmap(df, name = "mtcars", col = mycol, column_names_gp = gpar(fontsize = 8),
        top_annotation = ha_mix_top, top_annotation_height = unit(4, "cm")) + ha_mix_right

Combining heatmaps

# Heatmap 1
ht1 = Heatmap(df, name = "ht1", col = mycol, km = 2, column_names_gp = gpar(fontsize = 9))

# Heatmap 2
ht2 = Heatmap(df, name = "ht2", col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")),
              column_names_gp = gpar(fontsize = 9))

# Combine the two heatmaps
ht1 + ht2

The option width = unit(3, "cm") can be used to control the size of a heatmap. Note that when several heatmaps are combined, the first one is treated as the main heatmap. Some settings of the remaining heatmaps are adjusted automatically to match the main heatmap, including removing row clusters and titles and adding splits.

draw(ht1 + ht2,
     # Titles
     row_title = "Two heatmaps, row title",
     row_title_gp = gpar(col = "red"),
     column_title = "Two heatmaps, column title",
     column_title_side = "bottom",
     # Gap between heatmaps
     gap = unit(0.5, "cm"))

Legends can be removed with the arguments show_heatmap_legend = FALSE and show_annotation_legend = FALSE.

Gene expression matrix

In gene expression data, rows represent genes and columns represent samples. Additional information about the genes, such as gene length and gene type, can be appended after the expression heatmap.

expr = readRDS(paste0(system.file(package = "ComplexHeatmap"), "/extdata/gene_expression.rds"))
mat = as.matrix(expr[, grep("cell", colnames(expr))])
# The backslash must be doubled inside an R string for the regex \d to reach gsub()
type = gsub("s\\d+_", "", colnames(mat))
ha = HeatmapAnnotation(df = data.frame(type = type))

Heatmap(mat, name = "expression", km = 5, top_annotation = ha, top_annotation_height = unit(4, "mm"),
        show_row_names = FALSE, show_column_names = FALSE) +
  Heatmap(expr$length, name = "length", width = unit(5, "mm"), col = colorRamp2(c(0, 100000), c("white", "orange"))) +
  Heatmap(expr$type, name = "type", width = unit(5, "mm")) +
  Heatmap(expr$chr, name = "chr", width = unit(5, "mm"), col = rand_color(length(unique(expr$chr))))

Genomic alterations can also be visualized, and different molecular levels (gene expression, DNA methylation, …) can be integrated.

Visualizing the distribution of columns in a matrix

Use the densityHeatmap() function.

densityHeatmap(df)
