R语言绘制heatmap热图
2013-03-16 11:39
246 查看
From:http://bbsunchen.iteye.com/blog/1271580
介绍如何使用R绘制heatmap的文章。
今天无意间在Flowingdata看到一篇关于如何使用R来做heatmap的文章(请移步到这里)。虽然
heatmap只是R中一个很普通的图形函数,但这个例子使用了2008-2009赛季NBA50个顶级球员数据做了一个极佳的演示,效果非常不错。对R大致了解的童鞋可以直接在Rconsole上敲
?heatmap
直接查看帮助即可。
没有接触过R的童鞋继续围观,下面会仔细介绍如何使用R实现NBA50位顶级球员指标表现热图:
关于heatmap,中文一般翻译为“热图”,其统计意义wiki上解释的很清楚:
Aheatmapisagraphicalrepresentationofdatawherethevaluestakenbya
variableinatwo-dimensionalmaparerepresentedascolors.Heatmapsoriginatedin2Ddisplaysofthevaluesinadatamatrix.Largervalueswererepresentedbysmalldarkgrayorblacksquares(pixels)andsmallervalues
bylightersquares.
下面这个图即是Flowingdata用一些
R函数对2008-2009赛季NBA50名顶级球员指标做的一个热图(点击参看大图):
先解释一下数据:
这里共列举了50位球员,估计爱好篮球的童鞋对上图右边的每个名字都会耳熟能详。这些球员每个人会有19个指标,包括打了几场球(G)、上场几分钟(MIN)、得分(PTS)……这样就行成了一个50行×19列的矩阵。但问题是,数据有些多,需要使用一种比较好的办法来展示,Soitcomes,heatmap!
简单的说明:
比如从上面的热图上观察得分前3名(Wade、James、Bryant)PTS、FGM、FGA比较高,但Bryant的FTM、FTA和前两者就差一些;Wade在这三人中STL是佼佼者;而James的DRB和TRB又比其他两人好一些……
姚明的3PP(3PointsPercentage)这条数据很有意思,非常出色!仔细查了一下这个数值,居然是100%。仔细回想一下,似乎那个赛季姚明好像投过一个3分,并且中了,然后再也没有3p。这样本可真够小的!
最后是如何做这个热图(做了些许修改):
Step0.DownloadR
R官网:http://www.r-project.org,它是免费的。官网上面提供了Windows,Mac,Linux版本(或源代码)的R程序。
Step1.Loadthedata
R可以支持网络路径,使用读取csv文件的函数read.csv。
读取数据就这么简单:
Step2.Sortdata
按照球员得分,将球员从小到大排序:
当然也可以选择MIN,BLK,STL之类指标
Step3.Preparedata
把行号换成行名(球员名称):
去掉第一列行号:
Step4.Preparedata,again
把dataframe转化为我们需要的矩阵格式:
Step5.Makeaheatmap
#R的默认还会在图的左边和上边绘制dendrogram,使用Rowv=NA,Colv=NA去掉
这样就得到了上面的那张热图。
Step6.Colorselection
或者想把热图中的颜色换一下:
>library("ALL")
>data("ALL")
>selSamples<-ALL$mol.biol%in%c("ALL1/AF4",
+"E2A/PBX1")
>ALLs<-ALL[,selSamples]
>ALLs$mol.biol<-factor(ALLs$mol.biol)
>colnames(exprs(ALLs))<-paste(ALLs$mol.biol,
+colnames(exprs(ALLs)))
>library("genefilter")
>meanThr<-log2(100)
>g<-ALLs$mol.biol
>s1<-rowMeans(exprs(ALLs)[,g==levels(g)[1]])>
+meanThr
>s2<-rowMeans(exprs(ALLs)[,g==levels(g)[2]])>
+meanThr
>s3<-rowttests(ALLs,g)$p.value<2e-04
>selProbes<-(s1|s2)&s3
>ALLhm<-ALLs[selProbes,]
>library(RColorBrewer)
>hmcol<-colorRampPalette(brewer.pal(10,"RdBu"))(256)
>spcol<-ifelse(ALLhm$mol.biol=="ALL1/AF4",
+"goldenrod","skyblue")
>heatmap(exprs(ALLhm),col=hmcol,ColSideColors=spcol)
最后,可以
>help(heatmap)查找帮助,看看帮助给提供的例子
也可以看这的例子:
http://www2.warwick.ac.uk/fac/sci/moac/students/peter_cock/r/heatmap/
介绍如何使用R绘制heatmap的文章。
今天无意间在
heatmap只是R中一个很普通的图形函数,但这个例子使用了2008-2009赛季NBA50个顶级球员数据做了一个极佳的演示,效果非常不错。对R大致了解的童鞋可以直接在Rconsole上敲
?heatmap
直接查看帮助即可。
没有接触过R的童鞋继续围观,下面会仔细介绍如何使用R实现NBA50位顶级球员指标表现热图:
关于heatmap,中文一般翻译为“热图”,其统计意义
Aheatmapisagraphicalrepresentationofdatawherethevaluestakenbya
variableinatwo-dimensionalmaparerepresentedascolors.Heatmapsoriginatedin2Ddisplaysofthevaluesinadatamatrix.Largervalueswererepresentedbysmalldarkgrayorblacksquares(pixels)andsmallervalues
bylightersquares.
下面这个图即是
先解释一下数据:
这里共列举了50位球员,估计爱好篮球的童鞋对上图右边的每个名字都会耳熟能详。这些球员每个人会有19个指标,包括打了几场球(G)、上场几分钟(MIN)、得分(PTS)……这样就行成了一个50行×19列的矩阵。但问题是,数据有些多,需要使用一种比较好的办法来展示,Soitcomes,heatmap!
简单的说明:
比如从上面的热图上观察得分前3名(Wade、James、Bryant)PTS、FGM、FGA比较高,但Bryant的FTM、FTA和前两者就差一些;Wade在这三人中STL是佼佼者;而James的DRB和TRB又比其他两人好一些……
姚明的3PP(3PointsPercentage)这条数据很有意思,非常出色!仔细查了一下这个数值,居然是100%。仔细回想一下,似乎那个赛季姚明好像投过一个3分,并且中了,然后再也没有3p。这样本可真够小的!
最后是如何做这个热图(做了些许修改):
Step0.DownloadR
R官网:
Step1.Loadthedata
R可以支持网络路径,使用读取csv文件的函数read.csv。
读取数据就这么简单:
nba<-read.csv("http://datasets.flowingdata.com/ppg2008.csv",sep=",")
Step2.Sortdata
按照球员得分,将球员从小到大排序:
nba<-nba[order(nba$PTS),]
当然也可以选择MIN,BLK,STL之类指标
Step3.Preparedata
把行号换成行名(球员名称):
row.names(nba)<-nba$Name
去掉第一列行号:
nba<-nba[,2:20]#ornba<-nba[,-1]
Step4.Preparedata,again
把dataframe转化为我们需要的矩阵格式:
nba_matrix<-data.matrix(nba)
Step5.Makeaheatmap
#R的默认还会在图的左边和上边绘制dendrogram,使用Rowv=NA,Colv=NA去掉
heatmap(nba_matrix,Rowv=NA,Colv=NA,col=cm.colors(256),revC=FALSE,scale='column')
这样就得到了上面的那张热图。
Step6.Colorselection
或者想把热图中的颜色换一下:
heatmap(nba_matrix,Rowv=NA,Colv=NA,col=heat.colors(256),revC=FALSE,scale="column",margins=c(5,10))
BioinformaticsandComputationalBiologySolutionsUsingRandBioconductor第10章的
例子:
Heatmaps,orfalsecolorimageshaveareasonablylonghistory,ashasthe
notionofrearrangingthecolumnsandrowstoshowstructureinthedata.
TheywereappliedtomicroarraydatabyEisenetal.(1998)andhave
becomeastandardvisualizationmethodforthistypeofdata.
Aheatmapisatwo-dimensional,rectangular,coloredgrid.Itdisplays
datathatthemselvescomeintheformofarectangularmatrix.Thecolor
ofeachrectangleisdeterminedbythevalueofthecorrespondingentry
inthematrix.Therowsandcolumnsofthematrixcanberearranged
independently.Usuallytheyarereorderedsothatsimilarrowsareplaced
nexttoeachother,andthesameforcolumns.Amongtheorderingsthat
arewidelyusedarethosederivedfromahierarchicalclustering,butmany
otherorderingsarepossible.Ifhierarchicalclusteringisused,thenitis
customarythatthedendrogramsareprovidedaswell.Inmanycasesthe
resultingimagehasrectangularregionsthatarerelativelyhomogeneous
andhencethegraphiccanaidindeterminingwhichrows(generallythe
genes)havesimilarexpressionvalueswithinwhichsubgroupsofsamples
(generallythecolumns).
Thefunctionheatmapisanimplementationwithmanyoptions.Inparticular,
userscancontroltheorderingofrowsandcolumnsindependently
fromeachother.Theycanuserowandcolumnlabelsoftheirownchoosing
orselecttheirowncolorscheme.
>library("ALL")
>data("ALL")
>selSamples<-ALL$mol.biol%in%c("ALL1/AF4",
+"E2A/PBX1")
>ALLs<-ALL[,selSamples]
>ALLs$mol.biol<-factor(ALLs$mol.biol)
>colnames(exprs(ALLs))<-paste(ALLs$mol.biol,
+colnames(exprs(ALLs)))
>library("genefilter")
>meanThr<-log2(100)
>g<-ALLs$mol.biol
>s1<-rowMeans(exprs(ALLs)[,g==levels(g)[1]])>
+meanThr
>s2<-rowMeans(exprs(ALLs)[,g==levels(g)[2]])>
+meanThr
>s3<-rowttests(ALLs,g)$p.value<2e-04
>selProbes<-(s1|s2)&s3
>ALLhm<-ALLs[selProbes,]
>library(RColorBrewer)
>hmcol<-colorRampPalette(brewer.pal(10,"RdBu"))(256)
>spcol<-ifelse(ALLhm$mol.biol=="ALL1/AF4",
+"goldenrod","skyblue")
>heatmap(exprs(ALLhm),col=hmcol,ColSideColors=spcol)
最后,可以
>help(heatmap)查找帮助,看看帮助给提供的例子
也可以看这的例子:
UsingRtodrawaHeatmapfromMicroarrayData | |
[c] The firstsectionofthispageusesRtoanalyseanAcutelymphocyticleukemia(ALL)microarraydataset,producingaheatmap(withdendrograms)ofgenesdifferentiallyexpressedbetweentwotypesofleukemia. Thereisafollowonpagedealingwithhowtodothis fromPythonusingRPy. Theoriginalcitationfortherawdatais"GeneexpressionprofileofadultT-cellacutelymphocyticleukemiaidentifiesdistinctsubsetsofpatientswithdifferentresponsetotherapyandsurvival"byChiarettietal.Blood2004. ( Theanalysisisa"stepbystep"recipebasedonthispaper, HeatmapsfromR AssumingyouhavearecentversionofR(from TheRProject)and BioConductor(see WindowsXPinstallationinstructions),theexampledatasetcanbedownloadedastheBioConductorALLpackage. YoushouldbeabletoinstallthisfromwithinRasfollows: >source("http://www.bioconductor.org/biocLite.R") Alternatively,youcandownloadthepackagebyhandfrom here. IfyouareusingWindows,downloadALL_1.0.2.zip(orsimilar)andsaveit.ThenfromwithintheRprogram,usethemenuoption"Packages","Installpackage(s)fromlocalzipfiles..."andselecttheZIPfile. OnLinux,downloadALL_1.0.2.tar.gz(orsimilar)anduse sudoRCMDINSTALLALL_1.0.2.tar.gzatthecommandprompt. Withthatoutoftheway,youshouldbeabletostartRandloadthispackagewiththe libraryanddatacommands: >library("ALL") IfyouinspecttheresultingALLvariable,itcontains128sampleswith12625genes,andassociatedphenotypicdata. >ALL Wecanlooksattheresultsofmolecularbiologytestingforthe128samples: >ALL$mol.biol Ignoringthesampleswhichcamebacknegativeonthistest(NEG),mosthavebeenclassifiedashavingatranslocationbetweenchromosomes9and22(BCR/ABL),oratranslocationbetweenchromosomes4and11(ALL1/AF4). Forthepurposesofthisexample,weareonlyinterestedinthesetwosubgroups,sowewillcreateafilteredversionofthedatasetusingthisasaselectioncriteria: >eset<-ALL[,ALL$mol.biol%in%c("BCR/ABL","ALL1/AF4")] Theresultingvariable,eset,containsjust47samples-eachwiththefull12,625geneexpressionlevels. Thisisfartoomuchdatatodrawaheatmapwith,butwecandooneforthefirst100genesasfollows: >heatmap(exprs(eset[1:100,])) AccordingtotheBioConductorpaperwearefollowing,thenextstepintheanalysiswastousethelmFitfunction(fromthelimmapackage)tolookforgenesdifferentiallyexpressedbetweenthetwogroups.Thefittedmodelobject isfurtherprocessedbytheeBayesfunctiontoproduceempiricalBayesteststatisticsforeachgene,includingmoderatedt-statistics,p-valuesandlog-oddsofdifferentialexpression. >library(limma) Ifthelimmapackageisn'tinstalled,you'llneedtoinstallitfirst: >source("http://www.bioconductor.org/biocLite.R") Wecannowreproduce Figure1fromthepaper. >topTable(fit,coef=2) Theleftmostnumbersarerowindices,IDistheAffymetrixHGU95av2accessionnumber,Misthelogratioofexpression,Aisthelogaverageexpression,tthemoderatedt-statistic,andBisthelogoddsofdifferentialexpression. Next,weselectthosegenesthathaveadjustedp-valuesbelow0.05,usingaverystringentHolmmethodtoselectasmallnumber(165)ofgenes. >selected<-p.adjust(fit$p.value[,2])<0.05 ThevariableesetSelhasdataon(only)165genesforall47samples.Wecaneasilyproduceaheatmapasfollows(inR-2.1.1thisdefaultstoayellow/red"heat"colourscheme): >heatmap(exprs(esetSel)) Ifyouhavethetopographicalcoloursinstalled(yellow-green-blue),youcandothis: >heatmap(exprs(esetSel),col=topo.colors(100)) Thisisgettingverycloseto Gentlemanetal.'sFigure2,excepttheyhaveaddedared/bluebanneracrossthetoptoreallyemphasizehowthehierarchicalclusteringhascorrectlysplitthedataintothetwogroups(10and37patients). Todothat,wecanusetheheatmapfunction'soptionalargumentofColSideColors.IcreatedasmallfunctiontomaptheeselSet$mol.biolvaluestored(#FF0000)andblue(#0000FF),whichwecanapplytoeachofthemolecularbiology resultstogetamatchinglistofcoloursforourcolumns: >color.map<-function(mol.biol){if(mol.biol=="ALL1/AF4")"#FF0000"else"#0000FF"} Looksprettyclosenow,doesn'tit: Torecap,thisis"all"weneededtotypeintoRtoachievethis: library("ALL") HeatmapsinR-MoreOptions Onesubtlepointinthepreviousexamplesisthattheheatmapfunctionhasautomaticallyscaledthecoloursforeachrow(i.e.eachgenehasbeenindividuallynormalisedacrosspatients).Thiscanbedisabledusingscale="none",which youmightwanttodoifyouhavealreadydoneyourownnormalisation(orthismaynotbeappropriateforyourdata): heatmap(exprs(esetSel),col=topo.colors(75),scale="none",ColSideColors=patientcolors,cexRow=0.5) Youmightalsohavenoticedintheabovesnippet,thatIhaveshrunktherowcaptionswhichweresobigtheyoverlappedeachother.TherelevantoptionsarecexRowandcexCol. Sofarsogood-butwhatifyouwantedakeytothecoloursshown?Theheatmapfunctiondoesn'tofferthis,butthegoodnewsisthatheatmap.2fromthegplotslibrarydoes.Infact,itoffersalotofotherfeatures,manyofwhich Ideliberatelyturnoffinthefollowingexample: library("gplots") Bydefault,heatmap.2willalsoshowatraceoneachdatapoint(removedthiswithtrace="none").Ifyouaskforakey(usingkey=TRUE)thisfunctionwillactuallygiveyouacombined"colorkeyandhistogram",butthatcanbeoverridden (withdensity.info="none"). Don'tlikethecolourscheme?Tryusingthefunctionsbluered/redblueforared-white-bluespread,orredgreen/greenredforthered-black-greencolourschemeoftenusedwithtwo-colourmicroarrays: library("gplots") HeatmapsfromPython So,howcanwedothatfromwithinPython?Onewayisusing thispage. P.S.Ifyouwanttouseheatmap.2fromwithinpythonusingRPy,usethesyntaxheatmap_2duetothedifferencesinhowRandPythonhandlefullstopsandunderscores. Whataboutothermicroarraydata? Well,Ihavealsodocumentedhowyoucan loadNCBIGEOSOFTfilesintoRasaBioConductorexpressionsetobject.AslongasyoucangetyourdataintoRasamatrixordataframe,convertingitintoanexprSetshouldn'tbetoohard. |
相关文章推荐
- R语言绘制正太分布图,并进行正太分布检验
- 三、R语言可视化--ggplot2和REmap包绘制地图
- R语言绘制热图——pheatmap
- 分词及词云图绘制-R语言
- R语言——绘制半圆形图
- R语言利用wordcloud2绘制词云
- R语言绘制文字云
- R语言抓取暴风魔镜评论数据并绘制各省市购买量热力图
- R语言绘制ROC曲线
- R语言绘制布林带通道
- 利用R语言绘制世界航班路线图
- R语言学习记录图形的绘制(二)
- R语言抓取pm2.5数据绘制全国pm2.5分布图
- R语言图形绘制学习——图形叠加绘制
- R语言:使用REmap绘制超炫酷的地图
- 使用R语言绘制其他图形之相关系数图
- R语言绘制带errorbar 的柱状图
- R语言绘制花瓣图flower plot
- R语言绘制均线图SMA的交叉点