Draw nicer Classification and Regression Trees with the rpart.plot package
2016-06-03 10:14
1556 查看
The basic way to plot a classification or regression tree built withR’s rpart() function
is just to call plot. However, in general, the results just aren’t pretty. As it turns out, for some time now there has been a better way to plot rpart() trees:
the prp() function in Stephen Milborrow’s rpart.plot package.
This function is a veritable “Swiss Army Knife” for plotting trees and the documentation for the package is quite good: in addition to the package pdf, Stephen maintains a very nice and easy-to-read user
manual on his webpage. However, it wasn’t until I stumbled onto Graham Williams recently posted “one pager” about
plotting trees that I realized just how powerful and flexible prp() is. This document is packed packed with ideas about drawing trees withprp() and
is a very useful supplement to Graham’s superb book:Data
Mining with Rattle and R.
For anyone not familiar with prp(), the following script is an idiosyncratic first look and, I hope, motivation for further investigation.
Created by Pretty R at inside-R.org
After loading the data, the script makes some deliberately ill-advised choices in building an rpart() classification tree for the segmentationData from the caret package. Plotting the tree withplot() (not
shown) produces an a couple of black clouds of overlaid text that is fairly typical of what you could expect from an attempt to naively plot a large tree. However, prp() does a pretty good job of plotting
the tree and revealing its structure with just the default settings. And, using a parameter instructing prp() to abbreviate variable names makes the plot even more readable.
The next few lines of code show off the prp()’s interactive pruning capability. The line that assigns the objectnew.tree.1 produces a “live” tree plot. Use the mouse to prune the tree, hit “QUIT”
and replot and you have a fairly nice plot for the top part of the tree. This is slick way to get a legible picture of the top part of a tree into a report.
tree.2, a much more reasonable tree for the segmentationData, results from just accepting the rpart() deaults. First, this tree is plotted with prp() using
the default settings and then, in the next line, the tree is plotted using the fancyRpartPlot() function from Graham Williams rattle package.
This function is just a wrapper for prp() but is easy to use for plotting classification trees and is a very nice example of how aesthetics can facilitate communication. Each node box displays the classification,
the probability of each class at that node (i.e. the probability of the class conditioned on the node) and the percentage of observations used at that node. Notice how the use of the dotted lines tends to emphasize the nodes and not the tree itself, and how
having the bottom level of leaves line up helps the viewer to guess that the percentages in the node boxes indicate the percentage of observations that arrived at each node. (The bottom row adds to 100%, .. well almost). fancyRpartPlot() is
well worth studying as an example of how to choose among the daunting number of parameter options available for prp().
The last few lines of the script construct a tree using the rxDTree()function for building classification and regression tree models with massive data sets. This function is included in the RevoScaleR package
that ships with Revolution R Enterprise. Both prp() andfancyRpartPlot() work
well for these trees too. It is just necessary to run the rxAddInheritance() function on the rxDTree() model object to make sure prp() knows
what to do with it. The parameter settings for rxDTree() should yield a plot very similar to the fancy plot above. Happy plotting!
is just to call plot. However, in general, the results just aren’t pretty. As it turns out, for some time now there has been a better way to plot rpart() trees:
the prp() function in Stephen Milborrow’s rpart.plot package.
This function is a veritable “Swiss Army Knife” for plotting trees and the documentation for the package is quite good: in addition to the package pdf, Stephen maintains a very nice and easy-to-read user
manual on his webpage. However, it wasn’t until I stumbled onto Graham Williams recently posted “one pager” about
plotting trees that I realized just how powerful and flexible prp() is. This document is packed packed with ideas about drawing trees withprp() and
is a very useful supplement to Graham’s superb book:Data
Mining with Rattle and R.
For anyone not familiar with prp(), the following script is an idiosyncratic first look and, I hope, motivation for further investigation.
# Plotting Classification Trees with the plot.rpart and rattle pckages library(rpart) # Popular decision tree algorithm library(rattle) # Fancy tree plot library(rpart.plot) # Enhanced tree plots library(RColorBrewer) # Color selection for fancy tree plot library(party) # Alternative decision tree algorithm library(partykit) # Convert rpart object to BinaryTree library(caret) # Just a data source for this script # but probably one of the best R packages ever. data(segmentationData) # Get some data data <- segmentationData[,-c(1,2)] # Make big tree form <- as.formula(Class ~ .) tree.1 <- rpart(form,data=data,control=rpart.control(minsplit=20,cp=0)) # plot(tree.1) # Will make a mess of the plot text(tree.1) # prp(tree.1) # Will plot the tree prp(tree.1,varlen=3) # Shorten variable names # Interatively prune the tree new.tree.1 <- prp(tree.1,snip=TRUE)$obj # interactively trim the tree prp(new.tree.1) # display the new tree # #------------------------------------------------------------------- tree.2 <- rpart(form,data) # A more reasonable tree prp(tree.2) # A fast plot fancyRpartPlot(tree.2) # A fancy plot from rattle # #------------------------------------------------------------------- # Plot a tree built with RevoScaleR # Construct a model formula sdNames <- names(segmentationData) X <- as.vector(sdNames[-c(1,2,3)]) form <- as.formula(paste("Class","~", paste(X,collapse="+"))) # Run the model rx.tree <- rxDTree(form, data = segmentationData,maxNumBins = 100, minBucket = 10,maxDepth = 5,cp = 0.01, xVal = 0) # Plot the tree prp(rxAddInheritance(rx.tree)) fancyRpartPlot(rxAddInheritance(rx.tree))
Created by Pretty R at inside-R.org
After loading the data, the script makes some deliberately ill-advised choices in building an rpart() classification tree for the segmentationData from the caret package. Plotting the tree withplot() (not
shown) produces an a couple of black clouds of overlaid text that is fairly typical of what you could expect from an attempt to naively plot a large tree. However, prp() does a pretty good job of plotting
the tree and revealing its structure with just the default settings. And, using a parameter instructing prp() to abbreviate variable names makes the plot even more readable.
The next few lines of code show off the prp()’s interactive pruning capability. The line that assigns the objectnew.tree.1 produces a “live” tree plot. Use the mouse to prune the tree, hit “QUIT”
and replot and you have a fairly nice plot for the top part of the tree. This is slick way to get a legible picture of the top part of a tree into a report.
tree.2, a much more reasonable tree for the segmentationData, results from just accepting the rpart() deaults. First, this tree is plotted with prp() using
the default settings and then, in the next line, the tree is plotted using the fancyRpartPlot() function from Graham Williams rattle package.
This function is just a wrapper for prp() but is easy to use for plotting classification trees and is a very nice example of how aesthetics can facilitate communication. Each node box displays the classification,
the probability of each class at that node (i.e. the probability of the class conditioned on the node) and the percentage of observations used at that node. Notice how the use of the dotted lines tends to emphasize the nodes and not the tree itself, and how
having the bottom level of leaves line up helps the viewer to guess that the percentages in the node boxes indicate the percentage of observations that arrived at each node. (The bottom row adds to 100%, .. well almost). fancyRpartPlot() is
well worth studying as an example of how to choose among the daunting number of parameter options available for prp().
The last few lines of the script construct a tree using the rxDTree()function for building classification and regression tree models with massive data sets. This function is included in the RevoScaleR package
that ships with Revolution R Enterprise. Both prp() andfancyRpartPlot() work
well for these trees too. It is just necessary to run the rxAddInheritance() function on the rxDTree() model object to make sure prp() knows
what to do with it. The parameter settings for rxDTree() should yield a plot very similar to the fancy plot above. Happy plotting!
相关文章推荐
- 命令行快速技巧:如何定位一个文件
- jquery+CSS实现的多级竖向展开树形TRee菜单效果
- cmd tree命令 以树形格式罗列文件
- 无限分级和tree结构数据增删改【附DEMO下载】
- 实例讲解jQuery EasyUI tree中state属性慎用
- 推荐8款jQuery轻量级树形Tree插件
- tree 以树形格式罗列文件
- EasyUI Tree+Asp.net实现权限树或目录树导航的简单实例
- E3 tree 1.6在Firefox下显示问题的修复方法
- js树形控件脚本代码
- swing中Tree与滚动条用法实例分析
- Spark Decision Tree
- 实例讲解jQuery EasyUI tree中state属性慎用
- 带check的treeView
- extjs 4 treepanel locked and expand
- JQuery EasyUI 实现tree的右键菜单
- flex tree 的两个小技巧
- Tree.log
- gnome文件管理器(nautilus)树形结构文件窗口 多窗口
- 排序binary tree