
6.2 Naive Bayes Examples

2016-02-10 15:51


Bayes Examples


junjun


February 10, 2016

R Markdown script and dataset: http://pan.baidu.com/s/1hr0gTrI


Example 1: Classifying iris flowers with naive Bayes

# 1. Load the data
data("iris")

# 2. Create the training and test sets
library(caret)

## Loading required package: lattice

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 3.2.3

set.seed(2005)
# Stratified 70/30 split by species; list = FALSE returns a matrix of row indices
index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_iris <- iris[index, ]
test_iris <- iris[-index, ]

# 3. Build the model
library(e1071)
model_iris <- naiveBayes(Species~., data=train_iris)
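
For numeric predictors such as the iris measurements, naiveBayes() assumes each feature is Gaussian within each class. The following is a minimal base-R sketch of that idea, not the e1071 implementation: the helper names (log_posterior, predict_nb) are made up for illustration, and the model is fitted on all 150 rows rather than on train_iris so that the snippet runs on its own.

```r
# Hand-rolled Gaussian naive Bayes on iris (illustrative sketch only)
data("iris")
features <- iris[, 1:4]
classes  <- iris$Species
lv       <- levels(classes)

# Class priors: relative frequency of each species
priors <- setNames(as.numeric(table(classes)) / length(classes), lv)

# Per-class mean and standard deviation of every feature
# (rows = features, columns = classes)
means <- sapply(lv, function(k) colMeans(features[classes == k, ]))
sds   <- sapply(lv, function(k) apply(features[classes == k, ], 2, sd))

# Log posterior (up to an additive constant) of one observation for every class:
# log prior plus the sum of per-feature Gaussian log densities
log_posterior <- function(x) {
  sapply(lv, function(k) {
    log(priors[[k]]) +
      sum(dnorm(as.numeric(x), mean = means[, k], sd = sds[, k], log = TRUE))
  })
}

# Predict by picking the class with the largest log posterior
predict_nb <- function(newdata) {
  scores <- t(apply(newdata, 1, log_posterior))
  factor(lv[max.col(scores, ties.method = "first")], levels = lv)
}

hand_pred <- predict_nb(features)
mean(hand_pred == classes)   # training accuracy of the hand-rolled version
```

Working in log space avoids numerical underflow when many small densities are multiplied together.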

# 4. Evaluate the model
summary(model_iris)

##         Length Class  Mode
## apriori 3      table  numeric
## tables  4      -none- list
## levels  3      -none- character
## call    4      -none- call
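
summary() only lists the components of the fitted object. To see what the model actually learned, the components can be inspected directly. A small sketch follows; it refits on the full iris data (an assumption made so the snippet is self-contained), so the numbers will differ slightly from those of model_iris, which is fitted on train_iris.

```r
library(e1071)
data("iris")

# Refit on the full iris data so this snippet runs on its own
fit <- naiveBayes(Species ~ ., data = iris)

fit$apriori               # class counts from which the priors are derived
fit$tables$Petal.Length   # one row per class: column 1 = mean, column 2 = sd
```

Each entry of fit$tables corresponds to one predictor; for a categorical predictor the table holds conditional probabilities instead of means and standard deviations.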

# Accuracy on the training set (an optimistic estimate; the test set is used in step 5)
pred <- predict(model_iris, train_iris, type = "class")
mean(pred == train_iris[, 5])

## [1] 0.952381

# 5. Predict on the test set
pred_iris <- predict(model_iris, test_iris, type="class")
mean(pred_iris==test_iris[, 5])

## [1] 1

table(pred_iris, test_iris[, 5])

##
## pred_iris    setosa versicolor virginica
##   setosa         15          0         0
##   versicolor      0         15         0
##   virginica       0          0        15
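
Since caret is already loaded, its confusionMatrix() function can report accuracy together with per-class sensitivity and specificity in one call. The sketch below repeats the split and the model so that it is self-contained; on a newer R version the random split (and therefore the numbers) may differ from the output shown above, so no expected values are given.

```r
library(caret)
library(e1071)

data("iris")
set.seed(2005)
index      <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_iris <- iris[index, ]
test_iris  <- iris[-index, ]

model_iris <- naiveBayes(Species ~ ., data = train_iris)
pred_iris  <- predict(model_iris, test_iris, type = "class")

# data = predicted labels, reference = true labels
cm <- confusionMatrix(data = pred_iris, reference = test_iris$Species)
cm$overall["Accuracy"]
cm$byClass[, c("Sensitivity", "Specificity")]
```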



Example 2: Classifying and predicting with the play-tennis data

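# 1. Load the data
# stringsAsFactors = TRUE keeps the columns as factors on R >= 4.0, matching the str() output below
data <- read.csv("F:/R/Rworkspace/NB/playingtennis.csv", stringsAsFactors = TRUE)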
str(data)

## 'data.frame':    14 obs. of  6 variables:
##  $ Day        : Factor w/ 14 levels "D1","D10","D11",..: 1 7 8 9 10 11 12 13 14 2 ...
##  $ Outlook    : Factor w/ 3 levels "Overcast","Rain",..: 3 3 1 2 2 2 1 3 3 2 ...
##  $ Temperature: Factor w/ 3 levels "Cool","Hot","Mild": 2 2 2 3 1 1 1 3 1 3 ...
##  $ Humidity   : Factor w/ 2 levels "High","Normal": 1 1 1 1 2 2 2 1 2 2 ...
##  $ Wind       : Factor w/ 2 levels "Strong","Weak": 2 1 2 2 2 1 1 2 2 2 ...
##  $ PlayTennis : Factor w/ 2 levels "No","Yes": 1 1 2 2 2 1 2 1 2 2 ...

summary(data)

##       Day        Outlook  Temperature   Humidity     Wind   PlayTennis
##  D1     :1   Overcast:4   Cool:4      High  :7   Strong:6   No :5
##  D10    :1   Rain    :5   Hot :4      Normal:7   Weak  :8   Yes:9
##  D11    :1   Sunny   :5   Mild:6
##  D12    :1
##  D13    :1
##  D14    :1
##  (Other):8

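# As shown above, Day is a unique identifier for each row, so it carries no information for classification or prediction and can be dropped.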

# 2. Clean the data
dataset <- data[, 2:6]

# 3. Build the model
library(e1071)
model <- naiveBayes(dataset[, 1:4], dataset[, 5])

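# 4. Predict
# Caution: here and in the next call the columns of new_data are unnamed, so
# predict(), which matches predictors by column name, cannot use these values;
# the result then reflects only the class priors (Yes is the majority class).
new_data <- data.frame("Rain","Hot","High","Strong")
predict(model, new_data)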

## [1] Yes
## Levels: No Yes

new_data <- data.frame("Sunny","Mild","Normal","Weak")
predict(model, new_data)

## [1] Yes
## Levels: No Yes
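
Both calls above return Yes, which is what the class priors alone would give. For the predictors to take effect, the new observation needs the same column names as the training data, and its factors should carry the same levels so that the level codes line up. The sketch below shows one way to do that. It is a sketch under a loud assumption: the CSV is not available here, so the data frame is the classic 14-row play-tennis table, which agrees with the str() and summary() output above but has not been checked against the actual file. The names tennis and new_day are introduced only for this illustration.

```r
library(e1071)

# ASSUMPTION: classic 14-row play-tennis table, reconstructed by hand
tennis <- data.frame(
  Outlook     = c("Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                  "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"),
  Temperature = c("Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                  "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"),
  Humidity    = c("High", "High", "High", "High", "Normal", "Normal", "Normal",
                  "High", "Normal", "Normal", "Normal", "High", "Normal", "High"),
  Wind        = c("Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                  "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"),
  PlayTennis  = c("No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                  "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"),
  stringsAsFactors = TRUE
)

model <- naiveBayes(PlayTennis ~ ., data = tennis)

# New observation: same column names and same factor levels as the training data
new_day <- data.frame(
  Outlook     = factor("Rain",   levels = levels(tennis$Outlook)),
  Temperature = factor("Hot",    levels = levels(tennis$Temperature)),
  Humidity    = factor("High",   levels = levels(tennis$Humidity)),
  Wind        = factor("Strong", levels = levels(tennis$Wind))
)

predict(model, new_day)                 # predicted class
predict(model, new_day, type = "raw")   # posterior probability of each class
```

With type = "raw" the posterior probabilities are returned instead of the class label, which makes it easy to see how strongly the predictors shift the result away from the priors.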