
Naive Bayes Classifiers


1. Bayes' Theorem

Suppose there are two events A and B. Let $p(A)$ be the probability that A occurs, $p(B)$ the probability that B occurs, $p(B|A)$ the probability that B occurs given that A has occurred, $p(A|B)$ the probability that A occurs given that B has occurred, and $p(AB)$ the probability that A and B occur together. Then

$$p(AB) = p(A)\,p(B|A) = p(B)\,p(A|B) \tag{1}$$

Rearranging Eq. (1) yields Bayes' theorem:

$$p(B|A) = \frac{p(B)\,p(A|B)}{p(A)} \tag{2}$$

Given a partition $\{B_1, B_2, \dots, B_n\}$ of the sample space, where $B_i$ and $B_j$ are pairwise disjoint, i.e. $B_i \cap B_j = \emptyset$ for $i \neq j$, the law of total probability states that for any event A,

$$p(A) = \sum_{i=1}^{n} p(B_i)\,p(A|B_i) \tag{3}$$

The general form of Bayes' theorem then follows:

$$p(B_i|A) = \frac{p(B_i)\,p(A|B_i)}{\sum_{k=1}^{n} p(B_k)\,p(A|B_k)} \tag{4}$$
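As a quick numeric illustration of Eqs. (3) and (4), here is a small Python check; all values are made up for illustration:

p_B = [0.4, 0.6]          # p(B1), p(B2) (illustrative values)
p_A_given_B = [0.5, 0.2]  # p(A|B1), p(A|B2)

# Law of total probability, Eq. (3)
p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))   # 0.32

# Posterior of B1 given A, Eq. (4)
p_B1_given_A = p_B[0] * p_A_given_B[0] / p_A             # 0.625
print(p_A, p_B1_given_A)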

2. Basic Principle of Naive Bayes

Given a training set $\{(X_1,y_1),(X_2,y_2),(X_3,y_3),\dots,(X_m,y_m)\}$, where $m$ is the number of samples and each sample has $n$ features, i.e. $X_i = (x_{i1}, x_{i2}, \dots, x_{in})$. The set of class labels is $\{y_1, y_2, \dots, y_k\}$. Let $p(y=y_i \mid X=x)$ denote the probability that the output is $y_i$ when the input sample $X$ is $x$.

Given a new sample $x$, to decide which class it belongs to we can compute $p(y=y_1 \mid x), p(y=y_2 \mid x), \dots, p(y=y_k \mid x)$ and assign $x$ to the class with the largest value. That is, we seek the class with the maximum posterior probability, $\arg\max_y p(y \mid x)$.

How can these posterior probabilities be computed? By Bayes' theorem,

$$p(y=y_i \mid x) = \frac{p(y_i)\,p(x \mid y_i)}{p(x)} \tag{5}$$

The naive Bayes method assumes that the features are independent of one another, so Eq. (5) can be written as:

$$p(y=y_i \mid x) = \frac{p(y_i)\,p(x \mid y_i)}{p(x)} = \frac{p(y_i)\prod_{j=1}^{n} p(x_j \mid y_i)}{\prod_{j=1}^{n} p(x_j)} \tag{6}$$

Since the denominator of Eq. (6) is the same for every $p(y=y_i \mid x)$, it can be omitted in practice. The naive Bayes decision rule therefore takes the following form:

$$y = \arg\max_{y_i} p(y_i)\,p(x \mid y_i) = \arg\max_{y_i} p(y_i)\prod_{j=1}^{n} p(x_j \mid y_i) \tag{7}$$
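One practical note: the product in Eq. (7) multiplies many probabilities in $[0,1]$ and can underflow to zero for high-dimensional inputs, so implementations often compare sums of log-probabilities instead (the Gaussian code later in this article does exactly this). A minimal sketch with made-up numbers:

import numpy as np

priors = np.array([0.4, 0.6])          # p(y_1), p(y_2) (illustrative values)
conds = np.array([[0.2, 0.5, 0.1],     # p(x_j | y_1) for j = 1..3
                  [0.3, 0.1, 0.4]])    # p(x_j | y_2)

# Eq. (7) in log space: argmax of log p(y_i) + sum_j log p(x_j | y_i)
log_post = np.log(priors) + np.log(conds).sum(axis=1)
print(np.argmax(log_post))             # index of the predicted class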

The next section shows how to estimate $p(y)$ and $p(x \mid y)$ from the training samples.

3. Parameter Estimation for Naive Bayes

3.1 Maximum Likelihood Estimation

In naive Bayes, learning means estimating the prior probability $p(y)$ and the conditional probabilities $p(x \mid y)$, which are then used to compute the posterior probability $p(y \mid x)$ of a new sample. There are several ways to estimate the prior and conditional probabilities, such as maximum likelihood estimation and the multinomial, Gaussian, and Bernoulli models.

Under maximum likelihood estimation, the estimate of the prior probability $p(y)$ is:

$$p(y = y_i) = \frac{\#\{\text{samples with label } y_i\}}{\#\{\text{samples}\}} \tag{8}$$

Suppose the set of possible values of the $j$-th feature is $\{a_{j1}, a_{j2}, \dots, a_{js_j}\}$. Then the maximum likelihood estimate of the conditional probability $p(x^{(j)} \mid y = y_i)$ is:

$$p(x^{(j)} = a_{jl} \mid y = y_i) = \frac{\#\{\text{samples with label } y_i \text{ whose } j\text{-th feature equals } a_{jl}\}}{\#\{\text{samples with label } y_i\}} \tag{9}$$
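Before the full worked example, here is a minimal sketch of the counting behind Eqs. (8) and (9), on a tiny made-up array:

import numpy as np

# Toy data: one feature, binary labels (illustrative values only)
y = np.array([-1, -1, 1, 1, 1])
x = np.array([ 0,  1, 0, 1, 1])

prior_pos = np.mean(y == 1)         # Eq. (8): 3/5
cond = np.mean(x[y == 1] == 1)      # Eq. (9): p(x = 1 | y = 1) = 2/3
print(prior_pos, cond)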

Example 1

This example is taken from Li Hang's Statistical Learning Methods.

In the following table, $X^{(1)}$ and $X^{(2)}$ are features whose value sets are $A_1 = \{1,2,3\}$ and $A_2 = \{S,M,L\}$, and $Y \in \{1, -1\}$ is the class label. Find the class label of $x = (2, S)$. The data are shown below, with the values $\{S,M,L\}$ of $X^{(2)}$ encoded as $\{0,1,2\}$.

import numpy as np
import pandas as pd

# Training data from Li Hang's example; X2 values {S, M, L} are encoded as {0, 1, 2}
x1 = np.array([1,1,1,1,1,2,2,2,2,2,3,3,3,3,3])
x2 = np.array([0,1,1,0,0,0,1,1,2,2,2,1,1,2,2])
y = np.array([-1,-1,1,1,-1,-1,-1,1,1,1,1,1,1,1,-1])

# Stack the columns into a 15 x 3 array: [X1, X2, y]
dataSet = np.concatenate((x1[:,None], x2[:,None], y[:,None]), axis=1)

df = pd.DataFrame(dataSet, index=np.arange(1, 16), columns=['X1', 'X2', 'y'])

df.T


      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
X1    1   1   1   1   1   2   2   2   2   2   3   3   3   3   3
X2    0   1   1   0   0   0   1   1   2   2   2   1   1   2   2
y    -1  -1   1   1  -1  -1  -1   1   1   1   1   1   1   1  -1
Solution

Step 1: compute the prior probabilities

$$p(y=-1) = \frac{6}{15}, \qquad p(y=1) = \frac{9}{15}$$

Step 2: compute the conditional probabilities

(2.1) Feature $X_1$

$$p(X_1=1 \mid y=-1) = \frac{3}{6} = \frac{1}{2}, \quad p(X_1=2 \mid y=-1) = \frac{2}{6} = \frac{1}{3}, \quad p(X_1=3 \mid y=-1) = \frac{1}{6}$$

$$p(X_1=1 \mid y=1) = \frac{2}{9}, \quad p(X_1=2 \mid y=1) = \frac{3}{9} = \frac{1}{3}, \quad p(X_1=3 \mid y=1) = \frac{4}{9}$$

(2.2) Feature $X_2$

$$p(X_2=0 \mid y=-1) = \frac{3}{6} = \frac{1}{2}, \quad p(X_2=1 \mid y=-1) = \frac{2}{6} = \frac{1}{3}, \quad p(X_2=2 \mid y=-1) = \frac{1}{6}$$

$$p(X_2=0 \mid y=1) = \frac{1}{9}, \quad p(X_2=1 \mid y=1) = \frac{4}{9}, \quad p(X_2=2 \mid y=1) = \frac{4}{9}$$

Step 3: compute the unnormalized posteriors

$$p(y=-1)\,p(X=(2,S) \mid y=-1) = p(y=-1)\,p(X_1=2 \mid y=-1)\,p(X_2=S \mid y=-1) = \frac{6}{15} \cdot \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{15}$$

$$p(y=1)\,p(X=(2,S) \mid y=1) = p(y=1)\,p(X_1=2 \mid y=1)\,p(X_2=S \mid y=1) = \frac{9}{15} \cdot \frac{1}{3} \cdot \frac{1}{9} = \frac{1}{45}$$

Since $\frac{1}{15} > \frac{1}{45}$, the sample is assigned the class label $-1$.
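A quick arithmetic check of Step 3 in Python:

p_neg = (6/15) * (1/3) * (1/2)   # 1/15
p_pos = (9/15) * (1/3) * (1/9)   # 1/45
print(p_neg > p_pos)             # True, so predict y = -1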

Below is a Python implementation of maximum-likelihood naive Bayes; its output matches the hand computation above.

class MLENB:
    """
    Maximum likelihood estimation Naive Bayes

    Attributes
    ----------
    class_prior_ : array, shape (n_classes,)
        Empirical probability of each class.
    class_count_ : array, shape (n_classes,)
        Number of training samples observed in each class.
    MLE_ : array, shape (n_classes, n_features)
        Maximum likelihood estimate of each feature per class; each element
        is a dict mapping a feature value to its conditional probability.
    """

    def __init__(self):
        pass

    def fit(self, X, y):
        """Fit maximum likelihood estimation Naive Bayes according to X, y.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Training vectors, where n_samples is the number of samples
            and n_features is the number of features.
        y : array-like, shape (n_samples,)
            Target values.

        Returns
        -------
        self : object
            Returns self.
        """
        n_features = X.shape[1]
        n_classes = len(set(y))

        self.class_count_ = np.empty(n_classes)
        self.class_prior_ = np.empty(n_classes)
        self.MLE_ = np.empty((n_classes, n_features), dtype=dict)

        self.target_unique = np.unique(y)
        for i in range(n_classes):
            # All samples belonging to the i-th class
            dataX_tu = X[y == self.target_unique[i]]
            self.class_prior_[i] = dataX_tu.shape[0] / float(len(y))   # Eq. (8)
            self.class_count_[i] = dataX_tu.shape[0]

            for j in range(n_features):
                feature = dataX_tu[:, j]
                feature_unique = np.unique(feature)
                fp = {}
                for f_item in feature_unique:
                    # Eq. (9): frequency of this value within the class
                    fp[f_item] = list(feature).count(f_item) / float(len(feature))
                self.MLE_[i, j] = fp

        return self

    def __predict_likelihood(self, x):
        if x.ndim == 1:
            x = np.array([x])
        n_features = x.shape[1]
        n_classes = len(self.class_count_)

        likelihood = []
        for x_item in x:
            class_p = []
            for i in range(n_classes):
                p = self.class_prior_[i]
                for j in range(n_features):
                    # Feature values unseen in this class get probability 0 under MLE
                    p *= self.MLE_[i, j].get(x_item[j], 0)
                class_p.append(p)
            likelihood.append(class_p)
        return np.array(likelihood)

    def predict(self, x):
        """Perform classification on an array of test vectors x.

        Parameters
        ----------
        x : array-like, shape (n_samples, n_features)

        Returns
        -------
        C : array, shape (n_samples,)
            Predicted target values for x.
        """
        likelihood = self.__predict_likelihood(x)
        max_index = np.argmax(likelihood, axis=1)
        return np.array([self.target_unique[i] for i in max_index])

    def predict_proba(self, x):
        """Return probability estimates for the test vectors x.

        Parameters
        ----------
        x : array-like, shape (n_samples, n_features)

        Returns
        -------
        C : array-like, shape (n_samples, n_classes)
            The probability of the samples for each class in the model,
            in the order given by `target_unique`.
        """
        likelihood = self.__predict_likelihood(x)
        return np.array([lh / np.sum(lh) for lh in likelihood])


# Test: reproduce the hand computation for x = (2, S), encoded as (2, 0)
X = dataSet[:, 0:-1]
y = dataSet[:, -1]

mlenb = MLENB()
mlenb.fit(X, y)
print(mlenb.predict(np.array([2, 0])))
print(mlenb.predict_proba(np.array([2, 0])))


[-1]
[[ 0.75  0.25]]
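Note that predict_proba simply normalizes the unnormalized posteriors from Step 3: $\frac{1/15}{1/15 + 1/45} = 0.75$ and $\frac{1/45}{1/15 + 1/45} = 0.25$.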


3.2 Multinomial Naive Bayes

Maximum likelihood estimation can yield probability estimates that are exactly zero, which distorts the posterior computation and biases the classification. In that case the multinomial model can be used, which smooths the estimates of the prior and conditional probabilities. The formulas are as follows.

The prior probability $p(y)$ is estimated as:

$$p(y = y_i) = \frac{\#\{\text{samples with label } y_i\} + \alpha}{\#\{\text{samples}\} + (\text{number of classes}) \cdot \alpha} \tag{10}$$

Again let the set of possible values of the $j$-th feature be $\{a_{j1}, a_{j2}, \dots, a_{js_j}\}$. The conditional probability $p(x^{(j)} = a_{jl} \mid y = y_i)$ is then estimated as:

$$p(x^{(j)} = a_{jl} \mid y = y_i) = \frac{\#\{\text{samples with label } y_i \text{ whose } j\text{-th feature equals } a_{jl}\} + \alpha}{\#\{\text{samples with label } y_i\} + s_j \alpha} \tag{11}$$

Here $\alpha$ is the smoothing parameter and $s_j$ is the number of distinct values of the $j$-th feature. With $\alpha = 1$ this is Laplace smoothing; with $\alpha = 0$ it reduces to maximum likelihood estimation; with $0 < \alpha < 1$ it is called Lidstone smoothing.
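As a concrete check, Eqs. (10) and (11) can be evaluated by hand for Example 1 (15 samples, 6 negative and 9 positive, 2 classes, 3 values per feature) with $\alpha = 1$; the result matches the MultinomialNB output shown later in this section:

alpha = 1.0
# Eq. (10): smoothed priors
p_neg = (6 + alpha) / (15 + 2 * alpha)
p_pos = (9 + alpha) / (15 + 2 * alpha)

# Eq. (11): smoothed conditionals for X1 = 2 and X2 = 0
lh_neg = p_neg * ((2 + alpha) / (6 + 3 * alpha)) * ((3 + alpha) / (6 + 3 * alpha))
lh_pos = p_pos * ((3 + alpha) / (9 + 3 * alpha)) * ((1 + alpha) / (9 + 3 * alpha))

print(lh_neg / (lh_neg + lh_pos))   # ≈ 0.6512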

An open question: what is the difference between multinomial naive Bayes and the Bayesian estimation described in Li Hang's Statistical Learning Methods? The method in this article follows Li Hang's Bayesian estimation.

Reference Python code for multinomial naive Bayes:

class MultinomialNB:
    """Naive Bayes classifier for multinomial models.

    Attributes
    ----------
    class_prior_ : array, shape (n_classes,)
        Smoothed empirical probability of each class.
    class_count_ : array, shape (n_classes,)
        Number of training samples observed in each class.
    bayes_estimation_ : array, shape (n_classes, n_features)
        Smoothed estimate of each feature per class; each element is a dict
        mapping a feature value to its conditional probability.
    """

    def __init__(self, alpha=1.0):
        self.alpha_ = alpha   # smoothing parameter of Eqs. (10) and (11)

    def fit(self, X, y):
        n_features = X.shape[1]
        n_classes = len(set(y))

        self.class_count_ = np.empty(n_classes)
        self.class_prior_ = np.empty(n_classes)
        self.bayes_estimation_ = np.empty((n_classes, n_features), dtype=dict)

        self.target_unique = np.unique(y)
        for i in range(n_classes):
            dataX_tu = X[y == self.target_unique[i]]
            # Eq. (10): smoothed prior
            self.class_prior_[i] = (dataX_tu.shape[0] + self.alpha_) / (float(len(y)) + n_classes * self.alpha_)
            self.class_count_[i] = dataX_tu.shape[0]

            for j in range(n_features):
                feature = dataX_tu[:, j]
                # s_j in Eq. (11): distinct values of feature j over the whole
                # training set, so values unseen in this class still get mass
                feature_values = np.unique(X[:, j])
                fp = {}
                for f_item in feature_values:
                    fp[f_item] = (list(feature).count(f_item) + self.alpha_) / (float(len(feature)) + len(feature_values) * self.alpha_)
                self.bayes_estimation_[i, j] = fp

        return self

    def __predict_likelihood(self, x):
        if x.ndim == 1:
            x = np.array([x])
        n_features = x.shape[1]
        n_classes = len(self.class_count_)

        likelihood = []
        for x_item in x:
            class_p = []
            for i in range(n_classes):
                p = self.class_prior_[i]
                for j in range(n_features):
                    # Values never seen anywhere in training get probability 0
                    p *= self.bayes_estimation_[i, j].get(x_item[j], 0)
                class_p.append(p)
            likelihood.append(class_p)
        return np.array(likelihood)

    def predict(self, x):
        likelihood = self.__predict_likelihood(x)
        max_index = np.argmax(likelihood, axis=1)
        return np.array([self.target_unique[i] for i in max_index])

    def predict_proba(self, x):
        likelihood = self.__predict_likelihood(x)
        return np.array([lh / np.sum(lh) for lh in likelihood])


# Test the multinomial model on the same sample
X = dataSet[:, 0:-1]
y = dataSet[:, -1]

mnb = MultinomialNB()
mnb.fit(X, y)
print(mnb.predict(np.array([2, 0])))
print(mnb.predict_proba(np.array([2, 0])))


[-1]
[[ 0.65116279  0.34883721]]


3.3 Gaussian Naive Bayes

When the input features are continuous, the counting methods above cannot be used to estimate the prior and conditional probabilities. Instead, a Gaussian model can be used, which assumes that each feature follows a Gaussian distribution within each class.

The likelihood of a feature value is then:

$$p(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right) \tag{12}$$

where $\sigma_y^2$ and $\mu_y$ are the variance and mean of the $i$-th feature over the samples of class $y$.
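As a sanity check on Eq. (12), the hand-rolled density can be compared against scipy.stats.norm.pdf (this assumes SciPy is available; it is not used elsewhere in this article):

import numpy as np
from scipy.stats import norm

x, mu, var = 2.0, 1.5, 0.25          # illustrative values
manual = np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
print(np.isclose(manual, norm.pdf(x, loc=mu, scale=np.sqrt(var))))   # True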

A full Python implementation follows:

class GaussianNB:
    """
    Attributes
    ----------
    class_prior_ : array, shape (n_classes,)
        Probability of each class.
    class_count_ : array, shape (n_classes,)
        Number of training samples observed in each class.
    theta_ : array, shape (n_classes, n_features)
        Mean of each feature per class.
    sigma_ : array, shape (n_classes, n_features)
        Variance of each feature per class.
    """

    def __init__(self):
        pass

    def fit(self, X, y):
        n_features = X.shape[1]
        n_classes = len(set(y))

        self.theta_ = np.zeros([n_classes, n_features])
        self.sigma_ = np.zeros([n_classes, n_features])
        self.class_prior_ = np.zeros(n_classes)
        self.class_count_ = np.zeros(n_classes)

        self.target_unique = np.unique(y)
        for i in range(n_classes):
            dataX_tu = X[y == self.target_unique[i]]
            self.class_prior_[i] = dataX_tu.shape[0] / float(len(y))
            self.class_count_[i] = dataX_tu.shape[0]
            # Per-class mean and variance of each feature; note that a feature
            # that is constant within a class gives zero variance and would
            # divide by zero in Eq. (12)
            self.theta_[i, :] = np.mean(dataX_tu, axis=0)
            self.sigma_[i, :] = np.var(dataX_tu, axis=0)

        return self

    def __predict_likelihood(self, x):
        if x.ndim == 1:
            x = np.array([x])

        likelihood = []
        for x_item in x:
            # Eq. (12) evaluated for every class and feature at once
            gaussian = np.exp(-(x_item - self.theta_) ** 2 / (2 * self.sigma_)) / np.sqrt(2 * np.pi * self.sigma_)
            # Multiply the per-feature densities via a sum of logs for stability
            p = np.exp(np.sum(np.log(gaussian), axis=1))
            likelihood.append(self.class_prior_ * p)
        return np.array(likelihood)

    def predict(self, x):
        likelihood = self.__predict_likelihood(x)
        max_index = np.argmax(likelihood, axis=1)
        return np.array([self.target_unique[i] for i in max_index])

    def predict_proba(self, x):
        likelihood = self.__predict_likelihood(x)
        return np.array([lh / np.sum(lh) for lh in likelihood])


# Test the Gaussian model on the same sample
X = dataSet[:, 0:-1]
y = dataSet[:, -1]

gnb = GaussianNB()
gnb.fit(X, y)
print(gnb.predict(np.array([2, 0])))
print(gnb.predict_proba(np.array([2, 0])))


[-1]
[[ 0.74566865  0.25433135]]


3.4 Bernoulli Naive Bayes
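Bernoulli naive Bayes applies when every feature is binary (0/1). Each per-class, per-feature likelihood is a Bernoulli distribution, $p(x_j \mid y) = p_{jy} x_j + (1 - p_{jy})(1 - x_j)$, where $p_{jy}$ is the (typically smoothed) fraction of class-$y$ samples whose $j$-th feature equals 1. Below is a minimal illustrative sketch in the same style as the classes above, assuming Laplace smoothing; it is not tested against the example data, whose features are not binary.

import numpy as np

class BernoulliNB:
    """Minimal Bernoulli naive Bayes sketch for binary (0/1) features."""

    def __init__(self, alpha=1.0):
        self.alpha_ = alpha   # smoothing parameter

    def fit(self, X, y):
        self.target_unique = np.unique(y)
        n_classes = len(self.target_unique)
        self.class_prior_ = np.empty(n_classes)
        self.feature_prob_ = np.empty((n_classes, X.shape[1]))
        for i in range(n_classes):
            Xc = X[y == self.target_unique[i]]
            self.class_prior_[i] = Xc.shape[0] / float(len(y))
            # Smoothed p(x_j = 1 | y): each binary feature has 2 possible values
            self.feature_prob_[i] = (Xc.sum(axis=0) + self.alpha_) / (Xc.shape[0] + 2 * self.alpha_)
        return self

    def predict(self, x):
        if x.ndim == 1:
            x = np.array([x])
        # Bernoulli likelihood p(x_j | y) = p * x_j + (1 - p) * (1 - x_j)
        likelihood = np.array([
            self.class_prior_ * np.prod(self.feature_prob_ * xi + (1 - self.feature_prob_) * (1 - xi), axis=1)
            for xi in x
        ])
        return self.target_unique[np.argmax(likelihood, axis=1)]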

4. Naive Bayes Caveats

Naive Bayes works only with categorical predictors; numerical predictors must be discretized or binned before use (or handled with a Gaussian model, as in Section 3.3).

It assumes the predictors are independent, and thus cannot detect or account for relationships between the predictors, unlike a decision tree, for example.