Naive Bayes Theorem and Application - Application
2016-04-10 20:57
Contents:
- Autoclass: Naive Bayes for clustering
- EM (Expectation Maximization)
- Expected sufficient statistics
- Example: E-step
- Example: M-step
- Iteration with EM step
- EM algorithm in Naive Bayes
- Key observation
- E-step for Naive Bayes
- M-step for Naive Bayes
- Autoclass
- Example of Autoclass
- Detail of EM steps in Autoclass
- Convergence
- Reference
The Naive Bayes classifier can be applied to text classification tasks such as filtering email into spam and non-spam, categorizing news articles, and detecting sentiment.
Autoclass: Naive Bayes for clustering

Autoclass is just like Naive Bayes, but it is designed for unsupervised learning. We are given unlabeled training data D1,...,DN, where Di = ⟨xi1,...,xin⟩ and n is the number of attributes of instance i; there are no class labels such as win or fail. The goal is to learn a Naive Bayes model from this data. We use two families of parameters: P(C), the probability of class C, and P(Xi|C), the probability of attribute Xi given class C. To fit them we use maximum likelihood, just as in the theorem discussed in Naive Bayes Theorem and Application - Theorem.
Parameters:
1. θC = P(C=T)
2. 1−θC = P(C=F)
3. θTi = P(Xi=T|C=T)
4. 1−θTi = P(Xi=F|C=T)
5. θFi = P(Xi=T|C=F)
6. 1−θFi = P(Xi=F|C=F)
7. θ = ⟨θC, θT1,...,θTn, θF1,...,θFn⟩
The approach is to find the θ that maximizes the likelihood L(θ) = p(D|θ) = ∏_{i=1}^{N} p(xi|θ). But this is a difficult problem: we lack the sufficient statistics because the class labels are missing.
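Concretely, each instance's likelihood marginalizes over the hidden class, so L(θ) can still be evaluated; what is hard is maximizing it in closed form. A small Python sketch, using the θ values from the worked example later in this post and encoding T as 1 and F as 0:

```python
def marginal(x, theta_C, theta_T, theta_F):
    """p(x | theta) = sum over the hidden class c of p(x, c | theta)."""
    pT, pF = theta_C, 1.0 - theta_C
    for i, xi in enumerate(x):
        pT *= theta_T[i] if xi else 1.0 - theta_T[i]
        pF *= theta_F[i] if xi else 1.0 - theta_F[i]
    return pT + pF

def likelihood(D, theta_C, theta_T, theta_F):
    """L(theta) = product over instances of p(x_i | theta)."""
    L = 1.0
    for x in D:
        L *= marginal(x, theta_C, theta_T, theta_F)
    return L

# Two instances (F,T) and (T,T), with the example's parameter values:
print(likelihood([(0, 1), (1, 1)], 0.7, [0.9, 0.6], [0.3, 0.2]))
```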
EM (Expectation Maximization)

"The expectation–maximization algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables." —— Wikipedia

The problem here is that the data is not fully observed (no labels).
1. If we knew the sufficient statistics of the data, we could choose parameter values that maximize the likelihood, just as discussed in the theorem essay.
2. If we knew the model parameters, we could compute a probability distribution over the missing attributes. From these, we get the expected sufficient statistics.
Expected sufficient statistics
From the observed data and the model parameters we get the probability of every possible completion of the data (a guess at each instance's label). Each completion defines sufficient statistics (from which we can find the θ maximizing the likelihood). The expected sufficient statistics are the expectation, taken over all possible completions, of the sufficient statistics of each completion. The general form of the EM algorithm is:
Repeat:
  θold = θ
  E-step (Expectation): compute the expected sufficient statistics.
  M-step (Maximization): choose θ to maximize the likelihood of the expected sufficient statistics.
Until θ is close to θold
Example: E-step

θC=0.7, θT1=0.9, θF1=0.3, θT2=0.6, θF2=0.2, and the data matrix (instances as rows, attributes as columns):

D = [ F T
      T T ]
A completion assigns a class label to each instance, in other words it appends a label column to the data matrix. The four possible completions are:

1st: [ F T | F
       T T | F ]
2nd: [ F T | F
       T T | T ]
3rd: [ F T | T
       T T | F ]
4th: [ F T | T
       T T | T ]
Example: M-step
The probability of Completion1 given θ is
P(Completion1|θ) ∝ P(X1=F, X2=T, C=F) · P(X1=T, X2=T, C=F)
= P(C=F)P(X1=F|C=F)P(X2=T|C=F) · P(C=F)P(X1=T|C=F)P(X2=T|C=F)
= (0.3 · 0.7 · 0.2) · (0.3 · 0.3 · 0.2) = 0.000756
With the same procedure, we get:
P(Completion2|θ) ∝ (0.3 · 0.7 · 0.2) · (0.7 · 0.9 · 0.6) = 0.015876
P(Completion3|θ) ∝ (0.7 · 0.1 · 0.6) · (0.3 · 0.3 · 0.2) = 0.000756
P(Completion4|θ) ∝ (0.7 · 0.1 · 0.6) · (0.7 · 0.9 · 0.6) = 0.015876
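These four products can be checked mechanically by enumerating the 2^N label assignments. A Python sketch using the θ values and data matrix from the E-step example, with T encoded as 1 and F as 0:

```python
from itertools import product

theta_C = 0.7
theta_T = [0.9, 0.6]   # P(X_i = T | C = T)
theta_F = [0.3, 0.2]   # P(X_i = T | C = F)
D = [(0, 1), (1, 1)]   # rows of the data matrix, T=1 / F=0

def joint(x, c):
    """P(x, c | theta) for one instance and one class label."""
    p = theta_C if c else 1.0 - theta_C
    for i, xi in enumerate(x):
        t = theta_T[i] if c else theta_F[i]
        p *= t if xi else 1.0 - t
    return p

# A completion assigns a class label to every instance: 2^N of them.
scores = {}
for labels in product([0, 1], repeat=len(D)):
    s = 1.0
    for x, c in zip(D, labels):
        s *= joint(x, c)
    scores[labels] = s

for labels, s in sorted(scores.items()):
    print(labels, round(s, 6))
# (F,F) -> 0.000756, (F,T) -> 0.015876, (T,F) -> 0.000756, (T,T) -> 0.015876
```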
Iteration with EM step

One iteration is now finished: we "guessed" the completions (i.e., the labels) and maximized the parameters θ under those completions. We can now start a new iteration with the new θ. The expectation symbols below are explained in Naive Bayes Theorem and Application - Theorem. Normalizing the four completion probabilities above gives the weights 0.0227, 0.4773, 0.0227, and 0.4773, one per completion:
E[NT] = 0.0227·0 + 0.4773·1 + 0.0227·1 + 0.4773·2 = 1.4546
E[NF] = N − E[NT] = 2 − 1.4546 = 0.5454
E[NT1,T] = 0.0227·0 + 0.4773·1 + 0.0227·0 + 0.4773·1 = 0.9546
E[NF1,T] = 0.0227·1 + 0.4773·0 + 0.0227·1 + 0.4773·0 = 0.0454
E[NT2,T] = 1.4546
E[NF2,T] = 0.5454
Now we can make maximum likelihood estimates again (the second M-step):
θC = E[NT]/N = 1.4546/2 = 0.7273
θT1 = E[NT1,T]/E[NT] = 0.9546/1.4546 = 0.6563
θF1 = E[NF1,T]/E[NF] = 0.0454/0.5454 = 0.0832
θT2 = E[NT2,T]/E[NT] = 1.4546/1.4546 = 1
θF2 = E[NF2,T]/E[NF] = 0.5454/0.5454 = 1
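The normalized completion weights and the resulting estimates can be verified in code. A sketch with the same θ and data, T encoded as 1 and F as 0; the exact fractions differ from the rounded figures above only in the fourth decimal place:

```python
from itertools import product

theta_C, theta_T, theta_F = 0.7, [0.9, 0.6], [0.3, 0.2]
D = [(0, 1), (1, 1)]   # T=1, F=0
N, n = len(D), 2

def joint(x, c):
    """P(x, c | theta) for one instance and one class label."""
    p = theta_C if c else 1.0 - theta_C
    for i, xi in enumerate(x):
        t = theta_T[i] if c else theta_F[i]
        p *= t if xi else 1.0 - t
    return p

# Normalized weight of each completion (class labels for all instances)
raw = {}
for labels in product([0, 1], repeat=N):
    s = 1.0
    for x, c in zip(D, labels):
        s *= joint(x, c)
    raw[labels] = s
Z = sum(raw.values())
w = {labels: s / Z for labels, s in raw.items()}

# Expected sufficient statistics, averaged over completions
E_NT = sum(wt * sum(labels) for labels, wt in w.items())
E_NTi = [sum(wt * sum(c for x, c in zip(D, labels) if x[i])
             for labels, wt in w.items()) for i in range(n)]
E_NFi = [sum(wt * sum(1 - c for x, c in zip(D, labels) if x[i])
             for labels, wt in w.items()) for i in range(n)]

# Second M-step: maximum likelihood re-estimates
theta_C_new = E_NT / N                                   # ~0.7273
theta_T_new = [E_NTi[i] / E_NT for i in range(n)]        # ~[0.6563, 1.0]
theta_F_new = [E_NFi[i] / (N - E_NT) for i in range(n)]  # ~[0.0833, 1.0]
```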
EM algorithm in Naive Bayes
In fact, the number of completions is exponential in the number of instances.

Key observation

We don't care about the exact completions, only the expected sufficient statistics, and each instance contributes separately to them. So we can:
1. enumerate the completions of each instance separately;
2. compute the probability of each completion;
3. compute the expected contribution of that instance to the sufficient statistics.
E-step for Naive Bayes:

Compute the expectations under the current parameters θ:
1. E[NT] is the expected number of instances in which the class is T.
2. Each instance has some probability that its class is T.
3. Each instance contributes that probability to E[NT].
4. In symbols: E[NT] = ∑_{j=1}^{N} P(Cj=T | xj1,...,xjn) ∝ ∑_{j=1}^{N} P(Cj=T) ∏_{i=1}^{n} P(xji | Cj=T)
5. E[NTi,T] is the expected number of times the class is T when Xi is T. If an instance has Xi≠T, it contributes 0 to E[NTi,T].
6. If an instance has Xi=T, it contributes the probability that its class is T to E[NTi,T].
7. In symbols:
E[NTi,T] = ∑_{j: xji=T} P(Cj=T | xj1,...,xjn) ∝ ∑_{j: xji=T} P(Cj=T) ∏_{i=1}^{n} P(xji | Cj=T)
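Both expectations can be accumulated one instance at a time, normalizing pT against pF to obtain the per-instance posterior q = P(Cj=T | xj). A Python sketch using the θ values and two-instance data from the running example, with T encoded as 1 and F as 0:

```python
theta_C, theta_T, theta_F = 0.7, [0.9, 0.6], [0.3, 0.2]
D = [(0, 1), (1, 1)]   # T=1, F=0
n = len(D[0])

E_NT = 0.0
E_NTiT = [0.0] * n     # E[N_{i,T} with class T], one slot per attribute
for x in D:
    # Unnormalized joint probabilities for C=T and C=F
    pT, pF = theta_C, 1.0 - theta_C
    for i in range(n):
        pT *= theta_T[i] if x[i] else 1.0 - theta_T[i]
        pF *= theta_F[i] if x[i] else 1.0 - theta_F[i]
    q = pT / (pT + pF)   # P(C_j = T | x_j, theta)
    E_NT += q            # every instance contributes q to E[NT]
    for i in range(n):
        if x[i]:         # only instances with X_i = T contribute
            E_NTiT[i] += q

print(E_NT, E_NTiT)
```

This reproduces the completion-based numbers (E[NT] ≈ 1.4546, E[NT1,T] ≈ 0.9546) without enumerating the exponentially many completions.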
M-step for Naive Bayes:

Maximize the likelihood using the expected sufficient statistics, exactly as with fully observed counts:
1. θC = E[NT] / N
2. θTi = E[NTi,T] / E[NT]
3. θFi = E[NFi,T] / (N − E[NT])
For notational convenience, we encode T as 1 and F as 0; then for instance xj:
P(xji | Cj=T) = (θTi)^{xji} (1−θTi)^{1−xji}
P(xji | Cj=F) = (θFi)^{xji} (1−θFi)^{1−xji}
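With this encoding, each attribute factor becomes a single Bernoulli expression instead of a T/F case split. A minimal Python sketch (the helper name p_attr is ours, for illustration):

```python
def p_attr(theta, x):
    """Bernoulli factor theta**x * (1 - theta)**(1 - x) for x in {0, 1} (T=1, F=0)."""
    return theta ** x * (1 - theta) ** (1 - x)

# Selects theta when x = 1 (T) and 1 - theta when x = 0 (F):
print(p_attr(0.9, 1))   # 0.9
print(p_attr(0.9, 0))   # 0.1 (up to floating-point rounding)
```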
Autoclass
Set θC, θTi, and θFi to arbitrary values for all attributes, then repeat the two EM steps until convergence:
- Expectation step
- Maximization step
In the expectation step, initialize E[NT] = 0, E[NTi,T] = 0, and E[NFi,T] = 0 for every attribute i. Then for each instance Dj:

pT = θC ∏_{i=1}^{n} (θTi)^{xji} (1−θTi)^{1−xji}
pF = (1−θC) ∏_{i=1}^{n} (θFi)^{xji} (1−θFi)^{1−xji}
q = pT / (pT + pF)
E[NT] += q
for each attribute i:
    if xji == T:
        E[NTi,T] += q
        E[NFi,T] += (1 − q)
In the maximization step:

θC = E[NT] / N

For each attribute i:
θTi = E[NTi,T] / E[NT]
θFi = E[NFi,T] / (N − E[NT])
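Putting the expectation and maximization steps together, here is a runnable Python sketch of the Autoclass loop (Boolean attributes, two clusters, T encoded as 1 and F as 0; a fixed iteration count stands in for a proper convergence test):

```python
def autoclass_em(D, theta_C, theta_T, theta_F, iters=20):
    """EM for a two-cluster Naive Bayes model over Boolean data (T=1, F=0)."""
    N, n = len(D), len(D[0])
    for _ in range(iters):
        # E-step: expected sufficient statistics
        E_NT, E_NTi, E_NFi = 0.0, [0.0] * n, [0.0] * n
        for x in D:
            pT, pF = theta_C, 1.0 - theta_C
            for i in range(n):
                pT *= theta_T[i] if x[i] else 1.0 - theta_T[i]
                pF *= theta_F[i] if x[i] else 1.0 - theta_F[i]
            q = pT / (pT + pF)        # P(C = T | x, theta)
            E_NT += q
            for i in range(n):
                if x[i]:              # only X_i = T contributes
                    E_NTi[i] += q
                    E_NFi[i] += 1.0 - q
        # M-step: maximum likelihood re-estimates
        theta_C = E_NT / N
        theta_T = [E_NTi[i] / E_NT for i in range(n)]
        theta_F = [E_NFi[i] / (N - E_NT) for i in range(n)]
    return theta_C, theta_T, theta_F

# One iteration on the two-instance example used throughout this post:
tc, tt, tf = autoclass_em([(0, 1), (1, 1)], 0.7, [0.9, 0.6], [0.3, 0.2], iters=1)
print(tc, tt, tf)
```

Running one iteration returns θC ≈ 0.727, θT ≈ [0.656, 1.0], θF ≈ [0.083, 1.0], matching the hand computation of the second M-step up to rounding.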
Example of Autoclass
E-step: we are given the initial "guess" of the parameters θ:
θC=0.7, θT1=0.9, θF1=0.3, θT2=0.6, θF2=0.2
and the dataset D:

instance           1    2
attribute 1 (X1)   F    T
attribute 2 (X2)   T    T
The initial expectations are all zero:
E[NT] = 0, E[NT1,T] = 0, E[NF1,T] = 0, E[NT2,T] = 0, E[NF2,T] = 0
Detail of EM steps in Autoclass
The first E-step, for instance 1 (x11=F, x12=T):
pT = θC ∏_{i=1}^{2} (θTi)^{x1i} (1−θTi)^{1−x1i} = 0.7 · (0.9)^0(0.1)^1 · (0.6)^1(0.4)^0 = 0.042
pF = (1−θC) ∏_{i=1}^{2} (θFi)^{x1i} (1−θFi)^{1−x1i} = 0.3 · (0.3)^0(0.7)^1 · (0.2)^1(0.8)^0 = 0.042
q = 0.042 / (0.042 + 0.042) = 0.5
After processing instance 1, the expectations are:
E[NT] = 0.5, E[NT1,T] = 0, E[NF1,T] = 0, E[NT2,T] = 0.5, E[NF2,T] = 0.5
For instance 2 (x21=T, x22=T):
pT = 0.7 · (0.9)^1(0.1)^0 · (0.6)^1(0.4)^0 = 0.378
pF = 0.3 · (0.3)^1(0.7)^0 · (0.2)^1(0.8)^0 = 0.018
q = 0.378 / (0.378 + 0.018) ≈ 0.95
After processing instance 2, the expectations are:
E[NT] = 1.45, E[NT1,T] = 0.95, E[NF1,T] = 0.05, E[NT2,T] = 1.45, E[NF2,T] = 0.55
In the first M-step, we maximize the parameters according to these expectations:
θC = E[NT] / N = 1.45 / 2 = 0.725
For attribute 1:
θT1 = E[NT1,T] / E[NT] = 0.95 / 1.45 ≈ 0.655
θF1 = E[NF1,T] / (N − E[NT]) = 0.05 / 0.55 ≈ 0.09
For attribute 2:
θT2 = E[NT2,T] / E[NT] = 1.45 / 1.45 = 1.0
θF2 = E[NF2,T] / (N − E[NT]) = 0.55 / 0.55 = 1.0
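The trace above can be replayed in a few lines of Python (T encoded as 1, F as 0); the printed per-instance values and the final estimates match the figures above up to rounding:

```python
theta_C, theta_T, theta_F = 0.7, [0.9, 0.6], [0.3, 0.2]
D = [(0, 1), (1, 1)]          # instance 1: (F, T), instance 2: (T, T)
N = len(D)

E_NT, E_NTi, E_NFi = 0.0, [0.0, 0.0], [0.0, 0.0]
for x in D:
    pT, pF = theta_C, 1.0 - theta_C
    for i in range(2):
        pT *= theta_T[i] if x[i] else 1.0 - theta_T[i]
        pF *= theta_F[i] if x[i] else 1.0 - theta_F[i]
    q = pT / (pT + pF)
    print(f"pT={pT:.3f} pF={pF:.3f} q={q:.3f}")   # instance-level trace
    E_NT += q
    for i in range(2):
        if x[i]:
            E_NTi[i] += q
            E_NFi[i] += 1.0 - q

print(E_NT / N)                          # theta_C,  ~0.725
print([e / E_NT for e in E_NTi])         # theta_Ti, ~[0.655, 1.0]
print([e / (N - E_NT) for e in E_NFi])   # theta_Fi, ~[0.09, 1.0]
```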
Convergence
EM improves the likelihood on every iteration and is guaranteed to converge to a maximum of the likelihood function, but that maximum may only be a local one. One practical tip: do not start EM with symmetric parameter values (in particular, not with uniform values), because a symmetric start gives every instance the same posterior and the clusters never separate.

Reference
Most of the content in this essay comes from CMU machine learning course notes; unfortunately I have forgotten the source link. Sorry!