
Naive Bayes Theorem And Application - Application


Contents:

- Autoclass: Naive Bayes for clustering
- EM (Expectation Maximization)
  - Expected sufficient statistics
  - Example: E-step
  - Example: M-step
  - Iteration with EM steps
- EM algorithm in Naive Bayes
  - Key observation
  - E-step for Naive Bayes
  - M-step for Naive Bayes
- Autoclass
  - Example of Autoclass
  - Detail of EM steps in Autoclass
- Convergence
- Reference

The Naive Bayes classifier can be applied to text classification tasks such as classifying email as spam or not spam, categorizing news articles, and detecting sentiment.

Autoclass: Naive Bayes for clustering

Autoclass is essentially Naive Bayes, but designed for unsupervised learning. We are given unlabeled training data $D_1,\dots,D_N$, where $D_i = \langle x_{i1},\dots,x_{ik}\rangle$ and $k$ is the number of attributes of instance $i$; there is no class label such as win or fail. The goal is to learn a Naive Bayes model. We introduce two kinds of quantities: $P(C)$, the probability of class $C$, and $P(X_i \mid C)$, the probability of attribute $X_i$ given class $C$.

To solve this problem we use maximum likelihood estimation, following the theory discussed in Naive Bayes Theorem And Application - Theorem.

Parameters:

1. $\theta_C = P(C=T)$

2. $1-\theta_C = P(C=F)$

3. $P(X_i=T \mid C=T) = \theta^T_i$

4. $P(X_i=F \mid C=T) = 1-\theta^T_i$

5. $P(X_i=T \mid C=F) = \theta^F_i$

6. $P(X_i=F \mid C=F) = 1-\theta^F_i$

7. $\theta = \langle \theta_C, \theta^T_1, \dots, \theta^T_n, \theta^F_1, \dots, \theta^F_n \rangle$

The approach is to find the $\theta$ that maximizes the likelihood

$$L(\theta) = p(D \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta).$$

But this is a difficult problem, because the class labels are missing and so we do not have the sufficient statistics.
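To see where the difficulty comes from, note that the likelihood of a single unlabeled instance has to sum over both possible classes. A sketch of this marginal likelihood, using the Bernoulli encoding ($T=1$, $F=0$) introduced later in this post:

$$p(x_i \mid \theta) = \theta_C \prod_{j=1}^{n} (\theta^T_j)^{x_{ij}} (1-\theta^T_j)^{1-x_{ij}} + (1-\theta_C) \prod_{j=1}^{n} (\theta^F_j)^{x_{ij}} (1-\theta^F_j)^{1-x_{ij}}$$

This sum over the hidden class is what prevents the simple closed-form counts used in the supervised case.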

EM (Expectation Maximization)

The expectation-maximization algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. —— Wikipedia

The problem here is that the data is not fully observed (there are no labels).

1. If we knew the sufficient statistics of the data, we could choose parameter values so as to maximize the likelihood, just as discussed in the theorem essay.

2. If we knew the model parameters, we could compute a probability distribution over the missing values. From these distributions, we get the expected sufficient statistics.

Expected sufficient statistics

From the observed data and the model parameters we get the probability of every possible completion of the data (a guess of the labels). Each completion then defines sufficient statistics (from which we could find the $\theta$ maximizing the likelihood). The expected sufficient statistics are the expectation, taken over all possible completions, of the sufficient statistics of each completion.

Now we can give the general form of the EM algorithm:

Repeat:

$\theta_{old} = \theta$

E-step (Expectation): compute the expected sufficient statistics.

M-step (Maximization): choose $\theta$ so as to maximize the likelihood of the expected sufficient statistics.

Until $\theta$ is close to $\theta_{old}$.
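As a minimal sketch of this generic loop (my own illustration, not code from the original notes; `e_step`, `m_step`, and `tol` are placeholder names):

```python
import numpy as np

def em(data, theta0, e_step, m_step, tol=1e-6, max_iters=100):
    """Generic EM skeleton: alternate E- and M-steps until theta stops changing.

    e_step(data, theta) -> expected sufficient statistics
    m_step(stats)       -> new parameter vector theta
    Both callables are placeholders for a concrete model such as Autoclass below.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iters):
        theta_old = theta
        stats = e_step(data, theta)   # E-step: compute expected sufficient statistics
        theta = np.asarray(m_step(stats), dtype=float)  # M-step: maximize their likelihood
        if np.max(np.abs(theta - theta_old)) < tol:     # "theta is close to theta_old"
            break
    return theta
```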

Example: E-step

$\theta_C = 0.7,\quad \theta^T_1 = 0.9,\ \theta^F_1 = 0.3,\quad \theta^T_2 = 0.6,\ \theta^F_2 = 0.2$

$$D = \begin{bmatrix} F & T \\ T & T \end{bmatrix}$$ (rows are instances, columns are attributes $X_1$ and $X_2$)

After the first round of completions, in other words after appending a class-label column $C$ to the data matrix, there are four possible augmented matrices:

1st: $\begin{bmatrix} F & T & F \\ T & T & F \end{bmatrix}$

2nd: $\begin{bmatrix} F & T & F \\ T & T & T \end{bmatrix}$

3rd: $\begin{bmatrix} F & T & T \\ T & T & F \end{bmatrix}$

4th: $\begin{bmatrix} F & T & T \\ T & T & T \end{bmatrix}$

Example: M-step

The probability of the first completion is

$$P(\text{Completion}_1 \mid \theta) \propto P(X_1=F, X_2=T, C=F) \cdot P(X_1=T, X_2=T, C=F)$$

$$= P(C=F)P(X_1=F \mid C=F)P(X_2=T \mid C=F) \cdot P(C=F)P(X_1=T \mid C=F)P(X_2=T \mid C=F)$$

$$= 0.3 \cdot 0.7 \cdot 0.2 \cdot 0.3 \cdot 0.3 \cdot 0.2 = 0.000756$$

With the same procedure, we can get:

$$P(\text{Completion}_2 \mid \theta) \propto 0.3 \cdot 0.7 \cdot 0.2 \cdot 0.7 \cdot 0.9 \cdot 0.6 = 0.015876$$

$$P(\text{Completion}_3 \mid \theta) \propto 0.7 \cdot 0.1 \cdot 0.6 \cdot 0.3 \cdot 0.3 \cdot 0.2 = 0.000756$$

$$P(\text{Completion}_4 \mid \theta) \propto 0.7 \cdot 0.1 \cdot 0.6 \cdot 0.7 \cdot 0.9 \cdot 0.6 = 0.015876$$
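A small sketch (mine, not from the original notes) that enumerates the four completions and reproduces these numbers, including the normalized weights 0.0227 and 0.4773 used in the next step:

```python
from itertools import product

# Initial parameters from the example (T encoded as 1, F as 0).
theta_C = 0.7
theta_T = [0.9, 0.6]   # theta^T_i = P(X_i = T | C = T)
theta_F = [0.3, 0.2]   # theta^F_i = P(X_i = T | C = F)

data = [(0, 1), (1, 1)]  # instance 1: X1=F, X2=T ; instance 2: X1=T, X2=T

def joint(x, c):
    """P(x, c | theta) for a single instance under the Naive Bayes model."""
    p = theta_C if c == 1 else 1 - theta_C
    params = theta_T if c == 1 else theta_F
    for xi, t in zip(x, params):
        p *= t if xi == 1 else 1 - t
    return p

# A completion assigns a class label to every instance: (F,F), (F,T), (T,F), (T,T).
completions = list(product([0, 1], repeat=len(data)))
scores = [joint(data[0], c1) * joint(data[1], c2) for c1, c2 in completions]
weights = [s / sum(scores) for s in scores]

print(scores)   # approximately [0.000756, 0.015876, 0.000756, 0.015876]
print(weights)  # approximately [0.0227, 0.4773, 0.0227, 0.4773]
```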

Iteration with EM steps

Now one iteration is finished: we “guessed” the completions (i.e., the labels) and maximized the parameters $\theta$ given those completions. We can now start a new iteration with the new $\theta$. Normalizing the four completion probabilities above gives the weights $0.0227, 0.4773, 0.0227, 0.4773$ used below. The explanation of the expectation symbols is given in Naive Bayes Theorem And Application - Theorem.

$$E[N^T] = 0.0227 \cdot 0 + 0.4773 \cdot 1 + 0.0227 \cdot 1 + 0.4773 \cdot 2 = 1.4546$$

$$E[N^F] = N - E[N^T] = 0.5454$$

$$E[N^T_{1,T}] = 0.0227 \cdot 0 + 0.4773 \cdot 1 + 0.0227 \cdot 0 + 0.4773 \cdot 1 = 0.9546$$

$$E[N^F_{1,T}] = 0.0227 \cdot 1 + 0.4773 \cdot 0 + 0.0227 \cdot 1 + 0.4773 \cdot 0 = 0.0454$$

$$E[N^T_{2,T}] = 1.4546$$

$$E[N^F_{2,T}] = 0.5454$$

Now we can compute the maximum likelihood estimates (2nd M-step) again:

$$\theta_C = E[N^T]/N = 1.4546/2 = 0.7273$$

$$\theta^T_1 = E[N^T_{1,T}]/E[N^T] = 0.9546/1.4546 = 0.6563$$

$$\theta^F_1 = E[N^F_{1,T}]/E[N^F] = 0.0454/0.5454 = 0.0832$$

$$\theta^T_2 = E[N^T_{2,T}]/E[N^T] = 1.4546/1.4546 = 1$$

$$\theta^F_2 = E[N^F_{2,T}]/E[N^F] = 0.5454/0.5454 = 1$$
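Continuing the sketch above (the names `data`, `completions`, and `weights` come from that snippet, not from the original notes), the expected sufficient statistics and the resulting maximum likelihood estimates can be computed as:

```python
# Expected sufficient statistics from the completion weights, then the M-step estimates.
N = len(data)

E_NT = sum(w * sum(labels) for w, labels in zip(weights, completions))   # ~1.4546
E_NF = N - E_NT                                                          # ~0.5454

# E[N^T_{i,T}]: expected count of instances with X_i = T and class T (class F for E[N^F_{i,T}]).
E_NTiT = [sum(w * sum(1 for x, c in zip(data, labels) if x[i] == 1 and c == 1)
              for w, labels in zip(weights, completions)) for i in range(2)]
E_NFiT = [sum(w * sum(1 for x, c in zip(data, labels) if x[i] == 1 and c == 0)
              for w, labels in zip(weights, completions)) for i in range(2)]

theta_C_new = E_NT / N                               # ~0.7273
theta_T_new = [E_NTiT[i] / E_NT for i in range(2)]   # ~[0.656, 1.0]
theta_F_new = [E_NFiT[i] / E_NF for i in range(2)]   # ~[0.083, 1.0]
```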

EM algorithm in Naive Bayes

In fact, the number of completions is exponential in the number of instances.

Key observation

We don’t care about the exact completions, only the expected sufficient statistics, and each instance contributes separately to the expected sufficient statistics. Therefore we can:

1. enumerate the completions of each instance separately,

2. compute the probability of each completion,

3. compute the expected contribution of that instance to the sufficient statistics.

E-step for Naive Bayes:

Compute the expectations according to the current parameters $\theta$:

1. $E[N^T]$ is the expected number of instances in which the class is T.

2. Each instance has some probability that its class is T.

3. Each instance contributes that probability to $E[N^T]$.

4. In symbols: $$E[N^T] = \sum_{j=1}^{N} P(C_j=T \mid x_{j1},\dots,x_{jn}) \propto \sum_{j=1}^{N} P(C_j=T) \prod_{i=1}^{n} P(x_{ji} \mid C_j=T)$$

5. $E[N^T_{i,T}]$ is the expected number of instances in which the class is T and $X_i$ is T. If an instance has $X_i \neq T$, it contributes 0 to $E[N^T_{i,T}]$.

6. If an instance has $X_i = T$, it contributes the probability that its class is T to $E[N^T_{i,T}]$.

7. In symbols:

$$E[N^T_{i,T}] = \sum_{j:\,x_{ji}=T} P(C_j=T \mid x_{j1},\dots,x_{jn}) \propto \sum_{j:\,x_{ji}=T} P(C_j=T) \prod_{i=1}^{n} P(x_{ji} \mid C_j=T)$$

M-step for Naive Bayes:

Maximize the likelihood with respect to the expected sufficient statistics. The updates (the same ones used in the Autoclass M-step below) are:

1. $\theta_C = E[N^T]/N$

2. For each attribute $i$: $\theta^T_i = E[N^T_{i,T}]/E[N^T]$

3. For each attribute $i$: $\theta^F_i = E[N^F_{i,T}]/(N - E[N^T])$

For notational convenience, we encode T as 1 and F as 0; then for instance $j$:

$$P(x_{ji} \mid C_j=T) = (\theta^T_i)^{x_{ji}}(1-\theta^T_i)^{1-x_{ji}}, \qquad P(x_{ji} \mid C_j=F) = (\theta^F_i)^{x_{ji}}(1-\theta^F_i)^{1-x_{ji}}$$
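As a tiny illustration of this encoding (the helper name is mine), each attribute contributes one Bernoulli factor, and the product of these factors over all attributes gives $P(x_j \mid C_j)$:

```python
def attr_likelihood(x_ji, theta_i):
    """One Bernoulli factor P(x_ji | C_j = c), where theta_i is theta^T_i or theta^F_i."""
    return theta_i ** x_ji * (1 - theta_i) ** (1 - x_ji)

# With the parameters from the running example: P(X_1 = F | C = T) = 1 - 0.9 = 0.1.
print(attr_likelihood(0, 0.9))  # 0.1 (up to floating point)
```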

Autoclass

Set $\theta_C$, $\theta^T_i$, and $\theta^F_i$ to arbitrary values for all attributes, then repeat the EM algorithm until convergence:

Expectation step

Maximization step

In the expectation step, start by resetting the expected counts: $E[N^T]=0$, $E[N^T_{i,T}]=0$, $E[N^F_{i,T}]=0$.

For each instance $D_j$:

$$p_T = \theta_C \prod_{i=1}^{n} (\theta^T_i)^{x_{ji}}(1-\theta^T_i)^{1-x_{ji}}$$

$$p_F = (1-\theta_C) \prod_{i=1}^{n} (\theta^F_i)^{x_{ji}}(1-\theta^F_i)^{1-x_{ji}}$$

$$q = \frac{p_T}{p_T + p_F}$$

$$E[N^T] \mathrel{+}= q$$

For each attribute $i$:

if $x_{ji} = T$:

$$E[N^T_{i,T}] \mathrel{+}= q$$

$$E[N^F_{i,T}] \mathrel{+}= (1-q)$$

In the maximization step:

$$\theta_C = \frac{E[N^T]}{N}$$

For each attribute $i$: $$\theta^T_i = \frac{E[N^T_{i,T}]}{E[N^T]}, \qquad \theta^F_i = \frac{E[N^F_{i,T}]}{N - E[N^T]}$$
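Below is a minimal Python sketch of this Autoclass EM loop for binary attributes. It is my own illustration of the pseudocode above rather than code from the original notes, and the function and variable names are mine:

```python
import numpy as np

def autoclass_em(X, theta_C, theta_T, theta_F, n_iters=50, tol=1e-6):
    """Two-cluster Autoclass EM for binary attributes (T encoded as 1, F as 0).

    X: (N, n) array of 0/1 attribute values.
    theta_T[i] = P(X_i = T | C = T), theta_F[i] = P(X_i = T | C = F).
    """
    X = np.asarray(X, dtype=float)
    N, n = X.shape
    theta_T = np.asarray(theta_T, dtype=float)
    theta_F = np.asarray(theta_F, dtype=float)

    for _ in range(n_iters):
        old = (theta_C, theta_T.copy(), theta_F.copy())

        # E-step: reset the expected counts, then accumulate per instance.
        E_NT = 0.0
        E_NTiT = np.zeros(n)
        E_NFiT = np.zeros(n)
        for x in X:
            p_T = theta_C * np.prod(theta_T ** x * (1 - theta_T) ** (1 - x))
            p_F = (1 - theta_C) * np.prod(theta_F ** x * (1 - theta_F) ** (1 - x))
            q = p_T / (p_T + p_F)
            E_NT += q
            E_NTiT += q * x          # only attributes with x_i = T contribute
            E_NFiT += (1 - q) * x

        # M-step: maximize the likelihood of the expected sufficient statistics.
        theta_C = E_NT / N
        theta_T = E_NTiT / E_NT
        theta_F = E_NFiT / (N - E_NT)

        if (abs(theta_C - old[0]) < tol
                and np.max(np.abs(theta_T - old[1])) < tol
                and np.max(np.abs(theta_F - old[2])) < tol):
            break

    return theta_C, theta_T, theta_F
```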

Example of Autoclass

E-step: we are given the initial “guess” of the parameters $\theta$:

$\theta_C = 0.7,\quad \theta^T_1 = 0.9,\ \theta^F_1 = 0.3,\quad \theta^T_2 = 0.6,\ \theta^F_2 = 0.2$

and the dataset, D:

| instance | attribute 1 ($X_1$) | attribute 2 ($X_2$) |
|----------|---------------------|---------------------|
| 1        | F                   | T                   |
| 2        | T                   | T                   |

and the initial expectations are given below:

$$E[N^T]=0,\quad E[N^T_{1,T}]=0,\quad E[N^F_{1,T}]=0,\quad E[N^T_{2,T}]=0,\quad E[N^F_{2,T}]=0$$

Detail of EM steps in Autoclass

The 1st E-step, for instance 1:

$$p_T = \theta_C \prod_{i=1}^{2} (\theta^T_i)^{x_{1i}}(1-\theta^T_i)^{1-x_{1i}} = 0.7 \cdot (0.9^0)(1-0.9)^{1} \cdot (0.6^1)(1-0.6)^{0} = 0.042$$

$$p_F = (1-\theta_C) \prod_{i=1}^{2} (\theta^F_i)^{x_{1i}}(1-\theta^F_i)^{1-x_{1i}} = 0.3 \cdot (0.3^0)(1-0.3)^{1} \cdot (0.2^1)(1-0.2)^{0} = 0.042$$

$$q = \frac{0.042}{0.042 + 0.042} = 0.5$$

After processing instance 1, the expectations are:

$$E[N^T]=0.5,\quad E[N^T_{1,T}]=0,\quad E[N^F_{1,T}]=0,\quad E[N^T_{2,T}]=0.5,\quad E[N^F_{2,T}]=0.5$$

For instance 2:

$$p_T = \theta_C \prod_{i=1}^{2} (\theta^T_i)^{x_{2i}}(1-\theta^T_i)^{1-x_{2i}} = 0.7 \cdot (0.9^1 \cdot 0.1^0) \cdot (0.6^1 \cdot 0.4^0) = 0.378$$

$$p_F = (1-\theta_C) \prod_{i=1}^{2} (\theta^F_i)^{x_{2i}}(1-\theta^F_i)^{1-x_{2i}} = 0.3 \cdot (0.3^1 \cdot 0.7^0) \cdot (0.2^1 \cdot 0.8^0) = 0.018$$

$$q = \frac{0.378}{0.378 + 0.018} = 0.95$$

After processing instance 2, the expectations are:

$$E[N^T]=1.45,\quad E[N^T_{1,T}]=0.95,\quad E[N^F_{1,T}]=0.05,\quad E[N^T_{2,T}]=1.45,\quad E[N^F_{2,T}]=0.55$$

In the first M-step, we maximize the parameters according to these expectations:

$$\theta_C = \frac{E[N^T]}{N} = \frac{1.45}{2} \approx 0.72$$

For attribute 1:

$$\theta^T_1 = \frac{E[N^T_{1,T}]}{E[N^T]} = \frac{0.95}{1.45} \approx 0.65, \qquad \theta^F_1 = \frac{E[N^F_{1,T}]}{N - E[N^T]} = \frac{0.05}{0.55} \approx 0.09$$

For attribute 2:

$$\theta^T_2 = \frac{E[N^T_{2,T}]}{E[N^T]} = \frac{1.45}{1.45} = 1.0, \qquad \theta^F_2 = \frac{E[N^F_{2,T}]}{N - E[N^T]} = \frac{0.55}{0.55} = 1.0$$
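Running the `autoclass_em` sketch from earlier for a single iteration on this two-instance dataset should reproduce these estimates, up to the rounding used in the worked example:

```python
# One EM iteration on the worked example; the text above rounds q to 0.95.
theta_C, theta_T, theta_F = autoclass_em(
    X=[[0, 1], [1, 1]],                          # instance 1: (F, T), instance 2: (T, T)
    theta_C=0.7, theta_T=[0.9, 0.6], theta_F=[0.3, 0.2],
    n_iters=1,
)
print(theta_C, theta_T, theta_F)                 # roughly 0.727, [0.656, 1.0], [0.083, 1.0]
```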

Convergence

EM improves the likelihood on every iteration and is guaranteed to converge to a maximum of the likelihood function, but that maximum may only be a local one. A practical tip when using EM: do not start with symmetric parameter values, and in particular do not start with uniform values.
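For instance, one common way to break this symmetry (my own illustration, not from the notes) is to draw the starting parameters at random instead of using uniform values such as 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2  # number of attributes

# Random, asymmetric starting point for the Autoclass sketch above.
theta_C0 = rng.uniform(0.3, 0.7)
theta_T0 = rng.uniform(0.3, 0.7, size=n)
theta_F0 = rng.uniform(0.3, 0.7, size=n)

theta_C, theta_T, theta_F = autoclass_em([[0, 1], [1, 1]], theta_C0, theta_T0, theta_F0)
```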

Reference

Most of the content in this essay comes from CMU machine learning course notes; unfortunately I have lost the source link. Sorry!