[Machine Learning] Logistic Regression (MATLAB implementation)
2017-11-27 10:35
Based on Andrew Ng's machine learning course.
1 Loading the Data
Dataset download link: http://pan.baidu.com/s/1kUUGiP5 (password: vvsa)
Load the data. It has three columns: the first two are x1 and x2, and the third is y. You can picture this as two interviewers scoring an applicant: the scores are x1 and x2, and y is 0 or 1, indicating whether the applicant is hired.
```matlab
data = load('ex2data1.txt');  % (100,3)
X = data(:, [1, 2]);
y = data(:, 3);               % y is 0 or 1
```
Visualize the data:
```matlab
plotData(X, y);
% Put some labels
hold on;
xlabel('x1')
ylabel('x2')
legend('positive', 'negative')
hold off;
```
plotData() is implemented as follows:
```matlab
function plotData(X, y)
figure; hold on;
pos = find(y == 1);
neg = find(y == 0);
% Plot the positive examples
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 10);
% LineWidth: line width
% MarkerSize: size of the marker
% 'k': black
% MarkerFaceColor: fill color inside the marker
% 'y': yellow
% Plot the negative examples
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 10);
hold off;
end
```
The horizontal and vertical axes are x1 and x2, respectively.
2 Implementation
2.1 Initialization
```matlab
[m, n] = size(X);                 % (100,2)
X = [ones(m, 1) X];               % (100,3), the first column is all ones
initial_theta = zeros(n + 1, 1);  % (3,1)
```

The column of ones corresponds to the intercept term $\theta_0$, so that $\theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2$.
2.2 The sigmoid Function
The sigmoid function is: $g(z) = \dfrac{1}{1+e^{-z}}$
Differentiating it gives:

$$g'(z) = \frac{d}{dz}\frac{1}{1+e^{-z}} = \frac{1}{(1+e^{-z})^2}\,e^{-z} = \frac{1}{1+e^{-z}}\cdot\left(1-\frac{1}{1+e^{-z}}\right) = g(z)\,(1-g(z))$$
This is a very handy property: it is what makes the gradient derivation below simplify so neatly.
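The identity is easy to check numerically. Here is a small sanity-check sketch (not part of the original exercise), comparing a central finite difference against $g(z)(1-g(z))$:

```matlab
% Numerical sanity check of g'(z) = g(z)*(1 - g(z)) -- illustrative only
g = @(z) 1 ./ (1 + exp(-z));
z = linspace(-5, 5, 11);
h = 1e-6;
numeric  = (g(z + h) - g(z - h)) / (2*h);  % central-difference derivative
analytic = g(z) .* (1 - g(z));
max(abs(numeric - analytic))               % should be tiny (roughly 1e-10 or less)
```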
The hypothesis takes the form of the sigmoid function:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}$$
It is implemented as follows:
```matlab
function g = sigmoid(z)
g = zeros(size(z));
g = 1 ./ (1 + exp(-z));
end
```
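A few known values confirm the implementation behaves as expected: $g(0)$ is exactly 0.5, and $g$ saturates toward 1 and 0 for large positive and negative inputs.

```matlab
sigmoid(0)           % 0.5
sigmoid([-10 0 10])  % approximately [0.0000  0.5000  1.0000]
sigmoid([1 2; 3 4])  % element-wise, so matrices work too
```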
2.3 Computing the Cost Function and the Gradient
The cross-entropy cost function for logistic regression is:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h(x^{(i)}) - \left(1-y^{(i)}\right)\log\left(1-h(x^{(i)})\right)\right]$$
Here log denotes the natural logarithm (ln). The gradient is derived as follows:

$$\frac{\partial}{\partial\theta_j}J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\frac{1}{g(\theta^T x^{(i)})} - (1-y^{(i)})\frac{1}{1-g(\theta^T x^{(i)})}\right)\frac{\partial}{\partial\theta_j}g(\theta^T x^{(i)})$$

Applying $g'(z) = g(z)(1-g(z))$ and multiplying the right-hand factor into the parentheses:

$$= -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\frac{1}{g(\theta^T x^{(i)})} - (1-y^{(i)})\frac{1}{1-g(\theta^T x^{(i)})}\right)g(\theta^T x^{(i)})\left(1-g(\theta^T x^{(i)})\right)\frac{\partial}{\partial\theta_j}\theta^T x^{(i)}$$

Expanding the parentheses:

$$= -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\left(1-g(\theta^T x^{(i)})\right) - (1-y^{(i)})\,g(\theta^T x^{(i)})\right)x_j^{(i)} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
Compute the cost and gradient at the initial theta:

```matlab
[cost, grad] = costFunction(initial_theta, X, y);
```

costFunction() is implemented as follows:
```matlab
function [J, grad] = costFunction(theta, X, y)
m = length(y);
J = 0;
grad = zeros(size(theta));
H = sigmoid(X*theta);
T = -y.*log(H) - (1 - y).*log(1 - H);
% In MATLAB, log() is the natural logarithm:
% log(exp(1)) = 1
J = 1/m*sum(T);  % cross-entropy cost
for i = 1:m
    grad = grad + (H(i) - y(i)) * X(i,:)';
end
grad = 1/m*grad;
end
```
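The last line of the derivation also admits a fully vectorized gradient, avoiding the loop. A minimal sketch, equivalent to the loop above (not the version used in this post):

```matlab
% Vectorized equivalent of the gradient loop above
H = sigmoid(X*theta);         % (m,1) predictions
grad = (1/m) * X' * (H - y);  % (n+1,1), matches the summation formula
```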
Display the result:
```matlab
fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Gradient at initial theta (zeros): \n');
fprintf(' %f \n', grad);
```
```
Cost at initial theta (zeros): 0.693147
Gradient at initial theta (zeros):
 -0.100000
 -12.009217
 -11.262842
```
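Before handing the gradient to an optimizer, it is worth verifying it against finite differences of the cost. A minimal check, assuming the costFunction() above (this snippet is illustrative, not part of the original code):

```matlab
% Finite-difference check of the analytic gradient -- illustrative only
[cost, grad] = costFunction(initial_theta, X, y);
eps_ = 1e-4;
numgrad = zeros(size(initial_theta));
for j = 1:numel(initial_theta)
    e = zeros(size(initial_theta));
    e(j) = eps_;
    numgrad(j) = (costFunction(initial_theta + e, X, y) - ...
                  costFunction(initial_theta - e, X, y)) / (2*eps_);
end
max(abs(numgrad - grad))  % should be very small
```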
For the optimization we call MATLAB's built-in fminunc() directly, so there is no need to pick a learning rate or write the iteration loop by hand; just substitute your own cost function for costFunction(t, X, y).
```matlab
options = optimset('GradObj', 'on', 'MaxIter', 400);
% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[theta, cost] = ...
    fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);
```
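For comparison, this is roughly the loop fminunc spares us from writing. A bare-bones gradient-descent sketch (alpha and num_iters are hypothetical values, not from the original post, and would need tuning; on these unnormalized features plain gradient descent converges very slowly, which is part of why fminunc is attractive here):

```matlab
% What fminunc replaces: plain gradient descent (sketch only)
alpha = 0.001;     % learning rate -- hypothetical value, needs tuning
num_iters = 1000;  % iteration count -- hypothetical value
theta_gd = initial_theta;
for iter = 1:num_iters
    [~, grad] = costFunction(theta_gd, X, y);  % reuse the analytic gradient
    theta_gd = theta_gd - alpha * grad;        % step opposite the gradient
end
```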
Inspect the result found by fminunc:
```matlab
fprintf('Cost at theta found by fminunc: %f\n', cost);
fprintf('theta: \n');
fprintf(' %f \n', theta);
```
```
Cost at theta found by fminunc: 0.203506
theta:
 -24.932760
 0.204406
 0.199616
```
2.4 Plotting the Decision Boundary
Call the plotDecisionBoundary() function:

```matlab
% Plot Boundary
plotDecisionBoundary(theta, X, y);
% Put some labels
hold on;
% Labels and Legend
xlabel('x1')
ylabel('x2')
% Specified in plot order
legend('positive', 'negative')
hold off;
```
plotDecisionBoundary() is implemented as follows:
```matlab
function plotDecisionBoundary(theta, X, y)
% PLOTDECISIONBOUNDARY(theta, X, y) plots the data points with + for the
% positive examples and o for the negative examples, together with the
% decision boundary defined by theta. X is assumed to be either
%   1) an Mx3 matrix, where the first column is an all-ones column for the
%      intercept, or
%   2) an MxN matrix, N > 3, where the first column is all-ones

% Plot the data
plotData(X(:,2:3), y);
hold on

if size(X, 2) <= 3
    % Only need 2 points to define a line, so choose two endpoints
    plot_x = [min(X(:,2))-2, max(X(:,2))+2];
    % Calculate the decision boundary line
    plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));
    % Plot, and adjust axes for better viewing
    plot(plot_x, plot_y)
    % Legend, specific for the exercise
    legend('Admitted', 'Not admitted', 'Decision Boundary')
    axis([30, 100, 30, 100])
else
    % Here is the grid range
    u = linspace(-1, 1.5, 50);
    v = linspace(-1, 1.5, 50);
    z = zeros(length(u), length(v));
    % Evaluate z = theta*x over the grid
    for i = 1:length(u)
        for j = 1:length(v)
            z(i,j) = mapFeature(u(i), v(j))*theta;
        end
    end
    z = z';  % important to transpose z before calling contour
    % Plot z = 0
    % Notice you need to specify the range [0, 0]
    contour(u, v, z, [0, 0], 'LineWidth', 2)
end
hold off
end
```
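For the linear case, the plotted line comes from solving $\theta^T x = 0$, the set of points where $h_\theta(x) = 0.5$:

$$\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0 \;\Longrightarrow\; x_2 = -\frac{1}{\theta_2}\left(\theta_1 x_1 + \theta_0\right)$$

which is exactly the plot_y expression in the code (theta(1), theta(2), theta(3) correspond to $\theta_0$, $\theta_1$, $\theta_2$).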
The resulting plot shows the fitted decision boundary.
2.5 Prediction and Evaluation
2.5.1 Prediction
With x1 = 45 and x2 = 85, predict the probability that y = 1; the prediction is a number between 0 and 1:

```matlab
y = sigmoid([1 45 85] * theta);
fprintf(['For 45 and 85, we predict y ' ...
         'probability of %f\n\n'], y);
```
The result is: 0.774321
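We can verify this by hand with the fitted theta: $\theta^T x = -24.9328 + 0.2044 \cdot 45 + 0.1996 \cdot 85 \approx 1.233$, and $g(1.233) = 1/(1+e^{-1.233}) \approx 0.774$, matching the printed value.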
2.5.2 Evaluating Accuracy
This calls the predict() function:

```matlab
% Compute accuracy on our training set
p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('\nProgram paused. Press enter to continue.\n');
```
It is implemented as follows:
```matlab
function p = predict(theta, X)
m = size(X, 1);
p = zeros(m, 1);
p = (sigmoid(X*theta) >= 0.5);
end
```
If the computed sigmoid value is greater than or equal to 0.5, we predict that the applicant passes the interview; otherwise, we predict they do not.
The final accuracy on this dataset is:
```
Train Accuracy: 89.000000
```