
Deep Learning (1): Softmax Regression Exercise

2014-12-04 09:30
Introduction:

This exercise follows http://www.cnblogs.com/tornadomeet/archive/2013/03/23/2977621.html and http://deeplearning.stanford.edu/wiki/index.php/Exercise:Softmax_Regression

The data is hyperspectral: the training set is 103*42776 and the test set is 103*21391. The experiments were run in MATLAB 2009a.

Theory:

Only the softmax model is used, with no hidden layer: the network has just an input and an output layer. The input is the raw hyperspectral image; all of the data is used for training and half of it for prediction. The main work in the exercise is computing the cost function and its gradient.

The derivation is as follows. For an input $x$, softmax regression models the class probabilities as

$$P(y = j \mid x; \theta) = \frac{e^{\theta_j^T x}}{\sum_{l=1}^{k} e^{\theta_l^T x}},$$

and the (unregularized) cost over $m$ training samples is

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)} = j\}\,\log\frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\right].$$

In softmax regression the solution to the parameter optimization is not unique: whenever an optimal parameter set is found, subtracting the same vector $\psi$ from every $\theta_j$ gives exactly the same cost, so the parameters are not a unique solution. The proof is as follows:

$$P(y^{(i)} = j \mid x^{(i)}; \theta - \psi) = \frac{e^{(\theta_j - \psi)^T x^{(i)}}}{\sum_{l=1}^{k} e^{(\theta_l - \psi)^T x^{(i)}}} = \frac{e^{\theta_j^T x^{(i)}}\, e^{-\psi^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}\, e^{-\psi^T x^{(i)}}} = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}.$$


  Why does this happen? Intuitively, the cost function is not strictly convex: around a minimum it is "flat", so all parameter values in that neighborhood give the same cost. How can this be avoided? Adding a regularization (weight decay) term solves it. For example, when solving with Newton's method, the Hessian may be singular if no regularization term is added, which leads to exactly the situation above; once the term is added the Hessian is no longer singular. The cost function with the weight decay term is:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)} = j\}\,\log\frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\right] + \frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=1}^{n} \theta_{ij}^2.$$


  The corresponding partial derivative (gradient) is then:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[x^{(i)}\left(1\{y^{(i)} = j\} - P(y^{(i)} = j \mid x^{(i)}; \theta)\right)\right] + \lambda\,\theta_j.$$


Notes:
In the MATLAB implementation, the line groundTruth=full(sparse(labels,1:numCase,1)) in the softmaxCost function may be hard to understand at first. For example, let data=[1 2 3 4;5 6 7 8], a 2*4 matrix, and labels=[3 2 4 1]. Then sparse(labels,1:numCase,1) contains the entries (3,1) 1; (2,2) 1; (4,3) 1; (1,4) 1. The entry (1,4), for instance, means that the 4th sample has label 1, i.e. 1{y(4)=1}=1; if y(4) were any other value the entry would be 0. Expanding this to the full matrix gives the array below (verified by the snippet after it):

0 0 0 1

0 1 0 0

1 0 0 0

0 0 1 0
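
A quick way to check this in MATLAB, reproducing the toy labels from the example above (variable names are just for illustration):

labels  = [3 2 4 1];          % toy labels from the example above
numCase = numel(labels);      % 4 samples
% row index = label, column index = sample, value = 1
groundTruth = full(sparse(labels, 1:numCase, 1))
% groundTruth =
%      0     0     0     1
%      0     1     0     0
%      1     0     0     0
%      0     0     1     0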

In softmaxPredict: theta=softmaxModel.optTheta;

pred=zeros(1,size(data,2));

[nop,pred]=max(theta*data); nop holds the largest value in each column of theta*data, and pred holds the row index of that maximum, i.e. the predicted class of each sample. Accuracy is then computed with acc=mean(labels(:)==pred(:)). A toy example is shown below.
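
As a small illustration (with made-up scores, not values from the experiment), max along each column gives the predicted class and mean gives the accuracy:

scores = [0.1 0.7; 0.9 0.2; 0.3 0.1];   % 3 classes x 2 samples (hypothetical values)
[nop, pred] = max(scores);               % nop = [0.9 0.7], pred = [2 1]
labels = [2; 1];                         % true labels of the 2 samples
acc = mean(labels(:) == pred(:))         % acc = 1, i.e. 100% on this toy case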

The result is 74.106% accuracy. This is quite low, which suggests that softmax regression on its own is not well suited to classifying this data directly; compared with an SVM the accuracy is much lower.

Areas for further improvement: I am still not very familiar with matrix operations in MATLAB and need more practice.

Appendix: code

softmaxExercise

clc;
clear all;

%%======================================================================
%% STEP 0: Initialise constants and parameters
%
%  Here we define and initialise some constants which allow your code
%  to be used more generally on any arbitrary input.
%  We also initialise some parameters used for tuning the model.

inputSize=103;
numClasses=9;
lambda=1e-4;

%%======================================================================
%% STEP 1: Load data
%
%  In this section, we load the input and output data.
%  For softmax regression on the hyperspectral data,
%  the input data is the spectra (one column per sample), and
%  the output data is the class labels.
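% NOTE (assumption): one.mat is expected to provide train_data (103 x 42776),
% test_data (103 x 21391), train_label and train_label's test counterpart test_label.
% Train and test data are concatenated below so that all samples are used for
% training (see the introduction), and the values are rescaled to [0,1].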
load one.mat
train_data=[train_data test_data];
train_data=(train_data-min(train_data(:)))./(max(train_data(:))-min(train_data(:)));
images = train_data;
labels = [train_label;test_label];
%labels(labels==0)=10;

inputData=images;

% DEBUG = true; % Set DEBUG to true when debugging.
DEBUG = false;
if DEBUG
    inputSize = 8;
    inputData = randn(8, 100);
    labels = randi(numClasses, 100, 1); % random labels in 1..numClasses
end

theta=0.005*randn(numClasses*inputSize,1);

%%======================================================================
%% STEP 2: Implement softmaxCost
%
%  Implement softmaxCost in softmaxCost.m.

[cost,grad]=softmaxCost(theta,numClasses,inputSize,lambda,inputData,labels);

%%======================================================================
%% STEP 3: Gradient checking
%
%  As with any learning algorithm, you should always check that your
%  gradients are correct before learning the parameters.
%

if DEBUG
    numGrad = computeNumericalGradient( @(x) softmaxCost(x, numClasses, ...
                                        inputSize, lambda, inputData, labels), theta);

    % Use this to visually compare the gradients side by side
    disp([numGrad grad]);

    % Compare numerically computed gradients with those computed analytically
    diff = norm(numGrad-grad)/norm(numGrad+grad);
    disp(diff);
    % The difference should be small.
    % In our implementation, these values are usually less than 1e-7.

    % When your gradients are correct, congratulations!
end

%% STEP 4: Learning parameters
%
%  Once you have verified that your gradients are correct,
%  you can start training your softmax regression code using softmaxTrain
%  (which uses minFunc).

options.maxIter=100;
%softmaxModel is just a struct holding the learned optimal parameters together with the input size and the number of classes
softmaxModel=softmaxTrain(inputSize,numClasses,lambda,inputData,labels,options);

%%======================================================================
%% STEP 5: Testing
%
%  You should now test your model against the test images.
%  To do this, you will first need to write softmaxPredict
%  (in softmaxPredict.m), which should return predictions
%  given a softmax model and the input data.
test_data=(test_data-min(test_data(:)))./(max(test_data(:))-min(test_data(:)));
images = test_data;
labels = test_label;
%labels(labels==0) = 10; % Remap 0 to 10

inputData=images;
size(softmaxModel.optTheta);
size(inputData);

[pred]=softmaxPredict(softmaxModel,inputData);
acc=mean(labels(:)==pred(:));

fprintf('Accuracy: %0.3f%%\n', acc*100);


softmaxCost

function [cost,grad]=softmaxCost(theta,numClasses,inputSize,lambda,data,labels)

theta=reshape(theta,numClasses,inputSize);

numCase=size(data,2);
groundTruth=full(sparse(labels,1:numCase,1)); % indicator matrix: groundTruth(j,i)=1{labels(i)=j} (the tricky part discussed above)

cost = 0;
thetagrad = zeros(numClasses, inputSize);

% subtract the per-column maximum before exponentiating, for numerical stability
M=bsxfun(@minus,theta*data,max(theta*data,[],1));
M=exp(M);
% normalize each column so the entries are class probabilities summing to 1
p=bsxfun(@rdivide,M,sum(M));

% cross-entropy cost plus the weight decay (regularization) term
cost=-1/numCase*groundTruth(:)'*log(p(:))+lambda/2*sum(theta(:).^2);

% gradient with respect to theta (see the formula in the theory section)
thetagrad=-1/numCase*(groundTruth-p)*data'+lambda*theta;
grad =thetagrad(:);

end


softmaxTrain

function softmaxModel=softmaxTrain(inputSize,numClasses,lambda,inputData,labels,options)

if ~exist('options', 'var')
    options = struct;
end

if ~isfield(options, 'maxIter')
    options.maxIter = 400;
end

theta = 0.005 * randn(numClasses * inputSize, 1);

addpath minFunc/
options.Method='lbfgs';

options.display='on';

[softmaxOptTheta, cost] = minFunc( @(p) softmaxCost(p, ...
numClasses, inputSize, lambda, ...
inputData, labels), ...
theta, options);

% Fold softmaxOptTheta into a nicer format
softmaxModel.optTheta = reshape(softmaxOptTheta, numClasses, inputSize);
softmaxModel.inputSize = inputSize;
softmaxModel.numClasses = numClasses;

end


softmaxPredict

function [pred]=softmaxPredict(softmaxModel,data)

theta=softmaxModel.optTheta;
pred=zeros(1,size(data,2));

[nop,pred]=max(theta*data);
%nop holds the maximum of each column of theta*data; pred holds the row index (the predicted class) of that maximum;

end
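
The gradient-checking step calls computeNumericalGradient, which is not listed in the appendix. A minimal sketch, assuming the standard two-sided finite-difference approximation from the UFLDL exercise (not necessarily the exact file used in this experiment):

function numgrad = computeNumericalGradient(J, theta)
% Numerically approximate the gradient of J at theta with central differences.
% J is a handle returning the cost as its first output (e.g. softmaxCost).
epsilon = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
    e = zeros(size(theta));
    e(i) = epsilon;
    numgrad(i) = (J(theta + e) - J(theta - e)) / (2 * epsilon);
end

end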