
Theano Learning Guide --- Stacked Denoising Autoencoders (SdA) (Translation)

2016-08-04 15:44
Feel free to fork my GitHub repository: https://github.com/zhaoyu611/DeepLearningTutorialForChinese

I have been learning Git recently, so this is a good opportunity to put that knowledge into practice. After reading through the deep learning theory I have a general picture, but working through the Theano code myself leaves a much deeper impression.

Note: this section assumes the reader has already read the earlier chapters Classifying MNIST digits using Logistic Regression and Multilayer Perceptron. In addition, it uses the following Theano functions and concepts: T.tanh, shared variables, basic arithmetic ops, T.grad, Random numbers, floatX. If you want to run the code on a GPU, please also read the GPU section.

Note: the code for this section can be downloaded from http://deeplearning.net/tutorial/code/SdA.py

The Stacked Denoising Autoencoder (SdA) is an extension of the stacked autoencoder and was first introduced by Vincent et al.

This tutorial builds on the previous Denoising Autoencoders chapter. If you are not familiar with autoencoders, we recommend reading that section first.

Stacked Autoencoders

Denoising autoencoders can be stacked to form a deep network by feeding the output of the layer below as the input to the layer above. Unsupervised pre-training of such an architecture is done one layer at a time. Each layer is trained as a denoising autoencoder by minimizing the error in reconstructing its input (which is the output of the layer below). Once the first k layers have been trained, we can train the (k+1)-th layer, because its input can now be computed from the layers below.

Once all layers have been pre-trained, the network goes through a second stage of training called fine-tuning. Here we consider supervised fine-tuning, where we want to minimize the prediction error on a supervised task. First we add a logistic regression layer on top of the network (more precisely, on top of the output code of the output layer). Then we train the entire network as we would train a multilayer perceptron. At this point we only consider the encoding part of each autoencoder. This stage is supervised, since we use the target class during training. (See the Multilayer Perceptron chapter for more details.)

This logic is easy to implement in Theano by reusing the denoising autoencoder class defined earlier. The stacked denoising autoencoder can be seen as having two facades: a list of autoencoders, and a multilayer perceptron (MLP). During pre-training we use the first facade: we treat the model as a list of autoencoders and train each autoencoder separately. In the second stage of training we use the second facade. The two facades are linked because:

the autoencoders and the sigmoid layers of the MLP share parameters, and

the latent representations computed by the intermediate layers of the MLP are fed as input to the autoencoders.

class SdA(object):
    """Stacked denoising auto-encoder class (SdA)

    A stacked denoising autoencoder model is obtained by stacking several
    dAs. The hidden layer of the dA at layer `i` becomes the input of
    the dA at layer `i+1`. The first layer dA gets as input the input of
    the SdA, and the hidden layer of the last dA represents the output.
    Note that after pretraining, the SdA is dealt with as a normal MLP,
    the dAs are only used to initialize the weights.
    """

    def __init__(
        self,
        numpy_rng,
        theano_rng=None,
        n_ins=784,
        hidden_layers_sizes=[500, 500],
        n_outs=10,
        corruption_levels=[0.1, 0.1]
    ):
        """ This class is made to support a variable number of layers.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: numpy random number generator used to draw initial
                          weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                           generated based on a seed drawn from `rng`

        :type n_ins: int
        :param n_ins: dimension of the input to the sdA

        :type hidden_layers_sizes: list of ints
        :param hidden_layers_sizes: intermediate layers size, must contain
                                    at least one value

        :type n_outs: int
        :param n_outs: dimension of the output of the network

        :type corruption_levels: list of float
        :param corruption_levels: amount of corruption to use for each
                                  layer
        """

        self.sigmoid_layers = []
        self.dA_layers = []
        self.params = []
        self.n_layers = len(hidden_layers_sizes)

        assert self.n_layers > 0

        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
        # allocate symbolic variables for the data
        self.x = T.matrix('x')  # the data is presented as rasterized images
        self.y = T.ivector('y')  # the labels are presented as 1D vector of
                                 # [int] labels


self.sigmoid_layers stores the sigmoid layers of the MLP, while self.dA_layers stores the denoising autoencoders associated with those layers.

Next, we construct n_layers sigmoid layers and n_layers denoising autoencoders, where n_layers is the depth of the model. We use the HiddenLayer class from the Multilayer Perceptron chapter, with one modification: the tanh nonlinearity is replaced by the logistic function s(x) = 1/(1 + e^{-x}). The sigmoid layers are linked together to form the MLP, and each denoising autoencoder is constructed so that the encoding part shares its weight matrix and bias with the corresponding sigmoid layer.

        for i in range(self.n_layers):
            # construct the sigmoidal layer

            # the size of the input is either the number of hidden units of
            # the layer below or the input size if we are on the first layer
            if i == 0:
                input_size = n_ins
            else:
                input_size = hidden_layers_sizes[i - 1]

            # the input to this layer is either the activation of the hidden
            # layer below or the input of the SdA if you are on the first
            # layer
            if i == 0:
                layer_input = self.x
            else:
                layer_input = self.sigmoid_layers[-1].output

            sigmoid_layer = HiddenLayer(rng=numpy_rng,
                                        input=layer_input,
                                        n_in=input_size,
                                        n_out=hidden_layers_sizes[i],
                                        activation=T.nnet.sigmoid)
            # add the layer to our list of layers
            self.sigmoid_layers.append(sigmoid_layer)
            # its arguably a philosophical question...
            # but we are going to only declare that the parameters of the
            # sigmoid_layers are parameters of the StackedDAA
            # the visible biases in the dA are parameters of those
            # dA, but not the SdA
            self.params.extend(sigmoid_layer.params)

            # Construct a denoising autoencoder that shared weights with this
            # layer
            dA_layer = dA(numpy_rng=numpy_rng,
                          theano_rng=theano_rng,
                          input=layer_input,
                          n_visible=input_size,
                          n_hidden=hidden_layers_sizes[i],
                          W=sigmoid_layer.W,
                          bhid=sigmoid_layer.b)
            self.dA_layers.append(dA_layer)


All we need to do now is add a logistic regression layer on top of the sigmoid layers, which gives us the MLP. We use the LogisticRegression class introduced in Classifying MNIST digits using Logistic Regression for this.

        # We now need to add a logistic layer on top of the MLP
        self.logLayer = LogisticRegression(
            input=self.sigmoid_layers[-1].output,
            n_in=hidden_layers_sizes[-1],
            n_out=n_outs
        )

        self.params.extend(self.logLayer.params)
        # construct a function that implements one step of finetunining

        # compute the cost for second phase of training,
        # defined as the negative log likelihood
        self.finetune_cost = self.logLayer.negative_log_likelihood(self.y)
        # compute the gradients with respect to the model parameters
        # symbolic variable that points to the number of errors made on the
        # minibatch given by self.x and self.y
        self.errors = self.logLayer.errors(self.y)


The SdA class also provides a method that generates the training functions for the denoising autoencoders in its layers. It returns a list, where element i is a function that performs one step of training for the dA at layer i.

    def pretraining_functions(self, train_set_x, batch_size):
        ''' Generates a list of functions, each of them implementing one
        step in training the dA corresponding to the layer with the same
        index. The function will require as input the minibatch index, and
        to train a dA you just need to iterate, calling the corresponding
        function on all minibatch indexes.

        :type train_set_x: theano.tensor.TensorType
        :param train_set_x: Shared variable that contains all datapoints used
                            for training the dA

        :type batch_size: int
        :param batch_size: size of a [mini]batch

        :type learning_rate: float
        :param learning_rate: learning rate used during training for any of
                              the dA layers
        '''

        # index to a [mini]batch
        index = T.lscalar('index')  # index to a minibatch

To be able to change the corruption level or the learning rate during training, we associate Theano variables with them.

        corruption_level = T.scalar('corruption')  # % of corruption to use
        learning_rate = T.scalar('lr')  # learning rate to use
        # beginning of a batch, given `index`
        batch_begin = index * batch_size
        # ending of a batch given `index`
        batch_end = batch_begin + batch_size

        pretrain_fns = []
        for dA in self.dA_layers:
            # get the cost and the updates list
            cost, updates = dA.get_cost_updates(corruption_level,
                                                learning_rate)
            # compile the theano function
            fn = theano.function(
                inputs=[
                    index,
                    theano.In(corruption_level, value=0.2),
                    theano.In(learning_rate, value=0.1)
                ],
                outputs=cost,
                updates=updates,
                givens={
                    self.x: train_set_x[batch_begin: batch_end]
                }
            )
            # append `fn` to the list of functions
            pretrain_fns.append(fn)

        return pretrain_fns


Now any function pretrain_fns[i] takes as arguments index and, optionally, corruption (the corruption level) and lr (the learning rate). Note that these argument names are the names given to the Theano variables ('corruption' and 'lr') when they were constructed, not the names of the Python variables (corruption_level and learning_rate). Keep this in mind when working with Theano.
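
For example, one pre-training step could be invoked as follows (a small usage sketch; the minibatch index and the hyperparameter values are chosen purely for illustration):

# Hypothetical call: one SGD step for the first-layer dA on minibatch 5,
# using a 30% corruption level and a learning rate of 0.01.
layer0_cost = pretrain_fns[0](index=5, corruption=0.3, lr=0.01)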

In the same fashion, we build a method that constructs the functions required during fine-tuning: train_fn, valid_score and test_score.

    def build_finetune_functions(self, datasets, batch_size, learning_rate):
        '''Generates a function `train` that implements one step of
        finetuning, a function `validate` that computes the error on
        a batch from the validation set, and a function `test` that
        computes the error on a batch from the testing set

        :type datasets: list of pairs of theano.tensor.TensorType
        :param datasets: It is a list that contains all the datasets;
                         it has to contain three pairs, `train`,
                         `valid`, `test` in this order, where each pair
                         is formed of two Theano variables, one for the
                         datapoints, the other for the labels

        :type batch_size: int
        :param batch_size: size of a minibatch

        :type learning_rate: float
        :param learning_rate: learning rate used during finetune stage
        '''

        (train_set_x, train_set_y) = datasets[0]
        (valid_set_x, valid_set_y) = datasets[1]
        (test_set_x, test_set_y) = datasets[2]

        # compute number of minibatches for training, validation and testing
        n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
        n_valid_batches //= batch_size
        n_test_batches = test_set_x.get_value(borrow=True).shape[0]
        n_test_batches //= batch_size

        index = T.lscalar('index')  # index to a [mini]batch

        # compute the gradients with respect to the model parameters
        gparams = T.grad(self.finetune_cost, self.params)

        # compute list of fine-tuning updates
        updates = [
            (param, param - gparam * learning_rate)
            for param, gparam in zip(self.params, gparams)
        ]

        train_fn = theano.function(
            inputs=[index],
            outputs=self.finetune_cost,
            updates=updates,
            givens={
                self.x: train_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: train_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            },
            name='train'
        )

        test_score_i = theano.function(
            [index],
            self.errors,
            givens={
                self.x: test_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: test_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            },
            name='test'
        )

        valid_score_i = theano.function(
            [index],
            self.errors,
            givens={
                self.x: valid_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: valid_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            },
            name='valid'
        )

        # Create a function that scans the entire validation set
        def valid_score():
            return [valid_score_i(i) for i in range(n_valid_batches)]

        # Create a function that scans the entire test set
        def test_score():
            return [test_score_i(i) for i in range(n_test_batches)]

        return train_fn, valid_score, test_score


Note that valid_score and test_score are not Theano functions but Python functions. They loop over the entire validation set and the entire test set, respectively, each producing a list of the losses over that set.
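
As a quick illustration of how they are meant to be used (a sketch only; valid_score is assumed to come from build_finetune_functions as above), the validation error of the model is simply the mean of the per-minibatch errors:

# valid_score() returns one error value per validation minibatch,
# so averaging them gives the overall validation error.
this_validation_loss = numpy.mean(valid_score())
print('validation error %f %%' % (this_validation_loss * 100.))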

Putting It All Together

The following code constructs the stacked denoising autoencoder:

numpy_rng = numpy.random.RandomState(89677)
print('... building the model')
# construct the stacked denoising autoencoder class
sda = SdA(
    numpy_rng=numpy_rng,
    n_ins=28 * 28,
    hidden_layers_sizes=[1000, 1000, 1000],
    n_outs=10
)


Training this network involves two stages: layer-wise pre-training followed by fine-tuning.

For the pre-training stage, we loop over all the layers of the network. For each layer, we use the compiled Theano function that performs one step of SGD to optimize the weights so as to reduce the reconstruction cost of that layer. This function is applied to the training set for a number of epochs given by pretraining_epochs.

print('... getting the pretraining functions')
pretraining_fns = sda.pretraining_functions(train_set_x=train_set_x,
                                            batch_size=batch_size)

print('... pre-training the model')
start_time = timeit.default_timer()
# Pre-train layer-wise
corruption_levels = [.1, .2, .3]
for i in range(sda.n_layers):
    # go through pretraining epochs
    for epoch in range(pretraining_epochs):
        # go through the training set
        c = []
        for batch_index in range(n_train_batches):
            c.append(pretraining_fns[i](index=batch_index,
                     corruption=corruption_levels[i],
                     lr=pretrain_lr))
        print('Pre-training layer %i, epoch %d, cost %f' % (i, epoch, numpy.mean(c)))

end_time = timeit.default_timer()

print(('The pretraining code for file ' +
       os.path.split(__file__)[1] +
       ' ran for %.2fm' % ((end_time - start_time) / 60.)), file=sys.stderr)


The fine-tuning loop is very similar to the one in the Multilayer Perceptron chapter; the only difference is that it uses the functions returned by build_finetune_functions.
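
As a rough sketch (not the actual tutorial loop, which also implements early stopping with a patience mechanism), the fine-tuning stage could look like this, assuming datasets, batch_size, finetune_lr, training_epochs and n_train_batches are defined as in the rest of the script:

print('... getting the finetuning functions')
train_fn, valid_score, test_score = sda.build_finetune_functions(
    datasets=datasets,
    batch_size=batch_size,
    learning_rate=finetune_lr
)

print('... finetuning the model')
for epoch in range(training_epochs):
    # one pass of SGD over the training set
    for minibatch_index in range(n_train_batches):
        train_fn(minibatch_index)
    # monitor the validation error after each epoch
    print('epoch %i, validation error %f %%' %
          (epoch, numpy.mean(valid_score()) * 100.))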

Running the Code

Run the code with:

python code/SdA.py


With the default parameters, the program runs 15 pre-training epochs for each layer with a minibatch size of 1. The corruption level is 0.1 for the first layer, 0.2 for the second and 0.3 for the third. The pre-training learning rate is 0.001 and the fine-tuning learning rate is 0.1. Pre-training takes 585.01 minutes, an average of 13 minutes per epoch. Fine-tuning completes after 36 epochs in 444.2 minutes, an average of 12.34 minutes per epoch. The final validation score is 1.39% and the test score is 1.3%. These results were obtained on an Intel Xeon E5430 @ 2.66GHz CPU with a single-threaded GotoBLAS.

Tips and Tricks

One way to reduce the running time (assuming you have enough memory) is to pre-compute how the network up to layer k-1 transforms the data. Start by training the first dA. Once it is trained, compute the hidden-unit values for every datapoint in the dataset and store them as a new dataset. Use this new dataset to train the second layer in the same way, then build the dataset for the third layer, and so on. At this point the dAs are trained individually; each one simply learns a non-linear transformation of its input. Once all dAs are trained, you can start fine-tuning the model, as sketched below.
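
One way to realize this in Theano (a hedged sketch; get_hidden and new_train_set_x are made-up names, and the tutorial code itself does not include this step) is to compile a function that pushes the raw data through the already-trained first layer and to store the result as a new shared dataset:

# Hypothetical sketch: propagate the whole training set through the first
# (already pre-trained) sigmoid layer and keep the result as a new dataset.
get_hidden = theano.function(
    inputs=[],
    outputs=sda.sigmoid_layers[0].output,
    givens={sda.x: train_set_x}
)
new_train_set_x = theano.shared(numpy.asarray(get_hidden(),
                                              dtype=theano.config.floatX),
                                borrow=True)
# new_train_set_x can now play the role of train_set_x when pre-training the
# second dA, and the same trick can be repeated for the higher layers.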

References

[1] http://deeplearning.net/tutorial/SdA.html#sda