[theano-windows] Study Notes 6: scan, the loop function in Theano

2017-09-05 16:15

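Preface

Scan is the most basic looping function in Theano. The official tutorial explains it mainly through a large number of examples. When learning something new, though, I prefer to look at what it is for first, then at the parameter descriptions, and only then study the examples.

As usual, the reference links:

The 11 Scan examples on the official site

The more complete introduction on the official site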

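Introduction

Purpose

scan is a general form of recurrence that can be used for looping.

Reduction and map are two special cases of scan.

scan can apply a function along an input sequence and produce an output at every time step; that output is visible to the next call of the function.

The function can also see the results of its previous K steps.

sum() can be computed by scanning the function z + x(i) over a list, with the initial state z = 0.

An ordinary for loop can usually be implemented with scan(), and scan() is the closest thing Theano has to a loop construct.

Advantages of looping with scan():

The number of iterations can be part of the symbolic graph.

Transfers to and from the GPU are minimized.

Gradients are computed through the sequential steps.

It is slightly faster than a for loop written in Python.

It can lower the overall memory usage by detecting the amount of memory actually needed.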

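Reference manual

The two special cases

A reduce operation can be obtained by returning only the last output of scan.

A map operation can be obtained by having the function ignore the outputs of previous steps.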
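To make the relationship between scan, reduce and map concrete, here is a minimal sketch in plain Python rather than Theano. It only models the semantics described above (the standard-library `accumulate`, `reduce` and `map` stand in for the Theano functions, and the list `xs` is made up for illustration); it says nothing about how Theano implements them.

```python
from functools import reduce
from itertools import accumulate

xs = [1, 2, 3, 4]

# scan: keep every intermediate state of z <- z + x(i), starting from z = 0
# (the initial state itself is dropped, as scan does not return it)
running = list(accumulate(xs, lambda z, x: z + x, initial=0))[1:]

# reduce: the same recurrence, but only the last output is kept
total = reduce(lambda z, x: z + x, xs, 0)

# map: the step function never looks at the outputs of earlier steps
squares = list(map(lambda x: x * x, xs))

print(running)   # [1, 3, 6, 10]
print(total)     # 10
print(squares)   # [1, 4, 9, 16]
```

The last element of `running` equals `total`, which is exactly the "reduce is the last output of scan" statement.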

Calling any of the following functions uses the Scan op:

theano.map(fn, sequences, non_sequences=None, truncate_gradient=-1, go_backwards=False, mode=None, name=None)

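Parameters (only some are described here; see the second reference link for the rest):

fn is the function applied at every iteration step.

sequences is the list of sequences to iterate over.

non_sequences is the list of arguments passed to fn that are not iterated over.

go_backwards is a bool. If True, the sequences are traversed from the last element towards the first.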

theano.reduce(fn, sequences, outputs_info, non_sequences=None, go_backwards=False, mode=None, name=None)

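Parameters:

fn is the function applied at every iteration step.

sequences is the list of sequences to iterate over.

outputs_info is the list of dictionaries describing the outputs of reduce.

non_sequences is the list of arguments passed to fn that do not take part in the iteration.

go_backwards is a bool. If True, the sequences are traversed from the last element towards the first.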

theano.foldl(fn, sequences, outputs_info, non_sequences=None, mode=None, name=None)
theano.foldr(fn, sequences, outputs_info, non_sequences=None, mode=None, name=None)

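Parameters (for background on foldl and foldr, see link 1, link 2 and link 3):

fn is the function executed at every iteration.

sequences is the list of sequences to iterate over.

outputs_info is the list of dictionaries describing the outputs.

non_sequences is the list of arguments passed to fn that are not iterated over.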

# parameter list of the scan function
theano.scan(fn, sequences=None, outputs_info=None, non_sequences=None, n_steps=None, truncate_gradient=-1, go_backwards=False, mode=None, name=None, profile=False, allow_gc=None, strict=False, return_list=False)

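Parameters:

fn is the operation scan performs at every step. It should build the variables that describe the output of one iteration. The inputs it receives are Theano variables representing all the slices of the input sequences and the previous values of the outputs; the non_sequences are passed to fn as well. The variables are passed to fn in the following order:

all time slices of the first sequence

all time slices of the second sequence

…

all time slices of the last sequence

all past slices of the first output

all past slices of the second output (as an aside, the Theano docs have quite a few typos; even output gets spelled otuput)

…

all past slices of the last output

all other arguments (the list given as non_sequences)

The sequences appear in the same order as in the sequences list given to scan, and the outputs appear in the same order as in outputs_info.

The official site gives an example of the input and output ordering:

# Suppose scan is called with the following arguments

scan(fn, sequences = [ dict(input= Sequence1, taps = [-3,2,-1])
                     , Sequence2
                     , dict(input =  Sequence3, taps = 3) ]
       , outputs_info = [ dict(initial =  Output1, taps = [-3,-5])
                        , dict(initial = Output2, taps = None)
                        , Output3 ]
       , non_sequences = [ Argument1, Argument2])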

Then fn receives its arguments in the following order:

# order of the arguments fn receives from scan

Sequence1[t-3]
Sequence1[t+2]
Sequence1[t-1]
Sequence2[t]
Sequence3[t+3]
Output1[t-3]
Output1[t-5]
Output3[t-1]
Argument1
Argument2
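The ordering rule can be checked mechanically. The helper below is a hypothetical illustration written for this note (`fn_argument_order` is not a Theano function): it takes the sequences and outputs as `(name, taps)` pairs and lists, as strings, the arguments fn would receive. It assumes the defaults described in this manual: a bare sequence has taps `[0]`, a bare output has taps `[-1]`, and an output with `taps=None` is not passed to fn.

```python
def fn_argument_order(sequences, outputs_info, non_sequences):
    """Return labels for the arguments fn receives, in order.

    sequences / outputs_info are lists of (name, taps) pairs;
    taps=None means the output is not fed back to fn.
    """
    def label(name, tap):
        if tap == 0:
            return "%s[t]" % name
        return "%s[t%+d]" % (name, tap)   # e.g. "Sequence1[t-3]", "Sequence1[t+2]"

    args = []
    for name, taps in sequences:          # sequences first, in the given order
        args.extend(label(name, k) for k in taps)
    for name, taps in outputs_info:       # then past slices of the outputs
        if taps is not None:
            args.extend(label(name, k) for k in taps)
    args.extend(non_sequences)            # non_sequences always come last
    return args


order = fn_argument_order(
    sequences=[("Sequence1", [-3, 2, -1]), ("Sequence2", [0]), ("Sequence3", [3])],
    outputs_info=[("Output1", [-3, -5]), ("Output2", None), ("Output3", [-1])],
    non_sequences=["Argument1", "Argument2"],
)
for name in order:
    print(name)
```

The printed list matches the ten arguments shown above, with Output2 absent because its taps are None.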

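The non_sequences list may contain shared variables. scan can figure these out on its own, so they can be skipped, but for the clarity of the code it is still recommended to provide them. To some extent scan can also figure out other (non-shared) non_sequences that fn uses even if they were not passed to scan. A simple example:

import theano.tensor as TT
W   = TT.matrix()
W_2 = W**2
def f(x):
    return TT.dot(x, W_2)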

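fn is expected to return two things:

One is a list of outputs, in the same order as outputs_info, except that there must be exactly one output variable per output initial state (even if no tap value is used).

The other is an updates dictionary, which tells scan how to update the shared variables after each iteration. The dictionary can also be given as a list of tuples.

There is no constraint on the order of these two return values: fn can return (output_list, update_dictionary) or (update_dictionary, output_list), or just one of them (in which case the other is taken to be empty).

To use scan as a while loop, fn also needs to return a stopping condition, wrapped in the until class. The condition should be returned as the third element, for example: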

return [y1_t, y2_t], {x:x+1}, theano.scan_module.until(x < 50)

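sequences is the list of Theano variables or dictionaries describing the sequences scan iterates over. If a dictionary is given, optional information about the sequence can be supplied. The dictionary should contain the following keys:

input (mandatory): the Theano variable representing the sequence.

taps: the temporal taps of the sequence required by fn, given as a list of integers. A value k means that at iteration step t, scan passes the slice t+k to fn. The default value is [0].

Any Theano variable in the sequences list is automatically wrapped into a dictionary with taps set to [0].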

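outputs_info is the list of Theano variables or dictionaries describing the initial state of the outputs that are computed recurrently. When the initial state is given as a dictionary, optional information about the output can be supplied. The dictionary should contain:

initial: the Theano variable representing the initial state of the given output. If the output is not computed recursively and does not need an initial state, this field can be omitted.

taps: the temporal taps of the output that are passed to fn, given as a list of negative integers. A value k means that at iteration step t, scan passes the slice t+k to fn.

If outputs_info is an empty list or None, scan assumes that no taps are used for any of the outputs. If information is provided for only a subset of the outputs, an error is raised (because there is no convention for how the provided information should be mapped to the outputs of fn).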

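non_sequences is the list of arguments passed to fn at every step. Variables used by fn can be left out of this list, but doing so is not recommended.

n_steps is the number of iterations.

truncate_gradient is the number of steps used in truncated BPTT (Backpropagation Through Time). This should be what is needed for the gradient updates of an RNN.

go_backwards: a flag indicating whether scan should traverse the sequences backwards. If the sequences are indexed by time, setting it to True makes scan start from the end and move towards 0.

name: when profiling scan, it is important to give every scan instance a name. The profiler then provides an overall profile of your code and can also profile the computation of one step of each instance.

mode: it is recommended to leave this argument as None.

profile: not looked into yet.

allow_gc: not looked into yet.

strict: if True, every shared variable used in fn must be provided as part of non_sequences or sequences.

return_list: if True, a list is returned even when there is only one output.

The return value is a tuple of the form (outputs, updates).

outputs is a Theano variable or a list of Theano variables, in the same order as outputs_info.

updates is a subclass of dict that specifies the update rules for the shared variables. This dictionary should be passed to theano.function. Unlike a plain dictionary, its keys are checked to be shared variables, and adding such dictionaries together is checked for consistency.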

theano.scan_checkpoints(fn, sequences=[], outputs_info=None, non_sequences=[], name='checkpointscan_fn', n_steps=None, save_every_N=10, padding=True)

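Description

A more memory-efficient version of scan, with stricter usage. In scan(), computing the gradient of the outputs with respect to the inputs requires storing the intermediate results of every step, which costs a lot of memory. scan_checkpoints() instead runs save_every_N forward steps at a time without storing the intermediate results, and recomputes them during the gradient computation.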

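Parameters:

fn: the function applied at each iteration.

sequences: a list of Theano variables or dictionaries describing the sequences scan iterates over. All sequences must have the same length.

outputs_info: the initial state of the outputs computed recurrently, given as a list of Theano variables or dictionaries.

non_sequences: the list of arguments passed to fn.

n_steps: the number of iterations.

save_every_N: the number of steps to run without storing the computations of scan.

padding: if the length of the sequences is not a multiple of save_every_N, the sequences are padded with 0 so that scan works properly.

Output: as with scan(), a tuple (outputs, updates). The difference is that it only contains the output of every save_every_N-th step. The time steps that are not returned by the function are recomputed during the gradient computation.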
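The trade-off behind save_every_N can be sketched in plain Python. This is only a toy model of the checkpointing idea under my own assumptions (the helper names `scan_with_checkpoints` and `state_at` are made up for this note, and Theano's real implementation works on symbolic graphs, not Python values): every save_every_n-th state is stored, and any other state is recovered by replaying the step function from the nearest earlier checkpoint.

```python
def scan_with_checkpoints(step, state, n_steps, save_every_n):
    """Run step n_steps times, keeping only every save_every_n-th state."""
    checkpoints = {0: state}
    for t in range(1, n_steps + 1):
        state = step(state)
        if t % save_every_n == 0:
            checkpoints[t] = state
    return checkpoints


def state_at(step, checkpoints, t, save_every_n):
    """Recover the state after t steps by replaying from the nearest checkpoint."""
    start = (t // save_every_n) * save_every_n
    state = checkpoints[start]
    for _ in range(t - start):      # recompute the steps that were not stored
        state = step(state)
    return state


def double(s):
    return 2 * s


saved = scan_with_checkpoints(double, 1, n_steps=10, save_every_n=5)
print(sorted(saved))                   # [0, 5, 10]
print(state_at(double, saved, 7, 5))   # 128
```

Only three states are stored instead of eleven, at the cost of replaying at most save_every_n - 1 steps whenever an intermediate state is needed.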

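Examples

All of the examples below assume the following imports:

import theano
import theano.tensor as T
import numpy as np

1. Computing A**k

Written as a plain Python loop, it could look like this:

# compute A**k
k = 2
A = 3
result = 1
for i in range(k):
    result = result * A
print(result)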

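Three things need to be handled here: the initial value of result, the accumulated value of result, and the unchanging variable A. The unchanging variable goes into non_sequences, the initialization goes into outputs_info, and the accumulation happens automatically:

k = T.iscalar('k')
A = T.vector('A')  # by convention it would be better to write T.dvector or similar
result, updates = theano.scan(fn=lambda prior_result, A: prior_result * A,  # function applied at each step
                              outputs_info=T.ones_like(A),  # passed to prior_result
                              non_sequences=A,              # passed to A
                              n_steps=k)                    # number of iterations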

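Note the fixed order in which fn receives its arguments in the code above: the prior output (the initial value) first, then the non_sequences. Since scan() returns the result of every iteration, we only need to take the last one:

final_result = result[-1]

Then compile it with function to get the result:

power = theano.function(inputs=[A, k], outputs=final_result, updates=updates)  # compile into a function

This returns the squares of 0~9:

print(power(range(10), 2))
#[  0.   1.   4.   9.  16.  25.  36.  49.  64.  81.]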

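2. Iterating over the leading dimension: computing a polynomial

Besides iterating a fixed number of times, scan() can also iterate over the leading dimension of a tensor. The first example corresponds to for i in range(k), whereas this one corresponds to for iter in a_list. The tensors to loop over are passed with the sequences keyword.

This example builds a symbolic computation from a list of coefficients:

$f = c[1] \cdot x^0 + c[2] \cdot x^1 + c[3] \cdot x^2$

Following the same steps as before, first define two variables for the coefficients and the input variable. The powers run over 0~inf, so arange can supply them. No initial state is needed for the output, because the output is not computed recursively:

coefficients = T.dvector('coefficients')  # coefficients
x = T.dscalar('x')                        # variable
max_coefficients_supported = 10000        # upper bound on the powers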

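With the variables defined, follow the argument order of the fn defined by the lambda: pass sequences to supply the coefficients and the powers, then use outputs_info to initialize the output. Since the output needs neither initialization nor recursive computation, it is None, and the argument could also be omitted. Finally, pass the variable in non_sequences. Always keep in mind that the arguments are passed to fn in the order sequences, outputs_info, non_sequences.

components, updates = theano.scan(lambda coefficients, power, free_variable: coefficients * (free_variable ** power),
                                  outputs_info=None,
                                  sequences=[coefficients, T.arange(max_coefficients_supported)],
                                  non_sequences=x)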

To walk through it: the coefficients in sequences are passed to the coefficients argument of the lambda, and the powers defined by T.arange(max_coefficients_supported) are passed to power. Because outputs_info is None, it contributes nothing and can be skipped, so non_sequences is passed next, to free_variable. What remains is to sum the components, compile with function, and test:

polynomial = components.sum()
calculate_polynomial = theano.function(inputs=[coefficients, x], outputs=polynomial)
# test
test_coefficients = np.asarray([1, 0, 2], dtype=np.float32)
test_value = 3
print('use scan result:', calculate_polynomial(test_coefficients, test_value))
print('use normal calc:', (1.0 * (3 ** 0) + 0.0 * (3 ** 1) + 2.0 * (3 ** 2)))
#use scan result: 19.0
#use normal calc: 19.0

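A few interesting things to note:

First, the individual terms are generated and then summed. The sum could also be accumulated along the way, keeping only the last value, which is more memory-efficient.

Second, there is no accumulation of results, so outputs_info=None. This tells scan not to pass any prior result to fn. Note the order in which arguments are passed: sequences (if any), prior result(s) (if needed), non-sequences (if any)

Third, there is a handy trick: theano.tensor.arange is used as one of the sequences. Why can sequences of different lengths be passed to fn? See the fourth point.

Fourth, if the given sequences do not all have the same length, scan truncates them to the shortest one. The powers would run over 0~9999, but coefficients and T.arange(max_coefficients_supported) are truncated to the shorter of the two.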
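The truncation rule behaves like Python's built-in `zip`, which also stops at the shortest input. As a plain-Python analogy (not Theano code; the variable names simply mirror the example above), the same polynomial can be evaluated like this:

```python
coefficients = [1.0, 0.0, 2.0]
x = 3.0
max_coefficients_supported = 10000

# zip stops as soon as the shorter input (the coefficients) is exhausted,
# so only the powers 0, 1, 2 are ever used
components = [c * x ** p
              for c, p in zip(coefficients, range(max_coefficients_supported))]
polynomial = sum(components)

print(components)   # [1.0, 0.0, 18.0]
print(polynomial)   # 19.0
```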

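Next, let's write a version with an intermediate variable ourselves. The running sum is kept in the prior output results, which is stored in the outputs_info argument of scan():

# my own attempt at a version with a running sum
results = np.array([0], dtype='int32')
c = T.vector('c', dtype='int32')
x = T.scalar('x', dtype='int32')
components, updates = theano.scan(lambda i, results, c, x: results + c[i] * (x ** i),
                                  sequences=T.arange(c.shape[0], dtype='int32'),
                                  outputs_info=results,
                                  non_sequences=[c, x])
final_res = components[-1]
cal_poly = theano.function(inputs=[c, x], outputs=final_res)
test_c = np.asarray([1, 0, 2], dtype=np.int32)
test_value = 3
print(cal_poly(test_c, test_value))
#19

[PS] Who knows whether I got this right. If anything looks wrong, please let me know; for now at least the result is correct. The main pitfall is that the arguments passed to fn must all have the same dtype. At first I declared results=0 directly, only to find out that it was treated as int8, which kept raising errors.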

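3. Simple scalar addition, without a lambda expression

In the examples above the step function was written as a lambda expression inside theano.scan. One thing to always keep in mind: the initial state given in outputs_info must have the same shape as the output variable produced at each iteration.

The following computes

$\text{results} = \frac{n(n+1)}{2} = 1 + 2 + 3 + \dots + n$

First define the variable n, and define the addition function separately with def:

# define n
up_to = T.iscalar('up_to')
# the addition simply adds the next number to the previous result, so scan does the looping
def accumulate_by_adding(arrange_val, sum_to_date):
    return sum_to_date + arrange_val  # the return value is fed back as sum_to_date at the next step
seq = T.arange(up_to)

Define the loop in scan:

# define the scan op
outputs_info = T.as_tensor_variable(np.array(0, seq.dtype))
scan_result, scan_updates = theano.scan(accumulate_by_adding,
                                        sequences=seq,              # passed to arrange_val
                                        outputs_info=outputs_info,  # passed to sum_to_date
                                        non_sequences=None)
triangular_sequence = theano.function(inputs=[up_to], outputs=scan_result)

Test it:

# test
some_num = 15
print(triangular_sequence(some_num))
print([n * (n + 1) // 2 for n in range(some_num)])
#[  0   1   3   6  10  15  21  28  36  45  55  66  78  91 105]
#[0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105]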

4. Setting values at given indices

This example defines an all-zero matrix and then assigns values at given indices, for example changing the 0 at (1,1) to 42 and setting (2,3) to 50.

First define three variables:

location = T.imatrix('location')          # positions
values = T.vector('values')               # values to assign at those positions
output_model = T.matrix('output_model')   # output matrix

Then define the replacement function. Note that set_subtensor from theano.tensor can replace values; it was mentioned in the post "[theano-windows] Study Notes 5: some tensor functions in Theano".

# define the replacement function
def set_value_at_position(a_location, a_value, output_model):
    zeros = T.zeros_like(output_model)
    zeros_subtensor = zeros[a_location[0], a_location[1]]
    return T.set_subtensor(zeros_subtensor, a_value)  # replace the value

Then set up scan and compile with function:

# set up scan
result, updates = theano.scan(fn=set_value_at_position,
                              outputs_info=None,
                              sequences=[location, values],
                              non_sequences=output_model)
assign_values_at_positions = theano.function(inputs=[location, values, output_model],
                                             outputs=result)

Test:

# test
test_locations = np.asarray([[1, 1], [2, 3]], dtype=np.int32)
test_values = np.asarray([42, 50], dtype=np.float32)
test_output_model = np.zeros((5, 5), dtype=np.float32)
print(assign_values_at_positions(test_locations, test_values, test_output_model))
'''
[[[  0.   0.   0.   0.   0.]
  [  0.  42.   0.   0.   0.]
  [  0.   0.   0.   0.   0.]
  [  0.   0.   0.   0.   0.]
  [  0.   0.   0.   0.   0.]]

 [[  0.   0.   0.   0.   0.]
  [  0.   0.   0.   0.   0.]
  [  0.   0.   0.  50.   0.]
  [  0.   0.   0.   0.   0.]
  [  0.   0.   0.   0.   0.]]]
'''

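5. Shared variables: Gibbs sampling

Example: run ten steps of Gibbs sampling.

Expression loop: $P(h \mid v), P(v \mid h), P(h \mid v), \dots, P(v \mid h)$, where $P(h \mid v) = \mathrm{sigmoid}(w \cdot v + b_h)$ and $P(v \mid h) = \mathrm{sigmoid}(w \cdot h + b_v)$

Define three variables: the weights, the visible-layer bias and the hidden-layer bias.

W = theano.shared(W_values)        # weights
bvis = theano.shared(bvis_values)  # visible-layer bias
bhid = theano.shared(bhid_values)  # hidden-layer bias

Compute the probabilities and sample:

# random stream
trng = T.shared_randomstreams.RandomStreams(1234)

# one step of Gibbs sampling
def OneStep(vsample):
    hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)       # from v to h: activation probability
    hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)     # sample
    vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)     # from h to v: activation probability
    return trng.binomial(size=vsample.shape, n=1, p=vmean, dtype=theano.config.floatX)  # sample

Loop ten times in scan, and use function to apply the updates:

sample = T.vector()

values, updates = theano.scan(OneStep,
                              sequences=None,
                              outputs_info=sample,
                              n_steps=10)

gibbs10 = theano.function([sample], values[-1], updates=updates)

[Note] This code cannot be run as it stands for now. I will look at it in detail later, when building a restricted Boltzmann machine (RBM) with theano.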

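Two things are worth noting here:

The first is the importance of the updates dictionary. It links the shared variables to their updated values after k steps; here it describes how the random streams are updated after ten iterations. If the updates dictionary is not passed to function, you get the same ten sets of random numbers on every call. For example:

a = theano.shared(1)
b = T.dscalar('b')
c = T.dscalar('c')

values, updates = theano.scan(lambda: {a: a + 1}, n_steps=10)
b = a + 1
c = updates[a] + 1
f = theano.function([], [b, c], updates=updates)
print(f())            # [array(2), array(12)]
print(a.get_value())  # 11
print(f())            # [array(12), array(22)]
print(a.get_value())  # 21

[Note] The way the official documentation writes this example may have a problem. You can use my modified version as a reference, though mine is not necessarily right either.

In this example, the difference between b and c is that one goes through updates and the other does not. c, which is built from updates, sees the updated value of a after the iterations, 11, whereas for b, which does not use the update rules, the value of a is still 1. That is why the two results are 1+1=2 and 11+1=12.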
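The numbers in the example can be reproduced with a small plain-Python model. This is only my own mental model of what the compiled function does, not Theano code (`Shared` and `f` here are hypothetical stand-ins): outputs are computed from the shared value as it is when the call starts, the ten in-loop increments produce a new value, and that new value is committed to the shared variable only because updates was passed.

```python
class Shared:
    """Toy stand-in for a shared variable: just a mutable box."""
    def __init__(self, value):
        self.value = value


a = Shared(1)


def f():
    b = a.value + 1        # built from a itself: sees the value before this call
    new_a = a.value
    for _ in range(10):    # the ten scan steps, each applying a -> a + 1
        new_a = new_a + 1
    c = new_a + 1          # built from updates[a]: sees the value after the loop
    a.value = new_a        # passing updates=updates commits the new value
    return [b, c]


first = f()
a_after_first = a.value
second = f()
a_after_second = a.value
print(first, a_after_first)     # [2, 12] 11
print(second, a_after_second)   # [12, 22] 21
```

If the line that commits `new_a` were removed, which corresponds to not passing updates to function, every call would return [2, 12].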

The second is that if you use shared variables but do not want to iterate over them, you do not have to pass them as arguments. It is still recommended to pass them to Scan through the non_sequences argument, because that saves scan the time of finding them and adding them to the graph. With that in mind, the Gibbs sampling can be written again:

W = theano.shared(W_values)
bvis = theano.shared(bvis_values)
bhid = theano.shared(bhid_values)

trng = T.shared_randomstreams.RandomStreams(1234)

def OneStep(vsample, W, bvis, bhid):
    hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
    hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
    vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
    return trng.binomial(size=vsample.shape, n=1, p=vmean,
                         dtype=theano.config.floatX)

sample = T.vector()
values, updates = theano.scan(fn=OneStep,
                              sequences=None,
                              outputs_info=sample,
                              non_sequences=[W, bvis, bhid],
                              n_steps=10)
gibbs10 = theano.function([sample], values[-1], updates=updates)

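As said above, passing the shared variables to scan can simplify the computational graph, which speeds up both optimization and execution. A good way to remember to pass every shared variable to scan is the strict flag. When it is set to True, scan checks that all the shared variables needed in fn have been passed to fn explicitly. This has to be ensured by the user; otherwise an error is raised.

So the Gibbs sampling can be written once more, this time with strict=True:

def OneStep(vsample):
    hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
    hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
    vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
    return trng.binomial(size=vsample.shape, n=1, p=vmean,
                         dtype=theano.config.floatX)

# set strict=True

values, updates = theano.scan(OneStep,
                              outputs_info=sample,
                              n_steps=10,
                              strict=True)  # the shared variables are not passed in, so this raises an error

The version above raises an error because the shared variables are not passed in. The error message is: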

Traceback (most recent call last):
...
MissingInputError: An input of the graph, used to compute
DimShuffle{1,0}(<TensorType(float64, matrix)>), was not provided and
not given a value.Use the Theano flag exception_verbosity='high',for
more information on this error.

Adding the non_sequences argument, and accepting the corresponding arguments in OneStep, fixes it:

def OneStep(vsample, W, bvis, bhid):
    hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
    hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
    vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
    return trng.binomial(size=vsample.shape, n=1, p=vmean,
                         dtype=theano.config.floatX)

# set strict=True

values, updates = theano.scan(OneStep,
                              sequences=None,
                              outputs_info=sample,
                              non_sequences=[W, bvis, bhid],
                              n_steps=10,
                              strict=True)

6. Conditional termination of Scan

Besides specifying the number of iterations with n_steps as above, Scan can also be stopped early by a condition, similar to while(condition). For example, we compute powers of 2 and stop once the value exceeds the max_value threshold:

def power_of_2(previous_power, max_value):
    return previous_power * 2, theano.scan_module.until(previous_power * 2 > max_value)

max_value = T.dscalar()
values, _ = theano.scan(power_of_2,
                        sequences=None,
                        outputs_info=T.constant(1.),
                        non_sequences=max_value,
                        n_steps=1024)
f = theano.function([max_value], values)
print(f(45))
#[ 2. 4. 8. 16. 32. 64.]

Note that theano.scan() keeps iterating on top of outputs_info, so the results are $1 \cdot 2 \cdot 2 \cdot 2 \cdots 2$.

As you can see, to stop the loop early the condition is evaluated inside the function, and the condition is wrapped in the class theano.scan_module.until.
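One detail that is easy to miss in the output above is that 64 is included even though it already exceeds 45: the step that triggers the condition still contributes its output. The plain-Python sketch below models that behaviour (`scan_until` is a hypothetical helper written for this note, and `max_value` is fixed as a default argument only to keep the sketch short).

```python
def scan_until(fn, initial, n_steps):
    """Apply fn repeatedly; fn returns (new_state, stop_flag)."""
    outputs, state = [], initial
    for _ in range(n_steps):          # n_steps is only an upper bound
        state, stop = fn(state)
        outputs.append(state)         # the output of the stopping step is kept
        if stop:
            break
    return outputs


def power_of_2(previous_power, max_value=45):
    new_power = previous_power * 2
    return new_power, new_power > max_value


values = scan_until(power_of_2, 1.0, n_steps=1024)
print(values)   # [2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
```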

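7. Multiple outputs, multiple time taps: RNN

The examples so far were simple uses of scan. However, scan supports more than the prior result and the current sequence value: it can also look back more than one step. Suppose, for example, that we design an RNN defined as follows (the equations are reconstructed here from the oneStep code):

$x(n) = \tanh\big(W\,x(n-1) + W^{in}_1\,u(n) + W^{in}_2\,u(n-4) + W^{feedback}\,y(n-1)\big)$

$y(n) = W^{out}\,x(n-3)$

[Note] This network is far from a classic RNN and may not be useful for anything; it mainly serves to illustrate clearly how scan can look back. An actual RNN implementation will follow later.

In this example we have one sequence u to iterate over and two outputs x, y. One iteration step is computed as:

# RNN
def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1, W, W_in_1, W_in_2, W_feedback, W_out):
    x_t = T.tanh(theano.dot(x_tm1, W) +
                 theano.dot(u_t, W_in_1) +
                 theano.dot(u_tm4, W_in_2) +
                 theano.dot(y_tm1, W_feedback))
    y_t = theano.dot(x_tm3, W_out)
    return [x_t, y_t]

As described earlier, sequences and outputs_info in scan both accept a parameter called taps, which controls how far back the results reach. It is needed here to obtain the values at the various time steps:

W = T.matrix()
W_in_1 = T.matrix()
W_in_2 = T.matrix()
W_feedback = T.matrix()
W_out = T.matrix()

u = T.matrix()
x0 = T.matrix()
y0 = T.vector()

([x_vals, y_vals], updates) = theano.scan(fn=oneStep,
                                          sequences=dict(input=u, taps=[-4, -0]),
                                          outputs_info=[dict(initial=x0, taps=[-3, -1]), y0],
                                          non_sequences=[W, W_in_1, W_in_2, W_feedback, W_out],
                                          strict=True)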

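Now x_vals and y_vals are symbolic variables pointing to the sequences x and y generated by iterating over u. The sequences taps and outputs taps tell scan exactly which slices are needed. Note that if we want to use x[t-k], we do not necessarily need x[t-(k-1)], x[t-(k-2)],..., but when the compiled function is used, the numpy array representing the sequence must be large enough to cover these values. Suppose we compile the function above and pass uvals=[0,1,2,3,4,5,6,7,8] as u. scan will then treat uvals[0] as u[-4] and will scan from uvals[4] towards the end. For more on this, see the reference section of the official documentation.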
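The uvals example can be spelled out in plain Python. The helper below (`sequence_tap_windows`, a name made up for this note) is a sketch of my understanding of how sequence taps select slices, not Theano's code: the loop only covers the time steps for which every requested tap falls inside the array, so with taps [-4, 0] and nine values there are five steps.

```python
def sequence_tap_windows(values, taps):
    """For each valid step t, return the tuple of values[t + k] for k in taps."""
    lo = min(taps + [0])
    hi = max(taps + [0])
    first = -lo                       # first t for which t + lo is a valid index
    last = len(values) - 1 - hi       # last t for which t + hi is a valid index
    return [tuple(values[t + k] for k in taps) for t in range(first, last + 1)]


uvals = [0, 1, 2, 3, 4, 5, 6, 7, 8]
windows = sequence_tap_windows(uvals, [-4, 0])
print(windows)   # [(0, 4), (1, 5), (2, 6), (3, 7), (4, 8)]
```

Each tuple is the pair (u_tm4, u_t) that a step function such as oneStep would receive, so the first step pairs uvals[0] with uvals[4].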

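We have not built an RNN yet, but it is enough to know that scan can look back several steps, and that taps is how this is done. It will become clear once it is used in a real example later.

Simple hands-on examples

I may have changed the source code from the official site slightly, but the results should be correct. You can compare them to see what was added, which helps to get a grip on the basic use of theano's various parameters.

1. Elementwise computation of $\tanh(W \cdot x + b)$

# elementwise computation of tanh(x(t).dot(W) + b)
# define three variables
X = T.matrix('X')
W = T.matrix('W')
b_sym = T.vector('b_sym')
# compute with scan
results, updates = theano.scan(lambda v, W, b_sym: T.tanh(T.dot(v, W) + b_sym),
                               sequences=X,
                               outputs_info=None,
                               non_sequences=[W, b_sym])
compute_elementwise = theano.function(inputs=[X, W, b_sym], outputs=results)
# test
x = np.eye(2, dtype=theano.config.floatX)
w = np.ones((2, 2), dtype=theano.config.floatX)
b = np.ones((2), dtype=theano.config.floatX)
b[1] = 2

print(compute_elementwise(x, w, b))
# reference result
print(np.tanh(x.dot(w) + b))
'''
[[ 0.96402758  0.99505478]
 [ 0.96402758  0.99505478]]
[[ 0.96402758  0.99505478]
 [ 0.96402758  0.99505478]]
'''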

2. Computing a sequence that depends on one previous step

$x(t) = \tanh\big(W \cdot x(t-1) + U \cdot y(t) + V \cdot p(T-t)\big)$

Note that x(t−1) in this formula does not require taps=[-1] in the implementation, because each scan iteration already operates on the result of the previous one. The t and T−t that follow mean taking values in forward order and in reverse order, respectively.

# compute the sequence x(t) = tanh(x(t - 1).dot(W) + y(t).dot(U) + p(T - t).dot(V))
# define the parameters
X = T.vector("X")
W = T.matrix("W")
U = T.matrix("U")
Y = T.matrix("Y")
V = T.matrix("V")
P = T.matrix("P")
# iterate in scan
results, updates = theano.scan(lambda y, p, x_tm1: T.tanh(T.dot(x_tm1, W) + T.dot(y, U) + T.dot(p, V)),
                               sequences=[Y, P[::-1]],
                               outputs_info=[X],
                               non_sequences=None)
# compile with function
compute_seq = theano.function([X, W, Y, U, P, V], outputs=results)

# test
x = np.zeros((2), dtype=theano.config.floatX)
x[1] = 1
w = np.ones((2, 2), dtype=theano.config.floatX)
y = np.ones((5, 2), dtype=theano.config.floatX)
y[0, :] = -3
u = np.ones((2, 2), dtype=theano.config.floatX)
p = np.ones((5, 2), dtype=theano.config.floatX)
p[0, :] = 3
v = np.ones((2, 2), dtype=theano.config.floatX)
print(compute_seq(x, w, y, u, p, v))
# check the result with numpy
x_res = np.zeros((5, 2), dtype=theano.config.floatX)
x_res[0] = np.tanh(x.dot(w) + y[0].dot(u) + p[4].dot(v))
for i in range(1, 5):
    x_res[i] = np.tanh(x_res[i - 1].dot(w) + y[i].dot(u) + p[4 - i].dot(v))
print(x_res)
'''
[[-0.99505478 -0.99505478]
 [ 0.96471971  0.96471971]
 [ 0.99998587  0.99998587]
 [ 0.99998772  0.99998772]
 [ 1.          1.        ]]
[[-0.99505478 -0.99505478]
 [ 0.96471971  0.96471971]
 [ 0.99998587  0.99998587]
 [ 0.99998772  0.99998772]
 [ 1.          1.        ]]
'''

3. Computing the norms of the rows (columns) of X

# by rows
X = T.dmatrix('X')
results, updates = theano.scan(lambda x: T.sqrt((x ** 2).sum()),
                               sequences=[X],
                               outputs_info=None,
                               non_sequences=None)
compute_norm_lines = theano.function(inputs=[X], outputs=results)
# test
x = np.diag(np.arange(1, 6, dtype=theano.config.floatX), 1)
print(compute_norm_lines(x))
#[ 1.  2.  3.  4.  5.  0.]
# compare with numpy
print(np.sqrt((x ** 2).sum(1)))
#[ 1.  2.  3.  4.  5.  0.]

# by columns
X = T.dmatrix('X')
results, updates = theano.scan(lambda x: T.sqrt((x ** 2).sum()),
                               sequences=[X.T],
                               outputs_info=None,
                               non_sequences=None)
compute_norm_cols = theano.function(inputs=[X], outputs=results)
# test
x = np.diag(np.arange(1, 6, dtype=theano.config.floatX), 1)
print(compute_norm_cols(x))
#[ 0.  1.  2.  3.  4.  5.]
# compare with numpy
print(np.sqrt((x ** 2).sum(0)))
#[ 0.  1.  2.  3.  4.  5.]

4. Computing the trace of a matrix

The trace is simply the sum of the elements on the main diagonal. The row and column indices are iterated over together, so that each diagonal element X[i,i] is picked up in turn.

floatX = 'float32'
X = T.matrix('X')
results, _ = theano.scan(lambda i, j, traj: T.cast(X[i, j] + traj, floatX),
                         sequences=[T.arange(X.shape[0]), T.arange(X.shape[1])],
                         outputs_info=np.asarray(0., dtype=floatX),
                         non_sequences=None)
results = results[-1]
compute_traj = theano.function(inputs=[X], outputs=results)

# test
x = np.eye(5, dtype=theano.config.floatX)
x[0] = np.arange(5, dtype=theano.config.floatX)
print(compute_traj(x))
#4.0
# compare with numpy
print(np.diagonal(x).sum())
#4.0

5. Computing a sequence that depends on two previous steps

$x(t) = U \cdot x(t-2) + V \cdot x(t-1) + \tanh\big(W \cdot x(t-1) + b\big)$

This example needs the results of the two previous steps of x, which is done with taps. It is worth rereading the part of the reference manual above about how setting taps in outputs_info determines the arguments passed to fn.

U, V, W = T.matrices('U', 'V', 'W')
X = T.matrix('X')
b_sym = T.vector('b_sym')
n_sym = T.iscalar('n_sym')
# update
results, _ = theano.scan(lambda x_tm2, x_tm1: T.dot(x_tm2, U) + T.dot(x_tm1, V) + T.tanh(T.dot(x_tm1, W) + b_sym),
                         sequences=None,
                         outputs_info=[dict(initial=X, taps=[-2, -1])],
                         non_sequences=None,
                         n_steps=n_sym)
compute_seq2 = theano.function(inputs=[X, U, V, W, b_sym, n_sym], outputs=results)
# test
x = np.zeros((2, 2), dtype=theano.config.floatX)  # the initial value must be able to return x[-2]
x[1, 1] = 1
w = 0.5 * np.ones((2, 2), dtype=theano.config.floatX)
u = 0.5 * (np.ones((2, 2), dtype=theano.config.floatX) - np.eye(2, dtype=theano.config.floatX))
v = 0.5 * np.ones((2, 2), dtype=theano.config.floatX)
n = 10
b = np.ones((2), dtype=theano.config.floatX)

print(compute_seq2(x, u, v, w, b, n))
'''
[[  1.40514827   1.40514827]
 [  2.88898897   2.38898897]
 [  4.34018326   4.34018326]
 [  6.53463173   6.78463173]
 [  9.82972336   9.82972336]
 [ 14.22203922  14.09703922]
 [ 20.07440186  20.07440186]
 [ 28.12292099  28.18542099]
 [ 39.19137192  39.19137192]
 [ 54.28408051  54.25283051]]
'''
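The role of taps=[-2,-1] on an output is easier to see on a scalar recurrence. The following plain-Python sketch (`scan_two_taps` is a hypothetical helper, not part of Theano) uses the Fibonacci recurrence: the initial state has to hold two values, x[-2] and x[-1], which is the same reason the initial X above needs two rows.

```python
def scan_two_taps(fn, initial, n_steps):
    """initial holds x[-2] and x[-1]; fn(x_tm2, x_tm1) returns x[t]."""
    history = list(initial)
    for _ in range(n_steps):
        history.append(fn(history[-2], history[-1]))
    return history[len(initial):]     # the initial state is not part of the result


fib = scan_two_taps(lambda x_tm2, x_tm1: x_tm2 + x_tm1, [0, 1], 5)
print(fib)   # [1, 2, 3, 5, 8]
```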

6. Computing a Jacobian

$y = \tanh(A \cdot x), \qquad \frac{\partial y}{\partial x} = ?$

import theano
import theano.tensor as T
import numpy as np

# define the parameters
v = T.vector()
A = T.matrix()
y = T.tanh(T.dot(v, A))
# use grad to compute the first derivative
results, updates = theano.scan(lambda i: T.grad(y[i], v), sequences=[T.arange(y.shape[0])])
compute_jac_t = theano.function([A, v], results)  # shape (d_out, d_in)

# test
x = np.eye(5, dtype=theano.config.floatX)[0]
w = np.eye(5, 3, dtype=theano.config.floatX)
w[2] = np.ones((3), dtype=theano.config.floatX)
print(compute_jac_t(w, x))

# compare with the numpy result
print(((1 - np.tanh(x.dot(w)) ** 2) * w).T)
'''
[[ 0.4199743  0.         0.4199743  0.         0.       ]
 [ 0.         1.         1.         0.         0.       ]
 [ 0.         0.         1.         0.         0.       ]]
[[ 0.41997433  0.          0.41997433  0.          0.        ]
 [ 0.          1.          1.          0.          0.        ]
 [ 0.          0.          1.          0.          0.        ]]
'''

7. Accumulating during the loop

The main point is to use a shared variable, and to update it directly in function with the updates returned by scan.

# accumulate during the loop
k = theano.shared(0)
n_sym = T.iscalar('n_sym')
results, updates = theano.scan(lambda: {k: (k + 1)},
                               sequences=None,
                               outputs_info=None,
                               non_sequences=None,
                               n_steps=n_sym)
accumulator = theano.function(inputs=[n_sym], updates=updates)

print(k.get_value())  # 0
accumulator(5)
print(k.get_value())  # 5

8. Multiplying by a binomial sample

$\tanh(W \cdot v + b) \cdot d, \quad \text{where } d \in \text{binomial}$

# define the variables
W = T.matrix('W')
V = T.matrix('V')
b_sym = T.vector('b_sym')
# define a binomial distribution
trng = T.shared_randomstreams.RandomStreams(1234)
d = trng.binomial(size=W[1].shape)

# define the multiplication
results, updates = theano.scan(lambda v: T.tanh(T.dot(v, W) + b_sym) * d,
                               sequences=V,
                               outputs_info=None,
                               non_sequences=None)
# compile with function
compute_with_bnoise = theano.function(inputs=[V, W, b_sym], outputs=results, updates=updates)
# test
x = np.eye(10, 2, dtype=theano.config.floatX)
w = np.ones((2, 2), dtype=theano.config.floatX)
b = np.ones((2), dtype=theano.config.floatX)

print(compute_with_bnoise(x, w, b))
'''
[[ 0.96402758  0.        ]
 [ 0.          0.96402758]
 [ 0.          0.        ]
 [ 0.76159418  0.76159418]
 [ 0.76159418  0.        ]
 [ 0.          0.76159418]
 [ 0.          0.76159418]
 [ 0.          0.76159418]
 [ 0.          0.        ]
 [ 0.76159418  0.76159418]]
'''

9. Computing a power

$A^k = ?$

Analysis: each result can be computed from the result of the previous step.

k = T.iscalar('k')
A = T.vector('A')
# multiply the previous result by the base
def inner_fct(prior_result, B):
    return prior_result * B
# loop with scan to get the results
results, updates = theano.scan(inner_fct,
                               sequences=None,
                               outputs_info=T.ones_like(A),
                               non_sequences=A,
                               n_steps=k)
# compile with function
final_result = results[-1]
# power = theano.function(inputs=[A, k], outputs=final_result, updates=updates)
# it also works without updates; final_result seems to already carry the update rule
power = theano.function(inputs=[A, k], outputs=final_result)
# test
print(power(range(10), 2))
#[  0.   1.   4.   9.  16.  25.  36.  49.  64.  81.]

10. Computing a polynomial

$f = c[1] \cdot x^0 + c[2] \cdot x^1 + c[3] \cdot x^2$

See example 2 above. Here is the code I wrote myself once more: the running sum is kept in the prior output results, which is stored in the outputs_info argument of scan():

# my own attempt at a version with a running sum
results = np.array([0], dtype='int32')
c = T.vector('c', dtype='int32')
x = T.scalar('x', dtype='int32')
components, updates = theano.scan(lambda i, results, c, x: results + c[i] * (x ** i),
                                  sequences=T.arange(c.shape[0], dtype='int32'),
                                  outputs_info=results,
                                  non_sequences=[c, x])
final_res = components[-1]
cal_poly = theano.function(inputs=[c, x], outputs=final_res)
test_c = np.asarray([1, 0, 2], dtype=np.int32)
test_value = 3
print(cal_poly(test_c, test_value))
#19

The official code:

coefficients = theano.tensor.vector("coefficients")
x = T.scalar("x")
max_coefficients_supported = 10000

# Generate the components of the polynomial
full_range = theano.tensor.arange(max_coefficients_supported)
components, updates = theano.scan(fn=lambda coeff, power, free_var:
                                      coeff * (free_var ** power),
                                  outputs_info=None,
                                  sequences=[coefficients, full_range],
                                  non_sequences=x)

polynomial = components.sum()
calculate_polynomial = theano.function(inputs=[coefficients, x],
                                       outputs=polynomial)

test_coeff = np.asarray([1, 0, 2], dtype=np.float32)
print(calculate_polynomial(test_coeff, 3))

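I then noticed that the Exercise at the end of the official documentation also asks for this rewrite, so here is a solution to the exercise as well. It is more general: in my version above the power happens to be the iteration index, whereas the exercise code passes the power in as a separate sequence.

X = T.scalar('X')
coefficients = T.vector('coefficients')
max_coefficients = 10000
full_range = T.arange(max_coefficients)
out_info = T.as_tensor_variable(np.asarray(0, 'float64'))
components, updates = theano.scan(lambda coeff, power, prior_val, free_var:
                                      prior_val + (coeff * (free_var ** power)),
                                  sequences=[coefficients, full_range],
                                  outputs_info=out_info,
                                  non_sequences=X)
polynomial = components[-1]
calculate_polynomial = theano.function(inputs=[coefficients, X], outputs=polynomial)
test_coeff = np.asarray([1, 0, 2], dtype=np.float32)
print(calculate_polynomial(test_coeff, 3))
#19.0

[Note] I noticed that in many cases function computes the correct result even when updates is not passed in. Could the reason be that results and updates are stored together in a dict, so that passing results as the output of function already carries its update rules along? That reading does not seem unreasonable; after all, we saw earlier that the output of function can be an expression as well as the value an expression returns.

code: link: https://pan.baidu.com/s/1o8wVGjo password: 59pg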
