您的位置：首页 > 其它

时间序列 R 10 其他进阶预测方法 Advanced forecasting methods

2016-05-02 15:32 831 查看

1 Dynamic regression models 动态回归模型

前面的内容中要么只考虑了时间，要么只考虑了其他自变量的影响，这一节将考虑各个变量和时间的综合影响。

1.1 regression models+ ARIMA models

首先我们简单的将回归和Arima组合，做一个简单的动态回归模型。

其组合的方法和实质就是将回归模型中的误差项变为时间序列的ARIMA，也可以理解为下式：

yt=回归+ARIMA+ety_t=\textstyle {回归+ARIMA+e_t}

当ARIMA为（1，1，1）时可以写成下式

yt=β0+β1x1,t+⋯+βkxk,t+nt,(1−ϕ1B)(1−B)nt=(1+θ1B)et,\begin{align*} y_t &= \beta_0 + \beta_1 x_{1,t} +
\dots + \beta_k x_{k,t} + n_t,\\ & (1-\phi_1B)(1-B)n_t =
(1+\theta_1B)e_t, \end{align*}

另外

如果先将原始的数据（包括y和x）都进行差分，直到稳定，这时就可以看做回归+ARMA了。（为了保持y和x的关系不变，他们要差分相同的阶数）

1.1.1 模型估计

在将两种模型结合之后，参数依然可以用最小二乘法或者最大似然估计来计算建立模型（注意计算的是et不是nt）

1.1.2 手动建模步骤

一般的手动建模的方法为：

1. 数据预处理，包括log转换等

2. 检查数据稳定性，不稳定的需要差分值稳定。首先假设模型为regression+AR(2),如果数据是季节性模型则先假设为regression+ARIMA(2,0,0)(1,0,0)m。

3. 利用上面的假设计算出regression的参数后，计算ntn_t,然后根据ntn_t，估计一个合适的ARMA模型

4. 利用新估计的ARMA，重复计算regression模型

5. 检查ete_t是否是白噪声

可以利用AICc来评价模型，选择得分最低的一个。

1.1.3 使用R

可以用Arima函数来计算如下栗子：（ xreg这个参数是加入regression）

(fit2 <- Arima(usconsumption[,1], xreg=usconsumption[,2],
order=c(1,0,2)))
Series: usconsumption[, 1]
ARIMA(1,0,2) with non-zero mean

Coefficients:
ar1      ma1     ma2  intercept  usconsumption[, 2]
0.6516  -0.5440  0.2187     0.5750              0.2420
s.e.  0.1468   0.1576  0.0790     0.0951              0.0513

sigma^2 estimated as 0.3396:  log likelihood=-144.27
AIC=300.54   AICc=301.08   BIC=319.14

也可使用auto.arima（）

fit <- auto.arima(usconsumption[,1], xreg=usconsumption[,2])

1.1.4 随机趋势和确定性趋势

确定性趋势就是在没有差分的时候回归中有斜率，模型如下：

yt=β0+β1t+nt,y_t = \beta_0 + \beta_1 t + n_t,

其中nt是ARMA模型

随机趋势是在一阶差分之后有趋势

yt=yt−1+β1+n′t.y_t = y_{t-1} + \beta_1 + n_t'.

其中nt是ARMA

这里类似于有漂移的random walk

http://blog.csdn.net/bea_tree/article/details/51223501#t8

栗子，计算旅游人数：

有确定性趋势：

> auto.arima(austa,d=0,xreg=1:length(austa))
ARIMA(2,0,0) with non-zero mean

Coefficients:
ar1      ar2  intercept  1:length(austa)
1.0371  -0.3379     0.4173           0.1715
s.e.  0.1675   0.1797     0.1866           0.0102

sigma^2 estimated as 0.02486:  log likelihood=12.7

结果如下：

ytntet=0.42+0.17t+nt=1.04nt−1−0.34nt−2+et∼NID(0,0.025).\begin{align*} y_t &= 0.42 + 0.17 t +
n_t \\ n_t &= 1.04 n_{t-1} - 0.34 n_{t-2} + e_t\\ e_t &\sim
\text{NID}(0,0.025). \end{align*}

随机趋势：

> auto.arima(austa,d=1)
Series: austa
ARIMA(0,1,0) with drift

Coefficients:
drift
0.154
s.e.  0.033

sigma^2 estimated as 0.0324:  log likelihood=9.07
AIC=-14.14   AICc=-13.69   BIC=-11.34

结果如下：

ytntet=y0+0.15t+nt=nt−1+et∼NID(0,0.032).\begin{align*} y_t &= y_0 + 0.15 t + n_t \\ n_t &=
n_{t-1} + e_{t}\\ e_t &\sim \text{NID}(0,0.032). \end{align*}

预测结果：

fit1 <- Arima(austa, order=c(0,1,0), include.drift=TRUE)
fit2 <- Arima(austa, order=c(2,0,0), include.drift=TRUE)
par(mfrow=c(2,1))
plot(forecast(fit2), main="Forecasts from linear trend + AR(2) error",
ylim=c(1,8))
plot(forecast(fit1), ylim=c(1,8))

可以看出两者的预测区间不同

这说明确定性趋势的趋势不随时间改变.而随机趋势的趋势是根据历史数据来估计的, 随机趋势的预测区间较大，如果预测长时间的数据，用随机趋势比较好。

1.2 滞后模型

1.2.1 模型

有些模型中自变量并不是马上就作用于因变量，比如可能收入提高之后不会立即买东西而是在收入提高之后的一个月之后买东西，因此可以在原本的动态规划基础上加入之后因子，下式是只有一个因变量的时候的表达式

yt=β0+γ0xt+γ1xt−1+⋯+γkxt−k+nt,y_t = \beta_0 + \gamma_0x_ t + \gamma_1 x_{t-1} + \dots + \gamma_k x_{t-k} + n_t,

这里ntn_t是Arima模型

之后阶数k的模型可以同计算Arima的p,q的时候一起用AIC计算。

1.2.2 栗子

探究广告和报价之间的关系

计算如下

plot(insurance, main="Insurance advertising and quotations", xlab="Year")

# Lagged predictors. Test 0, 1, 2 or 3 lags.
Advert <- cbind(insurance[,2],
c(NA,insurance[1:39,2]),
c(NA,NA,insurance[1:38,2]),
c(NA,NA,NA,insurance[1:37,2]))
colnames(Advert) <- paste("AdLag",0:3,sep="")

# Choose optimal lag length for advertising based on AIC
# Restrict data so models use same fitting period
fit1 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1], d=0)
fit2 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1:2], d=0)
fit3 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1:3], d=0)
fit4 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1:4], d=0)

# Best model fitted to all data (based on AICc)
# Refit using all data
fit <- auto.arima(insurance[,1], xreg=Advert[,1:2], d=0)
>  fit
ARIMA(3,0,0) with non-zero mean

Coefficients:
ar1     ar2    ar3  intercept  AdLag0  AdLag1
1.412  -0.932  0.359      2.039   1.256   0.162
s.e.  0.170   0.255  0.159      0.993   0.067   0.059

sigma^2 estimated as 0.189:  log likelihood=-23.89
AIC=61.78   AICc=65.28   BIC=73.6

yt=β0+γ0xt+γ1xt−1+nt,nt=ϕ1nt−1+ϕ2nt−2+ϕ3nt−3+ety_t = \beta_0 + \gamma_0x_ t + \gamma_1 x_{t-1} + n_t,\\
n_ t = \phi _1 n_{t-1} + \phi _2 n_{t-2} + \phi _3 n_{t-3} + e_ t

预测：

fc8 <- forecast(fit, xreg=cbind(rep(8,20),c(Advert[40,1],rep(8,19))), h=20)
plot(fc8, main="Forecast quotes with advertising set to 8", ylab="Quotes")

2 向量自回归 Vector autoregressions

之前都只考虑了自变量对因变量的单边影响关系，这里我们要讨论下因变量与自变量间相互影响的关系，也就是这节中的 Vector autoregressions，VAR

2.1 模型

在AR中我们是探究的一个量与他之前1……k lag的关系

yt=c+ϕ1yt−1+ϕ2yt−2+⋯+ϕpyt−p+et,y_{t} = c + \phi_{1}y_{t-1} + \phi_{2}y_{t-2} + \dots + \phi_{p}y_{t-p} + e_{t},

现在是考虑多个变量间的关系，下式是考虑了两个y1和y2，1 阶lag（称为VAR（1））

y1,ty2,t=c1+ϕ11,1y1,t−1+ϕ12,1y2,t−1+e1,t=c2+ϕ21,1y1,t−1+ϕ22,1y2,t−1+e2,t\begin{align*}
y_{1,t} &= c_1+\phi_{11,1}y_{1,t-1}+\phi_{12,1}y_{2,t-1}+e_{1,t} \\
y_{2,t} &= c_2+\phi_{21,1}y_{1,t-1}+\phi_{22,1}y_{2,t-1}+e_{2,t}
\end{align*}

这里讲两个变量的公式联立，建立了彼此之间的联系。

在解方程的时候，上式同AR一样必须是稳定的，如果是不稳定的需要先进行差分稳定之后再解方程。

假设有k个变量（上式中有2个），p阶lag，（上式只有1lag）每一个方程有有1+k*p个系数，总共有k（1+kp）个系数。所以lag越小越容易计算。

对于p的选取有多中方法：

AIC, HQ, SC and FPE

其中SC又叫做BIC，可以产生较小的p

而AIC倾向于产生较大的p

VAR适用场景

1. 预测的变量相互影响，不要明确的解释各参数的含义;

2. 测试变量是否对预测有用（这是Granger 因果测试的基础）

3. 当变量突然变化时，分析它的响应作用

4. 误差方差分解（forecast error variance decomposition，where the proportion of the forecast variance of one variable is attributed to the effect of other variables.）

另外，还有一种情况本书没有做详细介绍：经济变量的本身是非平稳序列，但是，它们的线性组合却有可能是平稳序列。这时候需要建立VEC模型（vector error correction model），这里不做介绍

2.2 R 使用

有vars工具箱

下列代码中可以看出，AIC FPE指标推荐5 lag ；hq、sc推荐1 lag

使用 Portmanteau test测试误差的不相关，文中说var(1)var(2)都没有通过测试，var(3)通过所以使用var（3）

library(vars)
> VARselect(usconsumption, lag.max=8, type="const")$selection
AIC(n)  HQ(n)  SC(n) FPE(n)
5      1      1      5
> var <- VAR(usconsumption, p=3, type="const")
> serial.test(var, lags.pt=10, type="PT.asymptotic")
Portmanteau Test (asymptotic)
data:  Residuals of VAR object var
Chi-squared = 33.3837, df = 28, p-value = 0.2219

> summary(var)
VAR Estimation Results:
=========================
Endogenous variables: consumption, income
Deterministic variables: const
Sample size: 161

Estimation results for equation consumption:
============================================
Estimate Std. Error t value Pr(>|t|)
consumption.l1  0.22280    0.08580   2.597 0.010326 *
income.l1       0.04037    0.06230   0.648 0.518003
consumption.l2  0.20142    0.09000   2.238 0.026650 *
income.l2      -0.09830    0.06411  -1.533 0.127267
consumption.l3  0.23512    0.08824   2.665 0.008530 **
income.l3      -0.02416    0.06139  -0.394 0.694427
const           0.31972    0.09119   3.506 0.000596 ***

Estimation results for equation income:
=======================================
Estimate Std. Error t value Pr(>|t|)
consumption.l1  0.48705    0.11637   4.186 4.77e-05 ***
income.l1      -0.24881    0.08450  -2.945 0.003736 **
consumption.l2  0.03222    0.12206   0.264 0.792135
income.l2      -0.11112    0.08695  -1.278 0.203170
consumption.l3  0.40297    0.11967   3.367 0.000959 ***
income.l3      -0.09150    0.08326  -1.099 0.273484
const           0.36280    0.12368   2.933 0.003865 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation matrix of residuals:
consumption income
consumption      1.0000 0.3639
income           0.3639 1.0000
-----------------------------------
-----------------------------------
fcst <- forecast(var)
plot(fcst, xlab="Year")

1.3 神经网络 Neural network models

关于啥是神经网络看这里：

/article/9559533.html

这节要用的就是使用神经网络与ARIMA结合

这里以一层前馈神经网路为例：

NNAR(p,k)代表有p阶lag，k个神经元的神经网络，

NNAR(p,0)相当于ARIMA(p,0,0)

对于季节性数据：

形式为NNAR(p,P,K)mNNAR(p,P,K)_m 以为输入的是(yt−1,yt−2,…,yt−p,yt−m,yt−2m,yt−Pm)(y_{t-1},y_{t-2},\dots ,y_{t-p},y_{t-m},y_{t-2m},y_{t-Pm})并且有k个隐藏神经单元。

NNAR(p,P,0)mNNAR(p,P,0)_m 与ARIMA(p,0,0)（P,0,0）mARIMA(p,0,0)（P,0,0）_m 相同

函数为

nnetar()

。若p and P未指定, 会被自动选择. 对于非季节性数据会根据AIC计算的线性AR（p）设置默认值，季节性数据P=1，p 根据线性设置. 如果k未指定则 k=(p+P+1)/2（选择其最近的整数）.

4 Forecasting hierarchical or grouped time series

有些物品是可以分类的，比如自行车可以分为山地车，公路车等等,有时候我们需要各个类别的预测值，本文将利用这种层次结构进行预测

下图是两层时间序列

含义为total分了两类a b a由分了三类 abc b由分了两类 a b

定义总的系列（series）数为：

n=1+n1+⋯+nKn=1+n_1+\dots +n_ K他一共有8个系列（1+2+5）

,

用矩阵表达为

简记为

yt=SyK,t,{y}_ t={S}{y}_{K,t},

其中S是求和的意思。

我们要预测时total这一series，方法是用以前学过的方法来预测各个series的数据，然后用他们来估计最后的预测值。

4.1 自底向上

用最底层的来预测各个层的值，以图9.12为例

首先预测 AA AB AC的值相加得到A的值

预测ＢＡ　ＢＢ相加得到Ｂ的值

然后A+B得到total的值

用矩阵的形式表达为

y~h=Sy^K,h.\tilde{{y} }_{h}={S} \hat{{y} }_{K,h}.

其优点是不会漏掉数据，但是底层可能有较多的噪声，很难精确预测各个上层的数值。

4.2 自上至下

先预测最上层的，然后根据比例还计算最下层的，然后利用上面的矩阵公式来计算各个层的，比例计算方法

Average historical proportions

pj=1T∑t=1Tyj,tytp_j = \frac{1}{T}\sum_{t=1}^T \frac{y_{j,t}}{y_t}

yj,ty_{j,t}代表底层的第j个节点，pjp_j代表它所占有的比例

Proportions of the historical averages

pj=∑t=1Tyj,tT/∑t=1TytTp_ j={\sum_{t=1}^{T}\frac{y_{j,t}}{T}}/{\sum_{t=1}^{T}\frac{y_ t}{T}}

Forecasted proportions

以上的比例都不能表示出随着比例随着时间的变化关系

这里利用预测值的比例来进行计算。比例由下式得到

pj=∏ℓ=0K−1y^(ℓ)j,hS^(ℓ+1)j,hp_ j=\prod^{K-1}_{\ell =0}\frac{\hat{y}_{j,h}^{(\ell )}}{\hat{S}_{j,h}^{(\ell +1)}}

以图9.12 解释之

记

先从左边

y~A,h=(y^A,hS^(1)A,h)y~h=(y^(1)AA,hS^(2)AA,h)y~h\tilde{y}_{\text{A},h} = (\frac{\hat{y}_{\text{A},h}}{\hat{S}_{\text{A},h}^{(1)}}) \tilde{y}_{h} = (\frac{\hat{y}_{\text{AA},h}^{(1)}}{\hat{S}_{\text{AA},h}^{(2)}}) \tilde{y}_{h}

然后

y~AA,h=(y^AA,hS^(1)AA,h)y~A,h=(y^AA,hS^(1)AA,h)(y^(1)AA,hS^(2)AA,h)y~h.\tilde{y}_{\text{AA},h} = (\frac{\hat{y}_{\text{AA},h}}{\hat{S}_{\text{AA},h}^{(1)}}) \tilde{y}_{\text{A},h} =(\frac{\hat{y}_{\text{AA},h}}{\hat{S}_{\text{AA},h}^{(1)}}) (\frac{\hat{y}_{\text{AA},h}^{(1)}}{\hat{S}_{\text{AA},h}^{(2)}})\tilde{y}_{h}.

得到

p1=(y^AA,hS^(1)AA,h)(y^(1)AA,hS^(2)AA,h)p_1=(\frac{\hat{y}_{\text {AA},h}}{\hat{S}_{\text {AA},h}^{(1)}}) (\frac{\hat{y}_{\text {AA},h}^{(1)}}{\hat{S}_{\text {AA},h}^{(2)}})

4.3 Middle-out approach

从中间选节预测节点，该点以上level（层）的用自下至上，该层以下的用自上至下

4.4 The optimal combination approach

y^h=Sβh+εh\hat{{y} }_ h={S} {\beta }_{h}+{\varepsilon }_{h}

4.5 R The hts package

hts函数需要输入最底层的数据和结构信息

以9.12的结构为例：

# bts is a time series matrix containing the bottom level series
# The first three series belong to one group, and the last two series
# belong to a different group
y <- hts(bts, nodes=list(2, c(3,2)))

栗子

这是澳大利亚的旅游业的层次结构，中间层是州的名字。

使用hts，方法默认为The optimal combination approach

library(hts)
y <- hts(vn, nodes=list(4,c(2,2,2,2)))
allf <- forecast(y, h=8)
plot(allf)

各个系列的预测结果如下：

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航