
Applied Nonparametric Statistics-lec10

2017-06-23 11:31
Ref: https://onlinecourses.science.psu.edu/stat464/print/book/export/html/14

Estimating the CDF

The Empirical CDF



Plotting the empirical CDF:

x = c(4, 0, 3, 2, 2)
plot.ecdf(x)
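
To make clear what the plot shows, here is a minimal sketch that computes the empirical CDF straight from its definition, F_n(t) = (number of observations <= t) / n, and compares it with R's built-in ecdf(). The helper name ecdf_by_hand and the evaluation points in pts are illustrative assumptions, not part of the course notes.

```r
# Empirical CDF from its definition: F_n(t) = (# observations <= t) / n
x <- c(4, 0, 3, 2, 2)

# Hypothetical helper: fraction of the observations at or below each point in t
ecdf_by_hand <- function(obs, t) {
  sapply(t, function(ti) mean(obs <= ti))
}

pts <- c(-1, 0, 1, 2, 3, 4, 5)   # illustrative evaluation points
manual <- ecdf_by_hand(x, pts)
builtin <- ecdf(x)(pts)          # ecdf() returns a step function that can be called

# Both rows should agree; the jump at 2 is twice as large because 2 occurs twice
print(rbind(pts, manual, builtin))
```

The step function rises by 1/n at each observation, so the repeated value 2 produces a jump of 2/5.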




Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test checks the "sameness" of two independent samples from a continuous distribution.

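A small p-value is evidence that the distributions differ, but a large p-value does not show that they are the same.

With a small sample the test has little power, so the p-value may be large even when the distributions differ.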

> x = c(4, 0, 3, 2, 2)
> plot.ecdf(x)
> plot(ecdf(x))
> ecdf(x)
Empirical CDF
Call: ecdf(x)
x[1:4] =      0,      2,      3,      4
> ks.test(x, y="pnorm", mean(x), sd(x))

One-sample Kolmogorov-Smirnov test

data:  x
D = 0.24637, p-value = 0.9219
alternative hypothesis: two-sided

Warning message:
In ks.test(x, y = "pnorm", mean(x), sd(x)) :
ties should not be present for the Kolmogorov-Smirnov test
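
The session above uses the one-sample form of ks.test. As a rough sketch of the two-sample form, the code below compares two simulated samples and recomputes the statistic D as the largest vertical gap between the two empirical CDFs. The samples a and b, their sizes, the shift of 1, and the seed are all assumptions made for illustration.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
a <- rnorm(50, mean = 0, sd = 1)   # hypothetical first sample
b <- rnorm(50, mean = 1, sd = 1)   # hypothetical second sample, shifted by 1

res <- ks.test(a, b)               # two-sample form: pass both samples
print(res)

# D is the largest vertical distance between the two empirical CDFs,
# which is attained at one of the pooled observations
grid <- sort(c(a, b))
d_manual <- max(abs(ecdf(a)(grid) - ecdf(b)(grid)))
print(c(D = unname(res$statistic), manual = d_manual))
```

Because the simulated data are continuous, there are no ties and the warning from the session above does not appear.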


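P.S.:

R has four functions for the normal distribution. dnorm is the pdf, pnorm is the cdf, and qnorm is the inverse cumulative distribution function (the quantile function).

rnorm generates random numbers from the distribution.

qnorm takes a probability p and returns the value whose cdf equals p. For the standard normal distribution, the value returned is the Z-score.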

dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)
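
A short sketch of how the four functions relate to one another. The probabilities in p, the step h, the seed, and the parameters passed to rnorm are arbitrary choices for illustration.

```r
# qnorm maps probabilities to quantiles; on the standard normal these are Z-scores
p <- c(0.025, 0.5, 0.975)
z <- qnorm(p)
print(z)            # roughly -1.96, 0, 1.96

# pnorm is the inverse of qnorm, so this recovers p
print(pnorm(z))

# dnorm is the derivative of pnorm: compare with a central finite difference at 1
h <- 1e-6
slope <- (pnorm(1 + h) - pnorm(1 - h)) / (2 * h)
print(c(dnorm = dnorm(1), slope = slope))

# rnorm draws random values; the sample mean and sd should be near 10 and 2
set.seed(42)        # arbitrary seed, only for reproducibility
draws <- rnorm(1000, mean = 10, sd = 2)
print(c(mean = mean(draws), sd = sd(draws)))
```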

Density Estimation  

> x
[1] 4 0 3 2 2
> density(x)

Call:
density.default(x = x)

Data: x (5 obs.);       Bandwidth 'bw' = 0.4868

x                 y
Min.   :-1.4604   Min.   :0.001837
1st Qu.: 0.2698   1st Qu.:0.059033
Median : 2.0000   Median :0.141129
Mean   : 2.0000   Mean   :0.144277
3rd Qu.: 3.7302   3rd Qu.:0.205314
Max.   : 5.4604   Max.   :0.351014
> plot(density(x))








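Passing a bandwidth argument (bw) to density(x) changes the plot, as the figure above shows: a larger bandwidth gives a smoother curve and a smaller one a bumpier curve.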
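
A minimal sketch of the bandwidth effect on the same data. The values bw = 0.2 and bw = 1.5 are arbitrary choices for illustration and are not taken from the course notes.

```r
x <- c(4, 0, 3, 2, 2)

d_default <- density(x)            # default bandwidth, about 0.4868 for this data
d_narrow  <- density(x, bw = 0.2)  # illustrative small bandwidth: bumpier curve
d_wide    <- density(x, bw = 1.5)  # illustrative large bandwidth: smoother curve

# Overlay the three estimates; ylim is set so the tallest curve fits
plot(d_wide, main = "Effect of bandwidth", ylim = c(0, max(d_narrow$y)))
lines(d_default, lty = 2)
lines(d_narrow, lty = 3)
legend("topright", legend = c("bw = 1.5", "default", "bw = 0.2"), lty = 1:3)

# The bandwidth actually used is stored in the result
print(c(default = d_default$bw, narrow = d_narrow$bw, wide = d_wide$bw))
```

With only five observations the estimate is very sensitive to this choice, so it is worth trying several values before reading much into the shape of the curve.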

  

  