
WEEK3-Quick guide to linear regression

2014-09-29 19:16
Explore Statistics with R (EDX)

WEEK3-Quick guide to linear regression: notes

I. Introduction

Imagine you have measured two variables for each subject in a study.

1. Variables are on interval scale or ratio scale

2. Variables are at least roughly normally distributed

You suspect that the variables may be associated. Maybe a linear model would fit:

y = a + bx + error.

Here we will not go into any depth describing the theory of linear regression. Use the example code below to learn how to add a linear regression line to a scatter plot. If you already know something about linear regression, you will find some additional useful lines of code below.

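(Summary: two variables have been measured that satisfy: 1. they are on an interval or ratio scale, 2. they are roughly normally distributed. They may be linearly related.)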

II. Example

#Create some data

set.seed(278)
x <- rnorm(25, mean=100, sd=10)
y <- 2 * x + 20 + rnorm(25, mean=10, sd=4)

plot(x,y) #Do you think a linear model would fit?



Correlation coefficient r and coefficient of determination R^2

> cor(x,y) #If you just want the correlation coefficient
[1] 0.9548738
> cor(x,y)^2 #Or the coefficient of determination
[1] 0.911784
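
For a simple linear regression with an intercept, the squared correlation coefficient is the same number that the model summary reports as "Multiple R-squared". The sketch below checks this on the example data; it recreates the data with the same seed so that it runs on its own, and the names r and r2 are only illustrative.

```r
# Recreate the example data (same seed and parameters as above)
set.seed(278)
x <- rnorm(25, mean = 100, sd = 10)
y <- 2 * x + 20 + rnorm(25, mean = 10, sd = 4)

r  <- cor(x, y)                     # Pearson correlation coefficient
r2 <- summary(lm(y ~ x))$r.squared  # R-squared reported by the fitted model

# With one predictor and an intercept, r^2 and R-squared agree
all.equal(r^2, r2)
```

This identity holds only for a model with a single predictor and an intercept; with several predictors, R-squared is no longer the square of any one pairwise correlation.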

> lm.obj <- lm(y~x) # See how models are described in R. y depends on x
> abline(lm.obj) #We can add the regression line to the scatterplot
> predict(lm.obj) #The predicted y-values for your x-values
       1        2        3        4        5        6        7        8 
226.2604 209.0715 239.2173 210.2221 219.6353 211.7828 202.1819 238.3487 
       9       10       11       12       13       14       15       16 
236.4330 215.9328 243.1476 228.5103 214.0388 241.6755 232.5671 206.4833 
      17       18       19       20       21       22       23       24 
216.5090 212.1111 234.9200 234.7542 239.8300 206.8384 248.6047 215.7893 
      25 
221.9042 
> points(x,predict(lm.obj), col="green") #Add predicted values to the graph
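
Called without further arguments, predict() returns the fitted values for the x-values that were used in the fit. To predict y for other x-values, pass a data frame through the newdata argument. The following is a minimal sketch; the x-values 90, 100 and 110 and the names new.x and pred.int are made up for illustration.

```r
# Recreate the example data and model so the snippet runs on its own
set.seed(278)
x <- rnorm(25, mean = 100, sd = 10)
y <- 2 * x + 20 + rnorm(25, mean = 10, sd = 4)
lm.obj <- lm(y ~ x)

# Hypothetical new x-values; the column name must match the predictor in the formula
new.x <- data.frame(x = c(90, 100, 110))

# Point predictions for the new x-values
predict(lm.obj, newdata = new.x)

# Point predictions together with 95% prediction intervals (columns fit, lwr, upr)
pred.int <- predict(lm.obj, newdata = new.x, interval = "prediction", level = 0.95)
pred.int
```

Using interval = "confidence" instead gives the narrower interval for the mean response at each x rather than for a single new observation.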
> summary(lm.obj) #Let's look at the content of lm.obj

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.7166 -2.1476 -0.5456  2.2163  9.9858 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  48.9354    11.4055    4.29 0.000273 ***
x             1.7973     0.1166   15.42 1.28e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.375 on 23 degrees of freedom
Multiple R-squared: 0.9118, Adjusted R-squared: 0.9079
F-statistic: 237.7 on 1 and 23 DF, p-value: 1.284e-13
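
The numbers printed by summary() can also be pulled out of the summary object for further use. The sketch below shows a few commonly used pieces and adds 95% confidence intervals for the coefficients with confint(); the name lm.sum is only illustrative, and the data are recreated so the snippet runs on its own.

```r
# Recreate the example data and model
set.seed(278)
x <- rnorm(25, mean = 100, sd = 10)
y <- 2 * x + 20 + rnorm(25, mean = 10, sd = 4)
lm.obj <- lm(y ~ x)

lm.sum <- summary(lm.obj)

coef(lm.sum)          # coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
lm.sum$r.squared      # Multiple R-squared
lm.sum$adj.r.squared  # Adjusted R-squared
lm.sum$sigma          # residual standard error

# 95% confidence intervals for intercept and slope
ci <- confint(lm.obj, level = 0.95)
ci
```

Note that the data were simulated with slope 2 and that the added noise has mean 10, so the intercept being estimated is 20 + 10 = 30; the confidence intervals can be compared against those values.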

> str(lm.obj) #You can retrieve parts of the lm object
List of 12
$ coefficients : Named num [1:2] 48.9 1.8
..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
$ residuals : Named num [1:25] -1.659 3.734 0.911 -4.294 1.325 ...
..- attr(*, "names")= chr [1:25] "1" "2" "3" "4" ...
$ effects : Named num [1:25] -1121.35 67.46 1.89 -4.71 1.36 ...
..- attr(*, "names")= chr [1:25] "(Intercept)" "x" "" "" ...
$ rank : int 2
$ fitted.values: Named num [1:25] 226 209 239 210 220 ...
..- attr(*, "names")= chr [1:25] "1" "2" "3" "4" ...
$ assign : int [1:2] 0 1
$ qr :List of 5
..$ qr : num [1:25, 1:2] -5 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:25] "1" "2" "3" "4" ...
.. .. ..$ : chr [1:2] "(Intercept)" "x"
.. ..- attr(*, "assign")= int [1:2] 0 1
..$ qraux: num [1:2] 1.2 1.23
..$ pivot: int [1:2] 1 2
..$ tol : num 1e-07
..$ rank : int 2
..- attr(*, "class")= chr "qr"
$ df.residual : int 23
$ xlevels : Named list()
$ call : language lm(formula = y ~ x)
$ terms :Classes 'terms', 'formula' length 3 y ~ x
.. ..- attr(*, "variables")= language list(y, x)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "y" "x"
.. .. .. ..$ : chr "x"
.. ..- attr(*, "term.labels")= chr "x"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(y, x)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "y" "x"
$ model :'data.frame': 25 obs. of 2 variables:
..$ y: num [1:25] 225 213 240 206 221 ...
..$ x: num [1:25] 98.7 89.1 105.9 89.7 95 ...
..- attr(*, "terms")=Classes 'terms', 'formula' length 3 y ~ x
.. .. ..- attr(*, "variables")= language list(y, x)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "y" "x"
.. .. .. .. ..$ : chr "x"
.. .. ..- attr(*, "term.labels")= chr "x"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(y, x)
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:2] "y" "x"
- attr(*, "class")= chr "lm"
> lm.obj$coefficients
(Intercept)           x 
  48.935353    1.797343 
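
Instead of reaching into the list with $, the accessor functions coef(), fitted() and residuals() return the same parts. The sketch below also recomputes the coefficients by hand from the usual least-squares formulas (slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)) as a check; the names a and b follow the model y = a + bx + error from the introduction.

```r
# Recreate the example data and model
set.seed(278)
x <- rnorm(25, mean = 100, sd = 10)
y <- 2 * x + 20 + rnorm(25, mean = 10, sd = 4)
lm.obj <- lm(y ~ x)

# Accessor functions for parts of the lm object
coef(lm.obj)             # same as lm.obj$coefficients
head(fitted(lm.obj))     # same as lm.obj$fitted.values
head(residuals(lm.obj))  # same as lm.obj$residuals

# Least-squares estimates computed by hand
b <- cov(x, y) / var(x)     # slope
a <- mean(y) - b * mean(x)  # intercept

# The hand-computed values agree with lm()
all.equal(unname(coef(lm.obj)), c(a, b))

# Residuals are the observed values minus the fitted values
all.equal(unname(residuals(lm.obj)), y - unname(fitted(lm.obj)))
```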

par(mfrow=c(2,2)) #prepare for a 2x2 layout
plot(lm.obj) #The built-in diagnostic plots for your regression analysis
par(mfrow=c(1,1)) #Restore 1x1 layout
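
The quantities behind the diagnostic plots can also be inspected numerically. The sketch below computes standardized residuals, leverage and Cook's distance, and flags observations with two common rules of thumb (|standardized residual| > 2 and Cook's distance > 4/n). These cut-offs are conventions chosen here for illustration, not part of the course material, and the variable names are only illustrative.

```r
# Recreate the example data and model
set.seed(278)
x <- rnorm(25, mean = 100, sd = 10)
y <- 2 * x + 20 + rnorm(25, mean = 10, sd = 4)
lm.obj <- lm(y ~ x)

std.res <- rstandard(lm.obj)       # standardized residuals
lev     <- hatvalues(lm.obj)       # leverage of each observation
cook    <- cooks.distance(lm.obj)  # influence of each observation

# Rule-of-thumb flags (assumed thresholds, adjust as needed)
which(abs(std.res) > 2)            # unusually large residuals
which(cook > 4 / length(x))        # potentially influential observations

# Shapiro-Wilk test of normality for the residuals
shapiro.test(residuals(lm.obj))
```

Flagged observations are candidates for a closer look, not automatic outliers; the plots from plot(lm.obj) show the same information graphically.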

Tags: R, linear regression