Learning R as a Beginner: A Regression Analysis Example Using Male and Female Height and Weight

2016-03-08 14:13

Reading data in R

1. Reading a .csv file

data<-read.csv("E:\\necessary\\huba\\R\\table.csv")

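read.csv defaults to header=TRUE, so the first line is read as column names. read.table, by contrast, defaults to header=FALSE.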

2. Reading a .txt file

data<-read.table("E:\\necessary\\huba\\R\\table.txt")

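Note: if the .txt file contains Chinese characters and is saved as UTF-8, add fileEncoding = "UTF-8" so the text is decoded on read. The encoding = "UTF-8" argument only marks the strings as UTF-8 and does not re-encode the input.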
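The header defaults are easy to mix up, so here is a minimal sketch that compares them. The temporary file and its two rows are made up purely for illustration; they are not the author's table.csv.

```r
# Write a tiny hypothetical table to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,height", "A,150", "B,160"), tmp)

csv_default <- read.csv(tmp)                              # header = TRUE by default
tbl_default <- read.table(tmp, sep = ",")                 # header = FALSE here
tbl_header  <- read.table(tmp, sep = ",", header = TRUE)  # header requested explicitly

print(names(csv_default))  # column names taken from the first line
print(names(tbl_default))  # automatic names, first line kept as a data row
print(nrow(tbl_default))   # one row more than csv_default
```

With read.table it is usually safer to state header explicitly than to rely on the default.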

A worked example in four steps (data input -> study the distributions -> sample test -> regression and prediction)

1. Data input

> c1<-read.table("E:\\necessary\\huba\\R\\table.txt",col.names=c("name","sex","age","height","weight"),row.names = "name")
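The article does not show the contents of table.txt. As a sketch only, the height, weight and sex columns can be reconstructed from output printed later in the article (fitted values plus residuals, the group means, and the RSS reported by add1). The age column cannot be recovered and is left out. The name c1_sketch is hypothetical, and this stand-in should not be taken as the author's actual file.

```r
# Reconstructed stand-in for table.txt (an assumption, not the author's file).
# weight = fitted value + residual; height is solved from the fitted line.
# The age column cannot be recovered from the printed output and is omitted.
c1_sketch <- data.frame(
  sex    = factor(c("F", "M", "F", "M", "F")),
  height = c(150, 160, 170, 175, 160),
  weight = c(50, 55, 60, 75, 56),
  row.names = c("王", "李", "张", "陈", "赵")
)

str(c1_sketch)

# Mean height by sex, for comparison with the t-test output
group_means <- tapply(c1_sketch$height, c1_sketch$sex, mean)
print(group_means)
```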

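2. Studying the distribution of height

a. Distribution of height

b. Relationship between height and the other variables

> pairs(c1[,c("height","weight","age")])

The plot above does not show the individual relationships clearly, so we examine each variable separately.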

> oldpar=par(mfcol=c(1,3))

There were 11 warnings (use warnings() to see them)

> boxplot(weight~sex,data=c1,ylab="weight")

> boxplot(height~sex,data=c1,ylab="height")

> boxplot(age~sex,data=c1,ylab="age")

> par(oldpar)

The box plots suggest that male and female heights differ. The next step tests this formally.

3. Sample test

> attach(c1)

> t.test(height~sex,conf.level=0.99)

	Welch Two Sample t-test

data:  height by sex
t = -0.79241, df = 2.1575, p-value = 0.5059
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
 -90.68703  75.68703
sample estimates:
mean in group F mean in group M
          160.0           167.5

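p = 0.5059 > 0.01 (the significance level), so the null hypothesis cannot be rejected: the difference between male and female heights is not statistically significant in this sample. The 99 percent confidence interval also contains 0.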
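The decision rule can be sketched in code. The heights and sex labels below are a reconstruction from the article's printed output, not the author's file, so treat them as an assumption. The sketch also recomputes the Welch statistic by hand to show where t comes from.

```r
# Reconstructed heights and sex labels (assumed stand-in for the author's data)
height <- c(150, 160, 170, 175, 160)
sex    <- factor(c("F", "M", "F", "M", "F"))

alpha <- 0.01                      # significance level matching conf.level = 0.99
res   <- t.test(height ~ sex, conf.level = 1 - alpha)

# Welch statistic by hand: difference in means divided by its standard error
v <- tapply(height, sex, var)
n <- tapply(height, sex, length)
m <- tapply(height, sex, mean)
t_manual <- unname((m["F"] - m["M"]) / sqrt(v["F"] / n["F"] + v["M"] / n["M"]))

# Reject the null hypothesis only when the p-value is below alpha
reject_null <- res$p.value < alpha
print(c(t = t_manual, p = res$p.value))
print(reject_null)
```

A small p-value (below alpha) is what indicates a significant difference; a large one, as here, does not.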

4. Regression analysis

> lm.fit1=lm(weight~height,data=c1)

> lm.fit1

Call:

lm(formula = weight ~ height, data = c1)

Coefficients:

(Intercept) height

-85.3553 0.8868

The fitted model is weight = -85.3553 + 0.8868 * height.
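For simple linear regression the coefficients have a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x). The sketch below checks this against lm, using height and weight values reconstructed from the article's fitted values and residuals (an assumption, since the original file is not shown).

```r
# Reconstructed data (assumed stand-in for the author's table.txt)
height <- c(150, 160, 170, 175, 160)
weight <- c(50, 55, 60, 75, 56)

# Closed-form least-squares estimates for one predictor
slope     <- cov(height, weight) / var(height)
intercept <- mean(weight) - slope * mean(height)

# The same estimates from lm()
fit <- lm(weight ~ height)

print(c(intercept = intercept, slope = slope))
print(coef(fit))
```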

> summary(lm.fit1)

Call:

lm(formula = weight ~ height, data = c1)

Residuals:
     王      李      张      陈      赵
 2.3289 -1.5395 -5.4079  5.1579 -0.5395

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -85.3553    38.6565  -2.208   0.1143
height        0.8868     0.2368   3.745   0.0332 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.616 on 3 degrees of freedom
Multiple R-squared:  0.8238,	Adjusted R-squared:  0.765
F-statistic: 14.02 on 1 and 3 DF,  p-value: 0.03323

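The multiple R-squared is 0.8238 and the F-test gives p = 0.03323, so the model is significant at the 0.05 level. With only 5 observations, this result should be read with caution.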
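The summary figures can be recomputed from the residuals as a check. This is a sketch on data reconstructed from the article's output (an assumption, not the author's file); p here denotes the number of predictors.

```r
# Reconstructed data (assumed stand-in for the author's table.txt)
height <- c(150, 160, 170, 175, 160)
weight <- c(50, 55, 60, 75, 56)
fit <- lm(weight ~ height)

n   <- length(weight)
p   <- 1                                   # number of predictors
rss <- sum(residuals(fit)^2)               # residual sum of squares
tss <- sum((weight - mean(weight))^2)      # total sum of squares

r2      <- 1 - rss / tss
adj_r2  <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat  <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_value <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

print(round(c(r2 = r2, adj_r2 = adj_r2, F = f_stat, p = p_value), 4))
```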

# Inspect the regression diagnostics

> oldpar=par(mfrow=c(2,2),mar=c(2.5,2,1.5,0.2),mgp=c(1.2,0.2,0))

> plot(lm.fit1)

# Check whether adding other variables improves the model's predictive ability

> add1(lm.fit1,~.+age+sex)

Single term additions

Model:
weight ~ height
       Df Sum of Sq    RSS    AIC
<none>              63.934 16.742
age     1    11.624 52.311 17.739
sex     1    13.268 50.667 17.579

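A smaller AIC is better. Here the current model (<none>) has the smallest AIC, 16.742, so by this criterion adding age or sex does not improve the model.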
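The AIC column can be reproduced from the RSS column. For linear models, add1 reports AIC as n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below plugs in the RSS values printed in the table, so small rounding differences are expected.

```r
n <- 5                                               # number of observations

# RSS values copied from the add1 table
rss <- c(none = 63.934, age = 52.311, sex = 50.667)

# Estimated coefficients: intercept + height, plus one more for age or sex
edf <- c(none = 2, age = 3, sex = 3)

aic <- n * log(rss / n) + 2 * edf
print(round(aic, 3))

# The candidate with the smallest AIC is preferred
best <- names(which.min(aic))
print(best)
```

The drop in RSS from adding a term is outweighed by the penalty of 2 per extra coefficient, which is why the smaller model wins.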

Prediction

> predict(lm.fit1)

王 李 张 陈 赵

47.67105 56.53947 65.40789 69.84211 56.53947

> plot(weight)

> plot(lm.fit1,col="blue",pch=8)

New data can also be supplied for prediction:

> new.data=data.frame(height=c(150,160),sex=c("M","F"))

> predict(lm.fit1,new.data)
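The article does not show the output of this last call. As a sketch on data reconstructed from the earlier output (an assumption, not the author's file), the predictions can be obtained as below. Because the model only uses height, any sex column in the new data is ignored by predict. The interval argument is an optional addition that is not in the original.

```r
# Reconstructed data (assumed stand-in for the author's table.txt)
height <- c(150, 160, 170, 175, 160)
weight <- c(50, 55, 60, 75, 56)
fit <- lm(weight ~ height)

# New observations only need the predictors used in the model
new_data <- data.frame(height = c(150, 160))

pred <- predict(fit, new_data)
print(pred)

# Optional: 95% prediction intervals, which are wide with only 5 observations
pred_int <- predict(fit, new_data, interval = "prediction", level = 0.95)
print(pred_int)
```

Since 150 and 160 already occur in the sample, the point predictions coincide with the fitted values for those heights shown earlier.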