
Analyzing and Mining the Titanic Dataset with R (Part 2)

2017-05-27 10:29

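6. Exploring and visualizing the data

1) Start the visual exploration with a bar plot of how many passengers perished and survived

> barplot(table(train.data$Survived), main = "passenger survival", names.arg = c("perished", "survived"))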




2) Plot the distribution of passenger classes

> barplot(table(train.data$Pclass), main = "passenger class", names.arg = c("first", "second", "third"))




3) Show the gender breakdown as a bar plot

> barplot(table(train.data$Sex), main = "passenger gender", names.arg = c("F", "M"))




4) Use hist to draw a histogram of the number of passengers at each age

> hist(train.data$Age, main = "passenger age", xlab = "Age")




5) Plot the number of siblings or spouses travelling with each passenger

> barplot(table(train.data$SibSp), main = "passenger SibSp")




6) Plot the number of parents or children travelling with each passenger

> barplot(table(train.data$Parch), main = "passenger parch")




7) Draw a histogram of passenger fares

> hist(train.data$Fare, main = "passenger fare", xlab = "Fare")




8) Plot the port of embarkation

> barplot(table(train.data$Embarked), main = "port of embarkation")




9) Use barplot to see which gender was more likely to perish in the sinking

> counts = table(train.data$Survived, train.data$Sex)
> barplot(counts, col = c("darkblue", "red"), legend = c("Perished", "Survived"), main = "passenger survival by sex")
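The stacked bars show raw counts, so groups of different size are hard to compare directly. As a rough sketch, the same table can be turned into per-gender proportions with prop.table. The data frame toy below is a made-up stand-in for train.data (an assumption for illustration only, not the real data).

```r
# Toy stand-in for train.data: invented rows, NOT the real Titanic data
toy <- data.frame(
  Survived = c(0, 1, 1, 0, 0, 1, 0, 1, 0, 0),
  Sex      = c("male", "female", "female", "male", "male",
               "female", "male", "male", "female", "male")
)

# Same contingency table as in the step above: rows = Survived, columns = Sex
counts <- table(toy$Survived, toy$Sex)

# margin = 2 normalises each column, so every gender sums to 1
rates <- prop.table(counts, margin = 2)
print(round(rates, 2))
```

Passing rates instead of counts to barplot would give bars of equal height, which makes the survival share of each gender easier to read.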




10) Check whether passenger class (Pclass) affects the chance of survival

> counts = table(train.data$Survived, train.data$Pclass)
> barplot(counts, col = c("darkblue", "red"), legend = c("Perished", "Survived"), main = "passenger survival by pclass")




11) Examine the gender distribution within each passenger class

> counts = table(train.data$Sex, train.data$Pclass)
> barplot(counts, col = c("darkblue", "red"), legend = rownames(counts), main = "passenger gender by pclass")
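Steps 9 to 11 look at gender and class two variables at a time. One possible way to see both effects together is a survival rate for every gender and class combination, sketched here with tapply. The data frame toy is invented for illustration, and the code assumes Survived is numeric 0/1; if it is stored as a factor, it would first need as.numeric(as.character(...)).

```r
# Toy stand-in for train.data: invented rows, NOT the real Titanic data
toy <- data.frame(
  Survived = c(1, 1, 0, 0, 1, 0, 0, 0, 1, 0),
  Sex      = c("female", "female", "female", "male", "male",
               "male", "male", "male", "female", "male"),
  Pclass   = c(1, 3, 3, 1, 1, 3, 3, 3, 1, 1)
)

# The mean of a 0/1 column is the share of survivors,
# computed here for each Sex x Pclass cell
rate <- tapply(toy$Survived, list(toy$Sex, toy$Pclass), mean)
print(round(rate, 2))
```

Cells with no passengers would come out as NA, so the result is best read together with the raw counts from table.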




12) Use overlaid histograms to compare the age distribution of passengers who perished (blue) and those who survived (red)

> hist(train.data$Age[which(train.data$Survived == "0")], main = "Passenger Age Histogram", xlab = "Age", ylab = "count", col = "blue", breaks = seq(0, 80, by = 2))
> hist(train.data$Age[which(train.data$Survived == "1")], col = "red", add = TRUE, breaks = seq(0, 80, by = 2))




13) For more detail on the relationship between age and survival, draw a box plot with the boxplot function:

> boxplot(train.data$Age ~ train.data$Survived,
+ main = "passenger survival by age",
+ xlab = "survived",ylab = "age")
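The thick line in each box is the group median. As a small sketch, the same numbers can be computed directly with tapply, which is handy when the two boxes look almost identical. The data frame toy is a made-up stand-in for train.data (an assumption, not the real data).

```r
# Toy stand-in for train.data: invented rows, NOT the real Titanic data
toy <- data.frame(
  Age      = c(22, 38, 26, 35, 54, 2, 27, 14),
  Survived = c(0, 1, 1, 1, 0, 0, 1, 1)
)

# Median age per survival group; na.rm = TRUE skips missing ages
med <- tapply(toy$Age, toy$Survived, median, na.rm = TRUE)
print(med)
```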




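14) Split the passengers into age groups, here children (under 13), youths (13 to 24), adults (25 to 64) and older passengers (65 and over), and compute the survival rate of each group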

> train.child = train.data$Survived[train.data$Age<13]
> length(train.child[which(train.child == 1)])/length(train.child)
[1] 0.57525
> train.youth = train.data$Survived[train.data$Age >= 13 & train.data$Age < 25]
> length(train.youth[which(train.youth == 1)])/length(train.youth)
[1] 0.408133
> train.adult = train.data$Survived[train.data$Age >= 25 & train.data$Age < 65]
> length(train.adult[which(train.adult == 1)])/length(train.adult)
[1] 0.3540925
> train.older = train.data$Survived[train.data$Age>=65]
> length(train.older[which(train.older == 1)])/length(train.older)
[1] 0.09090909
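The four blocks above repeat the same calculation with different age limits. A more compact variant, offered only as a sketch, bins the ages with cut and averages the 0/1 survival flag per bin with tapply. The data frame toy is invented for illustration, the group labels are my own naming, and Survived is assumed to be numeric 0/1.

```r
# Toy stand-in for train.data: invented rows, NOT the real Titanic data
toy <- data.frame(
  Age      = c(4, 10, 15, 22, 30, 40, 64, 70, NA, 13),
  Survived = c(1, 1, 0, 1, 0, 1, 0, 0, 1, 0)
)

# right = FALSE makes the bins [0,13), [13,25), [25,65), [65,Inf),
# matching the limits used in the step-by-step code above
age.group <- cut(toy$Age, breaks = c(0, 13, 25, 65, Inf), right = FALSE,
                 labels = c("child", "youth", "adult", "older"))

# Survival rate per age group; rows with a missing Age fall into no bin
rate <- tapply(toy$Survived, age.group, mean)
print(round(rate, 3))
```

One difference worth noting: if Age still contained NA values, a condition such as train.data$Age < 13 would produce NA entries that stay in the subset and count toward length(), whereas cut plus tapply simply leaves those rows out.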


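15) Analysis

Plot 1 shows that more passengers died than were rescued.

Plot 2 shows that third class holds the largest share of passengers.

Plot 3 shows that male passengers outnumber female passengers.

Plot 4 shows that most passengers were between 20 and 40 years old.

Plot 5 shows that most passengers travelled without a sibling or spouse on board.

Plot 6 shows that most passengers travelled with between 0 and 2 parents or children.

Plot 7 shows that fares vary widely, which hints at the different passenger classes.

Plot 8 shows that the ship took on passengers at three ports.

Plot 9 shows that women were more likely to be rescued than men.

Plot 10 suggests that, on the surface, the higher the class, the higher the chance of rescue. But is that really the whole story?

Plot 11 shows that most third-class passengers were male, which helps explain why the death rate in third class is higher.

Plot 12 compares the age distribution of those who perished and those who survived, but it does not clearly show how the chance of survival differs between age groups, nor which group was most likely to be rescued.

Plot 13 shows the age distribution of each survival group as a box plot.

The detailed breakdown by age group in step 14 shows that the younger the passenger, the higher the chance of survival.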