Python Decision Tree: Predicting Whether Titanic Passengers Survived
2017-12-20 22:51
This is the same as the other versions of this example found online.
import pandas as pd
titanic = pd.read_csv('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt')
titanic.head()
| | row.names | pclass | survived | name | age | embarked | home.dest | room | ticket | boat | sex |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1st | 1 | Allen, Miss Elisabeth Walton | 29.0000 | Southampton | St Louis, MO | B-5 | 24160 L221 | 2 | female |
| 1 | 2 | 1st | 0 | Allison, Miss Helen Loraine | 2.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | NaN | female |
| 2 | 3 | 1st | 0 | Allison, Mr Hudson Joshua Creighton | 30.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | (135) | male |
| 3 | 4 | 1st | 0 | Allison, Mrs Hudson J.C. (Bessie Waldo Daniels) | 25.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | NaN | female |
| 4 | 5 | 1st | 1 | Allison, Master Hudson Trevor | 0.9167 | Southampton | Montreal, PQ / Chesterville, ON | C22 | NaN | 11 | male |
titanic.info()
# Select the features and the target. The explicit copy keeps later writes from touching titanic.
X = titanic[['pclass', 'age', 'sex']].copy()
y = titanic['survived']
X.info()
X.head()
# Fill missing age values with the column mean. Assigning the result back avoids the
# SettingWithCopyWarning that fillna(..., inplace=True) raises on a slice of titanic.
X['age'] = X['age'].fillna(X['age'].mean())
X.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1313 entries, 0 to 1312
Data columns (total 3 columns):
pclass    1313 non-null object
age       1313 non-null float64
sex       1313 non-null object
dtypes: float64(1), object(2)
memory usage: 30.9+ KB
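The mean-imputation step can be tried in isolation on a few rows. The sketch below is a minimal illustration, assuming a made-up four-row frame (`toy` and `features` are hypothetical names, and the values are not taken from the Titanic data). It shows that `mean()` skips missing values and that working on an explicit copy leaves the source frame untouched.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic features (values are made up for illustration).
toy = pd.DataFrame({
    "pclass": ["1st", "2nd", "3rd", "3rd"],
    "age": [20.0, np.nan, 40.0, np.nan],
    "sex": ["female", "male", "male", "female"],
})

# Take an explicit copy so later writes never touch (or warn about) `toy`.
features = toy[["pclass", "age", "sex"]].copy()

# mean() skips NaN by default, so the fill value is (20 + 40) / 2 = 30.
age_mean = features["age"].mean()
features["age"] = features["age"].fillna(age_mean)

print(features["age"].tolist())   # [20.0, 30.0, 40.0, 30.0]
print(toy["age"].isna().sum())    # 2, the original frame is unchanged
```

Filling with the mean keeps every row, at the cost of pulling the age distribution toward its center; whether that trade-off is acceptable depends on how many values are missing.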
# Split the data into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
X_train.head()
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 984 entries, 1086 to 1044
Data columns (total 3 columns):
pclass    984 non-null object
age       984 non-null float64
sex       984 non-null object
dtypes: float64(1), object(2)
memory usage: 30.8+ KB
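To see where the 984 training rows come from, the split can be reproduced on placeholder data of the same length. This is only a sketch: `X_demo` and `y_demo` are hypothetical arrays, not the Titanic features, and only the row count (1313), `test_size=0.25`, and `random_state=33` are taken from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 1313  # row count shown in the info() output
X_demo = np.arange(n_rows).reshape(-1, 1)   # placeholder features
y_demo = np.arange(n_rows) % 2              # placeholder labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=33
)

# The test share is rounded up (0.25 * 1313 = 328.25), the rest is training data.
print(len(X_tr), len(X_te))   # 984 329

# Fixing random_state makes the shuffle repeatable across runs.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=33
)
same_split = np.array_equal(X_te, X_te2)
print(same_split)             # True
```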
from sklearn.feature_extraction import DictVectorizer
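# The original text used vec = DictVectorizer(sparse=False), which raised an error here, so the default is used instead
vec = DictVectorizer()
# pandas spells this orient 'records'; the short form 'record' is deprecated and no longer accepted by recent pandas releases
X_train = vec.fit_transform(X_train.to_dict(orient='records'))
X_test = vec.transform(X_test.to_dict(orient='records'))
# X_train.to_dict(orient='records')
vec.feature_names_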
['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'sex=female', 'sex=male']
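The feature names above show what `DictVectorizer` does: numeric values pass through under their own key, while each string value becomes a one-hot `key=value` column. A minimal sketch on two hand-written records follows (`records`, `vec_demo`, and the values are illustrative assumptions, not rows from the dataset).

```python
from sklearn.feature_extraction import DictVectorizer

# Two hand-written records in the shape produced by to_dict(orient='records').
records = [
    {"pclass": "1st", "age": 29.0, "sex": "female"},
    {"pclass": "3rd", "age": 40.0, "sex": "male"},
]

vec_demo = DictVectorizer()   # default output is a SciPy sparse matrix
encoded = vec_demo.fit_transform(records)

# Feature names are sorted; string values get one column per category.
print(vec_demo.feature_names_)
# ['age', 'pclass=1st', 'pclass=3rd', 'sex=female', 'sex=male']

# Convert to a dense array only for display.
dense = encoded.toarray()
print(dense)

# A category that was not seen during fit ('2nd' here) is silently ignored,
# so all pclass columns are zero for this row.
unseen = vec_demo.transform([{"pclass": "2nd", "age": 5.0, "sex": "male"}]).toarray()
print(unseen)
```

This is why the test set is passed through `transform` rather than `fit_transform`: the columns must be the ones learned from the training data.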
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier()
dtc.fit(X_train, y_train)
y_predict = dtc.predict(X_test)
from sklearn.metrics import classification_report
print(dtc.score(X_test, y_test))
0.781155015198
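For a classifier, `score` returns mean accuracy, the fraction of test rows predicted correctly. The sketch below fits a tree on four made-up points to show the fit, predict, and score calls in isolation (`X_toy`, `y_toy`, and `tree` are hypothetical names). The last line checks an inference that is not stated in the source: the printed score is consistent with 257 correct predictions out of 329 test rows.

```python
from sklearn.tree import DecisionTreeClassifier

# Four made-up one-feature samples that a single split can separate.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 1, 1]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_toy, y_toy)

pred = tree.predict([[0.5], [2.5]])
print(pred)                       # [0 1]

# score() is the fraction of samples whose predicted label matches the given label.
train_accuracy = tree.score(X_toy, y_toy)
print(train_accuracy)             # 1.0

# Inferred, not stated in the source: 257 correct out of 329 gives the score above.
implied_accuracy = round(257 / 329, 12)
print(implied_accuracy)           # 0.781155015198
```

A perfect score on the training points is expected for an unconstrained tree, which is why the text evaluates on the held-out test set instead.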
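# Note: classification_report expects (y_true, y_pred). This call passes the predictions first,
# so in the report below precision and recall are swapped and support counts predicted labels.
print(classification_report(y_predict, y_test, target_names=['died', 'survived']))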
             precision    recall  f1-score   support

       died       0.91      0.78      0.84       236
   survived       0.58      0.80      0.67        93

avg / total       0.81      0.78      0.79       329
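The report above was produced with the predictions passed as the first argument, while scikit-learn's metric functions take the true labels first. The following sketch, using made-up labels (`y_true` and `y_pred` are illustrative, not the Titanic results), shows the effect of the argument order: swapping the two arguments exchanges precision and recall.

```python
from sklearn.metrics import precision_score, recall_score

# Made-up labels: 3 actual positives, 2 predicted positives, 1 of them correct.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

# Conventional order: (y_true, y_pred).
precision = precision_score(y_true, y_pred)   # 1 correct / 2 predicted positives
recall = recall_score(y_true, y_pred)         # 1 correct / 3 actual positives

# Swapped order: the roles of "actual" and "predicted" are exchanged.
precision_swapped = precision_score(y_pred, y_true)
recall_swapped = recall_score(y_pred, y_true)

print(precision, recall_swapped)    # the same value
print(recall, precision_swapped)    # the same value
```

The F1 score is symmetric in precision and recall, so the per-class f1-score column does not change under the swap, but the support column does, because it counts whichever labels are passed first.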