Kaggle/Titanic: Data Analysis and Modeling in Python
2017-07-29 21:28
Titanic is Kaggle's introductory project. This article follows https://www.kaggle.com/startupsci/titanic/titanic-data-science-solutions as a study guide.
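1. Workflow stages
The complete workflow has 7 steps. Kaggle already provides steps 1 and 2, so most of the remaining work is data preparation, the so-called “feature engineering”, and exploring the data through plots is an essential skill for it.
One term to clarify: what does “wrangle” mean? In data work it refers to cleaning and reshaping raw data into a form suitable for analysis.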
Question or problem definition.
Acquire training and testing data.
Wrangle, prepare, cleanse the data.
Analyze, identify patterns, and explore the data.
Model, predict and solve the problem.
Visualize, report, and present the problem solving steps and final solution.
Supply or submit the results.
2. Analyze by describing data
Early exploration of the dataset with pandas can answer the following questions:
Which features are available in the dataset?
Which features are categorical?
Which features are numerical?
Which features are mixed data types?
Which features may contain errors or typos?
Which features contain blank, null or empty values?
What are the data types for various features?
What is the distribution of numerical feature values across the samples?
What is the distribution of categorical features?
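Most of the questions above can be answered with a handful of pandas calls. The following is a minimal sketch, not the original notebook's code: the rows are made up and only mimic the column layout of Kaggle's train.csv, which in practice would be loaded with pd.read_csv.

```python
import numpy as np
import pandas as pd

# Made-up rows shaped like Kaggle's train.csv (an assumption of this sketch);
# in practice: train_df = pd.read_csv("train.csv")
train_df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3, 3],
    "Name": ["Doe, Mr. John", "Roe, Mrs. Jane", "Poe, Miss. Ann",
             "Coe, Mrs. Mary", "Moe, Mr. Sam", "Loe, Mr. Tom"],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, np.nan],
    "SibSp": [1, 1, 0, 1, 0, 0],
    "Parch": [0, 0, 0, 0, 0, 0],
    "Ticket": ["A1", "B2", "C3", "D4", "E5", "F6"],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 8.46],
    "Cabin": [None, "C85", None, "C123", None, None],
    "Embarked": ["S", "C", "S", "S", "S", "Q"],
})

# Which features are available?
print(train_df.columns.tolist())

# What are the data types, and how many non-null values per column?
train_df.info()

# Which features contain blank, null or empty values?
null_counts = train_df.isnull().sum()
print(null_counts)

# Distribution of numerical features (count, mean, std, quartiles).
numeric_summary = train_df.describe()
print(numeric_summary)

# Distribution of non-numerical features (count, unique, top, freq).
categorical_summary = train_df.select_dtypes(exclude="number").describe()
print(categorical_summary)
```

Mixed-type features and typos usually still need a manual look, for example with train_df.head() and train_df.tail().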
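3. Assumptions based on data analysis
Building on “Analyze by describing data”, form assumptions in the following categories.
Correlating feature: in this example, for instance, female passengers have a higher survival rate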
Completing feature
Correcting feature
Creating new feature
4. Analyze by pivoting features | Analyze by visualizing data
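Sections 3 and 4 have to be considered and carried out together; working through both gives a deeper understanding of each feature in the data.
These two steps also determine which features are useful and which are useless and can be dropped.
Assumptions must be backed by evidence from this step; tables and histograms are both good ways to “pivot” the data and reveal its patterns.
When a feature is a categorical variable, use a table to pivot the data.
When a feature is a numerical variable, use a histogram to pivot the data.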
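The two rules above can be sketched as follows. The data is made up for illustration, and the histogram is shown as a table of counts per age band (pd.crosstab over pd.cut) so the example runs without a plotting library; the band edges are an arbitrary choice of this sketch. An actual chart would typically be drawn with matplotlib or seaborn.

```python
import pandas as pd

# Made-up rows with a few Titanic-style columns (an assumption of this sketch).
train_df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1],
    "Sex": ["male", "female", "female", "female",
            "male", "male", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 4.0, 61.0],
})

# Categorical feature: a table of the mean survival rate per category.
sex_pivot = (
    train_df[["Sex", "Survived"]]
    .groupby("Sex", as_index=False)
    .mean()
    .sort_values(by="Survived", ascending=False)
)
print(sex_pivot)

# Numerical feature: histogram counts of Age, split by outcome.
age_band = pd.cut(
    train_df["Age"],
    bins=[0, 20, 40, 60, 80],
    labels=["0-20", "20-40", "40-60", "60-80"],
)
age_hist = pd.crosstab(age_band, train_df["Survived"])
print(age_hist)
```

Reading the table row by row gives the same information as comparing two histograms, one for survivors and one for non-survivors.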
4.1 Correlating feature
Correlating numerical features
Correlating numerical and ordinal features
Correlating categorical features
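When two features are examined together, a two-way pivot table is one way to see how they interact. This is a hedged sketch on made-up rows, pairing an ordinal feature (Pclass) with a categorical one (Sex); it does not reproduce the plots in the original notebook.

```python
import pandas as pd

# Made-up rows (an assumption of this sketch), not real passenger records.
train_df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3, 3],
    "Sex": ["female", "male", "female", "male",
            "female", "male", "female", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Rows: ticket class. Columns: sex. Cells: mean survival rate.
class_sex_pivot = train_df.pivot_table(
    values="Survived", index="Pclass", columns="Sex", aggfunc="mean"
)
print(class_sex_pivot)
```

Each cell is the survival rate of one Pclass and Sex combination, which makes it easy to spot whether an effect holds across all classes or only in some.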
5. Wrangle data
This step is where the real “feature engineering” happens; sections 2, 3, and 4 only analyze the features.
Correcting by dropping features
Creating new feature extracting from existing
Converting a categorical feature
Completing a numerical continuous feature
Create new feature combining existing features
Completing a categorical feature
Converting categorical feature to numeric
Quick completing and converting a numeric feature
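The wrangling steps listed above can be sketched in a few lines of pandas. This is a simplified illustration under stated assumptions, not the original notebook's code: the rows are made up, the dropped columns and the numeric encodings are choices of this sketch, and missing values are filled with a plain median or mode.

```python
import numpy as np
import pandas as pd

# Made-up rows shaped like Kaggle's train.csv (an assumption of this sketch).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 2, 1],
    "Name": ["Doe, Mr. John", "Roe, Mrs. Jane", "Poe, Miss. Ann",
             "Moe, Mr. Sam", "Loe, Master. Tim"],
    "Sex": ["male", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, np.nan, 40.0, 4.0],
    "SibSp": [1, 1, 0, 0, 1],
    "Parch": [0, 0, 0, 0, 2],
    "Ticket": ["A1", "B2", "C3", "D4", "E5"],
    "Fare": [7.25, 71.28, 7.92, np.nan, 30.0],
    "Cabin": [None, "C85", None, None, "B20"],
    "Embarked": ["S", "C", None, "S", "S"],
})

# Correcting by dropping features judged not useful in the analysis.
df = df.drop(columns=["Ticket", "Cabin"])

# Creating a new feature extracted from an existing one: Title from Name.
df["Title"] = df["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)
df = df.drop(columns=["Name", "PassengerId"])

# Converting a categorical feature to numeric.
df["Sex"] = df["Sex"].map({"female": 1, "male": 0}).astype(int)

# Completing a numerical continuous feature with the median.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Creating a new feature by combining existing features.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# Completing a categorical feature with its most frequent value.
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Converting the categorical feature to numeric.
df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2}).astype(int)

# Quick completing and converting a numeric feature: fill Fare, then band it.
df["Fare"] = df["Fare"].fillna(df["Fare"].median())
df["FareBand"] = pd.qcut(df["Fare"], 2, labels=False)

print(df)
```

A more careful version could fill Age per group (for example per Sex and Pclass) instead of using one global median; the global median is used here only to keep the sketch short.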