【大数据部落】KNN Prediction Analysis of Customer Churn on Telecom Company Churn Data (Part 2)
2017-07-09 23:25
Relationships between variables
![](http://userimage8.360doc.com/17/0709/22/36427088_2017070922473406513LM44S64DL9HUDNEQA.jpg)
We can see from the result that the two variables have a significant positive linear relationship.
![](http://userimage8.360doc.com/17/0709/22/36427088_2017070922473406516VKXJX0LG3BYAMQ40C.jpg)
We can see from the result that these two variables also have a significant positive linear relationship.
c. Using the statistics node, report

| | account.length | area.code | number.vmail.messages | total.day.minutes | total.day.calls | total.day.charge | total.eve.minutes | total.eve.calls | total.eve.charge | total.night.minutes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| account.length | 1.0000000000 | -0.018054187 | -0.0145746663 | -0.001017491 | 0.0282402279 | -0.001019198 | -0.0095913331 | 0.009142579 | -0.0095873958 | 0.0006679112 |
| area.code | -0.0180541874 | 1.000000000 | -0.0033989831 | -0.019118245 | -0.0193138545 | -0.019119256 | 0.0070978766 | -0.012299947 | 0.0071141298 | 0.0020836263 |
| number.vmail.messages | -0.0145746663 | -0.003398983 | 1.0000000000 | 0.005381376 | 0.0008831280 | 0.005376796 | 0.0194901208 | -0.003954373 | 0.0194959757 | 0.0055413838 |
| total.day.minutes | -0.0010174908 | -0.019118245 | 0.0053813760 | 1.000000000 | 0.0019351487 | 0.999999951 | -0.0107504274 | 0.008128130 | -0.0107600217 | 0.0117986600 |
| total.day.calls | 0.0282402279 | -0.019313854 | 0.0008831280 | 0.001935149 | 1.0000000000 | 0.001935884 | -0.0006994115 | 0.003754179 | -0.0006952217 | 0.0028044650 |
| total.day.charge | -0.0010191980 | -0.019119256 | 0.0053767959 | 0.999999951 | 0.0019358844 | 1.000000000 | -0.0107472968 | 0.008129319 | -0.0107568931 | 0.0118014339 |
| total.eve.minutes | -0.0095913331 | 0.007097877 | 0.0194901208 | -0.010750427 | -0.0006994115 | -0.010747297 | 1.0000000000 | 0.002763019 | 0.9999997749 | -0.0166391160 |
| total.eve.calls | 0.0091425790 | -0.012299947 | -0.0039543728 | 0.008128130 | 0.0037541787 | 0.008129319 | 0.0027630194 | 1.000000000 | 0.0027780971 | 0.0017814106 |
| total.eve.charge | -0.0095873958 | 0.007114130 | 0.0194959757 | -0.010760022 | -0.0006952217 | -0.010756893 | 0.9999997749 | 0.002778097 | 1.0000000000 | -0.0166489191 |
| total.night.minutes | 0.0006679112 | 0.002083626 | 0.0055413838 | 0.011798660 | 0.0028044650 | 0.011801434 | -0.0166391160 | 0.001781411 | -0.0166489191 | 1.0000000000 |
| total.night.calls | -0.0078254785 | 0.014656846 | 0.0026762202 | 0.004236100 | -0.0083083467 | 0.004234934 | 0.0134202163 | -0.013682341 | 0.0134220174 | 0.0269718182 |
| total.night.charge | 0.0006558937 | 0.002070264 | 0.0055349281 | 0.011782533 | 0.0028018169 | 0.011785301 | -0.0166420421 | 0.001799380 | -0.0166518367 | 0.9999992072 |
| total.intl.minutes | 0.0012908394 | -0.004153729 | 0.0024627018 | -0.019485746 | 0.0130972198 | -0.019489700 | 0.0001365487 | -0.007458458 | 0.0001320238 | -0.0067209669 |
| total.intl.calls | 0.0142772733 | -0.013623309 | 0.0001243302 | -0.001303123 | 0.0108928533 | -0.001306635 | 0.0083881559 | 0.005574500 | 0.0083930603 | -0.0172140162 |
| total.intl.charge | 0.0012918112 | -0.004219099 | 0.0025051773 | -0.019414797 | 0.0131613976 | -0.019418755 | 0.0001593155 | -0.007507151 | 0.0001547783 | -0.0066545873 |
| number.customer.service.calls | -0.0014447918 | 0.020920513 | -0.0070856427 | 0.002732576 | -0.0107394951 | 0.002726370 | -0.0138234228 | 0.006234831 | -0.0138363623 | -0.0085325365 |

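Keeping highly correlated variables may cause multicollinearity, so highly correlated variables need to be removed (for example, total.day.minutes and total.day.charge have a correlation of 0.999999951).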
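The original post does not show the code behind this step. As a minimal sketch in Python (not the author's tooling), the idea of flagging highly correlated pairs can be illustrated as follows. The function names, the 0.9 threshold, the 0.17 rate and the sample values are all assumptions made up for illustration; they are not taken from the churn data.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair of columns with |r| above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up values for illustration only; the charge is a fixed multiple of the minutes.
minutes = [120.5, 265.1, 161.6, 243.4, 299.4, 166.7]
toy = {
    "total.day.minutes": minutes,
    "total.day.charge": [0.17 * m for m in minutes],  # hypothetical rate
    "total.day.calls": [110, 95, 123, 71, 88, 113],
}

flagged = highly_correlated_pairs(toy)
for name_a, name_b, r in flagged:
    print(f"{name_a} ~ {name_b}: r = {r:.4f}")
```

In this toy data only the minutes/charge pair is flagged, which mirrors the near-perfect minutes/charge correlations visible in the matrix above. One variable of each flagged pair would then be a candidate for removal.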
Data Manipulation
a.
![](http://userimage8.360doc.com/17/0709/22/36427088_201707092247340666WR8NWR0JHA59GR5CJL.jpg)
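We can see from the result that total.day.calls and total.day.charge are correlated to some extent. In particular, the relationship is negative among records where the voice mail plan is no.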
b. Discretize (make categorical) a relevant numeric variable
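The post does not show how the discretization was done. The coefficient names total.day.minutes1medium and total.day.minutes1short that appear later suggest that total.day.minutes was binned into levels including "short" and "medium". The Python sketch below illustrates that kind of binning under assumptions: the cut points, the third label "long" and the sample values are hypothetical.

```python
def discretize(value, cut_points, labels):
    """Return the label of the first bin whose upper bound the value does not exceed."""
    for upper, label in zip(cut_points, labels):
        if value <= upper:
            return label
    # Values above every cut point fall into the last bin.
    return labels[-1]


# Hypothetical cut points (in minutes) and labels; the original thresholds are not given.
CUT_POINTS = [150.0, 250.0]
LABELS = ["short", "medium", "long"]

# Made-up sample values for illustration only.
minutes = [120.5, 265.1, 161.6, 243.4, 299.4]
categories = [discretize(m, CUT_POINTS, LABELS) for m in minutes]
print(categories)
```

Fixed cut points keep the bins easy to interpret. Quantile-based bins would be an alternative when roughly equal group sizes matter more.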
Discretize the variable.
a. Construct a distribution of the variable with a churn overlay
![](http://userimage8.360doc.com/17/0709/22/36427088_2017070922473406827DITCMAHREDUWEQI6V.jpg)
![](http://userimage8.360doc.com/17/0709/22/36427088_201707092247340682WKE8VBTPKA1ORGP0QN.jpg)
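The charts above come from the original post. As a rough Python sketch of the numbers behind a distribution with a churn overlay, one can count churners and non-churners in each category and derive a churn rate per category. The records below are made up for illustration, and the function name is an assumption.

```python
from collections import Counter


def churn_overlay(categories, churn_flags):
    """Count churners and non-churners per category and compute each churn rate."""
    counts = Counter(zip(categories, churn_flags))
    summary = {}
    for category in sorted(set(categories)):
        churned = counts[(category, "yes")]
        stayed = counts[(category, "no")]
        summary[category] = {
            "yes": churned,
            "no": stayed,
            "churn_rate": churned / (churned + stayed),
        }
    return summary


# Made-up records for illustration only.
categories = ["short", "short", "medium", "medium", "medium", "long", "long", "long"]
churn = ["no", "yes", "no", "no", "no", "yes", "no", "yes"]

overlay = churn_overlay(categories, churn)
for category, row in overlay.items():
    print(category, row)
```

Plotting the "yes" and "no" counts as stacked bars per category would give the overlay chart itself.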
b. Construct a histogram of the variable with a churn overlay
![](http://userimage8.360doc.com/17/0709/22/36427088_201707092247340682HI8X7FAUQIDAWTOUVM.jpg)
![](http://userimage8.360doc.com/17/0709/22/36427088_201707092247340698KEMWQL47M8MERS94CP.jpg)
![](http://userimage8.360doc.com/17/0709/22/36427088_201707092247340698QYZ7XBYMV4UA6XM9L2.jpg)
![](http://userimage8.360doc.com/17/0709/22/36427088_201707092247340713DLG5I07N48QF1KVCTP.jpg)
![](http://userimage8.360doc.com/17/0709/22/36427088_201707092247340713Q1B8UPOFE1TFWDB21K.jpg)
![](http://userimage8.360doc.com/17/0709/22/36427088_201707092247340729ZYAITHFTLCW6YJPRMI.jpg)
c. Find a pair of numeric variables which are interesting with respect to churn.
![](http://userimage8.360doc.com/17/0709/22/36427088_201707092247340729HGYD46KRO4TQ4SYMMG.jpg)
![](http://userimage8.360doc.com/17/0709/22/36427088_201707092247340729C4DVF2THJAA51H7VMR.jpg)
We can see from the result that total.day.calls and total.day.charge are correlated to some extent, particularly among records where churn is no.
Model Building

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
| --- | --- | --- | --- | --- | --- |
| (Intercept) | 0.3082150 | 0.0735760 | 4.189 | 2.85e-05 | `***` |
| stateAL | 0.0151188 | 0.0462343 | 0.327 | 0.743680 | |
| stateAR | 0.0894792 | 0.0490897 | 1.823 | 0.068399 | `.` |
| stateAZ | 0.0329566 | 0.0494195 | 0.667 | 0.504883 | |
| stateCA | 0.1951511 | 0.0567439 | 3.439 | 0.000588 | `***` |
| international.plan yes | 0.3059341 | 0.0151677 | 20.170 | < 2e-16 | `***` |
| voice.mail.plan yes | -0.1375056 | 0.0337533 | -4.074 | 4.70e-05 | `***` |
| number.vmail.messages | 0.0017068 | 0.0010988 | 1.553 | 0.120402 | |
| total.day.minutes | 0.3796323 | 0.2629027 | 1.444 | 0.148802 | |
| total.day.calls | 0.0002191 | 0.0002235 | 0.981 | 0.326781 | |
| total.day.charge | -2.2207671 | 1.5464583 | -1.436 | 0.151056 | |
| total.eve.minutes | 0.0288233 | 0.1307496 | 0.220 | 0.825533 | |
| total.eve.calls | -0.0001585 | 0.0002238 | -0.708 | 0.478915 | |
| total.eve.charge | -0.3316041 | 1.5382391 | -0.216 | 0.829329 | |
| total.night.minutes | 0.0083224 | 0.0695916 | 0.120 | 0.904814 | |
| total.night.calls | -0.0001824 | 0.0002225 | -0.820 | 0.412290 | |
| total.night.charge | -0.1760782 | 1.5464674 | -0.114 | 0.909355 | |
| total.intl.minutes | -0.0104679 | 0.4192270 | -0.025 | 0.980080 | |
| total.intl.calls | -0.0063448 | 0.0018062 | -3.513 | 0.000447 | `***` |
| total.intl.charge | 0.0676460 | 1.5528267 | 0.044 | 0.965254 | |
| number.customer.service.calls | 0.0566474 | 0.0033945 | 16.688 | < 2e-16 | `***` |
| total.day.minutes1medium | 0.0502681 | 0.0160228 | 3.137 | 0.001715 | `**` |
| total.day.minutes1short | 0.2404020 | 0.0322293 | 7.459 | 1.02e-13 | `***` |

From the results, we can see that state, total.intl.calls, number.customer.service.calls, total.day.minutes1medium and total.day.minutes1short have significant effects.
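As a small worked check of how the coefficient table is read, each t value is the estimate divided by its standard error. The Python sketch below recomputes it for a few rows, with the numbers copied from the table above; the function name is an assumption. As a rough rule for large samples, an absolute t value above about 2 corresponds to a p-value below 0.05.

```python
# (term, Estimate, Std. Error, reported t value), copied from the coefficient table.
rows = [
    ("international.plan yes", 0.3059341, 0.0151677, 20.170),
    ("total.intl.calls", -0.0063448, 0.0018062, -3.513),
    ("number.customer.service.calls", 0.0566474, 0.0033945, 16.688),
    ("total.day.minutes", 0.3796323, 0.2629027, 1.444),
]


def t_value(estimate, std_error):
    """t statistic of a regression coefficient: estimate divided by its standard error."""
    return estimate / std_error


for term, estimate, std_error, reported in rows:
    t = t_value(estimate, std_error)
    print(f"{term}: t = {t:.3f} (reported {reported})")
```

The first three rows have large absolute t values and are marked significant in the table, while total.day.minutes, with a t value near 1.4, is not.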
Use the K-Nearest-Neighbors (K-NN) algorithm to develop a model for predicting Churn
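The post does not include the modelling code. To illustrate the K-NN technique itself, here is a minimal Python sketch: a query point receives the majority label of its k nearest training points by Euclidean distance. The feature values, labels, function name and k = 3 are made up for illustration, and this sketch did not produce the results reported below.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k nearest training points."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance to the query
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up, already scaled features for illustration: (day minutes, customer service calls).
train_points = [(0.2, 0.1), (0.3, 0.2), (0.25, 0.0), (0.9, 0.8), (0.8, 0.9), (0.85, 0.7)]
train_labels = ["no", "no", "no", "yes", "yes", "yes"]

print(knn_predict(train_points, train_labels, (0.28, 0.15)))
print(knn_predict(train_points, train_labels, (0.88, 0.75)))
```

Because K-NN relies on distances, features on very different scales are usually standardized first; the toy features here are assumed to be scaled already. The tables that follow are the original post's results.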

| knn.pred | Direction.2005 = 1 | Direction.2005 = 2 |
| --- | --- | --- |
| 1 | 760 | 97 |
| 2 | 100 | 43 |

Accuracy: 0.803
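A confusion matrix is a visualization tool used especially in supervised learning; in unsupervised learning it is usually called a matching matrix. By the usual convention, each column of the matrix represents the instances of a predicted class and each row represents the instances of an actual class. In the output shown here the layout is transposed: the rows are the predicted classes (knn.pred) and the columns are the actual classes (Direction.2005).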
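A minimal Python sketch of how such a matrix is tallied from predicted and actual labels is given below. It uses the same orientation as the output in this post (rows are predictions, columns are actual classes). The label lists and the function name are made up for illustration.

```python
from collections import Counter


def confusion_matrix(predicted, actual, classes):
    """Return matrix[predicted_class][actual_class] = number of instances."""
    counts = Counter(zip(predicted, actual))
    return {p: {a: counts[(p, a)] for a in classes} for p in classes}


# Made-up labels for illustration only.
predicted = ["no", "no", "yes", "no", "yes", "no"]
actual = ["no", "yes", "yes", "no", "no", "no"]

matrix = confusion_matrix(predicted, actual, ["no", "yes"])
for predicted_class, row in matrix.items():
    print(predicted_class, row)
```

The diagonal entries, matrix["no"]["no"] and matrix["yes"]["yes"], are the correctly classified instances.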
On the training set, we can see that the accuracy is about 80% (0.803).

| knn.pred | Direction.2005 = 1 | Direction.2005 = 2 |
| --- | --- | --- |
| 1 | 827 | 104 |
| 2 | 33 | 36 |

Accuracy: 0.863
On the test set, we can see that the accuracy reaches about 86% (0.863).
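The reported accuracies can be recomputed from the confusion matrix counts. The Python sketch below does this for both matrices and also derives two quantities the post does not report: the share of the larger actual class (a simple majority baseline) and the recall per class. The counts are copied from the output above; the function and variable names are assumptions.

```python
# Counts copied from the two matrices: outer key is knn.pred, inner key is the actual class.
train = {1: {1: 760, 2: 97}, 2: {1: 100, 2: 43}}
test = {1: {1: 827, 2: 104}, 2: {1: 33, 2: 36}}


def summarize(matrix):
    """Accuracy, majority-class baseline and per-class recall of a confusion matrix."""
    classes = list(matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in classes)
    # Column sums give the number of actual instances of each class.
    actual_totals = {a: sum(matrix[p][a] for p in classes) for a in classes}
    return {
        "accuracy": correct / total,
        "majority_baseline": max(actual_totals.values()) / total,
        "recall": {a: matrix[a][a] / actual_totals[a] for a in classes},
    }


train_summary = summarize(train)
test_summary = summarize(test)
print(train_summary)
print(test_summary)
```

With these counts, the accuracy of the second matrix (0.863) is close to the share of the larger class (0.860), and recall for class 2 is about 0.26, so overall accuracy alone may overstate how well the rarer class is predicted.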
Findings
We find that total.day.calls and total.day.charge are correlated to some extent, particularly among records where churn is no.
We also find that state, total.intl.calls, number.customer.service.calls, total.day.minutes1medium and total.day.minutes1short have significant effects.
Finally, the K-NN model reaches an accuracy of about 80% on the training set and about 86% on the test set, which indicates that the model predicts well in terms of overall accuracy.
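大数据部落: a professional third-party data service provider in China, offering customized one-stop data mining and statistical analysis consulting services
Statistical analysis and data mining consulting services: y0.cn/teradat (for consulting services, please contact customer service on the official site)
QQ: 3025393450
WeChat customer service ID: lico_9e
Service scenarios: research projects; outsourced company projects.
Sharing the latest big data news and learning a little data analysis every day. Let's be data people with attitude together.
QQ discussion group: 186388004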