Principal Component Analysis in R: prcomp vs. princomp
2013-11-23 04:47
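The simplest functions for principal component analysis, prcomp and princomp, both ship with R (package stats), so no additional package is needed.
http://strata.uga.edu/software/pdf/pcaTutorial.pdf (a very good introduction)
http://gastonsanchez.wordpress.com/2012/06/17/principal-components-analysis-in-r-part-1/ (a very good introduction)
The result of a principal component analysis consists of the set of eigenvalues, the table of PC scores, and the table of correlations between variables and PCs (the table of loadings).
The eigenvalues describe how much of the variability in the data each component captures, the scores describe the structure of the observations, and the loadings give a rough picture of how the variables relate to one another and to the PCs.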
Description:
prcomp : Performs a principal components analysis on the given data matrix and returns the results as an object of class prcomp.
princomp : Performs a principal components analysis on the given numeric data matrix and returns the results as an object of class princomp.
Usage:
The examples below use the built-in dataset USArrests.
> str(USArrests)
'data.frame': 50 obs. of 4 variables:
$ Murder : num 13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
$ Assault : int 236 263 294 190 276 204 110 238 335 211 ...
$ UrbanPop: int 58 48 80 50 91 78 77 72 80 60 ...
$ Rape : num 21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
prcomp :
prcomp(x, ...)
prcomp(formula, data = NULL, subset, na.action, ...)
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE, tol = NULL, ...)
prcomp(USArrests) # inappropriate: the variables should be scaled
prcomp(USArrests, scale. = TRUE) # data matrix interface
prcomp(~ Murder + Assault + Rape, data = USArrests, scale. = TRUE) # formula interface
plot(prcomp(USArrests))
summary(prcomp(USArrests, scale. = TRUE))
biplot(prcomp(USArrests, scale. = TRUE))
princomp :
princomp(x, ...) # same as prcomp
princomp(formula, data = NULL, subset, na.action, ...) # also the same
princomp(x, cor = FALSE, scores = TRUE, covmat = NULL, subset = rep(TRUE, nrow(as.matrix(x))), ...) # the arguments differ
princomp(USArrests, cor = TRUE) # =^= prcomp(USArrests, scale. = TRUE): similar but not identical, the standard deviations differ by a factor of sqrt(49/50)
summary(pc.cr <- princomp(USArrests, cor = TRUE))
loadings(pc.cr) # matrix whose columns are the eigenvectors; corresponds to rotation in prcomp
plot(pc.cr) # shows a screeplot.
biplot(pc.cr)
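To see where the sqrt(49/50) factor shows up, here is a small sketch on USArrests. It rests on my reading of how the two functions divide: princomp uses divisor N and prcomp uses N - 1. Under that reading, the component standard deviations (sdev) differ by the factor in the covariance-based fit, while in the correlation-based fit sdev agrees and the factor appears in the scaling vector (scale) instead. The variable names are only illustrative.

```r
n <- nrow(USArrests)                          # 50 observations

# Covariance-based PCA: the component standard deviations differ
pr_cov <- prcomp(USArrests)                   # divisor N - 1
pc_cov <- princomp(USArrests)                 # divisor N
ratio_sdev <- unname(pc_cov$sdev / pr_cov$sdev)

# Correlation-based PCA: sdev agrees, the per-variable scalings differ
pr_cor <- prcomp(USArrests, scale. = TRUE)
pc_cor <- princomp(USArrests, cor = TRUE)
ratio_scale <- unname(pc_cor$scale / pr_cor$scale)

print(ratio_sdev)                             # compare with sqrt((n - 1) / n)
print(ratio_scale)                            # compare with sqrt((n - 1) / n)
print(all.equal(unname(pc_cor$sdev), pr_cor$sdev))
```

The signs of individual loadings may still differ between the two fits, so compare absolute values when checking loadings against rotation.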
Return value:
prcomp :
sdev | the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix).
rotation | the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). The function princomp returns this in the element loadings.
x | if retx is TRUE, the rotated data, i.e. the centred (and scaled if requested) data multiplied by the rotation matrix. Hence, cov(x) is the diagonal matrix diag(sdev^2). For the formula method, napredict() is applied to handle the treatment of values omitted by the na.action.
center, scale | the centering and scaling used, or FALSE. PCA is sensitive to the scale of the variables, so the data are usually standardized first (mean = 0, variance = 1), especially when the variables are measured in different units.
princomp :
sdev | the standard deviations of the principal components.
loadings | the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). This is of class "loadings": see loadings for its print method.
center | the means that were subtracted.
scale | the scalings applied to each variable.
n.obs | the number of observations.
scores | if scores = TRUE, the scores of the supplied data on the principal components. These are non-null only if x was supplied, and if covmat was also supplied if it was a covariance list. For the formula method, napredict() is applied to handle the treatment of values omitted by the na.action.
call | the matched call.
na.action | If relevant.
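The relations between the returned elements of prcomp can be checked directly. The following is a minimal sketch on USArrests; the names z, scores_by_hand and score_cov are just illustrative.

```r
pr <- prcomp(USArrests, scale. = TRUE)

# x is the centred and scaled data multiplied by the rotation matrix
z <- scale(USArrests, center = pr$center, scale = pr$scale)
scores_by_hand <- z %*% pr$rotation
print(all.equal(scores_by_hand, pr$x, check.attributes = FALSE))

# cov(x) is diagonal, with sdev^2 on the diagonal
score_cov <- cov(pr$x)
print(all.equal(score_cov, diag(pr$sdev^2), check.attributes = FALSE))
```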
Details:
prcomp :
The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix. This is generally the preferred method for numerical accuracy.
The print method for these objects prints the results in a nice format and the plot method produces a scree plot. Unlike princomp, variances are computed with the usual divisor N - 1.
Note that scale = TRUE cannot be used if there are zero or constant (for center = TRUE) variables.
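As a rough illustration of the SVD route described above, the sketch below reproduces the prcomp standard deviations by hand and compares them with the eigen route on the same scaled data. It is a simplified reconstruction for illustration, not the actual source of prcomp.

```r
x <- scale(USArrests, center = TRUE, scale = TRUE)  # centred, scaled data matrix
n <- nrow(x)

# SVD route: singular values divided by sqrt(N - 1)
s <- svd(x)
sdev_svd <- s$d / sqrt(n - 1)

# eigen route: square roots of the eigenvalues of the covariance matrix
eig <- eigen(cov(x), symmetric = TRUE)
sdev_eig <- sqrt(eig$values)

pr <- prcomp(USArrests, scale. = TRUE)
print(all.equal(sdev_svd, pr$sdev))
print(all.equal(sdev_eig, pr$sdev))

# The right singular vectors match the rotation matrix up to sign
print(all.equal(abs(s$v), abs(pr$rotation), check.attributes = FALSE))
```

On well-conditioned data such as this the two routes agree to rounding error; the accuracy advantage of the SVD matters for nearly collinear data.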
princomp :
princomp is a generic function with "formula" and "default" methods.
The calculation is done using eigen on the correlation or covariance matrix, as determined by cor. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use svd on x, as is done in prcomp.
Note that the default calculation uses divisor N for the covariance matrix.
The print method for these objects prints the results in a nice format and the plot method produces a scree plot (screeplot). There is also a biplot method.
If x is a formula then the standard NA-handling is applied to the scores (if requested): see napredict.
princomp only handles so-called R-mode PCA, that is feature extraction of variables. If a data matrix is supplied (possibly via a formula) it is required that there are at least as many units as variables. For Q-mode PCA use prcomp.
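The eigen route with divisor N can be sketched in the same way. The code below is a simplified reconstruction of what princomp does by default (cor = FALSE), written for illustration only.

```r
x <- as.matrix(USArrests)
n <- nrow(x)

# cov() uses divisor N - 1, so rescale to divisor N
cov_n <- cov(x) * (n - 1) / n

# Eigen decomposition of the covariance matrix
eig <- eigen(cov_n, symmetric = TRUE)
sdev_by_hand <- sqrt(eig$values)

pc <- princomp(USArrests)
print(all.equal(sdev_by_hand, unname(pc$sdev)))

# The eigenvectors match the loadings up to sign
print(all.equal(abs(eig$vectors), abs(unclass(pc$loadings)),
                check.attributes = FALSE))
```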
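Difference between R-mode and Q-mode:
R-mode PCA examines the correlations or covariances among variables. Q-mode focusses on the correlations or covariances among samples.
Multivariate analysis, such as computing correlation coefficients, is usually carried out on the data columns (the features, or questions), while each row is a sample unit, that is, a respondent (R-way analysis).
Sometimes the columns (questions) are treated as the sample units instead, which gives Q analysis. The difference probably lies mainly in how the data are standardized and how the results are interpreted.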
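One practical consequence of the unit/variable requirement can be tried on a small simulated matrix with fewer units than variables. This is only a sketch: the matrix wide is made-up random data, and the exact wording of the princomp error message may differ between R versions.

```r
set.seed(1)
wide <- matrix(rnorm(5 * 10), nrow = 5, ncol = 10)  # 5 units, 10 variables

# princomp refuses data with fewer units than variables
princomp_result <- tryCatch(
  princomp(wide),
  error = function(e) conditionMessage(e)
)
print(princomp_result)

# prcomp works through the SVD and returns min(units, variables) components
pr_wide <- prcomp(wide)
print(length(pr_wide$sdev))
print(dim(pr_wide$rotation))
```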
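Using PCA results for regression
Reference: http://sites.stat.psu.edu/~ajw13/stat505/fa06/16_princomp/10_princomp_reg_example.htm
A major problem in regression is that multicollinearity distorts the results. Principal component regression has been proposed as a remedy.
The original predictors have high variance inflation factors (VIF); any value above 4 in the last column counts as fairly large: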
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Variance Inflation
Intercept   1           134.96790       237.81430     0.57    0.5778             0
occup       1            -1.28377         0.80469    -1.60    0.1291             2.16276
checkin     1             1.80351         0.51624     3.49    0.0028             4.52397
hours       1             0.66915         1.84640     0.36    0.7215             1.35735
common      1           -21.42263        10.17160    -2.11    0.0504             2.33264
wings       1             5.61923        14.74609     0.38    0.7079             3.65318
cap         1           -14.48025         4.22018    -3.43    0.0032            37.12912
rooms       1            29.32475         6.36590     4.61    0.0003            63.70809
The eigenvalues also give the proportion of the sample variance that each component accounts for:
Eigenvalues of the Correlation Matrix
    Eigenvalue  Difference  Proportion  Cumulative
1   4.64302239  3.90281147      0.6633      0.6633
2   0.74021092  0.03390878      0.1057      0.7690
3   0.70630215  0.25669541      0.1009      0.8699
4   0.44960674  0.15020062      0.0642      0.9342
5   0.29940611  0.14798282      0.0428      0.9769
6   0.15142329  0.14139489      0.0216      0.9986
7   0.01002840                  0.0014      1.0000
For the principal components, the VIFs are all exactly 1:
Parameter Estimates
Variable   DF  Variance Inflation
Intercept   1             0
Prin1       1             1.00000
Prin2       1             1.00000
Prin3       1             1.00000
Prin4       1             1.00000
Prin5       1             1.00000
Prin6       1             1.00000
Prin7       1             1.00000
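The same effect can be sketched in R. The data behind the SAS output above are not included in this post, so the example uses the built-in mtcars dataset as a stand-in, with an arbitrary choice of four correlated predictors. The helper vif() is hypothetical: it computes the VIFs as the diagonal of the inverse correlation matrix of the predictors.

```r
# Hypothetical helper: VIFs are the diagonal of the inverse correlation matrix
vif <- function(X) diag(solve(cor(X)))

# Stand-in data (an assumption, not the data from the referenced example)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
vif_raw <- vif(X)
print(vif_raw)                      # inflated by the correlated predictors

# Replace the predictors by their principal component scores
pc <- princomp(X, cor = TRUE)
vif_pc <- vif(pc$scores)
print(vif_pc)                       # all 1 up to rounding: scores are uncorrelated

# Principal component regression: regress the response on the scores
pcr_data <- data.frame(mpg = mtcars$mpg, pc$scores)
fit <- lm(mpg ~ ., data = pcr_data)
print(summary(fit))
```

Using all components only re-expresses the original regression. To actually counter multicollinearity, the components with the smallest eigenvalues are usually dropped before fitting.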