mlpy library: dimensionality reduction module (study notes)

2017-01-08 18:21

1. Dimensionality Reduction

Linear Discriminant Analysis (LDA)
Spectral Regression Discriminant Analysis (SRDA)
Kernel Fisher Discriminant Analysis (KFDA)
Principal Component Analysis (PCA)
Fast Principal Component Analysis (PCAFast)
Kernel Principal Component Analysis (KPCA)

2. Principal Component Analysis (PCA)

Definition: class mlpy.PCA(method='svd', whiten=False)

Parameters:

method [str] method, 'svd' or 'cov'
whiten [bool] whitening. The eigenvectors will be scaled by eigenvalues**-(1/2)
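
To make the whiten parameter more concrete, here is a minimal NumPy-only sketch of the scaling described above. It is not mlpy's implementation: the helper name `pca_coeff` is hypothetical, and it assumes the sample covariance matrix has full rank. After scaling each eigenvector by eigenvalues**-(1/2), the projected data should have (approximately) identity covariance.

```python
import numpy as np

def pca_coeff(x, whiten=False):
    """Return (mean, coeff); the columns of coeff are principal axes.

    Hypothetical helper for illustration only. Assumes the covariance
    matrix has full rank when whiten=True.
    """
    mean = x.mean(axis=0)
    cov = np.cov(x - mean, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(evals)[::-1]          # sort by decreasing eigenvalue
    evals, evecs = evals[order], evecs[:, order]
    if whiten:
        evecs = evecs * evals ** -0.5        # scale each column
    return mean, evecs

rng = np.random.RandomState(0)
x = rng.multivariate_normal([0, 0], [[1, 1], [1, 1.5]], 500)
mean, coeff = pca_coeff(x, whiten=True)
z = (x - mean) @ coeff                       # whitened scores
z_cov = np.cov(z, rowvar=False)              # close to the 2x2 identity
```

Without whitening, the covariance of the scores would instead be diagonal with the eigenvalues on the diagonal.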

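Methods:

(1) coeff()
Returns the transformation matrix (P,L), where L=min(N,P), sorted by decreasing eigenvalue. Each column contains the coefficients for one principal component.
(2) coeff_inv()
Returns the inverse of the transformation matrix, of shape (L,P), where L=min(N,P), sorted by decreasing eigenvalue.
(3) evals()
Returns the sorted eigenvalues (L), where L=min(N,P).
(4) learn(x)
Computes the principal component coefficients. x is a matrix (N,P). Each column of x represents a variable, and each row contains an observation.
(5) transform(t, k=None)
Embeds t (M,P) into the k-dimensional subspace. Returns a (M,K) matrix. If k=None, k is set to min(N,P).
(6) transform_inv(z)
Transforms the data back to the original space, where z is a (M,K) matrix. Returns a (M,P) matrix.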
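
The shapes listed above can be checked with a small NumPy-only stand-in. The class `TinyPCA` below is hypothetical and only mirrors the method names; it uses an SVD of the centered data and assumes the unbiased (N-1) normalization for the eigenvalues, which may differ from what mlpy does internally.

```python
import numpy as np

class TinyPCA:
    """Hypothetical NumPy-only stand-in that mirrors the method names above."""

    def learn(self, x):
        self._mean = x.mean(axis=0)
        # SVD of the centered data; the rows of vt are the principal axes
        _, s, vt = np.linalg.svd(x - self._mean, full_matrices=False)
        self._coeff = vt.T                         # (P, L), L = min(N, P)
        self._evals = s ** 2 / (x.shape[0] - 1)    # (L,), decreasing (assumed N-1)

    def coeff(self):
        return self._coeff

    def evals(self):
        return self._evals

    def transform(self, t, k=None):
        k = self._coeff.shape[1] if k is None else k
        return (t - self._mean) @ self._coeff[:, :k]       # (M, K)

    def transform_inv(self, z):
        k = z.shape[1]
        return z @ self._coeff[:, :k].T + self._mean       # (M, P)

rng = np.random.RandomState(0)
x = rng.rand(20, 5)                  # N=20 observations, P=5 variables
pca = TinyPCA()
pca.learn(x)
z = pca.transform(x, k=2)            # (20, 2)
xhat_full = pca.transform_inv(pca.transform(x))   # uses all L components
```

With all L components kept, the round trip reproduces the training data up to floating-point error; with k < L it gives a lower-rank approximation.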

Example:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import mlpy
>>> np.random.seed(0)
>>> mean, cov, n = [0, 0], [[1,1],[1,1.5]], 100
>>> x = np.random.multivariate_normal(mean, cov, n)
>>> pca = mlpy.PCA()
>>> pca.learn(x)
>>> coeff = pca.coeff()
>>> fig = plt.figure(1)  # plot
>>> plot1 = plt.plot(x[:, 0], x[:, 1], 'o')
>>> plot2 = plt.plot([0, coeff[0, 0]], [0, coeff[1, 0]], linewidth=4, color='r')  # first PC
>>> plot3 = plt.plot([0, coeff[0, 1]], [0, coeff[1, 1]], linewidth=4, color='g')  # second PC
>>> xx = plt.xlim(-4, 4)
>>> yy = plt.ylim(-4, 4)
>>> plt.show()

>>> z = pca.transform(x, k=1)  # transform x using the first PC
>>> xnew = pca.transform_inv(z)  # transform data back to its original space
>>> fig2 = plt.figure(2)  # plot
>>> plot1 = plt.plot(xnew[:, 0], xnew[:, 1], 'o')
>>> xx = plt.xlim(-4, 4)
>>> yy = plt.ylim(-4, 4)
>>> plt.show()
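
The second figure in the example shows points that all fall on one straight line, because only the first PC is kept. The NumPy-only sketch below (not using mlpy, and assuming the PCA is computed on mean-centered data) reproduces that step by hand and checks two properties: the reconstruction has rank 1, and the residuals are orthogonal to the first principal axis.

```python
import numpy as np

np.random.seed(0)
mean, cov, n = [0, 0], [[1, 1], [1, 1.5]], 100
x = np.random.multivariate_normal(mean, cov, n)

mu = x.mean(axis=0)
_, _, vt = np.linalg.svd(x - mu, full_matrices=False)
pc1 = vt[0]                          # first principal axis, shape (2,)

z = (x - mu) @ pc1[:, None]          # (100, 1) scores on the first PC
line = z @ pc1[None, :]              # (100, 2) points along the first PC
xnew = line + mu                     # reconstruction in the original space

residual = x - xnew                  # what the first PC cannot explain
rank = np.linalg.matrix_rank(line)   # a single direction, so rank 1
```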

3. Fast Principal Component Analysis (PCAFast)

Definition: class mlpy.PCAFast(k=2, eps=0.01)

Parameters:

k [integer] the number of principal axes or eigenvectors required

eps [float (> 0)] tolerance error
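
To illustrate how a component count k and a tolerance eps can work together when only the leading axes are needed, here is a rough sketch using plain power iteration with Gram-Schmidt deflation. This is a standard technique swapped in for illustration, not mlpy's code; the function name `leading_axes`, the `max_iter` and `seed` arguments, and the exact stopping rule are all assumptions.

```python
import numpy as np

def leading_axes(x, k=2, eps=0.01, max_iter=1000, seed=0):
    """Approximate the first k principal axes by power iteration.

    Hypothetical helper: each axis is refined until successive unit
    iterates satisfy |v_new . v_old - 1| < eps, with Gram-Schmidt
    orthogonalization against the axes already found.
    """
    rng = np.random.RandomState(seed)
    xc = x - x.mean(axis=0)
    p = xc.shape[1]
    axes = np.zeros((p, k))
    for i in range(k):
        v = rng.rand(p)
        v /= np.linalg.norm(v)
        for _ in range(max_iter):
            # Apply the (unnormalized) covariance without forming a (P, P) matrix
            w = xc.T @ (xc @ v)
            # Remove the components along previously found axes
            w -= axes[:, :i] @ (axes[:, :i].T @ w)
            w /= np.linalg.norm(w)
            converged = abs(w @ v - 1.0) < eps
            v = w
            if converged:
                break
        axes[:, i] = v
    return axes                                   # (P, K)

rng = np.random.RandomState(0)
x = rng.rand(100, 50)
coeff = leading_axes(x, k=3, eps=1e-10)
gram = coeff.T @ coeff                            # close to the 3x3 identity
```

A looser eps stops each axis earlier, trading accuracy for speed; the cost per iteration is proportional to N*P rather than P*P, which is why this kind of approach can pay off when P is large.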

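Methods:

(1) coeff()
Returns the transformation matrix (P,K), sorted by decreasing eigenvalue. Each column contains the coefficients for one principal component.
(2) coeff_inv()
Returns the inverse of the transformation matrix, of shape (K,P), sorted by decreasing eigenvalue.
(3) learn(x)
Computes the first k principal component coefficients. x is a matrix (N,P). Each column of x represents a variable, and each row contains an observation.
(4) transform(t)
Embeds t (M,P) into the k-dimensional subspace. Returns a (M,K) matrix.
(5) transform_inv(z)
Transforms the data back to the original space, where z is a (M,K) matrix. Returns a (M,P) matrix.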

Example:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import mlpy
>>> np.random.seed(0)
>>> h = 10  # dimension reduced to h=10
>>> n = 100  # number of samples
>>> d = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000])  # number of dimensions
>>> mse_eig, mse_fast = np.zeros(len(d)), np.zeros(len(d))
>>> pca = mlpy.PCA(method='cov')  # pca (eigenvalue decomposition)
>>> pca_fast = mlpy.PCAFast(k=h)  # fast pca
>>> for i in range(d.shape[0]):
...     x = np.random.rand(n, d[i])
...     pca.learn(x)  # pca (eigenvalue decomposition)
...     y_eig = pca.transform(x, k=h)  # reduced dimensional feature vectors
...     xhat_eig = pca.transform_inv(y_eig)  # reconstructed vector
...     pca_fast.learn(x)  # fast pca
...     y_fast = pca_fast.transform(x)  # reduced dimensional feature vectors
...     xhat_fast = pca_fast.transform_inv(y_fast)  # reconstructed vector
...     for j in range(n):
...         mse_eig[i] += np.sum((x[j] - xhat_eig[j])**2)
...         mse_fast[i] += np.sum((x[j] - xhat_fast[j])**2)
...     mse_eig[i] /= n
...     mse_fast[i] /= n
...
>>> fig = plt.figure(1)
>>> plot1 = plt.plot(d, mse_eig, '|-b', label="PCA using eigenvalue decomposition")
>>> plot2 = plt.plot(d, mse_fast, '.-g', label="Fast PCA")
>>> leg = plt.legend(loc = 'best')
>>> xl = plt.xlabel("Data dimensionality")
>>> yl = plt.ylabel("Mean Squared Error")
>>> plt.show()
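
The inner loop over j in the example accumulates the squared reconstruction error one observation at a time. As a sketch, the same quantity can be computed in a single vectorized NumPy expression; the helper name `reconstruction_mse` and the noisy stand-in for a reconstruction are hypothetical.

```python
import numpy as np

def reconstruction_mse(x, xhat):
    """Mean over observations of the squared Euclidean reconstruction error."""
    return np.mean(np.sum((x - xhat) ** 2, axis=1))

rng = np.random.RandomState(0)
x = rng.rand(100, 30)
xhat = x + 0.1 * rng.randn(100, 30)     # stand-in for a reconstruction

# Loop form, as in the example above
mse_loop = 0.0
for j in range(x.shape[0]):
    mse_loop += np.sum((x[j] - xhat[j]) ** 2)
mse_loop /= x.shape[0]

# Vectorized form
mse_vec = reconstruction_mse(x, xhat)
```

Both forms give the same value up to floating-point rounding; the vectorized one avoids the Python-level loop over observations.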

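Result: [figure: mean squared error versus data dimensionality for PCA using eigenvalue decomposition and Fast PCA]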
