Classic Machine Learning Algorithms Series 8: PCA
2017-03-14 10:15
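PCA is a classic dimensionality-reduction technique. Complete Python code for it appears in "Machine Learning in Action" (《机器学习实战》); it is reproduced below.
Create a file named pca.py and copy the following code into it.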
'''
Created on Jun 1, 2011
@author: Peter Harrington
'''
from numpy import *

def loadDataSet(fileName, delim='\t'):
    # Read a delimited text file into a numeric matrix, one row per line
    with open(fileName) as fr:
        stringArr = [line.strip().split(delim) for line in fr.readlines()]
    # list(...) is needed on Python 3, where map returns an iterator
    datArr = [list(map(float, line)) for line in stringArr]
    return asmatrix(datArr)  # asmatrix is equivalent to the older mat alias

def pca(dataMat, topNfeat=9999999):
    meanVals = mean(dataMat, axis=0)
    meanRemoved = dataMat - meanVals  # remove mean
    covMat = cov(meanRemoved, rowvar=0)  # columns are features, rows are samples
    eigVals, eigVects = linalg.eig(asmatrix(covMat))
    eigValInd = argsort(eigVals)  # sort, sort goes smallest to largest
    eigValInd = eigValInd[:-(topNfeat+1):-1]  # cut off unwanted dimensions
    redEigVects = eigVects[:, eigValInd]  # reorganize eig vects largest to smallest
    lowDDataMat = meanRemoved * redEigVects  # transform data into new dimensions
    reconMat = (lowDDataMat * redEigVects.T) + meanVals  # map back to the original space
    return lowDDataMat, reconMat

def replaceNanWithMean():
    datMat = loadDataSet('secom.data', ' ')
    numFeat = shape(datMat)[1]
    for i in range(numFeat):
        meanVal = mean(datMat[nonzero(~isnan(datMat[:, i].A))[0], i])  # values that are not NaN (a number)
        datMat[nonzero(isnan(datMat[:, i].A))[0], i] = meanVal  # set NaN values to mean
    return datMat
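As a reading aid, the steps of the pca function above can be written in matrix form. The notation here is an assumption introduced for illustration and does not come from the book: X is the n-by-d data matrix with one sample x_i per row, and k stands for topNfeat (capped at d). The 1/(n-1) factor reflects the default normalization of NumPy's cov.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i
  && \text{(meanVals)} \\
X_c &= X - \mathbf{1}\,\bar{x}^{\top}
  && \text{(meanRemoved)} \\
C &= \frac{1}{n-1}\, X_c^{\top} X_c
  && \text{(covMat)} \\
C\,v_j &= \lambda_j v_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d
  && \text{(eigVals, eigVects, sorted)} \\
W &= [\,v_1 \;\; v_2 \;\; \cdots \;\; v_k\,]
  && \text{(redEigVects)} \\
Z &= X_c W
  && \text{(lowDDataMat)} \\
\hat{X} &= Z W^{\top} + \mathbf{1}\,\bar{x}^{\top}
  && \text{(reconMat)}
\end{aligned}
```

Z is the reduced data with k columns, and X-hat is the reconstruction of X from those k components.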
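Then create a main.py file and copy in the following code:
import pca
# `feature` must already be defined: a numeric matrix with one sample per row
lowDMat1, feature1 = pca.pca(feature, 200)
This reduces each row of feature to 200 dimensions. lowDMat1 holds the reduced data, and feature1 is the reconstruction of feature in the original space.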
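The call in main.py depends on a feature matrix that the post never defines, so here is a minimal self-contained sketch that can be run as is. Everything in it is an assumption made for illustration: the function name pca_sketch, the synthetic feature data, and the target dimension of 2 instead of 200. It also swaps in plain arrays and np.linalg.eigh (intended for symmetric matrices such as a covariance matrix) in place of the matrix type and linalg.eig used in pca.py, so it is not the book's code.

```python
import numpy as np

def pca_sketch(data, top_n):
    """Project the rows of `data` onto its top_n principal components.

    Hypothetical helper for illustration; mirrors the steps of pca.pca
    but uses ndarrays and np.linalg.eigh.
    """
    data = np.asarray(data, dtype=float)
    mean_vals = data.mean(axis=0)
    centered = data - mean_vals                    # remove the column means
    cov_mat = np.cov(centered, rowvar=False)       # columns are features
    eig_vals, eig_vects = np.linalg.eigh(cov_mat)  # eigenvalues in ascending order
    order = np.argsort(eig_vals)[::-1][:top_n]     # indices of the largest top_n
    components = eig_vects[:, order]
    low_d = centered @ components                  # data in the reduced space
    recon = low_d @ components.T + mean_vals       # back in the original space
    return low_d, recon

# Synthetic stand-in for `feature`: 50 samples with 5 columns that are
# linear mixes of only 2 underlying directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 5))
feature = latent @ mixing

low_d, recon = pca_sketch(feature, 2)
print(low_d.shape)                  # (50, 2)
print(np.allclose(recon, feature))  # reconstruction check
```

Because the synthetic data has only two independent directions, two components should be enough to reconstruct it up to floating-point error. On real data the reconstruction is only approximate, and how close it gets depends on how much variance the discarded components carry.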