您的位置:首页 > 编程语言 > Python开发

python下PCA算法与人脸识别

2015-07-25 19:16 627 查看
关于这部分主要是想在python下试验一下主成分分析(PCA)算法以及简单的人脸识别。曾经详述过matlab下的PCA以及SVM算法进行人脸识别技术,参考如下:

主成分分析法-简单人脸识别(一)

主成分分析-简单人脸识别(二)

PCA实验人脸库-人脸识别(四)

PCA+支持向量机-人脸识别(五)

主成分分析(PCA)算法主要是对高维数据进行降维,最大限度的找到数据间的相互关系,在机器学习、数据挖掘上很有用。在机器学习领域算法众多,贴一个:

大神博客索引

关于PCA的核心思想与原理介绍上述已经给出。而对与在人脸识别上的应用需要说明一下,首先是人脸数据库下载(上述也有)。其次是对人脸数据进行准备。最后使用PCA处理。这里只贴出修改到python下以后的完整代码,其中只对ORL数据库进行了试验:

# -*- coding: utf-8 -*-
"""
Created on Sat Jul 25 09:41:24 2015

@author: on2way
"""

import os
import operator
from numpy import *
import matplotlib.pyplot as plt
import cv2

# define PCA
def pca(data,k):
    data = float32(mat(data)) 
    rows,cols = data.shape#取大小
    data_mean = mean(data,0)#对列求均值
    data_mean_all = tile(data_mean,(rows,1))
    Z = data - data_mean_all
    T1 = Z*Z.T #使用矩阵计算,所以前面mat
    D,V = linalg.eig(T1) #特征值与特征向量
    V1 = V[:,0:k]#取前k个特征向量
    V1 = Z.T*V1
    for i in xrange(k): #特征向量归一化
        L = linalg.norm(V1[:,i])
        V1[:,i] = V1[:,i]/L

    data_new = Z*V1 # 降维后的数据
    return data_new,data_mean,V1

#covert image to vector
def img2vector(filename):
    img = cv2.imread(filename,0) #read as 'gray'
    rows,cols = img.shape
    imgVector = zeros((1,rows*cols)) #create a none vectore:to raise speed
    imgVector = reshape(img,(1,rows*cols)) #change img from 2D to 1D            
    return imgVector

#load dataSet
def loadDataSet(k):  #choose k(0-10) people as traintest for everyone
    ##step 1:Getting data set
    print "--Getting data set---"
    #note to use '/'  not '\'
    dataSetDir = 'C:/Users/myself/.spyder2/machine_learning/att_faces'
    #显示文件夹内容
    choose = random.permutation(10)+1 #随机排序1-10 (0-9)+1
    train_face = zeros((40*k,112*92))
    train_face_number = zeros(40*k)
    test_face = zeros((40*(10-k),112*92))
    test_face_number = zeros(40*(10-k))
    for i in xrange(40): #40 sample people
        people_num = i+1
        for j in xrange(10): #everyone has 10 different face
            if j < k:
                filename = dataSetDir+'/s'+str(people_num)+'/'+str(choose[j])+'.pgm'
                img = img2vector(filename)     
                train_face[i*k+j,:] = img
                train_face_number[i*k+j] = people_num
            else:
                filename = dataSetDir+'/s'+str(people_num)+'/'+str(choose[j])+'.pgm'
                img = img2vector(filename)     
                test_face[i*(10-k)+(j-k),:] = img
                test_face_number[i*(10-k)+(j-k)] = people_num

    return train_face,train_face_number,test_face,test_face_number

# calculate the accuracy of the test_face
def facefind(): 
    # Getting data set
    train_face,train_face_number,test_face,test_face_number = loadDataSet(3)
    # PCA training to train_face
    data_train_new,data_mean,V = pca(train_face,30)
    num_train = data_train_new.shape[0]
    num_test = test_face.shape[0]
    temp_face = test_face - tile(data_mean,(num_test,1))
    data_test_new = temp_face*V #得到测试脸在特征向量下的数据
    data_test_new = array(data_test_new) # mat change to array
    data_train_new = array(data_train_new)
    true_num = 0
    for i in xrange(num_test):
        testFace = data_test_new[i,:]
        diffMat = data_train_new - tile(testFace,(num_train,1))
        sqDiffMat = diffMat**2
        sqDistances = sqDiffMat.sum(axis=1)
        sortedDistIndicies = sqDistances.argsort()
        indexMin = sortedDistIndicies[0]
        if train_face_number[indexMin] == test_face_number[i]:
            true_num += 1

    accuracy = float(true_num)/num_test
    print 'The classify accuracy is: %.2f%%'%(accuracy * 100)


将这个脚本存起来为PCA,使用也很简单的直接调用里面的facefind函数就可以了。

ORL人脸库包含40个人的每个人10张人脸样本,比如上述选择每个人中随机3张作为训练集,其他的作为测试集,并且降维到30维,得到的准确率结果:

import PCA
PCA.facefind()


--Getting data set---
The classify accuracy is: 91.43%


每次运行的结果都不一样,因为每次的训练集样本随机选的不同。当然你也可以修改每个人训练集样本的个数(肯定越多准确率越高),也可以修改降维数。这些在上述的matlab版本中都有做过实验,可以看看那里的对比实验。
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: