
Caffe Notes: Learning from the Examples (2)

2015-02-23 02:54

Classification with HDF5 data

1. Import libraries

import os
import h5py
import shutil
import sklearn
import tempfile
import numpy as np
import pandas as pd
import sklearn.datasets
import sklearn.linear_model
import sklearn.metrics  # used by accuracy_score in step 4
import matplotlib.pyplot as plt
%matplotlib inline


2. Generate the data

sklearn.datasets.make_classification generates synthetic test data: 10,000 samples with 4-dimensional feature vectors.
sklearn.cross_validation.train_test_split performs the split for cross-validation, dividing the data into a train set and a test set.
Here the split is 7500:2500.


X, y = sklearn.datasets.make_classification(
    n_samples=10000, n_features=4, n_redundant=0, n_informative=2,
    n_clusters_per_class=2, hypercube=False, random_state=0
)

# Split into train and test
X, Xt, y, yt = sklearn.cross_validation.train_test_split(X, y)
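
As a quick sanity check, the array shapes confirm the 7500:2500 split described above (a minimal sketch reusing the variables from the cell above):

# train_test_split defaults to a 75/25 split, hence 7500 train and 2500 test samples.
print(X.shape, Xt.shape)   # expected: (7500, 4) (2500, 4)
print(y.shape, yt.shape)   # expected: (7500,) (2500,)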


3. Visualize the data

# Visualize a sample of the data
# np.random.permutation generates a shuffled range (or shuffles a given sequence);
# X.shape[0] = 7500, so this shuffles the indices 0-7499 and keeps the first 1000.
ind = np.random.permutation(X.shape[0])[:1000]
df = pd.DataFrame(X[ind])
# Plot a scatter matrix; on the diagonal, 'kde' draws a kernel density estimate, 'hist' a histogram.
_ = pd.scatter_matrix(df, figsize=(9, 9), diagonal='kde', marker='o', s=40, alpha=.4, c=y[ind])


Documentation for pd.scatter_matrix:


def scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False,
                   diagonal='hist', marker='.', density_kwds=None,
                   hist_kwds=None, range_padding=0.05, **kwds):
    """
    Draw a matrix of scatter plots.

    Parameters
    ----------
    frame : DataFrame
    alpha : float, optional
        amount of transparency applied
    figsize : (float,float), optional
        a tuple (width, height) in inches
    ax : Matplotlib axis object, optional
    grid : bool, optional
        setting this to True will show the grid
    diagonal : {'hist', 'kde'}
        pick between 'kde' and 'hist' for
        either Kernel Density Estimation or Histogram
        plot in the diagonal
    marker : str, optional
        Matplotlib marker type, default '.'
    hist_kwds : other plotting keyword arguments
        To be passed to hist function
    density_kwds : other plotting keyword arguments
        To be passed to kernel density estimate plot
    range_padding : float, optional
        relative extension of axis range in x and y
        with respect to (x_max - x_min) or (y_max - y_min),
        default 0.05
    kwds : other plotting keyword arguments
        To be passed to scatter function

    Examples
    --------
    >>> df = DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D'])
    >>> scatter_matrix(df, alpha=0.2)
    """


4. SGD learning and accuracy

Documentation: scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html

# Train and test the scikit-learn SGD logistic regression.
clf = sklearn.linear_model.SGDClassifier(
    loss='log', n_iter=1000, penalty='l2', alpha=1e-3, class_weight='auto')

# Fit linear model with Stochastic Gradient Descent.
clf.fit(X, y)
# Predict class labels for the samples in Xt.
yt_pred = clf.predict(Xt)
print('Accuracy: {:.3f}'.format(sklearn.metrics.accuracy_score(yt, yt_pred)))


5. Write the HDF5 data. This is straightforward file I/O; just be careful with the paths. I did not change the path; instead, I manually copied the generated files into caffe_root/examples/hdf5_classification.

# Write out the data to HDF5 files in a temp directory.
# This file is assumed to be caffe_root/examples/hdf5_classification.ipynb
dirname = os.path.abspath('./hdf5_classification/data')
if not os.path.exists(dirname):
    os.makedirs(dirname)

train_filename = os.path.join(dirname, 'train.h5')
test_filename = os.path.join(dirname, 'test.h5')

# HDF5DataLayer source should be a file containing a list of HDF5 filenames.
# To show this off, we'll list the same data file twice.
with h5py.File(train_filename, 'w') as f:
    f['data'] = X
    f['label'] = y.astype(np.float32)
with open(os.path.join(dirname, 'train.txt'), 'w') as f:
    f.write(train_filename + '\n')
    f.write(train_filename + '\n')

# HDF5 is pretty efficient, but can be further compressed.
comp_kwargs = {'compression': 'gzip', 'compression_opts': 1}
with h5py.File(test_filename, 'w') as f:
    f.create_dataset('data', data=Xt, **comp_kwargs)
    f.create_dataset('label', data=yt.astype(np.float32), **comp_kwargs)
with open(os.path.join(dirname, 'test.txt'), 'w') as f:
    f.write(test_filename + '\n')
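
Before training, it is easy to verify what was written by reading the files back with h5py (a minimal sketch reusing the filenames defined above; the expected shapes follow from the 7500:2500 split):

# Read the files back to confirm the dataset names and shapes the HDF5DataLayer will see.
with h5py.File(train_filename, 'r') as f:
    print(f['data'].shape, f['label'].shape)   # expected: (7500, 4) (7500,)
with h5py.File(test_filename, 'r') as f:
    print(f['data'].shape, f['label'].shape)   # expected: (2500, 4) (2500,)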


6. Change directory to caffe_root and train with Caffe: solver.prototxt sets the solver parameters and train_val.prototxt defines the model.

For an analysis of the model, see www.cnblogs.com/nwpuxuezha/p/4297298.html

# Run caffe. Scroll down in the output to see the final
# test accuracy, which should be about the same as above.
!cd .. && ./build/tools/caffe train -solver examples/hdf5_classification/solver.prototxt
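
Alternatively, the same training can be driven from Python through pycaffe instead of the command-line tool. This is only a sketch: it assumes pycaffe is importable and that the working directory is caffe_root, so that the relative path to solver.prototxt (and the paths inside it) resolve.

import caffe

caffe.set_mode_cpu()
# Load the solver configuration and run training; progress is logged as with the CLI.
solver = caffe.SGDSolver('examples/hdf5_classification/solver.prototxt')
solver.solve()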


7. Improve on this with a nonlinear model: solver2.prototxt sets the parameters and train_val2.prototxt defines the model. (To be expanded later.)

!cd .. && ./build/tools/caffe train -solver examples/hdf5_classification/solver2.prototxt

Summary: in steps 4, 6 and 7 my results fall somewhat short of those in the example; step 7 gives the best result, but only about 0.73. The cause remains to be investigated.