Caffe Notes: Learning the Examples (2)
2015-02-23 02:54
Classification with HDF5 data
1. Import libraries

```python
import os
import h5py
import shutil
import sklearn
import tempfile
import numpy as np
import pandas as pd
import sklearn.datasets
import sklearn.linear_model
import matplotlib.pyplot as plt
%matplotlib inline
```
2. Generate the data

sklearn.datasets.make_classification generates synthetic test data: 10000 samples, each with a 4-dimensional feature vector. sklearn.cross_validation.train_test_split then splits the data into a train set and a test set; with the default ratio this gives a 7500:2500 split.

```python
import sklearn.cross_validation  # needed for train_test_split below

X, y = sklearn.datasets.make_classification(
    n_samples=10000, n_features=4, n_redundant=0, n_informative=2,
    n_clusters_per_class=2, hypercube=False, random_state=0
)

# Split into train and test
X, Xt, y, yt = sklearn.cross_validation.train_test_split(X, y)
```
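Note that in newer scikit-learn releases `sklearn.cross_validation` was deprecated and then removed in favor of `sklearn.model_selection`. A minimal sketch of the same generate-and-split step on a recent scikit-learn, confirming the default 7500:2500 split:

```python
import sklearn.datasets
# sklearn.cross_validation was removed in scikit-learn 0.20;
# the same splitter now lives in sklearn.model_selection.
from sklearn.model_selection import train_test_split

X, y = sklearn.datasets.make_classification(
    n_samples=10000, n_features=4, n_redundant=0, n_informative=2,
    n_clusters_per_class=2, hypercube=False, random_state=0)

# Default test_size is 0.25, so 10000 samples split 7500:2500.
X, Xt, y, yt = train_test_split(X, y, random_state=0)
print(X.shape, Xt.shape)  # (7500, 4) (2500, 4)
```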
3. Visualize the data

```python
# Visualize a sample of the data.
# np.random.permutation generates (or shuffles) a sequence;
# X.shape[0] = 7500, so this produces a shuffled 0-7499 sequence
# and keeps the first 1000 indices.
ind = np.random.permutation(X.shape[0])[:1000]
df = pd.DataFrame(X[ind])

# Plot: diagonal='kde' draws a kernel density estimate, 'hist' a histogram.
_ = pd.scatter_matrix(df, figsize=(9, 9), diagonal='kde',
                      marker='o', s=40, alpha=.4, c=y[ind])
```
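In current pandas `pd.scatter_matrix` no longer exists at the top level; the same function is available as `pd.plotting.scatter_matrix`. A small sketch of the equivalent call on a recent pandas (using `diagonal='hist'` here, since `'kde'` additionally requires scipy):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend, no display needed

# pd.scatter_matrix was removed from the pandas top level;
# it now lives under pd.plotting.
df = pd.DataFrame(np.random.randn(200, 4))
axes = pd.plotting.scatter_matrix(df, figsize=(9, 9), diagonal='hist',
                                  marker='o', s=40, alpha=.4)
print(axes.shape)  # (4, 4): one panel per pair of features
```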
Documentation for pd.scatter_matrix:
```python
def scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False,
                   diagonal='hist', marker='.', density_kwds=None,
                   hist_kwds=None, range_padding=0.05, **kwds):
    """
    Draw a matrix of scatter plots.

    Parameters
    ----------
    frame : DataFrame
    alpha : float, optional
        amount of transparency applied
    figsize : (float, float), optional
        a tuple (width, height) in inches
    ax : Matplotlib axis object, optional
    grid : bool, optional
        setting this to True will show the grid
    diagonal : {'hist', 'kde'}
        pick between 'kde' and 'hist' for either Kernel Density
        Estimation or Histogram plot in the diagonal
    marker : str, optional
        Matplotlib marker type, default '.'
    hist_kwds : other plotting keyword arguments
        To be passed to hist function
    density_kwds : other plotting keyword arguments
        To be passed to kernel density estimate plot
    range_padding : float, optional
        relative extension of axis range in x and y with respect to
        (x_max - x_min) or (y_max - y_min), default 0.05
    kwds : other plotting keyword arguments
        To be passed to scatter function

    Examples
    --------
    >>> df = DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D'])
    >>> scatter_matrix(df, alpha=0.2)
    """
```
4. SGD learning and accuracy

Documentation: scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html

```python
# Train and test the scikit-learn SGD logistic regression.
import sklearn.metrics  # needed for accuracy_score below

clf = sklearn.linear_model.SGDClassifier(
    loss='log', n_iter=1000, penalty='l2', alpha=1e-3,
    class_weight='auto')

# Fit the linear model with stochastic gradient descent.
clf.fit(X, y)

# Predict class labels for the samples in Xt.
yt_pred = clf.predict(Xt)
print('Accuracy: {:.3f}'.format(sklearn.metrics.accuracy_score(yt, yt_pred)))
```
5. Write the HDF5 data. The file I/O itself is straightforward; just be careful with the paths. Instead of changing the paths, I manually copied the generated data into caffe_root/examples/hdf5_classification.

```python
# Write out the data to HDF5 files in a temp directory.
# This file is assumed to be caffe_root/examples/hdf5_classification.ipynb
dirname = os.path.abspath('./hdf5_classification/data')
if not os.path.exists(dirname):
    os.makedirs(dirname)

train_filename = os.path.join(dirname, 'train.h5')
test_filename = os.path.join(dirname, 'test.h5')

# HDF5DataLayer source should be a file containing a list of HDF5 filenames.
# To show this off, we'll list the same data file twice.
with h5py.File(train_filename, 'w') as f:
    f['data'] = X
    f['label'] = y.astype(np.float32)
with open(os.path.join(dirname, 'train.txt'), 'w') as f:
    f.write(train_filename + '\n')
    f.write(train_filename + '\n')

# HDF5 is pretty efficient, but can be further compressed.
comp_kwargs = {'compression': 'gzip', 'compression_opts': 1}
with h5py.File(test_filename, 'w') as f:
    f.create_dataset('data', data=Xt, **comp_kwargs)
    f.create_dataset('label', data=yt.astype(np.float32), **comp_kwargs)
with open(os.path.join(dirname, 'test.txt'), 'w') as f:
    f.write(test_filename + '\n')
```
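A quick way to check the files came out right is to read them back with h5py. A small round-trip sketch (using a temp directory and random data rather than the paths above):

```python
import os
import tempfile

import h5py
import numpy as np

# Write a small array the same way as above, then read it back.
dirname = tempfile.mkdtemp()
fname = os.path.join(dirname, 'train.h5')
X = np.random.randn(100, 4).astype(np.float32)
y = (np.random.rand(100) > 0.5).astype(np.float32)

with h5py.File(fname, 'w') as f:
    f['data'] = X
    f['label'] = y

with h5py.File(fname, 'r') as f:
    # Datasets expose their shape without loading the data;
    # slicing with [:] reads the values back as a NumPy array.
    assert f['data'].shape == (100, 4)
    np.testing.assert_array_equal(f['label'][:], y)
```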
6. Change directory to caffe_root and train: solver.prototxt sets the solver parameters, and train_val.prototxt defines the model.

For an analysis of the model, see www.cnblogs.com/nwpuxuezha/p/4297298.html

```
# Run caffe. Scroll down in the output to see the final
# test accuracy, which should be about the same as above.
!cd .. && ./build/tools/caffe train -solver examples/hdf5_classification/solver.prototxt
```
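For reference, a Caffe solver file is a plain-text protobuf of the following general shape. This is an illustrative sketch, not the exact contents of examples/hdf5_classification/solver.prototxt; the specific values here are assumptions:

```
net: "examples/hdf5_classification/train_val.prototxt"
test_iter: 250            # test batches per evaluation
test_interval: 1000       # evaluate every 1000 iterations
base_lr: 0.01             # initial learning rate
lr_policy: "step"         # drop the rate by gamma every stepsize iters
gamma: 0.1
stepsize: 5000
display: 1000
max_iter: 10000
momentum: 0.9
weight_decay: 0.0005
snapshot_prefix: "examples/hdf5_classification/data/train"
solver_mode: CPU
```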
7. Improve on this with a nonlinear model: solver2.prototxt sets the parameters, and train_val2.prototxt defines the model. (To be filled in later.)
```
!cd .. && ./build/tools/caffe train -solver examples/hdf5_classification/solver2.prototxt
```
Summary: my results for steps 4, 6, and 7 differ somewhat from those in the example. Step 7 gives the best result, but only reaches about 0.73 accuracy. The cause remains to be investigated.