
Deep Learning: Doubly Easy and Doubly Powerful with GraphLab Create

Note: Many of the code snippets can take a very long time without GPU speedup. Please install the GPU version of GraphLab Create to follow along. 

One of machine learning’s core goals is classification of input data. This is the task of taking
novel data and assigning it to one of a pre-determined number of labels, based on what the classifier learns from a training set. For instance, a classifier could take an image and predict whether it is a cat or a dog.


The pieces of information fed to a classifier for each data point are called features, and the category each data point belongs to is its 'target' or 'label'. Typically, the classifier is given data points with both features and labels, so that it can learn the correspondence between the two. Later, the classifier is queried with a data point and tries to predict which category it belongs to. A large group of these query data points constitutes a prediction set, and the classifier is usually evaluated on its accuracy, or how many prediction queries it gets correct.
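To make those terms concrete, here is a minimal sketch using GraphLab Create on tiny, made-up pet measurements (the column names and values are invented purely for illustration; a real training set would have far more rows):

import graphlab

# Toy training set: two features per data point, plus the label we want to learn.
train = graphlab.SFrame({'weight_kg': [4.0, 5.0, 3.5, 4.2, 30.0, 28.0, 35.0, 32.0],
                         'height_cm': [24.0, 26.0, 23.0, 25.0, 60.0, 58.0, 65.0, 61.0],
                         'label':     ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog']})

# The prediction set: query points whose labels the classifier must guess.
queries = graphlab.SFrame({'weight_kg': [4.5, 31.0],
                           'height_cm': [25.0, 62.0],
                           'label':     ['cat', 'dog']})

# Learn the correspondence between features and label, then measure accuracy on the queries.
model = graphlab.logistic_classifier.create(train, target='label',
                                            features=['weight_kg', 'height_cm'],
                                            validation_set=None)
print(model.evaluate(queries)['accuracy'])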
There are many methods to perform classification, such as SVMs, logistic regression, deep learning, and more. To read about the different methods GraphLab Create supports, I invite you to read the API Documentation. Today, however, we'll focus on deep learning methods, which have recently been shown to give incredible results on challenging problems. Yet this comes at the cost of extreme sensitivity to model hyper-parameters and long training times. This means that one can spend months testing different model configurations, much too long to be worth the effort.


This blog post focuses on minimizing these pains, and exploring how GraphLab Create 1.1 makes deep learning Easy.


What is Deep Learning?

Before we start, let's explore the idea of deep learning. 'Deep learning' is a phrase being thrown around everywhere in the world of machine learning. In fact, it's even been in The New York Times. It seems to be helping make tremendous breakthroughs, but what is it? It's a methodology for learning high-level concepts about data, frequently through models that have multiple layers of non-linear transformations. Let's take a moment to analyze that last sentence. 'Learning high-level concepts about data' means that deep learning models take data, for instance raw pixel values of an image, and learn abstract ideas like 'is animal' or 'is cat' about that data. OK, easy enough, but what does having 'multiple layers of non-linear transformations' mean? Conceptually, all it means is that you have a composition of simple non-linear functions, forming a complex non-linear function, which can map things as complex as raw pixel values to an image category. Let's illustrate this with a simple example:

f(x)=cos(a∗x)

g(x)=exp(b∗x)

f(g(x))=cos(a∗exp(b∗x))

Notice how the composition of functions f(g(x)) is
much more complex than either f(x) or g(x).
 Furthermore, by adjusting the values of a and b you
can adjust the mapping between input and output. These values, called parameters, are what is learned in a deep learning model. This same idea of composition is used many, many times within deep learning models, and can enable learning very complex relationships
between input and output. This complexity is what allows deep learning models to attain such amazing results.
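As a quick illustration (plain Python written just for this post, not part of GraphLab Create), the composition above can be coded directly, and changing a and b changes the mapping it computes:

import math

a, b = 2.0, 0.5  # the 'parameters' that a deep learning model would learn from data

def g(x):
    return math.exp(b * x)   # first simple non-linear function

def f(x):
    return math.cos(a * x)   # second simple non-linear function

def composed(x):
    return f(g(x))           # f(g(x)) = cos(a * exp(b * x))

# Try different values of a and b to see the input -> output mapping change.
print(composed(1.0))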
The most common class of methods within the deep learning domain, and the one GraphLab Create uses, is Deep Neural Networks (DNNs). Deep Neural Networks are simply artificial neural networks with many hidden layers. To learn more about artificial neural networks and deep learning, click here.
Typically, DNNs are used for classification of input, and frequently for images. As I mentioned before, they are very good at this. So should we simply take whatever algorithm we had for image classification before and replace it with a DNN?
Not so fast.
Before you can do this, you have to choose how many layers your network has. And how many hidden units each layer has. And how to initialize the model parameter values (also known as weights). And how much L2-regularization to apply. There's a lot more, too. Basically, a deep learning model is a machine with many confusing knobs and dials (called hyper-parameters: parameters that are not learned by the algorithm) that will not work if set randomly.
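To see why this hurts, here is a small sketch (the grid values below are invented for illustration, not recommendations) of how quickly even a modest set of choices multiplies into configurations you would have to train and compare:

import itertools

# Hypothetical grid of hyper-parameter choices -- every value here is made up.
num_layers      = [2, 3, 4]
hidden_units    = [64, 128, 256, 512]
weight_init_std = [0.001, 0.01, 0.1]
l2_penalty      = [0.0, 0.0001, 0.001, 0.01]

configs = list(itertools.product(num_layers, hidden_units, weight_init_std, l2_penalty))
# 3 * 4 * 3 * 4 = 144 candidate models, each of which may take hours or days to train.
print(len(configs))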


Making Deep Learning Easy with GraphLab Create

GraphLab Create allows you to get started with neural networks without being an expert by eliminating the need to choose a good architecture and hyper-parameter starting values. Based on the input data, the neuralnet_classifier.create() function chooses an architecture to use and sets reasonable values for the hyper-parameters. Let's check this out on MNIST, a dataset composed of handwritten digits where the task is to identify the digit:
>>> data = graphlab.SFrame('http://s3.amazonaws.com/GraphLab-Datasets/mnist/sframe/train')
>>> model = graphlab.neuralnet_classifier.create(data, target='label')

Evaluating this model on the prediction data will tell us how well the model performs:
>>> testing_data = graphlab.SFrame('http://s3.amazonaws.com/GraphLab-Datasets/mnist/sframe/test')
>>> model.evaluate(testing_data)

{'accuracy': 0.9803000092506409, 'confusion_matrix': Columns:
target_label	int
predicted_label	int
count	int

Rows: 65

Data:
+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      0       |        0        |  974  |
|      2       |        0        |   3   |
|      5       |        0        |   1   |
|      6       |        0        |   7   |
|      8       |        0        |   6   |
|      9       |        0        |   5   |
|      0       |        1        |   1   |
|      1       |        1        |  1128 |
|      2       |        1        |   1   |
|      6       |        1        |   3   |
|     ...      |       ...       |  ...  |
+--------------+-----------------+-------+
[65 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}

We got 98% accuracy. This is deep learning made Easy!
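Beyond the aggregate accuracy, you can also query the trained model for individual predictions. A quick sketch, assuming the model and testing_data from above are still in memory (.predict() returns one predicted label per row):
>>> predictions = model.predict(testing_data)
>>> predictions.head(5)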


When Deep Learning Made Easy Is Not Easy Enough: Making Deep Learning Doubly Easy

Although GraphLab Create tries to choose a good architecture and hyper-parameters, this automatic process often isn't enough. Optimal settings are often extremely problem specific, and it’s impossible
to determine them without good intuition, lots of experience, and many PhD students.
Yet, when good hyper-parameter settings come together, the results are very strong. What's more, it's not uncommon for the task you want to solve to be related to something that has already been solved. Take, for example, the task of distinguishing cats from dogs. The famous ImageNet Challenge, for which DNNs are the state-of-the-art, asks the trained model to categorize input into one of 1000 classes (as Jay described in a previous post).
Shouldn't features that distinguish between categories like lions and wolves also be useful for discriminating between cats and dogs?
The answer is a definitive yes. This is accomplished by simply removing the output layer of the Deep Neural Network trained on the 1000 categories, taking the signals that would have propagated to that output layer, and feeding them as features to any classifier for our new cats vs dogs task. The training procedure breaks down something like this:
Stage 1: Train a DNN classifier on a large, general dataset. A good example is ImageNet, with 1000 categories and 1.2 million images. GraphLab hosts a model trained on ImageNet to allow you to skip this step in your own implementation. Simply load the model with
gl.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')

Stage 2: The outputs of each layer in the DNN can be viewed as a meaningful vector representation of each image. Extract these feature vectors from the layer just prior to the output layer for each image in your task.
Stage 3: Train a new classifier with those features as input for your own task.  
At first glance, this seems even more complicated than just training the deep learning model. However, Stage 1 is re-usable for many different problems, and GraphLab is hosting the model so you don't have to train it yourself. Stage 2 is easy to do with GraphLab's API (as shown below), and Stage 3 is typically done with a simpler classifier than a deep learning model, so it's easy to build yourself. In the end, this pipeline results in not needing to adjust hyper-parameters, faster training, and better performance even in cases where you don't have enough data to train a conventional deep learning model. What's more, this technique is effective even if your Stage 3 classification task is relatively unrelated to the task Stage 1 is trained on.



This idea was first explored by Donahue et al. (2013), and was used for the Dogs vs Cats competition as described for nolearn's ConvNetFeatures. In our NeuralNetworkClassifier API, we put this functionality into the .extract_features() method. Let's explore the GraphLab Create API on the Cats vs. Dogs dataset. To get a feel for what we're trying to accomplish, a few sample images from the dataset are shown below:
[Sample images from the Cats vs. Dogs dataset]
First, let's load in the model trained on ImageNet. This corresponds to the end of Stage 1 in our pipeline:
>>> pretrained_model = graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')

Now, let's load in the cats vs dogs images. We resize because the original ImageNet model was trained on 256 x 256 x 3 images:
>>> cats_dogs_sf = graphlab.SFrame('http://s3.amazonaws.com/GraphLab-Datasets/cats_vs_dogs/cats_dogs_sf')
>>> cats_dogs_sf['image'] = graphlab.image_analysis.resize(cats_dogs_sf['image'], 256, 256, 3)

And extract features, per Stage 2 of our pipeline:
>>> cats_dogs_sf['features'] = pretrained_model.extract_features(cats_dogs_sf)
>>> cats_dogs_train, cats_dogs_test = cats_dogs_sf.random_split(0.8)

And now, let's train a simple classifier, as described in Stage 3:
>>> simple_classifier = graphlab.classifier.create(cats_dogs_train, features = ['features'], target = 'label')

And now, to see how our trained model did, we evaluate it:
>>> simple_classifier.evaluate(cats_dogs_test)
{'accuracy': 0.9545091779728652, 'confusion_matrix': Columns:
target_label	str
predicted_label	str
count	int

Rows: 4

Data:
+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      0       |        0        |  2406 |
|      0       |        1        |   73  |
|      1       |        0        |  155  |
|      1       |        1        |  2378 |
+--------------+-----------------+-------+
[4 rows x 3 columns]}

We get ~95% accuracy! I don’t know about you, but that feels like a pretty good number. For comparison's sake, let’s try using just the .create() method.
>>> model = graphlab.neuralnet_classifier.create(cats_dogs_train, target='label', features=['image'])
>>> model.evaluate(cats_dogs_test)
{'accuracy': 0.6049019694328308, 'confusion_matrix': Columns:
target_label	int
predicted_label	int
count	int

Rows: 4

Data:
+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      0       |        0        |  922  |
|      1       |        0        |  415  |
|      0       |        1        |  1600 |
|      1       |        1        |  2163 |
+--------------+-----------------+-------+
[4 rows x 3 columns]}

Accuracy is a disappointing 60%. Clearly, combining a simple classifier with the extracted features helped tremendously. And you STILL didn’t have to tune architecture or hyper-parameters. You don’t even have to take the time to train a NeuralNet classifier; you can just repurpose one that already exists. Sounds like if using .create() was Easy, then using .extract_features() is Doubly Easy!
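One practical aside, as a minimal sketch assuming the standard GraphLab Create persistence and image-loading APIs (the directory path below is a placeholder you would replace with your own): the Stage 3 model is an ordinary GraphLab classifier, so you can save it and point the same pipeline at new images later.
>>> simple_classifier.save('cats_dogs_feature_classifier')
>>> reloaded = graphlab.load_model('cats_dogs_feature_classifier')
>>> my_images = graphlab.image_analysis.load_images('/path/to/your/images')  # placeholder path
>>> my_images['image'] = graphlab.image_analysis.resize(my_images['image'], 256, 256, 3)
>>> my_images['features'] = pretrained_model.extract_features(my_images)
>>> my_images['predicted_label'] = reloaded.predict(my_images)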


Making Doubly Easy also Doubly Powerful

It’s always important to make sure any machine learning technique is consistent in its usefulness, and that its success is not a fluke. In order to do that, I tested it on the CIFAR-10 dataset developed by Alex Krizhevsky. The CIFAR-10 dataset has 50000 training images and 10000 prediction images divided into 10 classes. Each image is 32x32. A few examples from each category are shown below:
[Sample images from each CIFAR-10 category]
Let's repeat the procedure we just went through for the Cats vs Dogs dataset:
>>> cifar_train = graphlab.SFrame('http://s3.amazonaws.com/GraphLab-Datasets/cifar_10/cifar_10_train_sframe')
>>> cifar_test = graphlab.SFrame('http://s3.amazonaws.com/GraphLab-Datasets/cifar_10/cifar_10_test_sframe')
# preprocess
>>> cifar_train['image'] = graphlab.image_analysis.resize(cifar_train['image'], 256, 256, 3)
>>> cifar_test['image'] = graphlab.image_analysis.resize(cifar_test['image'], 256, 256, 3)
# Stage 2
>>> cifar_train['features'] = pretrained_model.extract_features(cifar_train)
>>> cifar_test['features'] = pretrained_model.extract_features(cifar_test)
# Stage 3
>>> classifier = graphlab.classifier.create(cifar_train, features=['features'], target='label')
# Evaluate
>>> classifier.evaluate(cifar_test)

And the evaluation results:
{'accuracy': 0.9478, 'confusion_matrix': Columns:
target_label	str
predicted_label	str
count	int

Rows: 100

Data:
+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      0       |        0        |  733  |
|      0       |        1        |   25  |
|      0       |        2        |   76  |
|      0       |        3        |   19  |
|      0       |        4        |   13  |
|      0       |        5        |   7   |
|      0       |        6        |   8   |
|      0       |        7        |   26  |
|      0       |        8        |   58  |
|      0       |        9        |   23  |
|     ...      |       ...       |  ...  |
+--------------+-----------------+-------+
[100 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}

We get almost 95% accuracy! In fact, the results are better than any published result and are on par with the winning results from the Kaggle competition. Human
performance is about 94%, to give some perspective. Clearly, feature extraction makes deep learning not only Doubly Easy, but also Doubly Powerful.
Deep learning models are powerful, and are now easier to use than ever before. Download GraphLab Create, load in our ImageNet model, and tell us your deep learning success stories!