[Javascript] Classify JSON text data with machine learning in Natural
2017-10-03 20:23
In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classifier - basic machine learning algorithms - on JSON text data, and classify it into categories.
While this dataset is still considered a small dataset -- only a couple hundred points of data -- we'll start to get better results.
The general rule is that Logistic Regression will work better than Naive Bayes, but only if there is enough data. Since this is still a pretty small dataset, Naive Bayes works better here. Generally, Logistic Regression takes longer to train as well.
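One way to check the training-time difference on your own data is to time the train() call for each classifier. The sketch below is a minimal helper and not part of the original lesson: the name trainAndTime is hypothetical, and it assumes only that the classifier object exposes addDocument(text, label) and train(), as Natural's BayesClassifier and LogisticRegressionClassifier do.

```javascript
// Hypothetical helper: feed every {text, label} item to the classifier,
// then return how long the train() call took, in seconds.
function trainAndTime(classifier, trainingData) {
  trainingData.forEach(function (item) {
    classifier.addDocument(item.text, item.label);
  });
  var start = Date.now();
  classifier.train();
  return (Date.now() - start) / 1000;
}

// Possible usage with Natural (assumes the package is installed; not run here):
//   var natural = require('natural');
//   var fs = require('fs');
//   var data = JSON.parse(fs.readFileSync('training_data.json', 'utf-8'));
//   console.log('Naive Bayes:', trainAndTime(new natural.BayesClassifier(), data));
//   console.log('Logistic Regression:',
//     trainAndTime(new natural.LogisticRegressionClassifier(), data));
```

Each call should get a fresh classifier instance so that documents added for one run do not carry over into the next.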
This uses data from Ana Cachopo: http://ana.cachopo.org/datasets-for-single-label-text-categorization.
// Training data format: [{text: 'xxxxxx', label: 'space'}]

// Load the training data from the file and train
var natural = require('natural');
var fs = require('fs');
var classifier = new natural.BayesClassifier();

fs.readFile('training_data.json', 'utf-8', function(err, data){
    if (err){
        console.log(err);
    } else {
        var trainingData = JSON.parse(data);
        train(trainingData);
    }
});

function train(trainingData){
    console.log("Training");
    trainingData.forEach(function(item){
        classifier.addDocument(item.text, item.label);
    });
    // Time only the train() call
    var startTime = new Date();
    classifier.train();
    var endTime = new Date();
    var trainingTime = (endTime-startTime)/1000.0;
    console.log("Training time:", trainingTime, "seconds");
    loadTestData();
}

function loadTestData(){
    console.log("Loading test data");
    fs.readFile('test_data.json', 'utf-8', function(err, data){
        if (err){
            console.log(err);
        } else {
            var testData = JSON.parse(data);
            testClassifier(testData);
        }
    });
}

function testClassifier(testData){
    console.log("Testing classifier");
    var numCorrect = 0;
    testData.forEach(function(item){
        var labelGuess = classifier.classify(item.text);
        if (labelGuess === item.label){
            numCorrect++;
        }
    });
    // Report accuracy as a percentage rather than a fraction
    console.log("Correct %:", numCorrect/testData.length*100);
    saveClassifier(classifier);
}

function saveClassifier(classifier){
    classifier.save('classifier.json', function(err, classifier){
        if (err){
            console.log(err);
        } else {
            console.log("Classifier saved!");
        }
    });
}
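A single accuracy figure can hide labels that the classifier gets consistently wrong. The following sketch, which is an addition and not part of the original lesson, breaks the score down per label. The names evaluate and toyClassify are hypothetical, and toyClassify is only a stand-in so the example runs without a trained model.

```javascript
// Score a classifier on labeled test items of the form {text, label}.
// `classify` is any function that maps a text string to a predicted label.
function evaluate(classify, testData) {
  var perLabel = {};
  var numCorrect = 0;
  testData.forEach(function (item) {
    var stats = perLabel[item.label] ||
      (perLabel[item.label] = { correct: 0, total: 0 });
    stats.total++;
    if (classify(item.text) === item.label) {
      stats.correct++;
      numCorrect++;
    }
  });
  return {
    accuracy: testData.length ? numCorrect / testData.length : 0,
    perLabel: perLabel
  };
}

// Toy stand-in for a trained classifier (assumption, for illustration only)
function toyClassify(text) {
  return /sun|moon/.test(text) ? 'space' : 'other';
}

var report = evaluate(toyClassify, [
  { text: 'the sun and moon', label: 'space' },
  { text: 'orbit of mars', label: 'space' },
  { text: 'stock prices', label: 'other' }
]);
console.log(report.accuracy, report.perLabel);
```

With a trained Natural classifier, the first argument could be function (text) { return classifier.classify(text); }.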
In a new project, we can load the saved classifier and test it on new text:
var natural = require('natural');

// Load with the same classifier type that was trained and saved (BayesClassifier)
natural.BayesClassifier.load('classifier.json', null, function(err, classifier){
    if (err){
        console.log(err);
    } else {
        var testComment = "is this about the sun and moon?";
        console.log(classifier.classify(testComment));
    }
});