tensorflow_cookbook: Ch 1: Getting Started with TensorFlow (7) Data Source Information
2017-05-25 14:50
7. Data Source Information
Here are the sources of the data sets used in this book. The following links are the original sources, with explanations and citations. The script below demonstrates how to access these data sets.

- Iris Data
- Low Birthweight Data
- Housing Price Data
- MNIST Dataset of Handwritten Digits
- SMS Spam Data
- Movie Review Data
- William Shakespeare Data
- German-English Sentence Data
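Most of these data sets are fetched over HTTP, and the source links can change or go down. A small download-and-cache helper keeps a local copy so the script only hits the network once; this is a minimal sketch using `requests` (the function name and cache path are illustrative, not part of the original script):

```python
import os
import requests


def fetch_cached(url, cache_path):
    """Download url once and reuse the local copy on later calls."""
    if os.path.exists(cache_path):
        # Cache hit: no network round trip needed
        with open(cache_path, 'rb') as f:
            return f.read()
    r = requests.get(url)
    r.raise_for_status()  # fail loudly on broken links
    with open(cache_path, 'wb') as f:
        f.write(r.content)
    return r.content
```

Any of the `requests.get(...)` calls below could be routed through such a helper, e.g. `fetch_cached(housing_url, 'housing.data')`.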
# 07_data_gathering.py

```python
# 07_data_gathering.py
# Data Gathering
# ----------------------------------
#
# This script demonstrates how to access
# the various data sets we will need

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import requests
import io
import tarfile
from zipfile import ZipFile
from tensorflow.python.framework import ops
ops.reset_default_graph()

# Iris Data
from sklearn import datasets

iris = datasets.load_iris()
print(len(iris.data))
print(len(iris.target))
print(iris.data[0])
print(set(iris.target))

# Low Birthweight Data
birthdata_url = 'https://www.umass.edu/statdata/statdata/data/lowbwt.dat'
birth_file = requests.get(birthdata_url)
birth_data = birth_file.text.split('\r\n')[5:]
birth_header = [x for x in birth_data[0].split(' ') if len(x) >= 1]
birth_data = [[float(x) for x in y.split(' ') if len(x) >= 1]
              for y in birth_data[1:] if len(y) >= 1]
print(len(birth_data))
print(len(birth_data[0]))

# Housing Price Data
housing_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data'
housing_header = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                  'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
housing_file = requests.get(housing_url)
housing_data = [[float(x) for x in y.split(' ') if len(x) >= 1]
                for y in housing_file.text.split('\n') if len(y) >= 1]
print(len(housing_data))
print(len(housing_data[0]))

# MNIST Handwriting Data
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print(len(mnist.train.images))
print(len(mnist.test.images))
print(len(mnist.validation.images))
print(mnist.train.labels[1, :])

# Ham/Spam Text Data
# Get/read zip file
zip_url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip'
r = requests.get(zip_url)
z = ZipFile(io.BytesIO(r.content))
file = z.read('SMSSpamCollection')
# Format data: strip non-ASCII characters, then split into (label, text) pairs
text_data = file.decode()
text_data = text_data.encode('ascii', errors='ignore')
text_data = text_data.decode().split('\n')
text_data = [x.split('\t') for x in text_data if len(x) >= 1]
[text_data_target, text_data_train] = [list(x) for x in zip(*text_data)]
print(len(text_data_train))
print(set(text_data_target))
print(text_data_train[1])

# Movie Review Data
movie_data_url = 'http://www.cs.cornell.edu/people/pabo/movie-review-data/rt-polaritydata.tar.gz'
r = requests.get(movie_data_url)
# Stream data into a temporary in-memory object
stream_data = io.BytesIO(r.content)
tmp = io.BytesIO()
while True:
    s = stream_data.read(16384)
    if not s:
        break
    tmp.write(s)
stream_data.close()
tmp.seek(0)
# Extract tar file
tar_file = tarfile.open(fileobj=tmp, mode="r:gz")
pos = tar_file.extractfile('rt-polaritydata/rt-polarity.pos')
neg = tar_file.extractfile('rt-polaritydata/rt-polarity.neg')
# Save pos/neg reviews
pos_data = []
for line in pos:
    pos_data.append(line.decode('ISO-8859-1').encode('ascii', errors='ignore').decode())
neg_data = []
for line in neg:
    neg_data.append(line.decode('ISO-8859-1').encode('ascii', errors='ignore').decode())
tar_file.close()
print(len(pos_data))
print(len(neg_data))
print(neg_data[0])

# The Works of Shakespeare Data
shakespeare_url = 'http://www.gutenberg.org/cache/epub/100/pg100.txt'
# Get Shakespeare text
response = requests.get(shakespeare_url)
shakespeare_file = response.content
# Decode binary into string
shakespeare_text = shakespeare_file.decode('utf-8')
# Drop first few descriptive paragraphs
shakespeare_text = shakespeare_text[7675:]
print(len(shakespeare_text))

# English-German Sentence Translation Data
sentence_url = 'http://www.manythings.org/anki/deu-eng.zip'
r = requests.get(sentence_url)
z = ZipFile(io.BytesIO(r.content))
file = z.read('deu.txt')
# Format data: strip non-ASCII, then split into (English, German) pairs
eng_ger_data = file.decode()
eng_ger_data = eng_ger_data.encode('ascii', errors='ignore')
eng_ger_data = eng_ger_data.decode().split('\n')
eng_ger_data = [x.split('\t') for x in eng_ger_data if len(x) >= 1]
[english_sentence, german_sentence] = [list(x) for x in zip(*eng_ger_data)]
print(len(english_sentence))
print(len(german_sentence))
print(eng_ger_data[10])
```
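Once a data set is loaded, a typical next step in later recipes is to split it into training and test sets before feeding it to TensorFlow. A minimal sketch with the Iris data, using an 80/20 split (the split ratio and random seed are illustrative choices, not part of the original script):

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
np.random.seed(42)  # illustrative seed, for reproducibility

# Shuffle the row indices, then take 80% for training, 20% for testing
n = len(iris.data)
indices = np.random.permutation(n)
split = int(0.8 * n)
train_idx, test_idx = indices[:split], indices[split:]

x_train, y_train = iris.data[train_idx], iris.target[train_idx]
x_test, y_test = iris.data[test_idx], iris.target[test_idx]

print(len(x_train), len(x_test))  # 120 30
```

The same index-shuffling pattern applies to any of the other data sets above once they are loaded into arrays or lists.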