
Using R to Fix Data Quality: Section 2

2013-04-13 11:52



Section 2: Visualizing Variables

Overview

This section shows how to create charts and graphs that give you a quick visual summary of your data.

Dot Plots & Jitter Plots

An easy way to visualize a single variable is to create a dot plot or a jitter plot.

First, read the CSV file and inspect its first rows, as described in Section 1.

> data=read.csv("weather.csv")

> head(data)

  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
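Rows 5 and 6 above already contain NA (missing) values, so it can help to count them before plotting. The following is a minimal sketch. It assumes that weather.csv has the same contents as R's built-in airquality data set (the columns and first rows shown above match it), and it uses the built-in copy so that the code runs without the file.

```r
# Assumption: weather.csv mirrors R's built-in airquality data set.
data <- airquality

# Column types and a preview of the values
str(data)

# Number of missing values in each column
na_counts <- colSums(is.na(data))
print(na_counts)

# Quartiles and mean for one column; missing values are reported separately
summary(data$Ozone)
```

With your own file, replacing the first assignment with the read.csv call shown earlier should give the same kind of output.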

Use the $ operator to extract a single column from the data frame:

> data$Ozone

To draw a dot plot of that column:

> stripchart(data$Ozone)

To draw a jitter plot, which adds a small random offset to each point so that overlapping values become visible:

> stripchart(data$Ozone, method="jitter")
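To see what jittering changes, the two plots can be drawn one above the other. This is only a sketch: it again assumes that weather.csv matches the built-in airquality data set, and the titles and the pch setting are illustrative choices, not part of the original example.

```r
# Assumption: weather.csv mirrors R's built-in airquality data set.
data <- airquality
ozone <- data$Ozone

# Stack two plots in one window and remember the previous layout
old_par <- par(mfrow = c(2, 1))

# Default method ("overplot"): equal values are drawn on top of each other
stripchart(ozone, main = "Dot plot", xlab = "Ozone")

# "jitter" spreads the points randomly in the vertical direction
stripchart(ozone, method = "jitter", pch = 1,
           main = "Jitter plot", xlab = "Ozone")

# Restore the previous layout
par(old_par)
```

Because the jitter is random, the lower plot looks slightly different on every run.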

Histograms

Jitter plots work well for small data sets, but they become hard to read when there are many observations. A histogram gives a clearer view: it divides the x-axis into intervals (bins) and counts the observations that fall into each one. As a result, you can see the central tendency of the data.
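The counting step described above can be reproduced by hand, which may make it clearer what a histogram draws. The following is a rough sketch. It assumes the built-in airquality data set stands in for weather.csv, and the bin width of 25 is an arbitrary value chosen only for illustration.

```r
# Assumption: weather.csv mirrors R's built-in airquality data set.
ozone <- airquality$Ozone
ozone <- ozone[!is.na(ozone)]   # drop missing values before binning

# Illustrative bin edges: 0, 25, 50, ... up to the first edge at or above the maximum
bin_width <- 25
edges <- seq(0, ceiling(max(ozone) / bin_width) * bin_width, by = bin_width)

# Assign each observation to an interval, then count per interval
bins <- cut(ozone, breaks = edges, include.lowest = TRUE)
counts <- table(bins)
print(counts)

# The most populated interval corresponds to the tallest bar of a histogram
names(counts)[which.max(counts)]
```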

To draw a histogram:

> hist(data$Ozone)

Try changing the number of breaks:

> hist(data$Ozone,breaks=2)

> hist(data$Ozone,breaks=100)
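When breaks is a single number, R treats it as a suggestion and chooses round breakpoints near it, so the number of bars may differ from the number requested. The following small sketch checks this. It assumes the built-in airquality data set in place of weather.csv, and it uses plot = FALSE so that the computed bins are returned without drawing anything.

```r
# Assumption: weather.csv mirrors R's built-in airquality data set.
ozone <- airquality$Ozone

for (b in c(2, 10, 100)) {
  h <- hist(ozone, breaks = b, plot = FALSE)   # compute bins only, no plot
  cat("requested:", b,
      " bins used:", length(h$counts),
      " observations counted:", sum(h$counts), "\n")
}
```

The number of observations counted stays the same for every setting, because hist ignores missing values and only the bin boundaries change.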

Practice Questions

1. What is the central tendency of Ozone?