
pandas Data Structures: Series

2018-01-02 20:11
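pandas has two main data structures, Series and DataFrame, and fluent data analysis with pandas depends on both. A Series is a one-dimensional, array-like object made up of a sequence of values and a matching sequence of labels (its index); each label is associated with one value.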

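1. Creating a Series with the default index

When printed, a Series looks much like a Python dict: the index is on the left and the values are on the right. If no index is specified, pandas automatically creates an integer index running from 0 to N-1, where N is the length of the data.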

from pandas import Series

obj = Series([5, 6, 7, 8])
print(obj)
'''
0    5
1    6
2    7
3    8
'''

2. Inspecting the values and index of a Series

The values and index attributes of a Series expose its values and its index.

obj = Series([5, 6, 7, 8])
print(obj.index)
#RangeIndex(start=0, stop=4, step=1)
print(obj.values)
#[5 6 7 8]
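
As a small sketch of what these two attributes return (the variable names `vals` and `labels` are illustrative only): `values` is a NumPy array, and the index object can be converted to a plain Python list.

```python
import numpy as np
from pandas import Series

obj = Series([5, 6, 7, 8])

# values holds the data as a NumPy array
vals = obj.values
print(isinstance(vals, np.ndarray))  # True

# the default index converts to a plain list of integers
labels = list(obj.index)
print(labels)  # [0, 1, 2, 3]
```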

3. Creating a Series with a custom index

obj = Series([5, 6, 7, 8], index=["a", "b", "c", "d"])
print(obj)
'''
a    5
b    6
c    7
d    8
'''

4. Getting values by index label

Looking up a value by its index label works much like looking up a value by key in a Python dict.

obj = Series([5, 6, 7, 8], index=["a", "b", "c", "d"])
print(obj["d"])
# 8

Getting several values at once with a list of index labels

obj = Series([5, 6, 7, 8], index=["a", "b", "c", "d"])
print(obj[["a", "c", "d"]])
'''
a    5
c    7
d    8
'''
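
The dict analogy extends to missing labels. The sketch below is an illustration rather than part of the original walkthrough, and its variable names are made up. It shows the explicit label accessor `.loc`, the `KeyError` raised for an unknown label, and `get()` with a default value.

```python
from pandas import Series

obj = Series([5, 6, 7, 8], index=["a", "b", "c", "d"])

# .loc is the explicit label-based accessor
d_value = obj.loc["d"]
print(d_value)  # 8

# an unknown label raises KeyError, as with a dict
try:
    obj["z"]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)  # True

# get() returns the supplied default instead of raising
fallback = obj.get("z", -1)
print(fallback)  # -1
```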

5. Operating on the values of a Series

Filtering the values that satisfy a condition

obj = Series([5, 6, 7, 8], index=["a", "b", "c", "d"])
print(obj[obj > 6])
'''
c    7
d    8
'''

Multiplying every value by a given factor

obj = Series([5, 6, 7, 8], index=["a", "b", "c", "d"])
print(obj * 5)
'''
a    25
b    30
c    35
d    40
'''
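
To make the filtering step more concrete, here is a minimal sketch (variable names are illustrative) showing that the comparison itself produces a boolean Series with the same index, and that both operations return new objects without changing `obj`.

```python
from pandas import Series

obj = Series([5, 6, 7, 8], index=["a", "b", "c", "d"])

# the comparison yields a boolean Series aligned with obj
mask = obj > 6
print(mask.tolist())  # [False, False, True, True]

# indexing with the mask keeps only the rows where it is True
filtered = obj[mask]

# arithmetic is applied element-wise and returns a new Series
scaled = obj * 5

# the original Series is unchanged
print(obj.tolist())  # [5, 6, 7, 8]
```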

6. Checking whether an index label is in a Series

obj = Series([5, 6, 7, 8],index=["a","b","c","d"])
print("a" in obj)
#True
print("e" in obj)
#False
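
One point that is easy to miss: `in` tests the index labels, not the stored values. The following sketch (variable names are illustrative) contrasts the two checks.

```python
from pandas import Series

obj = Series([5, 6, 7, 8], index=["a", "b", "c", "d"])

# membership is tested against the index labels
label_found = "a" in obj
print(label_found)  # True

# 5 is a value, not a label, so this is False
value_as_label = 5 in obj
print(value_as_label)  # False

# to test the values, check the underlying array instead
value_found = 5 in obj.values
print(value_found)  # True
```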

7. Creating a Series from a dict

The dict's keys become the index of the Series, and the dict's values become its values.

dic = {"a": 18, "b": 19, "c": 20, "d": 21}
obj = Series(dic)
print(obj)
'''
a    18
b    19
c    20
d    21
'''

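8. Modifying a Series

When a dict is passed together with an explicit index, pandas looks up each index label among the dict's keys; a label with no matching key gets the value NaN (not a number). pandas uses NaN to mark missing or NA values.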
dic = {"a": 18, "b": 19, "c": 20, "d": 21}
obj = Series(dic, index=["c", "d", "e"])
print(obj)
'''
c    20.0
d    21.0
e     NaN
'''
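
The output above shows 20.0 and 21.0 rather than 20 and 21. A brief sketch of why, with illustrative variable names: NaN is a floating-point value, so a Series of integers that gains a NaN is stored as float64.

```python
from pandas import Series

dic = {"a": 18, "b": 19, "c": 20, "d": 21}

# every key is used, so the values stay integers
full = Series(dic)
print(full.dtype)  # an integer dtype

# "e" has no matching key, so it becomes NaN and the
# whole Series is stored as floating point
partial = Series(dic, index=["c", "d", "e"])
print(partial.dtype)  # float64
```
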
Changing the index by assignment

obj = Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
obj.index = ["one", "two", "three", "four"]
print(obj)
'''
one      1
two      2
three    3
four     4
'''
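
Assigning to `index` replaces all labels at once, so the new list must have the same length as the Series. The sketch below is an illustration with made-up variable names. It shows the `ValueError` raised on a length mismatch, and `rename()` as one way to change only some labels.

```python
from pandas import Series

obj = Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# a new index of the wrong length is rejected
try:
    obj.index = ["one", "two"]
    mismatch_raises = False
except ValueError:
    mismatch_raises = True
print(mismatch_raises)  # True

# rename() maps individual labels and returns a new Series
renamed = obj.rename({"a": "one"})
print(renamed.index.tolist())  # ['one', 'b', 'c', 'd']
```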

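9. Detecting missing values

pandas provides the isnull and notnull functions for detecting missing values. They can be called as top-level functions, for example pd.isnull(obj), or as the isnull and notnull methods of the Series itself.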
dic = {"a": 18, "b": 19, "c": 20, "d": 21}
obj = Series(dic, index=["c", "d", "e"])
print(obj.isnull())
'''
c    False
d    False
e     True
'''
print(obj.notnull())
'''
c     True
d     True
e    False
'''
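
The example above uses only the Series methods. As a minimal sketch of the top-level form mentioned in the text (variable names are illustrative), `pd.isnull(obj)` gives the same result, and the boolean result can be used to count or filter out missing entries.

```python
import pandas as pd

dic = {"a": 18, "b": 19, "c": 20, "d": 21}
obj = pd.Series(dic, index=["c", "d", "e"])

# top-level function and method give the same boolean Series
top_level = pd.isnull(obj)
same = top_level.equals(obj.isnull())
print(same)  # True

# True counts as 1, so sum() gives the number of missing values
n_missing = int(obj.isnull().sum())
print(n_missing)  # 1

# keep only the entries that are present
present = obj[obj.notnull()]
print(present.index.tolist())  # ['c', 'd']
```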

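10. Arithmetic on Series

Arithmetic operations automatically align the two Series by index label. Labels present in both Series have their values added; a label present in only one of them yields NaN.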

obj1 = Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
obj2 = Series([10, 20, 30, 40], index=["a", "b", "e", "d"])
print(obj1 + obj2)
'''
a    11.0
b    22.0
c     NaN
d    44.0
e     NaN
'''
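
If NaN for unmatched labels is not what is wanted, the `add` method accepts a `fill_value`. This is a hedged sketch with illustrative variable names, not part of the original walkthrough. It also shows that integer and float Series with matching labels add without producing NaN.

```python
from pandas import Series

obj1 = Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
obj2 = Series([10, 20, 30, 40], index=["a", "b", "e", "d"])

# labels found in only one Series are treated as 0 before adding
filled = obj1.add(obj2, fill_value=0)
print(filled.tolist())  # [11.0, 22.0, 3.0, 44.0, 30.0]

# alignment depends on labels; mixing int and float values is fine
ints = Series([1, 2], index=["a", "b"])
floats = Series([0.5, 1.5], index=["a", "b"])
mixed = ints + floats
print(mixed.tolist())  # [1.5, 3.5]
```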

11. The name attribute of a Series

Both the Series object and its index have a name attribute, which defaults to None.

obj = Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
print(obj.name)
# None
print(obj.index.name)
# None

Assigning values to the name attributes

obj = Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
obj.name = "series"
obj.index.name = "state"
print(obj.name)
# series
print(obj.index.name)
# state
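
As a small illustration of where these names show up (variable names are illustrative), the Series name can also be passed to the constructor, and both names are used as column labels when the Series is converted to a DataFrame.

```python
from pandas import Series

# name can be given at construction time
obj = Series([1, 2, 3, 4], index=["a", "b", "c", "d"], name="series")
obj.index.name = "state"

# the Series name becomes the column label
frame_cols = obj.to_frame().columns.tolist()
print(frame_cols)  # ['series']

# reset_index() turns the index into a column named after index.name
reset_cols = obj.reset_index().columns.tolist()
print(reset_cols)  # ['state', 'series']
```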