Analyzing a Million Movie Ratings with pandas
2016-05-29 17:48
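Life is short, use Python. This post uses Python for data analysis. pandas is a key Python package for data analysis; other important packages are numpy and matplotlib.
Install pandas (the same on Linux, Mac, and Windows):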
pip install pandas
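Movie data source: http://grouplens.org/datasets/movielens/
Download and unzip the data file. It contains the following 4 files:
users.dat: user data
movies.dat: movie data
ratings.dat: rating data
README: description of the files
The README gives the format of the source data files: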
users.dat (UserID::Gender::Age::Occupation::Zip-code)
movies.dat (MovieID::Title::Genres)
ratings.dat (UserID::MovieID::Rating::Timestamp)
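Notes: Occupation is the user's occupation, Zip-code is the postal code, Timestamp is the time of the rating, and Genres lists the movie's genres (see the README for more details).
The fields of each record in the files are separated by ::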
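The Timestamp field is easier to read once converted to a date. The sketch below assumes, as the README describes, that the value is seconds since the Unix epoch; the two rows are copied from the ratings.dat preview in this post, and the column name rated_at is a hypothetical choice.

```python
import pandas as pd

# Two ratings copied from the ratings.dat preview in this post.
ratings_sample = pd.DataFrame({
    'user_id': [1, 1],
    'movie_id': [1193, 661],
    'rating': [5, 3],
    'timestamp': [978300760, 978302109],
})

# Assumption: timestamps are seconds since the Unix epoch (UTC).
# 'rated_at' is a hypothetical column name used only in this sketch.
ratings_sample['rated_at'] = pd.to_datetime(ratings_sample['timestamp'], unit='s')

print(ratings_sample['rated_at'].iloc[0])  # 2000-12-31 22:12:40
```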
Environment:
OS: Windows
Language: Python 3.4
Editor: Jupyter
Read the data with pandas.
Import the required packages:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
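Define the column names first, because the source files have no header row, only records whose fields are separated by '::'.
user_names = ['user_id', 'gender', 'age', 'occupation', 'zip']  # column names for the users table
Then read the data; adjust the path to wherever your source files are.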
users = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\users.dat', sep='::', header=None, names=user_names)
D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'. if __name__ == '__main__':
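The warning above can be ignored. It only says that the data was loaded with the Python engine rather than the C engine, because the C engine does not support the '::' separator. As the message itself notes, passing engine='python' avoids the warning.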
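As a minimal sketch of the fix the warning suggests, the snippet below reads a few '::'-separated records from an in-memory string with engine='python'. The three records are copied from the users.dat preview in this post. The dtype for zip is an addition for this small sample, which would otherwise be parsed as integers and lose the leading zero in 02460.

```python
import io
import warnings

import pandas as pd

# Three records copied from the users.dat preview in this post.
sample = "1::F::1::10::48067\n2::M::56::16::70072\n4::M::45::7::02460\n"
user_names = ['user_id', 'gender', 'age', 'occupation', 'zip']

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # record every warning raised in this block
    users_sample = pd.read_csv(
        io.StringIO(sample),
        sep='::',
        header=None,
        names=user_names,
        engine='python',     # multi-character separators need the Python engine
        dtype={'zip': str},  # keep zip codes as text so 02460 keeps its zero
    )

# Keep only the ParserWarning seen earlier in the post, if any was raised.
parser_warnings = [w for w in caught if issubclass(w.category, pd.errors.ParserWarning)]
print(len(parser_warnings))
print(users_sample)
```

The same engine and dtype arguments can be passed to the read_table calls used in this post.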
Check how many records there are and look at the first 5 rows.
print(len(users))
users.head()
6040
index | user_id | gender | age | occupation | zip |
---|---|---|---|---|---|
0 | 1 | F | 1 | 10 | 48067 |
1 | 2 | M | 56 | 16 | 70072 |
2 | 3 | M | 25 | 15 | 55117 |
3 | 4 | M | 45 | 7 | 02460 |
4 | 5 | M | 25 | 20 | 55455 |
ratings_names = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\ratings.dat', sep='::', header=None, names=ratings_names)
movies_names = ['movie_id', 'title', 'genres']
movies = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\movies.dat', sep='::', header=None, names=movies_names)
D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'. from ipykernel import kernelapp as app D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:4: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
Loading takes a little while because the ratings file has about a million rows.
Look at the ratings table and the movies table.
print(len(ratings))
ratings.head()
1000209
index | user_id | movie_id | rating | timestamp |
---|---|---|---|---|
0 | 1 | 1193 | 5 | 978300760 |
1 | 1 | 661 | 3 | 978302109 |
2 | 1 | 914 | 3 | 978301968 |
3 | 1 | 3408 | 4 | 978300275 |
4 | 1 | 2355 | 5 | 978824291 |
print(len(movies))
movies.head()
3883
index | movie_id | title | genres |
---|---|---|---|
0 | 1 | Toy Story (1995) | Animation|Children’s|Comedy |
1 | 2 | Jumanji (1995) | Adventure|Children’s|Fantasy |
2 | 3 | Grumpier Old Men (1995) | Comedy|Romance |
3 | 4 | Waiting to Exhale (1995) | Comedy|Drama |
4 | 5 | Father of the Bride Part II (1995) | Comedy |
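Merge the 3 tables into a single table, data. With no keys given, pd.merge joins on the columns the two tables share: user_id for users and ratings, then movie_id for the result and movies.
data = pd.merge(pd.merge(users, ratings), movies)
print(len(data))
data.head()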
1000209
index | user_id | gender | age | occupation | zip | movie_id | rating | timestamp | title | genres |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | F | 1 | 10 | 48067 | 1193 | 5 | 978300760 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
1 | 2 | M | 56 | 16 | 70072 | 1193 | 5 | 978298413 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
2 | 12 | M | 25 | 12 | 32793 | 1193 | 4 | 978220179 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
3 | 15 | M | 25 | 7 | 22903 | 1193 | 4 | 978199279 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
4 | 17 | M | 50 | 1 | 95350 | 1193 | 5 | 978158471 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
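The merge step can also be written with explicit join keys, which makes the intent visible and guards against an accidental join on some other shared column. This is only a sketch on a tiny hand-built sample (values taken from the tables in this post, with most columns omitted); the validate argument is an optional extra check, not something the original code uses.

```python
import pandas as pd

# Tiny stand-ins for the three tables; values come from the previews in this post.
users = pd.DataFrame({'user_id': [1, 2], 'gender': ['F', 'M']})
ratings = pd.DataFrame({
    'user_id': [1, 1, 2],
    'movie_id': [1193, 661, 1193],
    'rating': [5, 3, 5],
})
movies = pd.DataFrame({
    'movie_id': [661, 1193],
    'title': ['James and the Giant Peach (1996)',
              "One Flew Over the Cuckoo's Nest (1975)"],
})

# Name the join key for each step; validate raises if the key
# relationship is not what we expect (e.g. duplicate user_id in users).
data = (
    users
    .merge(ratings, on='user_id', validate='one_to_many')
    .merge(movies, on='movie_id', validate='many_to_one')
)

print(len(data))  # one row per rating
print(data)
```

Both merges are inner joins by default, so a rating whose user or movie is missing from the other tables would be dropped; on the full dataset the merged length equals the number of ratings, which suggests nothing was lost.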
View every rating made by the user whose user_id is 1.
data[data.user_id == 1]
index | user_id | gender | age | occupation | zip | movie_id | rating | timestamp | title | genres |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | F | 1 | 10 | 48067 | 1193 | 5 | 978300760 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
1725 | 1 | F | 1 | 10 | 48067 | 661 | 3 | 978302109 | James and the Giant Peach (1996) | Animation|Children’s|Musical |
2250 | 1 | F | 1 | 10 | 48067 | 914 | 3 | 978301968 | My Fair Lady (1964) | Musical|Romance |
2886 | 1 | F | 1 | 10 | 48067 | 3408 | 4 | 978300275 | Erin Brockovich (2000) | Drama |
4201 | 1 | F | 1 | 10 | 48067 | 2355 | 5 | 978824291 | Bug’s Life, A (1998) | Animation|Children’s|Comedy |
5904 | 1 | F | 1 | 10 | 48067 | 1197 | 3 | 978302268 | Princess Bride, The (1987) | Action|Adventure|Comedy|Romance |
8222 | 1 | F | 1 | 10 | 48067 | 1287 | 5 | 978302039 | Ben-Hur (1959) | Action|Adventure|Drama |
8926 | 1 | F | 1 | 10 | 48067 | 2804 | 5 | 978300719 | Christmas Story, A (1983) | Comedy|Drama |
10278 | 1 | F | 1 | 10 | 48067 | 594 | 4 | 978302268 | Snow White and the Seven Dwarfs (1937) | Animation|Children’s|Musical |
11041 | 1 | F | 1 | 10 | 48067 | 919 | 4 | 978301368 | Wizard of Oz, The (1939) | Adventure|Children’s|Drama|Musical |
12759 | 1 | F | 1 | 10 | 48067 | 595 | 5 | 978824268 | Beauty and the Beast (1991) | Animation|Children’s|Musical |
13819 | 1 | F | 1 | 10 | 48067 | 938 | 4 | 978301752 | Gigi (1958) | Musical |
14006 | 1 | F | 1 | 10 | 48067 | 2398 | 4 | 978302281 | Miracle on 34th Street (1947) | Drama |
14386 | 1 | F | 1 | 10 | 48067 | 2918 | 4 | 978302124 | Ferris Bueller’s Day Off (1986) | Comedy |
15859 | 1 | F | 1 | 10 | 48067 | 1035 | 5 | 978301753 | Sound of Music, The (1965) | Musical |
16741 | 1 | F | 1 | 10 | 48067 | 2791 | 4 | 978302188 | Airplane! (1980) | Comedy |
18472 | 1 | F | 1 | 10 | 48067 | 2687 | 3 | 978824268 | Tarzan (1999) | Animation|Children’s |
18914 | 1 | F | 1 | 10 | 48067 | 2018 | 4 | 978301777 | Bambi (1942) | Animation|Children’s |
19503 | 1 | F | 1 | 10 | 48067 | 3105 | 5 | 978301713 | Awakenings (1990) | Drama |
20183 | 1 | F | 1 | 10 | 48067 | 2797 | 4 | 978302039 | Big (1988) | Comedy|Fantasy |
21674 | 1 | F | 1 | 10 | 48067 | 2321 | 3 | 978302205 | Pleasantville (1998) | Comedy |
22832 | 1 | F | 1 | 10 | 48067 | 720 | 3 | 978300760 | Wallace & Gromit: The Best of Aardman Animatio… | Animation |
23270 | 1 | F | 1 | 10 | 48067 | 1270 | 5 | 978300055 | Back to the Future (1985) | Comedy|Sci-Fi |
25853 | 1 | F | 1 | 10 | 48067 | 527 | 5 | 978824195 | Schindler’s List (1993) | Drama|War |
28157 | 1 | F | 1 | 10 | 48067 | 2340 | 3 | 978300103 | Meet Joe Black (1998) | Romance |
28501 | 1 | F | 1 | 10 | 48067 | 48 | 5 | 978824351 | Pocahontas (1995) | Animation|Children’s|Musical|Romance |
28883 | 1 | F | 1 | 10 | 48067 | 1097 | 4 | 978301953 | E.T. the Extra-Terrestrial (1982) | Children’s|Drama|Fantasy|Sci-Fi |
31152 | 1 | F | 1 | 10 | 48067 | 1721 | 4 | 978300055 | Titanic (1997) | Drama|Romance |
32698 | 1 | F | 1 | 10 | 48067 | 1545 | 4 | 978824139 | Ponette (1996) | Drama |
32771 | 1 | F | 1 | 10 | 48067 | 745 | 3 | 978824268 | Close Shave, A (1995) | Animation|Comedy|Thriller |
33428 | 1 | F | 1 | 10 | 48067 | 2294 | 4 | 978824291 | Antz (1998) | Animation|Children’s |
34073 | 1 | F | 1 | 10 | 48067 | 3186 | 4 | 978300019 | Girl, Interrupted (1999) | Drama |
34504 | 1 | F | 1 | 10 | 48067 | 1566 | 4 | 978824330 | Hercules (1997) | Adventure|Animation|Children’s|Comedy|Musical |
34973 | 1 | F | 1 | 10 | 48067 | 588 | 4 | 978824268 | Aladdin (1992) | Animation|Children’s|Comedy|Musical |
36324 | 1 | F | 1 | 10 | 48067 | 1907 | 4 | 978824330 | Mulan (1998) | Animation|Children’s |
36814 | 1 | F | 1 | 10 | 48067 | 783 | 4 | 978824291 | Hunchback of Notre Dame, The (1996) | Animation|Children’s|Musical |
37204 | 1 | F | 1 | 10 | 48067 | 1836 | 5 | 978300172 | Last Days of Disco, The (1998) | Drama |
37339 | 1 | F | 1 | 10 | 48067 | 1022 | 5 | 978300055 | Cinderella (1950) | Animation|Children’s|Musical |
37916 | 1 | F | 1 | 10 | 48067 | 2762 | 4 | 978302091 | Sixth Sense, The (1999) | Thriller |
40375 | 1 | F | 1 | 10 | 48067 | 150 | 5 | 978301777 | Apollo 13 (1995) | Drama |
41626 | 1 | F | 1 | 10 | 48067 | 1 | 5 | 978824268 | Toy Story (1995) | Animation|Children’s|Comedy |
43703 | 1 | F | 1 | 10 | 48067 | 1961 | 5 | 978301590 | Rain Man (1988) | Drama |
45033 | 1 | F | 1 | 10 | 48067 | 1962 | 4 | 978301753 | Driving Miss Daisy (1989) | Drama |
45685 | 1 | F | 1 | 10 | 48067 | 2692 | 4 | 978301570 | Run Lola Run (Lola rennt) (1998) | Action|Crime|Romance |
46757 | 1 | F | 1 | 10 | 48067 | 260 | 4 | 978300760 | Star Wars: Episode IV - A New Hope (1977) | Action|Adventure|Fantasy|Sci-Fi |
49748 | 1 | F | 1 | 10 | 48067 | 1028 | 5 | 978301777 | Mary Poppins (1964) | Children’s|Comedy|Musical |
50759 | 1 | F | 1 | 10 | 48067 | 1029 | 5 | 978302205 | Dumbo (1941) | Animation|Children’s|Musical |
51327 | 1 | F | 1 | 10 | 48067 | 1207 | 4 | 978300719 | To Kill a Mockingbird (1962) | Drama |
52255 | 1 | F | 1 | 10 | 48067 | 2028 | 5 | 978301619 | Saving Private Ryan (1998) | Action|Drama|War |
54908 | 1 | F | 1 | 10 | 48067 | 531 | 4 | 978302149 | Secret Garden, The (1993) | Children’s|Drama |
55246 | 1 | F | 1 | 10 | 48067 | 3114 | 4 | 978302174 | Toy Story 2 (1999) | Animation|Children’s|Comedy |
56831 | 1 | F | 1 | 10 | 48067 | 608 | 4 | 978301398 | Fargo (1996) | Crime|Drama|Thriller |
59344 | 1 | F | 1 | 10 | 48067 | 1246 | 4 | 978302091 | Dead Poets Society (1989) | Drama |
Compute the mean rating of each movie, split by gender.
mean_ratings_by_gender = data.pivot_table(values='rating', index='title', columns='gender', aggfunc='mean')
mean_ratings_by_gender.head(10)  # first 10 rows
gender | F | M |
---|---|---|
title | ||
$1,000,000 Duck (1971) | 3.375000 | 2.761905 |
‘Night Mother (1986) | 3.388889 | 3.352941 |
‘Til There Was You (1997) | 2.675676 | 2.733333 |
‘burbs, The (1989) | 2.793478 | 2.962085 |
…And Justice for All (1979) | 3.828571 | 3.689024 |
1-900 (1994) | 2.000000 | 3.000000 |
10 Things I Hate About You (1999) | 3.646552 | 3.311966 |
101 Dalmatians (1961) | 3.791444 | 3.500000 |
101 Dalmatians (1996) | 3.240000 | 2.911215 |
12 Angry Men (1957) | 4.184397 | 4.328421 |
Add a diff column: the female mean rating minus the male mean rating.
mean_ratings_by_gender['diff'] = mean_ratings_by_gender.F - mean_ratings_by_gender.M
mean_ratings_by_gender.head()
gender | F | M | diff |
---|---|---|---|
title | |||
$1,000,000 Duck (1971) | 3.375000 | 2.761905 | 0.613095 |
‘Night Mother (1986) | 3.388889 | 3.352941 | 0.035948 |
‘Til There Was You (1997) | 2.675676 | 2.733333 | -0.057658 |
‘burbs, The (1989) | 2.793478 | 2.962085 | -0.168607 |
…And Justice for All (1979) | 3.828571 | 3.689024 | 0.139547 |
mean_ratings_by_gender.sort_values(by='diff', ascending=True).head()  # rated higher by men than by women
gender | F | M | diff |
---|---|---|---|
title | |||
Tigrero: A Film That Was Never Made (1994) | 1.0 | 4.333333 | -3.333333 |
Neon Bible, The (1995) | 1.0 | 4.000000 | -3.000000 |
Enfer, L’ (1994) | 1.0 | 3.750000 | -2.750000 |
Stalingrad (1993) | 1.0 | 3.593750 | -2.593750 |
Killer: A Journal of Murder (1995) | 1.0 | 3.428571 | -2.428571 |
mean_ratings_by_gender.sort_values(by='diff', ascending=False).head()  # rated higher by women than by men
gender | F | M | diff |
---|---|---|---|
title | |||
James Dean Story, The (1957) | 4.000000 | 1.000000 | 3.000000 |
Spiders, The (Die Spinnen, 1. Teil: Der Goldene See) (1919) | 4.000000 | 1.000000 | 3.000000 |
Country Life (1994) | 5.000000 | 2.000000 | 3.000000 |
Babyfever (1994) | 3.666667 | 1.000000 | 2.666667 |
Woman of Paris, A (1923) | 5.000000 | 2.428571 | 2.571429 |
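The extreme differences above come mostly from round values such as 1.0 and 5.0, which hints that those movies have very few ratings. One possible refinement, sketched here on hypothetical toy data, is to keep only titles with a minimum number of ratings before computing the difference. The threshold min_ratings and the titles A and B are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy ratings: title A has 4 ratings, title B only 2.
data = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'A', 'B', 'B'],
    'gender': ['F', 'F', 'M', 'M', 'F', 'M'],
    'rating': [4, 5, 3, 4, 5, 1],
})
min_ratings = 4  # hypothetical threshold, not a value from this post

# Titles that have at least min_ratings ratings in total.
counts = data.groupby('title').size()
active_titles = counts.index[counts >= min_ratings]

# Mean rating per title and gender, restricted to those titles.
mean_by_gender = data.pivot_table(values='rating', index='title',
                                  columns='gender', aggfunc='mean')
mean_by_gender = mean_by_gender.loc[active_titles].copy()
mean_by_gender['diff'] = mean_by_gender['F'] - mean_by_gender['M']

print(mean_by_gender.sort_values(by='diff'))
```

Here title B, whose two ratings would give a difference of 4.0, is filtered out, and only title A remains with a difference of 1.0.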
Count the number of ratings for each movie.
total_rating_by_title = data.groupby('title').size()
total_rating_by_title  # first column: movie title; second column: number of ratings
title
$1,000,000 Duck (1971)    37
'Night Mother (1986)    70
'Til There Was You (1997)    52
'burbs, The (1989)    303
...And Justice for All (1979)    199
1-900 (1994)    2
10 Things I Hate About You (1999)    700
101 Dalmatians (1961)    565
101 Dalmatians (1996)    364
12 Angry Men (1957)    616
13th Warrior, The (1999)    750
187 (1997)    55
2 Days in the Valley (1996)    286
20 Dates (1998)    139
20,000 Leagues Under the Sea (1954)    575
200 Cigarettes (1999)    181
2001: A Space Odyssey (1968)    1716
2010 (1984)    470
24 7: Twenty Four Seven (1997)    5
24-hour Woman (1998)    9
28 Days (2000)    505
3 Ninjas: High Noon On Mega Mountain (1998)    47
3 Strikes (2000)    4
301, 302 (1995)    9
39 Steps, The (1935)    253
400 Blows, The (Les Quatre cents coups) (1959)    187
42 Up (1998)    88
52 Pick-Up (1986)    140
54 (1998)    259
7th Voyage of Sinbad, The (1958)    258
...
Wrongfully Accused (1998)    123
Wyatt Earp (1994)    270
X-Files: Fight the Future, The (1998)    996
X-Men (2000)    1511
X: The Unknown (1956)    12
Xiu Xiu: The Sent-Down Girl (Tian yu) (1998)    69
Yankee Zulu (1994)    2
Yards, The (1999)    77
Year My Voice Broke, The (1987)    27
Year of Living Dangerously (1982)    391
Year of the Horse (1997)    4
Yellow Submarine (1968)    399
Yojimbo (1961)    215
You Can't Take It With You (1938)    77
You So Crazy (1994)    13
You've Got Mail (1998)    838
Young Doctors in Love (1982)    79
Young Frankenstein (1974)    1193
Young Guns (1988)    562
Young Guns II (1990)    369
Young Poisoner's Handbook, The (1995)    79
Young Sherlock Holmes (1985)    379
Young and Innocent (1937)    10
Your Friends and Neighbors (1998)    109
Zachariah (1971)    2
Zed & Two Noughts, A (1985)    29
Zero Effect (1998)    301
Zero Kelvin (Kjærlighetens kjøtere) (1995)    2
Zeus and Roxanne (1997)    23
eXistenZ (1999)    410
dtype: int64
The 10 movies with the most ratings.
top_10_total_rating = total_rating_by_title.sort_values(ascending=False).head(10)
top_10_total_rating
title
American Beauty (1999)    3428
Star Wars: Episode IV - A New Hope (1977)    2991
Star Wars: Episode V - The Empire Strikes Back (1980)    2990
Star Wars: Episode VI - Return of the Jedi (1983)    2883
Jurassic Park (1993)    2672
Saving Private Ryan (1998)    2653
Terminator 2: Judgment Day (1991)    2649
Matrix, The (1999)    2590
Back to the Future (1985)    2583
Silence of the Lambs, The (1991)    2578
dtype: int64
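The movies with the most ratings are mostly ones we know well, so they can generally be regarded as popular movies. Now look at the 10 movies with the highest mean rating (note: the maximum rating is 5.0).
# groupby(...).mean() returns a Series. In newer pandas versions pivot_table
# returns a DataFrame here, and sort_values() without by= fails on it.
mean_ratings_by_title = data.groupby('title')['rating'].mean()
top_10_mean_ratings = mean_ratings_by_title.sort_values(ascending=False).head(10)
top_10_mean_ratings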
title
Gate of Heavenly Peace, The (1995)    5.0
Lured (1947)    5.0
Ulysses (Ulisse) (1954)    5.0
Smashing Time (1967)    5.0
Follow the Bitch (1998)    5.0
Song of Freedom (1936)    5.0
Bittersweet Motel (2000)    5.0
Baby, The (1973)    5.0
One Little Indian (1973)    5.0
Schlafes Bruder (Brother of Sleep) (1995)    5.0
Name: rating, dtype: float64
The mean rating of the 10 movies with the most ratings.
mean_ratings_by_title[top_10_total_rating.index]
title
American Beauty (1999)    4.317386
Star Wars: Episode IV - A New Hope (1977)    4.453694
Star Wars: Episode V - The Empire Strikes Back (1980)    4.292977
Star Wars: Episode VI - Return of the Jedi (1983)    4.022893
Jurassic Park (1993)    3.763847
Saving Private Ryan (1998)    4.337354
Terminator 2: Judgment Day (1991)    4.058513
Matrix, The (1999)    4.315830
Back to the Future (1985)    3.990321
Silence of the Lambs, The (1991)    4.351823
Name: rating, dtype: float64
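None of the 10 most-rated movies appears among the 10 movies with the highest mean rating, and several of the highest-rated movies are ones most of us have never heard of. Is something wrong with the data? Not really. Suppose an obscure or even bad movie is watched by very few people and those few give it a high rating: the result is a movie with very few ratings but a very high mean rating.
If you are not convinced, look at the data: the number of ratings for the 10 movies with the highest mean rating.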
total_rating_by_title[top_10_mean_ratings.index]
title
Gate of Heavenly Peace, The (1995)    3
Lured (1947)    1
Ulysses (Ulisse) (1954)    1
Smashing Time (1967)    2
Follow the Bitch (1998)    1
Song of Freedom (1936)    1
Bittersweet Motel (2000)    1
Baby, The (1973)    1
One Little Indian (1973)    1
Schlafes Bruder (Brother of Sleep) (1995)    1
dtype: int64
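Now recompute the top 10, this time restricted to popular movies. Here a movie counts as popular if it has more than 1000 ratings. First select the popular movies:
hot_movie = total_rating_by_title[total_rating_by_title > 1000]
print(len(hot_movie))
hot_movie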
207
title
2001: A Space Odyssey (1968)    1716
Abyss, The (1989)    1715
African Queen, The (1951)    1057
Air Force One (1997)    1076
Airplane! (1980)    1731
Aladdin (1992)    1351
Alien (1979)    2024
Aliens (1986)    1820
Amadeus (1984)    1382
American Beauty (1999)    3428
American Pie (1999)    1389
American President, The (1995)    1033
Animal House (1978)    1207
Annie Hall (1977)    1334
Apocalypse Now (1979)    1176
Apollo 13 (1995)    1251
Arachnophobia (1990)    1367
Armageddon (1998)    1110
As Good As It Gets (1997)    1424
Austin Powers: International Man of Mystery (1997)    1205
Austin Powers: The Spy Who Shagged Me (1999)    1434
Babe (1995)    1751
Back to the Future (1985)    2583
Back to the Future Part II (1989)    1158
Back to the Future Part III (1990)    1148
Batman (1989)    1431
Batman Returns (1992)    1031
Beauty and the Beast (1991)    1060
Beetlejuice (1988)    1495
Being John Malkovich (1999)    2241
...
Superman (1978)    1222
Talented Mr. Ripley, The (1999)    1331
Taxi Driver (1976)    1240
Terminator 2: Judgment Day (1991)    2649
Terminator, The (1984)    2098
Thelma & Louise (1991)    1417
There's Something About Mary (1998)    1371
This Is Spinal Tap (1984)    1118
Thomas Crown Affair, The (1999)    1089
Three Kings (1999)    1021
Time Bandits (1981)    1010
Titanic (1997)    1546
Top Gun (1986)    1010
Total Recall (1990)    1996
Toy Story (1995)    2077
Toy Story 2 (1999)    1585
True Lies (1994)    1400
Truman Show, The (1998)    1005
Twelve Monkeys (1995)    1511
Twister (1996)    1110
Untouchables, The (1987)    1127
Usual Suspects, The (1995)    1783
Wayne's World (1992)    1120
When Harry Met Sally... (1989)    1568
Who Framed Roger Rabbit? (1988)    1799
Willy Wonka and the Chocolate Factory (1971)    1313
Witness (1985)    1046
Wizard of Oz, The (1939)    1718
X-Men (2000)    1511
Young Frankenstein (1974)    1193
dtype: int64
# mean rating of the popular movies
hot_movie_mean_rating = mean_ratings_by_title[hot_movie.index]
print(len(hot_movie_mean_rating))
hot_movie_mean_rating
207
title
2001: A Space Odyssey (1968)    4.068765
Abyss, The (1989)    3.683965
African Queen, The (1951)    4.251656
Air Force One (1997)    3.588290
Airplane! (1980)    3.971115
Aladdin (1992)    3.788305
Alien (1979)    4.159585
Aliens (1986)    4.125824
Amadeus (1984)    4.251809
American Beauty (1999)    4.317386
American Pie (1999)    3.709863
American President, The (1995)    3.793804
Animal House (1978)    4.053024
Annie Hall (1977)    4.141679
Apocalypse Now (1979)    4.243197
Apollo 13 (1995)    4.073541
Arachnophobia (1990)    3.002926
Armageddon (1998)    3.191892
As Good As It Gets (1997)    3.950140
Austin Powers: International Man of Mystery (1997)    3.710373
Austin Powers: The Spy Who Shagged Me (1999)    3.388424
Babe (1995)    3.891491
Back to the Future (1985)    3.990321
Back to the Future Part II (1989)    3.343696
Back to the Future Part III (1990)    3.242160
Batman (1989)    3.600978
Batman Returns (1992)    2.976722
Beauty and the Beast (1991)    3.885849
Beetlejuice (1988)    3.567893
Being John Malkovich (1999)    4.125390
...
Superman (1978)    3.536825
Talented Mr. Ripley, The (1999)    3.503381
Taxi Driver (1976)    4.183871
Terminator 2: Judgment Day (1991)    4.058513
Terminator, The (1984)    4.152050
Thelma & Louise (1991)    3.680311
There's Something About Mary (1998)    3.904449
This Is Spinal Tap (1984)    4.179785
Thomas Crown Affair, The (1999)    3.641873
Three Kings (1999)    3.807052
Time Bandits (1981)    3.694059
Titanic (1997)    3.583441
Top Gun (1986)    3.686139
Total Recall (1990)    3.682365
Toy Story (1995)    4.146846
Toy Story 2 (1999)    4.218927
True Lies (1994)    3.634286
Truman Show, The (1998)    3.861692
Twelve Monkeys (1995)    3.945731
Twister (1996)    3.173874
Untouchables, The (1987)    4.007986
Usual Suspects, The (1995)    4.517106
Wayne's World (1992)    3.600893
When Harry Met Sally... (1989)    4.073342
Who Framed Roger Rabbit? (1988)    3.679822
Willy Wonka and the Chocolate Factory (1971)    3.861386
Witness (1985)    3.996176
Wizard of Oz, The (1939)    4.247963
X-Men (2000)    3.820649
Young Frankenstein (1974)    4.250629
Name: rating, dtype: float64
# the 10 highest-rated movies among those with more than 1000 ratings
top_10_rating_movie = hot_movie_mean_rating.sort_values(ascending=False).head(10)
top_10_rating_movie
title
Shawshank Redemption, The (1994)    4.554558
Godfather, The (1972)    4.524966
Usual Suspects, The (1995)    4.517106
Schindler's List (1993)    4.510417
Raiders of the Lost Ark (1981)    4.477725
Rear Window (1954)    4.476190
Star Wars: Episode IV - A New Hope (1977)    4.453694
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)    4.449890
Casablanca (1942)    4.412822
Sixth Sense, The (1999)    4.406263
Name: rating, dtype: float64
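The count, filter, and rank steps above can also be done in one pass by aggregating the count and the mean together. This is a sketch on hypothetical toy data, not the code used for the results above; the titles A, B, C and the threshold min_ratings are assumptions chosen so the example stays small.

```python
import pandas as pd

# Hypothetical toy ratings: A has 3 ratings, B has 1, C has 2.
data = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'B', 'C', 'C'],
    'rating': [4, 5, 3, 5, 4, 2],
})
min_ratings = 1  # hypothetical; this post uses 1000 on the full dataset

# One groupby yields both the number of ratings and the mean rating per title.
stats = data.groupby('title')['rating'].agg(['count', 'mean'])

# Keep titles with more than min_ratings ratings, then rank by mean rating.
popular = stats[stats['count'] > min_ratings]
top = popular.sort_values(by='mean', ascending=False).head(10)

print(top)
```

Title B has a perfect mean of 5.0 but only one rating, so it is dropped, which is the same effect seen with the obscure 5.0 movies earlier in this post.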
Plot the top 10.
# Use this magic in IPython (or Jupyter) only; leave it out elsewhere
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 11)
y = top_10_rating_movie.values
name = top_10_rating_movie.index
# Draw the line plot
plt.plot(x, y, 'r-o')
# Annotate each point with its movie title
for i in range(10):
    plt.text(x[i], y[i], name[i])
# Set the axis ranges
plt.xlim(0, 15)
plt.ylim(4.4, 4.56)
# Set the axis labels
# plt.xlabel('Rank')
# plt.ylabel('Rating')
# plt.show()  # needed outside IPython
That plot is ugly, so here is a horizontal bar chart instead:
import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()
people = name
y_pos = np.arange(len(people))
performance = y
plt.barh(y_pos, performance, align='center', alpha=0.4)
plt.yticks(y_pos, people)
# plt.xlabel('Rating')
# plt.title('Rank')
# plt.show()  # needed outside IPython
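Because the top-10 means all lie between 4.4 and 4.6, a bar chart that starts at zero makes them look identical. The sketch below shows one way to handle that: narrow the x-axis and print each value next to its bar. It uses three values copied from the top-10 output in this post. The Agg backend, the figure size, and the axis limits are assumptions made so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Three values copied from the top-10 output in this post.
top = pd.Series({
    'Shawshank Redemption, The (1994)': 4.554558,
    'Godfather, The (1972)': 4.524966,
    'Usual Suspects, The (1995)': 4.517106,
})

fig, ax = plt.subplots(figsize=(8, 3))
y_pos = np.arange(len(top))
bars = ax.barh(y_pos, top.values, align='center', alpha=0.4)
ax.set_yticks(y_pos)
ax.set_yticklabels(top.index)
ax.invert_yaxis()        # highest-rated movie at the top
ax.set_xlim(4.4, 4.6)    # zoom in so the small differences are visible
ax.set_xlabel('Mean rating')

# Write the numeric value at the end of each bar.
for bar, value in zip(bars, top.values):
    ax.text(value, bar.get_y() + bar.get_height() / 2,
            ' {:.3f}'.format(value), va='center')

fig.tight_layout()
# fig.savefig('top_rated.png')  # hypothetical output file name
```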